A Database of Drosophila Genes & Genomes

 

FB2007_03 Release Notes

The FlyBase 2007_03 Update

This release is a full update to the database that adds new records to most classes of FlyBase object. In addition, new data have been added to existing records. Although no new gene annotations were added in this release, there have been significant changes to the annotated transcripts of existing gene models.

Drosophila melanogaster Release 5.4 Annotation Update.

The release 5 sequence by BDGP (see the BDGP release notes) contains some major improvements to the assembly of the major chromosome arm scaffolds as well as improvements to the assembly of those portions of the centric heterochromatin that cannot currently be attached to the major arms. The improvements to the arms include the major differences noted below plus an additional 4.7 Mbp of heterochromatic sequence attached to the proximal ends of the arms. Further, Release_5 is the first non-redundant assembly of the D. melanogaster genome, unifying the previously separate assemblies of the largely euchromatic arm scaffolds and the heterochromatic scaffolds. The entire Release_5 assembly can be downloaded from the BDGP web site.

The Release 5.4 annotation update is a collaboration between FlyBase, BDGP and DHGP, and includes a full set of annotated gene models for the D. melanogaster euchromatin and heterochromatin assemblies.

The latest GenBank version of the D. melanogaster genome is Release_5.2, which was released in September and October of this year.

TABLE 1: Release_5 Assembly (from BDGP) and Release_5.2 Accessions

| Scaffold | Length (bp) | Gaps | Release 5.1 GenBank Accession | Major Difference Compared to Release 4 |
|---|---|---|---|---|
| ArmX | 22,422,827 | 3 | AE014298.4 | 8kb added to the distal end, gaps filled in regions 1-11 |
| Arm2L | 23,011,544 | 2 | AE014134.5 | 591kb added to the proximal end of the arm |
| Arm2R | 21,146,708 | 1 | AE013599.4 | 380kb added to the proximal end |
| Arm3L | 24,543,557 | 1 | AE014296.4 | 16kb added on distal end, 718kb added to proximal end, other gaps filled |
| Arm3R | 27,905,053 | 0 | AE014297.2 | None |
| Arm4 | 1,351,857 | 1 | AE014135.3 | 70kbp added to the distal end |
| XHet | 204,112 | n.a. | CM000460.1 | n.a. |
| YHet | 347,038 | n.a. | CM000461.1 | n.a. |
| 2LHet | 368,872 | n.a. | CM000456.1 | n.a. |
| 2RHet | 3,288,761 | n.a. | CM000457.1 | n.a. |
| 3LHet | 2,555,491 | n.a. | CM000458.1 | n.a. |
| 3RHet | 2,517,507 | n.a. | CM000459.1 | n.a. |
| ArmU | 10,049,037 | n.a. | DS483543-DS486008 | n.a. (Arm U was not assembled into an ultrascaffold by GenBank.) |

Release 5 of the euchromatic sequence contains eight (known) gaps. Two gaps on the X have size estimates; the other six gaps in the genome are not sized. (Gaps of unknown size are denoted by 100 N's in the fasta files.) The gap at 21485539..21485638 of scaffold Arm2L is the Histone gene cluster, which reputedly contains ca. 100 copies of a ca. 5kb repeat unit containing the His1, His2A, His2B, His3 and His4 genes.
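The 100-N convention described above makes gaps easy to locate in a scaffold sequence. The following is a minimal sketch, not a FlyBase or BDGP tool: the function names are hypothetical, the toy sequence is invented for illustration, and coordinates are assumed to be 1-based and inclusive, as in the gap coordinates quoted in these notes.

```python
import re

def find_n_runs(sequence, min_length=1):
    """Return 1-based inclusive (start, end) coordinates of each run of N/n.

    Hypothetical helper for illustration; not part of any FlyBase software.
    """
    runs = []
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() >= min_length:
            # match.start() is 0-based; match.end() is exclusive, so it
            # already equals the 1-based inclusive end coordinate.
            runs.append((match.start() + 1, match.end()))
    return runs

def is_unsized_placeholder(start, end, placeholder_length=100):
    """True if the run has exactly the length used to mark an unsized gap."""
    return end - start + 1 == placeholder_length

# Toy sequence (invented): one 100-N placeholder and one shorter N run.
toy = "ACGT" + "N" * 100 + "GATTACA" + "N" * 12 + "CC"
toy_runs = find_n_runs(toy)

for start, end in toy_runs:
    label = "unsized placeholder" if is_unsized_placeholder(start, end) else "other N run"
    print(f"{start}..{end}\t{label}")
```

A run of exactly 100 N's is only suggestive of an unsized gap; a sized gap could in principle also be 100 bp long, so the published gap notes remain the authority.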

TABLE 2: Known Gaps in the Release 5 Assembly

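| Scaffold | GenBank Accession | Gaps | Notes |
|---|---|---|---|
| ArmX | AE014298.4 | 111523..129522 | sized |
| ArmX | AE014298.4 | 21684450..21684549 | unsized |
| ArmX | AE014298.4 | 21687344..21759343 | sized |
| Arm2L | AE014134.5 | 21485539..21485638 | unsized |
| Arm2L | AE014134.5 | 22420242..22420341 | unsized |
| Arm2R | AE013599.4 | 16668213..16668312 | unsized |
| Arm3L | AE014296.4 | 5107767..5107866 | |
| Arm3R | AE014297.2 | None | |
| Arm4 | AE014135.3 | 1221289..1221388 | |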

For details on new, split and merged gene models, see the Annotation Release 5.4 statistics in the Drosophila melanogaster (R5.4) section below.
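The gap coordinates listed in Table 2 can be cross-checked against the counts given in the text (eight known gaps, six of them marked by 100 N's). The sketch below simply computes the inclusive length of each interval; the dictionary layout and function name are assumptions made for illustration.

```python
# Gap coordinates copied from Table 2 (1-based, inclusive).
GAPS = {
    "ArmX": [(111523, 129522), (21684450, 21684549), (21687344, 21759343)],
    "Arm2L": [(21485539, 21485638), (22420242, 22420341)],
    "Arm2R": [(16668213, 16668312)],
    "Arm3L": [(5107767, 5107866)],
    "Arm3R": [],
    "Arm4": [(1221289, 1221388)],
}

def gap_length(start, end):
    """Length of an inclusive coordinate interval."""
    return end - start + 1

all_gaps = [(arm, s, e) for arm, spans in GAPS.items() for s, e in spans]
total_gaps = len(all_gaps)
# Intervals of exactly 100 bp match the 100-N placeholder convention.
placeholder_gaps = [g for g in all_gaps if gap_length(g[1], g[2]) == 100]
other_gaps = [g for g in all_gaps if gap_length(g[1], g[2]) != 100]

for arm, s, e in all_gaps:
    print(f"{arm}\t{s}..{e}\t{gap_length(s, e)} bp")
print(f"total={total_gaps} placeholder={len(placeholder_gaps)} other={len(other_gaps)}")
```

Under this reading, the two non-placeholder intervals are the two sized gaps on ArmX, and the gaps on Arm3L and Arm4, whose notes are blank in the table, have the same 100 bp length as the gaps marked unsized.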

General FlyBase Statistics

Number of references in FlyBase: 184921
Number of research papers: 77773
Number of abstracts: 36284
Number of personal communications to FlyBase: 3809
Number of fly stocks: 85084
Number of fly images: 870
Drosophila workers registered with FlyBase: 7394

Drosophila melanogaster (R5.4)

Statistics

Gene records: 30971
Genes located to the genome: 15181
Genes not located to the genome: 15790
Alleles: 87725
Alleles of located genes: 69154
Alleles of unlocated genes: 18571
Aberrations: 30101
Deficiencies: 19614
Deficiencies with mapped endpoints: 13479
Transposable element insertions: 67221
Insertions mapped to the sequence: 40466

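The located and unlocated subtotals above should add up to their parent totals. The following sketch checks that arithmetic and derives a few proportions; the dictionary keys mirror the labels in these notes, while the helper name is hypothetical.

```python
# Counts copied from the D. melanogaster (R5.4) statistics above.
STATS = {
    "Gene records": 30971,
    "Genes located to the genome": 15181,
    "Genes not located to the genome": 15790,
    "Alleles": 87725,
    "Alleles of located genes": 69154,
    "Alleles of unlocated genes": 18571,
    "Deficiencies": 19614,
    "Deficiencies with mapped endpoints": 13479,
    "Transposable element insertions": 67221,
    "Insertions mapped to the sequence": 40466,
}

def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole

genes_sum = STATS["Genes located to the genome"] + STATS["Genes not located to the genome"]
alleles_sum = STATS["Alleles of located genes"] + STATS["Alleles of unlocated genes"]

print("gene subtotals consistent:", genes_sum == STATS["Gene records"])
print("allele subtotals consistent:", alleles_sum == STATS["Alleles"])
print(f"genes located: {percent(STATS['Genes located to the genome'], STATS['Gene records']):.1f}%")
print(f"insertions mapped: {percent(STATS['Insertions mapped to the sequence'], STATS['Transposable element insertions']):.1f}%")
```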
Annotation Release 5.4

Summary of changes from previous release

New Gene Models: 0
Restored Gene Models: 0
Deleted Gene Models: 4
Merged Gene Models: 0
Split Gene Models: 0
Unchanged polypeptides: 20451

Annotated Gene Models

| Annotated Gene Models | Count | Avg. size | Longest | Shortest | Change |
|---|---|---|---|---|---|
| Genes | 15181 | 5361 | 258567 | 19 | -4 |
| Protein coding genes | 14141 | 5706 | 258567 | 132 | -3 |
| Protein coding transcripts | 20823 | 2321 | 69571 | 132 | 85 |
| Exons | 68139 | 477 | 27725 | 1 | 123 |
| Introns | 50557 | 1343 | 166135 | 11 | 82 |
| 5' untranslated regions | 18682 | 184 | 3391 | 1 | 64 |
| 3' untranslated regions | 12981 | 375 | 5684 | 1 | 107 |
| Unique polypeptides | 17961 | 571 | 23015 | 9 | 66 |
| rRNA genes | 161 | 504 | 6026 | 123 | -3 |
| rRNA | 161 | 504 | 6026 | 123 | -3 |
| tRNA genes | 314 | 75 | 186 | 61 | 0 |
| tRNA | 314 | 73 | 87 | 61 | 0 |
| snRNA genes | 47 | 115 | 275 | 36 | 0 |
| snRNA | 47 | 115 | 275 | 36 | 0 |
| snoRNA genes | 249 | 113 | 316 | 46 | 0 |
| snoRNA | 249 | 113 | 316 | 46 | 0 |
| miRNA genes | 90 | 24 | 100 | 19 | 0 |
| miRNA | 90 | 24 | 100 | 19 | 0 |
| Miscellaneous non-coding RNA genes | 88 | 3015 | 31065 | 31 | 0 |
| Miscellaneous non-coding RNA | 105 | 1182 | 14084 | 31 | 0 |
| Pseudogenes | 88 | 3218 | 179585 | 53 | 0 |
| Transposable elements present in the sequenced strain | 5552 | 1507 | 66001 | 23 | 0 |
| Annotated repeat regions | 10159 | | | | |

Other Annotated Gene Features

Mapped Nucleotide Changes

| Annotated Gene Features | Count | Change |
|---|---|---|
| total mapped nucleotide changes | 3421 | 0 |
| aberration junction | 190 | 0 |
| complex substitution | 48 | 0 |
| deletion | 213 | 0 |
| insertion site | 46 | 0 |
| point mutation | 2664 | 0 |
| sequence variant | 203 | 0 |
| TE target site duplication | 40 | 0 |
| uncharacterized change in nucleotide sequence | 17 | 0 |

Mapped Regulatory Elements

| Annotated Gene Features | Count | Change |
|---|---|---|
| total mapped regulatory elements | 2261 | 0 |
| enhancer | 22 | 0 |
| poly A site | 99 | 0 |
| protein binding site | 1396 | 0 |
| regulatory region | 240 | 0 |
| rescue fragment | 504 | 0 |

Mapped Reagent Features

| Annotated Gene Features | Count | Change |
|---|---|---|
| transposable element insertion site | 41122 | 1733 |
| microarray amplicons | 14095 | 0 |
| dsRNA amplicons | 67381 | 0 |
| BAC | 958 | 0 |
| oligonucleotide | 583294 | 0 |

Aligned Evidence Features

Nucleotide Alignments

| Annotated Gene Features | Algorithm | Count | Change |
|---|---|---|---|
| D. melanogaster cDNA inserts | sim4tandem,splign | 15499 | 0 |
| D. melanogaster EST | sim4,splign | 502593 | 0 |
| Other melanogaster DNA sequences | sim4tandem,splign | 12586 | 0 |

Gene Predictions

| Annotated Gene Features | Algorithm | Count | Change |
|---|---|---|---|
| Augustus prediction | Augustus 1.0 | 12292 | 0 |
| BATZ Contrast | CONTRAST | 14219 | 0 |
| BATZ Contrast NA | CONTRAST | 13589 | 0 |
| CONGO exons | CONGO | 40544 | 0 |
| DGIL snap | SNAP | 19640 | 0 |
| DGIL snap homology | SNAP | 22949 | 0 |
| Genie prediction | Genie v2.2/flyGenie | 11248 | 0 |
| Genscan prediction | Genscan 1.0 | 18909 | 0 |
| NCBI gnomon | GNOMON | 19729 | 0 |
| RGUI geneid | GENEID 1.2 | 12389 | 0 |
| RGUI geneid u12 | GENEID 1.2 | 12717 | 0 |

Proteins Aligned

| Annotated Gene Features | Algorithm | Count | Change |
|---|---|---|---|
| D. melanogaster proteins | WU-blastx 2.0, Prosplign | 6133 | 0 |
| Other insect proteins | WU-blastx 2.0 | 5195 | 0 |
| Nematode proteins | WU-blastx 2.0 | 6361 | 0 |
| Yeast proteins | WU-blastx 2.0 | 2170 | 0 |
| Plant proteins | WU-blastx 2.0 | 8396 | 0 |
| Rodent proteins | WU-blastx 2.0 | 14824 | 0 |
| Primate proteins | WU-blastx 2.0 | 13691 | 0 |
| Other invertebrate proteins | WU-blastx 2.0 | 13046 | 0 |
| Other vertebrate proteins | WU-blastx 2.0 | 10443 | 0 |
| Other proteins | Prosplign | 10672 | 0 |

Translated Nucleotide Alignments

| Annotated Gene Features | Algorithm |
|---|---|
| Insect ESTs | WU-tblastx 2.0 |
| A. gambiae genomic | WU-tblastx 2.0 |
| D. pseudoobscura genomic | WU-tblastx 2.0 |

Location of Heterochromatin

| Chr/Arm | Sequence coordinates |
|---|---|
| X | 22030326..22422827 |
| 2L | 22000975..23011544 |
| 2R | 1..1285689 |
| 3L | 22955576..24543557 |
| 3R | 1..378656 |

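The coordinate ranges above can be turned into region lengths with simple interval arithmetic. This is a minimal sketch assuming 1-based, inclusive coordinates in the `start..end` notation used in these notes; the parser function is hypothetical.

```python
# Heterochromatin coordinates copied from the list above.
HETEROCHROMATIN = {
    "X": "22030326..22422827",
    "2L": "22000975..23011544",
    "2R": "1..1285689",
    "3L": "22955576..24543557",
    "3R": "1..378656",
}

def parse_range(text):
    """Split a 'start..end' string into an integer (start, end) tuple."""
    start, end = text.split("..")
    return int(start), int(end)

def span_length(start, end):
    """Length of an inclusive coordinate interval."""
    return end - start + 1

lengths = {arm: span_length(*parse_range(coords)) for arm, coords in HETEROCHROMATIN.items()}
total_bp = sum(lengths.values())

for arm, bp in lengths.items():
    print(f"{arm}\t{bp:,} bp")
print(f"total\t{total_bp:,} bp ({total_bp / 1e6:.2f} Mbp)")
```

Under these assumptions the five regions sum to about 4.66 Mbp, which is of the same order as the 4.7 Mbp of heterochromatic sequence that the release text says was attached to the proximal ends of the arms.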
Known Mutations in the Sequenced Strain

The sequenced strain, usually described as the y1; cn1 bw1 sp1 strain, was known to carry mutations in those four genes. During annotation, mutations in other genes have been discovered (currently known are mutations in oc, LysC, lab, MstProx, GstD5, Rh6, Gr22b, Gr22d, Or98b and CG33964). To allow compilation of a comprehensive proteome, wild-type protein sequences for these genes have been included in sequence entries to GenBank/EMBL/DDBJ. Wherever possible, a RefSeq accession based on an alternative wild-type sequence and curated as a FlyBase Annotated Genome Sequence (ARGS) has been provided.

Drosophila pseudoobscura (R2.0)

Statistics

Gene records: 12713
Genes located to the genome: 12192
Genes not located to the genome: 521
Alleles: 316
Alleles of located genes: 20
Alleles of unlocated genes: 296
Aberrations: 61
Deficiencies: 1
Deficiencies with mapped endpoints: 0
Transposable element insertions: 0

Annotation Release 2.0

Summary of changes from previous release

New Gene Models: 0
Restored Gene Models: 0
Deleted Gene Models: 0
Merged Gene Models: 0
Split Gene Models: 0
Unchanged polypeptides: 9868

Annotated Gene Models

| Annotated Gene Models | Count | Avg. size | Longest | Shortest | Change |
|---|---|---|---|---|---|
| Genes | 12192 | 4479 | 154903 | 37 | 0 |
| Genes with annotated transcripts | 9868 | 4545 | 154903 | 150 | 0 |
| Protein coding transcripts | 9868 | 1554 | 23133 | 133 | 0 |
| Exons | 39795 | 384 | 13161 | 1 | 0 |
| Introns | 29904 | 541 | 53699 | 1 | 0 |
| 5' untranslated regions | 0 | NA | NA | NA | 0 |
| 3' untranslated regions | 0 | NA | NA | NA | 0 |
| Unique polypeptides | 9871 | 517 | 7711 | 44 | 0 |

Other Annotated Gene Features

| Other Annotated Features | Count | Avg. size | Longest | Shortest |
|---|---|---|---|---|
| Syntenic regions | 1013 | 95325 | 1262829 | 105 |

Aligned Evidence Features

Nucleotide Alignments

| Annotated Gene Features | Algorithm | Count | Change |
|---|---|---|---|
| D. pseudoobscura ESTs | blastn | 34292 | 0 |

Gene Predictions

| Annotated Gene Features | Algorithm | Count | Change |
|---|---|---|---|
| BATZ Contrast NA | CONTRAST | 16158 | 0 |
| BREN N-Scan | N-Scan | 17088 | 0 |
| DGIL snap | SNAP | 23678 | 0 |
| DGIL snap homology | SNAP | 21676 | 0 |
| EISE exonerate | EXONERATE | 34828 | 0 |
| EISE genemapper | GENE MAPPER | 32857 | 0 |
| EISE genewise | Genewise | 42938 | 0 |
| GLEANR consensus | GLEANR | 17328 | 0 |
| Genewise | Genewise | 17882 | 0 |
| Genscan | Genscan 1.0 | 16829 | 0 |
| NCBI gnomon | GNOMON | 19259 | 0 |
| OXFD exonerate | EXONERATE | 11690 | 0 |
| PACH genemapper | GENE MAPPER | 17265 | 0 |
| RGUI geneid | GENEID 1.2 | 19060 | 0 |
| RGUI geneid u12 | GENEID 1.2 | 18970 | 0 |
| Twinscan | Twinscan | 18082 | 0 |