A Database of Drosophila Genes & Genomes

 

FB2008_02 Release Notes

General FlyBase FB2008_02 Statistics
Number of references in FlyBase: 187261
Number of research papers: 79366
Number of abstracts: 36435
Number of personal communications to FlyBase: 3959
Number of fly stocks: 85568
Number of fly images: 870
Drosophila workers registered with FlyBase: 7418
Drosophila melanogaster (R5.5)
Statistics
Gene records: 30965
Genes located to the genome: 15186
Genes not located to the genome: 15779
Alleles: 101739
Alleles of located genes: 83174
Alleles of unlocated genes: 18565
Aberrations: 30209
Deficiencies: 19722
Deficiencies with mapped endpoints: 13479
Transposable element insertions: 89724
Insertions mapped to the sequence: 42049
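As a quick consistency check on the gene and allele figures above, the following Python sketch tests whether the located and unlocated categories add up to their totals. It assumes those categories are meant to partition the totals; the dictionary keys and the helper name are hypothetical, chosen only for illustration.

```python
# Counts copied from the D. melanogaster (R5.5) statistics above.
# The key names are illustrative and not FlyBase field names.
stats = {
    "gene_records": 30965,
    "genes_located": 15186,
    "genes_not_located": 15779,
    "alleles": 101739,
    "alleles_of_located_genes": 83174,
    "alleles_of_unlocated_genes": 18565,
}


def is_partition(total, parts):
    """Return True when the parts sum exactly to the total."""
    return total == sum(parts)


genes_ok = is_partition(
    stats["gene_records"],
    [stats["genes_located"], stats["genes_not_located"]],
)
alleles_ok = is_partition(
    stats["alleles"],
    [stats["alleles_of_located_genes"], stats["alleles_of_unlocated_genes"]],
)

print("gene counts consistent:", genes_ok)
print("allele counts consistent:", alleles_ok)
```

The same check could be rerun against the figures of a later release to spot transcription errors in the statistics.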
Annotation Release 5.5
Summary of changes from previous release
New Gene Models: 1
Restored Gene Models: 5
Deleted Gene Models: 0
Merged Gene Models: 2 -> 1
Split Gene Models: 0
Annotated Gene Models
Annotated Gene Models | Count | Avg. size | Longest | Shortest | Change
Genes | 15186 | 5381 | 258567 | 19 | 5
Protein coding genes | 14146 | 5727 | 258567 | 132 | 5
Protein coding transcripts | 20925 | 2331 | 69571 | 132 | 101
Exons | 68268 | 477 | 27725 | 1 | 129
Introns | 50648 | 1351 | 166135 | 11 | 91
5' untranslated regions | 18755 | 183 | 3391 | 1 | 73
3' untranslated regions | 13065 | 374 | 5684 | 1 | 84
Unique polypeptides | 18055 | 574 | 23015 | 25 | 94
rRNA genes | 161 | 504 | 6026 | 123 | 0
rRNA | 161 | 504 | 6026 | 123 | 0
tRNA genes | 314 | 75 | 186 | 61 | 0
tRNA | 314 | 73 | 87 | 61 | 0
snRNA genes | 47 | 115 | 275 | 36 | 0
snRNA | 47 | 115 | 275 | 36 | 0
snoRNA genes | 249 | 113 | 316 | 46 | 0
snoRNA | 249 | 113 | 316 | 46 | 0
miRNA genes | 90 | 24 | 100 | 19 | 0
miRNA | 90 | 24 | 100 | 19 | 0
Miscellaneous non-coding RNA genes | 88 | 3015 | 31065 | 31 | 0
Miscellaneous non-coding RNA | 105 | 1182 | 14084 | 31 | 0
Pseudogenes | 88 | 3218 | 179585 | 53 | 0
Transposable elements present in the sequenced strain | 5552 | 1507 | 66001 | 23 | 0
Annotated repeat regions | 10159 | | | |
Other Annotated Gene Features
Mapped Nucleotide Changes
Annotated Gene Features | Count | Change
total mapped nucleotide changes | 3583 | 162
aberration junction | 193 | 3
complex substitution | 52 | 4
deletion | 225 | 12
insertion site | 48 | 2
point mutation | 2804 | 140
sequence variant | 204 | 1
TE target site duplication | 40 | 0
uncharacterized change in nucleotide sequence | 17 | 0
Mapped Regulatory Elements
Annotated Gene Features | Count | Change
total mapped regulatory elements | 2319 | 58
enhancer | 22 | 0
poly A site | 98 | -1
protein binding site | 1396 | 0
regulatory region | 240 | 0
rescue fragment | 563 | 59
Mapped Reagent Features
Annotated Gene Features | Count | Change
transposable element insertion site | 42049 | 927
microarray amplicons | 14095 | 0
dsRNA amplicons | 67381 | 0
BAC | 973 | 15
oligonucleotide | 583294 | 0
Aligned Evidence Features
Nucleotide Alignments
Annotated Gene Features | Algorithm | Count | Change
D. melanogaster cDNA inserts | sim4tandem, splign | 15980 | 0
D. melanogaster EST | sim4, splign | 503590 | 0
Other melanogaster DNA sequences | sim4tandem, splign | 12673 | 0
Gene Predictions
Annotated Gene Features | Algorithm | Count | Change
Augustus prediction | Augustus 1.0 | 12292 | 0
BATZ Contrast | CONTRAST | 14219 | 0
BATZ Contrast NA | CONTRAST | 13589 | 0
CONGO exons | CONGO | 40544 | 0
DGIL snap | SNAP | 19640 | 0
DGIL snap homology | SNAP | 22949 | 0
Genie prediction | Genie v2.2/flyGenie | 11248 | 0
Genscan prediction | Genscan 1.0 | 18909 | 0
NCBI gnomon | GNOMON | 19729 | 0
RGUI geneid | GENEID 1.2 | 12389 | 0
RGUI geneid u12 | GENEID 1.2 | 12717 | 0
Proteins Aligned
Annotated Gene Features | Algorithm | Count | Change
D. melanogaster proteins | WU-blastx 2.0, Prosplign | 6133 | 0
Other insect proteins | WU-blastx 2.0 | 5195 | 0
Nematode proteins | WU-blastx 2.0 | 6361 | 0
Yeast proteins | WU-blastx 2.0 | 2170 | 0
Plant proteins | WU-blastx 2.0 | 8396 | 0
Rodent proteins | WU-blastx 2.0 | 14824 | 0
Primate proteins | WU-blastx 2.0 | 13691 | 0
Other invertebrate proteins | WU-blastx 2.0 | 13046 | 0
Other vertebrate proteins | WU-blastx 2.0 | 10443 | 0
Other proteins | Prosplign | 10672 | 0
Translated Nucleotide Alignments
Annotated Gene Features | Algorithm
Insect ESTs | WU-tblastx 2.0
A. gambiae genomic | WU-tblastx 2.0
D. pseudoobscura genomic | WU-tblastx 2.0
Release 5 Sequence Assembly

The release 5 sequence by BDGP (see the BDGP release notes) contains some major improvements to the assembly of the major chromosome arm scaffolds as well as improvements to the assembly of those portions of the centric heterochromatin that cannot currently be attached to the major arms. The improvements to the arms include the major differences noted below plus an additional 4.7 Mbp of heterochromatic sequence attached to the proximal ends of the arms. Further, Release_5 is the first non-redundant assembly of the D. melanogaster genome, unifying the previously separate assemblies of the largely euchromatic arm scaffolds and the heterochromatic scaffolds. The entire Release_5 assembly can be downloaded from the BDGP web site.

The Release 5.5 annotation update is a collaboration between FlyBase, BDGP and DHGP, and includes a full set of annotated gene models for the D. melanogaster euchromatin and heterochromatin assemblies. We are currently working with NCBI to submit the Release_5.5 data to GenBank; until that submission is complete, the latest GenBank version of the D. melanogaster genome is Release_5.1. The table below therefore describes Release_5.1. We will update it once the GenBank submission is completed and we have the relevant accession numbers and versions.

TABLE 1: Release_5 Assembly (from BDGP) and Release_5.1 Accessions

Scaffold | Length (bp) | Gaps | Release 5.1 GenBank Accession | Major Difference Compared to Release 4
ArmX | 22,422,827 | 3 | AE014298.4 | 8 kb added to the distal end, gaps filled in regions 1-11
Arm2L | 23,011,544 | 2 | AE014134.5 | 591 kb added to the proximal end of the arm
Arm2R | 21,146,708 | 1 | AE013599.4 | 380 kb added to the proximal end
Arm3L | 24,543,557 | 1 | AE014296.4 | 16 kb added to the distal end, 718 kb added to the proximal end, other gaps filled
Arm3R | 27,905,053 | 0 | AE014297.2 | None
Arm4 | 1,351,857 | 1 | AE014135.3 | 70 kb added to the distal end
XHet | 204,112 | n.a. | n.a. | n.a.
YHet | 347,038 | n.a. | n.a. | n.a.
2LHet | 368,872 | n.a. | n.a. | n.a.
2RHet | 3,288,761 | n.a. | n.a. | n.a.
3LHet | 2,555,491 | n.a. | n.a. | n.a.
3RHet | 2,517,507 | n.a. | n.a. | n.a.
ArmU | 10,049,037 | n.a. | n.a. | n.a.
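The scaffold lengths in Table 1 can be totalled with a short Python sketch such as the one below. Grouping scaffolds by name (the "Arm" prefix, the "Het" suffix, and ArmU on its own) is an assumption made here for illustration and is not a classification stated in the release notes; the variable and function names are likewise hypothetical.

```python
# Scaffold lengths in bp, copied from Table 1.
scaffold_lengths = {
    "ArmX": 22_422_827,
    "Arm2L": 23_011_544,
    "Arm2R": 21_146_708,
    "Arm3L": 24_543_557,
    "Arm3R": 27_905_053,
    "Arm4": 1_351_857,
    "XHet": 204_112,
    "YHet": 347_038,
    "2LHet": 368_872,
    "2RHet": 3_288_761,
    "3LHet": 2_555_491,
    "3RHet": 2_517_507,
    "ArmU": 10_049_037,
}


def total_length(names):
    """Sum the lengths of the named scaffolds."""
    return sum(scaffold_lengths[name] for name in names)


# Assumed grouping, based only on the scaffold names.
arm_names = [n for n in scaffold_lengths if n.startswith("Arm") and n != "ArmU"]
het_names = [n for n in scaffold_lengths if n.endswith("Het")]

arm_total = total_length(arm_names)
het_total = total_length(het_names)
grand_total = total_length(scaffold_lengths)

print(f"Arm scaffolds:  {arm_total:,} bp")
print(f"Het scaffolds:  {het_total:,} bp")
print(f"ArmU:           {scaffold_lengths['ArmU']:,} bp")
print(f"All scaffolds:  {grand_total:,} bp")
```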

Release 5 of the euchromatic sequence contains eight known gaps. Two gaps on the X have size estimates; the other six gaps in the genome are unsized. (Gaps of unknown size are denoted by 100 N's in the FASTA files.) The gap at 21485539..21485638 on scaffold Arm2L is the histone gene cluster, which reputedly contains ca. 100 copies of a ca. 5 kb repeat unit containing the His1, His2A, His2B, His3 and His4 genes.
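For readers working with the sequence files directly, here is a minimal Python sketch of how gap coordinates relate to runs of N's. It assumes gap coordinates are 1-based and inclusive, which is consistent with the 100-N convention (21485539..21485638 spans exactly 100 positions). The function names and the toy sequence are hypothetical. A run of exactly 100 N's is labelled "unsized" purely on the basis of that convention, so a sized gap that happened to be 100 bp long would be mislabelled.

```python
import re

UNSIZED_GAP_LENGTH = 100  # per the text: unsized gaps are written as 100 N's


def gap_length(coords):
    """Length of a 'start..end' range, treating both ends as inclusive."""
    start, end = (int(part) for part in coords.split(".."))
    return end - start + 1


def find_n_runs(sequence):
    """Return (start, end) pairs, 1-based inclusive, for each run of N's."""
    return [
        (match.start() + 1, match.end())
        for match in re.finditer(r"N+", sequence.upper())
    ]


# Toy sequence standing in for a scaffold read from a FASTA file.
toy_sequence = "ACGT" + "N" * 100 + "GGCC"

for start, end in find_n_runs(toy_sequence):
    length = end - start + 1
    label = "unsized" if length == UNSIZED_GAP_LENGTH else "sized"
    print(f"{start}..{end}\t{length}\t{label}")
```

Applied to the coordinates quoted in the release notes, `gap_length("111523..129522")` gives 18000 and `gap_length("21485539..21485638")` gives 100.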

TABLE 2: Known Gaps in the Release 5 Assembly

Scaffold | GenBank Accession | Gaps | Notes
ArmX | AE014298.4 | 111523..129522 | sized
ArmX | AE014298.4 | 21684450..21684549 | unsized
ArmX | AE014298.4 | 21687344..21759343 | sized
Arm2L | AE014134.5 | 21485539..21485638 | unsized
Arm2L | AE014134.5 | 22420242..22420341 | unsized
Arm2R | AE013599.4 | 16668213..16668312 | unsized
Arm3L | AE014296.4 | 5107767..5107866 |
Arm3R | AE014297.2 | None |
Arm4 | AE014135.3 | 1221289..1221388 |
Location of Heterochromatin
Chr/Arm
Sequence coordinates
X