FB2014_03, released May 9th, 2014
 
 

A Database of Drosophila Genes & Genomes

FB2007_01 Release Notes

THE FLYBASE 2007_01 UPDATE

This release is a full update of FlyBase data, incorporating additions and changes to several classes of data, including: the bibliography; genetic and molecular information curated from the literature; fly stocks and images; Drosophila melanogaster annotated gene models; and reagent sets that have been mapped to the melanogaster genome sequence. In addition, the genome sequences and gene prediction sets of 10 newly sequenced Drosophila species can be viewed in the GBrowse genome browser.

Drosophila melanogaster Release 5.2 Annotation Update.

The Release 5 sequence from BDGP (see the BDGP release notes) contains major improvements to the assembly of the chromosome arm scaffolds, as well as improvements to the assembly of those portions of the centric heterochromatin that cannot currently be attached to the arms. The improvements to the arms include the major differences listed in Table 1 below, plus an additional 4.7 Mbp of heterochromatic sequence attached to the proximal ends of the arms. Release_5 is also the first non-redundant assembly of the D. melanogaster genome, unifying the previously separate assemblies of the largely euchromatic arm scaffolds and the heterochromatic scaffolds. The entire Release_5 assembly can be downloaded from the BDGP web site.

The Release 5.2 annotation update is a collaboration between FlyBase, BDGP and DHGP and includes a full set of annotated gene models for the D. melanogaster euchromatin and heterochromatin assemblies. We are currently working with NCBI to submit the Release_5.2 data used in FB2007_01 to GenBank; until that submission is complete, the latest GenBank version of the D. melanogaster genome is Release_5.1, which is described in Table 1. We will update this table once the GenBank submission is complete and we have the relevant accession numbers and versions.

TABLE 1: Release_5 Assembly (from BDGP) and Release_5.1 Accessions

Scaffold  Length (bp)  Gaps  Release 5.1 GenBank Accession  Major Difference Compared to Release 4
ArmX       22,422,827  3     AE014298.4                     8 kb added to the distal end, gaps filled in regions 1-11
Arm2L      23,011,544  2     AE014134.5                     591 kb added to the proximal end of the arm
Arm2R      21,146,708  1     AE013599.4                     380 kb added to the proximal end
Arm3L      24,543,557  1     AE014296.4                     16 kb added to the distal end, 718 kb added to the proximal end, other gaps filled
Arm3R      27,905,053  0     AE014297.2                     None
Arm4        1,351,857  1     AE014135.3                     70 kb added to the distal end
XHet          204,112  n.a.  n.a.                           n.a.
YHet          347,038  n.a.  n.a.                           n.a.
2LHet         368,872  n.a.  n.a.                           n.a.
2RHet       3,288,761  n.a.  n.a.                           n.a.
3LHet       2,555,491  n.a.  n.a.                           n.a.
3RHet       2,517,507  n.a.  n.a.                           n.a.
ArmU       10,049,037  n.a.  n.a.                           n.a.
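
As a quick consistency check on Table 1, the short Python sketch below totals the scaffold lengths and the gap counts. The values are transcribed by hand from the table; SCAFFOLDS and summarize are hypothetical names used only for illustration (not part of any FlyBase or BDGP tool), and treating every scaffold whose gap count is "n.a." as a non-arm scaffold is an assumption made for this example.

```python
# Scaffold lengths (bp) and gap counts transcribed from Table 1.
# A gap count of None stands for "n.a." in the table (heterochromatic
# and ArmU scaffolds); this split is an assumption of the sketch.
SCAFFOLDS = {
    "ArmX": (22_422_827, 3),
    "Arm2L": (23_011_544, 2),
    "Arm2R": (21_146_708, 1),
    "Arm3L": (24_543_557, 1),
    "Arm3R": (27_905_053, 0),
    "Arm4": (1_351_857, 1),
    "XHet": (204_112, None),
    "YHet": (347_038, None),
    "2LHet": (368_872, None),
    "2RHet": (3_288_761, None),
    "3LHet": (2_555_491, None),
    "3RHet": (2_517_507, None),
    "ArmU": (10_049_037, None),
}


def summarize(scaffolds):
    """Return (arm_bp, other_bp, total_gaps) for the assembly."""
    arm_bp = sum(length for length, gaps in scaffolds.values() if gaps is not None)
    other_bp = sum(length for length, gaps in scaffolds.values() if gaps is None)
    total_gaps = sum(gaps for _, gaps in scaffolds.values() if gaps is not None)
    return arm_bp, other_bp, total_gaps


arm_bp, other_bp, total_gaps = summarize(SCAFFOLDS)
print(f"Major arm scaffolds: {arm_bp:,} bp")
print(f"Other scaffolds:     {other_bp:,} bp")
print(f"Whole assembly:      {arm_bp + other_bp:,} bp")
print(f"Gaps in arm scaffolds: {total_gaps}")
```

With the Table 1 values this gives 120,381,546 bp in the six major arm scaffolds, 19,330,818 bp in the remaining scaffolds (139,712,364 bp in total) and 8 gaps, in agreement with the number of known gaps stated in the next paragraph.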

Release 5 of the euchromatic sequence contains eight known gaps: two gaps on the X have estimated sizes, and the remaining six gaps (one on the X and five on other arms) are unsized. Gaps of unknown size are represented by 100 N's in the FASTA files. The gap at 21485539..21485638 on scaffold Arm2L is the histone gene cluster, which reputedly contains about 100 copies of a roughly 5 kb repeat unit containing the His1, His2A, His2B, His3 and His4 genes.

TABLE 2: Known Gaps in the Release 5 Assembly

Scaffold  GenBank Accession  Gaps                Notes
ArmX      AE014298.4         111523..129522      sized
                             21684450..21684549  unsized
                             21687344..21759343  sized
Arm2L     AE014134.5         21485539..21485638  unsized
                             22420242..22420341  unsized
Arm2R     AE013599.4         16668213..16668312  unsized
Arm3L     AE014296.4         5107767..5107866    unsized
Arm3R     AE014297.2         None
Arm4      AE014135.3         1221289..1221388    unsized
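
Because the gaps in Table 2 are written as inclusive start..end intervals, the size of each gap can be computed directly, and the 100 bp placeholder used for gaps of unknown size can be picked out. The Python sketch below assumes that any gap exactly 100 bp long is such a placeholder (a genuinely sized gap of exactly 100 bp would be misclassified); GAPS, gap_length and classify are hypothetical names chosen for illustration.

```python
# Gap coordinates from Table 2, in "start..end" notation
# (1-based, both ends inclusive).
GAPS = {
    "ArmX": ["111523..129522", "21684450..21684549", "21687344..21759343"],
    "Arm2L": ["21485539..21485638", "22420242..22420341"],
    "Arm2R": ["16668213..16668312"],
    "Arm3L": ["5107767..5107866"],
    "Arm4": ["1221289..1221388"],
}

# Gaps of unknown size are represented by a run of 100 N's.
UNSIZED_PLACEHOLDER_BP = 100


def gap_length(span):
    """Length in bp of an inclusive 'start..end' interval."""
    start, end = (int(x) for x in span.split(".."))
    return end - start + 1


def classify(span):
    # Assumption: every gap of exactly 100 bp is the unsized placeholder.
    return "unsized" if gap_length(span) == UNSIZED_PLACEHOLDER_BP else "sized"


for scaffold, spans in GAPS.items():
    for span in spans:
        print(f"{scaffold:6} {span:20} {gap_length(span):>7,} bp  {classify(span)}")
```

For Table 2 this yields two sized gaps on the X (18,000 bp and 72,000 bp) and six 100 bp unsized gaps.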
 

For details of new, split and merged gene models, see the Annotation release 5.2 statistics in the DROSOPHILA MELANOGASTER (R5.2) section below.

GENERAL FLYBASE STATISTICS
Number of references in FlyBase: 184745
Number of research papers: 78122
Number of abstracts: 35866
Number of personal communications to FlyBase: 3778
Drosophila workers registered with FlyBase: 7415
Number of fly stocks: 85022
Number of fly images: 870

DROSOPHILA MELANOGASTER (R5.2)

Statistics
Gene records: 30887
Genes located to the genome: 15185
Genes not located to the genome: 15702
Alleles: 86844
Alleles of located genes: 68097
Alleles of unlocated genes: 18747
Aberrations: 30086
Deficiencies: 19615
Deficiencies with mapped endpoints: 13052
Transposable element insertions: 72293
Insertions mapped to the sequence: 67227
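
Several rows of the statistics table are partitions of a total: gene records split into genes located and not located to the genome, and alleles into alleles of located and unlocated genes. The minimal Python sketch below checks that the parts add up; the values are copied from the table, and stats, CHECKS and check_partitions are hypothetical names for illustration.

```python
# Selected R5.2 counts from the statistics table.
stats = {
    "Gene records": 30887,
    "Genes located to the genome": 15185,
    "Genes not located to the genome": 15702,
    "Alleles": 86844,
    "Alleles of located genes": 68097,
    "Alleles of unlocated genes": 18747,
}

# Each total should equal the sum of its located and unlocated parts.
CHECKS = [
    ("Gene records", ["Genes located to the genome", "Genes not located to the genome"]),
    ("Alleles", ["Alleles of located genes", "Alleles of unlocated genes"]),
]


def check_partitions(table, checks):
    """Return {total_label: (reported_total, sum_of_parts)} for each check."""
    return {total: (table[total], sum(table[p] for p in parts)) for total, parts in checks}


for label, (total, parts_sum) in check_partitions(stats, CHECKS).items():
    status = "OK" if total == parts_sum else "MISMATCH"
    print(f"{label}: {total} vs {parts_sum} -> {status}")
```

Both totals match their parts (15185 + 15702 = 30887 and 68097 + 18747 = 86844). The located-gene count, 15185, is also the Genes count in the annotated gene models table.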

Annotation release 5.2
Summary of changes from previous release
New Gene Models: 698
Restored Gene Models: 7
Deleted Gene Models: 195
Merged Gene Models: 205 old -> 90 new
Split Gene Models: 21 old -> 43 new
Unchanged peptides: 18944

ANNOTATED GENE MODELS
Annotated Gene Models             Count  Avg. size  Longest  Shortest  Change
Genes                             15185  5351       258567   19        +526
Protein coding genes              14218  7140       224977   67        +362
Protein coding transcripts        20728  2315       69571    132       +943
Exons                             68007  477        27725    1         +2841
Introns                           49729  1265       132737   11        +1188
5' Untranslated regions           18385  185        3391     1         +657
3' Untranslated regions           12757  375        5684     1         +521
Unique peptides                   17888  570        23015    25        +794
rRNA                              164    512        1325     133       +66
tRNA                              314    75         186      61        0
snRNA                             47     115        275      36        0
snoRNA                            268    116        316      46        +203
miRNA                             93     24         100      19        +1
miscellaneous non-coding RNA      88     3015       31065    31        +9
pseudogenes                       88     3218       13064    53        +37
Transposable Elements Present
  in the Sequenced Strain         6170   1369       66001    21        +168
Annotated repeat regions          9419

OTHER ANNOTATED GENE FEATURES
MAPPED NUCLEOTIDE CHANGES
Annotated Gene Features              Count  Change
total mapped nucleotide changes      3421   +803
aberration junction                  190    +30
complex substitution                 48     +30
deletion                             213    +140
insertion site                       46     +25
point mutation                       2664   +680
sequence variant                     203    -141
TE target site duplication           40     +36
uncharacterized change in
  nucleotide sequence                17     +3
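
In the MAPPED NUCLEOTIDE CHANGES table, the first row is the total of the rows beneath it, in both the Count and the Change columns. The Python sketch below sums both columns and compares them with the reported total; the values are transcribed from the table, and NUCLEOTIDE_CHANGES, REPORTED_TOTAL and column_sums are hypothetical names for illustration.

```python
# (count, change) per feature type, from the MAPPED NUCLEOTIDE CHANGES table.
NUCLEOTIDE_CHANGES = {
    "aberration junction": (190, 30),
    "complex substitution": (48, 30),
    "deletion": (213, 140),
    "insertion site": (46, 25),
    "point mutation": (2664, 680),
    "sequence variant": (203, -141),
    "TE target site duplication": (40, 36),
    "uncharacterized change in nucleotide sequence": (17, 3),
}

# "total mapped nucleotide changes" row: (count, change).
REPORTED_TOTAL = (3421, 803)


def column_sums(rows):
    """Sum the Count and Change columns separately."""
    counts = sum(count for count, _ in rows.values())
    changes = sum(change for _, change in rows.values())
    return counts, changes


sums = column_sums(NUCLEOTIDE_CHANGES)
print(sums, "matches" if sums == REPORTED_TOTAL else "differs from", REPORTED_TOTAL)
```

Both columns sum to the values in the total row (3421 and +803); the decrease in sequence variants (-141) is offset by the increases in the other categories.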

MAPPED REGULATORY ELEMENTS
Annotated Gene Features              Count  Change
total mapped regulatory elements     2273   +304
enhancer                             22     -10
poly A site                          99     -19
protein binding site                 1396   +16
regulatory region                    251    +47
rescue fragment                      504    +264
signal peptide                       1      0

MAPPED REAGENT FEATURES
Annotated Gene Features              Count
transposable element insertion site  67227
microarray oligonucleotide           583294
microarray amplicons                 14095
dsRNA amplicons                      67381
BAC                                  958

ALIGNED EVIDENCE FEATURES
NUCLEOTIDE ALIGNMENTS
Annotated Gene Features           Algorithm                 Count
D. melanogaster cDNA inserts      sim4tandem, splign        14476
D. melanogaster EST (total)       sim4                      314062
EST from sequenced strain         sim4                      146413
EST from different strains        sim4                      167549
Other melanogaster DNA sequences  sim4tandem                13433

GENE PREDICTIONS
Annotated Gene Features           Algorithm                 Count
Genie prediction                  Genie v2.2/flyGenie       11248
Genscan prediction                Genscan 1.0               18909
Augustus prediction               Augustus 1.0              12292
BATZ Contrast NA                  CONTRAST                  13589
BATZ Contrast                     CONTRAST                  14219
CONGO exons                       CONGO                     40544
DGIL snap                         SNAP                      19640
DGIL snap homology                SNAP                      22949
NCBI gnomon                       GNOMON                    19729
RGUI geneid                       GENEID 1.2                12389
RGUI geneid u12                   GENEID 1.2                12717

PROTEINS ALIGNED
Annotated Gene Features           Algorithm                 Count
D. melanogaster proteins          WU-blastx 2.0, Prosplign  18576
Other Insect proteins             WU-blastx 2.0             7076
Nematode proteins                 WU-blastx 2.0             6361
Yeast proteins                    WU-blastx 2.0             2170
Plant proteins                    WU-blastx 2.0             8396
Rodent proteins                   WU-blastx 2.0             14824
Primate proteins                  WU-blastx 2.0             13691
Other invertebrate proteins       WU-blastx 2.0             13070
Other vertebrate proteins         WU-blastx 2.0             10443
Other proteins                    Prosplign                 9873

TRANSLATED NUCLEOTIDE ALIGNMENTS
Annotated Gene Features           Algorithm
Insect ESTs                       WU-tblastx 2.0
A. gambiae genomic                WU-tblastx 2.0
D. pseudoobscura genomic          WU-tblastx 2.0
 
LOCATION OF HETEROCHROMATIN
Chr/Arm  Sequence coordinates
X        22030326..22422827
2L       22000975..23011544
2R       1..1285689
3L       22955576..24543557
3R       1..378656
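
The heterochromatic intervals above use the same inclusive start..end notation as the gap coordinates, so the length of each interval follows directly from its endpoints. The Python sketch below is illustrative only; HETEROCHROMATIN and span_bp are hypothetical names, and the coordinates are copied from the table.

```python
# Heterochromatic portion of each arm, from the table above
# (1-based, inclusive "start..end" coordinates on the arm scaffold).
HETEROCHROMATIN = {
    "X": "22030326..22422827",
    "2L": "22000975..23011544",
    "2R": "1..1285689",
    "3L": "22955576..24543557",
    "3R": "1..378656",
}


def span_bp(span):
    """Length in bp of an inclusive 'start..end' interval."""
    start, end = (int(x) for x in span.split(".."))
    return end - start + 1


lengths = {arm: span_bp(span) for arm, span in HETEROCHROMATIN.items()}
for arm, bp in lengths.items():
    print(f"{arm:3} {bp:>10,} bp")
print(f"total {sum(lengths.values()):>10,} bp")
```

This gives 392,502 bp (X), 1,010,570 bp (2L), 1,285,689 bp (2R), 1,587,982 bp (3L) and 378,656 bp (3R). For X, 2L and 3L the interval ends at the last base of the arm scaffold (compare the lengths in Table 1), whereas for 2R and 3R it begins at base 1; in each case it lies at the proximal end of the arm.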

KNOWN MUTATIONS IN THE SEQUENCED STRAIN

The sequenced strain, usually described as the y[1]; cn[1] bw[1] sp[1] strain, was known to carry mutations in these four genes. During annotation, mutations in other genes were discovered (those currently known are in oc, LysC, lab, MstProx, GstD5, Rh6, Gr22b, Gr22d, Or98b and CG33964). To allow compilation of a comprehensive proteome, wild-type protein sequences for these genes have been included in the sequence entries submitted to GenBank/EMBL/DDBJ. Wherever possible, a RefSeq accession based on an alternative wild-type sequence and curated as a FlyBase Annotated Reference Gene Sequence (ARGS) has been provided.

DROSOPHILA PSEUDOOBSCURA (R2.0)

Statistics
Gene records: 12705
Genes located to the genome: 12192
Genes not located to the genome: 513
Alleles: 316
Alleles of located genes: 18
Alleles of unlocated genes: 298
Aberrations: 61
Deficiencies: 1
Deficiencies with mapped endpoints: 0
Transposable element insertions: 0

Annotation release 2.0
Summary of changes from previous release
New Gene Models: 0
Restored Gene Models: 0
Deleted Gene Models: 5
Merged Gene Models: 0
Split Gene Models: 0
Unchanged peptides: 9868

ANNOTATED GENE MODELS
Annotated Gene Models             Count  Avg. size  Longest  Shortest  Change
Genes                             12192  4479       154903   37        -5
Genes with annotated transcripts  9868   4541       154903   150       -3
Protein coding transcripts        9868   3195       61441    150       -3
Exons                             39786  384        13161    1         -9
Introns                           29424  537        47157    1         -6
5' Untranslated regions           0      NA         NA       NA        0
3' Untranslated regions           0      NA         NA       NA        0
Unique peptides                   9868   517        7711     44        -3

OTHER ANNOTATED GENE FEATURES
Other Annotated Features          Count  Avg. size  Longest  Shortest
Syntenic regions                  1101   106917     1262829  105

ALIGNED EVIDENCE FEATURES
NUCLEOTIDE ALIGNMENTS
Annotated Gene Features           Algorithm                 Count
D. pseudoobscura ESTs             blastn                    34292

GENE PREDICTIONS
Annotated Gene Features           Algorithm                 Count
Genscan                           Genscan 1.0               16829
Genewise                          Genewise                  17882
Twinscan                          Twinscan                  18082
BATZ Contrast NA                  CONTRAST                  16158
DGIL snap                         SNAP                      23678
DGIL snap homology                SNAP                      21676
EISE exonerate                    EXONERATE                 34828
EISE genemapper                   GENE MAPPER               32857
EISE genewise                     Genewise                  42938
NCBI gnomon                       GNOMON                    19259
OXFD exonerate                    EXONERATE                 11690
PACH genemapper                   GENE MAPPER               17265
RGUI geneid                       GENEID 1.2                19060
RGUI geneid u12                   GENEID 1.2                18970