FB2016_04, released July 28, 2016
 
 

A Database of Drosophila Genes & Genomes

D. melanogaster Release 6 assembly

With the FB2014_04 web site update, FlyBase has incorporated the BDGP's new D. melanogaster Release_6 assembly. While much of the euchromatic sequence is unchanged, each of the major arm scaffolds (2L, 2R, 3L, 3R and X) has gained additional centric heterochromatic sequence. The sequences of the Release_5 heterochromatic scaffolds 2LHet, 2RHet, 3LHet, 3RHet and XHet have either been incorporated into the major arm scaffolds or are provided as small individual scaffolds alongside the remaining unassembled scaffolds. For Release_6, FlyBase does not present pseudo-arms (comparable to arm U) built by concatenating mapped but unordered and unoriented scaffolds. Instead, the assembly contains 1870 scaffolds in total, including the major arm scaffolds. See Table 1 for the scaffolds included in this assembly.

For the first FlyBase release of annotations on Release_6 (annotation release Dmel_Release_6.01), we migrated the existing release 5.57 annotations to the new assembly with as little change as possible. Assembly differences led to the deletion of 31 non-coding RNA annotations, most of which were repeated copies of rRNA gene fragments and Su(Ste) genes (Table 2). The coding regions of 18 annotations changed because of differences between the old and new assemblies (Table 2). In addition, we migrated as many sequence features, and as much aligned evidence and as many predictions, as possible to the new assembly. This includes the continuous (non-discrete) RNA-Seq data displayed as Topoview graphics in GBrowse. Because of the changes between assemblies, especially in heterochromatic regions of the genome, some features failed to migrate. We anticipate providing newly aligned and analyzed data for several classes of evidence and sequence features in future releases as they become available.

GenBank submissions: Both Dmel_Release_5.57 and Dmel_Release_6.01 have been submitted to GenBank/NCBI, so that the two most comparable annotation sets on the former and current assemblies are available as RefSeq reference datasets. The FlyBase FB2014_03 (Dmel_Release_5.57) and FB2014_04 (Dmel_Release_6.01) web sites will remain permanently available through our data archives for those who want to refer back to these transitional datasets. As has always been FlyBase policy, future D. melanogaster gene model annotation will accrue on the current assembly, Dmel_Release_6. We anticipate that Dmel_Release_6.01 will be the stable RefSeq annotation set for the next 12 months.

Converting genome sequence locations between assemblies: We have extended our sequence coordinate converter tool to convert in both directions between Dmel_Release_5 and Dmel_Release_6. A downloadable converter (5->6) is also available at our GitHub site. Once Dmel_Release_6 is public at NCBI, NCBI tools for coordinate conversion will also be available. If you have difficulty with coordinate conversion, or any other issue pertaining to the assemblies, please contact FlyBase through our usual Contact FlyBase help form.
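For readers unfamiliar with how coordinate conversion between assemblies works in general, the following is a minimal sketch of block-based conversion. It is not the FlyBase converter and does not reproduce its interface; the function name `convert` and the `BLOCKS` table are hypothetical, and the block coordinates are invented purely for illustration. The sketch also shows why some positions cannot be converted: a location that falls outside every aligned block has no counterpart in the new assembly.

```python
# Hypothetical aligned blocks between an old and a new assembly:
# (old_scaffold, old_start, old_end, new_scaffold, new_start), 1-based inclusive.
# These values are INVENTED for illustration; they are not real
# Release_5 -> Release_6 mappings.
BLOCKS = [
    ("2LHet", 1, 1000, "2L", 50001),
    ("2R", 1, 5000, "2R", 4001),
]


def convert(scaffold, pos, blocks=BLOCKS):
    """Map (scaffold, pos) from the old assembly to the new one.

    Returns (new_scaffold, new_pos), or None when the position lies
    outside every aligned block and therefore cannot be migrated.
    """
    for old_scaffold, old_start, old_end, new_scaffold, new_start in blocks:
        if scaffold == old_scaffold and old_start <= pos <= old_end:
            # Keep the same offset from the start of the block.
            return new_scaffold, new_start + (pos - old_start)
    return None


if __name__ == "__main__":
    print(convert("2R", 100))    # inside the second toy block
    print(convert("2LHet", 1))   # old heterochromatic scaffold folded into an arm
    print(convert("XHet", 5))    # no block covers this position
```

For real data, use the FlyBase converter or the NCBI tools mentioned above rather than a hand-built table.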

Referencing the assemblies and annotation sets: With the transition to Dmel_Release_6, it is especially important that authors of papers and bulk data sets state which BDGP assembly release and which FlyBase annotation set were used for analysis and for the presentation of coordinates.

Table 1. Dmel_Release_6 Scaffolds
2L
2R
3L
3R
4
X
Y
rDNA
2Cen_mapped_Scaffold_10_D1684
2Cen_mapped_Scaffold_43_D1668
2R2_mapped_Scaffold_56_D1828
3Cen_mapped_Scaffold_1_D1896_D1895
3Cen_mapped_Scaffold_27_D1777
3Cen_mapped_Scaffold_31_D1643_D1653_D1791
3Cen_mapped_Scaffold_36_D1605
3Cen_mapped_Scaffold_41_D1641
3Cen_mapped_Scaffold_50_D1686
X3X4_mapped_Scaffold_14_D1732
X3X4_mapped_Scaffold_6_D1712
XY_mapped_Scaffold_42_D1648
XY_mapped_Scaffold_7_D1574
Y_mapped_Scaffold_12_D1771
Y_mapped_Scaffold_15_D1727
Y_mapped_Scaffold_18_D1698
Y_mapped_Scaffold_20_D1762_D1719
Y_mapped_Scaffold_21_D1683_D1693
Y_mapped_Scaffold_23_D1638
Y_mapped_Scaffold_26_D1717
Y_mapped_Scaffold_30_D1720
Y_mapped_Scaffold_34_D1584
Y_mapped_Scaffold_53_D1765
Y_mapped_Scaffold_5_D1748_D1610
Y_mapped_Scaffold_9_D1573
Unmapped_Scaffold_11_D1754
Unmapped_Scaffold_13_D1782
Unmapped_Scaffold_17_D1756_D1775
Unmapped_Scaffold_22_D1753
Unmapped_Scaffold_24_D1707
Unmapped_Scaffold_28_D1723
Unmapped_Scaffold_29_D1705
Unmapped_Scaffold_32_D1773
Unmapped_Scaffold_35_D1599
Unmapped_Scaffold_37_D1608
Unmapped_Scaffold_38_D1625
Unmapped_Scaffold_44_D1670
Unmapped_Scaffold_45_D1673
Unmapped_Scaffold_46_D1675
Unmapped_Scaffold_48_D1678
Unmapped_Scaffold_4_D1555_D1692
Unmapped_Scaffold_51_D1697
Unmapped_Scaffold_52_D1739
Unmapped_Scaffold_54_D1776
Unmapped_Scaffold_58_D1862
Unmapped_Scaffold_60_D1601
Unmapped_Scaffold_8_D1580_D1567
1814 scaffolds with identifiers like 2110000...
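
The scaffold names in Table 1 follow a visible pattern: an optional mapped location (such as 3Cen or XY), a scaffold number, and one or more D identifiers. The sketch below splits a name into those parts. The pattern is inferred only from the names listed above and is an assumption, not a documented FlyBase naming specification; the names `SCAFFOLD_RE`, `TOP_LEVEL` and `classify_scaffold` are hypothetical.

```python
import re

# Entries at the head of Table 1 that carry no "Scaffold" suffix.
TOP_LEVEL = {"2L", "2R", "3L", "3R", "4", "X", "Y", "rDNA"}

# Assumed pattern, inferred from Table 1:
#   <location>_mapped_Scaffold_<n>_<D id>[_<D id>...]
#   Unmapped_Scaffold_<n>_<D id>[_<D id>...]
SCAFFOLD_RE = re.compile(
    r"^(?:(?P<location>[A-Za-z0-9]+)_mapped|Unmapped)"
    r"_Scaffold_(?P<number>\d+)_(?P<d_ids>D\d+(?:_D\d+)*)$"
)


def classify_scaffold(name):
    """Return a dict describing a Table 1 scaffold name."""
    if name in TOP_LEVEL:
        return {"class": "top_level", "location": name}
    match = SCAFFOLD_RE.match(name)
    if match is None:
        # For example, the numeric identifiers that Table 1 only summarizes.
        return {"class": "other", "location": None}
    location = match.group("location")
    return {
        "class": "mapped" if location else "unmapped",
        "location": location,
        "number": int(match.group("number")),
        "d_ids": match.group("d_ids").split("_"),
    }


if __name__ == "__main__":
    for name in ("3Cen_mapped_Scaffold_31_D1643_D1653_D1791",
                 "Unmapped_Scaffold_4_D1555_D1692",
                 "2L"):
        print(name, classify_scaffold(name))
```

Grouping by the `location` field is one way to see at a glance which unassembled scaffolds have been assigned to a chromosome region and which remain unmapped.
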
Table 2. Annotation changes in Dmel_Release_6.01
Deleted Annotations
CR40507
CR40508
CR40528
CR40546
CR40560
CR40565
CR40572
CR40574
CR40581
CR40597
CR40728
CR40734
CR40779
CR41539
CR41540
CR41604
CR41605
CR41608
CR41617
CR41618
CR42405
CR42413
CR42423
CR42434
CR42435
CR42436
CR42437
CR42440
CR42441
CR42442
CR42443
Genes with changed CDS
FBgn0262795 CG43176
FBgn0261399 Pp1-Y1
FBgn0001314 kl-3
FBgn0001315 kl-5
FBgn0267363 JYalpha
FBgn0262124 uex
FBgn0052823 Sdic3
FBgn0053499 Sdic4
FBgn0046698 Pp1-Y2
FBgn0001313 kl-2
FBgn0046323 Ory
FBgn0010247 Parp
FBgn0085520 CG40801
FBgn0085556 CG41020
FBgn0050271 CG30271
FBgn0040034 CG15831
FBgn0085658 CG41497
FBgn0085521 CG40813
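
Anyone carrying an analysis over from Dmel_Release_5.57 may want to check whether their genes of interest appear in the "Genes with changed CDS" list. The following is a minimal sketch of such a check. The identifiers and symbols are copied from Table 2; the names `CHANGED_CDS` and `flag_changed` are hypothetical and not part of any FlyBase tool.

```python
# FBgn identifier -> gene symbol, copied from "Genes with changed CDS" in Table 2.
CHANGED_CDS = {
    "FBgn0262795": "CG43176",
    "FBgn0261399": "Pp1-Y1",
    "FBgn0001314": "kl-3",
    "FBgn0001315": "kl-5",
    "FBgn0267363": "JYalpha",
    "FBgn0262124": "uex",
    "FBgn0052823": "Sdic3",
    "FBgn0053499": "Sdic4",
    "FBgn0046698": "Pp1-Y2",
    "FBgn0001313": "kl-2",
    "FBgn0046323": "Ory",
    "FBgn0010247": "Parp",
    "FBgn0085520": "CG40801",
    "FBgn0085556": "CG41020",
    "FBgn0050271": "CG30271",
    "FBgn0040034": "CG15831",
    "FBgn0085658": "CG41497",
    "FBgn0085521": "CG40813",
}


def flag_changed(gene_ids):
    """Return {FBgn: symbol} for the supplied IDs whose CDS changed in 6.01."""
    return {g: CHANGED_CDS[g] for g in gene_ids if g in CHANGED_CDS}


if __name__ == "__main__":
    # The second identifier is a placeholder that is not in Table 2.
    my_genes = ["FBgn0010247", "FBgn0000001"]
    print(flag_changed(my_genes))
```

Any gene flagged this way is worth re-examining against the Release_6 gene model before coordinates or protein sequences from the earlier annotation set are reused.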