D. melanogaster Release 6 assembly
With the FB2014_04 update of the FlyBase web site, we have incorporated the BDGP's new D. melanogaster Release_6 assembly. While much of the euchromatic sequence remains unchanged, each of the major arm scaffolds (2L, 2R, 3L, 3R and X) has had additional centric heterochromatic sequence added. The sequences of the Release_5 heterochromatic scaffolds 2LHet, 2RHet, 3LHet, 3RHet and XHet have either been incorporated into the major arm scaffolds or are provided as small individual scaffolds alongside the remaining unassembled scaffolds. In FlyBase, we are not presenting pseudo-arms built by concatenating mapped but unordered and unoriented scaffolds within Release_6 (comparable to arm U). Instead, the assembly contains 1870 scaffolds in total, including the major arm scaffolds. See Table 1 for information on the scaffolds included in this assembly.
For the first FlyBase release of annotations on Release_6 (annotation release Dmel_Release_6.01), we have migrated the existing release 5.57 annotations to the new assembly with as little change as possible. Assembly differences led to the deletion of 31 non-coding RNA annotations, most of which were repeated copies of rRNA gene fragments and Su(Ste) genes (Table 2). The coding regions of 18 annotations changed because of differences between the old and new assemblies (Table 2). In addition, we migrated as many sequence features, and as much aligned evidence and as many predictions, as possible to the new assembly. This includes the continuous RNA-Seq coverage data displayed as Topoview graphics in GBrowse. Because of the changes between assemblies, especially in heterochromatic regions of the genome, some features failed to migrate. We anticipate providing newly aligned and analyzed data for several classes of evidence and sequence features in future releases, as they become available.
GenBank submissions: Both Dmel_Release_5.57 and Dmel_Release_6.01 have been submitted to GenBank/NCBI, so that the two most comparable annotation sets on the former and current assemblies are available as RefSeq reference datasets. The FlyBase FB2014_03 (Dmel_Release_5.57) and FB2014_04 (Dmel_Release_6.01) web sites will be permanently available through our data archives for those who want to refer back to these transitional datasets. As has always been FlyBase policy, future D. melanogaster gene model annotation will accrue on the current assembly, Dmel_Release_6. We anticipate that Dmel_Release_6.01 will be the stable RefSeq annotation set for the next 12 months.
Converting genome sequence locations between assemblies: We have extended our sequence coordinate converter tool to convert in both directions between Dmel_Release_5 and Dmel_Release_6. A downloadable converter (5->6) is also available at our GitHub site. Once Dmel_Release_6 is public at NCBI, NCBI tools for coordinate conversion will also be available. If you have difficulty with coordinate conversion, or any other issue pertaining to the assemblies, please contact us through the usual Contact FlyBase help form.
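To illustrate what coordinate conversion between two assemblies involves in general, here is a minimal Python sketch. It is not the FlyBase converter and does not reproduce its interface. It assumes that the relationship between the assemblies can be described as a set of ungapped, same-strand aligned blocks, and the block coordinates below are invented purely for illustration. The names Block, HYPOTHETICAL_BLOCKS and convert are likewise hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One ungapped, same-strand aligned block between two assemblies."""
    old_start: int      # 1-based inclusive start on the old scaffold
    old_end: int        # 1-based inclusive end on the old scaffold
    new_scaffold: str   # scaffold name in the new assembly
    new_start: int      # position in the new assembly matching old_start


# HYPOTHETICAL blocks, invented for illustration only. Real mappings must
# come from an actual alignment of the two assemblies.
# Each list is sorted by old_start and its blocks do not overlap.
HYPOTHETICAL_BLOCKS: Dict[str, List[Block]] = {
    "2L": [Block(1, 1000, "2L", 1), Block(1201, 5000, "2L", 1301)],
    "2RHet": [Block(1, 800, "2R", 501)],
}


def convert(scaffold: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map an old-assembly position to the new assembly.

    Returns None when the position lies outside every aligned block,
    which corresponds to a feature that fails to migrate.
    """
    blocks = HYPOTHETICAL_BLOCKS.get(scaffold, [])
    starts = [b.old_start for b in blocks]
    # Find the last block whose start is at or before pos.
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    block = blocks[i]
    if pos > block.old_end:
        return None  # pos falls in a gap between blocks
    return (block.new_scaffold, block.new_start + (pos - block.old_start))


if __name__ == "__main__":
    for query in [("2L", 500), ("2L", 1100), ("2L", 1201), ("2RHet", 10)]:
        print(query, "->", convert(*query))
```

The sketch ignores strand orientation and does not handle features that span more than one block. A real converter must deal with both, which is one reason some features cannot be migrated automatically.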
Referencing the assemblies and annotation sets: With the transition to Dmel_Release_6, it is especially important that authors of papers and bulk data sets indicate which BDGP assembly release and which FlyBase annotation set were used for analysis and for presentation of coordinates.
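One way to follow this recommendation in a bulk data file is to write the assembly and annotation release into the header of the file itself. The Python sketch below shows the idea. The header keys, the function names and the example record are assumptions made for illustration, not a FlyBase or GenBank convention.

```python
from typing import Iterable, List, Sequence


def provenance_header(assembly: str, annotation: str) -> List[str]:
    """Build comment lines recording which releases the coordinates refer to."""
    return [f"# assembly: {assembly}", f"# annotation: {annotation}"]


def format_records(records: Iterable[Sequence[object]],
                   assembly: str, annotation: str) -> str:
    """Return tab-delimited text with the provenance header on top."""
    lines = provenance_header(assembly, annotation)
    lines += ["\t".join(str(field) for field in rec) for rec in records]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical record: scaffold, start, end, feature name.
    example = [("2L", 100, 200, "example_feature")]
    print(format_records(example,
                         "BDGP Dmel_Release_6",
                         "FlyBase Dmel_Release_6.01"), end="")
```

Keeping this information inside the file means the coordinates remain interpretable even when the file is separated from the paper or supplement that describes it.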
|Table 1. Dmel_Release_6 Scaffolds|
|1814 scaffolds with identifiers like 2110000...|
|Table 2. Annotation changes in Dmel_Release_6.01|
|Genes with changed CDS|