Newly aligned modENCODE RNA-Seq coverage data
Sue Celniker's group has provided FlyBase with updated RNA-Seq coverage data for the modENCODE transcriptome datasets. The high-throughput sequencing reads from these experiments have been re-aligned to the new BDGP Release 6 reference genome assembly (NCBI accession GCA_000001215.4). This update includes new RNA-Seq coverage profiles for unpublished Sindbis virus treatments. These re-aligned data are part of FlyBase update FB2014_06 and replace the “Release 5-to-Release 6” lift-over RNA-Seq data that were in place for the previous two Release 6-based FlyBase updates (FB2014_04 and FB2014_05).
The main advantage of these re-aligned data is that RNA-Seq coverage data are now available for new regions of the genome assembly. This improvement is notable in regions of centric heterochromatin, which have undergone substantial revision for the new Release 6 genome assembly. The re-aligned data are also particularly important for previously fragmented gene annotations that have been properly assembled in Release 6. Take JYalpha (FBgn0267363) as an example: the re-aligned RNA-Seq data (bottom panel) provide a more comprehensive transcription profile for the 3' end than the previous lift-over RNA-Seq profile (top panel).
The re-aligned modENCODE transcriptome data include the developmental profile of Graveley et al. (FBrf0213330), the cell line profile of Cherbas et al. (FBrf0213077) and the tissue and treatment profiles of Brown et al. (FBrf0225793). Included among the treatment profiles are four new, unpublished RNA-Seq coverage profiles for Sindbis virus treatment samples (described in a personal communication to FlyBase, FBrf0226107). These re-aligned RNA-Seq data can be accessed through GBrowse, and are used to calculate RPKM expression values. These RPKM values are available on FlyBase gene reports, in the “RNA-Seq RPKM values” precomputed file (http://flybase.org/static_pages/downloads/bulkdata7.html), and provide the basis for the RNA-Seq Search tool (http://flybase.org/static_pages/rna-seq/rna-seq_search.html).
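For readers unfamiliar with the unit, the sketch below shows the standard definition of RPKM (reads per kilobase of feature per million mapped reads). It is a minimal illustration only: the function name `rpkm` and its parameters are hypothetical, and the exact procedure FlyBase uses to derive its values (for example, how feature length or the mapped-read total is determined) is not described here.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Return reads per kilobase of feature per million mapped reads.

    Standard textbook definition; not necessarily FlyBase's exact pipeline.
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # Reads are scaled by feature length in kilobases (bp / 1e3) and by
    # library size in millions of mapped reads (reads / 1e6), hence 1e9.
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2,000 bp feature in a library
# of 10 million mapped reads.
example_value = rpkm(500, 2_000, 10_000_000)
```

Because the value is normalized by both feature length and library size, it allows rough comparison of expression between genes of different lengths and between samples sequenced to different depths.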
Note that only reads mapping to unique regions of the genome are included in the new re-aligned data. As a result, regions of duplication and multi-copy genes will have lower coverage values than in the previous lift-over RNA-Seq profiles. For such regions, users may prefer to use the previous lift-over RNA-Seq profiles, accessible in the archived FB2014_04, R6.01 version of FlyBase (http://fb2014_04.flybase.org/).
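The effect of keeping only uniquely mapping reads can be illustrated with a toy example. The sketch below is not the re-alignment protocol: the function `coverage_by_region`, the read identifiers and the region names are all hypothetical, and coverage is simplified to a per-region read count.

```python
from collections import Counter


def coverage_by_region(alignments, unique_only=True):
    """Count reads per region from (read_id, region) alignment pairs.

    A read listed under more than one region is treated as multi-mapping.
    With unique_only=True such reads are discarded entirely.
    """
    hits_per_read = Counter(read_id for read_id, _ in alignments)
    coverage = Counter()
    for read_id, region in alignments:
        if unique_only and hits_per_read[read_id] > 1:
            continue  # skip reads that align to more than one location
        coverage[region] += 1
    return dict(coverage)


# Hypothetical alignments: read r3 maps equally well to two copies of a
# duplicated region, while the other reads map to a single location.
toy_alignments = [
    ("r1", "single_copy_gene"),
    ("r2", "single_copy_gene"),
    ("r3", "duplicate_copy_1"),
    ("r3", "duplicate_copy_2"),
    ("r4", "duplicate_copy_1"),
]

unique_coverage = coverage_by_region(toy_alignments, unique_only=True)
all_coverage = coverage_by_region(toy_alignments, unique_only=False)
```

In this toy case the single-copy gene keeps the same count under both settings, whereas the duplicated copies lose every read shared between them once the unique-only filter is applied.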
For details of the re-alignment protocol, including accessions for the datasets used, see Sue Celniker's personal communication to FlyBase and its associated file (FBrf0226107). Experimental details for the datasets can be found in the FlyBase Dataset reports: developmental profile (FBlc0000085), cells (FBlc0000260), tissues (FBlc0000206) and treatments (FBlc0000236). See the current FlyBase Release Notes for more details on the BDGP Release 6 reference genome assembly (http://flybase.org/static_pages/docs/release_notes.html).