D. melanogaster RNA-Seq Data
We are delighted to announce the incorporation of new genome-wide data types into FlyBase, especially those relying on next-generation sequencing outputs, and to tell you of our plans for such data sets in the future.
With our March release (FB2010_03), we introduce GBrowse views of D. melanogaster RNA-Seq developmental profile and cell line expression data, and D. pseudoobscura RNA-Seq male vs. female adult head expression data. We also introduce a prediction set for insulator elements based on genome-wide localization of six insulator-associated proteins. (Examples of these GBrowse views are presented at the bottom of this Commentary.)
We are grateful for contributions from Bryce Daines and Rui Chen of the Baylor College of Medicine, and Susan Celniker and Brent Graveley of the modENCODE Transcriptome Group for the D. melanogaster RNA-Seq profiles, and David Sturgill and Brian Oliver of NIDDK-NIH for the D. pseudoobscura RNA-Seq data. We are also grateful to Kevin White and his modENCODE colleagues for the insulator prediction set (Nègre et al., 2010, PLoS Genetics 15: e1000814).
FlyBase is working with high-throughput data providers to ensure that the data sets we incorporate are stable and will be the reference sets for at least several months and/or represent the reference data sets used for the foundation publications by these groups. We will incorporate these data into FlyBase so that they can be examined in GBrowse and interrogated through our query engines in the context of other community genome-wide and literature-based information. In addition, many of these data types help inform transcript structures, so they will be an important source of information leading to changes in the FlyBase/RefSeq reference gene model annotation set. In the future, we will incorporate other data such as CAGE (cap analysis of gene expression) and 5' RACE data, chromatin marks, protein-binding sites and inferred locations of DNA elements (such as transcription start sites, regions of actively transcribed chromatin, enhancer elements, insulator elements and origins of replication). A principal source of these data will be the Drosophila melanogaster modENCODE project.
Another aspect of our plan for FlyBase is to include sufficient metadata about experimental conditions so that FlyBase users can understand the nature of the experiments that underpin the data displayed: for example, source of material, class of experiment, type of assay, major aspects of computational analysis, and literature citations or URLs where more detailed methodology can be found. Finally, when available from the data producers, we will incorporate confidence scores to help our user community assess the reliability of the many data elements.
Because of the nature of these high-throughput data, GBrowse views can often be posted more rapidly than certain data features and/or metadata can be incorporated into the central FlyBase database. Thus, you can typically expect to see new GBrowse features for a period (usually several weeks) before these data are fully integrated, including the ability to query on these data types and rich links between the GBrowse views and other information in FlyBase. This is the case for the RNA-Seq data, where the GBrowse coverage data and experimental metadata are now incorporated into FlyBase, but the junction (exon-exon join) data are not yet fully incorporated.
To see the D. melanogaster RNA-Seq data, go to GBrowse and select [D. melanogaster RNA-Seq Data] in the data source pull-down menu. The D. pseudoobscura RNA-Seq data can be found by changing to the [D. pseudoobscura] data source. Finally, the insulator predictions can be found in the [D. melanogaster] data source by selecting the "Mapped Features" called "Insulator class I" and "Insulator class II". These insulator positions can also be viewed in the context of the RNA-Seq data by selecting the appropriate tracks in the [D. melanogaster RNA-Seq Data] set.
Here are the sorts of data that you will see:
D. melanogaster RNA-Seq:
D. pseudoobscura RNA-Seq:
Insulator Prediction Data: