Poly(A)+ RNA was subjected to first-strand cDNA synthesis. Second-strand cDNA synthesis was done in the presence of dUTP to generate stranded cDNA libraries. Double-stranded cDNA was end-repaired and ligated to adapters. After size selection (200-600 bp) and uracil-DNA glycosylase (UDG) treatment for strand specificity, DNA was PCR amplified and its quality assessed on an Agilent Bioanalyzer.
Strand-specific paired-end reads were filtered for rRNA sequences. Filtered reads were then aligned using TopHat (v1.4.1) against the D. melanogaster genome (FlyBase r5.44), allowing a maximum of six mismatches. Introns between 20 and 150,000 bp were allowed. snRNA, rRNA, tRNA, snoRNA and pseudogene sequences were masked.
FlyBase reports gene expression levels calculated from RNA-Seq coverage data as RPKM (reads per kilobase of exon model per million mapped reads). The RPKM value is calculated as follows. The uniquely transcribed region(s) for each gene is determined by taking the regions covered by exons of the gene and excluding transcribed regions from any overlapping genes: genes lying on the same strand when using strand-specific RNA-Seq coverage data, and genes on either strand when using unstranded RNA-Seq coverage data. RNA-Seq coverage read-count data was then correlated by location with the uniquely transcribed region(s) of each gene to produce the sum of reads over the entire uniquely transcribed region for the gene. RPKM was then calculated using the method from Mortazavi et al., Nat. Methods 5, 621-628 (2008): RPKM = (10^9 * C) / (N * L * R), where C = number of reads in gene, N = number of uniquely mappable reads in the experiment, L = sum of uniquely transcribed bases in bp, and R = read length in bp.
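The calculation described above can be sketched in Python. This is an illustrative sketch, not FlyBase's actual pipeline: the function names, the half-open (start, end) coordinate convention, and the example numbers are all assumptions. The formula is read as 10^9 * C divided by the product N * L * R; dividing by the read length R suggests that C is a sum of per-base coverage over the uniquely transcribed region, but that reading is also an assumption.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract_intervals(exons, others):
    """Return the parts of `exons` not covered by `others`.

    Intervals are half-open (start, end) tuples in bp (an assumed
    convention). `others` holds the transcribed regions of overlapping
    genes: same-strand genes for stranded data, both strands otherwise.
    """
    others = merge_intervals(others)
    unique = []
    for start, end in merge_intervals(exons):
        pos = start  # left edge of the piece still to be examined
        for o_start, o_end in others:
            if o_end <= pos or o_start >= end:
                continue  # no overlap with the remaining piece
            if o_start > pos:
                unique.append((pos, o_start))
            pos = max(pos, o_end)
            if pos >= end:
                break
        if pos < end:
            unique.append((pos, end))
    return unique


def rpkm(c, n, l, r):
    """RPKM = (10^9 * C) / (N * L * R), with symbols as in the text."""
    if n <= 0 or l <= 0 or r <= 0:
        raise ValueError("N, L and R must be positive")
    return 1e9 * c / (n * l * r)


# Hypothetical gene: one exon, partly overlapped by a neighbouring gene.
unique = subtract_intervals([(1000, 2000)], [(1500, 1750)])
length = sum(end - start for start, end in unique)  # L
value = rpkm(c=75_000, n=10_000_000, l=length, r=100)
```

With these made-up inputs the uniquely transcribed region is 750 bp long. Merging the exons first avoids counting a base twice when several transcripts of the same gene share it.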
The RPKM values are binned into eight expression levels: Bin 0: No/Extremely low expression (0); Bin 1: Very low expression (1-3), percentiles 1-25, approximately; Bin 2: Low expression (4-10), percentiles 26-50, approximately; Bin 3: Moderate expression (11-25), percentiles 51-75, approximately; Bin 4: Moderately high expression (26-50), percentiles 76-85, approximately; Bin 5: High expression (51-100), percentiles 86-95, approximately; Bin 6: Very high expression (101-1000), percentiles 96-99, approximately; Bin 7: Extremely high expression (>1000), the 100th percentile, approximately.
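A minimal Python sketch of this binning follows. The constant and function names are hypothetical. The bin ranges in the text are integer ranges, so this sketch assumes RPKM values are whole numbers; how FlyBase treats fractional values is not stated, and here a value that falls between two listed ranges (for example 3.5) goes to the higher bin.

```python
import bisect

# Inclusive upper RPKM bound of bins 0-6, taken from the ranges in the
# text. Anything above the last bound belongs to bin 7.
BIN_UPPER_BOUNDS = [0, 3, 10, 25, 50, 100, 1000]

BIN_LABELS = [
    "No/Extremely low expression",
    "Very low expression",
    "Low expression",
    "Moderate expression",
    "Moderately high expression",
    "High expression",
    "Very high expression",
    "Extremely high expression",
]


def expression_bin(rpkm_value):
    """Return the bin number (0-7) for an RPKM value."""
    if rpkm_value < 0:
        raise ValueError("RPKM cannot be negative")
    # bisect_left returns the index of the first bound >= rpkm_value,
    # which equals the bin number because the bounds are inclusive.
    return bisect.bisect_left(BIN_UPPER_BOUNDS, rpkm_value)


label = BIN_LABELS[expression_bin(30)]
```

Keeping the bounds in a single sorted list means a change to the bin ranges needs only one edit, and the lookup stays a binary search rather than a chain of comparisons.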
FlyBase RPKM data for all genes can be downloaded from the FlyBase Downloads page (link in the blue navigation bar at the top of all FlyBase web pages).