Dataset mE_Transcription_Start_Sites
| General Information | |||
|---|---|---|---|
| Name | mE_Transcription_Start_Sites | Species | D. melanogaster |
| Dataset type | genomic sequence feature | FlyBase ID | FBlc0000202 |
| Source & Content | |||
| Consists of |
Genomic sequences identified by integrative analysis of ESTs, CAGE and RLM-RACE.
|
||
| Created by | |||
| Available from |
Not available as reagents.
|
||
| Strain |
iso-1
|
Stage & tissue | |
|
Stage
Tissue/Position (including subcellular localization)
Reference
Comment: 0-24 hr AEL
|
|||
| Cell Line |
|
||
Recent Updates
|
|||
| FB2013_03 | |||
| FB2013_02 | |||
Description & Members
|
|||
| Description |
Genomic sequences identified as transcription start sites (TSSs) from a synthesis of expressed sequence tags (ESTs), cap analysis
of gene expression (CAGE) tags and RNA ligase-mediated rapid amplification of cDNA ends (RLM-RACE).
|
||
| Parent collections |
|||
| Component collection(s) |
|||
| Number in collection | |||
| Comment on number in collection | |||
| Members | |||
Experimental protocol
|
|||
| Vector | |||
| Sample preparation |
See component data set reports for details.
|
||
| Collection preparation |
See component data set reports for details.
|
||
| Mode of assay |
See component data set reports for details.
|
||
| Assay platform |
See component data set reports for details.
|
||
| Data analysis |
Characterization of TSS distributions: within each TSS, the distribution of tags from each of the three assays was modeled
as a multinomial distribution, with each bin corresponding to a single nucleotide. The tag distributions from each assay tended to be
"shifted" by 1 or 2 bp relative to those from the other assays. The smoothed distributions across the three assays were combined to obtain a consensus
probability density function (PDF) for each TSS. A shape index (SI) was calculated for each TSS; the SI is analogous to
the thermodynamic entropy of a system and relates the number of states occupied by the system (the tag heights and locations)
to the total number of possible states (the entire promoter region). A shape index value of -1 was somewhat arbitrarily chosen to separate
TSSs into discrete classes: 2337 "peaked" TSSs with SI > -1, 6607 "broad" TSSs with SI <= -1 and 3456 TSSs designated as "unclassified"
due to either low tag count (2487 TSSs) or class instability (982 TSSs).
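As a rough illustration of the shape index, the sketch below computes an entropy-style score from per-nucleotide tag counts and applies the -1 cutoff. The formula SI = 2 + sum(p_i * log2(p_i)) is an assumption (this record does not state it), the function names are hypothetical, and raw counts stand in for the smoothed consensus PDF.

```python
import math


def shape_index(tag_counts):
    """Entropy-style shape index for one TSS cluster.

    tag_counts: tag counts per nucleotide position across the cluster.
    Assumed form (not stated in this record): SI = 2 + sum(p_i * log2(p_i)).
    """
    total = sum(tag_counts)
    if total == 0:
        raise ValueError("cluster has no tags")
    # Positions with zero tags contribute nothing to the sum.
    probs = [count / total for count in tag_counts if count > 0]
    return 2 + sum(p * math.log2(p) for p in probs)


def classify_shape(si, threshold=-1.0):
    """Apply the cutoff from the text: SI > -1 is peaked, SI <= -1 is broad."""
    return "peaked" if si > threshold else "broad"
```

Under this assumed form, a cluster with all tags on one nucleotide scores 2, and tags spread evenly over eight positions score exactly -1, which falls in the broad class. The "unclassified" category (low tag count or class instability) is not modeled here.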
Classifying TSS evidentiary support: TSSs were grouped based on evidentiary support into either validated (V), supported (S)
or RACE-only (R). The validated set (8694 TSSs) is defined by two or more data types (5477 TSSs have all three data types).
The supported set (3062 TSSs) is defined by either a CAGE peak or at least three RACE reads overlapping a 5' UTR. The RACE-only
set (698 TSSs) is defined by three or more RACE reads with no support from an overlapping 5' UTR. The majority of unsupported
CAGE peaks are likely associated with other phenomena, and not with bona fide transcription initiation sites.
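The three evidence classes can be sketched as a small decision function. This is one reading of the rules above, not the pipeline's actual code: the function and parameter names are hypothetical, RACE is assumed to count as a data type only at three or more reads, and the "supported" rule is assumed to require a 5' UTR overlap for both the CAGE and the RACE case.

```python
def evidence_class(has_est, has_cage, race_reads, overlaps_5utr, min_race_reads=3):
    """Return "V", "S", "R", or None for a TSS.

    has_est, has_cage: whether EST / CAGE evidence is present (booleans).
    race_reads: number of RACE reads at the TSS.
    overlaps_5utr: whether the TSS overlaps an annotated 5' UTR.
    """
    # Assumption: RACE counts as a data type only with >= min_race_reads reads.
    has_race = race_reads >= min_race_reads
    n_types = sum([has_est, has_cage, has_race])
    if n_types >= 2:
        return "V"  # validated: two or more data types
    if (has_cage or has_race) and overlaps_5utr:
        return "S"  # supported: one data type plus a 5' UTR overlap
    if has_race:
        return "R"  # RACE-only: enough RACE reads, no 5' UTR overlap
    return None  # e.g. an unsupported CAGE peak; not described as a TSS class


# The three class sizes quoted above add up to the total TSS count.
assert 8694 + 3062 + 698 == 12454
```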
Identification of tag clusters: an iterative hierarchical clustering procedure was devised to group tags into TSS regions
and applied to the RE EST, CAGE, and RACE data sets independently. These clusters were then integrated to produce consensus
clusters based on the tags from all three data sets. 12,454 TSSs (promoters) were identified, of which 11,672 TSSs were associated
with 8037 gene annotations (FlyBase release 5.12, October 2008) by a progressive strategy: first, peaks were associated with
5' UTRs, then with regions within 100 bp of a 5' transcript end, followed by 3' UTRs, introns, protein-coding exons and finally
other annotations (e.g., pseudogenes and regions within 100 bp of a 3' end). The remaining TSSs were classified as intergenic.
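The progressive association strategy amounts to a first-match lookup over an ordered list of annotation categories. The sketch below shows that ordering only; the category labels and function name are hypothetical, and computing which categories a TSS overlaps is assumed to happen upstream.

```python
# Categories in the order the text says they were tried.
PRIORITY = [
    "5' UTR",
    "within 100 bp of 5' end",
    "3' UTR",
    "intron",
    "protein-coding exon",
    "other annotation",
]


def assign_annotation(overlapping_categories):
    """Return the highest-priority category a TSS overlaps, else "intergenic"."""
    for category in PRIORITY:
        if category in overlapping_categories:
            return category
    return "intergenic"


# From the counts in the text: TSSs left over after gene association.
intergenic_count = 12454 - 11672  # 782
```

A TSS overlapping both an intron and a 3' UTR would be assigned to the 3' UTR, because that category is tried first.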
Mapping of tags to the genome: 66,169 cap-trapped and normalized RE ESTs (FBrf0152058) were reanalyzed to ensure accurate vector trimming and genomic alignment; 61,429 of these RE ESTs mapped uniquely to
the genome. An EST was associated with a gene if its alignment shared genomic coordinates with either the start or stop
codon, or with the start or end coordinate of any exon. See component collections for details of RACE and CAGE tag mapping.
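One way to read the EST-to-gene rule is as an interval check against a set of gene landmarks. The sketch below is an interpretation, not the original method: the function names are hypothetical, coordinates are inclusive, and the EST and gene are assumed to lie on the same chromosome arm and strand.

```python
def intervals_overlap(a, b):
    """True if two inclusive (start, end) intervals share a coordinate."""
    return a[0] <= b[1] and b[0] <= a[1]


def est_associated(est_blocks, start_codon, stop_codon, exons):
    """Decide whether an EST alignment is associated with a gene.

    est_blocks: aligned blocks of the EST as (start, end) tuples.
    start_codon, stop_codon: (start, end) intervals of the codons.
    exons: (start, end) tuples for every annotated exon of the gene.
    """
    landmarks = [start_codon, stop_codon]
    for exon_start, exon_end in exons:
        # Each exon boundary is a single-coordinate landmark.
        landmarks.append((exon_start, exon_start))
        landmarks.append((exon_end, exon_end))
    return any(
        intervals_overlap(block, landmark)
        for block in est_blocks
        for landmark in landmarks
    )


# Fraction of RE ESTs that mapped uniquely, from the counts in the text.
unique_fraction = 61429 / 66169  # about 0.928
```

Under this reading, an EST lying entirely inside one exon and covering neither codon would not be associated with the gene, because it shares no landmark coordinate.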
|
||
Additional data
|
|||
|
More information is available under:
|
|||
| Associated files | |||
| Additional sites | |||
Comments
|
|||
|
modENCODE Transcription Start Sites
Core promoter sequence motifs are differentially enriched in the peaked and broad classes of promoters.
Genes with peaked promoters have a marked and highly significant tendency to be expressed in spatially and temporally restricted
patterns, and genes with broad promoters do not.
CAGE peaks within 3' UTRs appear to be associated with cytoplasmic transcript degradation products, and not independent promoters.
|
|||
Synonyms & Secondary IDs
|
|||
| Reported As | |||
| Symbol Synonym |
Integrated promoters
mE_Transcription_Start_Sites
|
||
| Secondary FlyBase IDs | |||
|
|
|||
References
( 2 )
|
|||
| Research paper |
|
||
| Supplementary material |
|
||