|Citation||Ma, L. (2011.6.8). HOT spots analysis (Data Set S8) and 41 TFBS GFF files (Data Set S7).|
|Publication Type||Personal communication to FlyBase|
|Text of Personal Communication||
I have sent you the final version of Data Set S8 from the modENCODE integrative Drosophila paper. This data set lists 38,562 unique TFBSs (including the 1,962 HOT spots) derived from an integrated analysis of the TFBS distributions for 41 different TFs (see below). We calculated the complexity of TF binding ("hotness") at these 38,562 sites as follows. To eliminate the bias caused by the longer range of high-hotness HOT spots, we applied Gaussian kernel density estimation to calculate the contribution of every peak to a HOT spot. The analysis excluded the U, Y and *Het chromosomes. A HOT spot is first defined as a single base (a POINT), which was then extended to a region by using a cutoff to define the farthest contributing peak. The 6th column in this file is the density score ("complexity score"), which reflects the hotness of a POINT. From this point, the region was defined by identifying the contributing peaks. The 9th column of the file contains three parts separated by ";":
(1) the number of peaks physically overlapping the HOT spot POINT.
(2) the number of peaks contributing to the HOT spot (only peaks with contribution >=0.1 were counted).
(3) the remainder, ordered as: TF name, density at the middle point.
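As a rough illustration of the kernel-density idea described above, the following Python sketch sums the Gaussian contributions of peak centers at a single POINT and counts the peaks whose contribution reaches the 0.1 cutoff. The kernel form (unnormalized, so a peak centered exactly on the point contributes 1.0), the 300 bp bandwidth, and the function names are assumptions made for illustration only; the message does not state the parameters actually used.

```python
import math


def gaussian_contribution(peak_pos, point, bandwidth):
    """Unnormalized Gaussian weight of a peak centered at peak_pos, evaluated at point.

    Assumption: a peak centered exactly on the point contributes 1.0.
    """
    z = (point - peak_pos) / bandwidth
    return math.exp(-0.5 * z * z)


def hotness_at(point, peak_positions, bandwidth=300.0, min_contribution=0.1):
    """Return (density score, number of contributing peaks) at one genomic position.

    bandwidth is a hypothetical value in base pairs; min_contribution is the
    0.1 cutoff mentioned in the text.
    """
    contributions = [
        gaussian_contribution(p, point, bandwidth) for p in peak_positions
    ]
    score = sum(contributions)
    n_contributing = sum(1 for c in contributions if c >= min_contribution)
    return score, n_contributing


# Two peaks centered on the point and one far away (made-up coordinates).
score, n_contributing = hotness_at(1000, [1000, 1000, 5000])
print(score, n_contributing)
```

With this kernel, distant peaks add almost nothing to the score, which is how a density estimate avoids rewarding a region merely for being long.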
The 1,962 HOT spots are those sites with a complexity score >=8. Please note that in this final version we slightly modified the calculation of the hotness score to stay consistent with the descriptions in the paper. The older version posted on the modENCODE website has not yet been updated, but I will make sure the data set on the modENCODE website is replaced shortly.
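The column layout described above could be read along the following lines. This is only a sketch: it assumes the file is tab-separated with at least nine columns, and that the third part of column 9 is a comma-separated list alternating TF name and density. The message does not specify either delimiter, and the sample line is invented purely to exercise the parser.

```python
HOT_THRESHOLD = 8.0  # from the text: HOT spots have complexity score >= 8


def parse_record(line):
    """Parse one line of the HOT spot file under the assumptions stated above."""
    fields = line.rstrip("\n").split("\t")  # assumption: tab-separated columns
    if len(fields) < 9:
        raise ValueError("expected at least 9 columns")
    score = float(fields[5])  # 6th column: density ("complexity") score
    parts = fields[8].split(";", 2)  # 9th column: three parts split by ";"
    if len(parts) < 3:
        raise ValueError("expected three ';'-separated parts in column 9")
    # Assumption: part 3 alternates TF name and density, separated by commas.
    tokens = [t.strip() for t in parts[2].split(",") if t.strip()]
    tf_density = {
        tokens[i]: float(tokens[i + 1]) for i in range(0, len(tokens) - 1, 2)
    }
    return {
        "score": score,
        "n_overlapping": int(parts[0]),
        "n_contributing": int(parts[1]),
        "tf_density": tf_density,
        "is_hot": score >= HOT_THRESHOLD,
    }


# Hypothetical line, not taken from the real data set.
sample = "2L\tmodENCODE\tTFBS\t1000\t1500\t9.3\t.\t.\t4;6;Kr,1.2,bcd,0.8"
record = parse_record(sample)
print(record["is_hot"], record["n_overlapping"], record["tf_density"])
```

If the real file uses a different separator inside column 9, only the two `split` calls need to change.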
The HOT spot analysis represents an integration of the binding site calls for 41 individual TFs, which are listed as 41 GFF files in Data Set S7 of the modENCODE integrative Drosophila paper. This Data Set S8 consists of 25 TFs analyzed by modENCODE and 16 TFs analyzed by the BDTNP (Berkeley Drosophila Transcription Network Project). Where multiple data sets were available for the same factor, the peak calls (binding sites) for each data set were merged and the union was taken. Only ChIP experiments from embryonic material were used. The accessions/files used to generate each of the 41 TFBS files are listed at the end of this message. The genetic background, developmental stage, antibody and array platform are as described in the related accessions. In some cases, the peak calls listed in Data Set S7 were derived from alternative analyses of ChIP data that differ slightly from the updated modENCODE versions (the underlying raw data are the same); these data sets are marked by an asterisk in the list below. Data sets for the BDTNP factors (with the FDR1 cutoff) were downloaded from UCSC Goldenpath, which is based on the dm3 assembly, for consistency with the modENCODE data sets. The download address is "http://hgdownload.cse.ucsc.edu/goldenPath/dm3/database/" and the files used have the prefix "bdtnp".
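One way to picture "merged and the union was taken" for a factor with several data sets is the interval union below. This is a sketch under assumptions: peaks are represented as (chromosome, start, end) tuples with inclusive coordinates, intervals that share at least one base are fused, and the function name is hypothetical. The message does not say which tool or exact rule was used.

```python
from collections import defaultdict


def merge_peak_calls(*datasets):
    """Union of peak calls from several data sets for one factor.

    Each data set is an iterable of (chrom, start, end) tuples. Intervals on
    the same chromosome that overlap are fused into a single interval.
    """
    by_chrom = defaultdict(list)
    for dataset in datasets:
        for chrom, start, end in dataset:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is None or start > current_end:
                # No overlap with the running interval: emit it, start a new one.
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
            else:
                # Overlap: extend the running interval.
                current_end = max(current_end, end)
        merged.append((chrom, current_start, current_end))
    return merged


# Made-up coordinates for two replicate data sets of the same factor.
replicate_1 = [("2L", 100, 200), ("2L", 500, 600)]
replicate_2 = [("2L", 150, 300), ("3R", 10, 20)]
print(merge_peak_calls(replicate_1, replicate_2))
```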
|TF|Data Set S7 file|Accession/File|
|bab1|E0-12h_bab1.gff|modENCODE_628|
|chinmo|E0-12h_chinmo.gff|modENCODE_608|
|cnc|E0-12h_cnc.gff|modENCODE_627|
|Dll|E0-12h_dll.gff|modENCODE_606|
|en|E0-12h_en1.gff|modENCODE_625|
|eve|E1-6h_eve.gff|modENCODE_2603|
|ftz-f1|E0-12h_ftz-f1.gff|modENCODE_624|
|GAF (Trl)|E0-12h_GAF.gff|modENCODE_23|
|inv|Embryo_inv.gff|modENCODE_605, modENCODE_619|
|kn|E0-12h_kn.gff|modENCODE_618|
|Kr|E0-8h_Kr.gff|modENCODE_898|
|run|E0-12h_run.gff|modENCODE_617|
|Stat92E|E0-12h_Stat92E.gff|modENCODE_616|
|ttk|E0-12h_ttk.gff|modENCODE_615|
|Ubx|Embryo_Ubx.gff|modENCODE_603, modENCODE_612, modENCODE_613, modENCODE_614|
|zfh1|E0-12h_zfh1.gff|modENCODE_604|
|bks (sbb)|Embryo_bks.gff|modENCODE_609, modENCODE_2569*|
|cad|Embryo_cad.gff|modENCODE_902, modENCODE_2626, modENCODE_2637, modENCODE_2570*, modENCODE_2578*|
|D|Embryo_D.gff|modENCODE_626, modENCODE_2571*|
|disco|E0-8h_disco.gff|modENCODE_2572*|
|GATAe|E0-8h_GATAe.gff|modENCODE_2573*|
|h|E0-8h_h.gff|modENCODE_2574*|
|hkb|E0-8h_hkb.gff|modENCODE_2575*|
|jumu|E0-8h_jumu.gff|modENCODE_2576*|
|sens|Embryo_sens.gff|modENCODE_978, modENCODE_979, modENCODE_2577*|
|bcd|bdtnp_bcd.gff|bdtnpBcd1Fdr1, bdtnpBcd2Fdr1|
|da|bdtnp_da.gff|bdtnpDa2Fdr1|
|dl|bdtnp_dl.gff|bdtnpDl3Fdr1|
|ftz|bdtnp_ftz.gff|bdtnpFtz3Fdr1|
|gt|bdtnp_gt.gff|bdtnpGt2Fdr1|
|hb|bdtnp_Hb.gff|bdtnpHb1Fdr1, bdtnpHb2Fdr1|
|kni|bdtnp_Kni.gff|bdtnpKni1Fdr1, bdtnpKni2Fdr1|
|Mad|bdtnp_mad.gff|bdtnpMad2Fdr1|
|Med|bdtnp_med.gff|bdtnpMed2Fdr1|
|prd|bdtnp_Prd.gff|bdtnpPrd1Fdr1, bdtnpPrd2Fdr1|
|shn|bdtnp_Shn.gff|bdtnpShn2Fdr1, bdtnpShn3Fdr1|
|slp1|bdtnp_Slp1.gff|bdtnpSlp11Fdr1|
|sna|bdtnp_Sna.gff|bdtnpSna1Fdr1, bdtnpSna2Fdr1|
|tll|bdtnp_Tll.gff|bdtnpTll1Fdr1|
|twi|bdtnp_Twi.gff|bdtnpTwi1Fdr1, bdtnpTwi2Fdr1|
|z|bdtnp_Z.gff|bdtnpZ2Fdr1|
* Older analyses of these ChIP experiments were used that differ from the modENCODE versions.
|Research paper||A cis-regulatory map of the Drosophila genome.
Nègre et al., 2011, Nature 471(7339): 527-531 [FBrf0213303]
|Supplementary material||DataS8: HOT regions. [FBrf0213505]
DataS7: Predicted TFBS. [FBrf0213603]
|Associated Files||File date: 2011.4.19 ; File size: 4581556 ; File format: txt ; File name: Ma.2011.4.19-modENCODE-HOTSpotAnalysis.txt|
|Language of Publication||English|