Reference
Citation
Ma, L. (2011.6.8). HOT spots analysis (Data Set S8) and 41 TFBS GFF files (Data Set S7). 
FlyBase ID
FBrf0213928
Publication Type
Personal communication to FlyBase
Abstract
PubMed ID
PubMed Central ID
Text of Personal Communication
Hi Gil,
I have sent you the final version of Data Set S8 from the modENCODE integrative Drosophila paper. This data set lists 38,562 unique TFBSs (including the 1,962 HOT spots) derived from an integrated analysis of the TFBS distributions for 41 different TFs (see below). We calculated the complexity of TF binding ("hotness") at these 38,562 sites as follows. To eliminate the bias caused by the longer range of high-hotness HOTspots, we applied Gaussian kernel density estimation to calculate the contribution of every peak to a hotspot. The analysis excluded the U, Y and *Het chromosomes. A hotspot is initially a single base, which was then extended to a region by using a cutoff to define the farthest contributing peak. The 6th column in this file is the density score ("complexity score"), which reflects the hotness of a POINT. From this point, the region was defined by identifying the contributing peaks. The 9th column of the file contains three parts separated by ";".
(1) # of peaks that physically overlap the HOTspot POINT.
(2) # of peaks that contribute to the HOTspot (only peaks with a contribution >=0.1 were counted).
(3) the remaining entries, ordered as: TF name, density of the middle point.
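The hotness calculation described above can be illustrated with a minimal sketch. It is not the code used for Data Set S8. The bandwidth value, the use of peak centers, and the unnormalized kernel (so that a peak centered exactly on the point contributes 1.0) are assumptions made for illustration; only the >=0.1 contribution cutoff comes from the text.

```python
import math

# ASSUMPTION: the bandwidth (in bp) is a placeholder, not the value used for Data Set S8.
BANDWIDTH = 300.0
# From the text: only peaks with contribution >= 0.1 are counted.
CONTRIBUTION_CUTOFF = 0.1


def gaussian_contribution(peak_center, point, bandwidth=BANDWIDTH):
    """Unnormalized Gaussian kernel weight of one peak at a genomic point.

    ASSUMPTION: a peak centered exactly on the point contributes 1.0.
    """
    z = (point - peak_center) / bandwidth
    return math.exp(-0.5 * z * z)


def hotness(point, peak_centers, bandwidth=BANDWIDTH):
    """Sum the kernel contributions of all peaks at the given point."""
    return sum(gaussian_contribution(c, point, bandwidth) for c in peak_centers)


def contributing_peaks(point, peak_centers, bandwidth=BANDWIDTH,
                       cutoff=CONTRIBUTION_CUTOFF):
    """Return the peak centers whose contribution reaches the cutoff."""
    return [c for c in peak_centers
            if gaussian_contribution(c, point, bandwidth) >= cutoff]


def hotspot_region(point, peak_centers, bandwidth=BANDWIDTH,
                   cutoff=CONTRIBUTION_CUTOFF):
    """Extend the point to a region bounded by the farthest contributing peaks."""
    contrib = contributing_peaks(point, peak_centers, bandwidth, cutoff)
    if not contrib:
        return (point, point)
    return (min(point, min(contrib)), max(point, max(contrib)))


# Hypothetical peak centers on a single chromosome.
peaks = [1000, 1100, 5000]
region = hotspot_region(1000, peaks)
```

In this toy input the distant peak at 5000 contributes almost nothing to the point at 1000, so it neither raises the score noticeably nor widens the region.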
The 1,962 HOT spots are those sites with a complexity score >=8. Please note that in this final version, we slightly modified the calculation of the hotness score to keep it consistent with the descriptions in the paper. The older version posted on the modENCODE website has not yet been updated, but I will make sure the data set on the modENCODE website is replaced shortly.
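As a rough sketch of how the file could be read, the snippet below pulls the complexity score from the 6th column, splits the 9th column on ";" into its three parts, and applies the >=8 cutoff. It assumes a GFF-like layout of nine tab-separated columns; the sample line and all of its field values are invented for illustration and are not taken from Data Set S8.

```python
HOT_SCORE_CUTOFF = 8.0  # from the text: HOT spots have complexity score >= 8


def parse_site(line):
    """Parse one line, ASSUMING nine tab-separated GFF-like columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        raise ValueError("expected at least 9 tab-separated columns")
    # Split the 9th column into at most three parts; the third part
    # (TF name / density entries) is kept as an uninterpreted string.
    parts = [p.strip() for p in cols[8].split(";", 2)]
    return {
        "seqid": cols[0],
        "score": float(cols[5]),  # 6th column: complexity score
        "parts": parts,           # 9th column, split on ";"
    }


def is_hot(site, cutoff=HOT_SCORE_CUTOFF):
    """True if the site meets the HOT spot complexity cutoff."""
    return site["score"] >= cutoff


# HYPOTHETICAL line: every value here is made up for the example.
sample = "chr2L\tHOTspot\tTFBS\t5000\t5600\t9.4\t.\t.\t3;5;twi,0.9"
site = parse_site(sample)
```

The third part is left unparsed because its exact internal delimiters are not specified in the message.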
The HOT spot analysis represents an integration of the binding site calls for 41 individual TFs, which are listed as 41 GFF files in Data Set S7 of the modENCODE integrative Drosophila paper. Data Set S8 covers 25 TFs analyzed by modENCODE and 16 TFs analyzed by the BDTNP (Berkeley Drosophila Transcription Network Project). Where multiple data sets were available for the same factor, the peak calls (binding sites) for each data set were merged and the union was taken. Only ChIP experiments from embryonic material were used. The accessions/files used to generate each of the 41 TFBS files are listed at the end of this message. The genetic background, developmental stage, antibody and array platform are as described in the related accessions. In some cases, the peak calls listed in Data Set S7 were derived from alternative analyses of ChIP data that differ slightly from the updated modENCODE versions (the underlying raw data is the same); these data sets are marked by an asterisk in the list below. Data sets for the BDTNP factors (with FDR1 cutoff) were downloaded from UCSC Goldenpath, which is based on the dm3 assembly, for consistency with the modENCODE data sets. The download address is "http://hgdownload.cse.ucsc.edu/goldenPath/dm3/database/" and the files used have the prefix "bdtnp".
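The merging of peak calls from several data sets for one factor can be sketched as an interval union. This is an illustrative sketch, not the procedure used to build Data Set S7: it assumes all intervals lie on one chromosome, uses closed 1-based coordinates as in GFF, and treats only overlapping (not merely adjacent) intervals as mergeable. The function name and sample coordinates are hypothetical.

```python
def merge_peaks(*datasets):
    """Return the union of peak intervals from several data sets.

    Each data set is a list of (start, end) tuples on the same chromosome,
    with closed coordinates (ASSUMPTION). Overlapping intervals are fused.
    """
    intervals = sorted(iv for ds in datasets for iv in ds)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical peak calls for one factor from two experiments.
replicate_a = [(100, 200), (500, 600)]
replicate_b = [(150, 250), (700, 800)]
union = merge_peaks(replicate_a, replicate_b)
```

Sorting first means each interval only has to be compared with the most recently merged one.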
Best,
Lijia
TF	Data Set S7 file	Accession/File
bab1	E0-12h_bab1.gff	modENCODE_628
chinmo	E0-12h_chinmo.gff	modENCODE_608
cnc	E0-12h_cnc.gff	modENCODE_627
Dll	E0-12h_dll.gff	modENCODE_606
en	E0-12h_en1.gff	modENCODE_625
eve	E1-6h_eve.gff	modENCODE_2603
ftz-f1	E0-12h_ftz-f1.gff	modENCODE_624
GAF (Trl)	E0-12h_GAF.gff	modENCODE_23
inv	Embryo_inv.gff	modENCODE_605, modENCODE_619
kn	E0-12h_kn.gff	modENCODE_618
Kr	E0-8h_Kr.gff	modENCODE_898
run	E0-12h_run.gff	modENCODE_617
Stat92E	E0-12h_Stat92E.gff	modENCODE_616
ttk	E0-12h_ttk.gff	modENCODE_615
Ubx	Embryo_Ubx.gff	modENCODE_603, modENCODE_612, modENCODE_613, modENCODE_614
zfh1	E0-12h_zfh1.gff	modENCODE_604
bks (sbb)	Embryo_bks.gff	modENCODE_609, modENCODE_2569*
cad	Embryo_cad.gff	modENCODE_902, modENCODE_2626, modENCODE_2637, modENCODE_2570*, modENCODE_2578*
D	Embryo_D.gff	modENCODE_626, modENCODE_2571*
disco	E0-8h_disco.gff	modENCODE_2572*
GATAe	E0-8h_GATAe.gff	modENCODE_2573*
h	E0-8h_h.gff	modENCODE_2574*
hkb	E0-8h_hkb.gff	modENCODE_2575*
jumu	E0-8h_jumu.gff	modENCODE_2576*
sens	Embryo_sens.gff	modENCODE_978, modENCODE_979, modENCODE_2577*
bcd	bdtnp_bcd.gff	bdtnpBcd1Fdr1, bdtnpBcd2Fdr1
da	bdtnp_da.gff	bdtnpDa2Fdr1
dl	bdtnp_dl.gff	bdtnpDl3Fdr1
ftz	bdtnp_ftz.gff	bdtnpFtz3Fdr1
gt	bdtnp_gt.gff	bdtnpGt2Fdr1
hb	bdtnp_Hb.gff	bdtnpHb1Fdr1, bdtnpHb2Fdr1
kni	bdtnp_Kni.gff	bdtnpKni1Fdr1, bdtnpKni2Fdr1
Mad	bdtnp_mad.gff	bdtnpMad2Fdr1
Med	bdtnp_med.gff	bdtnpMed2Fdr1
prd	bdtnp_Prd.gff	bdtnpPrd1Fdr1, bdtnpPrd2Fdr1
shn	bdtnp_Shn.gff	bdtnpShn2Fdr1, bdtnpShn3Fdr1
slp1	bdtnp_Slp1.gff	bdtnpSlp11Fdr1
sna	bdtnp_Sna.gff	bdtnpSna1Fdr1, bdtnpSna2Fdr1
tll	bdtnp_Tll.gff	bdtnpTll1Fdr1
twi	bdtnp_Twi.gff	bdtnpTwi1Fdr1, bdtnpTwi2Fdr1
z	bdtnp_Z.gff	bdtnpZ2Fdr1
*	Older analyses of these ChIP experiments were used that differ from the modENCODE versions.
DOI
Related Publication(s)
Research paper

A cis-regulatory map of the Drosophila genome.
Nègre et al., 2011, Nature 471(7339): 527--531 [FBrf0213303]

Associated Information
Comments
Associated Files
File date: 2011.4.19 ; File size: 4581556 ; File format: txt ; File name: Ma.2011.4.19-modENCODE-HOTSpotAnalysis.txt
Other Information
Secondary IDs
    Language of Publication
    English