Subject: nc5 report, second installment

Hi Aubrey,

Here is the second installment of nc5 curations. I had substantial help from Guochun, who has placed most of our P insertions on the genomic sequence. Guochun provided me with data that I include in my report. Here is Guochun's key to the various fields:

p_name: P insertion name.
name: CG name.
gene: if the CG is a known gene, this is the gene name.
ct_name: the CT of the above CG, used to decide the relation to the P insertion.
relation: the relation of the P insertion to the CG. There are 3 values for this field: inside, front, and behind. 'inside' is obvious, and if a P is inside a CG, then the inside_intron or inside_exon field indicates which intron/exon it is inserted in. I'll use the following diagram to explain the 'front' and 'behind' relations. ===> is a CG with orientation. ------- is genomic sequence. Normally there are 4 CGs around a given P insertion site.

                     P insertion site
                            |
       front                |         behind
    ==========>             V        =======>
5' ----------------------------------------------> 3'
3' <---------------------------------------------- 5'
     <======                         <========
      behind                           front

So each P insertion will have TWO CGs with relation 'front' or 'behind'. You can also go to my P insertion Genescene launch page 'http://weasel.lbl.gov:94/cgi-bin/pins/test.pl' to see some examples. It will help to explain the relation.
r_orientation: the relative orientation of the P insertion alignment and the CG, e.g. if the CG is on the plus strand and the P aligned on the minus strand, then r_orientation is '-'.
inside_intron: indicates which intron the P insertion is inserted in. 0 means not inside.
inside_exon: indicates which exon the P insertion is inserted in. 0 means not inside.
dist5: the distance from the 5' end of the CT to the P insertion site.
dist3: the distance from the 3' end of the CT to the P insertion site.

My personal comments after looking at the BFD report and GeneSeen are in brackets <up></up>.
I also looked at each line in GeneSeen, and looked at the BFD insertion line report to check the genetics. I found many cases where insertions mapped to the same place, or very close together, but surprisingly (to me, anyway) these insertions often complemented each other. I'm not sure what to do in cases where insertions map to the same transcription unit but complement; I'm sure you all are experts at this sort of situation and will annotate appropriately.

Let me know if anything is unclear.

-sima

------------------------------------------------------------------------------

>l(2)01528,l(3)rJ880,l(3)j12B4 sequenced insertions in repeat in genome

>l(2)00231=?
p_name l(2)00231 name CG4272 gene BcDNA:GH09817 ct_name CT9361 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 351 dist3 3998
<up>l(2)00231 complements l(2)k08232, even though they are only 115bp away from each other; l(2)k08232 is 236bp upstream of BcDNA:GH09817, which is nearest gene to l(2)00231</up>

>l(2)00232 sequence=CG7664
p_name l(2)00232 name CG7664 gene crp ct_name CT23425 relation inside r_orientation - inside_intron 2 inside_exon 0 dist5 18013 dist3 5643

>l(2)00248=l(2)01275=l(2)k17014=l(2)k07303=Ef1alpha48D
p_name l(2)00248 name CG8280 gene Ef1alpha48D ct_name CT24517 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 42 dist3 2850
<up>most likely Ef1alpha48D, since insertion is only 42bp upstream of transcription start at 6892926; hotspot for insertion since l(2)01275 22bp upstream of l(2)00248, l(2)k17014 17bp upstream, and l(2)k07303 14bp upstream; l(2)01275 and l(2)k17014 are known to be alleles of Ef1alpha48D and both are further away (more 5'); none ever tested for complementation</up>

>l(2)00629=?
p_name l(2)00629 name CG13438 gene CG13438 ct_name CT32796 relation front r_orientation - inside_intron 0 inside_exon 0 dist5 5530 dist3 4790
<up>l(2)00629 insertion not near any annotations, but hotspot for insertion, since l(2)00629, l(2)k07001, and l(2)k06409 within 7bp of each other; amazing l(2)00629 was complemented by l(2)k06409 & l(2)k07001</up>

>l(2)01038=mm
p_name l(2)01038 name CG10941 gene mm ct_name CT30649 relation inside r_orientation + inside_intron 2 inside_exon 0 dist5 38086 dist3 57869
<up>gene has 4 introns, this insertion in middle of largest (huge) intron</up>

>l(2)01085=CG15426
p_name l(2)01085 name CG15426 gene ct_name CT35488 relation inside r_orientation + inside_intron 0 inside_exon 6 dist5 20251 dist3 3841

>l(2)01094=?
p_name l(2)01094 name CG9403 gene CG9403 ct_name CT9101 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 3149 dist3 8614
p_name l(2)01094 name CG15234 gene CG15234 ct_name CT35171 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 3130 dist3 3358
<up>possible that l(2)01094=l(2)k03204 since they are inserted 288bp away from each other, but never tested for complementation (unless typo and l(2)k03404 that non-complements l(2)01094 should be l(2)k03204)</up>

>l(2)01296=CG3186
p_name l(2)01296 name CG3186 gene CG3186 ct_name CT10685 relation inside r_orientation - inside_intron 1 inside_exon 0 dist5 493 dist3 1150

>l(2)01351=?
p_name l(2)01351 name CG13109 gene ct_name CT32343 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 22258 dist3 23996
<up>no annotations nearby; hotspot for 10 insertions 10.7kb to right</up>

>l(2)01424=CG3845
p_name l(2)01424 name CG3845 gene CG3845 ct_name CT12829 relation inside r_orientation - inside_intron 0 inside_exon 1 dist5 4 dist3 7047

>l(2)01466=ATPCL=CG8322
p_name l(2)01466 name CG8322 gene ATPCL ct_name CT18257 relation inside r_orientation - inside_intron 1 inside_exon 0 dist5 937 dist3 6500

>l(2)01810=CG5304
p_name l(2)01810 name CG5304 gene CG5304 ct_name CT16877 relation inside r_orientation - inside_intron 1 inside_exon 0 dist5 334 dist3 7870

>l(2)01848=?
p_name l(2)01848 name CG2672 gene Tkr ct_name CT9053 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 514 dist3 13063
<up>l(2)01848 is 514bp upstream of Tkr, but l(2)03263 is inserted 17bp closer and remarkably they complement</up>

>l(2)01857=l(2)k00107=l(2)00681=CG2140
p_name l(2)01857 name CG2140 gene CG2140 ct_name CT6982 relation inside r_orientation - inside_intron 1 inside_exon 0 dist5 645 dist3 1587
<up>inserted in first intron just 10bp downstream of l(2)00681; l(2)k00107 is located just upstream of start of transcription of CG2140; none tested for complementation; l(2)00681 must be multiple insert line since one insert maps to 51B5 and doesn't complement ttv, but sequence maps to CG2140 at 43D3</up>

>l(2)02045=CG11546
p_name l(2)02045 name CG11546 gene CG11546 ct_name CT36453 relation inside r_orientation - inside_intron 1 inside_exon 0 dist5 4301 dist3 5002
<up>in intron of transcript CT36453; upstream of 2 other transcripts, CT36451 and CT9385</up>

>l(2)02074=CG1512
p_name l(2)02074 name CG1512 gene CG1512 ct_name CT3821 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 24 dist3 4050
<up>only 24bp upstream of transcription start</up>

>l(2)02836, l(3)03928, l(3)04069 sequences all map to repeat in genome
p_name l(2)02836 name CG6983 gene CG6983 ct_name CT21627 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 1055 dist3 6919
<up>must be multiple insert line since genetically insertion is at 53B1, but sequence maps to 66D1, and other 2 insertions are supposed to be on third, but instead sequences map to identical nucleotide</up>

>l(2)03050=CG9350?
p_name l(2)03050 name CG9350 gene CG9350 ct_name CT26565 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 203 dist3 961
<up>CG9350 is 203bp downstream but only evidence for gene was homology to EST, no gene prediction</up>

>l(2)03105=l(2)k16702=?
p_name l(2)03105 name CG12464 gene CG12464 ct_name CT32655 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 13822 dist3 14253
p_name l(2)03105 name CG18369 gene CG18369 ct_name CT41749 relation front r_orientation + inside_intron 0 inside_exon 0 dist5 26306 dist3 24529
<up>nothing nearby; l(2)k16702 inserted at identical nucleotide; these were never tested for complementation</up>

>l(2)03497=wun?
p_name l(2)03497 name CG8804 gene wun ct_name CT4876 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 164 dist3 9085
<up>seems to be a hotspot, with 4 insertions within 219bp of each other, just upstream of wun; complementation never tested with l(2)k09507, which is inserted in first exon of wun</up>

>l(2)03563=?
p_name l(2)03563 name CG17390 gene CG17390 ct_name CT33481 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 572 dist3 7041
<up>CG17390 is only gene close by, but 572bp downstream</up>

>l(2)03605=l(2)03832=?
p_name l(2)03605 name CG17952 gene CG17952 ct_name CT39996 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 990 dist3 3746
<up>CG17952 990bp downstream; l(2)03832 inserted 1bp upstream of l(2)03605, but these were not tested for complementation</up>

>l(2)03709=CG15081
p_name l(2)03709 name CG15081 gene CG15081 ct_name CT42565 relation inside r_orientation + inside_intron 0 inside_exon 1 dist5 20 dist3 2334

>l(2)03771=CG18323=CG14028
p_name l(2)03771 name CG18323=CG14028 gene ct_name CT41595=CT33587 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 141 dist3 619
<up>nothing else around but CG18323=CG14028 141bp downstream</up>

>l(2)03832=l(2)03605=?
p_name l(2)03832 name CG17952 gene CG17952 ct_name CT39996 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 991 dist3 3747
<up>see record for l(2)03605; start of transcription of CG17952 is 991bp downstream</up>

>l(2)03996=CG8258
p_name l(2)03996 name CG8258 gene CG8258 ct_name CT8297 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 7 dist3 2159
<up>insertion is just 7bp before the start of transcription of CG8258</up>

>l(2)04008=?
p_name l(2)04008 name CG6320 gene ct_name CT19694 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 1499 dist3 8603
p_name l(2)04008 name CG16874 gene Vm32E ct_name CT19798 relation front r_orientation + inside_intron 0 inside_exon 0 dist5 2850 dist3 2499
<up>not close to anything, 1.5kb from start of transcription of CG6320</up>

>l(2)04111=?
p_name l(2)04111 name CG10871 gene ct_name CT30433 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 20188 dist3 28966
<up>nearest annotation, CG10871, is 20kb away</up>

>l(2)04154=CG5935
p_name l(2)04154 name CG5935 gene EG:EG0003.6 ct_name CT18411 relation inside r_orientation - inside_intron 1 inside_exon 0 dist5 252 dist3 3886
<up>but complemented by l(2)k0997, which is inserted in second intron, 769bp downstream</up>

>l(2)04329=Nacalpha
p_name l(2)04329 name CG8759 gene Nacalpha ct_name CT25274 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 7 dist3 1158

>l(2)04493=smt3
p_name l(2)04493 name CG4494 gene smt3 ct_name CT14617 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 33 dist3 832
<up>also shown to non-complement l(2)04841, which is inserted in smt3</up>

>l(2)04530=CG9342
p_name l(2)04530 name CG9342 gene CG9342 ct_name CT3751 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 36 dist3 5986
<up>maps only 36bp upstream of gene</up>

>l(2)04535=l(2)k16713=tkv
p_name l(2)04535 name CG14026 gene tkv ct_name CT33585 relation inside r_orientation + inside_intron 2 inside_exon 0 dist5 5237 dist3 17291
<up>l(2)04535 must be multiple insert line since in situ said it mapped to 42C1-42C2; sequence of l(2)04535 and l(2)k16713 maps to large second intron of tkv, 211bp apart; l(2)04535 shown to be an allele of tkv genetically, l(2)k16713 never tested for complementation but is also a known tkv allele</up>

>l(2)04723 sequence=dock
p_name l(2)04723 name CG3727 gene dock ct_name CT42218 relation inside r_orientation + inside_intron 1 inside_exon 0 dist5 798 dist3 6434
p_name l(2)04723 name CG3727 gene dock ct_name CT12313 relation inside r_orientation + inside_intron 1 inside_exon 0 dist5 827 dist3 6434

>l(2)04845=l(2)k00208=AGO1?
p_name l(2)04845 name CG6671 gene AGO1 ct_name CT20708 relation inside r_orientation - inside_intron 2 inside_exon 0 dist5 1210 dist3 8908
<up>l(2)04845 maps to second intron of CT20708, l(2)k08121 maps to third intron of CT20708, second intron of CT42234, and l(2)k00208 maps to first intron of CT20708; l(2)k08121 surprisingly complemented l(2)04845 genetically but l(2)k00208 not tested</up>

>l(2)05070=CG8392
p_name l(2)05070 name CG8392 gene CG8392 ct_name CT18263 relation inside r_orientation - inside_intron 0 inside_exon 1 dist5 9 dist3 957
<up>l(2)05070 in first exon</up>

>l(2)05095=?
p_name l(2)05095 name CG10248 gene Cyp6a8 ct_name CT28799 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 5329 dist3 7009
p_name l(2)05095 name CG17453 gene Cyp317a1 ct_name CT31993 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 5185 dist3 6742
<up>must be multiple insert line since in situ at 39E1-39E2 and sequence maps to 51D3; sequenced insertion not really near any gene</up>

>l(2)05248=?
p_name l(2)05248 name CG8297 gene CG8297 ct_name CT20148 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 12775 dist3 13826
p_name l(2)05248 name CG8291 gene CG8291 ct_name CT21678 relation front r_orientation - inside_intron 0 inside_exon 0 dist5 9789 dist3 1070
<up>original in situ for both l(2)k02205 and l(2)05248 at 52D1-52D2 but inferred genomic sequence map is 52D9; insertion sequence maps 50bp from l(2)k02205, which surprisingly complements l(2)05248; not really near any gene</up>

>l(2)05287=CG12050
p_name l(2)05287 name CG12050 gene CG12050 ct_name CT3661 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 10 dist3 7350
<up>insertion maps 10bp upstream of gene; in situ was at 39A1-39A2 but inferred genomic sequence map is 39B1</up>

>l(2)05428=l(2)k06503=Cdk4/6
p_name l(2)05428 name CG5072 gene Cdk4/6 ct_name CT16072 relation inside r_orientation - inside_intron 3 inside_exon 0 dist5 4371 dist3 1562
p_name l(2)05428 name CG5072 gene Cdk4/6 ct_name CT15896 relation inside r_orientation - inside_intron 3 inside_exon 0 dist5 3765 dist3 1562
<up>l(2)05428 and l(2)k06503 map to third intron of gene within 71bp of each other; never tested for complementation</up>

>l(2)k00107=l(2)01857=l(2)00681=CG2140
p_name l(2)k00107 name CG2140 gene CG2140 ct_name CT6982 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 36 dist3 2268
<up>insertion is located 36bp upstream of start of transcription of CG2140; other insertions map to intron, within 10bp of each other; none tested for complementation; l(2)00681 must be multiple insert line since one insert maps to 51B5 and doesn't complement ttv, but sequence maps to CG2140 at 43D3</up>

>l(2)k00208=l(2)04845=AGO1?
p_name l(2)k00208 name CG6671 gene AGO1 ct_name CT20708 relation inside r_orientation + inside_intron 1 inside_exon 0 dist5 560 dist3 9558
p_name l(2)k00208 name CG6671 gene AGO1 ct_name CT42234 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 1849 dist3 9558
<up>l(2)04845 maps to second intron of CT20708, l(2)k08121 maps to third intron of CT20708, second intron of CT42234, and l(2)k00208 maps to first intron of CT20708; l(2)k08121 surprisingly complemented l(2)04845 genetically but l(2)k00208 not tested</up>

>l(2)k02205=?
p_name l(2)k02205 name CG8297 gene CG8297 ct_name CT20148 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 12825 dist3 13876
p_name l(2)k02205 name CG8291 gene CG8291 ct_name CT21678 relation front r_orientation + inside_intron 0 inside_exon 0 dist5 9839 dist3 1120
<up>original in situ for both l(2)k02205 and l(2)05248 at 52D1-52D2 but inferred genomic sequence map is 52D9; insertion sequence maps 50bp from l(2)k02205, which surprisingly complements l(2)05248; not really near any gene</up>

>l(2)k03204=?
p_name l(2)k03204 name CG9403 gene CG9403 ct_name CT9101 relation behind r_orientation + inside_intron 0 inside_exon 0 dist5 3437 dist3 8902
p_name l(2)k03204 name CG15234 gene CG15234 ct_name CT35171 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 2842 dist3 3070
<up>possible that l(2)01094=l(2)k03204 since they are inserted 288bp away from each other, but never tested for complementation</up>

>l(2)k06409=?
p_name l(2)k06409 name CG13438 gene CG13438 ct_name CT32796 relation front r_orientation - inside_intron 0 inside_exon 0 dist5 5523 dist3 4783
<up>l(2)k06409 insertion not near any annotations, but hotspot for insertion, since l(2)00629, l(2)k07001, and this one within 7bp of each other; amazing since l(2)k06409 was complemented by l(2)00629 & l(2)k07001</up>

>l(2)k07001=?
p_name l(2)k07001 name CG13438 gene CG13438 ct_name CT32796 relation front r_orientation + inside_intron 0 inside_exon 0 dist5 5525 dist3 4785
<up>l(2)k07001 insertion not near any annotations, but hotspot for insertion, since l(2)00629, l(2)k07001, and l(2)k06409 within 7bp of each other; amazing since all 3 complement</up>

>l(2)k08121=AGO1?
p_name l(2)k08121 name CG6671 gene AGO1 ct_name CT20708 relation inside r_orientation - inside_intron 3 inside_exon 0 dist5 4098 dist3 6020
p_name l(2)k08121 name CG6671 gene AGO1 ct_name CT42234 relation inside r_orientation - inside_intron 2 inside_exon 0 dist5 1689 dist3 6020
p_name l(2)k08121 name CG6671 gene AGO1 ct_name CT42236 relation inside r_orientation - inside_intron 0 inside_exon 1 dist5 5 dist3 6020
<up>l(2)04845 maps to second intron of CT20708, l(2)k08121 maps to third intron of CT20708, second intron of CT42234, and l(2)k00208 maps to first intron of CT20708; l(2)k08121 surprisingly complemented l(2)04845 genetically but l(2)k00208 not tested; hard to determine why these are all in AGO1 but 2 complement</up>

>l(2)k16702=l(2)03105=?
p_name l(2)k16702 name CG12464 gene CG12464 ct_name CT32655 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 13822 dist3 14253
p_name l(2)k16702 name CG18369 gene CG18369 ct_name CT41749 relation front r_orientation + inside_intron 0 inside_exon 0 dist5 26306 dist3 24529
<up>nothing nearby but l(2)03105 at same nucleotide; not tested for complementation</up>

>l(3)03928, l(2)02836, l(3)04069 sequences all map to repeat in genome
p_name l(3)03928 name CG6983 gene CG6983 ct_name CT21627 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 1055 dist3 6919
<up>must be multiple insert line, since insertion maps genetically to third, but sequence maps to 66D1, and other 2 insertions' sequence likewise map in wrong position at identical nucleotide</up>

>l(3)04069, l(3)03928, l(2)02836 sequences all map to repeat in genome
p_name l(3)04069 name CG6983 gene CG6983 ct_name CT21627 relation behind r_orientation - inside_intron 0 inside_exon 0 dist5 1055 dist3 6919
<up>must be multiple insert line, since insertion maps genetically to third, but sequence maps to 66D1, and other 2 insertions' sequence likewise map in wrong position at identical nucleotide</up>

>l(3)j12B4,l(2)01528,l(3)rJ880 inserted in repeat in genome

>l(3)rJ880,l(2)01528,l(3)j12B4 inserted in repeat in genome

------------------------------------------------------------------------------
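
Editorial note: for anyone loading these records into a script, the field key at the top of the report maps naturally onto a small parser. What follows is only a minimal sketch, not Guochun's actual tooling. The regular expression and the names ROW_RE, parse_rows and describe are hypothetical, and it assumes every row lists its fields in the order given in the key, with the gene value sometimes absent.

```python
import re

# Field names come from the key in the report. The pattern and the helper
# names below are illustrative assumptions, not part of the original pipeline.
ROW_RE = re.compile(
    r"p_name\s+(?P<p_name>\S+)\s+"
    r"name\s+(?P<name>\S+)\s+"
    r"gene\s+(?:(?P<gene>\S+)\s+)?"          # gene may be empty
    r"ct_name\s+(?P<ct_name>\S+)\s+"
    r"relation\s+(?P<relation>inside|front|behind)\s+"
    r"r_orientation\s+\\?(?P<r_orientation>[+-])\s+"  # tolerate a stray backslash
    r"inside_intron\s+(?P<inside_intron>\d+)\s+"
    r"inside_exon\s+(?P<inside_exon>\d+)\s+"
    r"dist5\s+(?P<dist5>\d+)\s+"
    r"dist3\s+(?P<dist3>\d+)"
)

INT_FIELDS = ("inside_intron", "inside_exon", "dist5", "dist3")


def parse_rows(text):
    """Return one dict per 'p_name ... dist3 N' row found anywhere in text."""
    rows = []
    for match in ROW_RE.finditer(text):
        row = match.groupdict()
        row["gene"] = row["gene"] or ""  # empty string when no gene name is given
        for field in INT_FIELDS:
            row[field] = int(row[field])
        rows.append(row)
    return rows


def describe(row):
    """Summarise where the insertion sits relative to the CT."""
    if row["relation"] == "inside":
        if row["inside_intron"]:
            return f"intron {row['inside_intron']} of {row['ct_name']}"
        return f"exon {row['inside_exon']} of {row['ct_name']}"
    return f"{row['relation']} {row['ct_name']}, {row['dist5']}bp from its 5' end"


# Two rows copied from the report above.
sample = (
    "p_name l(2)01085 name CG15426 gene ct_name CT35488 relation inside "
    "r_orientation + inside_intron 0 inside_exon 6 dist5 20251 dist3 3841\n"
    "p_name l(2)00248 name CG8280 gene Ef1alpha48D ct_name CT24517 relation behind "
    "r_orientation - inside_intron 0 inside_exon 0 dist5 42 dist3 2850\n"
)

for row in parse_rows(sample):
    print(row["p_name"], "->", describe(row))
```

Because the pattern is matched with finditer over the whole text, it ignores the ">" header lines and the <up> comments, so rows can be pulled straight out of the report body.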