Here is my Category 10 report; the methods are the same as those used for the previous categories.

Category 10: genes that were not properly linked even in the penultimate XML, but for which a candidate is pretty clear.

CanB FBgn0010014 170 aa = CT13882 CG4209 170 aa
MAPk-Ak2 FBgn0013987 359 aa = CT10332 CG3086 359 aa
nej FBgn0015624 3190 aa U88570 = CT35306 CG15319 3275 aa
(C-terminus diverges at aa 3146 of nej, needs to be checked/fixed)
Pp2A-29B FBgn0005776 591 aa = CG17291|FBan0017291|CT32725| 591 aa
Adf1 FBgn0000054 253 aa = CG15845|FBan0015845|CT37731| 253 aa
br FBgn0000210 877 aa = CG11491|FBan0011491|CT36317 880 aa
(there are mismatches in the middle section, to be fixed.)
(br has several isoforms)
comt FBgn0000346 745 aa = CG1618|FBan0001618|CT41661| 737 aa
(7 aa missing from CT41661 between aa 34-35 wrt CDS from U09373)
ct FBgn0004198 2175 aa = CG11387|FBan0011387|CT31792| 53 aa
(by nucleotide alignment to AE003441; translation is severely truncated)
(Aubrey, you mentioned CG12690; CG12690|FBan0012690|CT35562| is the neighboring gene, by alignment CHES-1-like FBgn0029504 120 aa is likely to be CG12690|FBan0012690|CT35562| 1161 aa; CHES-1-like is a partial CDS)
e(r) FBgn0011586 104 aa = CG1871|FBan0001871|CT29800|CT5770| 104 aa
(Two mRNAs that have alternately spliced 5' UTRs but encode identical proteins)
lola FBgn0005630 MERGE CG12052 and CG18376
lola FBgn0005630 467 aa = CG12052|FBan0012052|CT40982| 465 aa
lola FBgn0005630 894 aa = CG18376|FBan0018376|CT40707| 442 aa
(CT40982 of CG12052 corresponds to the short form encoded by accession U07606, longer by 2 aa but rest perfect match.)
(The other 4 transcripts of CG12052 differ only in the 5' UTR and all encode a 787 aa protein, the C-terminus of which is suspect.)
(CT40707 of CG18376 corresponds to the C-terminus of the 894 aa long isoform of lola (accession U07607), to be fixed.)
sol FBgn0003464 1597 aa = CG1391|FBan0001391|CT30735| 671 aa AND CG1391|FBan0001391|CT40937| 718 aa
The N-termini of both translations match sol, but both are truncated; structures need to be fixed (these two transcripts may need to be merged: have to see if there is EST evidence before deciding.) Neither matches the short (395 aa) isoform of sol.
Sxl FBgn0003659 366 aa = CG18350|FBan0018350|CT40521 334 aa
(translation start site is further upstream than the one in CT40521)
jim FBgn0027339 820 aa = CG11352|FBan0011352|CT31349| 820 aa
(check alternate transcript CT31353 when reannotating)
(Aubrey, you said 'see CT23590.jam': where can I find CT23590.jam?)
SNF1A FBgn0023169 582 aa = CG3051|FBan0003051|CT10258| 582 aa
myo-inositol-1-phosphate-synthase FBgn0025885 565 aa = CG11143|FBan0011143|CT31151 565 aa
(Aubrey, what is 9bg in your original list?)
βTry FBgn0010357 253 aa = CG18211|FBan0018211|CT41196| 247 aa
εTry FBgn0010425 256 aa = CG18681|FBan0018681|CT42649| 256 aa
BcDNA:GH06348 FBgn0027580 1181 aa = CG1516|FBan0001516|CT3885|CT40220|CT40222 1181 aa
(CT3885|CT40220|CT40222 differ only in their 5' UTRs and encode the same CDS.)
BcDNA:GH10333 FBgn0027553 594 aa = CG12152|FBan0012152|CT8453 594 aa
_______________________________________________________________________
(BcDNA:LD08743 is a synonym of Eb1)
(Aubrey and MA: Eb1 and BcDNA:LD08743 have to be split)
There are two adjacent genes annotated as Eb1 (see accession AE003789): CG3265 and CG3267.
BcDNA:LD08743 (Acc. AF132560) = CG3265|FBan0003265|CT10989 291 aa |CT37737 291 aa
'Eb1' (Acc. AF006654) 93 aa = CG3267|FBan0003267|CT10957| 578 aa
Acc. AF006654 has only a partial CDS that aligns with the last 93 aa of CT10957 (CG3267); it is not possible for me to tell if the overall structure of CT10957 (CG3267) is correct.
BcDNA:LD08743 must have been automatically assigned to Eb1 because it hits the third accession under Eb1, AF006645, which is for 'l(2)4524 PZ element flanking sequence.'
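The CG|FBan pairs in this report all seem to follow one pattern: the FBan digits are the CG number zero-padded to seven digits (e.g. CG3265|FBan0003265). Assuming that convention really holds throughout, a quick check like the sketch below could flag mistyped identifiers before the correspondences go in. The function name and the sample text are made up for illustration; the second pair in the sample is a deliberately transposed, hypothetical example, not an entry from the report.

```python
import re

# Assumption: in "CGnnnn|FBannnnnnnn" pairs, the FBan digits equal the CG
# number zero-padded to seven digits, as in the pairs listed in this report.
PAIR = re.compile(r"\b(C?G)(\d+)\|\s*FBan(\d{7})\b")

def check_pairs(text):
    """Return the CG|FBan pairs whose prefix or digits do not agree."""
    problems = []
    for prefix, cg_num, fban_num in PAIR.findall(text):
        if prefix != "CG" or cg_num.zfill(7) != fban_num:
            problems.append(f"{prefix}{cg_num}|FBan{fban_num}")
    return problems

# First pair is taken from the report; the second is a hypothetical typo.
sample = "CG3267|FBan0003267|CT10957| and CG3256|FBan0003265|CT10989"
print(check_pairs(sample))
```

This only compares the two identifiers with each other; it cannot tell which of the two is the mistyped one.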
_______________________________________________________________________
BcDNA:LD14270 is a synonym of Cas
Cas FBgn0027064 975 aa = CG13281|FBan0013281|CT32567| 975 aa
BcDNA:LD24527 is not a valid FlyBase gene, since sequence AF145685 was deleted from GenBank (although it was retained in the CDS set). Might be best to keep CG9638 as the valid name.
(The gene formerly known as BcDNA:LD24527 FBgn0027502) 418 aa = CG9638|FBan0009638|CT27244| 418 aa
BcDNA:LD32148 FBgn0027494 163 aa = CG12275|FBan0012275|CT17676| 117 aa
(CT17676 is 46 aa longer at the N terminus than BcDNA:LD32148)
BG:DS07108.1 FBgn0028864 464 aa = CG18477|FBan0018477|CT42122| 464 aa
bgcn FBgn0004581 245 aa = CG17611|FBan0017611|CT38864| 245 aa
EG:103B4.2 FBgn0023550 475 aa = CG18031|FBan0018031|CT40356| 491 aa
(one internal region of mismatch; CG18031 is also 16 aa longer at C-terminus.)
EG:115C2.6 FBgn0025635 467 aa = CG17829|FBan0017829|CT39573| 467 aa
EG:132E8.3 FBgn0024986 160 aa = CG3719|FBan0003719|CT12473| 160 aa
EG:80H7.1 FBgn0025385 281 aa = no match
(The version of EG:80H7.1 in the CDS set is out of date.)
(EG:80H7.1 CDS aligns to coordinates (8957-8481)(8413-8050) of AE003421. There is no gene annotated there. CDS of CG14778/CT34588 is at coordinates (9138..9273,9335..9417,9477..9659,9722..9880) of AE003421. The EDGP cosmid AL031027 does not have a gene annotated at the corresponding coordinates.)
EG:BACN32G11.1 FBgn0027796 410 aa = CG18531|FBan0018531|CT42308| 410 aa
Elongin-B FBgn0023212 118 aa = CG4204|FBan0004204|CT13840 6 aa!!!
(By alignment of AB007692 to AE003731, Elongin-B = CG4204, but transcript and translation have to be fixed. Severe truncation of CT13840!)
kin17 FBgn0024887 244 aa = CG5649|FBan0005649|CT17834| 390 aa
(kin17 CDS is partial)
Mst57Da FBgn0011668 55 aa = CG9074|FBan0009074|CT26012| 75 aa
(verified by alignment of Z33647 to AE003753; likely that expanded repeat region gave rise to size difference)
WD-40-family-member FBgn0026012 730 aa = CG7392|FBan0007392|CT22739| 734 aa
(4 aa insert in CG7392; rest matches)
Calo MERGE with poe (Alignments sent to AG and MA)
Calo/poe 5322 aa = CG14472|FBan0014472|CT34171| 5322 aa
Cdic FBgn0013761 653 aa = CG18000|FBan0018000|CT40242| 246 aa
(note: part of Cdic is homologous to Sdic)
dlt FBgn0024246 871 aa = CG12021|FBan0012021|CT1747| 871 aa
(note: JTBR FBgn0025820 152 aa = CG1935|FBan0001935|CT5993| 152 aa; dlt and JTBR mRNAs overlap at their 3' ends, maybe this caused confusion)
Hrb87F FBgn0004237 386 aa = CG12749|FBan0012749|CT27250| 385 aa
(note: this is also the only CG and only scaffold that Hrb85CD aligns with so far, but Hrb87F is correct by gene order; is it possible that Hrb85CD and Hrb87F are the same thing? Their DNAs align perfectly. Will investigate further.)
l(2)37Cc FBgn0002031 203 aa = CG10691|FBan0010691|CT29956| 276 aa
(CT29956 starts further upstream than the known l(2)37Cc CDS, has extra aa in one matching stretch (intron-exon boundary needs fixing), and the two proteins are mismatched at their C termini.)
l(3)70Da FBgn0013563 1006 aa = CG6760|FBan0006760|CT20911| 1006 aa
l(3)82Fd FBgn0013576 1270 aa = CG10199|FBan0010199|CT9007| 1325 aa
(CT9007 has an extra stretch of aa immediately after aa 1078 of AF125384 CDS but rest matches well.)
l(3)87Df FBgn0002354 73 aa = CG7620|FBan0007620|CT23225| 110 aa
(Aubrey: Warning: CDS set sequence S77927 is WRONG: it is a psq-l(3)87Df fusion protein!)
(CG7620 has additional aa at C terminus wrt CDS of Acc. S41484.)
PebIII FBgn0011695 158 aa = CG11390|FBan0011390|CT31804| 124 aa
(better match than CG9358)
rl FBgn0003256 376 aa = CG12559|FBan0012559|CT34260 65 aa |CT39192 55 aa
(N terminus only, rest not annotated; heterochromatin region gene)
Scp1 FBgn0020908 185 aa = CG15848|FBan0015848|CT40161 172 aa
(intron-exon structure needs to be fixed, ATG is further 5')
ETH FBgn0028738 203 aa = CG18105|FBan0018105|CT40683| 203 aa
Fer2LCH FBgn0015221 227 aa = CG1469|FBan0001469|CT3604| 227 aa
sut2 FBgn0028562 491 aa = CG17975|FBan0017975|CT40087| 438 aa
(CT40087 starts at 3rd Met wrt CDS of acc. AF199484, missing aa 266-299)
sut3 FBgn0028561 476 aa = CG17976|FBan0017976|CT40089| 476 aa
anon-3Ca FBgn0014096 56 aa = CG18089|FBan0018089|CT40590| 56 aa
aralar1 FBgn0028646 682 aa = CG2139|FBan0002139|CT6974| 682 aa
arginase FBgn0023535 351 aa = CG18104|FBan0018104|CT40671| 235 aa
(diverge after aa 197, fix structure.)
(BcDNA:GH10148 is a synonym of mbf1)
mbf1 FBgn0026208 145 aa = CG4143|FBan0004143|CT13720| 145 aa
BG:DS00180.14 FBgn0028939 648 aa = CG18146|FBan0018146|CT40902| 701 aa
(CT40902 has extra aa between aa 21-22 of BG:DS00180.14)
BG:DS02252.3 FBgn0028901 1931 aa = CG18109|FBan0018109|CT40741| 1379 aa
(CT40741 starts at 2nd Met of BG:DS02252.3, and is missing >500 aa at the C terminal)
Dsk FBgn0000500 128 aa = CG18090|FBan0018090|CT40598| 128 aa
EG:152A3.2 FBgn0023541 507 aa = CG3540|FBan0003540|CT11900| 507 aa
fs(1)K10 FBgn0000810 463 aa = CG3218|FBan0003218|CT10801| 463 aa
Met75Ca no match (see note below)
From Category 1: Met75Cb (FBgn0028415) = CG18064|FBan0018064|CT40477|FBan0018064
(note: CG18064 is incorrectly identified as Met75Ca; sequence corresponding to Met75Ca exists but is unannotated. This is by alignment of Acc. AJ249253 (which has both genes) to AE003520.)
OrfKD FBgn0011820 194 aa = CG18103|FBan0018103|CT40669| 69 aa
(assignment by DNA alignment, structure has to be fixed; OrfKD is a partial CDS derived by PCR)
poe MERGE with Calo (Alignments sent to AG and MA)
Sdic FBgn0025801 517 aa = CG9580|FBan0009580|CT17580| 215 aa
(CG9580 corresponds to the C terminus of Sdic)
(N terminus of Sdic is homologous to part of AnnX, and C terminus is homologous to part of Cdic. The Sdic gene is tandemly duplicated approximately 10 times. So this region is a bit tricky.)
Snap25 MERGE CG17884|FBan0017884|CT39799 (exon 5 of Snap25) AND CG17676|FBan0017676|CT39055 (exon 6 of Snap25)
(The exons of Snap25 extend over 120 kb acc. to FBrf0098336; I BLASTed the pieces of DNA that have individual exons annotated on them and got the following:
exon 2 U81147 aligns to AE003395 (first coding exon)
exon 3 U81148 aligns to AE003204
exon 4 U81149 aligns to AE003242
exon 5 U81150 aligns to AE003379 CG17676
exon 6 U81151 aligns to AE002931 CG17884
exon 7 U81152 aligns to AE002931
exon 8 U81153 aligns to AE003013
All scaffolds above are short.)
AlstR FBgn0028961 394 aa = CG2872|FBan0002872|CT9822| 106 aa (EG:121E7.2)
(not a good BLASTP alignment, but by BLASTN this is the correct region; CG2872 is too short; AF163775 aligns to 159497(ATG)-178653(@) of AE003428 but CG2872 is annotated for <175964..>178557 of the same scaffold)
(MERGE AlstR and GR, will send alignment to MA and AG)
Cht4 (incomplete CDS) aligns to the same region of AE003452 (complement(284130-284527)) as CG3986|CT12499 (complement(join(284288..284314,284377..284505))), but there is no amino acid alignment.
Aubrey: Do you put correspondences in when the structure is so obviously off?
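For the Cht4 case, the coordinates quoted above can be compared directly. The sketch below is only a rough illustration, assuming 1-based inclusive coordinates on the same strand of AE003452; the function and variable names are made up. It counts how many bases of the CG3986 exons fall inside the Cht4 alignment span.

```python
def overlap_bp(span, exons):
    """Count bases of the exons that lie inside span (1-based, inclusive)."""
    start, end = span
    total = 0
    for ex_start, ex_end in exons:
        lo = max(start, ex_start)
        hi = min(end, ex_end)
        if lo <= hi:
            total += hi - lo + 1
    return total

# Coordinates as quoted in the report (both on the complement strand).
cht4_span = (284130, 284527)
cg3986_exons = [(284288, 284314), (284377, 284505)]

shared = overlap_bp(cht4_span, cg3986_exons)
exon_total = sum(e - s + 1 for s, e in cg3986_exons)
print(shared, exon_total)  # 156 156
```

Both annotated exons lie entirely within the Cht4 span, which fits the statement that the two align to the same region. The check says nothing about reading frame or exon boundaries, so it does not conflict with the lack of an amino acid alignment.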