Bayraktaroglu, L. (2000.4.12). Annotation identities: Category 10. 
Publication Type
Personal communication to FlyBase
Text of Personal Communication
Here is my Category 10 report; the methods are the same
as used for the previous categories.
Category 10: genes that were not properly linked even in the
penultimate XML, but for which a candidate is pretty clear.
CanB FBgn0010014 170 aa = CT13882 CG4209 170 aa  
MAPk-Ak2 FBgn0013987 359aa =CT10332 CG3086 359aa
nej FBgn0015624 3190 aa U88570=CT35306 CG15319 3275 aa
(C-terminus diverges at aa 3146 of nej, needs to be checked/fixed)
Pp2A-29B FBgn0005776 591aa=CG17291|FBan0017291|CT32725| 591aa 
Adf1 FBgn0000054  253 aa = CG15845|FBan0015845|CT37731| 253 aa
br FBgn0000210 877 aa=CG11491|FBan0011491|CT36317 880 aa
(there are mismatches in the middle section, to be fixed.)
(br has several isoforms)
comt FBgn0000346 745 aa=CG1618|FBan0001618|CT41661| 737 aa
(7 aa missing from CT41661 between aa34-35 wrt CDS from U09373)
ct FBgn0004198 2175 aa =CG11387|FBan0011387|CT31792| 53 aa 
(by nucleotide alignment to AE003441; translation is severely truncated)
(Aubrey, you mentioned CG12690;  CG12690|FBan0012690|CT35562| is the 
neighboring gene, by alignment CHES-1-like FBgn0029504 120 aa is 
likely to be CG12690|FBan0012690|CT35562| 1161 aa 
CHES-1-like is a partial CDS)
e(r) FBgn0011586 104 aa=CG1871|FBan0001871|CT29800|CT5770| 104 aa
(Two mRNAs that have alternately spliced 5' UTRs but encode identical 
lola FBgn0005630 MERGE CG12052 and CG18376
lola FBgn0005630 467aa=CG12052|FBan0012052|CT40982| 465aa 
lola FBgn0005630 894 aa=CG18376|FBan0018376|CT40707| 442 aa
(CT40982 of CG12052 corresponds to the short form encoded by accession U07606,
longer by 2 aa but the rest is a perfect match.)
(The other 4 transcripts of CG12052 differ only in the 5' UTR and all encode a
787 aa protein, the C-terminus of which is suspect.)
(CT40707 of CG18376 corresponds to the C-terminus of the 894 aa long isoform of lola
(accession U07607), to be fixed.)
sol FBgn0003464 1597 aa = CG1391|FBan0001391|CT30735| 671 aa AND 
  CG1391|FBan0001391|CT40937| 718 aa
The N-termini of both translations match sol, but both are truncated; the structures
need to be fixed (these two transcripts may need to be merged: have to see
if there is EST evidence before deciding). Neither matches the short (395 aa)
isoform of sol.
Sxl FBgn0003659 366aa= CG18350|FBan0018350|CT40521 334aa 
(translation start site is further upstream than the one in CT40521)
jim  FBgn0027339 820aa=CG11352|FBan0011352|CT31349| 820 aa
(check alternate transcript CT31353 when reannotating)
(Aubrey, you said 'see CT23590.jam': where can I find CT23590.jam?)
SNF1A FBgn0023169 582 aa =CG3051|FBan0003051|CT10258| 582 aa
myo-inositol-1-phosphate-synthase FBgn0025885 565 aa=CG11143|FBan0011143|CT31151 565 aa
(Aubrey, what is 9bg in your original list?)
βTry FBgn0010357 253 aa = CG18211|FBan0018211|CT41196| 247 aa
εTry FBgn0010425 256 aa=CG18681|FBan0018681|CT42649| 256 aa
 BcDNA:GH06348  FBgn0027580 1181 aa=CG1516|FBan0001516|CT3885|CT40220|CT40222 1181 aa
(CT3885|CT40220|CT40222 differ only in their 5' UTRs and encode the same CDS.)
 BcDNA:GH10333  FBgn0027553 594 aa=CG12152|FBan0012152|CT8453  594 aa
( BcDNA:LD08743  is a synonym of Eb1) 
(Aubrey and MA: Eb1 and  BcDNA:LD08743  have to be split)
There are two adjacent genes annotated as Eb1 (see accession AE003789):
CG3265 and CG3267.
 BcDNA:LD08743  (Acc. AF132560) =CG3265|FBan0003265|CT10989 291aa |CT37737 291aa
'Eb1' (Acc. AF006654) 93 aa = CG3267|FBan0003267|CT10957| 578 aa
  Acc. AF006654 has only a partial CDS that aligns with the last 93 aa
  of CT10957 (CG3267); it is not possible for me to tell if the overall
  structure of CT10957 (CG3267) is correct.
   BcDNA:LD08743  must have been automatically assigned to Eb1 because it 
  hits the third accession under Eb1, AF006645, which is for 'l(2)4524 PZ 
  element flanking sequence.'
 BcDNA:LD14270  is a synonym of Cas
Cas FBgn0027064 975 aa =CG13281|FBan0013281|CT32567|975 aa
 BcDNA:LD24527  is not a valid FlyBase gene, since sequence AF145685
was deleted from GenBank (although it was retained in the  CDS set.)
Might be best to keep CG9638 as the valid name.
(The gene formerly known as BcDNA:LD24527 FBgn0027502) 418 aa = CG9638|FBan0009638|CT27244| 418 aa
 BcDNA:LD32148  FBgn0027494 163 aa = CG12275|FBan0012275|CT17676| 117 aa
(CT17676 is 46 aa longer at the  N terminus than  BcDNA:LD32148 )
BG:DS07108.1 FBgn0028864 464 aa = CG18477|FBan0018477|CT42122| 464 aa
bgcn FBgn0004581 245aa=CG17611|FBan0017611|CT38864| 245aa
 EG:103B4.2  FBgn0023550 475 aa=CG18031|FBan0018031|CT40356| 491 aa
(one internal region of mismatch; CG18031 is also 16 aa longer at C-terminus.)
 EG:115C2.6  FBgn0025635 467 aa=CG17829|FBan0017829|CT39573|  467 aa
 EG:132E8.3  FBgn0024986 160 aa= CG3719|FBan0003719|CT12473|  160 aa
 EG:80H7.1  FBgn0025385 281 aa = no match
(The version of EG:80H7.1 in the CDS set is out of date.)
(EG:80H7.1 CDS aligns to coordinates (8957-8481)(8413-8050) of AE003421;
there is no gene annotated there. The CDS of CG14778/CT34588 is at coordinates
(9138..9273,9335..9417,9477..9659,9722..9880) of AE003421. The EDGP cosmid
AL031027 does not have a gene annotated at the corresponding coordinates.)
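The coordinate comparison in the note above can be checked mechanically. The following is a minimal Python sketch, assuming the coordinates quoted in the note are all on the same AE003421 numbering; the function names are illustrative only and were not part of the methods used for this report.

```python
def normalize(start, end):
    """Return (low, high) regardless of strand orientation."""
    return (min(start, end), max(start, end))


def overlaps(a, b):
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


# Coordinates on AE003421 as quoted in the note above.
# The EG:80H7.1 segments are listed high-to-low, so they are normalized first.
EG_80H7_1 = [normalize(8957, 8481), normalize(8413, 8050)]
CG14778 = [(9138, 9273), (9335, 9417), (9477, 9659), (9722, 9880)]


def any_overlap(xs, ys):
    """True if any segment in xs overlaps any segment in ys."""
    return any(overlaps(x, y) for x in xs for y in ys)


def separation(xs, ys):
    """Lowest start coordinate in ys minus highest end coordinate in xs."""
    return min(s for s, _ in ys) - max(e for _, e in xs)


print(any_overlap(EG_80H7_1, CG14778))
print(separation(EG_80H7_1, CG14778))
```

Under these assumptions none of the EG:80H7.1 segments overlaps a CG14778 CDS segment, which is consistent with EG:80H7.1 having no annotated match.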
 EG:BACN32G11.1  FBgn0027796 410aa= CG18531|FBan0018531|CT42308|  410aa
Elongin-B FBgn0023212 118 aa=CG4204|FBan0004204|CT13840  6aa!!!
(By alignment of AB007692 to AE003731, Elongin-B=CG4204 but transcript 
and translation have to be fixed. Severe truncation of CT13840!)
kin17 FBgn0024887 244 aa = CG5649|FBan0005649|CT17834|  390 aa
(kin17 CDS is partial)
Mst57Da FBgn0011668 55 aa=CG9074|FBan0009074|CT26012| 75 aa
(verified by alignment of Z33647 to AE003753; likely that expanded repeat
region gave rise to size difference)
WD-40-family-member FBgn0026012 730 aa=CG7392|FBan0007392|CT22739| 734 aa
(4 aa insert in CG7392; rest matches)
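Length comparisons like the one above can be tabulated automatically. This is a minimal Python sketch, assuming an entry follows the common "symbol FBgn N aa = CG|FBan|CT| M aa" layout; the regular expression and the name parse_record are hypothetical and only illustrate the idea. Entries written in other layouts simply return None.

```python
import re

# Hypothetical pattern for the "symbol FBgn N aa = CG|FBan|CT| M aa" layout.
RECORD = re.compile(
    r"^\s*(?P<symbol>\S+)\s+(?P<fbgn>FBgn\d{7})\s+(?P<known>\d+)\s*aa\s*=\s*"
    r"(?P<cg>CG\d+)\|(?P<fban>FBan\d{7})\|(?P<cts>CT\d+(?:\|CT\d+)*)\|?\s*"
    r"(?P<annotated>\d+)\s*aa"
)


def parse_record(line):
    """Parse one correspondence line; return None if the layout differs."""
    m = RECORD.match(line)
    if m is None:
        return None
    known = int(m.group("known"))
    annotated = int(m.group("annotated"))
    return {
        "symbol": m.group("symbol"),
        "fbgn": m.group("fbgn"),
        "cg": m.group("cg"),
        "transcripts": m.group("cts").split("|"),
        "known_aa": known,
        "annotated_aa": annotated,
        # Positive when the annotated translation is longer than the known CDS.
        "delta_aa": annotated - known,
    }


rec = parse_record(
    "WD-40-family-member FBgn0026012 730 aa=CG7392|FBan0007392|CT22739| 734 aa"
)
print(rec["cg"], rec["delta_aa"])
```

A nonzero delta_aa only flags an entry for inspection; it does not say where the two sequences differ.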
Calo MERGE with poe
(Alignments sent to AG and MA)
Calo/poe 5322 aa=CG14472|FBan0014472|CT34171| 5322 aa
Cdic FBgn0013761 653 aa =CG18000|FBan0018000|CT40242| 246 aa
(note: part of Cdic is homologous to Sdic)
dlt FBgn0024246  871 aa=CG12021|FBan0012021|CT1747| 871 aa
(note: JTBR FBgn0025820 152 aa = CG1935|FBan0001935|CT5993| 152 aa,
dlt and JTBR mRNAs overlap at their 3' ends, maybe this caused
Hrb87F FBgn0004237 386 aa = CG12749|FBan0012749|CT27250|385 aa
(note: this is also the only CG and only scaffold that Hrb85CD aligns
with so far, but Hrb87F is correct by gene order; is it possible
that Hrb85CD and Hrb87F are the same thing? Their DNAs align perfectly.
Will investigate further.)
l(2)37Cc FBgn0002031 203 aa = CG10691|FBan0010691|CT29956| 276 aa
(CT29956 starts further upstream than known l(2)37Cc CDS, has  extra aa in 
one matching stretch (intron-exon boundary needs fixing), and two proteins
are mismatched at their C termini.)
l(3)70Da FBgn0013563 1006 aa = CG6760|FBan0006760|CT20911| 1006 aa
l(3)82Fd FBgn0013576 1270 aa = CG10199|FBan0010199|CT9007|  1325 aa
(CT9007 has an extra stretch of aa immediately after aa 1078 of AF125384
CDS but rest matches well.)
l(3)87Df FBgn0002354 73 aa = CG7620|FBan0007620|CT23225| 110 aa
(Aubrey: Warning: CDS set sequence S77927 is WRONG: it is a psq-l(3)87Df 
fusion protein!)(CG7620 has additional aa at C terminus wrt CDS of Acc. 
PebIII FBgn0011695 158 aa = CG11390|FBan0011390|CT31804| 124 aa
(better match than CG9358)
rl FBgn0003256 376 aa = CG12559|FBan0012559|CT34260 65 aa| CT39192 55 aa
(N terminus only, rest not annotated-heterochromatin region gene)
Scp1 FBgn0020908 185 aa = CG15848|FBan0015848|CT40161 172 aa
(intron-exon structure needs to be fixed, ATG is further 5')
ETH FBgn0028738 203 aa = CG18105|FBan0018105|CT40683| 203 aa
Fer2LCH FBgn0015221 227 aa = CG1469|FBan0001469|CT3604| 227 aa
sut2 FBgn0028562 491 aa = CG17975|FBan0017975|CT40087| 438 aa
(CT40087 starts at 3rd Met wrt CDS of acc. AF199484, missing aa 266-299)
sut3 FBgn0028561 476 aa = CG17976|FBan0017976|CT40089| 476 aa
anon-3Ca FBgn0014096 56 aa = CG18089|FBan0018089|CT40590| 56 aa
aralar1 FBgn0028646 682 aa = CG2139|FBan0002139|CT6974| 682 aa
arginase FBgn0023535 351 aa = CG18104|FBan0018104|CT40671| 235aa
(diverge after aa 197, fix structure.)
( BcDNA:GH10148  is a synonym of  mbf1)
mbf1 FBgn0026208 145aa = CG4143|FBan0004143|CT13720| 145aa
 BG:DS00180.14  FBgn0028939  648 aa = CG18146|FBan0018146|CT40902| 701 aa
(CT40902 has extra aa between aa21-22 of  BG:DS00180.14 )
 BG:DS02252.3  FBgn0028901 1931 aa = CG18109|FBan0018109|CT40741| 1379 aa
(CT40741 starts at 2nd Met of  BG:DS02252.3 , and is missing 
>500 aa at the C terminal)
Dsk FBgn0000500 128 aa =CG18090|FBan0018090|CT40598| 128 aa
 EG:152A3.2  FBgn0023541 507 aa=CG3540|FBan0003540|CT11900|507 aa
fs(1)K10 FBgn0000810 463 aa=CG3218|FBan0003218|CT10801|463 aa
Met75Ca no match (see note below)
From Category 1:
(note: CG18064 is incorrectly identified as Met75Ca; the sequence
corresponding to Met75Ca exists but is unannotated. This is by
alignment of Acc. AJ249253 (which has both genes) to AE003520.)
OrfKD FBgn0011820 194 aa=CG18103|FBan0018103|CT40669| 69 aa
(assignment by DNA alignment, structure has to be fixed; OrfKD is a
partial CDS derived by PCR)
poe MERGE with Calo 
(Alignments sent to AG and MA)
Sdic FBgn0025801 517 aa = CG9580|FBan0009580|CT17580| 215 aa
(CG9580 corresponds to the C terminus of Sdic)
(N terminus of Sdic is homologous to part of AnnX, and C terminus
is homologous to part of Cdic. The Sdic gene is tandemly duplicated 
approximately 10 times. So this region is a bit tricky.)
Snap25 MERGE CG17884|FBan0017884|CT39799 (exon 5 of Snap25) AND 
             CG17676|FBan0017676|CT39055 (exon 6 of Snap25)
(The exons of Snap25 extend over 120 kb acc. to FBrf0098336; I BLASTed the
pieces of DNA that have individual exons annotated on them and got the following:
        exon 2 U81147 aligns to AE003395 (first coding exon)
        exon 3 U81148 aligns to AE003204
        exon 4 U81149 aligns to AE003242
        exon 5 U81150 aligns to AE003379 CG17676
        exon 6 U81151 aligns to AE002931 CG17884
        exon 7 U81152 aligns to AE002931
        exon 8 U81153 aligns to AE003013
All scaffolds above are short.)
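To see at a glance which exons land on the same scaffold, the BLAST hits listed above can be grouped by scaffold accession. This is a minimal Python sketch that uses only the accessions given in the list; the dictionary layout and the function name are illustrative assumptions.

```python
from collections import defaultdict

# exon number -> (exon accession, scaffold accession), as listed above.
EXON_HITS = {
    2: ("U81147", "AE003395"),
    3: ("U81148", "AE003204"),
    4: ("U81149", "AE003242"),
    5: ("U81150", "AE003379"),
    6: ("U81151", "AE002931"),
    7: ("U81152", "AE002931"),
    8: ("U81153", "AE003013"),
}


def exons_by_scaffold(hits):
    """Group exon numbers by the scaffold their accession aligns to."""
    grouped = defaultdict(list)
    for exon, (_accession, scaffold) in sorted(hits.items()):
        grouped[scaffold].append(exon)
    return dict(grouped)


for scaffold, exons in exons_by_scaffold(EXON_HITS).items():
    print(scaffold, exons)
```

In this grouping only exons 6 and 7 share a scaffold (AE002931); every other exon sits on its own scaffold.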
AlstR FBgn0028961 394 aa= CG2872|FBan0002872|CT9822| 106 aa ( EG:121E7.2 )
(not a good BLASTP alignment, but by BLASTN this is the correct region;  
CG2872 is too short; AF163775 aligns to 159497(ATG)-178653(@)
of AE003428 but CG2872 is annotated for <175964..>178557 of same
(MERGE AlstR and GR, will send alignment to MA and AG)
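How much of the aligned region the current annotation covers can be estimated from the coordinates quoted above. This is a rough Python sketch, assuming both ranges are on the same AE003428 numbering and ignoring the partial markers (< and >) on the annotated range; the names are illustrative only.

```python
def span_length(start, end):
    """Number of bases in a closed interval."""
    return end - start + 1


# Coordinates on AE003428 as quoted above.
ALIGNED = (159497, 178653)    # AF163775 alignment
ANNOTATED = (175964, 178557)  # CG2872 annotation, partial markers ignored


def contained(inner, outer):
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def coverage(inner, outer):
    """Fraction of outer's span that inner's span accounts for."""
    return span_length(*inner) / span_length(*outer)


print(contained(ANNOTATED, ALIGNED))
print(round(coverage(ANNOTATED, ALIGNED), 3))
```

This compares genomic spans only, introns included, so it gives a rough sense of how much of the aligned region is currently annotated rather than a measure of missing coding sequence.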
Cht4 (incomplete CDS) aligns to the same region of AE003452 
as CG3986|CT12499 (complement(join(284288..284314,284377..284505)))
but there is no amino acid alignment. 
Aubrey: Do you put correspondences in when the structure is so obviously off?
