Subject: more bee-suggested annotation improvements

Dear Gillian,

Early this year I sent to you, via Sima Misra, a set of unannotated genes, plus some annotation improvements, suggested by some of the 15,000 honey bee brain ESTs we have. I've been working on this more, and have about another 40 such instances, only 2-3 of which are completely unannotated genes in Drosophila. There are many more instances like these and I will try to get them to you soon. For now the attached WORD MAC TEXT file has a simple format of each bee EST sequence with a short note about what I think it suggests, but I leave working up the details of the needed changes to the Drosophila gene annotations to you folk.

Hugh

Hugh M. Robertson
Professor
Department of Entomology
University of Illinois at Urbana-Champaign

--------------------------------------------------------------------------------

More interesting honey bee things, found by actually comparing TBLASTX and BLASTX scores; wherever the former is much higher than the latter one might expect that the BLASTX found a member of a gene family, while the TBLASTX found the real ortholog. Also, I am now comparing BLASTX to nr with all of these searches of the Drosophila genome, and those comparisons turn up additional annotation improvements. I've not worked any of these up in detail, simply note the honey bee EST that provides the data, and some indication of the problem.

***************A. mellifera BB260004B20D3.F
AGATCGCCCTCACGACAACATCCCCGTGGCTCCTCTTCTTCGACCTGGTCGCCATTTACAAAGACGACCCGGACCTGAG
GCTCTGCCTCGAGGTGTGGCCGCGCCCGAAAGACGAAACCCTCTTCTTCCTGATCGGCAATCTGACCCTGTGCTACGTA
CTACCCACGATCCTCATCTCCCTCTGCTACATATTGATCTGGATCAAGGTGTGGCGGAGGCACATACCCTCCGACACGA
AGGACGCCCAAATGGAGAGGATACAGCAGAAGTCGAAGGTGAAGGTGGTGAAAATGTTGGTCGTGGTCGTAATACTGTT
CGTCCTCTCGTGGCTACCCCTCTACGTCATCTTCACTGTGATCAAGCTGGGCGACGAGCAAAGGGAGGACGAGATCGTC
CCCATAGCAACGCCGATCGCCCAATGGCTGGGAGCGAGCAACTCGTGCATCAATCCGATCCTCTACGCCTTCTTCAACA
AAAAGTATCGGCGAGGCTTCGTCGCGATTCTGAAG
Very unusual example.
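The screening rule described in the introductory note (flag an EST when its TBLASTX match to the genome is much stronger than its best BLASTX match to annotated proteins) can be sketched roughly as follows. This is only an illustration, not the procedure actually used: the function names, the cutoff of 5 orders of magnitude, and the "hypothetical_EST" entry are assumptions made for the example. The other three E-value pairs are the approximate figures quoted in the notes in this file.

```python
import math

def evalue_gap(blastx_evalue, tblastx_evalue, floor=1e-300):
    """Orders of magnitude by which the TBLASTX hit beats the BLASTX hit."""
    # Clamp to a tiny positive floor so a reported E-value of 0 does not break log10.
    bx = max(blastx_evalue, floor)
    tx = max(tblastx_evalue, floor)
    return math.log10(bx) - math.log10(tx)

def flag_candidates(hits, min_gap=5.0):
    """Return (est_id, gap) pairs where TBLASTX is much stronger, largest gap first.

    hits maps EST id -> (best BLASTX E-value, best TBLASTX E-value).
    min_gap is an assumed cutoff in orders of magnitude, chosen for illustration.
    """
    gaps = [(est, evalue_gap(bx, tx)) for est, (bx, tx) in hits.items()]
    flagged = [(est, gap) for est, gap in gaps if gap >= min_gap]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

hits = {
    "BB260004B20D3.F": (1e-18, 1e-60),   # E-values as quoted in the notes
    "BB260007B10E5.F": (1e-06, 1e-15),   # E-values as quoted in the notes
    "BB260007B20F9.F": (1e-08, 1e-21),   # E-values as quoted in the notes
    "hypothetical_EST": (1e-40, 1e-42),  # made-up entry: scores roughly agree
}

for est, gap in flag_candidates(hits):
    print(f"{est}\tTBLASTX stronger by ~{gap:.0f} orders of magnitude")
```

An EST whose two scores roughly agree (the made-up last entry) is not flagged; the flagged ones are those worth checking by hand for a wrong reading frame, a missing exon, or an unannotated gene.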
The EST has two forward ORFs. One matches the translation of CG10823 at about e-18, but there is a TBLASTX genomic match at e-60 for the other. It turns out that the latter is the correct translation, matching as it does tachykinin receptors in vertebrates, and the Drosophila annotation of CG10823 is using the wrong reading frame!

***************A. mellifera BB260006A20B4.F
TGACTGAAGAAACACAACAGTCTGAGACTGCCGCACAAAATGAGGCACAAACTAGTTCTCCAGATGTTGGAAAAGTTAA
AGATAGTAAAAAAGAAAAATGCCGACCTATGACAAAGGTAGTAATACGGAGATTACCTCCAACTATGACTCAAGAACAA
TTTCTAGAACAGGTTTCTCCATTGCCAGAAAATGATTATCTTTATTTTGTGAAAGCTGATATGTCTATGGGACAATATG
CTTTTGCCCGTGCTTATATTAACTTTGTTGAACAACAGGATATTTTTATGTTCAGAGAAAAATTTGATAATTATGTATT
TATCGACTCTAAAGGTACAGAATATCCAGCTGTAGTAGAATTTGCACCTTTTCAAAGATTACCAAAAAAAAGAACAGGA
AAAAAGAAAGATTTAAAATGTGGTACAATAGAATCAGATCCTTATTATATAAGCTTCTTAGAAACTCGTAAAAATCAAG
AAGCTGAATCTAATATATCACAACCAAAAACAGAATACTCATATCAACCACCTGATAATACACCAAAAAAAATTACAAC
CACTCCTCTTTTGGAATATGTAAAACAACGTAAACAAGAAAAGCAACGTCTCAGAGATGAAAAACGTGAAGAGAGACGA
CGAAGAGACCTAGAACGGAGGCGAACAAAAGAAGATCCTATCATATCTAAGGTATTGAAAATCAAGATCTTGATAAAGA
AATGTGTAAAGATTATAAAGAAATAGGGAAGAAAAGATAAT
Shows that the N-terminus of CG11184 is missing.

***************A. mellifera BB260007B10E5.F
GCTACAAGAAAAAGTATTAGCCGAAACATCAATAAAAAAAGAAAATAAAGACAACACTGCAAGTGGTGGCGAATCTTTA
GAAACGGTTTTAGAAACAGAAACAACTGAAAATGTAGAGGAAACCTTTAATGATCAAGAAAGTATACCTTATTGCGTGA
AGACTAAACCTTATTATTCTATTAATGCGAAATTAAATACAAGAGAAAAAATTAATATTCCTTCCATTAATGCTTCCAC
CAACAAATATGTTGAGAAAACATATTCGGAAAGTGGAATAGATGAGAGTACACCTGCTTTAGAAGATCAAATTACTCCA
ATAGAAGATACAAAACCTCCTCCTGTTTGGGATGCGAATTTTAAATTTCTTACTAGCTCTGTGAGCGGTAAAAGTTCTC
AGAATTATAATAAATCTAGTGTTACTTGTCTTATTTGCGGAAAGCAATTAAGTAATCAATATAATTTGCGAGTACATAT
GGAAACTCATAGTAATAGTTCATATAACTGCACAGCTTGTTCACATGTATCAAGATCACGAGATGCCCTTAGAAAGCAC
GTTTCTTATAGACATCCAATGGCCTCGCCACAAAAACGTTCACGTTATAGTACACCGAAATCCTAAAAAATGGTATCTT
TAATCTTCGATTAAATCTTTGATCTAAGAATAGGACGTTTTAATTAAAACATAAAACTTTGTGTAATTTAATGCCCAAA
TTTTTAACGGTTAATGGTCTAAGGAAATGCCGATTGATTTAAAAAAAAATAAAAGCCA
Encodes C-terminus of a > 200aa protein
LQEKVLAETSIKKENKDNTASGGESLETVLETETTENVEETFNDQESIPYCVKTKPYYSINAKLNTREKINIPSINAST
NKYVEKTYSESGIDESTPALEDQITPIEDTKPPPVWDANFKFLTSSVSGKSSQNYNKSSVTCLICGKQLSNQYNLRVHM
ETHSNSSYNCTACSHVSRSRDALRKHVSYRHPMASPQKRSRYSTPKS
BLASTX match is weak at e-06 to multiple four-cysteine repeats within CG2889; but genomic e-15 match is to a single four-cysteine repeat interrupted by an intron; N-terminus doesn't align.
Genomic match is to 7kb unannotated region - try to reconstruct gene below: no ESTs to help.
I think there are two more exons upstream, with excellent splice sites, but they don't splice in frame to this below, so unsure what's going on in absence of cDNA.
ATGCCAGACAACCGCAGCATCATCATTTTGATCAGCAGCTGCAGCACAATGACAGCCAACAGCAATTCTGCCTGCGATG
GCATAACCATCAGgtgagatcctccagataaaattcatttgaattttcatttgtggggggactatattcaacattacct
cataagtaagatccattagaatggactggacatcgttcagtgaagcgagcaggccgaaagtaaaatgtcgccgcatcta
tatataggatccatctatccaaacacattcccctatacgtacagattcacacatagttttccgcattttcatatagtga
caatttttgtgtttcggcccaaagggtgtaatgcggtgaaagcggggcagcggtgaaactggctgtccaaaatagcaaa
acagccaaatggacaaagagtagtgaggaggagtgttcagtgtgtgtgtcacataaatgaacagattcgtgcaattgcc
aaaatccaaagaatttccacagcgagtaaatcgaaaagttggaaagcagctgcagccaacccctgtcccctgttacacc gcagacagaggagcatcatggggcccaggaaattgtatttttatgatcgtgtgacccacttatgagcacttttcacatc cccaaccccattccctctgatcccttccagACAAGTTTGCTGAGCACTCTGCCCATTCTGCTGGACCAATCCCATCTGA CGGATGTGACCATTTCGGCCGAGGGACGCCAACTGAGGGCCCATCGCGTGGTACTGAGTGCCTGCAGCAGCTTTTTTAT GGACATCTTCCGGGCTCTAGAGGCCAGTAACCATCCAGTCATCATCATACCAGGGGCCAGCTTCGGCGCCATCGTCTCA CTGCTCACCTTCATGTACTCCGGAGAGGTGAATGTATACGAGGAGCAGATACCGATGCTGCTCAACCTGGCCGAGACAC TGGGCATCAAGGGACTGGCCGATGTCCAGAACAACAATgtgagtagctcaagtgcaagatctagttagataatttaaat aaacttgtagTTTTTCCCTGAAACTCATACGCCTTTCTCTTTGGTCTCATAAATCCAAGCAGTTACCAAAAACAGCGAG AAGTGGAGGTGGCTCCTAC And not even sure about the C-terminus, although it does align somewhat AE003538.2 ATGGATACGACGAATGAGAAGTCCTCTGAGTTTGAACGTCCCACCACACCCTCGCCCACGCCCACCCCCACCCTTACGC CCTCCCACACGCCTACCCCAAGCCATTCACTCCCCCTGCCCCAGCTACCAAGTGCAGCACTTAACACCCCTCTCCTGGC CAACAAACTGGGATCGGTGAACTCCAGTGGAATGGGCACCACGCCCCTGGAGAATCTCTTTAAATCACTGCAGTTCTAC CCCAGTCTGCTGCCCCAACCACTCAACTTCTCGCAGACGGCGCTCAACAAGACCACTGAGCTGTTGGCCAAGTATCAGC AGCAATGTCAGCTCTACCAGAGCGGGATGCAGGAGGATCAACTGGAGACGGACTGTTTTGGCTCCAAGAGGCTAAAGGG CGACAGTCCGCCGAAGGAGCTTAGGCGACTGGAAAAAAGCCTTTTAAAGAATCCAAAATCCTCATCGACGACCAACAGT AGCAGTAGCAAGTCGCCTCAGGAATGCTCCAACCCGAATCCCATTGTGGCCACTTCACCAGTAACTCTCGCTCCCCCGA CCATGGCACACTTCTCCCCTCAGTTGCCGGTGGTCAAGTGCTCCTCGGCTAGTTACCCCAGCGCTCTCGGCCAAGGTCA ACTCTATAGTAGCAAGCCACCACTTTATAGTGCAGCAGTCACGCCGACTGCCGCCCAGCAGGCGGCGCAGATGCACCAC CACCCGCAACCCGGGCCATCGCCCTACATCTCGGCCGAGGATCATGCCAAGCTGCAGCTCCACATCGAACAGTATCAGC GGGAGGCGGCAGCAGCAGCGGCAGCGGCCGCCGGCGGAATGGCGCTGGTCAGCGCCAAGTCGGAGCCCAATCTGCTCTC GCTGAGCGCCGACCGCGACAAGTCGCTGGCCACCGCTCCCATCAAGCCGCCGTCCAACTCGAAGCTCTATGCCACCTGT TTCATCTGCCACAAGCAGCTGAGCAACCAATACAACCTGCGCGTCCACCTCGAAACCCATCAGAATGTTCGgtaagtgg tctgaattttaattgttataaaacaatagaagtcacctttggaattacttttcgattccatcagGTATGCCTGCAATGT CTGCTCCCATGTGTCCCGCAGCAAGGATGCCCTGCGCAAGCACGTTAGCTACCGACATCCTGGGGCGCCATCGCCATGT 
CGAAAACGAGGCTCGCCGGAAGAGGGTCTCCAAGCTAGCAGCGACCACTGTGCCCACTTCCACGCCCATGTCCATGAGC GCCAGTCACACGGTCACCAGTGGCGATGTGGGTCCAGCTCCAGCGACCACACTTGGATGTTCGGGTCAGGAGGCAAGGA ATCCGTACCTTTTCCTGCCCAATCAATTTCAGATGGCTGCTGCTGCAGCAGCCGTAGCAGTGGCCGAATCCTCGCCAGC TTCTGGTCAACCATCGCTAGACTTGGCACACGAAGCGCCACCGAGCATCAAAAGTGAGCGGGAGCCACCGACGGCGAGC AACGGAGAGGCGACCGGTGTAGAGGCATCGGCGTCAACCACCTGAGGATAATTTTTTTGAATATTTTT translation M D T T N E K S S E F E R P T T P S P T P T P T L T P S H T P T P S H S L P L P Q L P S A A L N T P L L A N K L G S V N S S G M G T T P L E N L F K S L Q F Y P S L L P Q P L N F S Q T A L N K T T E L L A K Y Q Q Q C Q L Y Q S G M Q E D Q L E T D C F G S K R L K G D S P P K E L R R L E K S L L K N P K S S S T T N S S S S K S P Q E C S N P N P I V A T S P V T L A P P T M A H F S P Q L P V V K C S S A S Y P S A L G Q G Q L Y S S K P P L Y S A A V T P T A A Q Q A A Q M H H H P Q P G P S P Y I S A E D H A K L Q L H I E Q Y Q R E A A A A A A A A A G G M A L V S A K S E P N L L S L S A D R D K S L A T A P I K P P S N S K L Y A T C F I C H K Q L S N Q Y N L R V H L E T H Q N V R---------------------------------------2-------------------------------- Y A C N V C S H V S R S K D A L R K H V S Y R H P G A P S P C R K R G S P E E G L Q A S S D H C A H F H A H V H E R Q S H G H Q W R C G S S S S D H T W M F G S G G K E S V P F P A Q S I S D G C C C S S R S S G R I L A S F W S T I A R L G T R S A T E H Q K Z Leave it at this for now. encodes weak similarity to several other zinc finger proteins in Drosophila, e.g. sob \*********** extra \- there is a single testis EST for near the end of this 7kb section, beyond this gene, which could encode a small protein; no matches at all. 
bs85b10 5'
GCACCAGGTTTTTCGTCTTCTGCACTCGAGCGCCTTTCTAGCTATCTTAATTTATGTACTTACGTTTAGCACAATTTAC
GTTTATTTTTTCTGAAACTTTAGCAACCTCCGAGGCTCTCAAACTGCAAGTTACAAACACAATCCTTTCGCTTTCACGC
TCTCTCACAAGCACACGTCCACACATTCTAGTAATCTAAGCCAAGTTTTTATAAATATTGTAATTATACACCTGAACAC
GCACACACACTTGCACACAAAAAGACAGGGTGAAGACCTACCAAAAAAAAAAAAAAAAAA
translation
M Y L R L A Q F T F I F S E T L A T S E A L K L Q V T N T I L S L S R S L T S T R P H I L V I

************A. mellifera BB260008A10B2.F
GCTAAGGGAAAGGCGAAGTCTCGTTCCAACAGAGCTGGTTTGCAATTCCCTGTTGGTCGTATTCATAGACTTCTTCGCA
AAGGAAATTACGCAGAACGTGTCGGTGCAGGAGCACCAGTGTATTTAGCAGCCGTTATGGAATATTTGGCTGCTGAAGT
GTTGGAATTGGCGGGAAATGCTGCTCGTGATAATAAAAAAACCAGAATTATACCACGTCATCTTCAACTTGCTATCCGT
AATGATGAAGAATTAAATAAATTACTTTCTGGAGTAACTATTGCTCAAGGTGGTGTTTTACCAAATATTCAAGCAGTTT
TATTGCCAAAGAAAACTGAAAAAAAAGCTTAACCATATAAACATCGATATAAATGGCCCTTTTTAGGGCCACAATTTTT
TAAAACGAAGGAATTTCTTATCCGATTCATAATTAAATAAAATACATTTTTATATAATAAAAAAATATAGAATAGAGAA
GTATAAATAATCTAAAGATATATTATTCAAAGATAGAATTAAATATATTATTGAAAAAAAAAGATATGATATAGAAAAT
TTCGTGAAATCATTTTTAATATAATAAAAGCAATATATTTTTATATTATTAACTACAATTTAAATCATACATATCTATT
TTCAAAAATTTTATATTATAAGTTATTTTATAAATATAATATATTATTATTTTTCTTTTTTGTTAATTATTAAATATTT
GTAATTTTCTTTATATCTAATAAGACTAGTAAATATAACATAAGAATATTGGACTATTAGATTCGTAAATTGCATTATA
AAATAATAAAAATTATATTTAGTCTTATACCGTATATTTTTTTTATACG
This encodes the end of Histone 2A, but genomic match is not annotated because it is a small isolated scaffold with the N-terminus truncated! And there is something else wrong with it because it does not encode the C-terminus - indeed the ESTs show there is a deletion in the genomic DNA.
There are many ESTs AE002735.2 TGAAGGGAAAGGCAAAGTCCCGCTCAAACCGTGCCGGTCTTCAATTCCCTGTGGGCCGTATTCACCGTTTGCTCCGGAA GGGCAACTACGCAGAGCGTGTTGGTGCAGGCGCTCCAGTTTACCTAGCTGCCGTAATGGAATATCTAGCCGCTGAGGTT CTCAAGTTGGCTGGCAATGCTGCTCGTGAGAACAAGAAGACGAGAATTATTCCGCGTCATCTGCAACTGGCCATCCGGA ACGACGAGGAGTTAAACAAGCTGCTCTCCGGCGTCACAATTGCACAAGGT----------------------------- -------------------------------------------------------------------------CGTCAA TCAAACCGTCCTTTTCAGGACGACCAAATTATTAGCAAAGAATTGAAAAAAATTTTAACCACGCAATTTGTTGTATAAT ATTAAATCATACAAAAAATATTTCAAACTATTTATTTACGTAAAGATTGTAATATAATACGGTTTTTGTATTTTTTCTA TTATATGCGGTATAAACTATAATTTGTTTCTTTAATTACTCACACATTACTCTAATTACTAATTAGATTACTCTCCAAT TATAATTACTAATAAATTACTCTCCACAAATCAATGCTAGGAATACACCTTGGTATACCTGAAGGAGTACGAACGCCGG ACATTTATCATACGCGTTACTTTTAGAGTAAAAGGGTATACTAGATCAGTTGAAAAGCATGTAACAGGCAGAAGCCCCA CCGCTATCGCCCTGGAACCGTGGCCTTGCGTGAAATTCGTCGCTACCAAAAGAGCACCGAGCTTCTAATCCGCAAGCTG CCTTTCCAGCGTCTGGTGCGTGAAATCGCTCAGGACTTTAAGACGGACTTGCGATTCCAGAGCTCGGCGGTTATGGCTC TGCAGGAAGCTAGCGAAGCCTACCTGGTTGGTCTCTTCGAAGATACCAACTTGTGTGCCATTCATGCCAAGCGTGTCAC CATAATGCCCAAAGACATCCAGTTAGCGCGACGCATTCGCGGCGAGCGTGCTTAAGCTGACACGGCATTAACTTGCAGA TAAAGCGCTAGCGTACTCTATAATCGGTCCTTTTCAGGACCAAAAACCAGATTCAATGAGATAAAATTTTCTGTTGCCG ACTATTTATAACATAAAAAAAAATAAGAGAACAAAATTCATATTCTATTATTTATGGCGCAAATGGTACTGGGTCTTAA ATGTAAAAATAGTAATTCTTTCAGAGAAAGAATCAAAATAATCTT ESTs GGCACGAGGACTAAGTGAAATAAACGCAAAGCAAAATGTCTGGACGTGGAAAAGGTGGCAAAGTGAAGGGAAAGGCAAA GTCCCGCTCAAACCGTGCCGGTCTTCAATTCCCTGTGGGCCGTATTCACCGTTTGCTCCGGAAGGGAAACTACGCAGAG CGTGTTGGTGCAGGCGCTCCAGTTTACCTAGCTGCCGTAATGGAATATCTGGCCGCTGAGGTTCTCGAGTTGGCTGGCA ATGCTGCTCGTGACAACAAGAAGACTAGAATTATTCCGCGTCATCTGCAACTGGCCATCCGCAACGACGAGGAGTTAAA CAAGCTGCTCTCCGGCGTCACAATTGCACAAGGTGGCGTGTTGCCTAATATACAGGCTGTTCTGTTGCCCAAGAAGACC GAGAAGAAGGCCTAAACGTTTCAAAGGCTAAGCTAAAAACCTACATGTACATAAAATCGTCAATCAAACCGTCCTTTTC AGGACGACCAAATTATTACCAAAGAATTGAAAAATTTTTTAGCTTGGCAATTTCTTGTA-ATTAGTAAATCATAAAGAA TTATTAACGTAAA Histone 2A M S G R G K G G K V K G K A K S R S N R A G L Q F P 
V G R I H R L L R K G N Y A E R V G A G A P V Y L A A V M E Y L A A E V L E L A G N A A R D N K K T R I I P R H L Q L A I R N D E E L N K L L S G V T I A Q G G V L P N I Q A V L L P K K T E K K A *

**********extra
There are some weaker matches to this histone and they are not annotated either. This one is to a 15kb scaffold with just one protein annotated at the front. No ESTs for this clear histone relative - not sure why.

AE002870.2
tattgccccacaagcttagccgaaaaATGTTTCGGTCACATTCCCTCCTCTTCACTTGGTGCAAAATAAATTGCCGGTG
CCGGTCTTCAATTCCTGTGGGCCGTATTCACCGTCTGCTCTGGAAAGGCAACTACGCGTGTGGGTGCAGGCGCCCCAGT
TTACCTAGCTGCCGTAATGGAATATCTGGCCCTGAGGTTCTCGAGTTGGCTGGCAATGCTGCTCGTGACAACAATAAGA
CTAGAATTATTCCTCGCCATTTGCACCTGGCCATCCGCAACGACACGGAGTTAAACATGCTGCTCTCCGGCGTCACAAT
TACACAAGGATGCTCTCTGTTGCCTAAAAAGTCAGAAAAGAAGGCCTAAACGTTTCAAAGGCTAAGCTAAAAAACAACA
CGTACATAAAATCGTCAATCGATGC
translation
M F R S H S L L F T W C K I N C R C R S S I P V G R I H R L L W K G N Y A C G C R R P S L P S C R N G I S G P E V L E L A G N A A R D N N K T R I I P R H L H L A I R N D T E L N M L L S G V T I T Q G C S L L P K K S E K K A *

***********
Check out the rest of this 15kb: in the first half, the histone relative above is all there is with BLASTX matches, and there are no ESTs for it. But the second half has two long (1kb) ORFs that must encode something, on the opposite strand to the histone gene. But it's just a boring LTR retrotransposon!

**************A.
mellifera BB260007B20F9.F AAATCATGTCGATTACTCGCGCACTTAAGTAGCAATTAATTAATATTCGTATATATATCCGATCCGATTTTTTTAGAGC ATGATAATGTACCGTGAATATGCTCTAACACTGAAATGTGAGTTTGAATGTGTCGAGTGTCCAAGGAATTTTTTTTTTT TTTCAAAAAGAACACGTGCGTGTGATTGTGCGGGGAGGAAGACGGGATGACCACTGGCTTCTTGTCTTCTTCTGCCCTG CGTTGCGCGTGAAATGGTAAGCGTGAATGTAGATGCGTGACTGTGGACGAGTGATTCTGACGGGAGAGACACTGCGAGA GTTGTTTTTTCTCTCTCTCTCTCTCTCTCCCCCCCTCCTCCTTTTCCTTTGTAACTATTGTTCGCGATCGATCCCGCAA AAGTATGTTACGTAATAAAAGTATCTGGCACATTTTCTTTCAGTTACTGGACACCCTCCCGGTGTGCCAAGATTTTAAT CGACAAGTGTGCAACCGTCCCGCCTGCAAGTTTATCCATCTCAGTGACGGAAACGTGGAGGTGATCGAGAATCGCGTGA CCGTGTGCAGGGACGCACTGAAGGGCGCGTGCATGCGACCCCAGTGTAAATATTATCACATACCGGTCGCGTTGCCGCC GGCGCCCTTGATGGCGATCACGTTCCCTGCGACGCCCTAATTACTCCTTCTCGTTTAGCGGATGTGTGCATCACGACGA GGAGAATAGAACACCGGCCAACCGATGATGATGAACGCGGAGAGAAAATTTTGACAGGGATCGAAAGAAACGAGGATCC AACCAAGGAATTAATTGCCCAAGAAAGCATCGC This appears to be an unspliced transcript with an exon in the middle. It has genomic matches e-21, but BLASTX only e-08 and to vertebrate equivalents of muscleblind B isoform Turns out the current annotation is incorrect near the C-terminus, leaving out at least one exon. It is a massive 100kb gene, with huge introns, and I can't find anything in them! \************A. 
mellifera BB260003B20D12.F AAATCCTAACAGTTCTCATGTACTTACGGAAGATACTATATCGAGAAAAGTTAAAAATGGTATATTATATAGCACACGT CTTTTGACAAAAACTAATAGAGTACCTAAATGGGGAGAGAGATTTGTTAGCAAAAATATAGTAAAAATTATTGAAGAGA GTATAGTGGATCCAAAAACAAAAACTTTAACAACATATACAAGAAATTTAGGTTACACTAAAGTCATGAGCATTGTAGA GAAGGTTGTTTATAAAGTATGTGAAGAAAACTCTAATTGGACAGTAGCAAAACGATCAGCTTGGATTGATAGTCAAGTA TTTGGATTCAGTAGAGCTATCCAAGCATTTGGATTGGATAGATTTAAAAAGAATTGTACTCTGATGTATAACGGGTTTA ATTACGTTCTAGCTCATTTGTTTCCTCACACAGCACAATATATGAATCCATCGCTTTCTCAAATGGGTTTTGCTCATCT AGTCGAAGAATTTCCTGGAAAGACAAGTTTAGCAGAAGATTTCCAACATTCGTTACAAGGTAAAGCAGAAAAAGTAAAA GATGCTGCGAAAAAGGCAACTGATTTAGCAAAGAAAAAGGCTGGCACCATTTATGCTACATAACATTCTGAGCAATCAT AATTAAAACAACATCGTCGGTGTGAATGATTTAAAATATTTGGCGAATAGAGAAAATGAAAATTCAGAAAATAATTTTA AAATGGAAACAAAGGATAATTATTTGATGGTTTATTGGATGAGTAAATAATAGAATGAAATTGTTAGCGTTATTCGTTA AAAAATA This one is barely better in TBLASTX, but the human BLASTX is much better, and indeed find the annotation of CG8806 needs fixing, I think to remove an exon \************A. mellifera BB260004A10B10.F TCAACAAACTATTCCATCGAATATGAATCATAACTTTCTTCAGAGCCTAAAGATGGCAAATCAATTTTAAATCGATATA TGAAGTTGTACTCGTCTGCCATCTCGTGGCGCTACGTAAAAAATATTTATCGTACCGATATAACATGGCGTTCACGTTG TGGGCACTTTTCGAGGTAACCGTGTTATGTTTGAATGCGGTCTGTATTTTGAACGAAGAGAGATTTCTTGCAAAAGTTG GCTGGGCATCGTGGCAAAATGTTCAAGGTTTTGGAGAAACTGCTACAGCTAAATCACAAATTTTGAATCTTATTAAATC GATACGAACGGTAGCACGAGTTCCATTGATATTTTTAAATATCATAACAATAATTGTGAAACTGGTGCTCGGTTGAAAG AAACAATTGATGATGTAAATAAGAGGATAATGCACAGTGAAAAGTCAAAGCGTGATGTACATAAATGAACAATTAAATT CTTTTATTTTTATCGCTGCATTAAAAAAAAAAAAAAAAAAAGCAAC Shows that CG6316 needs to be split into two genes. The C-terminus matches this cDNA, which is full-length and encodes an 80aa protein with excellent matches to similar length human and yeast proteins \***************A. 
mellifera BB260004A20B3.F TTTCTTCTTAGATTTGATACTAAATAAAGAACTTAAATTGCATAATTTCAATTTATATAAATGACTATGCAGACTTTAA ATTGTGTGATGAAAGTGATTTTGAACTATATATGTTTTTTTAAGTGCGCTTCTTATAGTGAAATTAAAGCATAAGGTGA TAATTAATAATTAATCAAAATGGCTAACATTCAACTTCGTGAATTAGAAGAATATTTACAACAGTTGGATGGATTTGAT AAACCAAAAATATTACTTGAACAATATTGTACTAGTGCTCATATTGCATCACGCATGTTGTACTGTGCTGAAGTTCAAT TTAATGACATAGAGGGACATTCAGTAGGTGACTTAGGTTGTGGGTGTGGTGTTTTATCACTTGGGGCACAGATGCTTGG AGCAAGTCATGTAATTGGTTTTGAAATAGATTCTGATGCACTTAAAATTCAATCTAAAAATTGTAATGAAATAGATTTG TTTGTGGAAACTGTACAATGTGATGTATTACAATATTTACCAGGCCGATTTGAGAAGTACTTTGATACAATTATTATGA ATCCACCATTTGGTACAAAGCATAATACAGGTACAGATATGAAATTTTTAAAAGTTGCAACCAAATTAGCATCAAATAC AGTGTATTCATTACATAAGACAAGTACCCGTAACTATGTTCTTCAGAAAGCTGCACAATATGGAGCCAAAGGCAAAGTT ATTGCAGAACTGAGATATGATTTACCAAAAGCATATAAGTTTCATAAAAAAATGTCTGTAGATGTTCAAGTGGATTTTA TACGATTTGAATTAAATTACTAAATATTTTCATGAAAAAAAA Shows that CG9666 needs to lose its N-terminus and gain a C-terminus by adding a simple exon. Human and C. elegans proteins confirm this new structure. cDNA LD25448.3prime confirms this \***************A. 
mellifera BB260004A20G4.F TCATGTGTGAGGATACAGTGACAGCTGTGGACAGGTACGCGTTTTAAATCTGTTATGATTAAAAACGTACCAAGTAAAG CCGGTCATCGACCAGCTGTGAATTCAAGAAAGAGATCGTTGTAAACGCGCGTGTCGAAAATTCTGTGAACGTGCGTTTT GATTTATCGGTTTATTACGCCTAGTCGAAATCGTATTACCGAAAAGAGTCTTACCGAAAGAAACGGTGCGTGTAAGAAC CTTTTTGGATACTACCTGACAAGATGTGGTGTCTCGTTAGTCAAGCTAATTCTGTCATTCTTGAGGTACAAGTCGATCC CAAAGCTATTGGTCAGGAGTGTCTCGAAAAGGCATGCGATTGTTTGGGCATTAGCAAGGAATGCGACTACTTTGGGCTG AAGTATCAGAACGCGAAGGGCGAGGAGCTCTGGTTGAATCTGAGGAATCCTATAGAGAGGCAAACGGGCGGCGGTGTGG CCCCGCTAAGATTCGCATTGAGGGTTAAGTTTTGGGTACCGCCTCACCTGTTGCTGCAAGAAGCTACCAGGCATCAATT CTACTTGCACTCTCGCCTCGAGCTTCTCGAAAGTAGGCTAAAAATGGCGGATTGGAGTTCGGTGGTGCGTCTGTTTGCT TGGATAGCGCAAGCCGATATTCGTGATTACGATCCATTGTCGCACCGAACGCCCTCTTCTTGCATTGCTGTCCAATTCA ACGGCGGAAACAAGCGAATCAAACCGTTGGTCTTATTCACCGGATGGTCCACCACCCAAGAATTGAAGGGAAGAAGCCT TCCCGGG This suggests the annotated N-terminus of CG12489 might be wrong, since the honey bee sequence nicely matches the N-terminus of human ortholog, yet cannot find better N-terminus in Drosophila genome? Drosophila cDNAs agree with current annotation, so may be real difference between Drosophila and other insects \***************A. 
mellifera BB260008B10H2.F GCGAGCGGTGGGACGGGCGGGTACATCGCGCCTACCACGTGGGATCAGCTGATGCAAGACGATCATTTCCTCGGCAAAT TCTTCCTCTACTTCTCCGCCATCGAGAGGAGGATTTTGGCTCAGGTATGCTTAAGATGGAGAGACATACTTTACGCGCG GCCTCGACTTTGGGCAGGCTTGGTGCCCGTAGTAAGATGTCGCGAGGTACGTGCCATGCCTTCTACTTCACGCACGCGG CTCTACGCCTCCTTAGTTAGAAGAGGATTTCATTCGTTGGTTCTTCTCGGAGCATCGGACGAGGATATCCCGGAACTGA CGCACGGCTTTCCATTAGCGCAAAGAAATATTCACTCGTTATCGTTGAGATGTTGCGCGGTGACCGACAGGGGACTAGA AGCTCTCTTAGATCATTTGCAAGCGTTGTTCGAGCTTGAACTAGCAGGTTGCAACGAAATAACGGAAGCCGGGTTGTGG ACTTGCTTGACACCTAGAATAGTATCGCTCTCCTTGTCGGATTGTATCAACGTGGCTGATGAAGCCGTCGGTGCTGTCG CTCAATTACTGCCGAGTCTCTACGAGTTCTCGTTGCAAGCTTATCATGTAACCGACGCCCGCCTTGGATATTTTCACGC AACCCAGAGCAGCTCCCTTAGCATCTCAGGCTGCAGTCCTGCTGGGGACCTACCCACCATGGTATAGTCAATATTGTAC ATTTCTTGCCCAATCTGACTGGTCTATCGGTGGCCGGATGCAGCAAAGTAACCGAGGACGG Something very strange here. An excellent human match is recovered, as is a TBLASTX to Drosophila genome, but can't get a BLASTX match to Drosophila protein; yet there is a protein annotated in this region, CG6060, but the annotation doesn't quite match \- check it out Matches entire human protein, but clearly is longer at N-terminus at least. 
Weak full-length match to Partner of paired , from 170 of 550aa 225198RC-ATGGAGAGAGCACGGGGAAAGCCACGGGTCGAAAATCGATACTGCTGCCCAGGCAGCACGCAGGCTTACG GATACGGACTCTCCACCCGGACCCTGGAACCCCGGATTCGGGCGAAGGCGgtgcggagtgcattatggcaatgtggag- 2822-ccccatggtgtatcgtgttgcctcgttacagGTGAAACTCGCTAATTGTCGGGTGGAAAAAGgttagcaataca cagagcaaacgggatgtaaaggataagcaaagccgacaccataagcgattgaaaaacgaagcagaggagaggacgaagg actccacttttccagagcgcttcgtttcccagGAGGCAAGGTGAATCTGTTCCAGCACACGATGTCATCGATCTCGGCG CAGGGCGTGGTCGAGCGAGCATCCGCGGAGTTGTCGAAGCGGATCAATGGCCTGGGCCTGCGCTCGAAGCACCATCATA GCAGCACATCCAGTGGTGCTGGTGGCGCCGGCGATGCTGCATCCCCGGCAGGAGCCACGCCCACTCCGGCAGCGCCCAG CGGCAAGACGTCGGTGATGGAGCGCGTAACGAACGCCCTGTGCGGCGGTGGCAACTCAAATTCTAACTCAGGATCGAAT AGCTCCAATAGCAACACCTCTTCAGCGTCTGCCACCGCTGCCACATCGCCCGCCAGCAACGCCAATCCTCCACAGACGC CGGACAAACCGTCGCGTGGCAGTAGCCCCAGTCCCGGCGGTATCACAATGCCAGgtggccagtcgcaggtccagaactc cacacaccacctcctgcagcagcaacaacagcaacagcagcatatgcagctgcaacaatcgcagcagcagcatctccag ctgcaagcctccacgctgatcaactccaaccaccatgtgatggtgggtcctgctccgcccactggcatgcctctgggtg ccccgcccacgccgacagtgaagtccattgccaagcagatgaacataaccataccggg CG6060 M E R A R G K P R V E N R Y C C P G S T Q A Y G Y G L S T R T L E P R I R A K A \-------------------------------0--------------------------------- V K L A N C R V E K \----------2----------------------2----------------------2--------------------- -2----------------------2--------------------G G K V N L F Q H T M S S I S A Q G V V E R A S A E L S K R I N G L G L R S K H H H S S T S S G A G G A G D A A S P A G A T P T P A A P S G K T S V M E R V T N A L C G G G N S N S N S G S N S S N S N T S S A S A T A A T S P A S N A N P P Q T P D K P S R G S S P S P G G I T M P \-------------------------------1---------------------------------------------- -----1---------------------------------------------------1--------------------- ------------------------------1------------------------------------------------ --- G R S K S R F A H L Q H H G H G G R P E G G G Q C W R Y T V A V A E T T A Q S S P A 
PSVWQYGHQRSIAADHATPAPPAAYPSMSTRSVCAAPRLATGDWRPCLTTCRVCLNWSWLAATR I can't reconstruct this gene with their coordinates, so leave it up to them to sort out. \************A. mellifera BB260008B20A1.F AAGCGCAGCTAGGACGTCTCCAGTACGGTCGACGCCCTAGTTCCCCTCCGCATGGAGGGAACAGGCGCGCACCCCGAGG GCCAAATCTACTTCCTCACCCCCTGACGCGACCACGACATGGGAAGATCCACGAAAAACAGCGGCGGCGGCGAACGTTG CGGCTGTGGCCGCAGCCGTCGACAATGGGAAATCCTCGACCGGCGCTACCAATTCTCTAGGTCCATTGCCCGACGGATG GGAACAAGCGCGTACTCCCGAAGGAGAAATCTATTTCATTAATCATCAGACACGCACCACTTCGTGGTTCGATCCAAGA ATCCCTACTCATCTTCAAAGGGCTCCGACCTCAGGTGCAATGTTACCGCAAAATTGGCTTCAACAGCAACAACCTACAG GTGGTGGTATTCAGAATAATCAAACATTGCAAGCGTGTCAACAGAAACTTCGCCTCCAGTCGCTACAAATGGAACGCGA GCGTCTCAAACAACGGCAACAGGAAATTATACGTCAGCAAGAGCTAATGCTTCGACAGAGCACCACCGACGCCGCTATG GACCCATTTTTGTCGGGAATCAACGAGCAACACGCACGCCAGGAGAGCGCGGACAGCGGCCTGGGCCTTGGTTCCGCTT ATTCCCTCCCTCACACACCGGAAGATTTTCTTGCAAATATCGACGATAATATGGATGGTACAAGCGATGGCGGCGCACC CATGGAGACCCCGGATCTTTCTACTCTGAGCGATAATATCGATTCGACCGACGATCTCGTTCCATCGTTACAGCTGAGC GAAGATTTTAGTAGCGATATTTTGGACGATGTGCAATCGTTGATAAACCC Has much better mammalian matches, and indeed indicates that there is at least an exon missing from CG4005 \************A. 
mellifera BB260008B20D1.F
TCAACCCCTTGCGTGGTACCGATGTCATTTTACCGGAAACCGCAGTATTCGTAATAGCGCACAGCCAAGCTTGTCATAA
CAAAGCTTCCACAACAGATTATAATTTAAGAGTTGCAGAATGTCGCTTAGCTGCACAGATGATAGCAAAGAAAAGAAAC
AAACCTTGGGAACATGTACAAAGACTAATCGATATCCAAGAGAGTCTTAATATGAGCTTAAACGAAATGGTTTCAGTTA
TAACAACCGACCTTCACGAAGAACCATATACCCTGAGCGAGATTAGCAAGAACCTTGATACAACGAATGAGAAACTTCG
TGAAATATCATTATTACAAAATTTTAGCAATGCGCAAATTTTCAAATTGAAACAACGCGCTCTCCATGTGTATCAAGAG
GCGGCTAGAGTGCTCGAATTCCAACATATTAGTGAGAAAAATGCAATTATGGAAGAGGAGAAGCTAAAACAACTGGGCA
ATCTGATGTCCAACAGCCATTTCAGTATGCACAAACTATACGAGTGCAGTCATCCTAGTGTCAATTCACTCGTTGACAA
AGCTATGGCTTGTGGTGCACTCGGTGCAAGGCTCACGGGAGCTGGATGGGGTGGCTGCATAGTGGCCATCATAACGAAA
GACAAGGGTTCTCACATTGTGGATACACTGAAAAAAGAACTCGATCTATGCGGGATAAAGGATGGATTCAAGCTCCACG
ATTTGGGTTTTCCAACGGAACCGAACCAGGGTGCTGCAATTTATATGAGCTAAGTTTCATTTTCATCTGTTCCTGGCTC
TTGCATTTATAATTTCGACCATAATTATTAAAGGGTTCTAAGATATATTCTAATTTAAG
The C-terminus of this and the mammalian orthologs indicate that the Drosophila protein CG5288 needs to be longer, and the genome encodes the appropriate sequence.

************A.
mellifera BB260008A20D9.F TTACGTTAATATTTGAATGAATAAATCGAAGAAATTTGGTTCATAAAAAATTTTTAAGACATTAATCGTGATGTGTCTA CAAATTATCCATTGATTTATTCGAAAAACTTTCGAGATATTCAACGGTCAGTTCCTATTAATAAATCGATTTTTCGCAT TCGAGTGAAGTTGCAGAGAGTTCGATCGATGGAACGACCAGAGGATTTAGTGTCGAAAGAAATGTGCAGAATGTGGTCA ATTTACAACGTATAAATTGTCTATCAGTTAAAACACGAAGAGCAAAATGTCGGTTGAAGTGAAGGGGGGTCGACCAACA ATGCCAACAATTCCACAATCCAAGAGACCAACCATTTTCGTTTATCCCACAGTAACTCCAGAGAGCATTATCATCCCGA TAGTATCATGCATACTCGGATTTCCATTACTGGCCCTTATGGTCATCTGTTGCTTAAGAAGAAGAGCAAAGTTAGCGAG AGAACGTGCACGAAGAAGAAATTGTGATCTAAATCATGGAACCCTTAGTCTCGGTCGTTTTAGCCCTGTTCACCGGTTA AGTAAATTAAACATCTTCTTTTTTATTTTAATACCATGTACATATCTAAAAAAAAGAAATGCAATTTATTTAAATTAAA TTTTACTTGAATTATTCCATCCTTACTACCACATTATGGTCGAATTAAAAATTAATGGTTAAGTCTTAAACTTAAAAGC CACGTTTTCCTACCAAACCTTTTTGTAACAAAAACACATGGAGGCCCCAAACAATTGGGTTCAACCCCAACCGCTAATT CGGTTGTTTTTCCCCTAAAACGCTGCTGACT Unspliced transcript has an exon in the middle, flanked by splice sites, with good match to unannotated region of Drosophila genome \- no BLASTP or EST matches 7kb in front and 3 after available agcctttcccttttaaatccatttcagTTTCACCCGAATCGATTGTCATCCCGATCGTCTCCTGTATCTTCGGCTTCCC CATCCTGGCGCTTCTGGTGATCTGTTGCCTTCGAAGGAGGGCCAAGTTGGCCAGGGAGAGGGATAGGAGGCGTAACTAC GATATGCAGGACCATGCCGTCAGCCTGGTCAGATTTAGTCCAATACATAGGCTTAGTGAGTTTGAATGTCGTAATGTAT TGTGTGTACGAAGAACGTTTCTTTTTCTCGTCTCGTCTCGTTTTCTTTTTTATCTCTGGTTTT V S P E S I V I P I V S C I F G F P I L A L L V I C C L R R R A K L A R E R D R R R N Y D M Q D H A V S L V R F S P I H R L S E F E C R N V L C V R R T F L There are several good looking splice sites and ORFs in the 3kb 3' to this, but can't easily put a gene together without any more guidance. Same for the 5' 7kb. \*************A. 
mellifera Contig1 GGAAAACTATTTTAACTGTCAAAAGAGAAATTTCGAACACATATTATATTATGTCTCATTCCACATCTCAGAATAAAAA CTACAATATTATTATTGGAGATATGCGAATTGCATACAAAGACAAAAACGAAACTTTTACAGAAGATCATCTTGTAAGT AAAGAACCAATTGGTCAATTTAGAGCCTGGTTCGATGAAGCATGCAAAATTCCGCAAATTTTTGAAGCAAATACAATGT TTCTTGCCACAGCTACCAAAAATGGAATCCCGTCCGTGCGACCAGTATTACTCAAAGATTATGGAGAAGATGGTTTCAA ATTTTACACTAATTATGAAAGTAGGAAAGCTCGCGAAATAGCTGAAAATCCAAATGTGGAGGTGAATTTTTACTGGCAA CCTTTACATCGAAGTGTACGTATAGCAGGTACAATAAAGAAAACTTCCTTAAAAGATTCAGAACGTTATTTTCAAAGCC GACCATATGCAAGTCAAATAGGATCAATGGCTAGTAAACAGAGTAGTGTAATTGCAAATAGAAATACACTTATAATAAA AGAAAGAGAATTGTTAGCTCAATTTCCAGAAGGGAAAGTTAAAAAACCAGATTGGTGGGGAGGATATATTATTATTCCA CATTCCATAGAATTTTGGCAAGGTCAAAGCGATCGCTTACACGATAGAATTCATTTTAGACGATTAAAACCAAACGAAA AAATCGACAATGTACTTGTTCAT Vertebrate matches are 260aa proteins, so suggests that CG2649 gene product is a fusion of two related genes, since first half matches well and second half matches more weakly \***********A. mellifera Contig1058 ACTAATCTTCCTTTGCCCCGAGAGCCGGAAATTAGCTATTGCGGGGAGCGGAAGGCACGTTGTCTTGTTCAAATTCAAG AAAGTAGAGAGTATGTCCGAGGTAGTGACTTTGGATATATCACTGACGGCCGAACCGGTTAAGGAAGTGGAAAGTTCAT CCGATCACGATTCTCCGGCTGGCGGCAACACTTCAGGGAGCAGCGAGTCGAAGAATAATGAATCGAGCCAATCGCTGAA AATTAAGACTGGTTTGCAGAAGAGGGCCGCGGGTTTCCAAGCGACCCTGGTCTGCTTGACGGTTACCAACAGCGGGGAA CAAGCTGAGAACATAACTGCTCTCAGTTTGAACTCTTCCTACGGTTTAATGGCTTACGGGAACGAGTGCGGCATAGTGA TAATAGACATTGTCCAGAAGATCTCTTTGATCGTGTTGAACACGGGCGACATAGGTGGTAACGTGGATCTGTGCCAACG GGTGCTGCGCAGCCCGAAACGTCAGGACGAGTTGAAACGGGATAACGAGGACAAAGCGAGGAGTCCTAGCACAGATCAG CCAACTATGTGTCTACCCACGTTGAAACAAGTTCAAATCAGTTTTGCGGTCTTCCCCGACAGCAAGGTTGATTCAGACA AATATGACAGTTCGTTTCAACGGTCAAGGAGCTCGTGCATGTCGTCGCTCGAGAACATCACCACCGAGACCATCAGCTG CCTTACATTCGCAGATTCCTACACGAAAAAGAGCGATACCAGCCCCGTGCCAACGCTTTGGATCGGTACCTCTTTAGGA TCTATACAAACGGTGATATTTAACACGCCGCCTCGTGGAGAACGACACGCGCATCCGGTCGTTGTTTCTACGTGCAACG GATCAACGTTCAAGTTGAAAGGATGCATTCTGTCTATGTCATTCCTGGATTGTAACGGGGCCCTGATCCCGTATTCCTA CGAATCTTGGAAAGATGACAGTATGGAAAGCAAAGAGCGCAACAGGAGTC Shows there 
are problems with annotation of CG17762; genome has exons for the additional matches, and vertebrate homologs indicate there are sections missing too \*************A. mellifera Contig1107 TTTTTTTTTTTTTTGTCCACTTATATTAGTGTTCAATTTGTTTAATAGTTATAAACGTTTGAATTTTCAGTGAATTTTA CTTATTTTTAGCAATTCAAAACTTCTAATGATGTCGAAGTTTTTATTGCTTCTTTGCTTTACTGCTGTCCACGTTCTAG GTGAAGACCAGGAAAATGTTTTATCTAAAAAGAATGATACTCTTCTCACTTCTCCCAGGACATCTTTACAAAATGATTC CGAATATTTAAAAGCAAATATTCTATCAAATAAGAATATTTCCATGACAGATTTAGGAATTATTTCAAACACAGAGAAT AAAATAAAGCAACAAGAAGTTATTAAAAATTCTCTTCAAACTTCTACAATTATTCCCAATCATTCTATTGTCATGCCAT TAGATATGACAGCTATTTCAATTTCTACAAATGAAACATCTAAAAAAATTTCGGATATAACTAATCCAATAGTTATACA TGCTCCAATAAATTCTACCTTGTCTTCTTACACAACTGGTAAATGGACAGTTGTTAATGGAACAGATCAAATTTGTATT GTAATACAGATGTCTGTAATGTTTAATATCTCTTATGTCAACATTAATAATAAGACATCTTTTATAACATTCGATATAC CAACAGATAATGTTACTACAAAAGCAAGTGGATATTGTGGAAAACTGGAACAAAATTTGACATTAGAATGGTCTGCTAA AAATATAACTAATGGTAGTATGACATTGCATTTTATGAGAAATGCAACTGAAAATGATTATTCTCTTCACCATTTGGAA GTCATTCTTCCAGCATCAGATTTTCCTTCAAATTTAAAACTGAATGGATCAGTATCTTTAGTACATGAAACACCTGATT TTGAAGTTAGATTATCTAATTCTTATAGATGTTTAAAACAACAAACACTCAACTTAAAACAGAATAATAGTAATGAGAC ATCTGGTTATTTAATTGTATCAGGACTCCAATTTCAAGCATTCAAAGTTGATAATTCTACTATGTTTGGTTTAGCCAAA GATTGCGCTTTTGATACACCAGACGTCGTACCAATAGCAGTAGGCTGTGCATTGGCAGGATTAGTGATTATAGTATTGA TCGCGTACTTGATTGGTCGTCGTCGAAATCAAGCTCATGGCTATCTTAGTATGTAATGTTAATATGTTTTTTATTTTTA ATTTTTTTCGTTGATTCAGTGAAAATTTCATTATTTAGTTTATATAATGTATTATAATGTTAGCTGCAAAACTTAAAAA AAGATATTTCCCAAAAATCATATATTAAAAATTGCAATAC Both this and the matches to vertebrates strongly indicate that the N-terminus, and certainly the C-terminus of CG3305 are missing ; a good C-terminus is encoded in the genome. \**********A. 
mellifera Contig1163 AGGAGTACGCCACCGCTGGATACTGCACCCCACATTGCACGCACACGATGTTCCCCGAGAGCGGAGTGAACATCGTTTC GGTGGTGCTGCACTCCCATCTGGCCGGTCGGCGGCTAAGCCTGAAGCATATCCGTCAAGGGAAAGAATTGCCGAGGATA GTGGAGGACAATCACTTCGATTTCGAGTACCAGCAGTCTCACACTCTGGAAAAGGAAGTGAAGGTGCTTCCGGGAGACG AGCTGGTGGCCGAATGCGTTTACGGCACTCTGGATAGAACCAAGCCCACTTTGGGGGGATACGCCGCTTCTCAGGAGAT GTGTCTCGCATTCGTGGTCCATTACCCGAGAACCCCGCTTGCCGCCTGCTACAGCATGACTCCGTTGAAACATCTGTTC AAAACATTGGGGGTGTACAGCTTCAAAGGCGTCACTATGGACCACTTGGAGAAACTCTTCCTAACGACCAGAACGGACG CAGTAACCATTCCTTCGACCGGCCAACAACAACTTCCTATCTACCCGGCAACCAGGCCTAGCGAGGACATCGACGAAGA GCTTATTAGGGAGGCCAAGTCAGCGTTGAGGGCCGTGAAGGATTACACTCTGGAGCAGGATAACGAAAATGTTTTCTCT AGATTGATCATCGAGGAACCGGAAGAGTTCAGAGGTCGAACTTTGGCAGAGCACATGCTGGCGTTACCTTGGACCGAAG AACTTCTGGCAAGGGCCATCGAGCAGAACCTGTACCACGGAAGGCACATGACTTTCTGCAGGAAGAGAGACGATAAACT CGCTCTGCCAGCAGACATACAAACGTTCCCTAATTACACGGAATTACCGGAAGCAAATGAAACGATGTGCACGGAAATG GCAAAATTATCCAATGCGTCGGGGAGGATGTCGTACCTCGATATCGCCACGTTCCTCGCG Identifies a 1500bp ORF at the C-terminus of CG13075 that needs to be included in the annotation \***********A. 
mellifera Contig1167 GCAATAGGCGAGGAAATTTTAGATTTATCTGCTATTGCGCATCTATTCGATGGACCATTATTAAAAAACAAGCAAGATG TATTTCGTCGTGATTATCTCAATGATTTTATGGCCTTGGGAAGATCCGCTTGGATAGAAGCCAGGAACAAACTTCAAGA CTTATTATCAATCAGTAATCCAACCTTGCAGGAATCTAATATTCGTTCAAATGCCTTTGTAAAACAAAATGAAGCAACA ATGCATCTACCAGCAAAAATTGGTGATTACACAGATTTTTATTCCTCGATTTACCATGCTACAAATGTGGGCATCATGT TCCGTGGAAAAGAAAATGCTCTGATGCCAAATTGGAAACATTTACCAGTCGCTTATCATGGAAGAGCGAGTTCAGTGGT CGTTTCTGGAACACCGATAAGAAGACCTTTAGGTCAAACAGTTCCGATAGAGGATGCAGATCCAGTTTTTGGCCCTTCA AGATTAGTAGACTTTGAATTGGAAGTAGCTATCTTTGTCGGAGGACCACCTACAAATCTAGGTGACGCTGTTCCAGCAT CCAAAGCTTACGATCATATTTTTGGAATGGTTACTATGAACGACTGGAGTGCAAGAGACATTCAAAAATGGGAATACAT TCCATTGGGACCTTTCGGTGCAAAAAATTTTGGAACTACTATTTCTCCATGGATAGTCACTATGGAAGCTCTAGAGCCT TTCAAAGTGCCCAATGTGCATCAAAATCCAACCCCATTCCCCTATTTACAACACAATGAATCTTGTAACTTTGATATTA AATTAGAAGTTGACATTAAATCTCCAAACGGTACCGTCACAACCGTCTGTCGCAGTAACTATAAATTCCAATACT Annotation of CG14993 needs fixing; N-terminus is missing, shown in a nearby exon by this transcript and the mammalian proteins. \**********A. mellifera Contig2454 ATTCTGATCCTAGCCAACAAGCAGGATCTGCCAGGTGCCAAAGAGGTGGGCGAATTGGAAAAGCACCTGGGCGTGCTGG AATTGGCGGGGATGCCGGGGAGCGCGTGCATCAGGGTGCAGCCGGCCTGCGCGATCACCGGCGAGGGGCTTCACGAGGG TTTGGACACTCTTTATCAGCTGATACTGAAGCGGCGCAAGCTCGCGAAGCTGAACAGGAAACGGGCCAGGTAGGCCAGG GCCTCGGAGGACTGCACGTGCGTCTTCTGATCTTGCAAGTCGCGGACAGTCTCTCCTTCGGGCACAGCCACGCCTTCCA CGGTGTCTTCTTCCTCCCCGTCACGATGCCGCGGCGCCGACCCGCTCGAGTGAACAACGGAGCTGCGCGCGCATCCAAC GTCTCCACGAACATTGCCCACTTCGCCGTGGACGAAGTCGTCTCGTCCGTTCCAACGTAGTTGGAATCTCTGTTGCGTC GCGTCGAAGATCTTTCCAAGCTTTCATCGATTCGTAGAGGATCTCCTCTTTATTCCTCTCGATCGGGAGATCTCGGTTC ATCGAGTTTCGGATGAACTCGAAACAAGGATGGAATTTGGAACGGGCGCTTTTTTAACGCGCGAGGAGGAATGTCGAAC GGGATATCCCTCTCCGCGAAGAGGAGGATAAAATAATCTAGGA Matches region annotated as CG2219; yet does not show up in the translation? \************A. 
mellifera Contig2801 TTTTTCTTTAAGTAACAATAAACATGACAACAGCACTGTATTTAGAACATTATTTAGACAGTTTGGAACATCTACCTAT TGAATTACAAAGAAATTTCACTTTAATGCGAGACCTTGATGCTAGAGCACAAGGATTAATGAAAGATATAGATAAATTA GCAGATGATTATTTAAAAAATGTAAAGAAAGAATCCCCAGAAAAGAAAAAGGAACAATTGACTCATATTCAAAACTTAT TTAACAAGGCAAAGGAATATGGTGATGATAAAGTACAATTAGCAATACAAACATATGAATTAGTTGATAAACATATTAG GAGACTAGATTCTGATTTGGCTAGATTTGAAGCTGAAATACAAGATAAAGCTTTAAATAGTAGTAGGGCACAAGAAGAA AATAATGCTAGTAAAAAGGGCAGGAAAAAATTAAAAGAAAAAGAAAAACGAAAGAAAGGTGCAGGTACTAACAGTGAAG ATGAATCGAAAACAGCTAGAAAAAAACAGAAAAAAGGAGGATCTGTTGCTTCTGCTTCATCAGCTGGAGCTGTAGGAAG TGGTGCTCAAGTAGATTCTACTGCACTTGGTCATCCAGCAGATGTTTTAGATATGCCAGTTGATCCTAACGAACCAACT TATTGCCTATGTCATCAAGTTTCTTATGGGGAAATGATAGGTTGTGATAATCCAGATTGTCCTATAGAGTGGTTCCATT TTGCATGTGTTC Encodes excellent match to N-terminus of CG9293; but alignment, along with vertebrate matches, and confirmed by cDNA LD46333.5prime, shows two and maybe three introns that need removing. \***********A. mellifera Contig379 TTGTAATCGAAGTAAATATCGTGAGTTATTCGTTTCATTTTACGGGAGAAAAAAAATTTTCTCCTAACAGTGTCACAAT GCTACAATCGTTAATCTAAACAAACTATAAAGAAAGGGGAACAAAGATGATGTTTTGCTGTTTGAGAAATTGTTTTGAC GGCCTTGGCTTTGCCGCAACTCAAACACCGAAGAGAGAACCAAATCCTATATCTTTAGACACGTCTTATATGGGACATG AAGTTGTGATAGTAAAAAATGGTCTAAGAGTATGCGGTCGTGGTGGTGCCTTAACAAATGCTCCTCTTGTCCAAAATAA AAGTTATTTTGAAATAAAAATACAACAAGGTGGTATATGGGCTATTGGATTGGCTACAAGATCCACAGATCTCAATATT ACTATTGGAGGAAATGATAAAGAAAGTTGGGCTCTTAATTATGATTCTATTATAAGGCATAATCAACAGGAAATACATA AGATTCAAAGTTCGGTTCAAGAAGGAGATATTATAGGCGTATCTTATGATCACATAGAACTTAATTTCTATTTAAATGG AAAACCAATAGGTGCTCCAGTAATGGGCATAAAAGGAACTGTTTATCCAGTACTTTATGTGGATGATGGGGCCATTCTA GACTTAATTTTGGATAATTTTATTCATCCTCCACCTACAGGTTTTGAAAAATCATGTTGGAGCAATCATTACTCTAGCA AAAAATATTATTACATAAGTATCAATCTTTTTATTCCTTGACACTAAAAGCATT This, and homologous human and C. elegans protein, strongly indicate that there are at least two in-frame ORF introns retained in the annotation of CG7785 \************A. 
mellifera Contig400 GAAAAATAAACTCAGGCCTAACAGTGGAAATGGTGCTGATCTGCCTAATTACAGATGGACGCAGACACTTCAGGATTTG GAGATCAAAGTGCCTTTGAAAGTGAACTTCTCAGCCAGGCCCAAGGACGTGTCGGTGACGATCACGAAAAAACGATTGA CCTGCGGCATCAAGGGTCAACCGCCGATCATCGACGGTGATTTTCCACACGAAGTCAAAGTCGAAGAATCCACCTGGGT GATCGAGGATGGAAAAGTGTTGCTTCTCAACCTGGAGAAGGTGAACAAAATGCAATGGTGGGCTCACGTGGTAACCTGC GATCCGGAGATCAGCACGAAGAAAGTGAACCCCGAGCCGAGCAAGCTTTCCGATCTCGATGGTGAAACTAGAGGCCTGG TGGAGAAGATGATGTATGACCAGAGACAAAAGGAACTGGGTTTGCCAACGTCCGACGAGCAGAAGAAGCAGGACGTGAT CAAAAAGTTCATGGAACAGCATCCAGAGATGGATTTCTCCAAGTGCAAGTTCAATTGAAATTCCAATTAGATAGGGGAG CAGCCGAATATTCAAGGCGTTGTTGTGAAACTGATAATACAAATGATAAAACAGATGATATATTATCGATATATCCAAT CCAGAAAAGCTCTTTATGATACTCTCTTGTTAATTGTCACCGCGGCGAAGTTTTTCTGCACCTACTTTATCGTATCGTA TTCCAAAGCGTAAAAGATGGCGCCACGATCCATGCCACGATTGAATCG This, and vertebrate matches, suggest there is an intron retained in CG9710 \***********A. mellifera Contig58 CCGCGTATTTTTAATACAATTGTTACAATATCGTTTTTTTCATTGTGGAAATAACGAATTTTCAACATGGTGCTAAGTG AAGTGAATAAATTTCTTCATGAATTAGAAAAAGCTGAATTAGAGGCGCCTGGTGGAGTTGCATCATCACAAACTTATGC TCAATTATTAGCTGTATATCTTTATCAAAACGATCTATGCAATGCCAAATACTTGTGGAAGCGGATACCAACGGATCTG AAAAGCGGAAATGCAGAACTTGGTCAAATATGGATGGTAGGACAGCGTATGTGGCAAAGAGACTGGCCTGCAGTTCATG TCGCCCTCAATGCAGAATGGAGTGAAGATGTTTCTGATATTATGGCTGCTTTGAAAGATAATGTTCGAGAAAGGGCAAT CACCTTAATATCAAAGGCTTATTCTTCACTAAGTTTAACCGTATTTGCGTCAATGACAGGCTTAACATTAGAGGAAGCG CGTCGTGTAGCAATTGAAAGGGGTTGGAACGTAGATGGAACGATGGTGCAACCTTGTAAGATTCAGAAAGAAGAGAGTA ACCTCGTGAACGAGGTGTGTCTTACTGAGGATCAGCTGTACAAACTCACTCAATTCGTGTCTTTCTTGGAAAACTGAAC AAGCAATGAAAGTCATCAATGATGAAATTCACGCAACACAAAGACACGATCGACAGTACTTTAAACTTAGTT This and vertebrate matches show that the N-terminus of CG13383 is missing, and it is there in the DNA separated by an intron. \************A. 
mellifera Contig622 AAAAAATTCGTTGAGCGGTGAAAAACCAAAATGAGAAACTCCTAATGGACCAAAGAAGGACGTGCTTATGCTGTTACTG CTGGGATGACTATGCCTTTGTGATCCAAGACGCACACCAACTCCAGCAAGGTCAGAAAATAAATCTTCAAAAGATGAAC CGCCAAAGAATTCTCTAAATACTTCTTCAGGATCCCTGAACATGAAAGTACCAGCAAAGTGTGGATCAAAGTCTTCCTT ATGTCGCCTCTTGCCACCAGGCATTTGAAGTCCTTCCTTTCCATATTGGTCATAAACCCTCCTTTTCTTTTCATCGCTT AGCACTTCATATGCCTCAGATATTTCTTTAAATCTCTTGTTTGCTTCCTCCAAATTTTCAGGATTTTTATCCGGATGCC ATCTCAACGCCAATTTTCTATATGCTTTTTTGATATCTCCGCTCGTGGCGGTTCGCTGCACTTCTAGTACCTTGTAATA GTCAACCATCGTTCACAATATTCACGCTCGGATGTTAGGGCTTAGGTAACCACCTAGAAGTGGCTCCTGCTTCTTCTTC GCCGTTCACGCTC CG8448 is annotated for this ortholog as mRNA, but not translated? \***********A. mellifera Contig78 GAAACATCAACAACTCCAGAATATGCTCAGGGATTAACTTCAATTAATCCACCTGCTACCCCCATTACTCCTGTAGCAT CTGTACAATCTTATACACCTACTACTCCAAGTGGAGTAGTACCAGTAACAACACCACAAACTCCTACAACACCAAGTAC ACCTACAAATCCGAGTACAGTCATACCTGTTACGACACCGACAGTTATTACACCTGTGGAAAGTACACCTGTTGGAGTG CAAACGGTTAGACCAGCCCAAACAGTTACACAAATTCGTATACAAACTACTGCACAACCTGCTAATGCGGCAGCAAATA CGAGAAAAGGCTTGTCTCTCACGCGAGAACAAATGTTGGAAGCACAAGAAATGTTTAGAACAGCTAATAAAGTAACTCG ACCAGAGAAAGCTCTTATTCTAGGTTTCATGGCTGGTTCAAGAGATAATCCTTGTCCAAAGTTGGGTAATATAGTCACA GTAATGCTCTCAGAAAATATAGAAGAAGTGACTCAACCGGATGGTACAACAGTTCCCATGTTGGTTGAGACACATTTCC AAATGAATTATACAAATGGCGAATGGAAGAGGATAAAGAAAAATCGACGAATTATTACAGAAGAATCAACGTCCACTAC GACTCCTACTCCCAGTGTGACGGCAACAGCTTCCAATTGAAAAAAAGATAAAAGAAATTTTTATCGAATTGCAATTATA TAGGTCAATAATTCTGTCATTTTTTGGATGACGGTGTTTTAAAAAACTCCTGTTCTATAGATAGCAAAAGAATTTGAGTG Indicates that CG5874 needs additional N- and C-terminal sequence, encoded in genome; vertebrate matches agree. \***********A. 
mellifera Contig974 TTGGATACTCTTTTCATATTCTCGTGCTCACGTGCGTTATGCAAGAATTCATGCTTCTCCAAATTCTGATTGGTTTTTC TTTCTTGGCAACGGCAATTCCGAAGCCGGAAAATGATCACAAGCCGCGAGTTATTAATAAGGAACCAAACAGTGAAGAA CATTATGTCAATTCTCAACATAATCCTGCCTATGATCATGAAGTTTTTTTAGGTGAAGAAGCAAAAACTTTTGATCAGC TTACTCCTGAAGAAAGTACAAGAAGATTAGGAATAATAGTTGATAAAATAGATAAAGATAATGATGGTTATGTTACTGG AGAAGAACTTAAAGATTGGATATTATATTCTCAACGGCGTTACATACGGAACAATATTGAACATCAATGGAAATCTCAT AATCCTGAAGAAAAAGAGAAGCTTCCATGGACAGAATACTTAGCAATGGTTTATGGAGATATGGATGAACAGGAAGCAG AAAATCACGAAAAATCTAAAGATAATACTTTTTCGTATGCTGCTATGCTTAAAAAAGATCGCAGACGTTGGACAGCTGC AGATTTAGATGGTGATGATGCTCTTACAAAAGAAGAGTTTGCTGCTTTCCTTCATGTAGAGGAAGCTGATCATACAAAA GATATTGTAGTATTAGAAACCATGGAAGATATTGATAAAGATGGTGATGGAAAAATATCTCTTTCAGAATATATTGGTG ATGTATATGA This and several Drosophila ESTs suggest there are problems with the scf gene annotation. \***********A. mellifera BB260012B10B5.F GGGTTCACTGGTGGGCCTGGGTGGTGCTAAATCAGGTGGTAATACCCCGATGAACCCATCGCTACAACAGCGGATCAAC TTCCTCCAAAGTCATCTGAGCCAAGCACCAATGCCTTCCGTTGCTACCAAGAGGCGGCAACTGCCGTCTATAGAAGAGG CTTGGAACTTACCCATTAGTGCTGAGATGTCTAGTAGACAGCAACAACAGCAACAAACACCCACTGGTCCGGGTTATAA ATATGGTTCCACTCCTTCTGGACCACCACCTCCTTATCCTCAAGGACAAGGGCAGAATCTAAATACAAAAAGATTTAAG CCGGGAGAAGAACCAATTTCTCCAGGTTCACAACAGAGACCACCACCATTTTATCTCACGTCTCAACAACTGCAGATGT TACAGTTTCTTCAACAAAATCATGGAAGTTTAACGCAACAGCAGCAAGGTTTGCTTGCACAATTACAACAACAATACAG ATGTATGCAACAACATCAACAACAAATTAGATTACAACAGCAACAAGCTGCTCAAAGAGGTTTAAGGCCAGGACAACCT GGTTATCCTACAGGTTACAATCATTCACAACTAGGACAACCTGGCGTGATCAAGAATTACGGGATACCTCAGCAACCGT TGCAACAAGGTGGAACTGTTGCTTTACAAACAGGATTCTCAGATTCTAATGTCGGTTATAACACGGCAGCAACTGGGAA CAGTCAAAC This, mammal matches, and cDNA HL02950.5prime all seem to suggest there is a region missing from CG5640 \**********A. 
mellifera BB260013A20E9.F GAAACCTAGTGCTATGATAGTTTTTTCATTTATACTTTTATCATATTTTCTAGTAACTGGAGGTATAATATATGATGTA ATTGTGGAACCACCTAGTGTAGGCTCAACAACAGATGAACATGGCCATACAAGACCTGTAGCATTTATGCCGTATCGAG TAAATGGGCAATATATTATGGAAGGATTGGCATCTAGTTTCCTTTTTACATTAGGTGGAATTGGTTTTATAGTATTAGA TCAAACACATAATCCATCAACACCTAAGCTTAATAGAATTCTTTTAATATGTGTTGGATTTATTAGTGTTATTGTCTCA TTTATTACCTGTTGGGTTTTTATGAGAATGAAACTACCGTAAGATTTATATATATATATATATATTATATATTGTAATT TATAAAAGAACAAATAACACATTAATTAATTTATAAATACATTTTTTATATTTCTTTATACATTTAAATGAGACTTATT TACATACTTTTTTGTAATATACATATATAATATAAATACATAAGAAAAATAAAAAAAAGACTAGATTTTAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAGCAAC This, vertebrate matches, and a Bombyx mori EST indicate that there is an in-frame, open-reading intron in the annotation for CG9662 that should not be there; that is, it should be coding. \**********A. mellifera BB260014B10A5.F AGCTTGCGGCCGCGTTGCTTTTTTTTTTTTTTTTTTAATCGTCTCAATACGAAAGACCACGCGGATGTGTCGTTAGTGT GCACGTGTTCTTTACGCTCGCGCAGTGTTTCGTAAAGAGAAGAGGCCAAGATACATACGTACGCGGAACTGCAGCTGAA AGGGAGGAAAAGAGAAGAAACCCAACGAAGAAGAAAAGGAAGAAGTAGAAGATGAGCAAGACGTGCGCCCGCTGCGAGA AAACGGTCTACCCGATCGAGGAGCTCAAGTGCCTCGACAAGATATGGCACAAACAGTGCTTCAAGTGTCAGGGCTGCGG CATGATCCTGAACATGCGGACGTACAAGGGTTTCAATAAACAGCCATACTGCGAGGCGCACATACCAAAGGTGAAAGCC ACCACAATGGCCGAGACGCCGGAACTGAAACGCATCGCGGAGAACACGAAGATTCAGAGCAACGTGAAATACCACGCCG AATTCGAGAAGGCGAAAGGCAAGTTCACCCAGGTTGCGGACGATCCAGAGACACTGAGAATCAAGCAGAACAGCAAAAT TATCTCGAACGTTGCCTATCACGGTGAACTTCAAAAGAAGGCCATCATGGAGCAGAAAAGGACGATGACTGGTGAAAAT GGCGAACAGATCGTGACGAATCCACCGACCAGAAAGATTGGTTCTGTAGC This and vertebrate and C. elegans matches, and several Drosophila ESTs, show that the N-terminus needs revision, including a 5' exon about 16kb upstream in genome; plus current annotation is twice as big as the other matches. \**********A. 
mellifera BB260018B20E5.F TAAAAATGCCAAGGGGAAAATGGTATTGTTCTAATTGCCACAGTAAACAACCAAAGAAGAGAAATAGTAGTCGAAGGAG TCATACCAAAGGGGGAGGCACCAGAGAAAGTGAAAGTTCTGATCATCCACCAGCTAGTCCAACGCCGTCAACGGCATCG AACACACACGTAGAGGACGTCAGTTCATCGGAACCAGCAACCCCAACTGCCTCACCACGGAAGGAGGGAAACAATAGGA CGCTCACGAAGAAACAACAACGAGAGTTGGCTCCTTGTAAGGTGCTACTCGAACAGTTGGAGCAACAGGACGAGGCCTG GCCGTTCCTCTTGCCGGTGAACACCAAACAGTTTCCTACCTACAAGAAAATTATTAAAACACCCATGGATCTCAGTACT ATTAAGAAGAAATTGCAGGATTCCGTGTACAAGTCTCGCGATGAGTTTTGCGCCGATGTCAGACAGATGTTCATCAACT GCGAGGTATTCAACGAGGACGACAGTCCCGTGGGCAAGGCCGGACATGGGATGCGCAGTTTCTTCGAAATGCGTTGGAC CGAGATTACTGGGCACCACCTCCACACCCGCAACGCATAGCTGAGGCTCGGTTCGCTCTCGCAACACCTGCAACAGTTT TGCCCACTTCTACTACTAGAGGGTAAAACCCTCGAGAG Ortholog is CG10897, is annotated as mRNA, but not translated \- Two Drosophila ESTs show it well; \**********A. mellifera BB260019A10H3.F AGATTCTCGGGAAAGCCTTGAAGTAGCGATACAATGTTTAGAAAGTGCTTATAATGTACAAGCATCAGATACTCCAACA AATTTTAACTTATATGAAGTTTATAAGTCTTCCGTAGAAAATGCAAAACCTTATTTAGCTCCAGAAGCTACTCCAGAAG CAAAAGCTGAAGCTGAAAGATTAAAAAATGAAGGAAATACTCTCATGAAGGCTGAAAAACATCATGAAGCTCTTGCCAA TTATACAAAAGCAATTCAATTAGATGGTCGTAATGCTGTGTATTATTGTAACCGTGCTGCAGCATATAGTAAAATTGGC AATTATCAACAAGCAATTAATGATTGTCATACTGCATTGTCCATTGATCCCTCATACAGTAAAGCATATGGACGTTTAG GTTTAGCATATTCCAGCTTGCAAAGACATAAAGAGGCTAAAGAAAGCTATCAAAAAGCTTTAGAAATGGAACCTGACAA TGAAAGTTATAAAAATAATTTACAAGTAGCAGAAGAAAAATTAGCTCAGCCAAGCATGAGTAATATGGGATTAGGGGGA AGTGCATTACCAGGCATGGATCTTAGTTCACTCTTGAGTAATCCTGCTCTTATGAACATGGCTCGTCAAATGTTATCCA ATCCAGCTCTACAAAATATGGTGAGCAATTTTATGAGTGGACAAGTTGAACAGGGAGGACATATGGATGCTCTTATAGA AGCTGGTCAACATTTTGCACGA Human match is much better than Drosophila CG5094; but this may be because Drosophila has several large insertions that might be unspliced introns \- also not present in B. mori ESTs! \**********A. 
mellifera BB260019A20B12.F GTTGCTTTTTTTTTTTTTTTGTGAACCTGTGAGAATAGCGTTGCTTTCTCATTCGTGCCAAAGAGAATTCTGCCTGTCT TGTGAGCTAGGATTCCTGTTTCACATGTTGGATACATCTCGAGGATTGCCGTGTCAAGCTGCTAATTTTCTTCGAGCTT TTAGAACAGTACCTGAAGCGGCAGCTTTGGGACTTATACTCAGTGATCTCCATCCGGAGGCGAAAAGGAAAACAAATTT GGTACGATTAATACAGAGTTGGAACAGATTTATATTGCACCAGATTCATTATGAAGTTTTGGAAACAAGAAAACGACAG AAAGAGGAAGAAGAAGCTGCTCGATTAAAATCAGGACCAAAATGTCCACCGT Together with a new Drosophila EST shows there is an exon missing from CG8232 \**********A. mellifera BB260019A20D5.F AAAAAAGGAATTGAACTTCTTCGTATTCAATTTCCGATGTTCTGATTTAAAGATCAACTATAATGTTAACAGTGTCATT AATTTCATTTAAACGTGTGTGATCAAGTTAATTAAAAATAAAGTGTAAATAAACAACAAAATTCTTTGAAATATTTTAA GAGAGTACGAATGTTTTATTCGCTTTGATAACGCTACATTGCTTTGTCGTTTTTAACCTAAATCGAGATGGCTGATTCA GAACAAGATTTCGGAGATCGTGGAGATAATGACAATTTAAAAACTGATAAATTATTTATCTTAAAGAAATGGAATGCTG TAGCTATGTGGAGTTGGGATGTGGAATGTGACACTTGTGCAATTTGTCGAGTTCAAGTAATGGATGCATGTCTTCGATG TCAAGCGGAGAGCAAAAAAGATGATAGCCGACAAGACTGTGTCGTCGTCTGGGGAGAATGCAATCATTCATTTCATTAT TGTTGCATGTCACTTTGGGTGCAACAGAATAATCGTTGTCCATTATGCCAGCAAGAATGGTCCATTCAACGAATGGGAA AATAACTAAATCAATCAAGCGAAACTTCAATACTTATTTTGTTTCCTTTTTGTTTCGTTATTTTCTGCTTATTTTTCCT TCTTTCATTTCTTTCTCCCTTCTCTCTCACATATACACACATATGCACACGCACATACACATACACTCTCTCATTCACT CACTAAGTGG NEW GENE \- There is a Drosophila testis EST that has excellent match, but gene is unannotated. \***********A. 
mellifera BB260021A20F4.F GCGTTGCTTTTTTGTCTGCTCTTGATAAGATGGTTTCAGATAATATACAAGATAGAATGCGAGATTCAGTAAAACCACA ACAAGTAGATATTTCAGTTCCTTTACATGTAAAAAGTACTAAAAAAACATATGAACAATTGCAAGAAAGACCTTCTGAT AATAGTACAGTTGATTTTGTACTTATGTTGAGAAAAGGTAACAAGCAACAATATAAAAATTTAGCAGTTCCAGTATCAT CAGAATTAGCAATGAATCTTCGAAACAGAGAACAAGAACAGAAAGAAGAAAAAGAACGAGTTAAAAGATTGACATTAAA TATTACAGAAAGACAAGAGGAAGAAGATTATCAAGAAACAATTAATCAGAGTACCAAGCCAGTAACGGTAAACTTGAAT AGAGAACGGCGACAAAAATATAATCATCCCAAAGGTGCACCAGATGCCGATCTTATTTTTGGTCCTAAAAAAATACGGT AGATTTAATATTTTAATGTTTTTGGGACAAATTAATACTTTCCTATTAGAAAATCACAAGTGATCTAAGTTATGGACTA TTTGAAGTCCATATTTTTGTGTAGAATTTATAACAAATTAATAATTTTTATTTTTAATTTAGAATACATATCTACATAT ACATTATTTAAACGAATTTTCAACCATACTATATGATTTTTGGTGTAAAAATACATTTTACAATTATATT Comparisons with this and human homologs strongly suggest there is an extra intron near the C-terminus of this protein \**********A. mellifera BB260021B10D9.F GAATTTTCAACATGGTGCTAAGTGAAGTGAATAAATTTCTTCATGAATTAGAAAAAGCTGAATTAGAGGCGCCTGGTGG AGTTGCATCATCACAAACTTATGCTCAATTATTAGCTGTATATCTTTATCAAAACGATCTACCTGAACGAGAGAGgttt cgatctgtctttgtttgatcgaaaaatcgccgtcgcagATGCAATGCCAAATACTTGTGGAAGCGGATACCAACGGATC TGAAAAGCGGAAATGCAGAACTTGGTCAAATATGGATGGTAGGACAGCGTATGTGGCAAAGAGACTGGCCTGCAGTTCA TGTCGCCCTCAATGCAGAATGGAGTGAAGATGTTTCTGATATTATGGCTGCTTTGAAAGATAATGTTCGAGAAAGGGCA ATCACCTTAATATCAAAGGCTTATTCTTCACTAAGTTTAACCGTATTTGCGTCAATGACAGGCTTAACATTAGAGGAAG CGCGTCGTGTAGCAATTGAAAGGGGTTGGAACGTAGATGGAACGATGGTGCAACCTTGTAAGATTCAGAAAGAAGAGAG TAACCTCGTGAACGAGGTGTGTCTTACTGAGGATCAGCTGTACAAACTCACTCAATTCGTGTCTTTCTTGGAAAACTGA ACAAGCAATGAAAGTCATCAATGATGAAATTCACGCAACACAAAGACACGATCGACAGTACTTTAAACTTAGTTNTACC TCGTAATCATATTATATAAAGGAGGAAGTGTGCTTCAAAAGAGACACCTGTGTCCCTTNCCCTAGTATACACCCCGGAC TACCTACGCGTNCTGCTCATTCAATCGGAAGACTCGCGATGAAGCT Comparisons with human COP9 homolog show that the N-terminus is missing from the Drosophila homolog annotation, CG13383, and there is one available in the genomic sequences. \**********A. 
mellifera BB260021B20G3.F ATTTCCACAAAATTTATTAGCTTTGGCACCATTGCTACGTACATTGGATTTATCGGAAAATGAATTTGTTCATATTCCC GATAATATTGGTAATTTTACGTTATTAAAGCTATTGAATGTTAATCATAACAAATTGACAACTTTACCCGAAGCACTTG GAGCATTGACAAAATTAGAATGTTTAAATGCAAGTTCGAATCAAATAAAAACTATCCCATGGTCATTGTCAAAACTAAC ACGATTGAAACAAGTCAACTTATCTGATAATCGTATAACCGAATTTCCTCCTATGTTTTGTGATTTAAAATTTCTGGAT GTGTTAGATTTATCGAAGAATCGAATTACGACAATCCCTGATGCGGCTGGAGCGTTACATATAGTTGAACTTAACCTCA ATCAAAATCAGATATCAACTATATCTGAGAAATTGGCGGAATGTTCGCGCCTAAAAACATTAAGACTTGAAGAAAATTG TTTACAACTGAATGCAATACCTAGTAAAATTTTGAAAAATTCTAAAATTTCAGTCCTGTCTGTTGAAGGAAATTTATTT GAGATGAAACAATTTGCTAATCTTGATGGTTATGATAACTATATGGAAAGATATACCGCTGTAAAGAAAAAACTCTTTT AAGAGATATTTTAAATGAATATTTATTATTGAATCTATTATAGGTAATTATTATACATATATAATTATTTTATAATATT GAAAAAGATGCCGCATCGTGTTCGCGTAATTATACTCGATACCTGCGATAATATAATATAGAAATAAATTTTCAATTAA TTGATAAT This and vertebrate matches and cDNA GM01152 indicate that major reannotation of CG3040 is needed. \***********A. mellifera BB260024A20E12.F AGAAAAATCAGATGTTTGTTTAATTGGTCTACATGCTTGTGGAGATCTTAGTATACATGCATCAAAAATATTTCGAGAT ATGAAAATAGCACGTATTTTTATTTTAATTCCTTGTTGTTATCATAAGCTTTCAATATCAAAAAGGATAAGAATAAATA CATCAAGTGAAAAGCAATACTTTAATAATTTTCCTTTATCTAATTGTTTTAAAACTATTATTAATAATACTAATTTTGA TATTGGTACTTTTTTGAGGCAACCTTTTTTACGACTAGCATGTCAAGAACCAGTAGATAGATGGTATAACATGTCTATT GAAACACATAATAAACATTCTTTTTATGTTCTTGCAAGAGCTGTCCTTCAATTGTATGCAACTAAAAATGGATTTTCTC TTAAGAAATGTACTCAAAAAGGAACAAGAAAATCACAATGTTTAAATTTTGAAACATATATTAAAGATGCATTGACTAG GTACATTTTACAACCACAAGAAAAAGAAACATTCAAAAAACAAGATGTAGAATTTAATCTTGATACACATAAAAGAAAT ATAATAGAATTATGGAAAAATCATTGTGATAAATTTAAAATTGTAGAAATATATACTGGTTTACAACTGATGTTGCAAG CACCAGCAGAATCACTTGTTTTACAAGACAGATTATGTTGGATGGAAGAACAAGGTGGATTTCCTAATGATTGTCTGGA GTTTGTTGCTGAATTCCAG This, a B. mori EST, and the human, Arabidopsis and C. elegans orthologs indicate that the annotated N-terminus of CG8447 is incorrect; it should include another \+200 aa. \***********A. 
mellifera BB270001B10B9.F ATCGCGATATAAAGGTGTTCCTGTAGAATATCTCGTGAGTTACATTAACCGAGTCATTCTGTCATCTGTTTCAAAGAAG AGAGAGGTCGGTGGCTCTGTGTATTAGTGTACAATCGCGCCATGGGGCATCAGTGTTGGTTCTTCACAAACGACGGGAC CTGAATGTGTGCATATTTCTCCTGGCTGACTGGCCTGCCTTCCACCGTTAAGCTGCATTCACAAAGGAAGCCGATTTTT GGGCAAGGACCGTATCTCGGTCGTTCTTGCCAAGAGTACGAATCGAACCGGTTAATGAGGACTATCTGGCTTCGATCGT CCAATCAGCTAGCAGCTAGACTGGACGAGCATAATCGGATGAAGTTGGGCCAAGCATGCCCTCTAAGAAGCAATATAAT CTCGTACATAATGACGAGTACGACACGAGGATACCACTGCACAGTGAAGAGGCATTCCACCGTGGAATTGTCTTCCATG CCAAGTTCATCGGCTCTATGGAGGTTCCTCGACCGACCAGCCGAGTGGAGATCGTGGCGGCGATGCGAAGAATCCGCTA CGAGTTCAAGGCCAAAGGGATCAAAAAGAAGAAAGTGACGCTGGAGGTATCCGTGGACGGGTTGAAAGTCACTCTTCGA AAGAAGAAGAAGAAGCAACAGCAGTGGATGGACGAGAATAA This EST and its mammalian orthologs suggest major problems with the annotation of CG17357 and CG3179 \***********A. mellifera BB270004A10C3.F GTTGCTTTTTTGCATCTTCATAGGATTGTGGATGTGCATTCCATTCGCTTGGACCAATCCAAAGGTGCAATCTCTCAAG TCTATGGAGGTGGATTGGATCGGTGAAGTGAAGCCTGGGGAATACTGGTCTTACGTGGATTATGGTCTTTTGTTGATAT TCGGTGGTATCCCTTGGCAAGTATATTTCCAACGTGTCCTGTCCTCGAAAACTGCTGGAAGAGCGCAAGTGTTGAGCTA CGTAGCCGCGATAGGGTGCATTATCATGGCCATACCACCTGTCCTGATCGGTGCACT This EST and vertebrate matches show problems with the C-terminus of CG7708 \***********A. 
mellifera BB270007B20H3.F GTTGCTTTTTTTTTTTTTTTTTTTATTAAAACATATAATTCATAGTAATTAATCAATAGATTCATATTGCAGCGTGAAA CATGTCGAATAAATTTGAAGCGTTTGCAAGTGTGGAGCAGTTTTGGAGTCTTTACAGTCATTTAGTCCGGCCATCAGAA TTAACAACATCTACAGATTTTCATCTTTTCAAAGTTGGCATAAAACCAATGTGGGAAGATGAGGCAAATCAAAAAGGTG GTAAATGGATAGTACGATTAAGAAAAGGTTTAGTTTCTAGATGTTGGGAAAATCTTATATTAGCTATGTTAGGAGAACA ATTTATGGTTGGAGAAGAGATATGTGGAGCTGTTGTATCTATAAGGTTTCAAGAGGATATAATATGTGTATGGAATAAG ACTGCATCTGATTATGCAACAACAGCACGTATTAGAGATACATTAAGGAGAGTTTTACATCTTCCAGCAAGTGCCTCAA TGGAATACAAAACTCATAATGAAAGTTTAAAGAATGTTCATCGGCTCTAAAATCTTGTGATGTCAACTCAAAGATTTGA ATTCTTTATGGATTTTCAGCCAATTGATACTTGTT This unspliced EST and its vertebrate matches indicate there is a problem with the C-terminus of CG10716; the appropriate sequence is available in the genome sequences. \**********A. mellifera BB270021A20F1.F ATTATTGTGAATATTGTGATAGATCGTTTAAGGATGATCCGGAAGCCAGAAAAAAACATCTTTCAAGTTTGCAACATGC GAAAAATCGTGCAGATCATTATAATATGTTCAAAGATCCAGAAATTATTTTAAGGGAAGAATCTACAAAGATACCATGT AAATGGTATTTAACTAATGGTGAATGTGCATTTGGCCTTGGTTGCAGATATTCCCATTATACTCCTCCTATGATATGGG AACTTCAACGTCTTGTTGCTATGAAAAATCAATCAAAGTTGAATATAAATCTCGAAAATGGCTGGCCAAATCCTGACGA TATAATTAAAGAATATTTTGAGAATAATACGAGCACAAGCACTACAGATGATTTTACGTATCCAAATTGGCACAGACCA TCGGAGCTACATGATTATTCTATGCTATCACCATCGTTATGGCCTATTACGCCTGAAAGTTTAGCAAATACTACAAGAT TCGAAGAATGGGGTTAAAAATGTAAGGAATATAATGTTCCTTGTTAAATTAAATGAAAATACATATATTTAAGACTCGT CCAAAAGAGTAAATAATATATGATATATAAACAATTTAAATATATAGTTTTTCAAAAAACTGAATATTTTGTTAATAAA TGAACATATAAAATTGGGAAATGTGTTCACTTTCTTAAATAGGAAATATTATTGTTAATATTATATTGTAATAATATTA TT Bee translation Y C E Y C D R S F K D D P E A R K K H L S S L Q H A K N R A D H Y N M F K D P E I I L R E E S T K I P C K W Y L T N G E C A F G L G C R Y S H Y T P P M I W E L Q R L V A M K N Q S K L N I N L E N G W P N P D D I I K E Y F E N N T S T S T T D D F T Y P N W H R P S E L H D Y S M L S P S L W P I T P E S L A N T T R F E E W G Z NEW GENE \- completely unannotated \- between CG5105 and CG5118 \- also mammalian matches AE003587 
ATGGGTGGCAAAAGTTATTATTGCGACTACTGCTGTTGCTTTCTGAAAAACGATCTGAATGTGAGGAAATTGCACAATG
GTGGTATTGCACACGCAATTGCAAAGAGCAACTATTTGAAGCGTTACGAGGgtaaagcttttgttgcacgatattcccg
aaaaagctgaattttcaatatttgtagATCCCAAAAAGATTTTGACTGAAGAGCGGCAGAAAACTCCTTGCAAGCGATA
CTTTGGCAGTTACTGCAAGTTTGAAACATATTGCAAGTTTACCCACTATAGTGGCGATAATCTACGGGAACTGGAGAAG
TTGGgtgcgtgataaaacgcagatttaaaacgaaaaataacttcttcttgaaaactttcagTTCTCGCTAGAAAGAAGA
GAAAATCCCGAAAGAAAACCAACAAATGCAAGAGATGGCCCTGGAAAACTCATCTGCGAAAGGGATTACCCCCTTCCTT
GCAACCCATTAACCCGGAAAAACTCAAGCAAACCGACTTTGAACTCAGTTGGGGCTAAATATATTTACAGAATGCACAC
TTTATGTCAAGTATTCAGTATGCTAATTACTTTCTCCATGACGCGCGCCACTTTCAGGTTG

translation
M G G K S Y Y C D Y C C C F L K N D L N V R K L H N G G I A H A I A K S N Y L K R Y E
\---------------------------1---------------------------
D P K K I L T E E R Q K T P C K R Y F G S Y C K F E T Y C K F T H Y S G D N L R E L E K L
\-------------------------------1-------------------------
V L A R K K R K S R K K T N K C K R W P W K T H L R K G L P P S L Q P I N P E K L K Q T D F E L S W G Z

Nice small gene with two short introns

Fly
MGGKSYYCDYCCCFLKNDLNVRKLHNGGIAHAIAKSNYLKRYEDPKKILTEERQKTPCKRYFGSYCKFETYCKFTHYSG
DNLRELEKLVLARKKRKSRKKTNKCKRWP------------------------------WKTHLRKGLPPSLQPINPEK
LKQTDFELSWGZ

Bee
YCEYCDRSFKDDPEARKKHLSSLQHAKNRADHYNMFKDPEIILREESTKIPCKWYL-TNGECAFGLGCRYSHYTPPMIW
ELQRLVAMKNQSKLNINLENGWPNPDDIIKEYFENNTSTSTTDDFTYPNWHRPSELHDYSMLSPSLWPITPESLANTTR
FEEWGZ

\**********A. 
mellifera BB270024A20E1.F AACAGGAAAGGCTTGACGCCGCTCTACTACAGCGTCATCTACAAAACCGATCCGATGCTGTGCGAAACGTTGCTCCACG ACCACGCGACGATAGGCGCCCAGGATTTGCAGGGATGGCAGGAAGTGCATCAGGCCTGCCGCAACAACCTGGTCCAACA CCTGGATCACTTGCTCTTTTACGGTGCCGACATGAACGCGCGTAACGCGTCCGGCAACACGCCGTTGCACGTATGCGCT GTGAACAACACGGACTCGTCGTGCATACGCCAGTTGCTGTTCAGAGGCGCGCAGAAGGACAGCCTGAATTACGCGAACC AGACTCCCTACCAGGTGGCGGTGATCGCTGGGAACATGGAGCTGGCCGAGGTCATTAAGAATTATCAGCCGGAAGAAGT TGTACCGTTTAAAGGGCCGCCACGCTACAACCCGAAACGGCGCTCGGTGGCGTTCGGAGGCACGTCGACGATGACCACT AGCTGCTCGGCCAGCAACTTGGGCACCCTGACCAGGATACCGTCCGCCGAACACCAACACGCCTCTGGCGGAACAGGAG GAGGAGGAGGAGGAGGGGGAGGTGGTAGCCTGACCAGAACGATCTCAGTGGAGCAGTACGCGGTGACGAGAGTACCGTC CGCCGAGCAATACGCGACCGGCAACCTGACCAGAGTACCGTCCACGGAACAATACGCCAGCCCTATCGCGACCGCGAC This EST and its vertebrate matches suggest there is a major unannotated region upstream of CG8122 that should be added to it. \**********A. mellifera BB270025A20A3.F AGTTCTAGATCTTGCCACTGAAACTGCTACTGCTGTAAGAGAAACAAGTAGAAGTGCTCATCGTACGATACCAAAACGC GATAGACCTCCTCGTGTGGCAAGTGGTTCTGCTGGTCTATTACCACCCTATAATCGCCAACAAGCAGAGGGCCAAGAAT TTCTTTATATAATAAATGAACATAATTATTCAGAATTATTTGTGGCATATGAGTGTTTACGTAGTGGAACGGAGAATCT AAGAATTCTTGTTTCTAATGAAAGAGTTCGAGTGATTTCCGGAGGTACCAAAGGAGTTGTAACCGAAGTCAGTCTAGCG GACTTATTATATTGTCAACCAATGCATAAGCTAGAAAGTAATGGTGTTACTTTATACTATATTGAATTAATATCTAGAT CAGATTCAACGATAACCGTTAACATGGACGGTCCAGAACTTCTAAGAAGACCTAAAGTTCGATGTGACAATGAAGAAGT AGCCAAAAGAGTATCGCAGCAAATTAATTACGCTAAAGGAATGCACGAGGAACGTAGCTTGACTCTTTCTTCTTCGGAT AATATGTTAGATGATGTACAGTACTATAAGTAGTTACAAACAATCATATATGAAAATTTATTTTGTATTTGACAAAAGT TTGGAATCACTATGTTTTTACAAAAAATTTTTATGGGAAATTAATGCATTAAAATATTTTCATTTCAATGTTAATTCC This EST plus its human match show that the C-terminus of CG11003 is incorrectly annotated. \**********A. 
mellifera BB270031A10B1.F GAAGACAGAAAGCTGTTCGTGGGAATGCTCAGCAAGCAACAAACAGAAGACGATGTCAGACAGTTGTTCACTGCCTTTG GCACAATAGAGGAGTGTACCATCCTCCGAGGACCTGACGGCAGTAGCAGAGGCTGTGCATTCGTAAAACTTTCATCGCA TCAAGAAGCGCTAGCGGCGATCAATACCTTACACGGTAGCCAAACTATGCCGGGTGCGTCATCCAGCTTAGTGGTAAAG TTCGCAGATACTGAGAAAGAGAGACAACTAAGACGCATGCAGCAAATGGCCGGGAACATGAGCCTCCTTAACCCTTTCA ACGTCTTCAATCAGTTCGGCGCTTACGGCGCTTACGCTCAGCAGCAAGCAGCCCTGATGGCTGCGGCAACGGCACAAGG GACGTATATCAATCCAATGGCGGCATTGGCACACGTTGGCGCTGGCCAACTGCCGCACGCGTTGAACGGCATGCCAAAC CCCGTCGTTCCACCGACTTCCGGTTTGCTCGTAGGTACCGGTACAGGGCAGCCTGTTAACGGGGCGATACCGTCGTTAC CCA This EST suggests that CG12478 connects to CG10046. \**********A. mellifera BB270032A10D3.F TCGTCCACTAACGCGGGAGGAGCGGGTGCACCCAGCGAAGACAATACGCAGATATTGGTGATGAACAATTACTTCGGTA TCGGCCTCGATGCCGATCTTTGTTTAGACTTTCACAACGCCAGGGAAGAAAATCCGAATAAATTTAAAAGCAGATTGCG TAACAAAGGGGTGTACGTAACCATGGGTTTGCGAAAAATGGTAAAACGGAAACCGTGCAAAGATTTGCACAAAGAGATA CGACTGGAAGTGGACGGGAGACTCGTCGAATTACCTCAAGTCGAAGGAATAATTATTCTAAACATTTTAAGTTGGGGTT CTGGGGCAAATCCTTGGGGACCAGACATCAAGGAAGACCACTTTCAAACACCGAATCACGGGGATGGGATGTGGGAAAG TTGGCGAAGTCACGGTGTTTGCATCTTTGGACAAATCCAATCTGGTCTCCGTACAGCGATGAGGATAGCACAGGGTGGA CATATAAAAATTCATTTGTACTCCGACATACCAGTGCAAGTAGACGGAGAACCATGGATCCAGAGTCCGGGGGATATCG TAGTTCTGAAATCGGCACTGACGGCCACCATGTTGAAGAGCATAAGATCAAGCGTCGGAATACCGAACCTTCGATTCCA CCCGCTAATGGGGTGGGGGGCAAGAGCTCGGACGAGTGTCGCGACGAAAGCTTGATGCCGCAGTTTCCGCGCTA This EST and its nematode and human matches show that there is a long C-terminus to CG5875; and there is a Drosophila EST. \**********A. 
mellifera BB270032B10E8.F TTCAATAACACCATCAACAGAAAACTCAATTAGTCCAGAACCTGAGATTAAACCATTGACAGACATTAATATTAATCTT CATGATATTAAACCAGGTATTAATCCACCTATAACAGTAATTGAAGAAAAAAATGGTATATCCGTAGTGCTTCATTTTG CTCGAGATAATCCAAGAAAAGATGTATTTGTTGTAGTGATTACAACAATGAGTAAGAATTTGAAGCCACTTACTAATTA CTTGTTTCAAGCAGTGGTGCCAAAAAGATGTAAATGTAGACTTCAGCCACCTTCTGGAACAGAATTGCCTGGTCATAAT CCATTTTTACCTCCATCTGCAATTACTCAAATTATGTTAATTGCAAATCCTACCAAGGAAACGGTATCGTTAAAATTTA TGTTAAGTTATACTATGGATGATGAAACTTTCACAGAAATGGGTGAAGTAGAAAAATTGCCTTTAGTTTAAAGTACTTA AAGTGTAATTATAATATAAATTAAATTCAAGAATTACAGACTCTAATAGCCAAAAGAAGAATATATTGTTTTTTAAGAT ATGAATTTTCAAAAGATATTGTTTCTTTTACTTTATTTCTCAGATTGTAATTACAGTTTTATCTTTATTATTATATTCT GAGATATATTAATGTTATGTAGATATTTATAAGTCATGTTGTGATTACATAACGAAAATATCAATAACAATCTATTTTA AGATCGA This EST and mammalian matches indicate the C-terminus of CG3002 is not annotated \- it is there in the genome. \------------------------------------------------------------------------------ --
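
The conceptual translations given above for the AE003587 region (introns in lowercase, stop codon written as Z) can be checked with a short script. What follows is only a sketch under stated assumptions: it assumes that lowercase letters mark intron bases, that the standard genetic code applies, and that the reading frame starts at the first uppercase base. The function names are made up for illustration and are not part of any annotation pipeline.

```python
# Minimal sketch: translate a genomic sequence whose introns are marked
# in lowercase (assumed convention), using the standard genetic code.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def strip_introns(seq):
    """Keep only uppercase (exon) letters; drops lowercase introns and spaces."""
    return "".join(ch for ch in seq if ch.isupper())


def translate(seq, stop_symbol="Z"):
    """Translate the spliced sequence in frame 1, stopping at the first stop codon.

    Codons containing ambiguous bases (e.g. N) are reported as X.
    """
    coding = strip_introns(seq)
    protein = []
    for pos in range(0, len(coding) - 2, 3):
        aa = CODON_TABLE.get(coding[pos:pos + 3], "X")
        if aa == "*":
            protein.append(stop_symbol)
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # First 33 bases of the fly genomic sequence listed above.
    print(translate("ATGGGTGGCAAAAGTTATTATTGCGACTACTGC"))
```

Run on the first 33 bases of the fly sequence, this gives MGGKSYYCDYC, which agrees with the start of the translation listed above. A toy input with an intron, such as ATGgtaagtcagAAATAA, gives MKZ.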