Subject: Third set of annotation suggestions from bee ESTs

Dear Gillian,

Attached is a TEXT file with the last set of suggestions for annotation improvements deriving from our honey bee EST project.

Thanks,
Hugh

Hugh M. Robertson
Professor
Department of Entomology
University of Illinois at Urbana-Champaign

\------------------------------------------------------------------------------

\**********A. mellifera BB260007A20E1.F TGGCTTTTAAAATAAATAATTTTTAGATTTTATCAAAAAGGTTAGTAAGTCCGGTCAGGTACGACGAGAATTTAAGTTT TCCAGTGCAGAATGTTTTGGAAAATGGTTGGTGAACCGGAGAACAACAAGCAGAGCCATTTAAAGCTATAAGATTTAAT CTGAACGGAAATGGAAAATTATTTTCCCAAGGTGAAATGGAAAATTAATTTCCCAATCATTCAGTTTAATAGATTTTTA TATTTTAAATATCATGCTATTATTTTATTTTACACAGATTAAGGGAATTGGATAGAGATGAACTTTTCGAAAAAGCAAG AGGAGAAATCTTGGATGAAATTGTAAATCTGTCCCAAGTTTCGCCTAGACACTGGGAAGAAGTGTTAATGGTCAGAATT TGGGATAAAGTTAGTATGCACGTATTTGAAAACATCTATCTACCCGCGGCTCAAAGCGGAAGTCCAAGTATATTTCACA ATAATTTTATATGATTTTCCCTTATGAATATTTACTATTATAATTATCATTATAATTATTTTTCTTTGTTATAGGTACA TTTAATACTACCGTTGATATAAAACTTCGTCAATGGGCTGAACAACAATTGCCAGCTCGAAGTGTCGAAAGTGGTTGGG AATGTTTACAACAGGAATTTCAACATTTTATGAATCAAGCAAAACTCAGTCCTGATCATGATGATATTTTTGATAATTT AAAAAATGCAGTAGTCAGTGAAGCTATGAGACGTCATTTTT Unspliced transcript, yet clearly shows that CG8479 is missing an exon, the sequence of which is also present in mammalian and nematode orthologs \**********A. 
mellifera BB260009B10G1.F RC GGTCATAGGGGTCAGGATCGAGGGTTATGCCGAATTTATCTTCGTTATGCTCAAAATTTTTTTCAATGAAACAATATAA TTAATGAATTTAAAAGTTTTAGAAATCCTGATTTTCAAGAAGCAGTTTTTCAAAAAGATGGTCAGATACTCTTAACATT TGCAATTGCTAATGGATTTAGAAATATACAAAATCTTGTACAAAAATTGAAGCGTGGAAAATGTCTATATGACTATGTT GAAATTATGGCATGTCCTTGTGGATGTCTTAATGGAGGAGCACAAATTAGACCTTTAAATAATGTTCAATCACGTGAAC TGGCATTAAAATTAGAATCTATATATCGAGAACTTCCTCAAAGTGATCCTGAACAAAATTTAATAGTGAAAAATTTATA CAAAAATTGGTTAGGAGGAGAATATACAGACAAAGCTTTAGCATATTTTCATACTCAATATCATGAAATAAAGAAAGTA AATACTGCTCTTGCTATAAAATGGTAATTTATGATAAATATATATTATTATTAATTTAAATTAGAATTTTTAAATACAA CTAATATCAAATTAATATGATTTAAAAATATAGTTGTATGATTAGATGTATATTTATTCTATTACTATTATATATTAAA AAACCTTAATTGTTGGCAAATATTCATTAAAGAATATAAAAGACAAATTTTTAATAAATATTTATATCTTTTATAAAAA AAAAAATAATTTAATAATATACATGTAAATATAAATTTGATATTATATAGAGTATAATGTATTTAATGTAAATTATCTA ATTATAAGGATATAAAAGAAATTTC The vertebrate matches for this are about 476aa, while the Drosophila CG17683 is only 2141, but when searched with vertebrate proteins, the Drosophila genome shows the N-terminus in an exon just upstream of CG17683 \**********A. mellifera BB260011A10D4.F AATATTCCAGAAGGAACTCATCAATATGATTTGATGAAGCATGCAGCAGCCACTTTAGGTTCTGGTAATTTAAGACTTG CAGTTATGCTTCCAGAAGGTGAAGATTTGAATGAATGGGTTGCAGTAAATACTGTTGATTTTTTCAACCAAATCAATAT GTTATATGGCACTATCACAGAATTCTGTACTGAAGAAAGTTGTCCCATCATGTCTGCAGGACCAAAATATGAATACCAT TGGGCTGATGGGCATACTGTCAAAAAACCAATAAAATGTTCTGCACCAAAATATATTGATTATTTAATGACTTGGGTAC AGGATCAATTAGATGATGAAACCTTATTTCCTTCAAAAATTGGTGTTCCTTTTCCAAAAAACTTCTTGTCTATTGCTAA AACAATATTGAAAAGACTATTCAGAGTATATGCTCATATTTATCATCAACATTTTAGTGAAGTCGTTCAACTTGGCGAA GAAGCACATTTAAATACATCATTCAAACATTTTATATTTTTTGTTCAAGAATTTAATTTGATAGAAAGAAGAGAATTGG CACCATTGCAAGAATTAATAGAGAAGTTAACAGCAAAGGATGCTCGATGAATGATCTTGATAGTTAACTGAAAACTCCT CCAAAGAATCTCACTGTGCTATCTTTTTAACGTGTAATTGTGCATGTTATTTATTAATCCAATTCTTGCT This and excellent mammalian matches show that the N-terminus is missing from CG13852 \**********A. 
mellifera BB260015B20F5.F GTTGCCTTTTTTTTTTTTTTTTTTTTTTGCTCTGTACTTATTTGAATCTGTAGCTAACAGAGAGCTAATTTCAGAGATA TACTTCGCTATATAGAGGAGGATATTCTACCAGAAATGCATGTAAAATTTGGCCAAGAAATTCTATATTTAGAAGGATG GTGTGCACGAACTCAGTATAATGCGTGCTGTAGATTACTTGGTCCGGGAATAAATATTCATCTTGCGGAAAATCAACTT CTGCGCGAAATTTTTCACCTTGGTAACAAAGTTATGCCGATACAATTGAATCAAAAGACAAGTAAATTAGAAAGGACAT TAATGAATGCAGCTGCATTCAAAGCGCGCACTATTCAGCGTAATAAAAACCGAGACAAACGATCAGCTGCCTTGGCACC GTGATTCTTCAACTTTCTTTTTTTATATATATTAGATAATTTCTTTTAAAATAAGAAAAAAAAAAGCAACGC This EST and the vertebrate matches show that CG3098 and CG15401 need to be fused. \**********A. mellifera BB260017A20C9.F CCAGGATCTCAATTGCCGAGATGTCGTACCGCGGAATGCTGAGCACCTCGGACGGTTCGCCCGTCGAGCTGCGCGTCCA GGATGTGGACTCCTTTGACGGATCGATGATGTCCCATGCGGCCACCAATGCGGCCGGTCCGCTGGGCAGCAAGGCCAGC GGCAAGTCCCTCAACGCCTGCACCTCCATGTCCATGCCCCAAAAGCCGAGTTCCCTAAGGGCGACACCCAGCCGCTGCC CATGTGATCTCGCCGATTCACCGGTGGATGAGAAGGCCGAAGCTGATGTGGAGGTGACCAACGAGATCTCGGCGCCCTA TGAAGTGCCCCAATTTCCCATTGAACAGATCGAGAAGAAGCTGCAAATCCAGCGCCATCTCAATGAAAAGCAGGCTACA GGACCTAGACCAGTGGCCACAGCCGCCGTATTGGCCACAAACAGGGAATCATCCAGCAGCACCGAGGGCCGCGAGTCCG CCGTTACTATGGAGAGAAACGATGCGGACATTAACTTTCAACGGGTCTCAATTTCGGGTGAGGACACCAGTGGCGTACC GCTCGAGGATTTGGAGCGGGCCTCGACCCTGCTGATCGAGGCACTGCGCCTGCGTAGCCACTACATGGCCATGTCGGAT CAATCGTTTCCCTCGACCACGGCTCGATTCCTCAAGACAGTGAAGCTCAAGGATCG This EST and vertebrate matches and then a whole set of old and new Drosophila ESTs show that CG15762, CG11065, and CG11058 need to be fused. \**********A. 
mellifera BB260018B20A5.F GGGAGTAAATTCTGGTGCTCATCAGCTTGACCAGTTCAGCACAACAGAATTGACGCCCGAACATCAGAAGATTCTAATA GACATCAGACGAAAGAAGACGGAACTGCTACTCGAAATACAACAACTGAAAGATGAGCTGGGTGAAGTGGTGGCTGAAA TGGAGGCGATGGAGGGCGGTGGTCTCGCCGTCGATGAAACCAAGCCATCGAATAAGGCTAAGCAGACGTCGATCGGCCG TAAGAAGTTCAACATGGACCCTAAGAAGGGCATCGAATACCTGATCGAGCACAATCTGCTAGCGCCGACGCCCGAGGAC GTCGCCCAATTCCTCTACAAGGGCGAAGGTCTTAACAAAACGGCGATTGGCGATTACCTTGGTGAAAGACACGACTTCA ACGAGAGGGTGTTGAGGGCATTCGTCGAATTGCATGATTTCACCGATTTGATCCTCGTACAAGCTCTTCGACAATTTCT TTGGTCGTTCCGAATGCCCGGCGAAGCGCAAAAGATCGATGGATGGAGGAATGCTTCGCACAAAGGGACTGGCCGGTGA AATCCCATTTTTTACCAATTCCGACCCTTGGTACCTGGTCAGGTTGCGTTTATATGGTGGGAACCTCTTTTCCCAATCC GAGGGCAAGGATAACCTACGGGGAG This and vertebrate matches show that CG11633 and CG11628 need to be fused; indeed, a Drosophila EST from clone AT31091 already unifies them. \**********A. mellifera BB260020A10B2.F GCTTTTTTTACTGGTGGAGAAATAGTCAGTACTTTTGATAATCCTGATATGGTGAAACTTGGCAAATGTGATCTTATTG AACAGGTGATGATAGGTGAAGATACTTTATTACGATTTTCAGGAGTTCCTTTAGGAGAAGCTTGTACAGTAATCATTAG AGGTGCAACCCAGCAAATTCTTGATGAAGCTGAGAGGTCTTTACACGATGCACTTTGTGTATTATCAGCTACCGTTCGC GAATCAAGAATTGTTTATGGAGGAGGATGTAGTGAGATGATAATGGCTTGTGCTGTTATGAGAGCTGCTGCTTCTACTC CTGGAAAAGAATCAGTTGCAATGGAATCTTTTGCTCGAGCATTGCAACAATTACCTACTGTTATTGCTGATAATGCTGG TTATGATTCAGCTCAATTAATTAGCGAATTACGAGCTGCACATAATTCTGGTGCAAATACCATGGGTTTGGATATGGAA CGAGGAAAAGTAAGCTGTATGAAACAATTAGGAGTAACTGAATCTTGGGCTGTGAAGAGACAAGTTTTACTTAGTGCAG CAGAAGCAGCAGAAATGATTTTACGTGTGGATGATATTTTACGAGCAACGCCCAGAAAGCGTGTTAAAGATCGCGGACG TTGTAATTATAGGTTACATATAACAATGCTCATTTATTGCTT This and vertebrate matches show that CG7033 needs a C-terminus, and it is readily encoded after an intron. \**********A. 
mellifera BB260020A10C8.F AATTTCAATATTTAAAGTTACCTCTTCTTAGGAAATGAAAAGAAATATTGAAACTATATTGAACTTATTAAAATAGAAA AAATTTATCATAAAAACTCTTTTAATTGATACTTGCGATTTGTGATAGATGATTTGTATAGTACGATAAAAGGATGAGG ATAGCAAAACAATTTAAATATTTTATACTTGCGAATAATTATCTGTGTAAATATATTTAGGCTTAGATTTACGTCATGA TCCTATGATTGTACTCCAATATGCGTTAAATCATCAAAAATATCATCAAAATGCCACGTATAGATCCTCTATCTTTATT GAAATGTCTTAGTGTTTTATTGGGGCCAACTGGAGGTATAAAAAGTAAAGAAGAAGTTCATCGGTTGGCTAGTTTGATG ACAAAATTTTCCAAAAAACTTGTTTCAAAATGCATTTATATACAAATATTGAAAACTACAAATACAGATTTACTTAGTC AATTTATGGGTGCTGGAGGATGGAATCTCATACATATGTGGCTTACAGATGGTATTCTTGCAAAAAATTGGGCTTTAAT TCAAGAACTTCTAGAACTTTTATTATTATGCCCAGTAGATATAGAAAGATTAAAAAGCAACAATTGTCCAAAATTAATA AAAGGTCTATCAAAAGAAAGTAGTCATCAAAGTGGTAAAATGTTA Shows an N-terminus is needed for CG4124 \- it is annotated as mRNA, but not translated. There are ESTs from Drosophila and Bombyx mori also showing it. \**********A. mellifera BB260020A20G9.F TGGTTCAGAGACTTTGCACTAGTGTAGCTGCGCTTAGCCTCGCCCTAAGTGCTTGCGTTTTTTTTCCGCGTGCTGTCAT GGCGATCGACTTAAGTCGATTTTATGGTCATTTCAACACGAAACGTTCAGGAGATGCTTGTCGTCCATATGAGCCATTC AAATGTCCAGGAGACGATACATGCATTTCGATTCAATATCTGTGTGACGGAGCTCCCGATTGTCAAGATGGATATGATG AAGATTCGCGATTATGCACAGCTGCAAAACGACCACCAGTAGAGGAAACTGCTAGTTTTCTGCAGTCATTGTTAGCAAG TCATGGTCCAAACTATCTTGAAAAATTGTTCGGGACTAAAGCGCGGGATACTTTAAAGCCTCTTGGTGGAGTGAATACA GTTGCTATAGCGCTTTCCGAATCTCAGACGATCGAGGATTTTGGTGCAGCCTTGCATTTGTTACGCACAGATTTGGAAC ATTTGCGCTCCGTATTTATGGCAGTGGAAAATGGCGATCTAGGCATGTTGAAATCAATAGGTATTAAAGATTCTGAATT GGGAGATGTGAAATTTTTCCTTGAAAAGCTCGTGAAAACTGGTTTTCTCGACTGAACAAGCTTTTTTTTAAACGTAGGG GCTATAACGTATATACGTTGACTGTATCGTGTTATAATCGCACTAACATATTTCTAGAAAGATTGTTGGAAGATTCTTT CACTAATTTTTGCAAATGATCACCGCCTCTCTTTTCTTCATTATTACTCT Shows that CG7237 needs both N and C-terminal regions readily available in genome. \**********A. 
mellifera BB260020B20A4.F GTTGGCCAGGAGTCGGAGCACTCACGCATTGAAATCGAGGGAGCCGAGCCCGGAGCGAGACAGGGTGGGCGCAGAGAAG GATGGGGCTGCGTTAAGTTCGTGGGCACGGTACTTGAAGAACAAGTACGGGAATCGAACGACCAAAGATAAGGAGCCTT CGTCCTCTGCCTCGACGATTCCATCATCGAGTGGAAGCACCTCGAGGAGGTTATCGCTCGGATTACCTTTGAGGCACGG TGGCCAAACGTCCTTCGAATCTTCTGACGACGATCAAAAAAACCCGTCAGGCTCCCCCACGTCCCCTACGGCAGCTCCC GTTATACCCGCGGCAGCAGGTTCCTCCACTAGCAATGGGCGGAGGAGTCACTACTTGCTGAAGCGGCGGCAGCTGTTCA AGTTCGGGATGCGGGGGAGCGAACCCGGATGCTTTACTTGGCCCAGAGGCCTCGCGGTTGGCCCTGACAACTCCATCGT AGTGGCCGACAGCTCTAACCATCGTGTTCAAGTGTTCGACTGTAATGGGAACTTCATGAAGGAGTTCGGATCGTACGGC AGCGGCGAGGGTGAATTCGATTGCCTGGCCGGGGTAGCGGTGAACAGGATCGGGCAATACATCATAGCGGATCGTTACA ACCACAGAATTCAAGTTCTCGATCCTTCCGGTCGTTTCTGAGAGCGTTTGGCTCCCAAGTACCGCAGACGGGCGGTTTA ATTATTCTTGGGGAATACCACCGATGCTCTTGGATTATTT Shows that there are other alternative splices of CG15105, and there is a Drosophila EST that confirms this. \**********A. mellifera BB260021B10H4.F GCGTCATAAGATTTGGAAACCATTACGAATGTTAATTGTCAAAAAATCATTATATCAATGTTAATTATTTAATTAAATT GATTATATGAATTGCTAATAAAAGCAATTAATTTAATTATTAAATAATGGAAAAACAACCGTTAAATCCAAAAGATGAT AAACCTCCGTCTTACTCAGCGGCAACTGCACCAAAAGTAAATTGTTCATGGCAACCACCACCAGGTTACAATTCCACAC AATGTCAAGAATCTACTTCATGGCCTCCACCTCCAGGATATTATCCTAGTGCCAGTGAAACTAATAGTCATAACTATGT ACCTTCTTATGGCTCTACACAATCAACTACAATAATTATGCCAGAAATTATTTTAGTTGGAGGATGTCCTGCATGTAGA GTGGGCATTATGGAAGATGATTTTACATGTCTTGGACTACTATGTGCTATTCTATTCTTCCCAGTGGGCATAATTTGTT GTTTGTTATTAAAGACACGACGTTGTTCTAATTGTGGTGCATATTTTGGTTAATGTGTTACTATATTACAGTAACCATT TCTGAAACTAAACATATTATTAATCTAATTTTAATATAATTATGTAGCTAATATAATTATATGATATTGAAAAATAAGA ACATAGGCAATTAATTTTTTATTAATATTATATAATTATAATTATCTATGCATTTGCATTATTTGAGGAAAAATAATGC CTCAATCATTCATATAATATACAATAATTTGTTAAATAGAAGTTCATGAAATATTGACATATATACATTAAGAAGATCT CTATGCATTTAAGACTTTATAATTTAAG This and vertebrate matches show that N-terminus is missing from CG12012; Bombyx mori ESTs agree. \**********A. 
mellifera BB270008B10B6.F ATATTCTCTATTATTAATGTAAATAGTTATTAATTTTAAGTTTTATTTTGTTACAACTGGTACCTGATTAATTTGTAGA TAGTAACATAACGTTTAACATCATGCCTATTCTAAGGAAAAAATCTAGCAAAAAAGTAAATGTGGAAAATGGCAATGAC AATTTGATTGGAGATATAACTATCAATGATTTTGCAAATTCATCACGGATACACACAAGATTTAAAAAGGCAATAATAA ATGTCTCCTCTGAAGTGAGACAAAAAATAACAATAGATGAAGAAAGTTTTTTAAATGAAATGAATGAAAATTACCAAAA TAATTTGTATGATTCGAATAAGTCAAAGAAAAGTACTAAAGGAGTTTTAACAAATCAAAGAAATAAAAAGACAAATTAT TTTAAAAAAGATATAGAAGAACAAGAAGGTATAGATTATCCATTAGATATATGGTTTATTATATCTGAATATATCAGTC CTGAGGCTATAGGAAAATTTGCTCAAATATGTAGAAGTTCTTATTATGTAGTTTCAACTGGAAAATTTTGGTTTCATTT GTATAAATCTTATTATAAATTTGTTCCTGGTTTACCAGAACGTCTACAACCACAATGTATGGTTCGTACTCATGGACTT CGAGCTTGTGTCATCAGAACATTACATTATACTTATTTTGCTTTGAAAAGAAAAGTTGATGATGTATCCTATTTAAGAA CAGATGAACCACATTCACTTATAAAA This and a new testes cDNA suggest that CG12765 needs a longer N-terminus. So do the vertebrate matches in mouse. \**********A. mellifera BB270010A10B7.F CTGTTCGGCGGCGGTGTTATTCGTACGGCCAATGAGGAAAGCGGAGCACGTGACCATGCTGGATCCTTTTCAAGAGAGA TACGGTGCCGGAGTGGGGGGTCTCTTGTTCCTGCCTGCCCTCTTCAGTGATCTGTTCTGGTGTGGCGGCGTGTTGAGAG CTTTGGGAAGCTCGTTGGCAGTGGTTGCTGGCGTGAATCCCGACATCAGCATAGTCGCCTCTGCCCTCTTGGCAGCCGT ATACACAGTGTTCGGTGGGCTTTACTCCGTCGCGTGCACCGATGCGTTGCAGCTGGCATGCATCGTGATCGGGTTGGGA TTGGCCGCGCCATTTTCCGTTCTCCACCCCGCCGTCAACTTCGAGAAAAATCTAACGCCGCACGAATGGCTCGGGGAGA TCAAGAACGAGGATCTCGGCGAGTGGGTGGATTGCATGCTGTTGTTGGTATTTGGCGGTATACCCTGGCAGGTATATCG AATATTATATTCGTCAAGATAATTCTGATTTGTGAATACGAATCGCGTGAATCTACACGAGCGTGAATAGATACTCGAT ACGCGCGAACTCTCGTTTC This and vertebrate matches indicate that CG7708 needs a different C-terminus, and it's there in the genomic matches. \**********A. 
mellifera BB270020A10D10.F TGGCCGTACACAACCAGAACTTATGGGAAGAGAACGACATGCAAAAACATTGGAAATTGCTCAAGAAGAGGTACTTACA TGTTTAGGAATGTGTGTTGCTGAACGTTTACATAGAGTTCATAGACGATTAAGAGAAGAGGAAACTGTATGCAAAGTGT TAGCTGCTGTTGCAGTAGATGCATTATCTAGAAATTTTCAAATGGCTGTAGAAGTTAAACAAGGTATTTCTCAATTAGA ACTTCTCTATGAAGAATTAACAAGAGAAGAGATAGCAAAACAACAAAGACGAGAAAAGTTACGTTTGAAACGTAAAAAG AAGAAAGAACGACGATATGAGACAGAAGAAAAAGAAAATACATGTGATGTATGTTATTATTTGAAAATTAAAATTTTCA AAATTTATATATTTAATATTCTTACATAAGTTAATTATTATTAGTGTTCGAGCAAAAAACAAAGTGGTAATAGTGATAC ATCTTGTGTTTGTGCGGATTCAAAACCGACAACACAAAATATAGATCAACATAAGTTACAAGTATTAGATCCAAAAAAT AAGGGACCACCTACTTGTAATGTCCGGATTGTGTAAAAAAATCAAAATCTAGTATATCACGTTCACAGAGTCAAACACA ATTAGCATTCCCAAAAAAATCATCGAACGTGCAAAAAACTACAATTAAAAAGAGTTCTTCTGAATCAAAGACCAATTTC TACAAT This and a B. mori EST, and now two Drosophila ESTs from testes and Schneider cells, and a vertebrate match indicate there is a segment missing from CG2182, and it's there in the genome. \**********A. mellifera BB270023A20B8.F CATATCGTATGTTGGTTATATACGAATAGTTTAAGATATCTACCAGTATTGGTAAGGCAATGGTGGAGTACTGCTGATA GTAGAGTCAGCGCTGCCGTGGATAAGATCACAACACATTATGTTAGCCCTATGCTTTGTCAAGAGGAACTTCTCAATAA TAAATTACAAAATATTGAAAACATGCAAGTAAAAGTACATCCAACATTCCGTGAAGTGATAGCTTTATATCAAATGGAT GATACAAAATTAGAACTTAATATTACATTGCCATCTAATCATCCTTTGGGACCAGTTAGCGTTGAACCTGGACAACACG CAGGTGGTACTGCGAATTGGCGGAATTGTCACATGCAATTATCCATATTTTTTACACATCAAAATGGATCTGTTTGGGA TGGACTTGCATTATGGAAAAGAAATTTAGATAAGAAATTCGCCGGCGTTGAAGAATGTTACATATGTTTCAGTATTTTT CACATAAATACGTATCAAATACCAAAATTATCTTGTCACACATGTCGTAAGAAATTTCATACTGCATGCTTGTATAAAT GGTTTAGTACAAGTCAAAAATCCACGTGTCCAATTTGTAGAAATATATTTTAATCTATTATATTATTTATAAACAAAAT TTGTATTTGTATTTAAATAAATAAAATAATATCTGTACATAAAAAAAAAAAAAAAAAAAAAAAAAAACCTCGTGCCGCC TCGTGCC This encodes a protein with an excellent match to the end of a very long human protein, suggesting that the annotation of CG9274 needs revision, and there is now a Drosophila Schneider cell EST matching it too. \**********A. 
mellifera BB270023B20C3.F RC GTTGGGCAACGTGAACAAGATAATAATACGGCCGCCAGGGCATAGCGGGAAGGCGAAGAAAGGGCACATCTGCTTCGAC GCCTCGTTCGAGACCGGCAACTTGGGCAGGGTGGATCTTATCTCGGAATTCGAGTACGATCTGTTCATCAGGCCGGACA CTTGCAACCCGCGACTTCGTATGTGGTTCAATTTCACCGTGGACAACGTGAAGGCCGACCAACGAGTGATCTTCAACAT AGTTAACATATCCAAGAGCGCGAATCTGTTTCGAAATGGGATGACACCGTTGGTAAAGAGTAGCAGTAGATCGAAGTGG CAAAGAATTCCCAGGGATCAAGTGAGTCATCGTCGAAAAGAACGAAGAAATACGAAGATATATCGAGAGATATCTTATC TTCCAAAAATTTTTATCCTAGGTTTTCTACTACAAATCGGCGCAACATCAAAACCATTACGTGCTCAGTTTCGCATTTT CTTTCGACCGCGAGGAGGACGTGTATCAATTCGCCCTGACGTATCCGTACTCGTACAGCCGTTATTTGGCGCATCTGGA CAACCTTTGCACCAGGTTAACGTACACGAGGAGAGAAACTATAGCCACGTCGATACAAAAGAGGAAGATCGAGTTGGTC ACGATAAGTTCGAATCTGGACGACGTTCAAGATCGTTCGAGAAAGGTGGTGGTTGTCCTCGCAAGGGTGTATCCA continuous ORF encodes LGNVNKIIIRPPGHSGKAKKGHICFDASFETGNLGRVDLISEFEYDLFIRPDTCNPRLRMWFNFTVDNVKADQRVIFNI VNISKSANLFRNGMTPLVKSSSRSKWQRIPRDQVSHRRKERRNTKIYREISYLPKIFILGFLLQIGATSKPLRAQFRIF FRPRGGRVSIRPDVSVLVQPLFGASGQPLHQVNVHEERNYSHVDTKEEDRVGHDKFESGRRSRSFEKGGGCPRKGVS Not part of any annotation, indeed is within the first 5.5kb intron of CG2246! And in opposite orientation. Best matches are a C. elegans and mouse genes, both to N-terminus, although they differ greatly in size (447 versus 137, but mouse may be incomplete) Try to reconstruct it. This is the entire intron sequence, in forward strand, from about 163912-169487 \- try using FGENESH at CGG website, and it predicts this gene, which is almost perfect. 
First, I probably would not have found the large first intron, but in fact is fairly well predicted and quite possibly right Second, I don't think the internal third phase X intron is needed, because it is open, so could add those aa to total |AE003774 aattcaaatgattagtacgagttagggtcaagagagtgcaaatcaaagccataacgaaacattaacttggattagggaa aagctacctgcgtttggacatgctgtcctagctgtagcttttggggttattggttatgttaatggggtggtgcatggtt ctctcggtggcataatcgttaaagtaacacacacgcagtcgttacggttcccacgttgcgatgtacgaccgtatcgatt agccccagtgaactgaatagcccccggtttttgcttgaacgcaattaatcctgccgctcctctcggttgcccagttgtg gaaactgctctctggctgcggaggaacccaaaaccacaggcgaaccgaagcccctagccggaaactactgcgaattatc gatcaaacaaaaaggcgacacacgtgagtggcgtggcaacgtcaaagtgccaacatgcgaatcacagttccggagtccg cagcccgaattccggattccaaggcagtcgtctgtgcggtgcggaggaagcggcgaggagcgaccaggagctgcccggg attagcagatagagatcgacctccagcgcccagacttcgcacgcgacttcaccgtcttgggagccccactcgagaaaag tgaaaggccaggaggcgttggctaattgaatttctatttttgatcctgcccaaggagcgaagccccaaagaaaagagaa gttgatgggttgaggcggcggaATGGGCGACTCAGgtaagcgaaatggtcataaatactcgcgataagtacgttgaggg aatcccctagcaaagcaatgtaaaaatacactaggagtttatgaagatgttattgaaaaatagttaaccatgaacaaaa gctagcaatctatacctcattccgaccaaaaatcacaacacccacggttagggtataaatgaagctatttaaaaacctt tcaagcctcgagacattcacctccggctttcaactcaggattggtttggcatgggcttgggattggggggccgacgcaa ccgcgaggcgactattcaattttagacatttgacgacttgagctgctcgaggctcagccaaaaggcgttggcaggtgca gtgagtgcaatggaaacttgcggcacgcatatgtggcacaaggaaaaggaaaggaccggcgggatgggatgccttgatt taagatagcagccgggtggcattaggaggaggattagaaggcgccccggccaagaaagttatgacaggccgggcagagg ccaaagcctcgagggattagggccactgcactcggattggcttggttgcattagaagaacgtagatttttgccaaagaa agacgcatcagcggataagagtcatcatgttgggaatagcctgcgatccacttcagcatacgagcatgtacgaacgcgg atcagatggtagtgaaattgatattaaacactcatcttgaacagattagaaactaagagatagtatactacgcagatca aaaaaagaatattggaaatatttgctcttatcaatcagtttcttatttcctaaactagaggcatgccgcttaaaatctt tttggctcaattcatagtttttatacttctaatggatattcctttgcagATAGCGAAGACAGCGACGGAGAAGGCGGTC 
TGGGCAACGTTTCCCGGGTTATCATCCGTCCGCCGGGTCAAAGTGGCAAGGCCAAGAGGGGCCATCTCTGCTTTGATGC GGCCTTCGAAACAGGAAACCTGGGAAAAGCGGAGCTGGTGGGCGAATTCGAGTACGATTTGTTCCTTAGACCGGATACG TGTAATCCTCGCTTCCGTTTCTGGTTTAACTTCACCGTGGACAACGTAAAGCAGGATCAGCGAGTGCTCTTTCACATTG TTAACATCAGCAAGAGCAGGAATCTCTTCTCCTCGGGACTGACTCCCTTGGTGAAGAGCTCCAGTCGACCCAAGTGGCA GAGGCTGTCCAAGCGGCAGGTGTTCTTCTACCGATCGGCCATGCACCAGGGTCACTATGTCTTGAGCTTCGCCTTTATC TTCGACAAGGAAGAGGATGTCTACCAGTTTGCGTTGGCCTGGCCTTATAGCTATTCGCGTCTGCAGTCCTATTTGAATG TGATTGATGCCCGGCAAGGATCGGA M G D S \---------------------------1------------------------------------------------1- -----------------------------------------------1------------------------------- -----------------1------------------------------------------------1------------ ------------------------------------1------------------------------------------ ------1------------------------------------------------1----------------------- -------------------------1------------------------------------------------1---- --------------------------------------------1---------------------------------- --------------1------------------------------------------------1--------------- ---------------------------------1--------------------------------------------- ---1------------------------------------------------1-------------------------- ----------------------1------------------------------------------------1------- ---------------D S E D S D G E G G L G N V S R V I I R P P G Q S G K A K R G H L C F D A A F E T G N L G K A E L V G E F E Y D L F L R P D T C N P R F R F W F N F T V D N V K Q D Q R V L F H I V N I S K S R N L F S S G L T P L V K S S S R P K W Q R L S K R Q V F F Y R S A M H Q G H Y V L S F A F I F D K E E D V Y Q F A L A W P Y S Y S R L Q S Y L N V I D A R Q G S D Fly protein MGDSDSEDSDGEGGLGNVSRVIIRPPGQSGKAKRGHLCFDAAFETGNLGKAELVGEFEYDLFLRPDTCNPRFRFWFNFT VDNVKQDQRVLFHIVNISKSRNLFSSGLTPLVKSSSRPKWQRLSKRQVFFYRSAMHQGHYVLSFAFIFDKEEDVYQFAL 
AWPYSYSRLQSYLNVIDARQGSDKRFTRCVLVKSLQNRNVDLLTIDHVTAKQRSTNRLDRSFIRVIVVLCRTHSSEAPA SHVCQGLIEFLVGNHPIAAVLRDNFVFKIVPMVNPDGVFLGNNRCNLMGQDMNRNWHIGSEFTQPELHAVKGMLKELDN SDvsrgietdligiifvcsynisfqTYQIDFVIDLHANSSMHGCFIYGNTYEDVYRYERHLVFPRLFASNAQDYVADHT MFNADERKAGSMRRFSCERLSDTVNAYTLEVSMAGHYLKDGKTISLYNEDGYYRVGRNLARTLLQYYRFINILPMPIVT EVRSKRRGRNRHAHHSRSRSKTRYEVKPRPKTPRCHAPIAYTNLSICYDSGGGGGSSDEGGFSPARPLAPGSSCFSGYR NYRRAATASCSAHPGHDQYSPFALGALKTGSDHGGGVGGSKGKRSAAVTIEVPLPVNVPPKPYLSIIDLNQLTRGSLKL KSNSFDAADRRZ Bee EST LGNVNKIIIRPPGHSGKAKKGHICFDASFETGNLGRVDLISEFEYDLFIRPDTCNPRLRMWFNFTVDNVKADQRVIFNI VNISKSANLFRNGMTPLVKSSSRSKWQRI-not sure why no match after this?-PRDQVSHRRKERRNTKIYREISYLPKIFILGFLLQIGATSKPLRAQFRIFFRPRGGRVSIRPDVSVLVQPLFG ASGQPLHQVNVHEERNYSHVDTKEEDRVGHDKFESGRRSRSFEKGGGCPRKGVS No Drosophila ESTs for this region Great BLASTP matches to both C. elegans and human proteins of comparable size in GenBank \**********A. mellifera BB270028B10A5.F AGAACCCTTAGATCTTATAAGATTGAGTCTTGATGAAAGAATATACGTTAAAATGAGAAACGAGAGAGAATTAAGGGGA CGATTACATGCTTACGACCAACATTTGAATATGGTGTTGGGTGAAGCAGAGGAAACCGTAACCACAGTAGAAATTGATG AAGAAACATACGAAGAAGTGTATCGTACTACTAAAAGGAATATTTCTATGCTTTTTGTTCGTGGCGACGGTGTGATTTT GGTTTCACCACCGAGCATGAGAGCACCGATATAAAAAATTTCTATTTACATTACAATCTGTCTAAAAATTAAGAAACAT ATAAGTAATATATAATATTACGTCGATACAATTTTATCAACTTTTGTAACGAGCATTTAACAAGTGTGTCGACAAAAAT CTAACGACAAACCAACCATAGATATTTAATTATAGATGAGAAGAATCTGTAACGATAGGAATATAAAAAATTGTAATCC AAATATGGAAAAAAGAAAACAAATCAAGATGTGGTACACTATGTTAAAATGATATATTAATTAAAAGTTTATAAATAAT TATAAAAAAAAAAAAAAAAAAAGCAAC This EST, plus genomic matches, plus human match, plus a Drosophila EST all indicate that the N-terminus of CG5926 is missing \**********A. 
mellifera BB270029A10G1.F TTTTTTTTTAAATATGGAATATATATACCGTTTACCATTTTTAGCATTAGAAGTACCTAACTTAAAATTAAAGAAGCCA TCATGGTTTGTGAAACCTAGTGCTATGATAGTTTTTTCATTTATACTTTTATCATATTTTCTAGTAACTGGAGGTATAA TATATGATGTAATTGTGGAACCACCTAGTGTAGGCTCAACAACAGATGAACATGGCCATACAAGACCTGGAGCATTTAT GCCGTATCGAGTAAATGGGCAATATATTATGGAAGGATTGGCATCTAGTTTCCTTTTTACATTAGGTGGAATTGGTTTT ATAGTATTAGATCAAACACATAATCCATCAACACCTAAGCTTAATAGAATTCTTTTAATATGTGTTGGATTTATTAGTG TTATTGTCTCATTTATTACCTGTTGGGTTTTTATGAGAATGAAACTACCGGGATATCTGCAATCATAAATTTAATAGAT TTAAAATAATATGGCAAATTGAGCTTAAAACTCAATATTTTGTACTCATCAAAACTATTACATTTTGTATTAAATATGT GGATTAGAAATGTGATTTATAAATAATATTATAAAAAAATTCTGTCAAATATATTTAAAAAAAAATAAATTCTTATTTT TATTTAAAAAAAAAAAAAAAAAAAGCAAC This and vertebrate matches, B. mori EST, and Drosophila EST, show there is an intron in the annotation of CG9662 that is open and in frame and encodes needed aa. \**********A. mellifera BB270029A20C10.F TATGGTCGCACGAAACCGCTCATTTCCAGGACAATGATGAAGAACATTCTTGGCCAAGCTATCTATCAGTTGACTGTAA TTTTTATGCTTCTTTTCGTTGGTGATAAGATGCTCGACATCGAAACAGGCCGAGGAGTTGCGCAGGCTGGTGGCGGTCC AACGCAACACTTTACTATTATCTTTAATACATTCGTCATGATGACACTTTTCAACGAATTTAACGCTAGAAAAATCCAT GGTCAGCGTAATGTCTTCCAAGGAATATTCACCAATCCCATCTTTTACACTATCTGGATCGTGACATGTCTATCGCAGG TAGTTATCATACAATATGGTAAAATGGCGTTCAGCACGAAAGCTCTCACATTAGAACAATGGATGTGGTGCCTATTCTT CGGAGTCGGTACTCTATTGTGGGGCCAAGTAATTACAACTATTCCTACGCGCAAGATTCCTAAAATCCTTTCATCCGCG TAGTGAACGCATTTCGGCAGGGCCTAGACGCACGCTACACGAGCGAGCACAGCAGCACTACATTGGCGGAGGTACTGAG AAAACAGTCGTCCTTAAGCAAGCGGCTCTCGCAACGAGCAGCATTGAATACGCCGATAACAACCCAGACGAACTGACCA TACCCGAGATAGATGTGGAAAGATTGTCAAGTCACAGTCATACAGAGACCGCTGTTTAGAATGGGAGAAGATGGCGGGT CAGCA This and vertebrate and nematode and Drosophila EST indicate there is an intron boundary problem in CG2165 \**********A. 
mellifera BB270030A10C8.F GAGTGGACAATTTCTCAGAGGTGCCTTTAGTAGTGAATCAGATTACTGTACACGATAATTTGCAATGGACAGTCAGACC ACATTCCTCTGTACCATATGCATTAGACGAACCCATACAATCTGCTGGTTTGGTTCTAACTGCACCAGGAGGTGTGACA GCTACATATGATTTAAATATACTTGGAGAGGGCAGAGTTCTGACTTATGAAAATTTTATTTATATTGCTTTTACAGGAA CCTTTAAAAATGATTGTGATCTGGGAACAGGAGAAAGTGGTTTAGATCCTTTGGATGTTGAAACACAGAGATTAGTTTT AGAAGCTGGTCCTGGTGGTTGGGTTGCTTTGCGCAGAAAACAATATGGAGCTAGATCTCAATTATGGAGAATGACTGGA GATGGTCAATTACAACATGAGGGATCTAGTCCACCTAGAGATAAACATTCAAAAGTTCCAGAGACAGTATTAGTATTAG ATATAGCAGGCACAGCACCTCAACCATTCACTTATTGTGCTCTTGCCTTAAGAAAGCCTGATCCCCGAAGGAGATCTAC ACAAACTTGGCGATTCACAGATGATGGACGATTGTGTTGTGCACATAAAAATATGTGTGTGCAATCGAAAGATGGATTC ATGGGATTACATGAAGGGAGTGAAGCAGTATTAGGTCCTCAACACATACTAATCCTCCTCCAATTGAACAGCGTGTTGG TAGACAA The N-terminal region of the translation of this EST has clear vertebrate and Drosophila genomic matches, indicating that the annotation of CG11003 needs a change. \**********A. mellifera BB270032A10D4.F TGCAACCTTTTTTGCGAAAAATGTTGGAATTACAAATATTTCAACAATTTATAGAAGAAAGACTGAATATGCTTAATTC AGGACTTGGTTTTTCTGATGAATTTGAAATGGAGGCTTGCAGTTATTCTGCTAAATCTGGTAGCAAATTTATGCAGCAA TATCGAGAATGGACTTATACTATGCGAAAAGAAAGCTCTGCATTTTTCCGCAGTGTAAAAGACAAGGCCAATCCAGCTG TCAAATATGCAGTTAAATCAGTAAAAGATAAAGGAAAAGATATGAAAACGGTATATAAAGGATTAAAATGGAAAGGACG ATCAAATAGAAGCGATACAAGTTTGAGATTCCATCAACCAAGATCAGCACCTAGTTCACCTACATCGGATCGAAGGCCT ATCGATTTTTCATCACCTCCAAAATCTCCAAATGGTTTTACTGCTACTACTAGTTATAGAAAAGATCTTCGAATACGTA ATAGTAATTTCACCGATTCAAGCAGAAAACAATATTCACCATTAAGTCCTAGTTCACCAGAAGAATCTGATTTTCCACC AGAAAGAGTAAATATTGATTTGATGCAAGAACTCCGTCATGTAATATTTCCGAACACACCTCCTGTTGATAGAACAGTT TCTCCTGAAGTGCCAGATTTAATTAGATTAGATTCGACAACAAGTACTGAAGATTTCGATCCACTACTTTCTAAAT This and a new Drosophila testes cDNA, and vertebrate matches, show that CG18659 needs a C-terminus \**********A. 
mellifera Contig1366 GAATTTCTTACCACTTTTAAAATCGACATCTGTCATTTCAAATGGAATGCAACCAGTGACCGGAAGGGGTATTGTAGAA AAAATAAAAATTATTACTAAATCTGACGTCAACCAGATTCTCGTTCAAAGTTCGCAAGAATCTCTTATTACAGGAGCTC TACGAGTGAGTTCAGGAATTGGAGCAAGTTTATTAGCAACGCAGAGGAGGTTAGCACATACTGATATTCAGTGGCCAGA TTTTAGTGATTATCGTCATGAAGCTGTACAAGATCCAAGAAGTAAAAGCAAAGAAAATTCAAGTAGCCGTAAATCATTT GCATATGTTATGACAGCTGCAAGTGGAATTACCGGTGCTTATATAGCAAAATCAGCCATACATGATTTAGTAGCTACAT TTAGTGCTTCAGCTGATGTACTTGCATTGGCAAAAATTGAAATAAAACTTGATGCTATTCCTGAAGGAAAAAGTGCTGT CTTTAAATGGCGAGGAAAACCTATATTTGTACGACACAGGTCAAAAAAAGAAATTGAAAAAGAGGCAGCAGTTGATATT AAGATTCTTAGAGATCCACAAGTGGATTTAGATCGTGTAAAGCAGCCACAATGGTTGATTGTTTTGGGTGTATGTACAC ATTTAGGATGTGTACCAATTGCAAATGCAGGTGATTTTGGTGGTTATTATTGTCCTTGCCATGGATCTCATTATGATGC TAGTGGCAGGATTAGGAAAGGACCAGCTCCATTAAATTTGGAAGTACCACCTTATGATTTTATCGACGATAATACAGTA GTTATCGGTTAATGTATTAATATATGTAGTGAAATGATAAATGTAATTTATATCAAATTGTTTTACACAAATTCGATTT TTATTTGTAAATTTTTTCCTAGAACCATCACATGTTGAAATGTTAAACTACAGTGAAATATAAATAAAAAACTTAAGTT AATTTTAAAAAAAAAAAAAAAAAGCAAC This and vertebrate matches show there is additional complexity to CG7361 \- lots of Drosophila EST? \**********A. 
mellifera Contig1440 GTCAATGTTAATTTAAATCGAAGTTTTTAATATTTTTGTGGCGTGGCTCGAATAAACTCTTAATATTCGTATATGTAAA TCAGAAAATTATGTGGAAGGCAGTGTTGTTGATCGCAATCCTGTCCGCTGCAACGAATAAAAGTGTTGCAAACGACTGC GTGCCACGAAGTTTCGGGACAAATAATATCGTATGCGTTTGCAACTCGACTTACTGCGACAGCACACCGGAACCGAAGC CGAGCAGTCCAGAAAAAGGTACTTTTCACTGGTACGTATCGAGCAGAGATGGCCTCAGGCTGAGCTTGTCAAAAGGACA AATGGGCCGTTGTCAAAACGATGGATCTTTAACCCTGAACATCGATACCTCGAAAAGATATCAAACGATCCTCGGTTTT GGTGGCGCTTTTACCGACTCGGCGGGAATGAATATCAAGAATCTGAGCGAGGCTACTCAGGATCAATTAATCAGAGCAT ATTTCGATCCGAAAGATGGAAGTAGATACACGTTAGGCCGTATACCAATAGGAGGAACAGACTTCTCTACGAGAGCCTA TACATTGGACGATTACGATGACGATGCGACGTTGCAACATTTCGCGCTTGCTCCTGAAGATGTCGAGTATAAGATACCG TATGCGAGGAAAGCTGTCGAATTGAATCCCGATTTAAGATTCTTTAGCGCCGCGTGGTCGGCGCCGACATGGATGAAAA CTAATCACAAAATCAATGGATTTGGTTTCTTGAAGACCGAATATTATCAAACTTTTGCCAATTACATATTGAAATTCAT AGAGGAATATAAAAAGAATGGAGTAGATATATGGGGTGTTTCAACTGGAAACGAACCATTCGATGCCTATATTCCTTTT GAACGTCTTAACAGTATGGGATGGACACCAGAGCTGGTTGGCGATTGGATCGCTAACAACTTGGGCCCAACTTTGGCAA ATTCCGAATACAACGCCACACATATTTTCGTTTTGGACGATCAAAGACTAGGATTACCTTGGTTCGTTAACGAAATCTT TAAAAATGAAATTGCGAGAAATTATGTTTACGGCATAGCCGTACATTGGTACGCGGATATATTGATTCCACCGGTAGTA TTAGATCAAACGCACAATAATTTTCCTGACAAAAATCTGTTGATGACCGAAGCATGTGAAGGATCTTTTCCATTGGAAA AAAAGGTTGTGTTGGGATCATGGGAAAGAGGAAAAAGATATATATTAAGTATAACGCAGTATATGAATCATTGGGGAGT TGGATGGGTGGATTGGAACATAGCTTTGAACAAAGATGGTGGACCAACCTATATCAATAATAACGTCGACTCGCCCATT ATTGTAAATCCGGAAAATGATGAATTTTATAAACAGCCGATGTATTACGCTCTTAAGCACTACAGCAGATTCGTCGACA GAGGATCGGTCAGGATTTTCATCACCGACACGATTGAAATTAAGGCCGCAGCCTTTATAACGCCCTCGAACGAAATTGT GGTTGTTGCGTATAACGACAATAATGAAAAAACAAATGTGGTTCTGAACGATGTGACAATTGAAGATTATATTTGTTTG GAATTACCCCCACATTCTATGAATACTGTAATTTACAACAAATAGAATACAACATATGAAATGAAAGATCATATGAATG AAAAGCAACCGATGAAGTATATTACAAATTCTATCATGTTAAA This encodes a full-length ~500aa protein with full-length matches to various vertebrate and nematode proteins, and indicates that CG10299 may need to be split into two or three proteins \**********A. 
mellifera Contig1578 CGGACCGATGGTCGGCCTACGGTATCCCCACCCAAGTCCCGGCGGGGTACGGTTCCCCGGCAAAGCAACCGATTTCAGT AGCAGGACAACAACAGAATGCAGCGTCTACTGGTAAAGTCCTGACTGGAGATTTGGATAGCAGTCTTGCTAGTCTTGCT CAAAATTTGACCATCAACAAAAGTGCTCAGCAACAAGTCAAAGGTATGCAATGGAATTCGCCTAAAAATGCTGCCAAAA CTGGTGGCTCAGCTGGAGGATGGACACCGCAACCTATGGCAGCTACAACTGGCGCTGGATATCGTCCAATGGGAATGCA AGGTGTGCCAATAGGCATGCAAAGTATGCAAGGCATAAGACCTATGATGAGCACAATATCTGGTGGTCCTGGTAACATG ATGGTCACAGGAGGAGCTGCACCAATGATGATGTCTAGTTCAAATTCGATGATGGGTACTAACCTTCAACAACAACCAC AGCAGCAACCACAGCAGAATATTGCGCAACCACAAAATAATCAAGTTCAACTTGATCCATTTGGTGCCCTGTGAACAAA TTAGGTATCTATTTGATATCTCCTTGTGAAAAAAAAATTATCAATAATAAAATATGGATTATGATTTTAATTAAGTAAA ACATTATTTAATTGAATACTTAAAATATTTATAAATTTTGCCTCTTTACATTCCCATTTTTTGAATTTTATGAATTTTC AGTTTGCGAATGAACAGAAAAATGTATTGATTTGGTAATGTTATTTTGAAAAATTTTTTTTCAGAAGTAATTATATATA ACAGAATATAATTACTTCTGAAAAAAAAAGCAAC This EST, plus vertebrate matches, plus an A. gambiae EST, indicate that there are two exons missing near the C-terminus of CG2520 \**********A. mellifera Contig1667 TTAAAAAAAATATTTATATTTTCCTATTATATTTGGTCAAATAAAACACAATTCTTTTGACAAAGAAAAATTATCAGAA TTGTTTCATTGAAAGAAGTTTGCTTTGCAATTTGTAGATATTTCAATTAAGGGCAAGATTATTGGATTTTAATAAAATT TTAAAAATGGAGGTGGATTATAGTAGCAATTGTGATGTTAAAATTCCAGAATGTAAAAAATTAGCAAGTGAAGGAAAGT TACATGATGCTTTGGACCAATTACTAGCATTAGAAAAACTAGCACGAACAAGTGCAGATGTGGCATCTACATCTCGAAT TCTTGTTGCTATTGTTCAAATTTGTCTAGAAGCAAAGAATTGGGCAGTATTAAATGAACACATAGTATTGTTGTCCAAA AGACGTTCTCAATTAAAACGAGCTGTTACAGCAATGGTTCAAGAATGTTGTACTTATGTAGATAAAATGCCTGATAAAG AAACCAAAATTAAATTGATAGAAACATTACGTACTGTAACAGAAGGAAAGATATATGTAGAAGTTGAAAGAGCAAGACT TACTCATCGTTTGGCAAAAATCAAAGAAGAAGATGGAGATATTTCGGGTGCAGCAGCTGTTATGCTTGAATTACAAGTT GAAACATATGGTAGTATGTCACGTTTAGAAAAAGCATCTCTTATTCTAGAAGCAATGCGATTATGTTTAGCaAAAAAAG ATTTTATGCGCACTCAAATAATAGCTAAGAAAATTAATGTTAAATTTTTTAATGATGAAAATGATGAAGAAACACAATC TCTTAAATTGAAATATTATGATCTAATGATGGAATTGGCTCGTCATGAAGGTTGGCATTTAGAATTATGTAGACATAAT CGAGCAGTATTGGAAACTCCAGCAGTTAGAGATGATCCTGAAAAAAGACATGTTGCACTTTCACGAGCTGTTCTGTATC 
TCGTACTTGCACCACATGAACCAGAACAAGCTGATTTGACCCATAGGTTGCTTTCTGATAAACTTCTTGATGAGATACC AACATACAAGGAATTATTACGACTTTTTGTAAATCCAGAACTAATAAAATGGTCAGGACTTTGCGAAATTTACGAAAGA GATCTTAAAGCTACAGAAGTTTTTAGTCTATGGACTGAAGAAGGACGTAAACGATGGGCCGATCTTCGAAATCGTGTTG TCGAACATAATATCAGAATTATGGCAAAATATTACACAAAAATTACATTGACTCGCATGGCTGAATTATTGGATTTACC AGTTGAAGAAACTGAAGCATGTCTATGCAGTTTAGTAGAAACTGGTGTGATAAATGCCCGTACGGATCGTCCAGCCGGT GTGGTTCGTTTTACAGGAACCCAAGAGCCGGCTGCTCTTTTGGACGCATGGGCTGCATCTTTATCAAAATTAATGAGTC TTGTCAATCATACAACTCATCTTATTCATCAAGAAGAAATGTTGGCTGTAGCTCAATCCTGAAAGATAAATTTTCACTT TTTCTTTTTCCCTATTTTTTTTTCTCCTTTCTTTCTTCTCTTTGGATCAAAACTTGGTTAATTTTCTTCTTTATATAAT CCTTTTACTTTTTTAAAAGTTATTTCCATAAAATGACTGCACATAAATAATTGTACACTTTCTATAAAACTTTAAAAAA CTTTAAAAAAAACATTTAA This long contig indicates that there might be an unspliced intron in the Drosophila gene CG3294; it's a proteasomal subunit and the honeybee sequence aligns well with all others, but Drosophila has a large insertion. \**********A. mellifera Contig1674 GTATACTTGTTGACCATCCCCGGCCTTCGACGTGCACCTGTCAGCGTCATGTATTCGCGTTCTCGAATGCGACCACGCC GACTCTTCTCCACGATGCCTTTCGTGTTGTAAATAATACTGTGAAAAGAAACCTCAGACCGAAGATGGGCAAGCTTTTG AGCCTTCTGGCTCGAGATGAGTCTACCTGTTGCACCCCTCAAAAGTACGACGTCTTTTTGGATTTCGAAAATGCACAAC CTTCCGATATAGAACGGGAGACCTTTGAAGCGGTGCAAAGAGTTTTGAAAAATTCAGAATCTATTTTAGAGGAGATTCA ATGCTACAAAGGTGCCGGAAAAGAAATCAGGGAGGCGATTTCGGCTCCCACGGAAGAGTGTCAACGAAAAGCTTACCTG ACTGTTGCACCTCTAGTCGCCAAGCTGAAAAGATTCTATGAATTTTCATTGGAACTTGAGAAGGTAGTACCAAAAATCT TAGGCCAACTGTGCTCCGGTAATCTCTCCCCGACCCAACATCTCGAGACTCAACAGGCATTGGTGAAACAGCTGGCGGA AATTCTGGAATTCGTCTTGAAATTCGACGAGCACAAGATGAAGACACCCGCTATTCAAAATGATTTTAGTTATTACAGA AGAACGTTGACCAGAGCATCTCTGGCGCGACAAGAAAGCGCTGAAAAGGACCTCGTGGTCGGGAACGAGCTCGCCAACC GAATGTCCTTGTTCTACGCCCACGCGACACCCATGCTTCGTGTTCTGAGTCACGCGACCATTACTTTCTTGATGGACAA CGAAGATGTAGCACGTGAAAATATTACTGAAACTCTTGGCACCATGGCCAAAGTTTGTCTGCGCATGTTAGAAAATCCG AATTTATTGGCGCAATTCCAACGAGAAGAAACTCAACTCTTTGTTCTGAGAGTGATGGTAGGGTTAGTGATCCTTTATG 
ATCATGTTCATCCCCAAGGTGCCTTTGTTAAGGGTTCGAATGTCGATGTTAAAGGCTGTGTAAAGCTATTGAAGGATCA ACCACCTTGCAAAAGCGAGGGTCTTTTGAATGCTCTTCGCTACACCACCAAGCACCTGAACGAGGAGAACACACCGAAG AACATTAAGAACCTTCTAGCAGCATGATTACCAAAGCAGCGCGGATGCTGCATCAGAAGCTAAGACAAATTGAACGAAT GACATCCATTCAAATTTTGTGTGTACGAGATCGTCGAAGCGACTTTATATGACACCAAACAAGCAATCTCCTGTTTCTG ATCGTGGCTGGGTGGTGGACCATTAAAAAAAAAAAAACGTGTCTTTCGTAATAGAGTCTATAGCTGCCCGAAAACTGCC GACTTCTACATACCTTCTGACGAAAAAAAGAAATTAAAAAAAAATAAATAAATAAATAAATAAATAAAAAAAACAAAAA AAAGTAAAAAAGAAAAAAAAAAAAAAAAAAAGCAACGC This contig, plus vertebrate matches, show that CG6487 and CG6491 should probably be fused. There's a B. mori EST that agrees, but amazingly no Drosophila ESTs at all. \**********A. mellifera Contig1686 GATAAAAAGAGTATGCAAGCTTTTTTAAAAACTGGTAAATTAGGTCCTGGCGAGTTCAAAAAAGTTTCGAATTCACGTT CAAAAGAAGAACGTAGTGGTCCTGCACCGCCATGGGTTGAGAAATATCGTCCAAAGAACGTAGAAGATGTTGTTGAACA AACAGAAGTAGTAGAGGTATTACGACAATGTTTAAAAGGAGGTGATTTTCCAAATTTATTATTTTATGGTCCACCTGGA ACTGGTAAAACAAGTACTATATTAGCTGCAGCTAGACAATTATTTGGTAGTCTTTATAAAGAAAGAGTATTAGAATTAA ATGCTTCTGATGAACGGGGTATTCAAGTTGTAAGAGAAAAAATCAAATCTTTTGCACAACTTACAGCAGGTGGTATGAG AGATGATGGAAAAAGTTGCCCTCCTTTTAAAATTATTGTCTTAGATGAAGCAGATAGTATGACTGGTGCTGCACAAGCT GCACTTCGTCGTACTATGGAGAAAGAATCTCATAGTACTAGATTTTGTTTGATTTGTAATTATGTATCAAGAATCATAG AACCTTTGACTTCTCGTTGTACAAAATTCAGATTTAAACCATTAGGAGAAAATAAAATTATTGAGAGATTAGAATATAT ATGTAAAGAGG This contig and human matches show that CG8142 needs an N-terminus and it is available in the genomic sequence, with an N-terminal exon and intron. There are embryonic Drosophila ESTs showing this. \**********A. 
mellifera Contig1749 TCAAAATGGGAAATTGTTTGAAACGCGCTGGAAGCGGTCAACAGGACAATACCACTTTGCTGAGTAACAATCCTGATCC TCCTACATTGACTAGCGGTTCTTTACAAGAAGGCCTTGGACCTCCGATACCTAACAATGAGGCTGTAACCTTTTCTTAT GCACCAGTTTTTACAAGGGAACTTCATCTTCAACAAATTGGCATTGGTGTTAATCTAGGACCAGGTAGTGAAGAAGAAC AACAAGTTAGAATAGCAAAACGCATAGGACTTATTCAACATTTGCCTATGAGAGAATATGATGGGACTAAAAAAGGAGA ATGCGTGATATGTATGATGGAGCTGCAGGTGGGAGAGGAAGTGCGTTATTTACCCTGTATGCATACTTATCATGCAGTA TGTATCGACGATTGGTTGCTGCGTTCTTTGACTTGTCCATCGTGCATGGAGCCTGTAGATGCAGCATTGATTAGTTCAT ATCATCCAACCACTTAACACCAAGGAGATGGAGAATGAAATAGCTAATCGGTAATCAAACTATATTATCGCGATGAAAC AGAGAAATTGGAAAAGATTATTAATTTTATTATCATATATAATGGCTTCTAAATAGCAAAAGGGTATTTTCTTCTTGTT ATGGCTAAATTTGCCATTGTATTCAAATATATCTGTTCGCATGCTAATAAT Encodes full-length protein, by vertebrate and other matches, which is unannotated in the Drosophila genome MGNCLKRAGSGQQDNTTLLSNNPDPPTLTSGSLQEGLGPPIPNNEAVTFSYAPVFTRELHLQQIGIGVNLGPGSEEEQQ VRIAKRIGLIQHLPMREYDGTKKGECVICMMELQVGEEVRYLPCMHTYHAVCIDDWLLRSLTCPSCMEPVDAALISSYH PTT There's even a head cDNA for it. 
AE003844 TAACACCATCTTCTAACAATCGTCAACTTTCCGATGAAAATCAAGTGAAAATTGCAAAGCGAATTGGATTAATGCAGTA CTTGCCAATAGGAACATACGACGGGAGCTCAAAGAAAGCACGAGAATGTGTTATCTGCATGGCTGAATTTTGTGTTAAT GAAGCCGTACGTTATCTACCTTGCATGCATATTTATCATGTTAATTGTATAGATGATTGGTTGTTGAGAAGTCTAACTT GTCCCAGTTGTTTAGAGCCTGTTGATGCTGCCTTGCTGACGAGTTATGAATCGACATAGCGTTATAAAAACATTAGCTT ACAATTTTGCTGCTGTAATGTGTTTTGGGATAACAAAACCTTTG GH26713.5prime GAATAAATCAGGCTTTATTAAATCGAATCTAGTCTAATTTCAAAGAAAGTCACATTTAATGTTTTTTTTTTTTTAAATC AACTAACTAAATTGTTTCTGTTTATTATGAAAGTTGTGTATACATATGTGCATTTTATATACATGCATGCGTACTTATT AATTTAAGATTTCTTGGGGATTGGTACTAATTGGTACTGTATATTTAAATCTTCGAAAAACGCATGAAATGGGTAATTG CTTAAAAATTAGCACTTCAGATGACATTTCACTTTTACGCGGCAATGACAGTCAAATCAGCGGGACACAGCCAGTGTAT CATCAGGGAGAGCATTATCAACGAGAATTGTACCCTTCCACGTCGTCTTCGACAACGCTAACACCATCTTCTAACAATC GTCAACTTTCCGATGAAAATCAAGTGAAAATTGCAAAGCGAATTGGATTAATGCAGTACTTGCCAATAGGAACATACGA CGGGAGCTCAAAGAAAGCACGAGAATGTGTTATCTGCATGGCTGAATTTTGTGTTAATGAAGCCGTACGTTATCTACCT TGCATGCATATTTATCATGTTAATTGTATAGATGATTGGTTGTTGAGAAGTCTAACTTGTCCCAGTTGTTTAGAGCCTG TTGATGCT M G N C L K I S T S D D I S L L R G N D S Q I S G T Q P V Y H Q G E H Y Q R E L Y P S T S S S T T L T P S S N N R Q L S D E N Q V K I A K R I G L M Q Y L P I G T Y D G S S K K A R E C V I C M A E F C V N E A V R Y L P C M H I Y H V N C I D D W L L R S L T C P S C L E P V D A A L L T S Y E S T \* Drosophila protein is MGNCLKISTSDDISLLRGNDSQISGTQPVYHQGEHYQRELYPSTSSSTTLTPSSNNRQLSDENQVKIAKRIGLMQYLPI GTYDGSSKKARECVICMAEFCVNEAVRYLPCMHIYHVNCIDDWLLRSLTCPSCLEPVDAALLTSYEST \**********A. 
mellifera Contig2388 TGGGATGTGAAAACACTTTCGGAATGCTGTAGGACTGATCATGGTTATACACCAGATTCTCGTGCTATTCGTTTTCTAT TTGAAGTTATGTCAAAATATAATAGTGAAGAACAAAGGCAGTTCGTTCAATTTGTTACAGGTTCACCTCGATTACCAGT AGGAGGTTTCAAGAGTTTAACACCGCCGTTAACAATAGTGCGTAAAACGTTCGATCCATCTATGAAAACAGACGATTTC TTACCATCCGTAATGACTTGTGTTAATTACTTAAAACTGCCTGATTATACAACATTAGAAATAATGCGGGAAAAGTTGC GAATAGCTGCACAAGAAGGACAACATTCGTTCCACCTTTCCTAGAAACGGATGGAAAACCGGCGCGCCATTTGCTCTTT TGCTATCGTATTGTCCAGGTTGAAAAAATTCTTCAAACACCATTTAAAATATTTAAAAAAGTAACTGCGCGCGCGTTTT CTTCAACTCAGTAATCGTATATTTGCACTTTATATGAGCTTTTTCGAAAATATCTTTTTCTTGATTTTAGTGAGAATTC ATTTAAATAAATTTTAATTTGATGTTTATCTTTTTACTTATAAATATGGATTGTGATATATTTTCTCAAAAAAGTGCAA AAATGACATTAGCTCTATACTATGAGTTTCCTCTTGTGGTATTATGAACCAGGTAGAATGGGAGCAGCCTTGTTCTAGG AAACTTATCTTTGATGTGTTACAATAATTAATAAAAGTTGTAAATATAGTTTAATATGCGCAAGTCTATATAAATACTT TATATAATAGTAACAAAAAAAAATGATATTGCTACGAAACTCTATACTAATTAAATATAATTTAATCTGAGAGGACGTC TATTAGACATAAGTAAATTTTATATCAGTGGTGCATTTAGCATCGCCATAAATCGCCAATTTCATGCCTCACTGCAGAT TGTAATGAAATCACAAGTCAAAATTGATCAAAAAAGAAAAAAAATACGCGCTAATAACGTAGCTCATGTTTCATCATCA TAAAAATATAATAACTGCGAATATAATTGCGTGTTTACGATTATCGTTTTAATTTCTTAATCGATCTCTTTTGTTCGCT GCTTAGTTCTCAGTATGCGAGCACTGCGCGAATTAACGTGCAAAACTCTTATATTTAAGTTTATTCAATGTATCACTAA TTAAACTTATTTGATATTTTTATTC This EST and mammalian matches indicate that CG17735 needs a C-terminus. Seems to be indicated as a transcribed region, but not in translation? \**********A. 
mellifera Contig2672 GGTATTATGATGGTTGTACGCGATGCCTGATCCCGTAACGCTGAGCAACAATGGTGACGTAGAGCTGTTCATCCTGGCG GTCGTATGATCCTCGGCGGTCTTGTTCGGCAAATTGCCCTTGCTGGCGTTCAGCGCACTGTTTGTCGGGTCTTCGTCAA TGTATGTGAAGTCAGAAAGTGGTCGGGGCCCAGTGTGGCTGCGTGAACTGCGCATACTGTGTAAGCTGTGTAAACTATG GACAGAGCGACGCTCCTCCAAATCTTCTAGAGTCGACTTGAACGCTCTGATTACTCGAAGCTGTGTCTGTAGTCGTGTT AGACCACGGATCCATAGAATTTGTCCTGCGCGCGGCTTTTTATCCGAGTCAGGGTCGAATTTCTCATCTCCTAGATTGA TCGCACTGATATCATCCGGCTGGCCGCGGCCCCATGAAAGGATTTTAGGAATCTTGCGCGTAGGAATAGTTGTAATTAC TTGGCCCCACAATAGAGTACCGACTCCGAAGAATAGGCACCACATCCATTGTTCTAATGTGAGAGCTTTCGTGCTGAAC GCCATTTTACCATATTGTATGATAACTACCTGCGATAGACATGTTACGATCCAGATAGTGTAAAAGATGGGATTGGTGA ATATTCCTTGGAAGACATTACGCTGACCATGGATTTTTCTAGCGTTAAATTCGTTGAAAAGTGTCATCATGACGA This and vertebrate matches show that a longer C-terminus is needed for CG2165, and it is available in two exons in the genomic sequence. \**********A. mellifera Contig2806 AACACATGCAAAATGGAACCACTCTATAGGACAATCTGGATTATCACAACCTATCATTTCCCCATAAGAAACTTGATGA CATAGGCAATAAGTTGGTTCGTTAGGATCAACTGGCATATCTAAAACATCTGCTGGATGACCAAGTGCAGTAGAATCTA CTTGAGCACCACTTCCTACAGCTCCAGCTGATGAAGCAGAAGCAACAGATCCTCCTTTTTTCTGTTTTTTTCTAGCTGT TTTCGATTCATCTTCACTGTTAGTACCTGCACCTTTCTTTCGTTTTTCTTTTTCTTTTAATTTTTTCCTGCCCTTTTTA CTAGCATTATTTTCTTCTTGTGCCCTACTACTATTTAAAGCTTTATCTTGTATTTCAGCTTCAAATCTAGCCAAATCAG AATCTAGTCTCCTAATATGTTTATCAACTAATTCATATGTTTGTATTGCTAATTGTACTTTATCATCACCATATTCCTT TGCCTTGTTAAATAAGTTTTGAATATGAGTCAATTGTTCCTTTTTCTTTTCTGGGGATTCTTTCTTTACATTTTTTAAA TAATCATCTGCTAATTTATCTATATCTTTCATTAATCCTTGTGCTCTAGCATCAAGGTCTCGCATTAAAGTGAAATTTC TTTGTAATTCAATAGGTAGATGTTCCAAACTGTCTAAATAATGTTCTAAATACAGTGCTGTTGTCATGTTTATTGTTAC TTAAAGAAAAA This EST, and vertebrate matches, and even Drosophila cDNA LD46333, show that CG9293 has three open introns retained that need to be spliced out. \**********A. 
mellifera Contig321 TTGCTCCTTTTAGGATTGGATAATGCAGGAAAAACAACAATTTTAAAATCATTGGCCAGTGAAGATATTACACAGGTAA CACCGACACAAGGATTTAATATAAAAAGTGTTCAAAGCGAAGGTTTTAAATTGAATGTCTGGGACATAGGAGGTGCTCG AAAAATTCGACCTTATTGGCGAAATTATTTTGAAAATACAGATGTTTTAATATACGTCGTGGATAGTGCGGATGTAAAG AGATTAGAGGAAACGGGTCAAGAACTATCAGAACTTTTATTGGAGGAGAAATTGAAAGGTGTTCCATTATTAGTTTATG CGAATAAACAAGATCTTGGACAAGCAGTCACAGCAGCAGAAATTGCCGAAGGCCTCGGATTACATAATATCAAAGATCG CGATTGGCAGATACAATCGTGCATTGCTATCGACGGGAAAGGCGTGAAGGAGGGTCTTGAATGGGCATGCAAAAATATC AAAAGAAAGTAATATTGGCACAAGTGTCTTAAAAAATCGAACAGCTTACAATGTACTCTCATACATTCAACATACTTTT TATTTGGCTTTTATCTAATATTTACAAAAGCTGTTAACAGACAACGTAAACGAATGGATTTACTAGTCAATTAACTTCC GTCCTTCATGGAAGGAGCTTTTCTAAAGCTCTAATTTACGATGCCTTAATGAATACGATATCTATTATA This and vertebrate matches, and cDNA AT01916 indicate that there is an open intron that needs to be spliced from CG6560. \**********A. mellifera Contig457 ATATATACAGTTGCGACACAGTTGCCGGTGCGACACAACGCGATATTTAGTCGAGCGATATGCCGACGAGTTTGTTCAG CGAAATGGATCGCGGAGCTAATGGCGGTGGAACTGCTTCCCTCGAAGACCAGAGAGCCCTACAGTTGGCATTAGAATTA TCTATGCTTGGTCTCGAAGGTACTCCCGGATGTCCGGGCACTGGTACAGGAACCGGCACCGGAACGGCCAACGATCCGG ATCCTTTGCAAACGACACCGGCAGGCGTTTTCGAGGAAGCACGTTCTAAGAAGAGCCAGAATATGACCGAGTGCGTGCC GGTGCCTAGCAGCGAGCACGTGGCAGAGATCGTCGGCCGACAAGGTTGTAAGATCAAAGCGCTCCGAGCGAAGACCAAC ACCTACATCAAGACGCCGGTGCGCGGCGAGGAGCCGGTGTTCGTGGTGACCGGGCGCAAGGAGGACGTGGCCCGCGCGA AGCGCGAGATACTGTCGGCCGCCGAGCATTTCTCGCAGATCCGTGCCTCCCGCAAGAGCTCGTTGGGCGCCCTGTTGGG CGCGCCCCCCGGCCCGCCAGCCTCCGTTCCGGGCCACGTGACGATTCAAGTACGAGTCCCGTACAGAGTAGTCGGGCTT GTGGTAGGACCGAAGGGCGCGACCATCAAGAGAATCCAGCACCAGACCCACACCTACATAGTGACGCCGAGCCGCGACA AGGAGCCGGTGTTCGAGGTGACGGGGCTACCCGAAAGCGTGGAGGCTGCGAGGCGCGAGATCGAGGCACACATAGCGCT TAGAACCGGCACAGGAACCACCCTCGACGATTCGGAACTGCTAAGCGTGCTCTGTCGCGGTGGCCTGGGCTCGATCCTC GGTTGCCTCGACCCACCCGGCTCGAACGGCTCGAACGGATCCAGCGGAGCGTTCTCCAGCAGCGGCAGTTGCAGCAGCT CGTCCAGCAGTTCCGGCGCGCCCGGGCTCAACGATCTCGTCGCGATTTGGGGCGCCGGCATGGAGAGGGACGAGGGCCT 
GGGGGAGTCGCCCTCCTTCGAGTCCCAAACGGCGTCCGCCTCCTCGATCTGGTCGTTCCCAGGCGTCGCGCTACCCTCG AGGCCATCCCCGCCGGCGTCAGCGAGCCCGACGTCGCCCACGGACTCGCTGCTGGGCGGTGGACGGCGGGAATGCGTGG TGTGCGGCGAC This and the vertebrate match indicate that at least two exons are missing from the N-terminus of CG11360, and they are encoded in the genome. \**********A. mellifera Contig49 TCAACCACCAACTCAAGTTACAACTTTAGATTGTGGTATGAGAATTGCTACTGAAGATAGTGGGGCACCAACAGCCACA GTTGGATTGTGGATTGATGCTGGTAGTCGTTTTGAAACTGATGAAAACAATGGAGTTGCCCATTTTATGGAGCATATGG CATTCAAAGGAACTACTAAACGTTCACAAACTGATTTAGAATTAGAAATTGAAAATATGGGTGCTCATTTAAATGCATA TACAAGTAGGGAACAAACAGTATTTTATGCTAAATGTTTGGCAGAAGATGTTCCAAAAGCTGTTGAAATTTTAAGTGAC ATCATTCAAAATTCAAAACTTGGCGAAAATGAAATAGAAAGAGAACGTGGTGGTATTTTAAGATAAATGCAGGAAGTTG AAACAAATCTTCAAGAAGTTGTTTTTGATCATTTACATGCTAGTGCTTATCAGGGTACACCATTAGGAAGAACAATTCT TG This and vertebrate matches and many ESTs show that CG3731 needs an N-terminus, and there is an exon in the genomic sequence. \**********A. mellifera Contig663 AGGAGGTAATGTTTTTAAACGAACTCGAAGAAATTCTTGACGTGATCGAACCTGCGGAATTCCAAAAAGTTATGGATCC GTTGTTTAGACAATTAGCGAAATGTGTATCATCTCCGCACTTTCAGGTCGCTGAAAGAGCTTTGTATTACTGGAACAAC GAATACATAATGTCACTTATATCGGATAATTATTCAGTCATTCTACCAATTATGTATCCAGCATTCTATAGAAATTCCC GAAATCATTGGAACAAAACTATCCATGGTTTAATCTATAATGCATTAAAGCTTTTTATGGAGATGAATCAGAAAGTATT CGACGAGTGTACTCAACAGTATTATCAGGATCGACAAAGAGAAAGAAAACTTATGAAAGACAGAGATGAAGCGTGGATG CGCGTTGAAGCTCTGGCAATGCGACATCCAAATTACAACGCGACTATAAAAGGTATTACGAATACAACAATTGGCACAA TATCGCAACAACAATTGGACAGCCCTCCGCCCGATGAAGATGGTGATACTGATCAGACACCGCTTACATTGGAAAAAAT AGAGGCGAAAGCAAATGAGGCAAAAAAAATGACGAACTCTAACAAAACAAAGCCACTTTTACGGAGGAAAAGCGATTTA CCCCAAGACACGTATACAATGCGGGCATTATCTGATCACAAACGTG This and vertebrate matches show that CG7913 needs a different C-terminus, and it's available in the genomic sequences. \**********A. 
mellifera Contig75 CCAAATTCTACCCATGCAAGTAGTAATTCTTGTTTTGAATTTTGACGATTTTTTACAGTATAATAATGTTTCCTTAACC GCCATCCCATATCAGGAATGGCTTTACAAACTTCAAAATTTGCATCGCTACGTAATGCTAATGACACCTGTTGCAAAGC TATTTCATGTAATGGTGCTGTATAATTATCTGGATCTAAGAATTTTTTCATAGGAAGTGATGAAGCCAATATAGGATTC ATTAGTATATTGTTTAAATAATTCTGTAGGGCTATTTGACGTTGAGCTATGAAGTCTGGTTCCATATTACCAATGATTT TCTTTGGAGGAAATGCCAGATCAATGCCAGATATTGATAAAGCTGCATTAAGCTGTACAAAGTCATTGTAGCGTCTGCT AACTCTCCAAAATTTTTCAGAAAGTGGACCTCTTTGTGTTCTAATCACATATTCCGTATGTCCGTCGATTGTTCTTGCA TTTTCAATGACGCTCGTCAGCTTTTCTGTATCATCTAACAGCACTTTATTTGTATATCGTTTCTCAAATAAAGCCATTG ATCATTAGTCCAACAAAATGCAGAGCCCGTGCGGGGCGAACTGCGTATGGCTAGCGCATATTTTTCGTAAGTC This and vertebrate matches, and cDNA LD23236 show that CG8726 needs an N-terminal exon, and it's there in the genomic sequence. \**********A. mellifera Contig94 GGAATATTAAAGATTGTTTTCCAGATCTTCAAACGCAAGTTAAACTTAAATTACTTCTTTCATTTTTCCACATACCAAG ACGGAATGTCGAGGAGTGGCGTGTTGAATTGGAAGAAATCATTGAAGTGGCCTCATTGGATAGTGAATTATGGGTATCA ATGCTATCCGAAGCGATGAAAACATTTCCATCAACAGGTTCTTTGAATACAGACATTACAGATTTAGATGAACATAGGC CTATTTTTGGAGAACTAGTCAATGATCTTCGAAAACTTTTGAAAAAACAAAATGATCCAGCTATGCTTCCATTAGAATG TCATTATCTTAATAAGACTGCTCTTACTTCTGTGGTTGGCCAACAACCTGCACCAGTTAAGCATTTTACTTTAAAGAGA AAACCAAAAAGTGCTGCATTAAGAGCAGAGTTACTTCAAAAAAGTACAGATGCTGCAAGTAATTTAAAAAAAAGCACTG CTCCTACAGTACCTGTAAGAAGTAGAGGAATGCCTAGGAAAATGACTGACACAACACCTTTGAAAGGCATTCCCAGTAG AGTCCCTACAAGTGGTTTTCGTTCTCCCTCGCTTACGAGTTCTTCAATGTCTAACAGGACACCTC This and vertebrate matches, and cDNA SD13146 show that CG5874 needs at least two additional exons \**********A. 
mellifera Contig950 AACGAAGTTGGTAGTTTTGTAATGAGTTCTTTGCTTTCAAGGTAGAATTTAGGAAAGGATAAATTATTTTTTCGTAGTA TTCTTGAGAATTTATATTAAAAGATGGAGGACGTTCTCGAAGAGGTTGTTTCTTCTGATGATTTAAAGAAATTTGAACG TATATATAATGAGCAATTACGTTCATCAGTAATAACACAAAAAGCTCAATTTGAATATGCATGGTGTCTTGTTAGAAGC AAATATCCTGCAGATATCAGAAAAGGAATAATGTTATTGGAAGATTTATATTGCAATCATAGTGATAGCGAAAAACGGG ATTGTCTTTATTATTTAGCCATTGGAAATGCTAGAATAAAGGAATATACAAAAGCTTTAGCGTATGTCAGATCGTTTCT TCAGGTTGAACCTGGAAATCAACAAGTACAGCATCTGGAAACATTGATCAAGAAAAAAATGGAAAAAGAGGGACTTTAT GGTATGGCCATTGCAGGAGGAGTTATTATTGGTCTTGCAAGTATTCTCGGCCTCAGCATTGCTATGGTCAAAAGAAACT AATCATTTCATGTGAAAAAATCTGTAACTATTGTGTGATGTGAAATAGTTGCTTTGTACAACGCGTTATATATATTATT TATTGATTGAAATA This and vertebrate matches show that CG17510 needs an N-terminal exon, present in genomic sequences. \**********A. mellifera Contig982 GCAAGTGTCAATTTGATTATTGTCATCGTTTCTGTGACTATCGATAGTTCTGTAAGTTTGGAATTTCTACCGATGATCG ACAAATTGCATATACGTATTTCTGTGGGTGCCGTGAGGCGGCGAGAAAGGCCACCTCTTTAAATTAAAAGAAGGAATGA CGCTAGAGAATCCGTTCTTTGTCGTCAAGGACGAAGTTTGTAAAGCTTTAAATAAAAATCGCGGTTTATACGGGCGCTG GACGGAATTGCAGGATGTTGTCACGAGTCCTACTGTGAGTGGGGGAATCCCAATCTCACGCGACGAATTAGAGTGGACT ACTACTGAATTACGGAAAGCTTTACGTTCAATCGAATGGGATCTTGATGATTTAGAAGACACAATTTGTATTGTAGAAA AAAATCCAACAAAGTTTAAAATAGATAACAAAGAATTAACGGTTCAACGAAGTTTTATCGAACAAACTCGAGAAGAAGT TAAGACCATGAAAGATAAAATGAATTTAAGTAGAGGTCGGGATCGTGATAACACAGCAAGACAGCCACTTTTAGATAAT AGTCCTGCTCGAGTTCCTGTCAATCATGGCACAACAAAATATAGCAAATTGGAAAATGAAATTGATAGTCCAAATAGAC AATTTTTAGGAGATACCTTACAACAACAAAATGATATGATGAGACAACAAGATGAGCAACTAGATATGATAGGTGAAAG CATTGGAACATTGAAAACAGTATCTAGACAAATCAATACTGAATTAGATGAACAAGCAGTTATGTTAGATGAATTT This and vertebrate matches suggest that CG7736 needs different C-terminal splicing. \------------------------------------------------------------------------------ --