Subject: Re: more annotations

Sima,

Let me make sure you understand what I have done. These are the results from a small 600 EST project on a tephritid fruit fly, Rhagoletis suavis, conducted by myself and Stewart Berlocher, and the large 20,000 honey bee EST project of Gene Robinson's lab described in my email to Michael. The former yielded four unannotated genes, and the latter about twelve, but many more suggestions for improvements of existing annotations (it turned out that my previous estimate of tens to a hundred unannotated genes was exaggerated when I went back carefully and tried to reconstruct each of them).

We did BLASTX against Release 1 Drosophila proteins at E-05 and TBLASTX against the genome at E-06; for the approximately 9000 honey bee contigs and singletons this yielded about 150 TBLASTX hits without corresponding BLASTX hits. I've examined all of these carefully, and the ones I am sending you are the 50 or so that look interesting (the others are proteins present only in Release 2, or artifacts). In addition, I was sometimes curious about large neighboring unannotated regions in the genome, and I note some interesting unannotated genes in them.

The two files are both text files; however, the New Drosophila Genes file is in PAUP format and hence can only usefully be viewed with that program (I've attached an old Mac version if you don't have it). Each annotation suggestion is separated by a series of asterisks, following which is the name of the EST or contig and its sequence, so you can do the searches for yourself to see what I saw. The mRNA and protein FASTA sequences are together in the second file.

I'm considering doing something similar for the other available insect EST projects, that is, the 15,000 Bombyx mori, 6000 Anopheles gambiae, and 1300 Aedes aegypti, as well as our own 1200 Manduca sexta ESTs, and perhaps the 17,000 Anopheles gambiae GSS sequences available. I wonder if you've already heard from the Bombyx mori and mosquito folks in this regard?
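The filtering step described above (keeping the ESTs or contigs that hit the genome by TBLASTX but have no corresponding BLASTX hit to an annotated protein) can be sketched as follows. This is only an illustration, not the procedure actually used: it assumes both searches were saved as 12-column tab-separated BLAST output with the E-value in column 11, and the function names, query names and toy rows are all hypothetical.

```python
import csv
import io


def best_evalues(tabular_text):
    """Map each query ID to its smallest E-value.

    Assumes 12-column tab-separated BLAST output in which column 1 is
    the query ID and column 11 is the E-value (an assumption about the
    file layout, not something stated in the email).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, evalue = row[0], float(row[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def genome_only_queries(blastx_text, tblastx_text,
                        blastx_cutoff=1e-5, tblastx_cutoff=1e-6):
    """Return queries with a TBLASTX genome hit but no BLASTX protein hit.

    The default cutoffs mirror the E-05 and E-06 thresholds in the email.
    """
    protein_hits = {q for q, e in best_evalues(blastx_text).items()
                    if e <= blastx_cutoff}
    genome_hits = {q for q, e in best_evalues(tblastx_text).items()
                   if e <= tblastx_cutoff}
    return sorted(genome_hits - protein_hits)


if __name__ == "__main__":
    # Toy rows with invented query and subject names, for illustration only.
    blastx = "est_a\tprot1\t60.0\t80\t32\t0\t1\t240\t1\t80\t1e-20\t95.0\n"
    tblastx = (
        "est_a\tscaf1\t70.0\t80\t24\t0\t1\t240\t100\t339\t1e-30\t120.0\n"
        "est_b\tscaf2\t55.0\t60\t27\t0\t1\t180\t500\t679\t2e-08\t58.0\n"
        "est_c\tscaf3\t40.0\t30\t18\t0\t1\t90\t10\t99\t0.01\t30.0\n"
    )
    print(genome_only_queries(blastx, tblastx))  # prints ['est_b']
```

In the toy data, est_a is dropped because it already matches an annotated protein, est_c is dropped because its genome hit is weaker than the cutoff, and est_b is the kind of candidate that would then be examined by hand.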
Hugh . Hugh M. Robertson Professor Department of Entomology University of Illinois at Urbana-Champaign \----------------------------------------------------------------------------- \-- 'New Drosophila Genes' file <up>FlyBase curator comment: original PAUP version of this 'New Drosophila Genes' file is archived. What follows is a version containing the text of the file with the genomic and EST sequences that are shown aligned in the original file but does not show the actual alignments between these sequences.</up> New genes in the Drosophila genome identified by TBLASTX searches with tephri tid fly and honey bee ESTs that don't match Drosophila proteins by BLASTX Turns out most are really just problems with the current Drosophila annotatio n, where N and C-termini are not annotated \********R. suavis J3-A2 CAAAGTTTAAAAAAAAATCCAGCCTTAATTCCATTATATGTATGTGTTGGGGTTGGAGCGCTCGGTGCAGTTTTCTA TACTTTGCGCTTAGCTACACGAAACCCTGATGTAACATGGAATCGTTCATCGAATCCTGAACCTTGGCAAGAATACA AAGACAAACAATACAAGTTCTATTCACCGATAAGGGATTACAGCAACATTAAATCCCCAGCCCCTAAATATGAGGAA TAATTTAATAAAGATGTATCTTTCGGCTAGAACATTTCGTTAGCTAATGAAATTAAAAAAAAAAAAAAAAAAA Matches human and other organism NADH-UBIQUINONE OXIDOREDUCTASE MLRQ SUBUNIT (COMPLEX I-MLRQ) at 52% over length of short 80aa protein TBLASTX match to Drosophila genome is an unannotated region of a small messy scaffold 142000013386032 AE002656.1 176729-ATATAGTTCCATTCTGTTTTATTGGATTGAGTAAAGTTAGgtaattttttataaatgtgtttctttctta catattaaatacattttaattataacgtagACAAAATGCAAGGTCTTGGTCTGCAAAGTCTTAAAAAAAATCCAGCT gtaagttttatgattatattaagttatctgttgattgcataacaaaaatttgttcagTTAATTCCACTTTATGTGTG CGTTGGAGCGGGAGCTATTGGAGCCGTCTACTATATGGCTCGACTTGCTACTCGTAATCCCGATGTCACTTGGAATC GCACATCAAATCCCGAACCATGGCAAGAGTACAAAGAAAAGCAATACAAGgttagatttgctccttaatattcatat tctattttcatattttaatgggccaagaaaagccctcgaccgtggttttgttaataaccaaatttgcttatagatat atgtatttttgcaattttttacttttttggcgatttaaccgacatgccataaccattacacatatatatttcacgaa tatatactatgttgtagttgtagttatacccgttactcgtagagtaagatgttatactagattcattgaaaagtatg 
taataggtagaaggcagagttttcaccatatgaagcctatatattctcgatcaggatcaatatccgagtcgatctga cgctgtccgtccgtctgtatgaatgtcgagatctcaggaactatacaatctagaaggttgagattaagcatacagat tctagagaaatacccgcagcgcaagtttgttggcctatgttgccacgcccacaaatcttcaaaaactgccacatttt ttcacatttttattagttttttaaattttgattgatttcccaaaaattttattcaccaatatctatcgatatcccag acaaatcatgaaatttcgctatggcattttaactagctgaataacgggtatctgatagtcgaggaagtcgactattt tgtacagtgatactttttttcacttttattatttgtctcgcaaaacaattcccaataaccgtaaacaccgcctgtta acaatctctaaacggagatatttgcgtatatcaactagctaataacaaattattaaaacaccatactcaccattctt tatggcagaaaaaatgtatatatatatatatatatgggtaatttaggtaaataggtaaaataagtgaaattaagaaa tgatggggtgtcattagtgcgtattaaggaataattcaacccttaaataaaaaggaagagggatatgcttcaatatg tagtgattgcgacttaaacgccaatcaaaccctttaattttatcaattttataataatctagttatttaaaactatg aaaaaaaaataaagatcacaaaaatgttctaacatttaatattttatagcaattcggacgcaatgcgggagaactta tcatttattttgtcgacagtgactaaccagaaatgtagggatcgccagagatttacacagggtctaaaacgttcgct tcagataaatttctatcaaattctatcacttcactagaatagtagattctttaaaatgcatgcaaccagttaatgga tgccttcctgaccttaaaaattatacatattcttgaaccgagatgaacctcgtaaacctccagggaactattgaagc tagaaagttgagactaagcatgtgtatttaagggtcaactacgcagtgaaggtacataagttccattatattaccca caagccccgcaaacactcactgttacttaataatgttttaatttttttttgtttttgcctttcaaatttcttttgct aggtaaacaattgtttgcgtaagtaatgtaaaactgattgcgatcgttcagtgtcagtttactaaaatctaaaaata tttaaatctcttaattgctggtggacttataggtttactgtaattcgtgttttttagatgaaatatataatattata atataatataataatataatataataatataatataatataataatataatataatataatataatataataatata atataatataatataatataatatataatatattgaatcttgcattttttttcagTTTTATTCGCCTGTGAGGGATT ATTCCAAAACTAAGAGTGCTGCCCCAAACTTTGATGAATAAATTACGTTTCCCTAGCAGCTGCAAT GH04411.5prime TTAG------------------------------------------------------------ACAAAATGCAAGG TCTTGGTCTGCAAAGTCTTAAAAAAAATCCAGCT------------------------------------------- \--------------TTAATTCCACTTTATGTGTGCGTTGGAGCGGGAGCTATTGGAGCCGTCTACTATATGGCTCGA CTTGCTACTCGTAATCCCGATGTCACTTGGAATCGCACATCAAATCCCGAACCATGGCAAGAGTACAAAGAAAAGCA 
ATACAAG---------------------------------------------------------------------- \----------------------------------------------------------------------------- \------------TTTTATTCGCCTGTGAGGGATTATTCCAAAACTAAGAGTGCTGCCCCAAACTTTGATGAATAAAT TACGTTTCCCTAGCAGCTGCAATTTAAAAATGTAAAATGAAATAACTTCAAATTATAAATAAACAT GM01687.5prime ATATAGTTCCATTCTGTTTTATTGGATTGAGTAAAGTTAG------------------------------------- \-----------------------ACAAAATGCAAGGTCTTGGTCTGCAAAGTCTTAAAAAAAATCCAGCT------- \--------------------------------------------------TTAATTCCACTTTATGTGTGCGTTGGA GCGGGA-CTATTGGAGCCGTCTACTATATGGCTCGACTTGCTACTCGTAATCCCGATGTCACTTGGAATCGCACATC AAATCCCGAACCATGGCAAGAGTACAAAGAAAAGCAATACAAG---------------------------------- \----------------------------------------------------------------------------- \------------------------------------------------TTTTATTCGCCTGTGAGGGATTATTCCAA AACTAAGAGTGCTGCCCCAAACTTTGATGAATAAATTACGTTTCCCTAGCAGCTGCAATTTAAAAA translation M Q G L G L Q S L K K N P A \--------------------------0--------- \--------------------- L I P L Y V C V G A G A I G A V Y Y M A R L A T R N P D V T W N R T S N P E P W Q E Y K E K Q Y K \------------0----very long F Y S P V R D Y S K T K S A A P N F D E Z So, translation is correct, encoding 83aa protein with 56% almost colinear ma tch to full-length of mammalian proteins Drosphila ESTs indicate that there is an intron in the 5' UTR, no surprise th ere. All splice sites are predicted, except intron 2 acceptor; but there are sever al good acceptors within the long intron that are ignored. MQGLGLQSLKKNPALIPLYVCVGAGAIGAVYYMARLATRNPDVTWNRTSNPEPWQEYKEK QYKFYSPVRDYSKTKSAAPNFDE \*********an extra one! 
The long intron above has several potential ORF-like regions, and indeed one matches a short predicted protein, CG13301; Score = 45.1 bits (105), Expect = 0.003 Identities = 24/36 (66%), Positives = 27/36 (74%) So is on reverse strand, about 400bp in: AE002656.1 ATGCTTAATCTCAACCTTCTAGATTGTATAGTTCCTGAGATCTCGACATTCATACAGACGGACGGACAGCGTCAGAT CGACTCGGATATTGATCCTGATCGAGAATATATAGGCTTCATATGGTGAaaactctgccttctacctattacatact tttcaatgaatctagtataacatcttactctacgagtaacgggtataactacaactacaacatagtatatattcgtg aaatatatatgtgtaatggttatggcatgtcggttaaatcgccaaaaaagtaaaaaattgcaaaaatacatatatct ataagcaaatttggttattaacaaaaccacggtcgagggcttttcttggcccattaaaatatgaaaatagaatatga atattaaggagcaaatctaac translation M L N L N L L D C I V P E I S T F I Q T D G Q R Q I D S D I D P D R E Y I G F I W \* No EST matches for this or CG13301, however these is another something out there that is similar. Lots of room for a long 5' UTR, and a possible large intron in it. \*********R. suavis J3-D1 AGCAGTTCTTCTATGGTGGCTGTCAAGGAAATGATAATCGCTTTGACACCA AGGAAGAGTGTGAGAAAACATGCCTTTAAGTGATGCGATCAAGTTTTATTGATAAATTATATCGATAAATTTATCGT CGTTGGATTGAAGTCGAGGCATTTAGTTGTTGATGGCCTTTTGTAAATATGTGTTGTACCTTATGAAAATGTATTTG TTAAAAAATTATTGTATTATCTCCGTTAAAAACTGTGCATACTAAAATATAATTTATAAAATTTTAAAAAAAAAAAA AAAAAA Matches various Kunitz domains; best Drosophila genome match is an unannotated region of about 22kb! 
AE003623.1 aaaccactatggtggttcgcatcgagctttgccgtggttgttgttcgatgattttgggtatagcacacggcatgaag ttatatcgagacttaattaagtgccgggtccagaggacctaattcaactgggtcctggagctggaatggaaaatgat tgtacggcggttactaacaaattctaacggatatgggttttgtgtaacaaactgctagttgatttttgggttttaca tttaaatggaaaggaaaaaaaaaagaaaacaaatttaggtgctggaaaaacgtattttaagcaccactctgtgtaac ccaatattttctatggctttatctacatctcacacctgaaaatgattaatttaaaaatttcaatgtacttcgatata atctaggcgttcctcatcctccgctaaatatcaccattaatgtgcaataaagaattatgtagagacagttaataccc gcgagacaaaaagcggctggcttatcttagccaccaccccgaaattcgcgtgcaaaacaattttcaactcgcttcct tttggctgacgacgaaattttcgggatataccccgcctattcaggcatttcccccacattctgtctgccaatttacg ttatcgttttcaaattgaaaatttgcgtaatcacattttaagtgaaaatcgctttgcaaattaaattaaatcgattt tttcaagacgataaaatcgggctgagtgaaaatcgttctaccaaaagttgcagggcacgtaaatatacattatggcc ttctttttatttccgtttccggtttggcattagagctttaagctctgaaatatgcatctgcagttttactttggtgt ctgcatagtcgcgacattgcatctcaaagacccatcaacgtcatcaaatcgttagtcaaatacgagaaaagaaaaaa tatttaagttctttgctcttcgcgcaagtctctcgtggcaataaaaatgatgaagtactttgaaagtactagtagga ctgggctggaaaaataagagaagaagttagaaaagcttgcattcctggcatatcctttagtttttatgcatacccaa caatgaaaaaaagggtacattgagatttcgaacgatttgaatgattaagagtagttttgattcttatgatttaaaat aaataatagataaataaatttaaaaaaatcatttctagtaattcagctgatgtgaaatattaaaatatatttcacat attattctgtcatactagccttgaactttcattaaatgtaaacaaaacaatttataacttgtgtccgtaattttcaa atatttttatcATGTATATTTATTTTCCAACAATCTTTCTCTTATTTTTGTATCCAGTAGTAGCAGTTGTCCCTCAA GGATTTACAATTAAACAACgtaggtcttagaagaaccaaaaattgcatatgaaatttaaaattgtatatattcttag CAAAATGCTGGTATGTGGCAAACCCTGGACCCTGTGATGATTTTGTAAAAGTCTGGGGCTACGATTATTTGACTAAT CGTTGCATTTTCTTTTATTATGGAGGCTGTGGTGGAAATCCAAATCGATTTTATACGAAAGAGGAGTGCTTGAAAAC ATGCCGTGTGTACAGACCTCCAAATCgtaagaaaagggaagaaaatttggacgaagaagaggaggaagagtttgagg aagatattgacaactgggacaaatgggacagcgaatgggacaggatggatctatgaccatatcaatactaagggtat tcaagacccgacatgccgaatagacattggcttcaagttaacttttgattcgttgtgcaaacgcataatttcagttt gaaagaaagaacttcccgtcgagttggttgtaatgcgttttcttttagccgtctccactgcattttccatcttactt 
acgacggatatatatatatatccctctagACGTCTGTTTGCTGCCAATCTGGGCGACGGCCATTAAGTCAAACCGTT TGAAGCAATTTGAAAGCTACCCAGACTATGCAACATATATATTTTTACAGACTCCCTGGGTTATTTATCAACAATTT TATGTGGATAGCGTTGCGATTCTGACAATTTTTGACATGCAATTTGCCATTTTCCATCTGCTTCAGCCGTATTTTGG GTGTGGAATTTGGCATTTTTCCGCAGGCTGCAATAAGTTTTGGCAGCGAATGCAGATGAGGCTCAT translation M Y I Y F P T I F L L F L Y P V V A V V P Q G F T I K Q \--------------------------2-------------------------------P K C W Y V A N P G P C D D F V K V W G Y D Y L T N R C I F F Y Y G G C G G N P N R F Y T K E E C L K T C R V Y R P P N \-----1--------------------------------------------------------- \----------------------------------------------------------------------------- \-----------------H V C L L P I W A T A I K S N R L K Q F E S Y P D Y A T Y I F L Q T P W V I Y Q Q F Y V D S V A I L T I F D M Q F A I F H L L Q P Y F G C G I W H F S A G C N K F W Q R M Q M R L M M R Z Can reconstruct a nice small gene with two introns, encoding 180aa protein with a single Kunitz domain in it. MYIYFPTIFLLFLYPVVAVVPQGFTIKQPKCWYVANPGPCDDFVKVWGYDYLTNRCIFFYYGGCGGNPNRFYTKEEC LKTCRVYRPPNHVCLLPIWATAIKSNRLKQFESYPDYATYIFLQTPWVIYQQFYVDSVAILTIFDMQFAIFHLLQPY FGCGIWHFSAGCNKFWQRMQMRLMMRZ Kunitz domain C G C WYYD C F YGGC GN N F T C E C Best BLASTP match is to PROTEASE INHIBITOR CARRAPATIN, a 69aa protein from ticks; 53% over 53aa. Neighboring genes are TOLL-4 and Or30a! No other BLASTX matches for this 3000bp region, although the entire region between the neighbouring genes is 22,000bp, so could be others in it, or long 5' UTR introns. No ESTs to help solve this one, even from the entire 22kb! \***********R. 
suavis J3-A7 ATCAGTTTACGTGGAAGCAATCAAATATTTGCCGAAAATAAGCGTTGATGTTTTAACTAATTACAGTCATTTTGATT GTTTCCAAACAAACAAAAAACAGCGAAATGATTGAAGTTACAGATTTACAAAAAATCGGTATTGGGCCTGGCCTGGA TTTGGTGTCTCATTCCTATTTTTGGGTGTTCTATTTCTGTTCGACAAAGGATTGCCTGGCCATAGGAAACTTGCTCT TCCTAAGTGGTCTGGCATGTGTAATTGGAGTTCAACGAACATTCAGGTTTTTCTTTCAACGGCACAAAGTTAAAGGT ACCACAGCATTTTTTGGCGGAATTTTTGTGGTTCTACTGGGATTTCCAATGATCGGAATGGTTATTGAGTTGTATGG ATTTTTTGTGCTGTTCAGCGGATTTTTCCCCGTGGCGGTAAATTTCTTGGGGCGAGTACCGGTGCTGGGATCAATTT TGAATACTCCATTGATTCAAAAGCTGGTACAAAAACTCGGTGGGGACGCGAATCGAACAACGGTATAGTAGAGATCC AAACAAACTTTCAACTAAGGATAATTTTAATTTAGTTTATTCCTCCGTGAAATTAAACTTAAAGCGCGTTTTTGTAC AAAAATACATAGATTCCACTTTCACATGA Encodes complete protein with excellent matches to 180aa proteins from all eukaryotes; 50% identity to CGI-141 protein <up>Homo sapiens</up> AE003501.2 RC atacgtattaaatggtaaaaatagttttatttaatacattttatcatttatttgcatccaaatggactgaatccaag aggttttaaataattaatcggcagcttgatcgctttacaacactaaagcgcttgcatccctgcaaacattgtttaca gtcgttagcacgtaactttgaatgaaagtcgaaatcagctgtttgtgctttgaaattgaattggtgtttacgtggat ttaattttttctgaaatcaaattaaagcagagccaaatacaaaATGATTGAAATATCAGATTTGCAGAgtaagtagc cattaatatgcgaaagccatcagcgcaactaaaccgcctattgaaatcttcccgcagAAATTGGCATCGGCTTGGCT GGTTTTGGCATTTTCTTTTTGTTTCTCGGCATGCTGCTGCTGTTCGATAAAGGACTGCTCGCCATTGGCAATgtacg ttacccacatgcacatgcaccatatgccccatatagcacacccttcgagctctataataatagcaatcgcctctttc ccgcagATTCTATTCATATCGGGCCTGGCCTGCGTCATTGGCGTGGAGCGCACGATGCGCTTTTTCTTCCAACGGCA CAAAGTCAAAGGCACAACGGCCTTCTTAGGGGGAATCGTCATCGTCCTGCTGGGATTCCCCATCTTCGGCATGATTA TTGAATCCTATGGATTTTTCGCACTCTTCAGgttcgtagcaccctggccaagtcccagtcggattcgtttaagacca tttgagcggggctaaccggttacgcactccacatggtcttttcgctttcccgctcttaatacaccattcgcaagtca gggcttggcagtcaaagttgctgtcagatgacatcctttagaaatattttatattttaattgaaagactgcaagtca tgtagatgggacaatttgacgctgtggcattaacagcaaataccaagtataattcttcagattcgcttacaaaaaaa gctcgatcagattttatagcatttgatcagagctaaagaaagcaaaaagtatgtccttcattataactattcgctgt ttctgtctgttttgttctgttctgtccttttatacattttttcctatctatacctctctccttttaccttttcaatt 
gcatgcttcttcataagttacgaccaaaggtgcaacttaataattaccctaatatatgtaggtctttttattcatta aaatatcttttaacgaacgttaaaaggtctgcttcatatatatcggattaatgaatggaaatctattcaaagtaggg aaaatattcagtgaatataagaatttaacttaacttcgaaacgtgcaagatggcaatacaagtggaagtaacttctt ttagtgatttgtcgtttacttattacttattaatgtccttctctttatattttcagCGGCTTCTTCCCCGTGGCCAT TAATTTCCTAGGCCGAGTGCCTGTTTTAGGATCGCTGTTTAATTTACCATTTATACAAAAGgtaaggtagcatggag cctcacaaaaaataaaaaatataacatagcaaatagcccatactcatgatcacttaacgcgctcatctcgccttgac gcgaaagcccattgaaataaactgaagggtccacattcccaaacgaccacgcagactcgatttcatagccaattttg gtatttatctaagctatgtatttcaacatttacaagtccaaagtagagggatttagtggcgcgtttagtggatcttt tcccagtcctcacgctcctcgcgctccttcacgacctccttgaggtagggttccagatacttgacgtcctcctcgta cttggtccactgctccttgggcagaatggtcttggtcatggacagatggagggccctcatgatgcggtagttacgct catcgtacagcttcctgggcaatcggcgcacggcctccttcacatcctcgttctcatacagacaatcatcgcgatgc agacctgcaaaaatcgatggcatcaagctcagctgttaggaatcactaggttatatagtagtcctaccgtattggtt gaatccggagagattgtaggcccatctgcccagatttgctgtggggaaaaattcgcattacatatttacgtaatcat aatattccggcgtactgcactacgaaatttgttgcatgcaacgtgagtccgattgttgcccgttcg translation M I E I S D L Q \--------------------------------1-------------------- \-------------K I G I G L A G F G I F F L F L G M L L L F D K G L L A I G N \----------------------------------------------0-- \--------------------------------------- I L F I S G L A C V I G V E R T M R F F F Q R H K V K G T T A F L G G I V I V L L G F P I F G M I I E S Y G F F A L F S--------2---- \--------- G F F P V A I N F L G R V P V L G S L F N L P F I Q K \---------0---------- Can deduce great little gene encoding 140aa protein Drosophila protein MIEISDLQKIGIGL-AGFGIFFLFLGMLLLFDKGLLAIGNILFISGLACVIG VERTMRFFFQRHKVKGTTAFLGGIVIVLLGFPIFGMIIESYGFFALFSGFFPVAINFLGRVPVLGSLFNLPFIQKIV QKLGGDGNRTTV J3-A7 MIEVTDLQKIGIGPGLDLVSHSYFWVFYFCSTKDCLAIGNLLFLSGLACVIG VQRTFRFFFQRHKVKGTTAFFGGIFVVLLGFPMIGMVIELYGFFVLFSGFFPVAVNFLGRVPVLGSILNTPLIQKLV QKLGGDANRTTV The region shown here is the RC strand between the forward strand eas and cas 
genes, with CG3560 being in the long 4th intron of this gene, on the forward strand \- and it is a real gene! No ESTs to help out with this one; although there are several for CG3560, which is a component of the cytochromes \***********R. suavis J3-B3 ATTGGCAGAACTGAATGTTGAAACAACGAATCAAGATGGACGTGTAAATCAATTAGTGAGTGCTTCGCAATACAAAG CAGGCATCTACAAGTTGCATTTCGATGTGGCATCATATAACGCAGAACGTGGAGTTAAAAGTTTTTATCCTTTTATT GAGATTGTCGTACAATGTGAACGGAATCAACATTATCATATTCCATTACTGTTAAATCCGTTTGGTTACACAACCTA TAGGGGTACTTGAATGTTAAGCTGATTAATGAATTATTAAACTAATATTTGAACACACATGGAAATAAATGATGCTT TTATATATAAAAAAAACATACCGTGACTCAAACAATGATAACGTGTAGATTTTTTTTTAAATATTCGTTATGTTTTT CGAGCTAAACTTTTTTTGTTATTTTTCTATAAAAACACCGCCCAAGCTTTGATCAGAAACATATTAACCACAGTTTC AAACCTTATCTAAAAATCATAAAAATTCAAGAAATTTTCATCGAAATCTAAACTGTGCGGAATGGTAGTGCGCACTT CCAACTACATCACAGCGCTCAGCACTTGTTAGTTATATGCATGAAAAAAGCAAACGA possible end of ORF encodes matches to end of yeast and C. elegans 120aa proteins; called probable transthyretin precursor \- fission yeast gcaggttgtgtgaccagatcctattaattacgctgttactaataccaaagacac tagaataatagaatattctagtgtctttgctaatactttaaacaaatactgaaacatacggcatcaaacgattttta tatttttaaaaactgttgcaaatttattcgtttatttggaattaatacttcattaaaaaaaagtatatatatgatcg ttccaatttatttccccaaacagatttttggcttacatattattttgatatcccacttatcagtgattttcattgtt aatgcaagatgcaatcaaagattaagataaagagtcccataaactggagcttttgctaccgtctgagataccagaga tatctacgatctttaatcttaaagttgccctcaagATGGATGCACGAAAGTTTTCTACCCACATATTGgtaactttg tttcaatttatttatatgtataaaatttttcgttaaccttttacagGATACTTCGGTGGGAAAGGCGGCAGCCAATG TGAGAGTAACAGTTTCCAGGCTGGACGAGATTCAGGAATGGAGATCCCTTCGGGCGGCCCAAACTGATGCGGATGGT CGCTGCCTGCTCTTGGAACCTGGTCAATTTCCCGGCGGGATCTATAAGCTGACCTTTCACGTGGGCGCCTATTACGC GGAGCGCAATGTGAGGACACTTTATCCAGCAATTGACTTGATTGTGGATTGCAGTGAGAATCAGAACTATCACATTC CTTTGTTACTCAATCCCTTTGGGTATTCCACATATCGTGGAACATAGCTCGGTTAAAACCGAATAATGGATGTTACA CAACTTACAAAAATGTAATTGATTTGAATAAAGGTTTTTTAAATGTTTATTTGTAACATCTCGGGATCGCTTTACAC TTCGTGCGATGTTCGTGCATGAGTAACCGTGTCATGAAATCAACTAACCTCACGCTCCC translation M D A R K F S T H 
I L \--------------------------0------------------ \---------- D T S V G K A A A N V R V T V S R L D E I Q E W R S L R A A Q T D A D G R C L L L E P G Q F P G G I Y K L T F H V G A Y Y A E R N V R T L Y P A I D L I V D C S E N Q N Y H I P L L L N P F G Y S T Y R G T \* Drosophila protein MDARKFSTHILDTSVGKAAANVRVTVSRLDEIQEWRSLRAAQTDADGRCLL-LEPGQFPGGIYKLTFHVGAYYAERN VRTLYPAIDLIVDCSENQNYHIPLLLNPFGYSTYRGT J3B3 LAELNVETTNQDGRVNQLVS ASQYKAGIYKLHFDVASYNAERGVKSFYPFIEIVVQCERNQHYHIPLLLNPFGYTTYRGT No ESTs for this one \*************A. mellifera Contig1006 GGAAAACATACGTGGAGTCGACGGTTACTACTGGGATCCAGATCTATACGTCAGAGACGTTGAAGCAAGCCGAAGGT GTGGTGAGCACAGTGAGGTGCGAGGGCAGGGAACATGCCCTCGGCCGCGGAATCAAGCGAAAGCTGGACTCCATCCA TTCCATGCATTCTACCCTGCATGAAGACCAAGATGTAGCCGAGGCAAAGTCGGAGGAGAAGAGCCAGAGGAAACTGG AGGTGGGTGAGCTAGTATGGGGCGCCGCAAGAGGAAGTCCGGCGTGGCCGGGCAAGGTCGAGTCTTTGGGCCCACCG GGCACCATGACGGTGTGGGTCCGTTGGTACGGGGGCGGGGGCGGTCGGAGCCAGGTCGAGGTCAAGGCTCTCAAGTC CCTCTCCGAAGGCCTCGAGGCGCACCACCGTGCGCGAAAAAAGTTTAGGAAAAGTCGTAAATTGAACATGCAGCTGG AGAACGCTATACAGGAGGCGATGGCTGAGCTGGACAAGGTGACGGAGTCGAGCAAGGAGCAGAAGGTCGGCGGGAAG TCGTGCAAGGTGTCGAGCGGAAGCAAGGAGTGCGGCAACGCAGGCTCGAAGCAGGATGGGAAGAGGTCGTCATCGAA GAAGGCGTCTGTCGGTTCTGTGAATCCTGTCGCGGCTGAACAGAAACAGTGTCGGTGATCCGTGTCGATGGATCCTC GTACCAGTCAATTTGATAGATCACGATGATGATGAAACGCACTGTTCCATCTGTGAACCATAATCAAACGTGTTAAT TTTGTGATTGAGAAACACCGACACCGAAAAACTTCCTCCACAACTCTTGTTTTTTAAAATTCTCGACTGACACGTAC ACACACACATGCGCATACATATATACATATACACACGGTGTTTCCCTTGATAACAGAAAAAAA Matches DNA cytosine-5 methyltransferase- mammal 40%; Dros EST 51%; Dros genome 37, 67%; seems to be in same region, but not same translation, as sba gene? So this one is tricky. First, it matches the extreme end of a 1498aa human protein KIAA1461, in a region known as a PWWP domain in CDD However in other DNA cytosine-5 methyltransferases this domain is around 300aa in 900aa proteins. Since our best match is clearly KIAA1461, and ours is clearly at the end of an ORF, try searching Drosophila genome with this protein. 
Hopeless, since get lots of matches, including to sba gene. So could represent an alternative end to sba gene. \***********A. mellifera Contig1287 CCAAGATTGAGAAGGCAGACGTCCAACTCGAGCTTGGACAACGTCGCGCTCAAGCAAATTTTACATTCCAGCGAGAA CGTCAATTCGGAAGGTGACACGTCCAAATTGGCCAGCTTCGCGAATCTGAGCAGGCAAAGCTCGGAGAAGGGGATCA ACTTGACGTACACGGAACAGGATCGAGATGACGGGAAATCGAATATGTCCGGTAAGAAGTTTGGCCAGACGAATGGT AATGGGAACGGTAATGAGAAGAAGACTACGTTCGCCACTCTGCCGAACACGACCACGTGGCAACAGCAGAGCAGCCA GCAATCCCAACAGGTGGAACAACATTCTGTTGATGAAAACGGTGGTAACACCATTATGGCCTCGCAACTGAATAACA TTAGATTGAAGCTGGAGGAGAAACGTCGGCACATAGAGAACGAGAAGAGGAGGATGGAGGTCGTGATGTCGAAACAG CGTCAAAAAGTTGGCAAGGCTGCGTTCCTGCAAGCTGTCACGAAGGGTAAGGTTAAATCTCCCTCTTCATCAACGTC TGGGGGGGACAGTCCGGCTGAAATTGGTCCCCCCACTTCTGTAACCTCCGGATCTTCGGGGGAGACCCCGACAAGTG TTTCCGAGACGACCCCTGTAACCCAACAACCCTCTCAAGAAAAACCACAGAGACCCTTCTCGCTCAAGGAAATTAGT GAAGATGTTCGAGATGTTGAACATAAATGGTTGGAACATGACGGAAATGCGCCATTTATTGAAACAAGACGTACTCC AGATATTGAGAACATGGATATTGAACAGTATCATCAATCCATATCACAAATGAATAACAGTCTTAGTGAAATTCAAG CTGACATACAACGTTTAGCAAATCGAGCAAATCAAATACAACAACAGCATCTAATGACCCAACACCAACAA complete ORF encodes 13% serine and 12% glutamine protein >300aa; BLASTX is to N-terminus of 856aa KIAA1078 protein <up>Homo sapiens</up>; 24% over 260aa and 5e-10; genomic match links CG18462, CG18459, and CG18460!; three ESTs, but not very useful So, this tripartite gene spans 10kb, and is probably beyond my abilities to reconstruct. Instead suggest they search with KIAA1078 protein and see how it spans these three genes. \**********A. 
mellifera Contig1312 AAAATCTTAAATTTTGGTTAGACATGTAATAAGTATGGTTTTTGCATTATTCTTTCACATATCTTCTTGTTTTTATT AATATCTTTAATAGTTTTTTAAAACTCTGTAAATACATATTCTTAAGATATTTAAAGTAATATCAAGTGTATATTTA TATATCTGATGTAAGATACAGGTTATAATTTTGAGATTATTAATACAATGGATTTATCAAAAATTCCAAATGATAAA AAATTATATCTTTGCAAATGGTATTTTAGAGCTGGATTTGTTTTTCTACCATTTCTTTGGGCTGTGAATGCTATTTG GTTTGCAAAAGAAGCTTTCGTTGAACCACATTATGAGGAACAAAAACAAATTAAAAGATATGTAATATTTTCTGCAA TTGGAGCAGCTATATGGTCAGCTGCTCTTTTAGCATGGATTGTTACATTTCAAACACAAAGAGCAGCATGGGGTGAA TTTGCAGACTCTATTAGTTACATAATTCCAACTGGCATTCCTTGATGTTAAAATATATATCTTTGTATATTAATAAG TGATTATTATAATATAAATTTTTATTGTACGAATTAAAATGAAATAAGACTATTCTTTGTCTTGGTATTTAATTAAA TAATAATTGTCTTTAAGATGTAAAAAATATATTATTTATTTAGATGTATTGATTTTAAATAAGATTAAAATTTGATA AAACATTCAAATTTTAATTAATTAAATATAATTGAACATAAAATGTAATATTTTATACAATTTATATCATTTATAAA AATTTAAGTTACAAATAAATATTTAAGAACTTAAAAAAAAAAAAAAAAAAAAAAAAGCAAC Could have internal ORF of 300bp, encoding 14% alanine 100aa protein; indeed excellent BLASTX to uncharacterized hematopoietic stem/progenitor cells protein MDS033 from human and another 100aa protein from C. elegans, 40% and e-18; NEW GENE in unannotated region; sadly no ESTs, but will be easy to annotate. 
AE003800.2 204127-ATGGACATCTCAAAGGCACCAAATCCGCGAAAACTGGAGCTGTGTCGCAAATACTTCTTTGgtaagagtt actaccaatgagtaatgattggattttaaccaagttactttctatttgtctcgaacttagCTGGCTTTGCATTTCTG CCCTTTGTGTGGGCCATTAACGTTTGCTGGTTTTTCACGGAGGCCTTCCATAAGCCACCATTTTCGGAGCAGAGCCA AATAAAGAGATgtaagtcaatatatgaatagatgcccatgccatacagtctaatattccacaatttctttcctacct tcctccagATGTTATATACTCTGCAGTGGGGACTCTATTCTGGCTGATAGTACTAACTGCCTGGATAATAATATTCC AGACAAATCGCACAGCCTGGGGCGCCACAGCGGACTATATGAGCTTCATCATACCCCTAGGCAGTGCATAGACATAA CTAGATTAATTCGTTAGCA translation M D I S K A P N P R K L E L C R K Y F F \----------------- \-------------------1--------------------------------A G F A F L P F V W A I N V C W F F T E A F H K P P F S E Q S Q I K R \----------------------------------------1--------------------------------- Y V I Y S A V G T L F W L I V L T A W I I I F Q T N R T A W G A T A D Y M S F I I P L G S A \* bee MDLSKIPNDKKLYLCKWYFRAGFVFLPFLWAVNAIWFAKEAFVEPHYEEQKQIKRYVIFSAIGAAIWSAALLAWIVT FQTQRAAWGEFADSISYIIPTGIP fly MDISKAPNPRKLELCRKYFFAGFAFLPFVWAINVCWFFTEAFHKPPFSEQSQIKRYVIYSAVGTLFWLIVLTAWIII FQTNRTAWGATADYMSFIIPLGSA Neat little two phase 1 intron gene. \***********A. 
mellifera Contig1411 ACGTTTTCGTGTGTAATCAGTGAAATTTTTGTGAAAATGTTTTCCACTTCTACACTGAGTGTCCTATTAGGGACAAT TTTACTTATCTCTTCGATTCCCGATGCAGTATCTTTTAGCAAGTACGGGAGGACGTGCAAGGACATCGGTTGCATGA GGGATGAGGTCTGCGTGATGGCCGAGGATCCTTGTTCGATCTACCAACGAGATAACTGCGGTCGTTATCCGACTTGT ATGAAATCTCGTCCAGGCGAGGCTAATTGTGCCAGCACTCTGTGCGGTGAAAACGAATACTGCAAAACCGAGAATGG CGTCCCAACATGTGTGAAGAAATCAGCAGTAAATGGATTCGAGTCGGCGGGCGTTTCTTACGTGAACGGGCAGCGGG TGAACACGGACGAGAAGCAGCAGCTCGATAAGACGACGGCCAGCAACAGCGCTAGCAATTCGAACCCTTACGCCAAT GCTAATGCGCCACCTGCCCCCGCCGAGCCAGCGGGAGGGTATCGCCATCAAGTGAATTCCGCCACCAATTTGGGTTA TCCACCTTATCCTAGCTCCGACACGGAGCGTTCCAAGAGCGGTGGGTATCCATCGTATCCTGCTGGCTCCAGCAACG GGTATCCGCCTTATCCTTCCCCCAATCAAGGGAATCGCCAACAGGATTTAGGATACCCACCGTATCCAACGCACAAC AAAATGCCGATGCCCGGACAGTCGAATTATCCCACGTATCCCGGCCAACCGGGCCACTCCAATTA complete ORF encodes 250aa protein with 12% serine and 11% proline; lots of weak BLASTX matches involving cysteines in the first 100aa, hard to say anything about them; genomic match is similar but at 50%; then there are a bunch of ESTs, including two B.mori, and to same sequences as Genomic, so is a real gene; NEW GENE in unannotated region TFSCVISEIFVKMFSTSTLSVLLGTILLISSIPDAVSFSKYGRTCKDIGCMRDEVCVMAEDPCSIYQRDNCGRYPTC MKSRPGEANCASTLCGENEYCKTENGVPTCVKKSAVNGFESAGVSYVNGQRVNTDEKQQLDKTTASNSASNSNPYAN ANAPPAPAEPAGGYRHQVNSATNLGYPPYPSSDTERSKSGGYPSYPAGSSNGYPPYPSPNQGNRQQDLGYPPYPTHN KMPMPGQSNYPTYPGQPGHSN TTCCCGCGGGCGAAGAGTACGGCCGTGGATGCGGGGACATTGGCTGCCTGCCCA CCGAGGAGTGCGTCATCACCAGCGACTCGTGCAGCTACAACCAGCGTGACGGCAAGGATTGCGGCAACTATCCCACC TGCAAACGGCGCTCCGGCGGAGGATCATCCGCCTCGAACAGCAGCCCCAACTTGGCAGCCCCCTCGGCCAATCCGTC AGGTATTTAGCTTGGAACCCCCCACTTGATTActcattgtgcgcttccctgggggagttgtccaggcgattggcatc gtgtttcgtaattcgaatttcgaggtttggcgaccggcgattggcggctacatctttttggttcctcaaaacaagtt atttcgggttaaatccaacgaatttctgggggcgcaaaagctgacttcgggcgccgaagataacaatagcctctcag agacgagaaataacacgttgcattgaatacaattcaagaaataataattttctgaaatatacaaataatataacgct attgactgacctattgtcctaataatcacatcacgtattcacatactctttccaagcctcgaaactctgcaaggcta 
taagcttaattccaaattaccgaacttatttgtcgtttctttttgtttggccccaagtatttgttaagttttgtgat catttccatttatatgtatgtatctatgctatataactgaatgtaatatgttttatggcgatctaatacccacacca acacactgagcgcttccgctgactcaatcaatttaacactcaattcgactctcaatcaatcaatcaaccaatcactc gctcgctcgctcattcatcgcctcattcgctcaaggcttgcaaatattcgactgctaaccacccacccgcccccgcc tctccccctctccaccgatcacttgtgcatgttttaggaagcaacagtacagagagaaaaaaggcaacatgcaacac gtctatttacttctgtgtatataacatatatttcaaattcaaattcaactttattagcactaaaaatctttacaact gcataatagcttttgtattttaagcttacacaatctagcagctgaaatgaaaataataatttaagagcaaaaccaat ttcttaatgtccatgtcttaaaattaacatcaagaaggttacgagtacttgaattaaaatgctagaatcatgttgtt aaggaagcttgactggtgcctgacccatcatagattaacatatatat translation E Y G R G C G D I G C L P T E E C V I T S D S C S Y N Q R D G K D C G N Y P T C K R R S G G G S S A S 5' region ttgcaagcttgtaatgcgcttttgattggttacatagggcagatgcgttttttt ttttgtttagaagcaaactgccttcaaacttgttttaactcttacgcgaaagttggccaactgaaaaaaaagtattt ttccatcttgtactttgcagcaacttttgatccaagggcgtgacatcgatagcgagcaacaggatgctggcactgcc tttgctgactctggcggtcctcgccagctgcggctactccgtggacgcctactccagtacgtatacgtaatatttca ggtgggggttggctttcgtggacaccttaccaccaactattgtcctagaaagccatcaccccactgccatttgttgt gttgtgtaatcagtactcctgaaatgagcaccgatcccttggatcggtggtgaacctttcatgccttcatcaaccct tgcccctccatcattgacatactgaagccagatgcggtatccccgattttgagcagacttataatttgattttcttt tttttttccgtatgattttgacccacccactctgatccacaaaacacacaccgaaacccgcaatccgcaacccgaaa tccgaaatccgtaatccgcaacccgaaaccgtaaaccgtaatccgtaatccttgaacctaatcgaat This is going to be horrendous, with the ESTs showing a 5' end in the next file, about 70kb away with a bunch of gene inbetween! then linking CG1735 and CG1726. SIMA \- this one is rather complicated, suggesting that there is a huge gene linking CG1735 and CG1726, try working with the Bombyx mori ESTs identified using ours. \***********A. 
mellifera Contig1463 GCAATTATCCTCTTCCTTCCCGTTTTTGTTTTCGTTTTTTTTTCCTTTCTTTCCAGTTCGCTTTTTTTTTTTTTTTC ACTTCACTTCTCCCTCCTTGGGCATAAAGGTCAACTCGAAATTGGTTGTTCATTGTTTTTATTGTTTAACTCTCGAT CGATCGATCGTTCGTTCGTTCGTTCGTTCGTTCGTTCGTCATGCGTGATTGCGTGCGTACTTAAATGAGTGCGTGCG TGCGTATTCACTCACAGACTTGCAGAAGTTAGCTCCATGACTGGTAGGAGTCTTGGTACCATTCGTGATCATCGCTG GAAACACCGCCAAGAGTGGCGGTCCTACCTCCGAATCCTATGTCACCGCCGCCACCACCACCACTGCCACCATTCAC ACTCGCCAATCCGTACGCATTGCCCAGTGGTTGTTGGGCGAGGGGTTGGTTTCCCCAGTTGCTCTGGAAGCGACGCT TAGATTCCCCCTGGTTCTGGTGACCCCCGTCAAATTTTCGTTTACCTGCTGGTAAACTTCCCTTGGCGCGGACCCCC CCACGAGCGGACAGCCGCTGGACCCCACGAGCCCCTGAAGTTGCGGGGTTGCGTCCCCCCCTGACTGGCCCACGGAC TTGGGGGCCACCGACTCGGCCACGCGGCACTACTCCACGGCCACGTCCAGCCGGTTGAGGCTGCCTGCCTCTCCCTC TGGCAGGTGGCGGAGGCGGCGCGTAATCGAAATAGTAATCTTCGTATCGATAATAGTCATCGTAATATGGATCAC no obvious ORFs; but in RC end of ORF encodes 21% glycine and 11% arginine protein with 35% e-05 match to end of 600aa heterogeneous nuclear ribonucleoprotein R <up>Homo sapiens</up>; seems real to me; genomic match is to C-terminus of CG17838; provides real end of this protein. \***********A. 
mellifera Contig1481 TGTGCTTCCAATTTCTTTTTTCTTTTTTCAATTCTTTTATGCTTTGTACTTTTTAAATGAATGTTCCATTGAAATTC TCCAATAAATATCCTATCGCAAATATCACAAAAGTGTCTTTCCTCGTTGCTACTATCACTGAATTTTTTATTCTCTA TTGATTCGTTTAAAGGTTTTTGTTCAGGCTTTTCTCCTCTTAGCACAGCTTCGATTATTGCCACAGCTGGTTCATAT ACACAACTGTCCCATTGATTCACATCGGTAGAATCTAATACATAAATTGGTGGTACTTGTCTGTCACTGCGACGAAG TAGACGATTCATAACCCATTTTTTCTGTTTTTTCGCGTACCTCTTCGTGACCATTTTTAAGTCATCAATACCCCTTT GCAATAATTCTTGTCCTTTTTTCCCTCCCTTCTCTTCTTCTGGCAACACAAGGTAATCATGAAACTCTTTGAAGCCG ATACTTTGAAAAATGCCCTTCGTATAATCGACTGATGTGTTGGATTTAATCCGTTGCTTGTTGTATCTCCGATGAAA GTCAAGCAGTTCCTGAACCAGACCGGTCTCCACCATGTCGTCGACCCTTCTCTCCAACCGATCCTCGAGGACTTTCA TGTCACAATTGATCCATAATAGAATGGCATTGCGGTATCTCAAAGGACCTCCTAATCCAGAACCACCAGCTATCCTT TGAGCTTTTAGCAATTCTGAATGCTTCACACCATGTTGCTCGAACACTTCAAGTGATCGAATGATCTTCCTCCTATT GTTCGGATGAAATCTCTTCGCCATTTCCGGATCCACTTTAACCAACTCTTCGTAAAGCTCCTGGTTATCCTTTGTCA TCGATCGATCCAACTCGATCTTCATCCTCTTCGTACGCGACACATTCTCGTCCAACCGGTCATCATCGTCCTTGCCG ATCCCCGAGTCGTTCATCAGAACTTCCCAAAGGATGGACTCTATGTAATAGTTGGTGCCACCGACGATGATCGGGAG CTTCCTCCTCGCGAGAAGATCGTTGATAATAGGTATGGCAGCATCCCTGAATTGTACCACCGTGTAGCTAGGGTTCA GAGGGTCTACGATGTCCAACATGTGGTGAGCCGCCTTTGCTTGTTCCTCTTTCGTTACTTTCGCGGTCACGATGTCG AGGCCTTTGTACACCTGCATACTATCGGCTGAAATAATTTCTCCGAAGAATTTACAAGCTAATTCGATGGCCAAAC complete ORF in RC encodes tRNA isopentenylpyrophosphate transferase <up>Homo sapiens</up>; 47% full-length e-102; genomic match is 43%? But single ORF and unannotated! NEW GENE. 
One EST Match to human protein misses just 40aa on each end, a 467aa protein AE003749.2 TACCAATTACTTGTAAGCACAAAAAACAGCTGACGGCAACAAGTGGTTCGGTCC CCATCGGAATACACGTGCTCAAAACGTGTGGGTTTTATTTGCCTTAATTGACTTAAATTCACTCGCAATAAGTGGAA ATGATTCGAAAGGTGCCGCTAATTGTAGTCCTGGGCTCCACGGGCACCGGAAAGACGAAACTGTCTTTGCAACTGGC CGAACGCTTCGGAGGAGAAATAATCAGCGCTGACTCCATGCAGGTTTACACCCACCTGGACATCGCCACCGCCAAGG CAACCAAGGAGGAGCAGTCCCGGGCACGACATCATCTACTGGACGTGGCCACACCGGCCGAACCCTTCACAGTCACT CACTTTCGTAACGCAGCACTGCCCATTGTGGAGCGCCTGCTCGCCAAGGACACTTCTCCGATTGTGGTGGGCGGCAC GAATTACTACATAGAATCCCTACTTTGGGATATTCTGGTTGACTCGGATGTCAAGCCGGACGAAGGCAAACATTCGG GGGAGCATCTTAAGGATGCCGAACTGAATGCTTTGTCCACCCTCGAGCTGCATCAGCACCTTGCCAAGATCGACGCA GGTAGTGCCAACCGTATTCACCCCAACAACCGGCGCAAGATCATCCGGGCTATCGAAGTGTATCAGAGCACCGGGCA GACTTTGAGCCAGATGCTGGCGGAACAGCGGGCACAGCCGGGAGGAAACCGCCTGGGTGGACCCCTTCGCTATCCAC ACATCGTTCTCCTTTGGTTGCGTTGCCAGCAGGATGTTCTAAACGAGCGATTGGATTCCCGCGTAGATGGCATGCTG GCCCAAGGGCTGCTCCCTGAACTACGACAGTTTCACAATGCCCACCATGCTACCACTGTGCAAGCCTATACGTCGGG AGTTCTGCAGACGATTGGCTACAAGGAGTTTATTCCCTATCTGATCAAGTACGACCAGCAGCAGGACGAAAAGATAG AGGAGTACCTCAAAACCCATAGTTACAAGCTGCCAGGCCCAGAAAAACTGAAAGAAGAAGGTCTTCCAGATGGCTTG GAACTCCTACGCAATTGTTGCGAAGAACTAAAGTTAGTCACTCGCCGATACTCAAAGAAGCAGCTGAAGTGGATCAA CAATCGATTCCTGGCCAGCAAAGATCGTCAAGTGCCGGATCTCTACGAACTGGACACCAGTGATGTGTCAGCTTGGC AGGTGGCAGTCTACAAGCGGGCAGAGACCATCATAGAAAGCTATCGAAACGAAGAGGCTTGCGAGATACTACCAATG GCCAAGCGGGAGCATCCTGGAGCGGATTTGGATGAGGAGACTAGCCATTTTTGTCAAATATGCGAACGGCATTTCGT TGGGGAGTACCAATGGGGACTGCATATGAAGTCCAACAAACACAAGCGAAGAAAGGAGGGACAGCGCAAGCGGCAAA GGGATCACGAAACAATGCTCTCAACGGATCTAGCGAAGAAGCAAAAGGAGGAGAAAGAGGAGGCAGGAAAGGCGGAG ACTCAGCCACCACCCAGCCGAGTCAATGATACTGATAAGGCAATGtaacactagacgcggcttggcaataaatgaac ctacgtaaatttgagtcatttgttgttgttttgaatctcaatcccaccgttttgctgctgatgcaagcggcttgagg agtatctgataaccctacacctcgctaatggggaccacagaccgcaggggaggtcgttgcctagccagaaaagcgaa aacgcgtaaacatgtttgtgcaccgaacaaccagcccacacaatcgccatcgcccactgactgatctcgtctttcat ttgcatttcagttgcccagcggttcagacgcaattagagaaaccaat LD10347.5prime 
TACCAATTACTTGTAAGCACAAAAAACAGCTGACGGCAACAAGTGGTTCGGTCCCCATCGGAATACACGTGCTCAAA ACGTGTGGGTTTTATTTGCCTTAATTGACTTAAATTCACTCGCAATAAGTGGAAATGATTCGAAAGGTGCCGCTAAT TGTAGTCCTGGGCTCCACGGGCACCGGAAAGACGAAACTGTCTTTGCAACTGGCCGAACGCTTCGGAGGAGAAATAA TCAGCGCTGACTCCATGCAGGTTTACACCCACCTGGACATCGCCACCGCCAAGGCAACCAAGGAGGAGCAGTCCCGG GCACGACATCATCTACTGGACGTGGCCACACCGGCCGAACCCTTCACAGTCACTCACTTTCGTAACGCAGCACTGCC CATTGTGGAGCGCCTGCTCGCCAAGGACACTTCTCCGATTGTGGTGGGCGGCACGAATTACTACATAGAATCCCTAC TTTGGGATATTCTGGTTGACTCGGATGTCAAGCCGGACGAAGGCAAACATTCGGGGGAGCATCTTAAGGATGCCGAA CTGAATGCTTTGTCCACCCTCGAGCTGCATCAGCACCTTGCCAAGATCGACGCAGGTAGTGCCAACCGTATTCACCC CAACAACCGGCGCAAGATCATCCGGGCTATCGAAGTGTATCAGAGCACCGGGCAGACTT M I R K V P L I V V L G S T G T G K T K L S L Q L A E R F G G E I I S A D S M Q V Y T H L D I A T A K A T K E E Q S R A R H H L L D V A T P A E P F T V T H F R N A A L P I V E R L L A K D T S P I V V G G T N Y Y I E S L L W D I L V D S D V K P D E G K H S G E H L K D A E L N A L S T L E L H Q H L A K I D A G S A N R I H P N N R R K I I R A I E V Y Q S T G Q T L S Q M L A E Q R A Q P G G N R L G G P L R Y P H I V L L W L R C Q Q D V L N E R L D S R V D G M L A Q G L L P E L R Q F H N A H H A T T V Q A Y T S G V L Q T I G Y K E F I P Y L I K Y D Q Q Q D E K I E E Y L K T H S Y K L P G P E K L K E E G L P D G L E L L R N C C E E L K L V T R R Y S K K Q L K W I N N R F L A S K D R Q V P D L Y E L D T S D V S A W Q V A V Y K R A E T I I E S Y R N E E A C E I L P M A K R E H P G A D L D E E T S H F C Q I C E R H F V G E Y Q W G L H M K S N K H K R R K E G Q R K R Q R D H E T M L S T D L A K K Q K E E K E E A G K A E T Q P P P S R V N D T D K A M Amazingly is a single ORF! With no obvious 5' intron either. \*************A. 
mellifera Contig1578 GTTGCTTTTTTTTTCAGAAGTAATTATATTCTGTTATATATAATTACTTCTGAAAAAAAATTTTTCAAAATAACATT ACCAAATCAATACATTTTTCTGTTCATTCGCAAACTGAAAATTCATAAAATTCAAAAAATGGGAATGTAAAGAGGCA AAATTTATAAATATTTTAAGTATTCAATTAAATAATGTTTTACTTAATTAAAATCATAATCCATATTTTATTATTGA TAATTTTTTTTTCACAAGGAGATATCAAATAGATACCTAATTTGTTCACAGGGCACCAAATGGATCAAGTTGAACTT GATTATTTTGTGGTTGCGCAATATTCTGCTGTGGTTGCTGCTGTGGTTGTTGTTGAAGGTTAGTACCCATCATCGAA TTTGAACTAGACATCATCATTGGTGCAGCTCCTCCTGTGACCATCATGTTACCAGGACCACCAGATATTGTGCTCAT CATAGGTCTTATGCCTTGCATACTTTGCATGCCTATTGGCACACCTTGCATTCCCATTGGACGATATCCAGCGCCAG TTGTAGCTGCCATAGGTTGCGGTGTCCATCCTCCAGCTGAGCCACCAGTTTTGGCAGCATTTTTAGGCGAATTCCAT TGCATACCTTTGACTTGTTGCTGAGCACTTTTGTTGATGGTCAAATTTTGAGCAAGACTAGCAAGACTGCTATCCAA ATCTCCAGTCAGGACTTTACCAGTAGACGCTGCATTCTGTTGTTGTCCTGCTACTGAAATCGGTTGCTTTGCCGGGG AACCGTACCCCGCCGGGACTTGGGTGGGGATACCGTAGGCCGACCATCGGTCCG possible end of ORF in RC encodes 130aa 14%G/Q protein; BLASTX match is 39% over 56aa, 5e-05 to clathrin assembly protein AP180 short form \- rat; and many others, indeed frog is best match; genomic is unannotated within an intron \- NEW GENE; one EST to Anopheles gambiae of all things, but only 43%? Turns out this is just an alternative C-terminus for the lap gene, with some extra exons within the final large intron of the annotated gene. \*************A. 
mellifera Contig1637 CTGTGCACGTCGCACAGGCGCCAGCTGCTCGTCATGCTGCAGAACCACAACAAGCTGCGCGACATCAGGCGTAGGTG CACCAAGGCGAAGGAGGAGCTGTCCGTGAACATCTATCACCGGCTCAAGTGGATCATGTACGTGGAGAACAAGATGA TGGAGGTGGACGGCAAGTTGGTCATGTATCACGAGAGCCTGAAACGTCTGAGAAGGCACCTCGAGGTGTTGCAACAG ATCCATCTCGCGCCCCAGATGTACATGAACGCCGTGGCCGAGGTCGTTCGTAGGAGAACGTTCTCGCAAGCTTTCCT GGTCTGGGCGAGCAACCTGGCCTGCCAATTGCTCACCGTTCACAGCGAGGAATTGGCACGTAGAAGGGAGTTTCAGA GCAAATTCGACGGCCACTTCCTCAACACGTTGTTCCCAGGCCTCGAGGACACGCCACCGCCGTTCGCCACCCAGGCG CCGTCCGTTTTCGACAACGGATTGCCAAAGTTGACGGCCGAGGATATGGAATCTCTGAGATCTCAGCTACCCGATCT GGCGCTCACCATCTCGTCGCCAGATTTGAACAGCATCACCCAGTTCTTCCTGTCCAAGAGTCTCACCAGCACGGACG AGAACAACAAGGAGAAGGACGGCGCCTCGATGCGCGTGGAC complete ORF encodes at least 130aa 14% leucine protein; full-length 39% match at e-23 to KIAA0203 gene product <up>Homo sapiens</up> and C. elegans and arabidopsis; genomic match is to 46% to region of CG1347; but annotation needs fixing it seems \- NEW ANNOTATION; one good EST \************A. mellifera Contig1793 AGATTCCTAAAATCCTTTCATGGGGCCGCGGCCAGCCGGATGATATCAGTGCGATCAATCTAGGAGATGAGAAATTC GACCCTGACTCGGATAAAAAGCCGCGCGCAGGACAAATTCTATGGATCCGTGGTCTAACACGACTACAGACACAGGT AATAGGTGGCGAGTTGCAGGAACGTTTGATACCAGTACCCTACAGCAAGAGTTCGACAGACCAAGCTATCCGCGTAG TGAACGCATTTCGGCAGGGCCTAGACGCACGCTACACGAGCGAGCACAGCAGCACTACATTGGCGGAGGTACTGAGA AAACAGTCGTCCTTAAGCAAGCGGCTCTCGCAAACGAGCAGCATTGAATACGCCGATAACAACCCAGACGAACTGAC CATACCCGAGATAGATGTGGAAAGATTGTCAAGTCACAGTCATACAGAGACCGCTGTTTAGAATGGGAGAAGAATGG CGGGGTCAGCAGCGATATGTTTATCAAGCTGTGTTACGTTCGATCCTCTCTGACGATATGTAAACACGAAAAAGAAG TCATCTTCATCCTTGTTTATCTCGTGGCCAGCGCGTTCGGAGGAAAGGACAAAAAAAAAAAAAATAAATA long ORF encodes >130aa serine rich protein; BLASTX says is N-terminus of Ca2+-transporting ATPase (EC 3.6.1.38), plasma membrane isoform 1c from rat and C. elegans; 40% over 82aa; 4e-04; genomic match is 75% to unannotated region \- NEW GENE; several ESTs. 
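The residue percentages quoted in these notes (for example "14% leucine" or "serine rich") can be checked with a few lines of code. What follows is only a minimal Python sketch, assuming the predicted protein is available as a plain one-letter string; the function name and the toy peptide are illustrative and are not part of the original analysis.

```python
from collections import Counter

def composition(protein, top=3):
    """Return the `top` most common residues as (residue, percent) pairs."""
    # Keep letters only, so spaces or a trailing '*' do not count as residues.
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    total = len(residues)
    if total == 0:
        return []
    counts = Counter(residues)
    return [(aa, round(100.0 * n / total, 1)) for aa, n in counts.most_common(top)]

# Toy peptide (hypothetical, not taken from any EST in this file).
print(composition("MSSSLLSKSA", top=2))
```

Pasting in a translation from this file (spaces between residues are ignored) gives the same kind of figure as the percentages noted by hand.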
AE003844.2 aataatatatttatttatttatatccttatattttagGTGGGGTCGCGGACATC CCGAGGAGTACACAGATGGTATGAATCTGGGTGAGGAACGCTTTGATTCAATTGATTCTGATAAAAAGCCTAGGGCT GGTCAAATTCTATGGATTCGTGGTCTAACTCGCTTGCAAACACAAgtaagtgtcaattcaacaacaacatcaacttg tcttaagaaactagaactaaacaataattacctaaaaggatcaacagttatatatcaatttgtttaaataatgactt ttttatgttgttgtgatatatttttataaatttctatttctattcattattcacatttattacattgatattttata tgatataatgatataattaattattaaataataatataatataatataaatataataaaatattatatatatatata tatacacatatatatacctacctactatatagannnnn---nnnnn11kb intron, at least,ttttatgac tattttactctttcccacttatattttctgtattactttctatctccctctctttctcttcattcttaaagGTAATA GGCGGCGAATTGCAAGAACGCTTGATTCCGGTCCCATATAGCAAGAGCAACACTGATCAAGCTgtaagccttaacaa ttttgttttatgttatttttacagttttaaaataagtcgaattattaaattctatctcggcaaaagcgtaactataa gaatgaacgatgatatacttgtagtacgtttgtctattcactaagaacacaatttttttaaatcgtctgtttgtccg taataataaggtaatgaaaggcaaggcaatttaataggcgtattgtgtgtcaaccacaagcttaatttaatatgccg aatttttaacccacctatatccaaaatatatatggttatatttcttttttaattatgatttagttggtttttcgtgc ccactggttttgctttaaacttccatcatgtagaagaacgatatacttagttttttaatgtgtttgttcgtcaccac ttaaataaaatcaaaaaaaggttgtcgcataatgcatgattaggcaaattaattttagattgctgattaattggtaa attaccgagctacagtcctccgaattatgaacaaaataagcgaaatattaaaaagaataagcaactcatatgaagtg ttgttactgattatttgtccgtctgaattaattggtatttggatatcccattattaacttaagatctatcttatcat ttatgtgtcgcatctgctcgtagtaagaattgtcaaaataatatttgtatttaatttgaactagaaatatacataaa agaaagtatgttcatatatgtataattggatctttagcttgaatgattaaagatgttcttcttatactattgtttat gtcccttttgtcactgttcttggtgttgttcttttctttttactaaatgtacgcttagATACGAGTGGTAAACGCAT TCCGCCAGGGTCTGGACGCCCGTTACGGTGATCACACCAACACATCCCTGGCAGAGGTACTGCGTAAGCAGACTTCG TTGAGCAAACGCCTTTCGGAAACGTCTTCCATTGAGTATGCCGATAATATACCTGATGAGCTGACCATACCCGAAAT TGATGTCGAACGTCTATCATCCCACAGTCACACTGAAACTGCAGTTTAAATTTCAGTGGCATCCATATCCATATAAA AATAAACCGCACACATTCTCAGAAATAACAATTCTAAgaaatccttagcacagcttggaatttttataaaaaaaggt tttgttcaggataacagcaatgtagctgtgaattggttaaaaagcattttgtaattcagcaaaaataatccgtaaaa 
aaaatgtaaatttctaattttttttgttagtatgtatgctaacaaaatatataaagtacttaatattaatataattg taatgcaagggcatacatacattgattataaccacctttaactcaaaatgtaagcggatcggttttgtctcgcacac tgaagccattaattaatattttatcgtcttacatgtaataaatgattcaatgaataaacatgttttatttacttaca cgtggaaaaggttagcactataataaatcgaccaaacggtgcaaaagaaacagaaaagcacggatc GH15464.5prime GTAAACGCAT TCCGCCAGGGTCTGGACGCCCGTTACGGTGATCACACCAACACATCCCTGGCAGAGGTACTGCGTAAGCAGACTTCG TTGAGCAAACGCCTTTCGGAAACGTCTTCCATTGAGTATGCCGATAATATACCTGATGAGCTGACCATACCCGAAAT TGATGTCGAACGTCTATCATCCCACAGTCACACTGAAACTGCAGTTTAAATTTCAGTGGCATCCATATCCATATAAA AATAAACCGCACACATTCTCAGAAATAACAATTCTAAGAAATCCTTAGCACAGCTTGGAATTTTTATAAAAAAAGGT TTTGTTCAGGATAACAGCAATGTAGCTGTGAATTGGTTAAAAAGCATTTTGTAATTCAGCAAAAATAATCCGTAAAA AAAATGTAAATTTCTAATTTTTTTTGTTAGTATGTATGCTAACAAAATATATAAAGTACTTAATATTAATATAATTG TAATGCAAGGGCATACATACATTGATTATAACCACCTTTAACTCAAAATGTAAGCGGATCGGTTTTGTCTCGCACAC TGAAGCCATTAATTAATATTTTATCGTCTTACATGTAATAAATGATTCA LP02848.3prime RC ACGCAT TCCGCCAGGGTCTGGACGCCCGTTACGGTGATCACACCAACACATCCCTGGCAGAGGTACTGCGTAAGCAGACTTCG TTGAGCAAACGCCTTTCGGAAACGTCTTCCATTGAGTATGCCGATAATATACCTGATGAGCTGACCATACCCGAAAT TGATGTCGAACGTCTATCATCCCACAGTCACACTGAAACTGCAGTTTAAATTTCAGTGGCATCCATATCCATATAAA AATAAACCGCACACATTCTCAGAAATAACAATTCTAAGAAATCCTTAGCACAGCTTGGAATTTTTATAAAAAAAGGT TTTGTTCAGGATAACAGCAATGTAGCTGTGAATTGGTTAAAAAGCATTTTGTAATTCAGCAAAAATAATCCGTAAAA AAAATGTAAATTTCTAATTTTTTTTGTTAGTATGTATGCTAACAAAATATATAAAGTACTTAATATTAATATAATTG TAATGCAAGGGCATACATACATTGATTATAACCACCTTTAACTCAAAATGTAAGCGGATCGGTTTTGTCTCGCACAC TGAAGCCATTAATTAATATTTTATCGTCTTACATGTAATAAATGATTCAATGAAT translation \-------1---------------- W G R G H P E E Y T D G M N L G E E R F D S I D S D K K P R A G Q I L W I R G L T R L Q T Q \---------------0-------------- V I G G E L Q E R L I P V P Y S K S N T D Q A \---------------0------------ I R V V N A F R Q G L D A R Y G D H T N T S L A E V L R K Q T S L S K R L S E T S S I E Y A D N I P D E L T I P E I D V E R L S S H S H T E T A V \* 5'region for 5' exon \- 4kb more available 
attcttaaatctatgtttgaggactatgaccgcgctaatcacctcccgcatggtcatatctcttgacacttcggcaa taggccgctgcaatcgtacttatctgtagttgccacttatgctgtccggtgacatgtttattgtagcccataaatag aacatttttatagtttgcctacttatcattatggaaatatcgacatggctagtgatcagtatcaataacatacaatt aactttatatcgtaggatatgcgtctttctattgccagaattgtttcttttaagtattctatgaatagtaaagggtt tattaacctgtagttttatcatatatgataattataccttttactcgtagaggaagcgcttccgacaatataaagta tatatatttctgaccaaaacaaccacacccccacttgtcttgccaatatcggtcggtatactaaatacacttttttt ttttaatatggctctgtggctgtccaattgattaaatgcgttcagttctcgtctttgaagagtggtttctgttctaa aatgatggtcctgatcaagaatatatatacttaatatggtctgaaaagtttccttctgcctgttaaatacttttcaa caaatctggtaattcttttactctcgcctaacgggtataattaattagtcatatcgggctactatatcatatagctg ccatatagcgatcggtctgcaataaagtgtttgtatggttggcagctgcccttctctggacctaaaaggaatgttca agaaattttataatttgctgcctatcacgtaacttcccgttgtttatttacactatgaatatgaattctactatctg ccccctgctggctgatggcctggcgacgcccttgacaaaatatatgtaaaaataatattacaaaatgttacaacaaa gtttgattgaagtttatttgtcttggttatctatcagtacagcaaaacattttagccgcgccccttccaaagcccac aagtcgctcaaaactgtcatgtcaacacgtttcaatatattattttctggccatatggaatctgatagtcaaggaac tcgactatagcattctctctttttttattcttatgttggtcatacgttatcaagttacacaa It turns out that this is a large 1200aa protein in others, and we have just the C-terminus. Indeed we need to add it to CG2165, providing the C-terminus for this gene \*************A. 
mellifera Contig253 TGTCCGTACTCGTTGAAACGCATAAATACACGCTAATGAA TAATTTTTACTGATGTATATAGGCACTTTTTTCTTCGTTCTCCTCTTTCTCTCTCTCTCTCTCCTCATTCTTTCCGT TTCTCACTTTCTCTCTCTCTCTCTCTCTCGTTTCATTCTTCTTACGCTCTCTCTTTCTCTCGCTCATATACTATTCC AAAACAACAAATTTGTTTATCAACCTTAAAAGGCTCCTTTTTTTTTTCCCGATAAGAAAAGCTTGCAGCTTCAAAAA CAGATATTTTTTTGTGTTTAGACGATCGTTTTCTTAAAAGCTAAAAAACATTATGAAATCGAATCAAAAATTCACGC CTATCATCATCTCAACGAAGAATCGGTCAAAAATCATCTTCGTTAACGGAAAGTCTTTTGATATATATAAAAAGGTA TCGAGACAGTGACAGAAGGAATTGAAAGCGGGTTGAAGGAAACATTATGAGAACAATTACGTGCCTCTTAACCTCTA CCAAGTCTTTTACATAGTCGCTTTGATATGGTAATAAATAGGAAATATTTCACGTTTCACGAGAGGATGTAATACAA TAATACAGTTTAGTTGGTTAGAAGAGAATAGAGAGCTGTATGAAAACAGAAGGTAGAGGTAGAAATAGGAAAAAGAT ATAACGACAGATAGAAGAATCTCGAGCGAGAGGCTGCTGTTACATTTTCGCATTTTCGTTCTGTTTCCGGACGTAGT AATTCGTACTTAC long ORF in RC encodes 17% arginine and 14% glutamic acid; 230aa protein; BLASTX match to eukaryotic translation initiation factor 4B <up>Homo sapiens</up> 40% e-08 and in yeast; genomic match is full-length and clear but low at e-09 to C-terminal annotated region of CG10837, so I think we have an additional C-terminal region of this gene; one EST \*************A. 
mellifera Contig2709 TGAAAATGCGAAAAATACAATAAATGGTGGAAAATATAG CAATTTAAATATACCAGTAACATCACAACAAGGATTAGCGCCACTTAGTCCCTATTTAAATTTTGATCCTGCATATC TTCCTCCAAGCCAACCAGAATATATATTTCCGGAAGGAGCAGCAAAACAAAGAGGAAGATTTGAATTGGCTTTTAGT CAGATTGGTGCAGCATGTATTATAGGAGCTGGTATTGGAGGTGCTACTGGTTTGTATAGAGGCATTAAAGCAACATC TTTAGCTGACCAAACTGGGAAACTTAGAAGAACACAATTAATCAATCATGTTATGAAAAGTGGATCGTCGTTAGCAA ATACATTTGGAATAGTATCTGTGATGTATAGTGGATTTGGTGTGCTTTTATCTTGGGTCAGAGGTACAGATGATTCC TTAAATACATTAGCAGCAGCAACTGGAACAGGAATGTTGTTCAAATCTACAACTGGCTTAAAAAAATGTGCATTGGG TGGTTGTATAGGACTAGGAATAGCATCTGTATATTGCTTATGGACTAATCGAGAAGCCTTACTGGAATTGAGGCATC GCAATATAAATCCAGCGTAAGACTGTGTGTAACAGCAAAAACTCTGAATATTCCTAAATATTTTTCTTATAGTGATT TAAATTTCTTAGTAGTACTAACAAAGGAATGATAGAAGGGTTTGCATTCTGTAATTGATATAAATTATATATGAATG ATTATTTTTACAAAAATAAAAAAAAA long ORF encodes 13% glycine >200aa protein; BLASTX matches full-length at 43% and e-31 for 200aa translocase of inner mitochondrial membrane <up>Mus</up>; genomic match is to a small 15kb contig; several ESTs, so annotation would be easy \- NEW GENE AE003403.2 TTGAACACAGATGTCACTTCTACAGGGGAAAAAAGTTTAAAAACA AGTAAATCACAGAAAACGTCGTTTCCTTTTGCTAATAGAGCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATC ACCATTCTTGGGACTGCAAACAAATTCGAAAATGAGTGACAATTTTTCAAGAACACCATATTCTGATGGGCACGCTG CAACCCgtaagcaacaaagaatttggttacataattttattaattattaaatagattgctgttatcttttttctttg taaattgaaatgcaatacaatatgcagATGAGGAAGCATCAAAACCCCACTACACTACCACTACGAGTTCTTTTAGT AGAACTCCGGTCTCGCCGTACCTCAACTATGATTCGCGATATCTGCAGCAAGCACAGCCAGAGTTCATTTTTCCCGA AGGGGCCAACAAGCAGCGTGGACGCTTCGAGTTGGCCTTCTCTCAGATAGGCACTTCGGTAATGATTGGCGGTGGAA TTGGCGGCCTAGCAGGTGTTTATAATGGTTTAAAAGTCACAAAAGCACTCGAGCAGAAGGGAAAAGTTCGTCGAACA CAgtaagcaattggcggaattaaaattggttgcaacctcacaactttgactcaacacgtagGTTACTTAATCACATT ATGAAGCAAGGTTCCGGCACAGCTAACACATTAGGTACATTGACGGTGCTGTATTCGGCTTGTGGAGTTTTGCTGCA GTTTTTCCGCGGAGAAGATGATCATATAAACACAGTAATTGCGGGCTCTGCCACAGGACTATTATACAAGTCAACAG gttagctaaattttccatatatcgaaaaataatatttattaactagttgcattgtattttacagCTGGCCTTAGGAC 
GTGTGCTTTTGGTGGAGCTATTGGGCTGGGCATCTCGTCCCTCTATTGCTTATACCTAATAGCACAGGAAAACAGTT CGAACTCAAGTCCCAAATACCTATAGATGGCTGAAATATGTAGTACGCAGGCATTAATAGGATCACTCCTAGCCGAT TAAAATTATAAATACGAAGTTTTAATTTTATTTTGTTTTATTGCATTTTATACTAAGCATTTTTGCATTAACTTGCT GTTGTAGATAAAGCCATCACATTCCCCCACGCAATTTAGTTAGGAACCCAATTTCCAAACTCGCTAATAGTCCAAGT TTTTGGTATCGGCGCCTACCATGATTGCCCTGCTGCCCCCAGCCATTTCATTATTGGCCGGAGCTTCGCAGCTGGTA TCCTGTGTGGAGCACGTCTGGCAGTGACAGTTCACTGCCTCCATGTACTGGTACTTGCTTACGCTGTCCTCTGCTTT AGGGTGACAATTCTTTAGGATAGCTACTACCAGCTGCCGCTGGGCGTGGACACAAACAGGATGGAACGAACGCTTGT AAGGAAACTTCCAATCGGAAATTTCGCTTGAGTCGCATCGTCCCCAACACGACCAAACGCTTACATAGTCCCAACAC TCGTGTCCTTGAAGATCGGACTGTGTCACTTTATACGTGTATACACGACGATGGCATCCCAAAGGCGTCACGATATG TCCGTTGTTCATCGGCTTAATTTCCGACAAACTTGAAGATGAAACCGAAACAAGCACAACAGACGTACCTACAAAGA TAGCTAAAGTCCTGAAAAAAATTATTCTGAGCATAGATGAACAATGC LP07554.5prime TTGAACACAGATGTCACTTCTACAGGGGAAAAAAGTTTAAAAACAAGTAAATCACAGAAAACGTCGTTTCCTTTTGC TAATAGAGCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATCACCATTCTTGGGACTGCAAACAAATTCGAAAA TGAGTGACAATTTTTCAAGAACACCATATTCTGATGGGCACGCTGCAACCC-------------------------- \------------------------------------------------------------------------ATGAG GAAGCATCAAAACCCCACTACACTACCACTACGAGTTCTTTTAGTAGAACTCCGGTCTCGCCGTACCTCAACTATGA TTCGCGATATCTGCAGCAAGCACAGCCAGAGTTCATTTTTCCCGAAGGGGCCAACAAGCAGCGTGGACGCTTCGAGT TGGCCTTCTCTCATATAGGCACTTCGGTAATGATTGGCGGTGGAATTGGCGGCCTAGCAGGTGTTTATAATGGTTTA AAAGTCACAAAAGCACTCGAGCAGAAGGGAAAAGTTCGTCGAACACA------------------------------ 2----------------------------GTTACTTAATCACATTATGAAGCAAGGTTCCGGCACAGCTAACACATT AGGTACATTGACGGTGCTGTATTCGGCTTGTGGAGTTTTGCTGCAGTTTTTCCGCGGAGAAGATGATCATATAAACA CAGTAATTGCGGGCTCTGCCACAGGACTATTATACAAGTCAACAG-------------------------------- 1-------------------------------CT AT20116.5prime GGCACGAGGCTTCTACAGGGGAAAAAAGTTTAAAAACAAGTAAATCACAGAAAACGTCGTTTCCTTTTGCTAATAGA GCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATCACCATTCTTGGGACTGCAAACAAATTCGAAAATGAGTGA CAATTTTTCAAGAACACCATATTCTGATGGGCACGCTGCAACCC--------------------------------- 
\-----------------------------------------------------------------ATGAGGAAGCAT CAAAACCCCACTACACTACCACTACGAGTTCTTTTAGTAGAACTCCGGTCTCGCCGTACCTCAACTATGATTCGCGA TATCTGCAGCAAGCACAGCCAGAGTTCATTTTTCCCGAAGGGGCCAACAAGCAGCGTGGACGCTTCGAGTTGGCCTT CTCTCAGATAGGCACTTCGGTAATGATTGGCGGTGGAATTGGCGGCCTAGCAGGTGTTTATAATGGTTTAAAAGTCA CAAAAGCACTCGAGCAGAAGGGAAAAGTTCGTCGAACACA------------------------------2------ \----------------------GTTACTTAATCACATTATGAAGCAAGGTTCCGGCACAGCTAACACATTAGGTACA TTGACGGTGCTGTATTCGGCTTGTGGAGTTTTGCTGCAGTTTTTCCGCGGAGAAGATGATCATATAAACACAGTAAT TGCGGGCTCTGCCACAGGACTATTATAACAGTCAACAG--------------------------------1------ \-------------------------CTGGCCTTAGGACGTGTGCTTTTGGTGGAGCTATTGGGCTGGGCCATTTGTC CCTCTATTGC GH02609.5prime CAGGAATGCTTGAACACAGATGTCACTTCTACAGGGGAAAAAAGTTTAAAAACAAGTAAATCACAGAAAACGTCGTT TCCTTTTGCTAATAGAGCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATCACCATTCTTGGGACTGCAAACAA ATTCGAAAATGAGTGACAATTTTTCAAGAACACCATATTCTGATGGGCACGCTGCAACCC----------------- \----------------------------------------------------------------------------- \----ATGAGGAAGCATCAAAACCCCACTACACTACCACTACGAGTTCTTTTAGTAGAACTCCGGTCTCGCCGTACCT CAACTATGATTCGCGATATCTGCAGCAAGCACAGCCAGAGTTCATTTTTCCCGAAGGGGCCAACAAGCAGCGTGGAC GCTTCGAGTTGGCCTTCTCTCAGATAGGCACTTCGGTAATGATTGGCGGTGGAATTGGCGGCCTAGCAGGTGTTTAT AATGGTTTAAAAGTCACAAAAGCACTCGAGCAGAAGGGAAAAGTTCGTCGAACACA--------------------- \---------2----------------------------GTTACTTAATCACATTATGAAGCAAGGTTCCGGCACAGC TAACACATTAGGTACATTGACGGTGCTGAATTCG translation M S D N F S R T P Y S D G H A A T \-------------------------- \--------------------------1---------------------------------------------H E E A S K P H Y T T T T S S F S R T P V S P Y L N Y D S R Y L Q Q A Q P E F I F P E G A N K Q R G R F E L A F S Q I G T S V M I G G G I G G L A G V Y N G L K V T K A L E Q K G K V R R T Q------------------------------ 2---------------------------- L L N H I M K Q G S G T A N T L G T L T V L Y S A C G V L L Q F F R G E D D H I N T V I A G S A T G L L Y K S T 
\-------------------------------- 1-------------------------------A G L R T C A F G G A I G L G I S S L Y C L Y L I A Q E N S S N S S P K Y L Z several intron splices not predicted, but cDNAs make it clear, as well as matches. Interesting 5' UTR, has no stop codons, but this is the first M and it aligns okay. \***********A. mellifera Contig764 GATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCA TGCATGTCTCAGTACATGCCGAATTAAGGTGAAACCGCGAATGGCTCATTAAATCAGTTATGGTTCATTAGATCGTG GACACATTTACTTGGATAACTGTGGTAATTCTAGAGCTAATACATGCAAACAGAATTCCTCTCAGAGATGGGAGGAA TGCTTTTATTAGATCAAAACCAATCGGTGGCGGACGGCTCGTCCGTTCGTCCATCGTTTGTTTTGGTGACTCTGAAT AACTTTGTGCTGATCGCATGGTCATCTAGCACCGGCGACGCATCTTTCAAATGTCTGCCTTATCAACTGTCGATGGT AGGTTCTGCGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACAG CTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCCGGCACGGGGAGGTAGTGACGAAAAATAACGA TACGGGACTCATCCGAGGCCCCGTAATCGGAATGAGTACACTTTAAATCCTTTAA no obvious ORFs; WEIRD BLASTX match for RC full of stop codons, at 70% and many gaps and e-14 to human non-functional folate binding protein <up>Homo sapiens</up>; Needs four stop codons included! Is this a mouse pseudogene cDNA? No, has genomic match in Drosophila at e-37, but to a tiny incomplete 1000bp contig \- so will not be able to reconstruct this amazing gene; AND tons of 70% B. mori and some Drosophila ESTs with their own set of stop codons in them! What on earth is this gene? 
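The puzzle above (a reverse-complement translation "full of stop codons") is easy to re-examine by translating the reverse complement in all three frames and counting stops under different genetic codes. The Python sketch below is only an illustration of that check, not the procedure actually used for these notes; the two code strings are the NCBI translation tables 1 (standard) and 5 (invertebrate mitochondrial), and all function names are made up for this example.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# Amino acids in NCBI order (first, second, third base each cycling T, C, A, G).
STANDARD = dict(zip(
    CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))
INVERT_MITO = dict(zip(
    CODONS, "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"))

def reverse_complement(dna):
    """Reverse-complement a DNA string (A, C, G, T; other letters pass through)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna, table=STANDARD, frame=0):
    """Translate one forward frame; codons containing N or other symbols become 'X'."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))

def stops_per_frame(dna, table=STANDARD):
    """Number of stop codons in each of the three forward frames."""
    return [translate(dna, table, f).count("*") for f in range(3)]

# Start of the honey bee Contig764 reverse complement, as given in this file.
rc_start = "TTAAAGGATTTAAAGTGTACTCATTCCGATTACGGGGCCTCGGATGAGTCCCGTATCGTTATT"
print(translate(rc_start))
print(stops_per_frame(rc_start), stops_per_frame(rc_start, INVERT_MITO))
```

Running `stops_per_frame(reverse_complement(est))` on the full EST under both tables shows directly whether an alternative code removes the in-frame stops. A stop that disappears only after shifting frame points to a frameshift instead.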
Mitochondrial code doesn't fix problem Contig 764 RC TTAAAGGATTTAAAGTGTACTCATTCCGATTACGGGGCCTCGGATGAGTCCCGT ATCGTTATTTTTCGTCACTACCTCCCCGTGCCGGGAGTGGGTAATTTGCGCGCCTGCTGCCTTCCTTGGATGTGGTA GCTGTTTCTCAGGCTCCCTCTCCGGAATCGAACCCTGATTCCCCGTTACCCGTTACAACCATGGTAGGCGCAGAACC TACCATCGACAGTTGATAAGGCAGACATTTGAAAGATGCGTCGCCGGTGCTAGATGACCATGCGATCAGCACAAAGT TATTCAGAGTCACCAAAACAAACGATGGACGAACGGACGAGCCGTCCGCCACCGATTGGTTTTGATCTAATAAAAGC ATTCCTCCCATCTCTGAGAGGAATTCTGTTTGCATGTATTAGCTCTAGAATTACCACAGTTATCCAAGTAAATGTGT CCACGATCTAATGAACCATAACTGATTTAATGAGCCATTCGCGGTTTCACCTTAATTCGGCATGTACTGAGACATGC ATGGCTTAATCTTTGAGACAAGCATATGACTACTGGCAGGATC translation L K D L K C T H S D Y G A S D E S R I V I \# F V T T S P C R E W V I C A P A A F L G C G S C F S G S L S G I E P \* F P V T R Y N H G R R R T Y H R Q L I R Q T F E R C V A G A R \* P C D Q H K V I Q S H Q N K R W T N G R A V R H R L V L I \* \* K H S S H L \* E E F C L H V L A L E L P Q L S K \* AE003241.2 entire AGTGATCCACCGCTTAGAGTTTTATAATTCATTTTTATATAATGTCAATTATGT TTTTATTGAAAGAAATTAAAAATACACCATTTTACTGGCATATATCAATTCCTTCAATAAATGTATTTATATACCTA AAATAAATGTTGCGAAATGTCTTAGTTTCATATAAGCATTATGTATCATAATAATCTGGTTGGTTATGGGGTTTGCT ATTTTGGGTGACACATACTGCAATTTATATAAAACATTAACCTGATGGATGCCAGGTACAACATTGTTTATTTCAGG TTGTTGCATTAGCCAACGTATGCCCATAACTAAGATGAACAATACATATTCGCAACGCGTGTATAGTAATAAATACA CACAAATTTTAAAAATTAGTTAATATCTACCAATTATATTAACACTTATTTCGATGATTACCACACATTCGAAATTA TTTTATTTTGATTCGACTTCCACTTTCGAATTTTGTTTTTTCGATTTTCATGTTCGAAACATTATTTTTATAGGAAA CGCCGTTGTTGTAAGTACTCGCCACAAATACGCACAACATACATTAGAAATGTTAAAATCTTTTTATGAGGTTGCCA AGCCCCATCTTCGTTTTATTTTGATTTTAACTTTTTGTATGAAAAGATACAAGTATTTAATCACATATAAGAACTCC ACCGGTAATACGCTTACATACATAAAGGTATAGTACTAACCACAATTGTAAGTTGTACTACCCGTATGAAGCACAAG TTCAACTACGAACGTTTTAACCGCAACAACTTTAATATACGCTATTGGAGCTGGAATTACCGCGGCTGCTGGCACCA GACTTGCCCTCCAATTGGTCCTTGTTAAAGGATTTAAAGTGTACTCATTCCAATTACAGGGCCTCGGATATGAGTCC TGTATTGTTATTTTTCGTCACTACCTCCCCGAGCTGGGAGTGGGTAATTTACGCGCCTGCTGCCTTCCTTAGATGTG 
GTAGCCGTTTCTCAGGCTCCCTCTCCGGAATCGAACCCTGATTCCCCGTTACCCGTTGCAACCATGGTAGTCCTAGA TACTACCATCAAAAGTTGATAGGGCAGACATTTGAAAGATCTGGCGTCGGTACAAGACCATACGATCTGCATGTTAT CTAG translation L K D L K C T H S N Y R A S D M S P V L L F F V T T S P S W E W V I Y A P A A F L R C G S R F S G S L S G I E P \* F P V T R C N H G S P R Y Y H Q K L I G Q T F E R S G V G T R P Y D L H V I \* Yikes, all the cDNA matches are backwards too, that is to the RC as with the honey bee, and even for the vertebrate clones? What on earth can this be? \************A. mellifera BB260003A10H2.F ACAGAGAGAAAAAGCAAGCTGCCTTACCTATCGCT GTCTTTGGAATGGAAATGGTGGAGAAATTCTATTCGAAACAATTCACGGATAAGGAAGAGGGTCTGATGCAATTGAA AGAAGAATTAAAGACGTTTGATCCAGAAGTTTCGAAACATTCCGCGAATAAAACGGCCAGAGCTGCGATTTTGTTGT TACACAGAGCCCTTAGGGACAAAGTTTTCAGTGTATACAGTCTAGCTGCACAATTGATTAGAATTTTTTTTTCAGAA TTTGCAACTAGGGTATCTTCTACGGAGATTGCGAGAAGTGTGGAAAGATTGCTCCCAGAATTATTGACTAAATCAGG GGATACCACTCCAAGGATTCATAACATGGCAGTCCACACGATACTCAGTATGGCAGATTGTAAATGTGTCCGAGAAT TGCACATTATACCAGTGCACTTAACAAGACCTGTCAGCAGTAGCACTCATCAAAGATTGGCGTTGAGTAGGTTAGAA ATGGTGGAGCAGTTGATTCTGAGCCATGGAATATCTACTGATAAACAAAGTGGGCTCACTTGTCGAACATTGTCGGA ATTGGGTTCTACGGGATTGCATCATCCGGCGGAAGCAGTGAGAAAAGTTTCGGAAAGAATTCTAGTGTTGGTGTACA AAGTGAATCCTAGATTGGTTCGCAAACAATTGCCTCCTGACGATGATATTACCAGGAGAAATTTATTGTATCGTCAA CTTTTTCACGAATTTGATGTTT full ORF; BLASTX glycine-, glutamate-, thienylcyclohexylpiperidine-binding protein <up>Rattus</up>; 28%; 3-22; could be rest of CG10137; no ESTs. Indeed, there is a human protein, KIAA0562, that is about 900aa long, and its first part matches CG10137, while its second matches our cDNA. Similar long proteins are known from C. elegans and Leishmania major, so the truncated rat protein above, of 400aa, is aberrant. \*************A. 
mellifera BB260003A20B1.F CCTTAGATGGACCGATCTGATATGCAACCTAATA CAAATCAATTTACTGTATGGAGTCCTGCAGGCCAAGACATTAGTACGAGGGTCTCCTGGAATTCTGATCGACGACCA ACCGCCCGAGAGCCCGAGCGACAGTAACGAAACCGACGAGACGAATAAAAATTTCCCGTGGATGAGGGTGTTGGCCC AGTTCGCCAACTCGTTCAACTTTTACTGCTCCCATCAGAATTTCTGCCACCCGTATTGCCACAGGCGGCAGATGCGC GCGTGCAGCAGATTGATCAAGTCCATAAGGAAAATCTACGGGGAGGAGTTTGGCATATTGAACGGGACAGGGATATT CGACTTGGACACGGATAAGAAGGAGGCGAGCAAAAAGGAGAAACGGAGCCGAAAAGTTTCGGAGCAGGCCAGCACCC AAGTGTCCCCTGTGAGGAGGAAGGACAGCGTAGGGAAGAAGTACAAGTAACATTTATCTTCCTTAATTGATTCAGTG AAAACTGACTGTTCTAATAATTTCCTGAATAAATTTAAGATAAGATATTCTTCAAACACAATTTGGATTAACATCAA CATCTCTTCAAATTATTCTAAGATTATTTCTATATAATATATTTCTTATTATAAGATCTTATCTCTAAGATCTGTTA AAGAAATGCATGGCTCTATTTTATTAATATTTTAAATACAATCAAACTGATAACTGGTCTTGAAAANTTATTCAAAT GAAAAGTGGCATCTTGCGTTCCTTTCCTCTTCTTCGACAGGGTTGAAAAGAACATGGATGGGTCTTAATTAGGCAGG CTCGCTCAACGAGACTTGTC end of ORF encodes weak match to C. elegans and human proteins; methyl-CpG binding protein 1 <up>Homo</up>; 32% over 84; e-0.27; unannotated NEW GENE; no ESTs. The human protein is about 600aa long, with our match in the middle; this region has two zinc fingers, but the methyl-CpG binding domain is at the extreme N-terminus. The genomic match is very good at about 55%, so see how far it can be extended; we have a 14kb region to play with! 
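The genomic sequences in this file appear to follow the convention of exons in upper case and introns (and flanking sequence) in lower case. Under that assumption, a short Python sketch can pull out each intron, check it for the usual GT...AG termini, and join the exons into a spliced sequence for translation. The function names are hypothetical, and the placeholder text used for very long introns elsewhere in the file would have to be removed by hand first.

```python
import re

def _clean(genomic):
    """Drop the spaces and line breaks left over from the wrapped sequence."""
    return re.sub(r"\s+", "", genomic)

def introns(genomic):
    """Lower-case runs flanked by upper-case sequence on both sides."""
    return re.findall(r"(?<=[A-Z])[a-z]+(?=[A-Z])", _clean(genomic))

def is_canonical(intron):
    """True if the intron starts with gt and ends with ag."""
    return intron.startswith("gt") and intron.endswith("ag")

def spliced(genomic):
    """Concatenate the upper-case (exonic) stretches."""
    return "".join(re.findall(r"[A-Z]+", _clean(genomic)))

# Fragment around the single intron of the AE003762.2 sequence in this file.
fragment = ("CACCAGAAAGgtaagataccatttacggaagtcaaacttaagcca "
            "aaaaattcgattgtttcctagGTTTATGGCGAGG")
for intron in introns(fragment):
    print(len(intron), is_canonical(intron))
print(spliced(fragment))
```

Any intron reported as non-canonical is worth a second look against the cDNA alignment before the exon boundaries are trusted.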
AE003762.2 CACCTCGGAACAGGGCATGATCAGTGGTGGCGAGGAATCACCAGGAATTCTCAG TGACGATCAGCAGCCGGAGTCACCAACGGACTCGAATGAAAACGATGATACGGCCAAGAATATGCCGTGGCTAAAGG CAATTATCGATCTTATGTCCAGCTATAACTACTACTGCACCCATAAAGGATATTGCCATCCATTTTGCTATAAACGG CACATGCGATCCTGCACTCGCTTGGTCAAAGCCACCAGAAAGgtaagataccatttacggaagtcaaacttaagcca aaaaattcgattgtttcctagGTTTATGGCGAGGAGTTTGGATTCACCTTCGATGCAGACCATCCGAATGTGGAGCC CACTATCATCACCTCCAGTAAGCCACATACTTCTCGAGCTCGCTCCACTAGAAAAGTATCGGAGCAGAGTTCCACTC AGACATCTCCGTCCAAGCGAAAGGATAGCTTGTCACGCAAAGATCGGTGGGTAGTATTCATAGTTAGCCAAAACATA GCTTATATATACATGCTCACAGGA translation S P G I L S D D Q Q P E S P T D S N E N D D T A K N M P W L K A I I D L M S S Y N Y Y C T H K G Y C H P F C Y K R H M R S C T R L V K A T R K \------------------- \---0--------------------------------- V Y G E E F G F T F D A D H P N V E P T I I T S S K P H T S R A R S T R K V S E Q S S T Q T S P S K R K D S L Now figure out that is already recognized as CG18437, except for some reason the entire annotated mRNA is not translated into the protein CG18437? \**************A. 
mellifera BB260003B20H2.F GAAGATATAAAAATTATTTAACGAAAATCTAAA TTTTTGTCGAACGATTAAAATTTTGCGGAGTTAACCTATACGTTTACATTCAATTACCAATAACGACACATTATCAT AAAAGAATTTCCAACATTAATATCACGGAAAGATTAACGAGGTGGGGCAGAATTACTTTTCAATAAAAAAAAAAAGG CTCGAAGTAAACTTACCTGGAGTAAACGAATGGACTATACAAAAGCTACTTTTCTGGATAACAGATAATCTGTTAAA GGAACGACAGGAACTTTTTATACAAGGGGATACCGTGCGTCCAGGAATCTTGGTTCTTATAAATGACATCGATTGGG AATTATTGGGCGAAAGCGATTATAAAATAAAATCAGGCGATACTATATTATTCATATCCACTTTACACGGAGGATAA GAAAGATAAAAGTGGAAAAAGAAAAGATAAAAGTCGATGCAAAGAAGATCGAAGATAATACCGCATAACATAATATA AGTGGAATCCGAGAAAAACGTAATTCTTCAGAAAAACATAATTTCACGGAATGTTTGTACCGATGGTGAAATACGTG GATACGCGCGTAGCGTTAAAAAAAAAAAAAAAATAAACCGTTTATTCCTTCGTAAGATAAGTAACGATTTCTATTAG CCCCGCCGG no clear ORF, yet ubiquitin like protein; Urm1p <up>Saccharomyces cerevisiae</up>; 47% 3-11; has no start codon and a frameshift, but encodes full-length 99aa protein; TBLASTX is clear; unannotated in genome NEW GENE; no ESTs at all Only two kb available for this gene between annotated neighbors; this is the entire region AE003558.2 gtgttttttttaatttcataaattcacaacgaaatgtaaaatgtcttcttagag cacagatcataatatgctaatgaaaactttacctagcgttccattggaatgccacctgtttatgtttctcaatggga ataattataatgctgagtcccctcgtcgaaggtgaatggtaaaggtaaatttacgtacaaattataatttctttcaa aatgcaataatttttggaagtttagagcatgtggttgccagactttttagaaattaggattctatatttggtattat ttttgaATGGTGGATTTTTTTGAAAAGCTTAGACGCGGTCACACATTTATTTACATCGAACATATGATGGGCACGCC GGAATTAAAAATCATATTAGAATTCAGgtatataaagccttgttcgaattattatttccaataaatgagacaatttg taatttaatttcgtagTGCAGGGGCGGAGTTACTATTTGGTAACATAAAACGCCGTGAATTGAACTTGGACGGTAAA CAAAAATgtatgttctaaaagatgttttaaacttgagtgaaatcggttatgaattatatatattttaaagGGACTAT TGCTAATCTGCTTAAGTGGATGCATGCGAATATTTTAACGGAGCGTCCGGAACTTTTTCTTCAAGGAGATACTGTgt aagtttggcttaaactataagggatacatagtctatactttctattgtatgttttcagGCGACCTGGAATTTTAGTA CTCATAAATGATACAGACTGGGAATTGCTGgtaagtaagagtacagaatatggaattctaaaaactataattaactc aacttctagGGTGAACTGGACTACGAGCTGCAGCCCAACGACAATGTGTTGTTTATATCAACTTTACACGGTGGTTA 
AAAAACGTTCTGGAATCTAAATTATAGGAGAAAAGTTTATTTTTATACTACAACCCATTaaattgtattcaaaaaac aaacaaaaacagtttttgactgaatcagaattacgtccctttaccagaggcaaaggcccacacgatggttcgggtag taaagttttcatcatatcatctagagattgtaagctagcaggttttcgctatttataaaagcacagcattgaacaag cacttgggaattgtagggaagttaaaaatagaacaatccagatgcggtttggcgggtaattcgaagtaaggacaaca cgttagttttacattgaagcaacatttattgaatttaaattttcgcttaaaattatttatcgagttattatgaagta tagatatatttttttaatgttcgtttacgattattttaactataggcacttaggttacatatcaactaactgtacgt aatgaagttcgattcagatatgttgggcatcggccacgcccctttttcgggtgctgctcatggctcccatcgatttc tgtgtgtgtgtggtgcggatcgttcgattgtgtggactaagaactagatatgtatgaaacttgcttcgacattgaca cgctgaattcaataatttcgactgatttttccgataaacaaggcaaaacgaaaagcagcgactatgacaaattaacg aaaactaaaaatgtaataaataataaataacaaaaataatgaaataaaattagcgatcgaacgtaatacgatgatac tacatgggatccgatgaaaccgttctgctaaagctattaatggggatttatactatatctagatgcgat translation M V D F F E K L R R G H T F I Y I E H M M G T P E L K I I L E F S--------------------------------2------------------------ \--------- A G A E L L F G N I K R R E L N L D G K Q K \---------------------------------1-----------------------------W T I A N L L K W M H A N I L T E R P E L F L Q G D T V--------- \-------------------------2------------------------- R P G I L V L I N D T D W E L L \------------------------------0----------------------- \-- G E L D Y E L Q P N D N V L F I S T L H G G \* Lovely small compact gene with all intron boundaries predicted; encoded protein is 121aa Drosophila MVDFFEKLRRGHTFIYIEHMMGTPELKIILEFSAGAELLFGNIKRRELNLDGKQKWTIANLLKWMHANILTERPELF LQGDTVRPGILVLINDTDWELLGELDYELQPNDNVLFISTLHGG Honey bee RGGAELLFNKKKRLEVNLPGVNEWTIQKLLFWITDNLLKERQELFIQGDTVRPGILVLINDIDWELLGESDYKIKSG DTILFISTLHGG \**************A. 
mellifera BB260004B10A11.F GTTGCGCACTCCTAGAAACGCATCGGCAACGC GAAACAACTGGCATCAAGAAATTGTACAATAGCTTCTTCGTAATGTTCTAAGGATAACTCTTCCTTAAGGAAGTTCG GCCCTTTATTAACCGGTGGTCGAGTGATGCACCGTGCACCTCCGAGATCATTGTCGATCCCGATCGAGGAAAATACG CGGAGAAAACGCACAAAAGAAGGAGGACGTTTTGGTTAGTAAACGAAAGAAAAGGAACGAGCTGAGCGGAGGAAGCC GAGAAGCGGCGAAAACAAAGAGGAAAAAAAAAAAAAAAAAAAGCCAC tiny match 88% over 25aa at e-07. Could be end of an ORF. No BLASTX matches though, so novel protein; one Drosophila EST too. Match is to an unannotated region of about 5kb, so try to figure it out. AE003830.2 cgtaaacaactactatataatgtccgtcgccatctgacagtggtccaaacaacc agcacaaagagctagaagtcgccgcagccagtgacaagtttcaccacagcgagtgagacctgtaccgtagaaccaac acaagacacttaagcttgcaacacgggctaaccaattcagcgataaATGGAGAAATCTGAAATACGACTGCAACGCA TGTCTAATGAATATCAGTCGCAATCGAGCTATATGTACCTCCGGACCAAGATGCTGTTAAAAATCGAGAATACCCTA CTTCGAAGCCATCGTCAGCGCGAGACCACCGGTATCAAGAAACTATACAATTCGTTTTTCGTATTGTTTTAATTTGC CCCCCCGGCCAGT translation M E K S E I R L Q R M S N E Y Q S Q S S Y M Y L R T K M L L K I E N T L L R S H R Q R E T T G I K K L Y N S F F V L F \* This is all I can annotate, a simple little ORF, the second half of which matches our honey bee EST as well as one Drosophila EST (which has other problems, including a piece in RC). No idea if it is real; no BLASTP matches. \*************A. 
mellifera BB260009A20B5.F TTTTTTTTTGAATTCTGTCCGAGATTCCATTCTT AAAAAAGAAAACAAAAGCTAAAAGAATTAATAAGAAAAAAAATATATATATACACATATATTAAATAGTATAAATAA ACATAACCTATAAATAGTAAAATATTCACATACATTTTAAAAGATTATTACTATTCTTAATAATAGACTTATAATGG TTCTTCGCTATATCAATAAATTCTTATATTATTTAAATAAAATTATAGATTGAAAAAAATGATTTAAGTGAAATTAA ATATCGAAAAAGAAAATAGAGATATCCACTTCCATACACAATATAAAGGGAAATGTAAATCGAAGTTAATTGTGATA CAAATGCATACAAGAGAAATTAAGAAATACAATCTTATTTCATTATCATGCTATTTAAAAATACATTATATGAAAGC AACTTTATTATGAAGTATCAAGAACTCCATTTTTATTCGTATTATATCCGGGACACATGGTTATTTTCTAACCTTAA TAATACGTTTTAGTTTGGGTAGTGATTATTTAGAAATATTGAATGGAATATAAGATAAAAAAATCATGTCAAAGCTT TTTGTTTTATTTGAACATGCTGCTGGCTATGCCATATTTTCTGTCAGAGAATTTGAAGAAGTGGGAATGTTATTGCC TCAAGTTGAAGCATCTGTAACAGATTTGTCTCGTTTTAACTCAAGTGTGAAATTAATNTGGATTTTCACCTTTTAAA ACTGGCTTAACAGCTCTAGAAAGTATAAATAATATTTCTGAAGGAATTGCCCACA no obvious ORFs; frameshifted ORF at end of sequence (long 5' UTR or chimeric?) has excellent match to nucleolar protein <up>Drosophila subobscura</up>; tons of ESTs; why is it not annotated? CG13849 is annotated for the same region, but the amino acids are different? No, is part of this gene CG13849. \*************A. 
mellifera BB260010A20C3.F GAAAAGGAAAACTCCATCAAAATTAAAGACCATG GGTAATCATGATGACTTTTTAAATCGTATATCTAAATCGCTTTATTATGCAAAACTGCCAGTGACTGATTGCCTCAG TTTACCTGTTACTGAATTGGCAGCAGAATTATTCACTGAAGTGAAGAGTGGTTATACACTTGAAAGATTAGATGTAG AAGAGGCTAGTAGAATTTCTAGAAATGCATGTGTATCACCATGTTCTCTTGTTTTGGCATTGTTATATTTGGAGAGA TTAAAAGATTGTAATCCAGAATATCTTCAACAAGTGGCACCTTCTGAGCTCTTCCTTGTTTCTTTGATGGTGGCTAG TAAATTTTTAAACGATGAGGGAGAAGATGATGAAGTTTTCAATACTGAATGGGCACAATCAGCTGATTTGACTATAT TACAAATAAATCGGTTAGAAAAAGATTTTCTTAAAGCTATTGATTGGACTGTTTTTGTTCATAATCAAGATTTTTGG GAAAGATTGCAGAAATTAGAAAGAGATATAGCTTATAAGGAAGCACAAAAAAGAGGCTGGTTTTCATATACAGAATT AAGTTGTCTAATGAATTCAATGCAATTAATTGCAGTAGCACATGCTGTAGTAAATGTATCATCTATTTGCTTAGCAA CATATACTGCANGAGTAGTTACTCTTTTAGTTCTGCTTTAGTTGCAAGCTATCTTCCAGGAACAGTACTTAACAATC CAAGACAAGTAACTAATTCTACAGATATTATGAAAGCAGATTTTAAATTCAAGATGGATATAACATCACCTATCGAA ATTTATCAGAAATGTTTTACAACAGATTTATATC long ORF encodes BLASTX match to CGI-57 protein <up>Homo sapiens</up>; e-37; and nematode protein; these are 400aa proteins, and match is from the N-terminus, our ORF may be frameshifted at end TBLASTX match is to five regions in tiny unannotated scaffold; one Drosophila EST and a Bmori EST AE003132.1 TTCGCATATTTCAGTTATTTATTTAGAAATGGGGCGATTTAAGTTATGTGCTTC GCCGAGAGAGgtatttaaaaagttttcacgaatatatgtattagcaaattttttaaatttccagGTTATGAAGTACG AAGACTTTATAAAACGCATTCGAAAAAGCCTCTACTATGGCGTTGGAACACCAGACACAGAAATGTCGGTCTCCTTA CCCTTTGCGGAGTACGCGGCAGATTTGTTTTCGGAGACTCATCGCGGGCATTCTTTGCATCGCCTAAGTTGCGTATC TGCTGCACAAGTACATGCCACGCCTTGCTCTTTAATTATGGCATTGATATACCTCGATCGCTTAAACGTCATCGACT CGGGCTATAGCTGCAGAATCACACCACAGCAGCTGTTTGTTGTGTCACTAgtaagtacgcactcctctataacttgc aaactaatgcaaacaacaatatgaacgcaccgtaaaaaagtacatggctataaatgtcgaaactgtagctgctgaaa caaatttccgttagtttcactgtcggctgaatgaaaaatgacgatgattttgatcagaaataaattgtaaaatttca cagcggcactcactgtgtctgtacatgcactcagtcagcaagaagttttgacgtggctaccattgcttgagtccgct ttaatacatatgtgattgtctgtttttaatatggtaattataagttgaataaatggtaattatctttacagATGATT 
TCCACAAAATTCTACGCGGGCCACGACGAACGGTTCTATCTGGAAGACTGGGCCAGTGACGCTTGTATGACGGAAGA TAGGCTCAAGGCAGTCGAGCTCGAATTTCTTTCCGCTATGgtaaactttacaatgtctaaaatacaaaaataaaata cgttttttctagGGTTGGAATATATACATATCCAATGAGCTATTCTTTGATAAGTTAAGAAACGTTGAACGTTCTTT GGCTGAACAGCAGGGACTGCGTAGAGGTTGGCTCACTTACAGTGAGCTCGTGCAGTTGCTGCCTAGCCTTGAATGGA CGAAATTCCTCGTTAACAGCCTGTCTGTACTATCTCTAAGCTATGCGGCAAGTATTATAACATTAGCCGGAGCTTTT TTTATTGCGAGCCAAGTTCCCGGTACGTTATGGCATCGGGATGTGGAAACTGCCTCAGATTTCACCATGACAATTAG CAGTCAGGTATCCGTTTCAAATGCATTAGAGTCCACACCTTTTATTAATGTCCAAGTATCCTCACTTTTACGTAAAA CGAGTAACGTGAATGTTGAATTGATGAATCTTGAGAAGACAAGCTGCGCCAGGGCAAGACTGAATAAAATTGAATAT AAGCATCCGCGCCATCAATCAGTACCTACGCTTTCATTCATAAGCACCTGTCCACAACTTGATTTATTGTATGCCCA AGATGGAACAAGGAATTGGCTAAATATTAAATCGCCCAACAGCGACTACAAAAACAACAGAAACCTTTCAATAACAG TTAGATCCGTACAACTAGAAGAGCAAAAGGCTGAAAATGATTCCGTTATTTGGCAAGCCAACACCGAAGCAATGCAG TAAttgtttttaccgcaaaacttaaagaggtgctaacaactgatataaaataaatatatttatattatatataatat caatataataatattgataacaaattaaccaagcgtacgagtaatataacatgcataacagtaatacgaaactgctt ttatttcttcacagaactaatgttcgctggcttaatcaaactgtcataaaaactataatagcacattattatatgtg cctagaggtggatactttggatgctaaactaaaatgaacaaaataagttagattgttctatattatattaaaataaa atgtttctgttgctctacatataaggaaatacttttttaaggaacaaaattatggccatcggacttttagttttccc ggactttcgtccactcgaagcgtttttccgagataaataagttcgattattatacatacaagctgtaatatgttgcg ctatacttcaaagttactgccttacactgaccgaaatcatttacaaaacaagagagaattctataatcaagttcccc aactgtaactcagctggtgcaaagacactagaataacaagatgcgtaacggccatacattggtttg HL02313.5prime CATATTTCAGTTATTTATTTAGAAATGGGGCGATTTAAGTTATGTGCTTCGCCGAGAGAG----------------- \-------------------------------------GTTATGAAGTACGAAGACTTTATAAAACGCATTCGAAAAA GCCTCTACTATGGCGTTGGAACACCAGACACAGAAATGTCGGTCTCCTTACCCTTTGCGGAGTACGCGGCAGATTTG TTTTCGGAGACTCATCGCGGGCATTCTTTGCATCGCCTAAGTTGCGTATCTGCTGCACAAGTACATGCCACGCCTTG CTCTTTAATTATGGCATTGATATACCTCGATCGCTTAAACGTCATCGACTCGGGCTATAGCTGCAGAATCACACCAC AGCAGCTGTTTGTTGTGTCACTA------------------------------------------------------ 
\----------------------------------------------------------------------------- \--------------------------------------------ATGATTTCCACAAAATTCTACGCGGGCCACGAC GAACGGTTCTATCTGGAAGACTGGGCCAGTGACGCTTGTATGACGGAAGATAGGCTCAAGGCAGTCGAGCTCGAATT TCTTTCCGCTATG-------------------------------------------------GGTTGGAATATATAC ATATCCAATGAGCTATTCTTTGATAAGTTAAGAAACGTTGAACGTTCTTTGGCTGAACAGCAGGGACTGCGTAGAGG TTGGCTCACTTACAGTGAGCTCGTGCAGTTGCTGCCTAGCCTTGAATGGACGAAATTCCTCGTTAACAGCCTGTCTG TACTATCTCTAAGCTATGCGGCAAGTATTATAACATTAGCCGGAGCTTTTTTTATTGCGAGCCAAGTTCCCGGTACG TTATGGCATCGGGATGTGGAAACTGCCTCAGATTTCACCATGACAATTAGCAGTCAGG translation M G R F K L C A S P R E \-------------------------0---------------- \------------ V M K Y E D F I K R I R K S L Y Y G V G T P D T E M S V S L P F A E Y A A D L F S E T H R G H S L H R L S C V S A A Q V H A T P C S L I M A L I Y L D R L N V I D S G Y S C R I T P Q Q L F V V S L \-- \--------------0-----------------------------------0-------------------------- \---------0-----------------------------------0------------------------------- \----0-----------------------------------0-----------------------------------0 \-----------------------------------0-----------------------------------0----- \------------------- M I S T K F Y A G H D E R F Y L E D W A S D A C M T E D R L K A V E L E F L S A M \------------ \-----0------------------------------- G W N I Y I S N E L F F D K L R N V E R S L A E Q Q G L R R G W L T Y S E L V Q L L P S L E W T K F L V N S L S V L S L S Y A A S I I T L A G A F F I A S Q V P G T L W H R D V E T A S D F T M T I S S Q V S V S N A L E S T P F I N V Q V S S L L R K T S N V N V E L M N L E K T S C A R A R L N K I E Y K H P R H Q S V P T L S F I S T C P Q L D L L Y A Q D G T R N W L N I K S P N S D Y K N N R N L S I T V R S V Q L E E Q K A E N D S V I W Q A N T E A M Q Z Nice gene, with all intron termini predicted This is the available 5' region tacgcataaaaaaaaatttatccaaatatctccaatagtttaggaggtattaatttttgtaaaaaaacaggccaaat 
gccccatagtgcagcggcgtcaccatcgccgtggtcctaattgtttatgggatcatccaaaacatccgttactgtcg agtcaaaagacgacaccccgactccagcatcgagatgacgatatcttcccgaaaagccacggacgactttaacctaa cgcggggaggagttaacacgctaactccacgataagaccggacgcgggcgcagtaacgtcagcgcgactgcaactaa gtcgcgaaatatgactcctgcattccaacatgcccggagcgtgtgaagcgcaatgtcagtattctgccgtgagcgct gcttcagaagacgggctacttcatattaagcttaagttctctgtctttagtttaaaaactcatcagaacgcgcatag tcgcataataaatctcaataattaaaattgtttgttaatttatataaggctttttattcacgttgtttctctttcca gctcttgacttaagcttctcgacctcgataacactatcgcttgttcttaagacaagacaattaattctatcgatata agtgttagctagtattttatatttatacaccgtgtactcttagtattttatatttatagacagcttacaaaacaaaa aatcgaaaacttggggtttgaattaaacatttaatagtcaattatttctatttggcatatccctttttagtttttac tgcgcttggtaacgcctaatggtgtgcattaccatacaaaaattgtatgaactaaaaaagagtgtttctcttctcat cgtttctaaaaacctctgcatggtaatgccggcagcttgacgatttttttaaaagtaattaaaaatttattagatcg gtatcggttttttttataggactaaatagttttatt \**************A. mellifera BB260017A10H4.F AGCTATCATAGGTTAGGAACGATACATGTTTCA AAACCTAGTATACAAAGACGAAGCGGAAGTGATCCTAGCGAAGTAGCAAATGTTACAACTTTAGGCGAAGCTATTGA AAAGTTGAATACTTCAAAAGATTCGACTTTGAGAAGAAACTCTGGTGGTCACGCAACGGTAACACGAAGTCATAGTA GCTTATATGGATTAACGAGGCCAACCGATGAACGTTTAATAACACATCCGTTAAATCGTACATCTTCTCATGGTCAC TTAAGTTTCGAGGAAATGTGTAAAGGAAATGAAATAAAATCAAAAGTATGGAATTCAGAAGTTATAGCTCCACCTGA TGATGTTCAGACTCGACTAGGAATAGAGATGCTTACGCAACGAGATTTAACCAAATTACAACCATTATTATGGCTTG AACTTACTGCGGTATTTGACAAATATAACGTACCATTAAAAAAACGCAAACCAAATAAACGTCGAACAAAAGCTGGA AATCTATTCGGGGTTTCATTATCAACTCTTTTACTTCGAGATAGTCAATTATCGTCCGAGGAGAGTAATATTCCATT AGTATTTCAAAAACTTTTTAACGAATTAACGAGACGTGGTGTTAAAAAAAAGGGTATTCTTCGCGTTGGAGGACATA AGCAGAAAGTTGAATCAATATGCATGCAGCTGGAAACGGATTTCTATTGTAACCT full ORF encodes >240aa leucine and serine-rich oprotein; 31% over 144aa; 3-11 BLASTX match to KIAA1314 protein <up>Homo sapiens</up>; genomic match is to an unannotated region of a short messy scaffold; indicates that other missing genes might be similar; no ESTs AE003032.2 cctattgtcatttaataatactaattttatgcaaaatttaatttgtttactaaa 
aggaacaagggacagttacacgctcacgaagtgccactccggattccctggactctttacaaatcgATGAAGCTTGG ACCAACAATTCTTTGTATGTATTTTCCAATCGCTTAAGCCCCGTTTCCTTACTTTAAATTTATTCAATAGACCGACT TTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATATTTAAAACA AAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTCGTGGTTCGCAAGTGCGAT GTGATATACCTTTGTACTCGGCGGATGGTGTAGAATTGCTTGGATATTCACGGATTGGCACCATACAATTTCCTCGG AACCGATCCGTCTCTGATCCATTTTGTTCTATTGGGTATGTTATTTTTAATAATTTGAGGACTCCTAAGGGGTTTGC CTTTCTAAAGTTATGTGTTTTCAGTCGATCAAAGGAGTCAAGAAGTGAAAATGATGCACGATCACAAAAGAAAAAAT CGAGTGAAGTGCTGTCAGCCTCAGAGAATGAATGCGGTCGCCTTTTACCGATGCCCTACAATGTCCTGAGCTTTGAA AGTATTTGTCGAGATTCTTCTAGTCTTGATAGCTGCGAAGTCCTAGATACCTGTGATATCCCTTCAACGCTATTCAC AGATGTCGTATTAATAAGCGAAACTGACATGAAACGTTTACAGACAATACTTTGGTTAGAGTTAGCCACAATATTTG ATCGCAACAAAGTTTCTTTAGATAAAAGAAAACCTTTTAAGCGTCGTCGCAAAGAAGAGGGTAATCTTTTTGGGGTA TCTATAAACGCTCTTATTCGTCGGGATCAGCAAGTTACTGGCACTGACTCATCTTTGGTCCCACTATTTTTGGAAAA GCTAATTGGCGAACTTCTGCGACGTGGCTCTAGAGAAGAAGGATTACTTCGAATAGGTGGTCATAAACAAAAGgtta taataataaataatgttaataatgaaaataattggtaatacaattttaacaattttctattttcagACTGAATTACT TTATAATGAATTAGAATCAACATTTTATCAAAATCCAGATAATCTAGATAACCTCTTTCGCACAGCTACTGTTCATG AACTTAGTTCGTTGCTAAAACGATGGCTGCGCGAACTTCCTCAACCTTTGCTTACTAATGAGCTTATACAACTGTTT TATCAATGTCACACACTTCCATCAATAGATCAAATGAATGCACTATCGATTTTATGTCACCTGCTTCCGCCTGAAAA TAGAAACACATTACGTTCATTATTAAGCTTTTTTAATATTATAATTAATTTAAAAGATATAAACAAAATGAATGTGC ATAACGTAGCAACAATAATGGCACCGTCAATGTTTCCACCACGTTATATACATCCGAGTGACAATAACAGCATTGCA GAACAAGTAAGAATGGCCGCTCAGTGTTGCCGTTTGACGAATATTTTAATCCTACGTGGCGAAAAACTTTTCCAAGT ACCAAACAATTTAATTGTGGAGTCACAGAAAACAATGATGGTATGTGTTATTCTTGAGTTTTTAATTAACCGTAAAT ATATATGTTTCCATAGGGTAAGAAAGGATGGCATCGGCATCGGAATTCAAATGAAATTACGGCAAAACCAAGCGGAA AGGCGAGCAATGTCGGCGTTGGACACGACTCTACAGTTATAAATAAATACTCAACCAATTTAAAGCATTTACATCCA TTTGTTATTTAAACAAACGGCTTCTAACGGTGCTTAAGTTGTATTATATGTTGATAAAATATTTACCTATTATTAAA GAAAATATAAAGAATATCCTCTATAAACCGTCAAATTGAAAAAAATGTTCAATTTCAAACCAATTTCAAATGTTCAG 
ATTCAAACCAAAAATTAAGTGAAGCTCAACTGAATAGTTTTGAAAACCCTTTCTAACATTTTTTTTGTTCCTTTTCA AATTTTTGATCTTTCGAACTGCTTCAAACTCTACCTATTTTGCTGGTAAAATTATGGAAGAATATATATATT translation M Q N L I C L L K G T R D S Y T L T K C H S G F P G L F T N R Z S L D Q Q F F V C I Q S L K P R F L T L N L F N R P T F V N V Y E K N T E T A I Q C V E Q S N E I Y L K Q N L R R T P S A P P K S G T Y A D I F R G S Q V R C D I P L Y S A D G V E L L G Y S R I G T I Q F P R N R S V S D P F C S I G Y V I F N N L R T P K G F A F L K L C V F S R S K E S R S E N D A R S Q K K K S S E V L S A S E N E C G R L L P M P Y N V L S F E S I C R D S S S L D S C E V L D T C D I P S T L F T D V V L I S E T D M K R L Q T I L W L E L A T I F D R N K V S L D K R K P F K R R R K E E G N L F G V S I N A L I R R D Q Q V T G T D S S L V P L F L E K L I G E L L R R G S R E E G L L R I G G H K Q K \-----------------------------------1-------------------- \-------------- T E L L Y N ELESTFYQNPDNLDNLFRTATVHELSSLLKRWLRELPQPLLTNE LIQLFYQCHTLPSIDQMNALSILCHLLPPENRNTLRSLLSFFNIIINLKDINKMNVHNVATIMAPSMFPPRYIHPSD NNSIAEQVRMAAQCCRLTNILILRGEKLFQVPNNLIVESQKTMMVCVILEFLINRKYICFHRVRKDGIGIGIQMKLR QNQ A E R R A M S A L D T T L Q L Z This turns out to be a huge complicated gene, starting with the annotated CG17082, but extending far further over a section of nnnnns Here are the cDNAs that match LD04957.5prime AATTCGGCACCAAGGAAAAGAAGTTACAACCTGCTACCCAGTTGATGATTTTTTGTTG ATGAAACTAGTAGTTTTGCACATAAGCTGTGTCTAATTTACTTAACTTTTTATAATTAAAAAAAGTGTTTGTTTATT TTAATATGAAAAAACAACTAGATATGCGTGTTGTCATGGGTTCTTAGGTATTTTCTCCCATGGATACGAGAAAATTT GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTG LD16910.5prime CTGCTACCCAGTTGATGATTTTTTGTTG ATGAAACTAGTAGTTTTGCACATAAGCTGTGTCTAATTTACTTAACTTTTTATAATTAAAAAAAGTGTTTGTTTATT TTAATATGAAAAAACAACTAGATATGCGTGTTGTCATGGGTTCTTAGGTATTTTCTCCCATGGATACGAGAAAATTT 
GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACATAACCGTGTGAACAAGGGTTAGA LD08837.5prime CTTTACACAAACATCTTGATATTT GCCTTAGTAAATCTTTAGTGTATAGGTGAAATATACTGAATTTCTGTGTATTTTCTCCCATGGATACGAGAAAATTT GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC AGTTACACGCTCACGAAGTGCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCCT TACCGACTTTTGTAAATGTATATGAAAAAAATACC LD34572.5prime TATTTTCTCCCATGGATACGAGAAAATTT GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC AGTTACACGCTCACGAAGTGCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT TTAA LD27621.5prime CAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC AATGAGTATTATCTGGCAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC AGTTACACGCTCACGAAGGNCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT 
TTAAAACAAAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTC LD04154.5prime TGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC AGTTACACGCTCACGAAGTGCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT TTAAAACAAAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTCGTGGTTCGCA GTGCGATGTGATATACCTTTGTA LD15784.5prime AAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC AGTTACACGCTCACGAAGGTCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT TTAAAACAAAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTCGTGGTTCGCA AGTGCTATGTGATATACCTTTGTACTCGGCGGATGGTGTAGAATTGCTTGGATATTCACGGATTGGCACCATACAAT TTCCTCGGAACCGATCCGTCTCTGATCCATTTTGTTCTATTGGTCGATCAAAGGAGTCAAGAAGTGAAAATGATGCA CGATCACAAAAGAAAAAATCGAGTGAAGTGCTGTCAGCCTCAGAGAATGAATGCGGTCGCCTTTTACCGATGCCCTA CAATGTCCTGAGCTTTGAAAGTATTTGTCGAGATTCTTCTAGTCTTGATAGCTGCGAGTCCTAGATACCTGTGATAT CCCTTCAACGCTATTCACAGATGTCGTATTAATAAGCGAAACTGACATGAAACGTTTACAGACAATACTTTGGTTAG AGTTAG Contig AATTCGGCACCAAGGAAAAGAAGTTACAACCTGCTACCCAGTTGATGATTTTTTGTTG ATGAAACTAGTAGTTTTGCACATAAGCTGTGTCTAATTTACTTAACTTTTTATAATTAAAAAAAGTGTTTGTTTATT TTAATATGAAAAAACAACTAGATATGCGTGTTGTCATGGGTTCTTAGGTATTTTCTCCCATGGATACGAGAAAATTT GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC AGTTACACGCTCACGAAGGTCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT 
TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT TTAAAACAAAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTCGTGGTTCGCA AGTGCTATGTGATATACCTTTGTACTCGGCGGATGGTGTAGAATTGCTTGGATATTCACGGATTGGCACCATACAAT TTCCTCGGAACCGATCCGTCTCTGATCCATTTTGTTCTATTGGTCGATCAAAGGAGTCAAGAAGTGAAAATGATGCA CGATCACAAAAGAAAAAATCGAGTGAAGTGCTGTCAGCCTCAGAGAATGAATGCGGTCGCCTTTTACCGATGCCCTA CAATGTCCTGAGCTTTGAAAGTATTTGTCGAGATTCTTCTAGTCTTGATAGCTGCGAGTCCTAGATACCTGTGATAT CCCTTCAACGCTATTCACAGATGTCGTATTAATAAGCGAAACTGACATGAAACGTTTACAGACAATACTTTGGTTAG AGTTAG Put these together to encode a 296aa protein at 14% serine. Seems good, but too hard to put all the introns together here. \*****************A. mellifera BB260019B20F2.F GGATCGTTATCCTTTCAATGACGAATAATTATTCTAAAATCGTATCAAAATTACTAACGAGTCAAAAATTTTACCTG TGCTCGAGAAACGCGCGTAGTATAAATAAAATTGGACGGATATCCTTTCAATGACGAATAATTATTCTAAAATCAAA ATTAACGAGTCAAAAATAAGCGTTGAAAGAGAAAGAAAATTTTAAGAGGAGAATGTAAATGGAATCGATAGATGACG GTGGCTGTCACGATGCTGCAAGGTCGACCGGAAGCGAGACGGGGTGTTCAATTTCCGCCGAATGAAACGAAGCTGTG CCCGTTTCAGGAACTCGCGCACAAGCTCTCGCGGAGCTTCGACATGCAGGAGGCTCAGCTCCTCGAGGAGGGTGGCT CTGGTGCCGCTACGGCGGGTGCTGGGGGCGCGGGTGGCCCGCCACGAAGACATCACAGCGCTCAGAGATTGGCCAGG AGCGAAATATCTGAGAGAAGGGAAGAGGACGGAGCGTTGGTGGTACCCGATCACCAGGGCAACCTGAGGATCACCGT GAAGAAGACCAAGTCGATTTTGGGCATTGCCATCGAGGGCGGTGCCAATACCAAACATCCACTGCCCAGGATCATCA ATATACACGATAATGGGGCAGCTTACGAGGCAGGTGGCCTCGAGGTCGGTCAACTGATCCTAGAAGTCGATGGACAC AAAGTGGAAGGTCTGCATCATCAGGAGGTGGCAAGACTGATCGCAGAGTCGTTCGCGAGGCGCGATCGCAACG long ORF after long 5' UTR encodes 11% G/A protein; weak but convincing BLASTX matches to several Drosophila proteins; especially CG7151 at e-05; strangely there is a BLASTX match to KIAA1526 protein <up>Homo sapiens</up> which is far longer and better at 32% over 175aa; e-18; genomic match is even better at 80% to unannotated region; NEW GENE!; but no ESTs? Most of it is a PDZ domain. 
There are three versions of the human protein annotated, 500, 700 and 900aa long, and in each we have the C-terminus, so perhaps our cDNA is unspliced? AE003536.2 tcgaaaccgaaactaaaattaaatgaataggagttggccacaaaactctcaaga agtttcgatATGAAGGAAGAAGGCGGCACATTGCTGGGCGATAAAGGTGTACGAAGGCATCAGgtgagtgctactcg aaaatagtgcaagtacactaaataaacagatatctatatcaacaaataataactgaatacagcacttgaatgccaac aagagaaaccagtatacgtttcagctcgaaattaacagagttaaatcatattatcataagactctcgtcggtttttc tattaaattctatacaattcggtcaccccagaacatttcatgattcattccaaacaaatcacacaaaagtttctact ttctatctctctaattctctttatacctaactcaaaaataaacagcaactgtaccattgattattttaaactagcaa tatactatgtaaatttgaattggcctaccaatagcttgaaccgtttcctattgtttgaacccacacgtaattaagta caagaacccaaaacagtatacatatacaaattccttgaatcaaccaattgttaaattgcttgcctttaccaaatcac agaatctattttatcagcttgaaccaagttacttttaatcaaaccaatacacaatttgaattgtaagctaatcctag aagtaaagccaatggaatatttttgatcaaaccggaaaaacctactaaaatctttaatttagtttaaagatttttca tgaacttggcctatttgtttttcagTCCATGCAGCGTCTGTCAGCGGAGCAGAATGGTGGTTCAACGACTGAACAAA CACATGAACACAATCCAAACGTCGTACCTGATCATAGAGGCAACTTACACATTACAGTTAAGAAAACCAAACCAATT TTAGGTATTGCTATCGAAGGTGGTGCTAATACAAAACACCCGCTCCCTAGGATAATCAATATCCATgtaagtacgca atagatcataaaatgactttttgagactcaaatgcctggcaaatagGAAAATGGTGCAGCATTTGAAGCGGGCGGCT TAGAAGTCGGCCAACTCATCCTGGAGGTAGATGGAACGAAAGTGGAGGGTCTGCATCATCAGgtatgttcgagctcc cttttttttttgagttattaatttgacggttttctttttgattttccagGAGGTTGCTCGACTAATAGCCGAATGCT TTGCTAATCGTGAAAAGGCTGAAATAACCTTCTTAGTTGTCGAAGCAAAAAAATCAAATTTGGAACCGAAGCCGACG GCGCTGATATTTTTAGAAGCCTAAcatttgcttgccctgccggccagtgacccattggaacgacaaaaactcgagtt cctttacgaatggggcatcgatttaacagagacgccaaaaccaatgccaacaccaataccgctaacgaaagccaaga atccgccgccactgccccatgagttgcacaacaacatcaacagccagtatggtagcagtgcggctctcagcaatcat caacctcatcagcatacgcatccacatccccagcagcagcagcagcaacaacaacattcaaacacaaaaacgcccaa cacgaacagcaacaaaacacaaggaacaccaacaacgggaacgggagctgcaacaactggcagcaaacaacaacagc aaccgggaaacaccaccaacacaccaacgaaggcgtcgcgtgaggcgactcccacaagggagcagcatc translation M K E E G G T L L G D K G V R R H Q \----------------0------- 
\------------------------0-------------------------------0-------------------- \-----------0-------------------------------0-------------------------------0- \------------------------------0-------------------------------0-------------- \-----------------0-------------------------------0--------------------------- \----0-------------------------------0-------------------------------0-------- \-----------------------0-------------------------------0--------------------- \----------0-------------------------------0-------------------------------0-- \-----------------------------0-------------------------------0--------------- \--------------- S M Q R L S A E Q N G G S T T E Q T H E H N P N V V P D H R G N L H I T V K K T K P I L G I A I E G G A N T K H P L P R I I N I H \--------------------- \-------0---------------------------- E N G A A F E A G G L E V G Q L I L E V D G T K V E G L H H Q \------------------------0 \--------------------------------------- E V A R L I A E C F A N R E K A E I T F L V V E A K K S N L E P K P T A L I F L E A Z This is highly speculative, and there's plenty of room for more 5' Drosophila MKEEGGTLLGDKGVRRHQSMQRLSAEQNGGSTTEQTHEHNPNVVPDHRGNLHITVKKTKPILGIAIEGGANTKHPLP RIINIHENGAAFEAGGLEVGQLILEVDGTKVEGLHHQEVARLIAECFANREKAEITFLVVEAKKSNLEPKPTALIFL EAZ Honey bee from below MPRTRAGCRRTPQHQVTTSDLALSNDDECDNQDYEDELENGRGRRHSSPGGSRGNPRDYGHHHLHPDNLELAHKLS RSFDMQEAQLLEEGGSGAATAGAGGAGGPPRRHHSAQRLARSEISERREEDGALVVPDHQGNLRITVKKTKSILGIA IEGGANTKHPLPRIINIHDNGAAYEAGGLEVGQLILEVDGHKVEGLHHQEVARLIAESFARRDRNEIEFLVVEAKKS NLEPKPTALIFLEA \****************extras \- the first 90kb of AE003536.2 above is unannotated \- but some possible ORFS in it, so try BLASTX searches, and find NOTHING! Trul y 90kb of nothing! \****************A. 
mellifera BB260020B10C4.F GGCATCCAGGATAGAGCAGCGACGATATCGC AATCGAGAGAGTGCAATCGATGAATGTCATCAAGAAGAATGACCAACTGTTCCTTTATAGGTTCCTCCTCGATCCGT CTCATAATTTGCCCCAGCCAATTGTTCAAATAAAGAGGATCAAACGAGGCGTCTTTGGGAAGATTTCCATCTAACGA GCCGGAAAGGAAGCCGACGTGTTGACAGAGTACGCGAAGCAGTTCGAGACTATACGCTGATCGAGGTGTAGACGCAC AGAGGCGAACAATCCTGAATGTGGGACAATCCAACCAGGTGGAAGCATCCTCGTAAATTCTTGACAGCAACGATGAT TTCCCGGAATATTTACCGCCTTTTATCAATATGGGGCCATGCTTCTGTGATTTACCCGATATTAAGATCTCCCTGAT CCTTTCAATATTCTCGTTGTTCTCGCAGGGTTTAATCTCCCTAAGAAGCGCCAGATGTACCAGACTCTCCGCATAGA TTTCCTGCACCATCTTCTTCTTGCTCTTGATTTCTGGATCCGCTTCTATAGATGCGTTCACCAAATTCTGCACGCGA TTCACCACCAAGCGACAAAGATGAGCCAGGTAGCCTTTATGGGATCTATTATCGGGATTAATGCCGCCTTGCTTCTG ATCCACATCGATGTTAATTACATTGTCGGATGGGAGACGGTTGTCCACCAAAGACTTCAACGAGGCAACTGCTCTTG TGCTCTTTTTGCCTTTCCATGTGCGTTTCACGGCTATTATGCCGTCACCTGTTTCG full ORF on RC encodes 260aa 13% leucine protein; only very weak BLASTX matches to Drosophila proteins; nothing more in nr; genomic match is near N-terminus of 1240aa protein BcDNA:GH04922 gene product; with two ESTs, so could be alternative splice? In fact end of this translation overlaps N-terminus of this protein, so is probably even longer. Also, weakly matches C. elegans T05C3.2, which is 2340aa, from 900-1150, so Drosophila protein could be a lot longer. In fact, connects CG17671 to BcDNA:GH04922, but the two Drosophila ESTs are only for the start of the latter. I will not try to figure it all out. \*****************A. 
mellifera BB260022B10E6.F ATGCCCAGGACCAGAGCTGGGTGCAGGAGG ACGCCGCAGCACCAAGTGACCACCTCCGATCTCGCCCTCTCCAACGACGACGAGTGCGACAACCAGGATTACGAGGA CGAGTTGGAGAACGGGCGTGGACGTAGGCACAGCTCGCCAGGTGGCTCGAGGGGGAACCCCAGGGATTACGGCCATC ATCATCTTCATCCGGACAATTTGGAACTCGCGCACAAGCTCTCGCGGAGCTTCGACATGCAGGAGGCTCAGCTCCTC GAGGAGGGTGGCTCTGGTGCCGCTACGGCGGGTGCTGGGGGCGCGGGTGGCCCGCCACGAAGACATCACAGCGCTCA GAGATTGGCCAGGAGCGAAATATCTGAGAGAAGGGAAGAGGACGGAGCGTTGGTGGTACCCGATCACCAGGGCAACC TGAGGATCACCGTGAAGAAGACCAAGTCGATTTTGGGCATTGCCATCGAGGGCGGTGCCAATACCAAACATCCACTG CCCAGGATCATCAATATACACGATAATGGGGCAGCTTACGAGGCAGGTGGCCTCGAGGTCGGTCAACTGATCCTAGA AGTCGATGGACACAAAGTGGAAGGTCTGCATCATCAGGAGGTGGCAAGACTGATCGCAGAGTCGTTCGCGAGGCGCG ATCGCAACGAGATCGAGTTCCTGGTGGTCGAGGCGAAAAAAAGCAACCTGGAGCCGAAGCCGACAGCGCTGATATTC CTGGAGGCGTAGCAGGAATCTCCCACCTCCGAGCCGGAACCACCGTAGCCCAACCTGACCAGAGCTCGGCCCGCGAC ACACTTGGAGCGTGTCGACGCTGAACCGACGTCATGAGAAG long ORF encodes G/A rich protein with only weak BLASTX match to several Drosophila proteins; 46% e-20 match to KIAA1526 protein <up>Homo sapiens</up>; genomic match is 85% to at least three exons in unannotated region; NEW GENE; no ESTs This protein has a PDZ domain, so could be same as BB260019B20F2.F? Indeed it is. But differ in the 5' regions, which is probably an intron in that one. 
>BB260019B20F2.F 766 0 766 ABI GGATCGTTATCCTTTCAATGACGAATAA TTATTCTAAAATCGTATCAAAATTACTAACGAGTCAAAAATTTTACCTGTGCTCGAGAAACGCGCGTAGTATAAATA AAATTGGACGGATATCCTTTCAATGACGAATAATTATTCTAAAATCAAAATTAACGAGTCAAAAATAAGCGTTGAAA GAGAAAGAAAATTTTAAGAGGAGAATGTAAATGGAATCGATAGATGACGGTGGCTGTCACGATGCTGCAAGGTCGAC CGGAAGCGAGACGGGGTGTTCAATTTCCGCCGAATGAAACGAAGCTGTGCCCGTTTCAGGAACTCGCGCACAAGCTC TCGCGGAGCTTCGACATGCAGGAGGCTCAGCTCCTCGAGGAGGGTGGCTCTGGTGCCGCTACGGCGGGTGCTGGGGG CGCGGGTGGCCCGCCACGAAGACATCACAGCGCTCAGAGATTGGCCAGGAGCGAAATATCTGAGAGAAGGGAAGAGG ACGGAGCGTTGGTGGTACCCGATCACCAGGGCAACCTGAGGATCACCGTGAAGAAGACCAAGTCGATTTTGGGCATT GCCATCGAGGGCGGTGCCAATACCAAACATCCACTGCCCAGGATCATCAATATACACGATAATGGGGCAGCTTACGA GGCAGGTGGCCTCGAGGTCGGTCAACTGATCCTAGAAGTCGATGGACACAAAGTGGAAGGTCTGCATCATCAGGAGG TGGCAAGACTGATCGCAGAGTCGTTCGCGAGGCGCGATCGCAACG ATGCCCAGGACCAGAGCTGGGTGCAGGAGGACGCCGCAGCACCAAGTGACCACCTCCGATCTCGCCCTCTCCAACG ACGACGAGTGCGACAACCAGGATTACGAGGACGAGTTGGAGAACGGGCGTGGACGTAGGCACAGCTCGCCAGGTGGC TCGAGGGGGAACCCCAGGGATTACGGCCATCATCATCTTCATCCGGACAATTTG----------------------- \----------------------------------------------------------------------------- \-----------------------------------------------------------GAACTCGCGCACAAGCTC TCGCGGAGCTTCGACATGCAGGAGGCTCAGCTCCTCGAGGAGGGTGGCTCTGGTGCCGCTACGGCGGGTGCTGGGGG CGCGGGTGGCCCGCCACGAAGACATCACAGCGCTCAGAGATTGGCCAGGAGCGAAATATCTGAGAGAAGGGAAGAGG ACGGAGCGTTGGTGGTACCCGATCACCAGGGCAACCTGAGGATCACCGTGAAGAAGACCAAGTCGATTTTGGGCATT GCCATCGAGGGCGGTGCCAATACCAAACATCCACTGCCCAGGATCATCAATATACACGATAATGGGGCAGCTTACGA GGCAGGTGGCCTCGAGGTCGGTCAACTGATCCTAGAAGTCGATGGACACAAAGTGGAAGGTCTGCATCATCAGGAGG TGGCAAGACTGATCGCAGAGTCGTTCGCGAGGCGCGATCGCAACGAGATCGAGTTCCTGGTGGTCGAGGCGAAAAAA AGCAACCTGGAGCCGAAGCCGACAGCGCTGATATTCCTGGAGGCGTAGCAGGAATCTCCCACCTCCGAGCCGGAACC ACCGTAGCCCAACCTGACCAGAGCTCGGCCCGCGACACACTTGGAGCGTGTCGACGCTGAACCGACGTCATGAGAAG See above file. The N-terminus encoded by the 5' part of this contig does not have a Drosophila match, but rest is clearly a good match, so unclear which one is right or wrong. 
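A quick way to see where two reads like these stop agreeing is to look for their longest shared block. The sketch below is only an illustration in Python using the standard-library difflib; the function name and the short toy sequences are assumptions made for the example (they are not the full ESTs), and this is not the method that produced the alignment above.

```python
from difflib import SequenceMatcher


def longest_shared_block(seq_a, seq_b):
    """Return (start_in_a, start_in_b, length) of the longest exact shared block.

    Hypothetical helper for illustration: it finds one exact common substring
    and does not tolerate mismatches or sequencing errors.
    """
    # autojunk=False stops difflib from ignoring very frequent characters,
    # which matters for a four-letter nucleotide alphabet.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    match = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
    return match.a, match.b, match.size


if __name__ == "__main__":
    # Toy reads: different 5' ends, identical 3' ends.
    read_one = "TTTTGAACTCGCGCACAAGC"
    read_two = "ATGCCCGAACTCGCGCACAAGC"
    start_one, start_two, length = longest_shared_block(read_one, read_two)
    print(start_one, start_two, length)
```

Run on the full BB260019B20F2.F and BB260022B10E6.F reads, the two start offsets would mark where the divergent 5' regions end and the common sequence begins.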
\******************A. mellifera BB260023A20H5.F TGTACGCGCATGTGACAAATAGAAAAGAA TTTATTTCTTATTTTTTTTATATTTTTTTTTTCTTACAATGAATCAATATAATATAATATCTATATAAAAATTTATT AAAAGATGGAATTCATTAATTTTAAAATATATACGATAGTGATTATAAGCCTCTATTGTTTGGGCGTTGTCGGACAA TATGAGTGGCAAGCTAGAGATGCTTTTGATGAAATCCGTTTAAAAATGGATAAGATTAATGAAGAGAATTGTCCTAT TCAACATATCGGAGACCTCTATCTACCGGAAGATACAGTCTCTCATTTACCTGACATTAAAGATATTAATATCAATC CTGTATTTCCAAATAGAACTGCTTTACTTCATCTTCATAATATGGCTCTTAGTAGATCATTTTTTTGGAGTTATATT TTACAATCTCGATTCATACGTCCAGCTATCAATGATACTTATGATCCAGGCATGATGTATTATTTTTTATCAACAGT TGCTGATGTTTCTGCAAATTCACATATAAATGCTTCTGCTATATATTTCTCACCAAATATGTCTTATTCTTCATCAT ATAGAGGTTTTTTTAATAAAACTATGCCCAGATTTGCTCCGAGAACTTTTAGAGCTGATGATNTTAATGATCCTATA CATTTAGAAAGAATATCCACAAGAAATACATTTAATGTGCAAGATCTTGGGGCATTTGCCAGTGGTAGTCTTGGTGA AGATTATACAACAGATTATTATCGTATAAATGAA Long complete ORF encodes secreted 180aa 12% isoleucine protein; genomic match is 80% full-length e-108 to single exon is unannotated region; NEW INSECT GENE; two ESTs From 6-42kb is empty in this accession! Check for other genes too. 
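Composition figures like "12% isoleucine" can be rechecked from any translation in these notes. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, a trailing Z or * is treated as the stop symbol (as the translations in this file appear to use Z), and spaces and gap dashes are ignored. The example string is just the first 22 residues of the honey bee translation, not the whole protein.

```python
from collections import Counter


def percent_composition(protein, residues):
    """Percentage of the protein made up of the given one-letter residues.

    Assumptions for this sketch: spaces and '-' are layout, and a trailing
    'Z' or '*' marks the stop codon rather than a real residue.
    """
    seq = protein.replace(" ", "").replace("-", "").upper().rstrip("Z*")
    if not seq:
        raise ValueError("empty protein sequence")
    counts = Counter(seq)
    wanted = sum(counts[r] for r in set(residues.upper()))
    return 100.0 * wanted / len(seq)


if __name__ == "__main__":
    # First 22 residues of the honey bee translation (illustrative only).
    n_terminus = "MEFINFKIYTIVIISLYCLGVV"
    print(round(percent_composition(n_terminus, "I"), 1))
```

Passing two letters, for example "GA", gives the combined share of both residues, which is one way to read a figure such as "11% G/A".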
AE003732.2 RC ATATATAGAAGCAATAGCATTGAAATAGCAACATTCATTTTTTGTATTAATGTT CGATATCAGCCAGCAACCAATATAAACAATATATCGATACTATCGGACAGGTGGCCCATCTCTGTTTGAGGGCGTAG TTCCAACAAGTGCTGAGCATCACAATTTTCTATTACTAAGCCCAGCTTTGCGTTGGCGCGCCCCAGAATCTCATTTT ATATTTAGTTTCTGCCAGTTTAGTTAATTAGTTAGTTGATAGTGTTGTTTGTTTCTTCTGCAACAATTGTGTGCGAT AGGAGTCGGGCAAAATGTTCCCGTCGTCGATTTTGGGGCGCAGCTATTTGCTTTTTATGCTGGTGCTCGCCGTGGGC GTGTTCGCCCAACACGAGTGGCAGGCCCGGGATGCCTTTGATGAGATAAAGAGGCAGTTCGACAAGGTGAACGCGGA TAACTGCCCCATCCAACACCATTCGGACCTTTTCATGCCCATGGACGCGGTGTCCCACAAGCCGGACATCAAGGAGA TCAACGTGAATCCGGTGTTCCCCAACCGAACTGCCCTGCTGCATCTGCAGAATATGGCCCTTAGCAGAAGCTTCTTC TGGAGCTACATCCTCCAGTCGAGGTTTATTCGACCCGCCATCAACGACACCTACGATCCCGGCATGATGTACTACTT TCTGTCCACCGTAGCCGATGTATCCGCCAACCCACATATCAACGCCTCGGCCGTGTACTTCTCCCCCAACAGCTCGT ATTCGTCGTCGTATCGCGGCTTCTTCAATAAGACGTTCCCCAGATTCGGGCCAAGAACCTTCAGGCTGGACGACTTC AACGATCCCATTCATCTGCAGAAGATATCGACGTGGAATACTTTCGATGTTCAGGATCTGGGCGCCCATCACCCGGA CTCCATATCCAAGGACTACACCCACGACCTGTATAAAATAAACGAGTGGTACCGCGCCTGGCTACCAGACAACGTCG AGGGACGGCACGATACGAAGATCACCTACCAGGTGGAAATCCGCTATGCGAACAACACAAACGAGACGTATACCTTC CACGGACCGCCTGGCTCTGAAGAAAACCCTGGTCCGATTAAATTTACAAGGCCGTACTTCGATTGTGGCAGGTCCAA CAAGTGGCTGGTGGCCGCAGTAGTGCCAATTGCGGATATCTACCCCCGACACACGCAGTTCCGTCACATTGAGTATC CCAAgtaagataccttgaatatcccctgaataccctccttttatctactgtatcgcttttagATACACGGCCGTTTC GGTTCTTGAGATGGACTTCGAGCGTATCGACATAAACCAGTGTCCATTGGGTGAAGGCAACAAAGGACCTAATCACT TTGCGGATACGGCGCGGTGTAAAAAAGAAACGACAGAGTGTGAACCATTACAAGGCTGGGGCTTTAGGCGCGGTGGC TACCAGTGCCGTTGTAAGCCAGGTTTTCGGCTGCCCAACGTAGTGCGGCGACCTTATCTGGGCGAGATTGTGGAGCG CGCATCGGCAGAACAGTACTACAACGAGTACGACTGCCTTAAGATTGGCTgtatgttttaagtagcaatatgtaaaa gtatgagatttgactcttgatgtttttttttagGGATCCAAAAGCTTCCCATTCAGTGGGATAAGGCCTCCTACCAC ATTCGCCAAAAGTATCTGGACCGGCATCCGGAATATCGCAACTACACCACCGGCTCGCGATCACTTCATGCTGAGCA CTTAAATATTGATCAGGCGTTGAAGTATATTCATGGAGTCAACTATCGCACTTGCAAAAAgtaagacacatacaaaa cttatccagccaaggtcacttcaataaactgatcaattatgctatcgccttgacagCTTCCATCCGCAGGATCTGAT 
TCTTCGCGGTGATGTGAGCTTCGGCGCCAAGGAGCAGTTCGAGAACGAAGCCAAGATGGCCGTGAGACTGGCCAACT TTATTAGCGCCTTTCTGCAGgtaagcaaacgattcagagcaaaggattcccatcgccttcacgctaaatgaagagca ataattgataacccgacacctattaagagccttcgacgacggctcttgaaaacttctcaagtgtaaattataatttt ccacgcgtaattcaacttcctcgaatttcctgcattgccagtttctcggttcttaccgatgctgct LD18575.5prime CTCCGGTTTGAGGGCGTAG TTCCAACAAGTGCTGAGCATCACAATTTTCTATTACTAAGCCCAGCTTTGCGTTGGCGGGCCCAAGAATCTCATTTT ATATTAAGTTTCTGCAAGTTTAGTTAATTAGTTAGTTGATAGTGTTGTTTGTTTCTTCTGCAACAATTGTGTGCGAT AGGAGTCGGGCAAAATGTTGCCGTCGTCGATTTTGGGGCGCAG-TATTTGCTTTTTATGCTGGTGCTCGCCGTGGGG CTGTTCGCCATACACGAGTGGCAGGCCCGGGATGCCTTTGATGAGATAAAGAGGCAGTTCGACAAGGTGAACGCGGA TAACTGCCCCATCCAACACCATTCGGACCTTTTCATGCCCATGGACGCGGTGTCCCACAAGCCGGACATCAAGGAGA TCAACGTGAATCCGGTGTTCCCCAACCGAACTGCCCTGCTGCATCTGCAGAATAT LD21417.5prime CTAAGCT---NTNTGCGTTGGCGCGCCCCAGAATCTCATTTT ATATTTAGTTTCTGCCAGTTTAGTTAATTAGTTAGTTGATAGTGTTGTTTGTTTCTTCTGCAACAATTGTGTGCGAT AGGAGTCGGGCAAAATGTTCCCGTCGTCGATTTTGGGGCGCA-CTATTTGCTTTTTATGCTGGTGCTCGCCGTGGGC \-TGTTCGCC-AACACGAGTGGCAGGCCCGGGATGCCTTTGATGAGATAAAGAGGCAGTTCGACAAGGTGAACGCGGA TAACTGCCCCATCCAACACCATTCGGACCTTTTCATGCCCATGGACGCGGTGTCCCACAAGCCGGACATCAAGGAGA TCAACGTGAATCCGGTGTTCCCCAACCGAACTGCCCTGCTGCATCTGCAGAATATGGCCCTTAGCAGAAGCTTCTTC TGGAGCTACATCCTCCAGTCGAGGTTTATTCGACCCGCCATCAACGACACCTACGA LD13768.5prime CGCCCATCACCCGGA CTCCATATCCAAGGACTACACCCACGACCTGTATAAAATAAACGAGTGGTACCGCGCCTGGCTACCAGACAACGTCG AGGGACGGCACGATACGAAGATCACCTACCAGGTGGAAATCCGCTATGCGAACAACACAAACGAGACGTATACCTTC CACGGACCGCCTGGCTCTGAAGAAAACCCTGGTCCGATTAAATTTACAAGGCCGTACTTCGATTGTGGCAGGTCCAA CAAGTGGCTGGTGGCCGCAGTAGTGCCAATTGCGGATATCTACCCCCGACACACGCAGTTCCGTCACATTGAGTATC CCAA----------------------------------------------------------ATACACGGCCGTTTC GGTTCTTGAGATGGACTTCGAGCGTATCGACATAAACCAGTGTCCATTGGGTGAAGGCAACAAAGGACCTAATCACT TTGCGGATACGGCGCGGTGTAAAAAAGAAACGACAGAGTGTGAACCATTACAAGGCTGGGGCTTTAGGCGCGGTGGC TACCAGTGCCGTTGTAAGCCAGGTTTTCGGCTGCCCAACGTAGTGCGGCGACCTTATCTGGGCGAGATTGTGGAGCG 
CCCATCGGCAGAACAGTACTACAACGAGTACGACTGCCCTAAGATTGGCT--------------------------- \---------------------------------GGAT LD16802.5prime CGCCCATCACCCGGA CTCCATATCCAAGGACTACACCCACGACCTGTATAAAATAAACGAGTGGTACCGCGCCTGGCTACCAGACAACGTCG AGGGACGGCACGATACGAAGATCACCTACCAGGTGGAAATCCGCTATG-GAACAACACAAACGAGACGTATACCTTC CACGGACCGCCTGGCTCTGAAGAAAACCCTGGTCCGATTAAATTTACAAGGCCGTACTTCGATTGTGGCAGGTCCAA CAAGTGGCTGGTGGCCGCAGTAGTGCCAATTGCGGATATCTACCCCCGACACACGCAGTTCCGTCACATTGAGTATC CCAA----------------------------------------------------------ATACACGGCCGTTTC GGTTCTTGAGATGGACTTCGAGCGTATCGACATAAACCAGTGTCCATTGGGTGAAGGCAACAAAGGACCTAATCACT TTGCGGATACGGCGCGGTGTAAAAAAGAAACGACAGAGTGTGA HL02444.5prime CK00408.5prime GH23994.5prime Translation M F P S S I L G R S Y L L F M L V L A V G V F A Q H E W Q A R D A F D E I K R Q F D K V N A D N C P I Q H H S D L F M P M D A V S H K P D I K E I N V N P V F P N R T A L L H L Q N M A L S R S F F W S Y I L Q S R F I R P A I N D T Y D P G M M Y Y F L S T V A D V S A N P H I N A S A V Y F S P N S S Y S S S Y R G F F N K T F P R F G P R T F R L D D F N D P I H L Q K I S T W N T F D V Q D L G A H H P D S I S K D Y T H D L Y K I N E W Y R A W L P D N V E G R H D T K I T Y Q V E I R Y A N N T N E T Y T F H G P P G S E E N P G P I K F T R P Y F D C G R S N K W L V A A V V P I A D I Y P R H T Q F R H I E Y P K----------------------------------2----------------------- Y T A V S V L E M D F E R I D I N Q C P L G E G N K G P N H F A D T A R C K K E T T E C E P L Q G W G F R R G G Y Q C R C K P G F R L P N V V R R P Y L G E I V E R A S A E Q Y Y N E Y D C L K I G \--------------------------- \-------1-------------------------W I Q K L P I Q W D K A S Y H I R Q K Y L D R H P E Y R N Y T T G S R S L H A E H L N I D Q A L K Y I H G V N Y R T C K N----------------- \---------------------2---------------------------------- F H P Q D L I L R G D V S F G A K E Q F E N E A K M A V R L A N F I S A F L Q \-----------------------0--------------------------------- 
\---------0------------------------------------------0------------------------ \------------------0------------------------------------------0---- fly MFPSSILGRSYLLFMLVLAVGVFAQHEWQARDAFDEIKRQFDKVNADNCPIQHHSDLFMPMDAVSHKPDIKEINVNP VFPNRTALLHLQNMALSRSFFWSYILQSRFIRPAINDTYDPGMMYYFLSTVADVSANPHINASAVYFSPNSSYSSSY RGFFNKTFPRFGPRTFRLDDFNDPIHLQKISTWNTFDVQDLGAHHPDSISKDYTHDLYKINEWYRAWLPDNVEGRHD TKITYQVEIRYANNTNETYTFHGPPGSEENPGPIKFTRPYFDCGRSNKWLVAAVVPIADIYPRHTQFRHIEYPKYTA VSVLEMDFERIDINQCPLGEGNKGPNHFADTARCKKETTECEPLQGWGFRRGGYQCRCKPGFRLPNVVRRPYLGEIV ERASAEQYYNEYDCLKIGWIQKLPIQWDKASYHIRQKYLDRHPEYRNYTTGSRSLHAEHLNIDQALKYIHGVNYRTC KNFHPQDLILRGDVSFGAKEQFENEAKMAVRLANFISAFLQSMQTITRISSLQVSDPNEVYSGKRVADKPLTEDQMI GETLAIVLGDSKVWSATMLWERNKFTNRTYFAPYAYKTELNTRKFKVEDLARLNKTHELYTEKKYFKFLKQRWNTNF DDLETFYMKIKIRHNETGEYQQKYEHYPNSYRAANIKHGYWTQPQFDCDGYVKKWLVTYAVPFFGWDSLKVKLEFKG VVAVSMDMLQLDINQCPDWYYEPNAFKNTHKCDEQSSYCVPIMGRGYETGGYKCECLQGYEYPFEDLITYYDGQLVE AEYQNIVADVETRYDMFKCRLAGASGLQSALGLVVALIGLTLTLLYRFS honeybee MEFINFKIYTIVIISLYCLGVVGQYEWQARDAFDEIRLKMDKINEENCPIQHIGDLYLPEDTVSHLPDIKDININPV FPNRTALLHLHNMALSRSFFWSYILQSRFIRPAINDTYDPGMMYYFLSTVADVSANSHINASAIYFSPNMSYSSSYR GFFNKTMPRFAPRTFRADDLMILYI-RKNIHRNTFNVQDLGAFASGSLGEDYTTDYYRINE Amazing thing is that I can continue this gene for 3500bp with perfectly predicted introns and lovely exons, encoding over 800aa! Possible TM domain at end, if real. The only similar protein in NR is CG18679, which is 179aa and has a 5C5G region that matches twice in this one, presumably a extra-cellular disulfide-bonded domain. Also TBLASTN matches in 75-80% range for an A. gambiae and B. mori EST each. So is insect specific \- yet quite conserved! And now find multiple ESTs confirming beginning and end and several introns. 
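To put a number on how well the fly and honey bee translations agree, one can compute percent identity over an aligned stretch. This is a rough Python sketch, not the BLAST calculation: the function name is hypothetical, the two inputs must already be aligned to the same length with '-' for gaps, and gapped columns are simply skipped. The two segments are a short ungapped stretch copied from the fly and honeybee translations above.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Short ungapped stretch near the N-terminus of each translation.
FLY_SEGMENT = "QHEWQARDAFDEIKRQFDKVNADNCPIQH"
BEE_SEGMENT = "QYEWQARDAFDEIRLKMDKINEENCPIQH"

if __name__ == "__main__":
    print(round(percent_identity(FLY_SEGMENT, BEE_SEGMENT), 1))
```

A figure from such a short window will not match the full-length BLAST percentages quoted in these notes; it is only meant as a sanity check on a chosen region.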
At end, seems there may be another gene further on this strand, with some ORFs and good intron splices; But there are ESTs for genes on opposite strand, although could be within intron of this gene (other was around, see below) The honeybee EST translation could also be continued in alignment with two frameshifts, so indeed is simply N-terminus of this long gene. \****************A. mellifera The remaining 3kb at the end of this gene before the next annotation is strange. As noted above, one might be able to construct a gene going in the same RC direction as our new one, but there are about 30 Drosophila ESTs from all tissues to a single exon on the forward strand, but it is a region full of stop codons! Could it be an interesting RNA-coding gene \- No, is simply a spliced transcript for start of the next gene! These are some of these cDNAs, the AT ones are the new adult testes ESTs, with linkers attached! genomic, now forward caggaaggaacatttcagtattacaacatcaaccattctgaaattgttaaaa ttctaaaaggataaaaaaaatcatagtccaaattggaaattattcttgatatttcgtggatagaaagccgattgtga gccgttgaatagcgcgaacctattcaagacgagccaagcgatcgagttatcgcgaatatatataagatactaatact attggaggagaatttacgccgctcgacgattagacgggcgacgtgaatcgttttggagttttcaagaccttttgtaa tttgttttgttctctctaaagtatcacaaattgtgatatcatttagcacttttataatttctggaaaattcaagcaa cggattttgatctttgacctgtgcccttcgattgtaatacattcaaattgtaaagcgtgaagaaaacccacatattg acaaggatcagttcttttggaagcaccgaaaactaacgtctcaactaacgtcagaaacactcgcatgcaaaatgaat aagtttggtaagtcaatggggtttttactacaaaatcatatatttgaacattgtatatatggatggattgctttaaa attataagacacttaaattctagacattagactactcaagcaggttgtaaaagttattcgattttgtctcaaggcac ttatcactaattcagatatgtagatacataatacaaagtagactactcattggacggatgtttttgatatagtctgt tgtgttacaaatattcagtttaaggcacaatttacacattcgatttcttctcattgctttgtactgatctacaaata aattgcggaatgttcaggggggcaagacttccagaaacaaaaccaaaagcggacacggccagccactcgaacgtgtt ttagcagacgcgggcatttttccaaatggaaatggaggagttgcccgaatgctgagacagttactgcccaccgctgc 
tgccctgaaatgactatgaaaaatgtgtgaaaaagattttttctgcccctgtccagctacctatggctataaagttt taatgaaaaaagctggagatttctttttgccctggtagaagaccaagtggctgctaaactggttcctgcagcgcata gaaaaagttctcagggtacagataataaaaattcaagcgcatatgataatcaaggcgcaaaaaacaaaaagtggaaa aaacgccgcagcggcagcagcatcgacatacatatttaattcagcaaaaaaaatcgcagccaacagaccatcgacga tttaatataagaaaaatacgacggcaggcgttggattttttgtggcatccgttggtcggaaaaaaggtgtgtgtgcg ggcacaaacaaccctcagctaggacctggacgacctccccgatgggtgtaggtacgcctgggcttaactgggttccg atgttaacaggtttgcgatcgccgcacatacgcacacacagctgtgcgacttcacggacattagagaggaaggatct tccgaagaagaaaaatacgagctacacggcatttccgtaatctgagcgcagtaggcgcggctgtttgcgcttttctg acgatgctgctgcttttgcttcggctgctgctgctgctgcttggtcttctgccgcttttgatagaaatgacaaataa ccagattcattgtaatagattatgtctacaacttaatcgccttgcagat AT25734.5prime GGCACGAGG-CAGGAAGGAACATTTCAGTATTACAACATCAACCATTCTGAAATTGTTAAAA TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCGGAAACACTCGCATGCAAAATGAAT AAGTTTG------------------------TTTCCCACAACCTTGGTCGCGATCCATATCGGACCTTCAGCCCAGA AATGTACCCGTTATCAAGCCCATTGGGACCGCATGGAACTGAAATGGCGGAAGGTAATGGCGAACTGTTGGATGACA TTAACCAGAAAGCCGATGACCGTGGCGATGGCGAGCGTACAGAGGATTATCCCAAGCTGCTGGAATACGGTCTGGAC AAGAAGGTCGCCGGCAAACTGGATGAGATCTAC AT04521.5prime GGCACGAGG--------------------ATTACAACATCAACCATTCTGAAATTGTTAAAA TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA 
CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCAGAAACACTCGCATGCAAAATGAAT AAGTTTGTTTCCCACAACCTTGGTCGCGATCCATATCGGACCTTCAGCCCAGAAATGTACCCGTTATCAAGCCCATT GGGACCGCATGGAACTGAAATGGCGGAAGGTAATGGCGAACTGTTGGATGACATTAACCAGAAAGCCGATGACCGTG GCGATGGCGAGCGTACAGAGGATTATCCCAAGCTGCTGGAATACGGTCTGGACAAG LP04990.5prime ATCAACCATTCTGAAATTGTTAAAA TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCAGAAACACTCGCATGCAAAATGAAT AAGTTTG GH18064.5prime CAACCATTCTGAAATTGTTAAAA TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCAGAAACACTCGCATGCAAAATGAAT AAGTTTGTTTCCCACAACCTTGGTCGCGATCCATATCGGACCTTCAGCCCAGAAATGTACCCGTTATCAAGCCCATT GGGACCGCATGGAACTGAAATGGCGGAAGGTAATGGCGAACTGTTGGATGACATTTACCAGAAAGCCGATGACCGT AT16285.5prime GGCACGAGG-----------------------------------ATTCTGAAATTGTTAAAA TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA 
CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCAGAAACACTCGCATGCAAAATGAAT AAGTTTGTTTCCCACAACCTTGGTCGCGATCCATATCGGACCT So, figured it out, these start before our gene, and jump over it into the start of CG17838, so ours is in the first long 5'UTR intron of CG17838! Has tons of ESTs overlapping, so this annotation needs fixing. So I scan this 30kb region from our gene to CG17838 for anything else, beside one short exon in the middle that belongs to CG17838, and find some ESTs, but none coding. I can't believe there are not genes in here. There are several 500bp ORFs! See below for one! \***************extra2 GOOD GRIEF! in this 30kb region BLASTX reveals about in the middle of it a relative of the 300aa N-terminus of TGF-beta activated-kinase 1 homolog <up>Drosophila or CG18492 AE003732.2 aaataaaaattaatttgatttgcggggaagtcacaaagaataatattaagca tttttgataagttggaattgggtgaatgacggaattatttcttatcgcgtaccgataaggtggttttatctcactta actagcacttaatcacaactttcattgcaattcagttcacaaattgcacttcaaagcgatcgcacgtattttgctag agATGGTCAAGCAAGTGGATTTTGCGGAGGTGAAGCTCAGTGAGgtaggttttacttgaaatattgttaaggattca atgagacccccttttatgcttagAAATTTCTCGGAGCTGGATCTGGTGGAGCGGTGCGCAAAGCCACCTTTCAAAAT CAGGAGATTGCAGTAAAGATATTTGATTTCCTTGAGGAAACAATCAAAAAGAATGCAGAGAGGGAAATCACACATTT GTCGGAGATCGACCACGAAAACGTTATCAGGGTGATCGGGAGGGCCAGCAATGGAAAGAAGGACTACTTGTTGATGG AGTACCTGGAGGAGGGGTCCCTCCACAACTACCTCTATGGCGATGACAAGTGGGAGTACACCGTGGAGCAAGCGGTT CGCTGGGCACTCCAATGCGCCAAGgtaaagtgcaagatcgcctttccccacaatcagatacattttcggtgttttag GCCTTAGCATACTTGCATTCGTTGGATCGACCGATTGTTCACCGCGATATTAAGCCGCAAAACATGCTTTTATATAA TCAGCATGAAGACTTAAAGATTTGTGACTTTGGCCTGGCGACGGATATGTCCAATAATAAGACCGATATGCAAGGAA CATTGAGGTATATGGCTCCCGAGGCCATTAAGCACTTAAAGTATACGGCTAAGTGTGATGTGTACAGCTTTGGAATA ATGCTCTGGGAGCTGATGACACGTCAATTGCCATATAGTCACTTGGAAAACCCCAACAGCCAGTACGCCATTATGAA AGCTATCAGTTCAGgtaattattattatatacttattttaaatcataattccttaaatttacgttaattattataaa 
gaattattatattgtaagtatactccgtcccgatgaacttgagatctcggtacccaagagctgaaaggtgatatgca gattcctatgtcgtgcaagtttgtttgcgacgttcattataaaaatgtgtcaaaaaatattcgattttagaatagga atgatatttcttcaaatatagaaatatcaaattactaattttatttaaataaatttctgtttaatttcttttaaata tatatatatatacatatacatgaatacatatacatatgaatattttagGCGAAAAACTTCCAATGGAAGCAGTAAGA TCCGATTGCCCAGAGGGTATCAAGCAATTAATGGAATGTTGCATGGATATAAATCCCGAAAAGCGCCCCTCTATGAA GGAGATCGAAAAGTTCCTTGGCGAACAGTATGAATCCGGCACTGACGAGGACTTTATCAAGCCTTTGGATGAGGATA CCGTGGCTGTGGTGACCTACCATGTGGATTCGTCCGGCAGCAGGATAATGCGTGTTGATTTCTGGCGACATCAGTTG CCATCGATCCGCATGACTTTTCCGATAGTGAAACGGGAAGCCGAAAGATTGGGAAAGACCGTTGTCAGAGAAATGGC CAAGGCGGCGGCGGATGGAGATCGGGAAGTTCGGCGGGCTGAGAAGGACACGGAGCGTGAAACCTCGAGGGCTGCCC ACAATGGAGAGCGGGAAACGCGGAGAGCGGGTCAGGATGTGGGTCGTGAAACTGTACGGGCGGTCAAGAAAATAGGA AAGAAACTGCGCTTCTAACCAGAAATA translation M V K Q V D F A E V K L S E \------------------------------0----- \-------------------- K F L G A G S G G A V R K A T F Q N Q E I A V K I F D F L E E T I K K N A E R E I T H L S E I D H E N V I R V I G R A S N G K K D Y L L M E Y L E E G S L H N Y L Y G D D K W E Y T V E Q A V R W A L Q C A K \---------------------------------0------------------- A L A Y L H S L D R P I V H R D I K P Q N M L L Y N Q H E D L K I C D F G L A T D M S N N K T D M Q G T L R Y M A P E A I K H L K Y T A K C D V Y S F G I M L W E L M T R Q L P Y S H L E N P N S Q Y A I M K A I S S \----------1--------------------------1--------------------------1- \-------------------------1--------------------------1------------------------ \--1--------------------------1--------------------------1-------------------- \------1--------------------------1--------------------------1---------------- \----------1--------------------------1-------G E K L P M E A V R S D C P E G I K Q L M E C C M D I N P E K R P S M K E I E K F L G E Q Y E S G T D E D F I K P L D E D T V A V V T Y H V D S S G S R I M R V D F W R H Q L P S I R M T F P I V K R E A E R L G K T V V R E M A K A A A D G D R E V R R A E K D T E R E 
T S R A A H N G E R E T R R A G Q D V G R E T V R A V K K I G K K L R F * So is a neat little kinase gene, with no ESTs, and only the N-terminal kinase domain having matches! So this is in the second intron of CG17838! Are there more genes in here or intron 1? I could do this endlessly, but without BLASTX or ESTs to guide it, it is useless. \***************A. mellifera BB260023B20H4.F AGTACGGGAGGACGTGCAAGGACATCGGTTGC ATGAGGGATGAGGTCTGCGTGATGGCCGAGGATCCTTGTTCGATCTACCAACGAGATAACTGCGGTCGTTATCCGAC TTGTATGAAATCTCGTCCAGGCGAGGCTAATTGTGCCAGCACTCTGTGCGGTGAAAACGAATACTGCAAAACCGAGA ATGGCGTCCCAACATGTGTGAAGAAATCAGCAGTAAATGATGACGGATTGTTCGCAGAGTTCGATGGAACAAGCAAC AGCTTATTGAGAAAAAGACGGGGGACCGGCGATGAAGCGGATTCCACATCGGATGACACGCAATCGGTTAAGACGAT GGACTCTCGATACCCATCCGGAAGTGGATATCCGTCCTCCGAAACTCGACCAAAAGCTGACACCTCGGTCAAATCAT CCGGTTACCCATCCAGTTCCGGTTATCCATCGAACTCTGGTTATCCCTCCACCGGCTCATCCGGTTATCCATCCAGT TCTGGTTACCCATCCAGATCGTCCGGATATCCTTCAGAATCGGCAAGTTCCGGTTACCCGTCGAGAAGCTCTGGCTA TCCTTCAGAGGCTGGATATCCGTCGAGAAGCTCGTCCGGTTATCCATCACAATCTGAATATCCTTCGAAAAGCTCTT CAGGTTATCCATCAGAGTCTGGCTACCCTTCGAGAAGCTCTTATCCATCAGGATCTAGTTATCCCAGTGGCTCATAT CCTTCATCAAGTTCAAGGGGATATCCATCGACCTCCGTCNATGAGGCAAGTGTGAAAAGGGTGGATGCAAACTCTTT AATGCTG long ORF encodes wacky protein with SSGRY repeats at end; weak BLASTX matches to long Drosophila proteins; nothing better in nr; genomic match is weak, but to the least wacky part of the protein, and then there are many ESTs, and two good B. mori ESTs to this region, so I think it is a real protein; NEW INSECT GENE GH20482.5prime TGAAGGTGTCAGTTGGGCTCCCGACGGTTTAATTTTTAGCTCCCACAACGAA CGCAGTTCTCAGCGATTTTGACGCAAATTATAAACGTCAGCAACTTTTGATCCAAGGGCGTGACATCGATAGCGAGC AACAGGATGCTGGCACTGCCTTTGCTGACTCTGGCGGTCCTCGCCAGCTGCGGCTACTCCGTGGACGCCTACTCCAA GTACGGCCGTGGATGCGGGGACATTGGCTGCCTGCCCACCGAGGAGTGCGTCATCACCAGCGACTCGTGCAGCTACA ACCAGCGTGACGGCAAGGATTGCGGCAACTATCCCACCTGCAAACGGCGCTCCGGCGGAGGATCATCCGCCTCGAAC AGCAGCCCCAACTTGGCAGCCCCCTCGGCCAATCCGTCAGAGGTGAGCCATAACGCATACGCCCCGAATGCCCCAAG 
TGCCCCGAGTGCCCCGTTGCCGGAAGCGGATGCGAGCGGTGGTGCTGGCTACGGTGGTGCTGCGGGCGGTGGTGGAA GCGGCGGATATGGTG GH16618.5prime TGAAGGTGTCAGTTGGGCTCCCGACGGTTTAATTTTTAGCTCCCACAACGAA CGCAGTTCTCAGCGATTTTGACGCAAATTATAAACGTCAGCAACTTTTGATCCAAGGGCGTGACATCGATAGCGAGC AACAGGATGCTGGCACTGCCTTTGCTGACTCTGGCGGTCCTCGCCAGCTGCGGCTACTCCGTGGACGCCTACTCCAA GTACGGCCGTGGATGCGGGGACATTGGCTGCCTGCCCACCGAGGAGTGCGTCATCACCAGCGACTCGTGCAGCTACA ACCAGCGTGACGGCAAGGATTGCGGCAACTATCCCACCTGCAAACGGCGCTCCGGCGGAGGATCATCCGCCTCGAAC AGCAGCCCCAACTTGGCAGCCCCCTCGGCCAATCCGTCAGAGGTGAGCCATAACGCATACGCCCCGAATGCCCCAAG TGCCCCGAGTGCCCCGTTGCCGGAAGCGGATGCGAGCGGTGGTGCTGGCTACGGTGGTGCTGCGGGCGGTGGTGGAA GCGGCGGATATGGTGGTGGTTTCTCGGCTGGCGGCCACTCCCTGTACCCCAGCCTACCCAACTCAAACGGCGGCGGC GH10109.5prime AAGGTGTCAGTTGGGCTCCCGACGGTTTAATTTTTAGCTCCCACAACGAA CGCAGTTCTCAGCGATTTTGACGCAAATTATAAACGTCAGCAACTTTTGATCCAAGGGCGTGACATCGATAGCGAGC AACAGGATGCTGGCACTGCCTTTGCTGACTCTGGCGGTCCTCGCCAGCTGCGGCTACTCCGTGGACGCCTACTCCAA GTACGGCCGTGGATGCGGGGACATTGGCTGCCTGCCCACCGAGGAGTGCGTCATCACCAGCGACTCGTGCAGCTACA ACCAGCGTGACGGCAAGGATTGCGGCAACTATCCCACCTGCAAACGGCGCCCCGGCGGAGGATCATCCGCCTCGAAC AGCAGCCCCAACTTGGCAGCCCCCTCGGCCAATCCGTCAGAGGTGAGCCATAACGCATACGCCCCGAATGCCCCAAG TGCCCCGAGTGCCCCGTTGCCGGAAGCGGATGCGAGCGGTGGTGCTGGCTACGGTGGTGCTGCGGGCGGTGGTGGAA GCGGCGGATATGGTGGTGGTTTCTCGGCTGGCGGCCACTCCCTGTACCCCAGCCTACCCAACTCAAACGGCGGCTGC GGTGGTGCGGCTCCCTACAATCCATATGGCAATGGTGGCGGA This is outrageous! This transcript, and there are many more ESTs to confirm it, starts with 90bp 52994bp upstream in the next segment of the scaffold, jumps several genes, then has a series of exons with introns with more genes in them! I can't possibly put it all together! The C-terminus appears to be CG1726, however, so is not a new gene. 
fly MLALPLLTLAVLASCGYSVDAYSKYGRGCGDIGCLPTEECVITSDSCSYNQRDGKDCGNYPTCK-RRSGGGSSASNS SPNLAAPSANPSEVSHNAYAPNAPSAPSAPLPEADASGGAGYGGAAGGGGSGGYGGGFSAGGHSLYPSLPNSNGGG bee YGRTCKDIGCMRDEVCVMAEDPCSIYQRD--NCGRYPTCMKSRPGEANCASTLCGENEYCKTENGVPTCVKKSAVN DDGLFAEFDGTSNSLLRKRRGTGDEADSTSDDTQSVKTMDSRYPSGSGYPSSETRPKADTSVKSSGYPSSSGYPSNS GYPSTGSSGYPSSSGYPSRSSGYPSESASSGYPSRSSGYPSEAGYPSRSSSGYPSQSEYPSKSSSGYPSESGYPSRS SYPSGSSYPSGSYPSSSSRGYPSTSVMRQV So fly protein is a little longer at the N-terminus, that is, bee cDNA is truncated. Fly protein looks secreted. \***************A. mellifera BB270004B10G5.F GTCCCCCGCCGATTGATTTGGCGCGCGCGTGA CGGGAAAGAGGAACACGGGACTTTGGAGGTATTCCAAACGGCGGATATACACATCGTGGACGATGAAGTGGTGCTTA GTGATGCTGATTCTGGCCGGTGTCACGAGGGCCGACAATTCGGTCGACGCGGACTATTCGATCCTCAAGTGTCCGGA CCTGAATTCCCAAGAGGAGATCGATTTGAACGAGATAATGGGCAAGTGGTACGTGGTCGAGGTGTTGGAGCACAAAG TCGATCCATCGAAGCCCAACGGCTCGTACAAGGTCAATTCCTGCCCGATCGTCAAGCTGAGAGCGGTCGAGAACACG TCCAAGTACCTCTCCTCGTTGAGGCTGTTGTGGACCGAGGAGATCGGCGACCTCGAGTACACTTTCCGGATACCGGA CGTATCCAGGAAGCCGGGCTTTTGGATCTCCACCTCTGTGCAAAATGGCACACTGGTGGAGAGGGGGTACAAGCAAT TCAGCGGGAACGTGCACGTGATGAAGGCCGTCGCCTCGGACATGGTGCTGACATTCTGTTCCCGGAACCCGGACAAT CAGCTGTACTCGTTGCTACTCTCGCGGGAGCACATCTTGCAAAAGAGCGACAAGCGAGGGGTGCACAATCTGCTCGG CCGCCGCGGCCTCAAGATCGTCAATATCCGGG Long ORF encodes 200aa 13% leucine N-terminus; weak full-length match to insecticyanin A \- tobacco hornworm; BLASTX 22% over 197aa; p=3.2!; this is the genomic match, and one EST, and it is unannotated; NEW GENE. 
AE003553.2 TTCATCTGGAACAGACCGCATTCAGTGGCTCTTATCGGTAGAAACAGCAGCA CTTTTCCGAGATGTCTATCAAATCCTTGACATACGTTGCGATCTTTGGCCTTTTTTGGGGCTCAATTGCGGGAACTG TAGTTGATCAGTTTGGGATATATGGTGGTTCACCGATTACCACCACGGAAAGGAGTAATGCGGAGTTGCGCTGCATG AACATCAATCCGCAGAACTCGGTGGACTTGGAGCAGgtacgatatgataagataatatccttgagtgacaggaaata ttagctacacccaagatgaaaatgtaagctcatttgaatgcggagagctaatgcaattatggtcgatttgttctaat taaccctttgcgccctgagtgcgcccaatgtcaatgctgtacagatctgagccctgtttatttagttattttttttt ttcattgtgggcatttaattgaacgcacagcgagtggcatgcaaatagcacaaatgacccccattcgctcctacatg agtgtgtaataatgcccaacctctctgcatatatagATGATGGGACTCTGGTACGGCAGCGAGATTATCGTGCACAG CCAAGATTTTCCGGGCACCTACGAGTACGACTCATGTGTCATCATTCATCTGACCGATGCCACGGATCAGgtcagtt gccatcgaaattcaaaacatttagatactgattttaatcatatgaaatattacagATCCGTTTGAGCCAAGCAAATC GCGGCTATGGCTATGGAAATCAGGACTACAACCGTAACCAGAATAACTATGGACGCACCACCACCACTCAATCCTCC TATCCGGATAGCGATGAGTACCCGTTGAGATCGATTCAAAGCCAGCAGAAGTACCTACGTTTGATTTGGAGTGAGCG TGATAACAATCTGGAGTATACTTTCAACTATACCACCAGTGCACCTGGTCAGTGGTCCAACATCGGCGATCAGCGGG GATCCTTGGTCACCCTGAACACGTACACCCAGTTCACGGGCACTGTCCAGGTGGTGAAAGCGGTCAACGATCACCTG GTGCTGACCTTCTGCGGCAACGATGTTAAGAGCTCCATATACACAGTGGTTCTCACCCGCAATCGCCTTGGTCTCAG TTTAGATgtgagtaaagtagtcttctttcaatttcttttccagaaagttaaatctttgggtaattgtagtttgatta agtgtaaagaacactttttatttatctttaaatcagacctttttatttctcgtaaaatttgtttgtttcttaacaaa gcttaatatttgtttacaagctgcgtcaagtaataatatgcaaataatttttttagtaaatactgatagtgatgagc aaatctatatttgagaggtaaaaagaggttacattgtttacaattttacgtatatgctggtaacaaataagaatgag gattgtaggaatcgtatatgtatatattaaacctaattaatcattttttccaattttccagGAGCTGCGTAGCATCA GGAATCTGCTTTCCCGCCGTGGACTCTACACGGAGACCATTCGCAAGGTTTGCAATGGATGTGGGCGATTGGGTGGC AGCCTCTTCGCTCTTTTAGCCCTTTTGCTGGTCGTACGTTTGGCCTGGGGGCGTGGCCAGTGAGCTGGAGGGGATCG TCAGAGTGTCGAAGTACAGCCggcgtatttaatgagtccagccatattacgttaatttatgtataattttcccataa gcaatacacaagcgagatcgccgagtgctctccccgaaaactaactcacattgcctccgttttaattgctcgtcttg tcaattaaagtcaattacgaataaggcagggctgacttaagtgggcattagccgttggctagttgtaggcgttaagt 
gtttgcttaattcaattagcagggatctccacccgctcattggaatttcggtacaactaatgtcggaaaaaattcgg ttgatattgctgcaacgttcgcttcgtatggatctgcagaccccaaaagtaggcaacaaaaagtgtgctgctgaaat tgatacatataaattactcatacgccatggtgacacagtcacacacatcgaaacccaatgccacttgcctctgtctg attacgttgttaacatttcgtttttttttacactatcagcactaaccgaaaaacgagcagatgatc GH25183.5prime CTGGAACAGACCGCATTCAGTGGCTCTTATCGGTAGAAACAGCAGCA CTTTTCCGAGATGTCTATCAAATCCTTGACATACGTTGCGATCTTTGGCCTTTTTTGGGGCTCAATTGCGGGAACTG TAGTTGATCAGTTTGGGATATATGGTGGTTCACCGATTACCACCACGGAAAGGAGTAATGCGGAGTTGCGCTGCATG AACATCAATCCGCAGAACTCGGTGGACTTGGAGCAG---------------------0------------------- \---------------------------0----------------------------------------------0-- \--------------------------------------------0-------------------------------- \--------------0----------------------------------------------0--------------- \-------------------------------0----ATGATGGGACTCTGGTACGGCAGCGAGATTATCGTGCACAG CCAAGATTTTCCGGGCACCTACGAGTACGACTCATGTGTCATCATTCATCTGACCGATGCCACGGATCAG------- \---------------------------0---------------------------ATCCGTTTGAGCCAAGCAAATC GCGGCTATGGCTATGGAAATCAGGACTACAACCGTAACCAGAATAACTATGGACGCACCACCACCACTCAATCCTCC TATCCGGATAGCGATGAGTACCCGTTGAGATCGATTCAAAGCCAGCAGAAGTACCTACGTTTGATTTGGAGTGAGCG TGATAACAATCTGGAGTATACTTTCAACTATACCACCAGTGCACCTGGTCAGTGGTCCAACATCGGCGATCAGCGG translation M S I K S L T Y V A I F G L F W G S I A G T V V D Q F G I Y G G S P I T T T E R S N A E L R C M N I N P Q N S V D L E Q \---------------------0------------------------------ \----------------0----------------------------------------------0------------- \---------------------------------0------------------------------------------- \---0----------------------------------------------0-------------------------- \--------------------0---- M M G L W Y G S E I I V H S Q D F P G T Y E Y D S C V I I H L T D A T D Q \------------------ \----------------0--------------------------- I R L S Q A N R G Y G Y G N Q D Y N R N Q N N Y G R T T T T Q S S Y P D S D E Y P L R S I Q S Q Q K Y L R 
L I W S E R D N N L E Y T F N Y T T S A P G Q W S N I G D Q R G S L V T L N T Y T Q F T G T V Q V V K A V N D H L V L T F C G N D V K S S I Y T V V L T R N R L G L S L D \---- \--------------0------------------------------------0------------------------- \-----------0------------------------------------0---------------------------- \--------0------------------------------------0------------------------------- \-----0------------------------------------0---------------------------------- \--0------------------------------------0---------- E L R S I R N L L S R R G L Y T E T I R K V C N G C G R L G G S L F A L L A L L L V V R L A W G R G Q Z fly MSIKSLTYVAIFGLFWGSIAGTVVDQFGIYGGSPITTTERSNAELRCMNINPQNSVDLEQMMGLWYGSEIIVHSQDF PGTYEYDSCVIIHLTDATDQIRLSQANRGYGYGNQDYNRNQNNYGRTTTTQSSYPDSDEYPLRSIQSQQKYLRLIWS ERDNNLEYTFNYTTSAPGQWSNIGDQRGSLVTLNTYTQFTGTVQVVKAVNDHLVLTFCGNDVKSSIYTVVLTRNRLG LSLDELRSIRNLLSRRGLYTETIRKVCNGCGRLGGSLFALLALLLVVRLAWGRGQZ bee MKWCLVMLILAGVTRADNSVDADYSILKCPDLNSQEEIDLNEIMGKWYVVEVLEHKVDPSKPNGSYKVNSCPIVKLR AVENTSKYLSSLRLLWTEEIGDLEYTFRIPDVSRKPGFWISTSVQNGTLVERGYKQFSGNVHVMKAVASDMVLTFCS RNPDNQLYSLLLSREHILQKSDKRGVHNLLGRRGLKIVNIR So is a reasonably nice small gene, with several phase 0 introns. Protein at 250aa is a little long, and it doesn't match insecticyanin! Rapidly evolving insect proteins \***************A. 
mellifera BB270005A10H10.F GCAACGCTGGAAGATAATTTGCAAGTCGAAT CCAGACCAAGTTCCCAAGACGCGATTCAGTTTCTTCATCGGAGGACCACTATGCGGTGTCGTGGCATCGTCGAAGGC TCGCTCGTTCGACGAGGGGAACAGCGCTCGAGAGATTTCCGATAAGTTTGGCCGCGACGAGGATGGAGAGAATCCGC GGTCTGCCGGAACAGAGTTACGGAAATGCGCGCCGCCCCGGAACGCGACGACGACCTCGACGAGGCGACTGGCGTTC GTTTCGAGTTTCAAAGAGATCCCGGAGCGAGGTTCAAGTTCGCGTCGACCGGTGATTCGAGGATAAGGCACGGTTTG ATCGCGGCCCGCGTTGAAAATCGCTGAAAATCGTGAAGGATCGTGACCGATTGGATGGAAGATAGGGACGACGGGAC GATCGTGAAGAAGAGATGCCTGGTATCGTGGTATTCCGACGCCGATGGAGCGTCGGCAGTGACGACCTCGTCGTTCC CGGCGCTTTCCTCTTCATCCTTCATCTGATATGGATGACGGTATTGAGCGTTCTACTAGGGATATTTAAATGGGATT GCAATGTTATGTGCATCCTTCTACTGTGGAGATACATTGTCGGTTATTTGGTAATTTTCGTGATCTCCATGATCGTG GAGTTTTCCATCTGCTTTCTGGCCACTAGGGGTAGCATCCTGG unobvious internal ORF after long 5'UTR has e-05 match to N-terminus of a 680aa C. elegans predicted protein; this is the genomic match, and is to an unannotated region, but just before an annotated gene. No good ESTs, but some for same region to Brugia? Not sure about this one, would be hard to annotate. AE003493.2 ATGCCTGGACTTGTGGTCTTCAGACGTCGCTGGTCTGTGGGCTCTGATGATC TCGTGGTGCCGGGCGCATTTCTCCTGACGATTCATTTTATATGgtaagtagccacgtaccattttacttagtcatcg attattgttcataattctatttcgtgttttttttcttcttctttttcttcaacaacaaaataaaaacaaaaaaacga tgtgatttctatgacagttttgtgattgttagcgtctcgttggttatctttgagtataatacacgaattttaagcgt aaaattattgttctatcatctaataggctacttgttgatactattttgtaagtatactgaatcactttctgtggatt ctgtattgatattgatcctattttgcagtttcaatatgtgtagaaataggtatatgtgtgatctcgATGCGTGGCAG TATTCTGGATGCCGAGGCGCGCACCTCAATCAACATTTGGATATATCTCAAGAGCTGTAAGTTAGCACTAATAGAAT AATCCTATATGTATGTAGTATAC translation M P G L V V F R R R W S V G S D D L V V P G A F L L T I H F I W-------------------2------------------ bee MPGIVVFRRRWSVGSDDLVVPGAFLFILHLIWMTVLSVLLGIFKWDCNVMCILLLWRYIVGYLVIFVISMIV EFSICFLATRGSIL fly MPGLVVFRRRWSVGSDDLVVPGAFLLTIHFIW I can't really figure it out, but I think this is the first exon of gene CG11102 \***************A. 
mellifera BB270012B20H7.F ATGACACCAAAACCTAAGCAACAAAATATAAC GAATAAATCTAAAGAGAGATCCCCATCTATAGAAAAGCCAAAAGCGGAAGAAAAAGTAAAAATAACTAAAGTATTTG AATTTGCCGGTGAAGAAGTAAAAGTAGAAAAAGAAGTCTCTATAGATTCAGCAGAAGCAAGAATATCTCTATCCTCC GCTGAGAATTCTGAGAAAACAGGAAATTCTGGATCTCTCGCGGGTAGAGGATCTGGAAGAGGTAGAGGTTTCAAACG AGCTGGTTTAGGAGGTATTTCTTCTGTCCTTGGTCAATTAGGGAAGAAGGCGAAAATTAGTACGTTAGAAAAATCCA AACTAGATTGGGATAATTATAAGAAACAAGAGAATTTGGAGGAAGAAATTAGTACTCATAACAAAGGCAAGGATGGA TATTTAGAACGTCAAGATTTCTTACAAAGAGCAGATTTGCGACAATTTGAAATTGAAAAACAATTACGTAATGCAAA CAGACGTAGTACACGGTGAATTTATAATTTTATGTATATATATTTTATATATATATATCTTCCTTAATAATGAGAAC CAGAAATGGTATTGAGAAAAATATCTTATTAGAATTGCCAATTATTGGCGCTTGTAGCACATATTTACTCGAATTGT TTAAACTCTTACTAAATATCTCATGCCATGTATAAAAATGTGATTGCCTCGCTTGGCTTGACATCGGCTGG end of ORF encodes 14%K; 11%E 170aa; 50% match to end of 300aa craniofacial development protein 1 <up>Mus musculus</up> e-33; genomic match is to unannotated short scaffold; one Dros EST; NEW GENE AE003220.2 GACACAATGAACTCACAAAAAGAATACGTATCGGACTGCGAAACCGACGATG ATTATTATGTCGATTTGTTAACTTCAGGCAAGGGCAGTGATAAGAGTGAAAGTGATGTGTCGGACAAGTCTGAAAAT TATCCAGGCCTAAAATCAAAGCATACTGCGAAGGCATTGCGGAAAACAAGGCATTGTGACGGCGATAATAGGGAATA CAGGTCTAAGGAGTGCGACGACCTTCATTCCGAAGAGGAGTCTGAAAAATCGCGGTCGGATGCTTTATGGGCCGATT TTCTTGGCGACATTGATACTAAAAGCGTAATCAACCAAAAAACAGATTATACGGAGGGAAACGCAGCAAGTGCTACC AATACCAATACGCATGAGACTTGTAATAAATATGATAAAAACGATACGGCAATAATAAAAACTGCACAGCAATACGA TTCCAAAAGAACCACGCTTTCAGTTTCCACACTCGGAAAAATTAAACGATCATCCGCTGAAAAGAGTATCGGTACCA TGATAAATAAATTTGAAAAGAAGAAAAAATTGACAGTGCTTGAAAGGTCACAATTGGATTGGAAAATATTTAAACAA GACGAAGGCATAGACGAACTTCTGTGCTCGCATAACAAAGGCAAGGACGGgtgagtttggaagaagaagaagaagag tatttaaatggataaacttaaatttattacccaatgatttagGTATTTGGACCGTCAAGACTTTTTGGAGAGAACCG ATCTTAGGCAGTTTGAAATGGAAAAGAAGTTGCGGCTGTCTCGCAGGCCATACTAACGGCTTAACCAACG GH01620.5prime GACACAATGAACTCACAAAAAGAATACGTATCGGACTGCGAAACCGACGATG ATTATTATGTCGATTTGTTAACTTCAGGCAAGGGCAGTGATAAGAGTGAAAGTGATGTGTCGGACAAGTCTGAAAAT 
TATCCAGGCCTAAAATCAAAGCATACTGCGAAGGCATTGCGGAAAACAAGGCATTGTGACGGCGATAATAGGGAATA CAGGTCTAAGGAGTGCGACGACCTTCATTCCGAAGAGGAGTCTGAAAAATCGCGGTCGGATGCTTTATGGGCCGATT TTCTTGGCGACATTGATACTAAAAGCGTAATCAACCAAAAAACAGATTATACGGAGGGAAACGCAGCAAGTGCTACC AATACCAAT-CGCATGAGACTTGTAATAAATATGATAAAAACGATACGGCAATAATAAAAACTGCACAGCAATACGA TTCCAAAAGAACCACGCTTTCAGTTTCCACACTCGGAAAAATTAAACGATCATCCGCTGAAAAGAGTATCGGTACCA TGATAAATAAATTTGAAAAGAAGAAAAAATTGACAGTGCTTGAAAGGTCACAATTGGATTGGAAAATATTTAAACAA GACGAAGGCATAGACGAACTTCTGTGCTCGCATAACAAAGGCAAGGACGG--------------------------- \------------------------------------------GTATTTGGACCGTCAAG translation M N S Q K E Y V S D C E T D D D Y Y V D L L T S G K G S D K S E S D V S D K S E N Y P G L K S K H T A K A L R K T R H C D G D N R E Y R S K E C D D L H S E E E S E K S R S D A L W A D F L G D I D T K S V I N Q K T D Y T E G N A A S A T N T N T H E T C N K Y D K N D T A I I K T A Q Q Y D S K R T T L S V S T L G K I K R S S A E K S I G T M I N K F E K K K K L T V L E R S Q L D W K I F K Q D E G I D E L L C S H N K G K D G--------------------------------2-------------------------- \---------- Y L D R Q D F L E R T D L R Q F E M E K K L R L S R R P Y Z fly only 200aa MNSQKEYVSDCETDDDYYVDLLTSGKGSDKSESDVSDKSENYPGLKSKHTAKALRKTRHCDGDNREYRSKECDDLHS EEESEKSRSDALWADFLGDIDTKSVINQKTDYTEGNAASATNTNTHETCNKYDKNDTAIIKTAQQYDSKRTTLSVST LGKIKRSSAEKSIGTMINKFEKKKKLTVLERSQLDWKIFKQDEGIDELLCSHNKGKDGYLDRQDFLERTDLRQFEME KKLRLSRRPYZ bee MTPKPKQQNITNKSKERSPSIEKPKAEEKVKITKVFEFAGEEVKVEKEVSIDSAEARISLSSAENSEKTGNSGSLAG RGSGRGRGFKRAGLGGISSVLGQLGKKAKISTLEKSKLDWDNYKKQENLEEEISTHNKGKDGYLERQDFLQRADLRQ FEIEKQLRNANRRSTRZ There is about 15kb before this in this short contig; several EST matches, but only one is perfect, and then no ORFs, so seems is repetitive DNA. Indeed the sole perfect EST is for RT. \****************A. 
mellifera BB270013A20H11.F TTTTCCCGGTATGTGCTTTGCCTCGACAAG ATGTGCCACTATTGAACCAACAAAATCTTGGGAATTGACACCATTTTGTGGCCGTTCTACTTGCGTACCTGCTGATG ACAACTCTGGTCGACTTTTCGAACTTGTCGAAGACTGTGGACCACTTCCAAAAGCTAATCCGAAATGCAAACTCTCA GATAAAACTAATAAGACCGCTGCATTCCCTAATTGCTGTCCCATTTTCGAATGCGAAGAAGGAGCAAAACTTGAATA TCCAGAAATTCCAACTTTACCACCACCCACGGAAATTATAGAGACCGAGAAAACTTCAGAAGAAGTTCCGACAAAAG CTTAAATTCTAAAAAAACAGATTATAATCTTTACAAATTAAATTGAAAAATCGATTAAATTGAAACAGAAATTAAAG ATTTATTAATTATAATCTGAAATAATAAATTTAATTAAAAATATATATATATACTTCGTTAAAAAAAATATATTTTT ATCGAAAGTAAAAAAAAAATTTTACTTCTAACGAAAAATGTTATTTCATTCATTATATGTATACTGAAATATATAAA ATATATTTCTTATATTTATGCAATGATACAAATATAAAATTGCAAACTTACATTATATAAATAAATATATGCATACT AGTAAAATCATCAGAAACTTCGGGTATCCGTTCTAAAATATTGAATTTCTTCNATTTCCTAGTCCCGGAACC end of ORF encodes 13% proline; 11% glutamic acid; BLASTX match to Manduca sexta pMsmaD211! 77% and e-35; genomic match is similar and to unannotated region; ESTs from four insects! Clearly NEW INSECT-SPECIFIC YET REASONABLY CONSERVED GENE AE003844.2 RC CTTCATTTAGGCTGGTTAGGTGGTTAATTCCATTTGTCTTCGTTCTTTTGTA TTATTTTTACAAAGCGATAATATTTTAATCGTTTATGATTATTACAATATAACAAAAAGTTAACATCTTTGGAATCT TAAAAATGAGTTTTCATTTTGCTGTACTGACCCTTATTTTAACAGCCTTCACAGTTTCTCTGTGTGCTGAACAAAAA ATTACAAAGAGTGACGCAGGTGAAATACGAATTTTCAAACGTCTTATTCCTGCCGATGTTCTACGAGgtaagtatgg caatcatcagatttagaaattttccattattaaaagttacaagttcaatataagtatatctaaaacggcatgttgtt aaatcgggtgacacgcgtatagttttaagtaacataaaaggtatgggctagtgtaacgcaaaaaaaaaacaacaact aaatatccctctcctttctcaaggtattaattttggccacaaaaggtatcattcagcctatggtgaacatttatcga gtgtttttgcttttgatgtatacgtgatctattatatagtttccacagaaacagcccgaaaattaattggtctgtga gtgtattccaattattaacgtaggttcaatagtgtttcaaagctcgcgttttatctggccttgcggcttgaatattc cctcgcacttcctttcaaaacattttaataactcttcagATTTTCCGGGAATGTGCTTTGCTTCAACTCGATGTGCC ACTGTTGAGCCTGGAAAGTCGTGGGACCTTACTCCATTCTGCGGTCGATCTACTTGTGTTCAAAATGAGGAAAATGA TGCAAAgtaaacaaatttcagttaatatatatttaataaacaaatgcctaatatacattatttatagGCTATTCGAA CTCGTAGAAGACTGCGGCCCATTGCCACTGGCGAATGACAAATGTAAATTGGACACAGAGAAGACTAATAAAACCGC 
ATCGTTTCCTTATTGCTGCCCCATCTTTACATGTGACCCCGGTGTTAAATTGGAATACCCCGAGATCGGAAAGGATA ATGACAAAAAGAATTCTGAGTGATTCAAAACAAATATATTATGAAAACGTCTGTCAATACAATAAAAACATTTGTTG CTTTAGTCAAAAAGAACATTT LP07557.5prime CTTCATTTAGGCTGGTTAGGTGGTTAATTCCATTTGTCTTCGTTCTTTTGTA TTATTTTTACAAAGCGATAATATTTTAATCGTTTATGATTATTACAATATAACAAAAAGTTAACATCTTTGGAATCT TAAAAATGAGTTTTCATTTTGCTGTACTGACCCTTATTTTAACAGCCTTCACAGTTTCTCTGTGTGCTGAACAAAAA ATTACAAAGAGTGACGCAGGTGAAATACGAATTTTCAAACGTCTTATTCCTGCCGATGTTCTACGAG---------- \--------------1------------------------------------------1------------------- \-----------------------1------------------------------------------1---------- \--------------------------------1------------------------------------------1- \-----------------------------------------1----------------------------------- \-------1------------------------------------------1-------------------------- \----------------1----------------------ATTTTCCGGGAATGTGCTTTGCTTCAACTCGATGTGCC ACTGTTGAGCCTGGAAAGTCGTGGGACCTTACTCCATTCTGCGGTCGATCTACTTGTGTTCAAAATGAGGAAAATGA TGCAAA----------------------------------2--------------------------GCTATTCGAA CTCGTAGAAGACTGCGGCCCATTGCCACTGGCGAATGACAAATGTAAATTGGACACAGAGAAGACTAATAAAACCGC ATCGTTTCCTTATTGCTGCCCCATCTTTACATGTGACCCCGGTGTTAAATTGGAATACCCCGAGATCGGAAAGGATA ATGACAAAAAGAATTCTGAGTGATTCAAAACAAATATATTATGAAAACGTCTGTCAATACAATAAAAACATTT GH25016.5prime ATTCCATTTGTCTTCGTTCTTTTGTA TTATTTTTACAAAGCGATAATATTTTAATCGTTTATGATTATTACAATATAACAAAAAGTTAACATCTTTGGAATCT TAAAAATGAGTTTTCATTTTGCTGTACTGACCCTTATTTTAACAGCCTTCACAGTTTCTCTGTGTGCTGAACAAAAA ATTACAAAGAGTGACGCAGGTGAAATACGAATTTTCAAACGTCTTATTCCTGCCGATGTTCTACGAG---------- \--------------1------------------------------------------1------------------- \-----------------------1------------------------------------------1---------- \--------------------------------1------------------------------------------1- \-----------------------------------------1----------------------------------- 
\-------1------------------------------------------1-------------------------- \----------------1----------------------ATTTTCCGGGAATGTGCTTTGCTTCAACTCGATGTGCC ACTGTTGAGCCTGGAAAGTCGTGGGACCTTACTCCATTCTGCGGTCGATCTACTTGTGTTCAAAATGAGGAAAATGA TGCAAA----------------------------------2--------------------------GCTATTCGAA CTCGTAGAAGACTGCGGCCCATTGCCACTGGCGAATGACAAATGTAAATTGGACACAGAGAAGACTAATAAAACCGC ATCGTTTCCTTATTGGCTG translation M S F H F A V L T L I L T A F T V S L C A E Q K I T K S D A G E I R I F K R L I P A D V L R \------------------------1------------------------------------------1-- \----------------------------------------1------------------------------------ \------1------------------------------------------1--------------------------- \---------------1------------------------------------------1------------------ \------------------------1------------------------------------------1--------- \---------------------------------1----------------------D F P G M C F A S T R C A T V E P G K S W D L T P F C G R S T C V Q N E E N D A K----------------------------------2------------------- \------- L F E L V E D C G P L P L A N D K C K L D T E K T N K T A S F P Y C C P I F T C D P G V K L E Y P E I G K D N D K K N S E Z The first intron contains a set of NNNNNNNNNNs bee FPGMCFASTRCATIEPTKSWELTPFCGRSTCVPADDNSGRLFELVEDCGPLPKANPKC KL-SDKTNKTAAFPNCCPIFECEEGAKLEYPEIPTLPPPTEIIETEKTSEEVPTKA fly MSFHFAVLTLILTAFTVSLCAEQKITKSDAGEIRIFKRLIPADVLRDFPGMCFASTRCATVEPGKSWDLTPFCGRST CVQNEENDAKLFELVEDCGPLPLANDKCKLDTEKTNKTASFPYCCPIFTCDPGVKLEYPEIGKDNDKKNSEZ \************There is 20kb to the next gene, YIKES, each 10kb half contains at least one huge gene, one ORF is 5kb, the other is 7kb! Amazingly there are no ESTs, and the few BLASTX matches are poor BUT amazingly the 7kb one is to the entire TES domain of lacunin, but with nothing else, that is, no Kunitz or thrombospondins? What on earth is it? 
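Large unannotated open frames like the 5kb and 7kb ones mentioned above can be located with a simple six-frame scan. The following is a minimal sketch, not the procedure actually used for these notes: it assumes an "ORF" means the longest run of codons without a stop (no start codon is required), uses the standard stop codons TAA, TAG and TGA, and ignores ambiguous bases such as N. All function names are hypothetical.

```python
# Complement table for plain A/C/G/T sequence in either case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_frame(seq):
    """Return (length_in_bases, strand, frame) of the longest stop-free
    codon run found in any of the six reading frames.

    strand is "+" for the sequence as given, "-" for its reverse complement;
    frame is the 0-based offset of the first codon on that strand.
    """
    best = (0, "+", 0)
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq).upper())):
        for frame in range(3):
            run = 0
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    run = 0  # a stop codon ends the current open frame
                else:
                    run += 3
                    if run > best[0]:
                        best = (run, strand, frame)
    return best


if __name__ == "__main__":
    # Tiny made-up sequence, for illustration only.
    print(longest_open_frame("ATGAAACCCGGGTTTTAA"))
```

On a genomic interval, a stop-free run of several kb is very unlikely by chance, which is the reasoning behind treating such frames as probable coding sequence even without EST support.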
SIMA - these have got to be genes; I can't see how one could have several kb of ORF without it encoding something selected. The latter TES region is a threonine/glutamic acid/serine-rich region of a moth protein we described, but this is not its ortholog in the fly genome; that is elsewhere and already annotated. \**************A. mellifera BB270014A20B4.F TATTCTACAGGTTCCACAGCACCGTTTGCTTGT GCGAGGTCTGGAAAAGAGCGTACAAGGTGAAGCCTACATACATGTGACGTAAGGGGTCGCGTGAGGTCGTGTGCAAG ACGGCGGCTGCTCCTCAACGACCGAAAACCCTGCCGGCCAGGCTGGACGCCATCGCGGAAAACGATCCGATTTACTT TATGTAAGAGGGAGAGGGAGAGATGCGAGATTCTCTTTCGCCTCTCTTCGTCGCTCGAGGTTCTTCAAGCTCTGCGA AACTGCGAATATATATATATATATATATATATATCGAGTATATCGTTCACCAGGCGTTGTTCCATAGATATTGATTC GTGCGACGAGCGATGGACGAGCCTCGAGTTTGAGAAGGAATGGCGGTTTGTTTATTATTATTCTTTTTTCTCGAGAA GTTGCGTTTATTTATATTATTTAGATAATANNAATGATTGTACAAGGTTNTATAGTCGTCCTCGCGGTCTATGGAGA GAAGAANAGATCCCCATGCGCGATGGTTTTTACATACACACTACCATACATACACCACACGGCGATT The match is barely the end of an ORF, which encodes the end of a family of proteins. The best genomic match is not properly annotated, but there is a gene in the region; there are many related lower matches in Drosophila and C. elegans, e.g. CG3332, and there is an EST for this one. Leave it for now. \**************A. 
mellifera BB270014B20D9.F ATAATTTCTTCTTCTTCTCCATTTCCATTTCTT CGATGTCGAGGGTGTGATGCCGTGCGAGCCAGTGTCCGCGAGCGACTTTGTCGAATTTTCGAATTTCATGTCGAGGG ACCGGAAGTACTCGAGGCACTGCGGCCAACAGAAGGAGTTCGACGTGAACAGCGACAGAAAGTTCTTTCGCGTGACG TTCAAGAGCAACGATAGATACGACGGGACAGGATTCAACGCTAGCTACGTGTTCGTGGATGACGAGGGAAATTACAC GACGAAGCCGCCGACGAGTAACGCGTCAACGTTAAAAGGTGCAACGATGATGATGCTGCTGCTGCTGCTCGTGTTCA CGGATCCTCTTCTCCTCCGTTCCGGCCGAGTTTCACCACGCTTTAATCACGATCAGTAAGTTGGACGGGATGGTCTT GGCTCAATTTACAACGAACTATAACTTGGAAGACGGGGAGCCATCCGAAATCTTCCATTATACTCGGCTAGCCGAGG AAAGTGGCCGCATCGTTTGCATCGATGTGAAAATCGGGGGGACACATCTTCGAGGGACGCGTTCGTGATTCGTTTAT TATTATCATCGTTATCACGGGAACCTTGTCTCCAACCAAACGTTATTATTATTATTATTATCATCTCGTTCACGTTT CGTAGTCGTTGGTTGGGCGACAGAAGAACGACAAATATATATATATATATATATATAAATATAAATACTAT no obvious ORFs; BLASTX match to C. elegans C15A11.3 37% over 74aa; 7e-09; genomic match is same region at 50% and e-19; needs to be added on to CG4940? no EST unfortunately \**************A. mellifera BB270018B10D12.F GTAAGCCAACTTCATCGACCTTAAGCGGATTA CGACGCTGTCGAGCATTGTATGATTGCGAACAGATAACGAGGATGACTATCATTCCGGGAAGGAGAGGGATCTCGTA ACAAATGAACAAACCGACGACGACAACGGATGGAAGGCGCTCTTGAACGAGCGCCTGAAAGAAGAGGAATGTTCCCG ATTAGTTTCGTGCACATGTTGCAAGATTAACATTCGGTAAGCCTGAACTCATTGGCAAGTGTCTCCAGTACTGCCAG CTCCGTGAGCATTGCTAGTCTCGGCAGGAAATCTTGGACGAGGAAACCGAAAGATTGAACAGAAAGGATCAAGCATT CAGTTCCATAATTTTTAAAAAGTCGTGGATTCCTTCTCTTGAAATTATACGCAATTTGAAATAATTCCTCAATTTAT AACTTGAAACAATTCCAGAGAAGCACATTGATTTGCTAAAAGAAGAGATTTTAATAATTATTATATGAGAAAATACT AAAAAATATTTGTTTAAATTTGAGAGGAGGAAATCATAAATATTGAAGAATTCATTAAAAAAAAAAATAATATCTGA CTAAAAAAAATCAATATGAAAGTTGAGAAAGTTTGTCTTTGAGAATCATTAGAAACATCATAGAGAAGAGAAGTAGT TACTATAGCCTGAGCAAAATAAATTGGCTATTACAAGTTACAAATAGATAAACTCTCATCAT no obvious ORFs; short stretch matches end to ADP-ribosylation factor-directed GTPase activating protein in mammals; could be C-terminus of CG2226?; there is one Droso EST! Indeed this region is included in the annotation, but not the protein. 
SIMA \- This seems to be a common problem I don't understand, why are the annotated mRNAs not always completely translated? \*************A. mellifera BB270019A20H6.F TTTCCACTCGAATTTTTCTACGCTCGTGTACGGT CTTTTCACTCTCTCTCTCTATTTCTCTCTTTCTCTCTGTTTCTCCCCTTAATCGAGAGAAGTTGAAGTAGCGGACAG AACGTTTTTTTTATGGAATTTCCAGTTAAAAAATAATCGTTTTCAAACTCACCTCGTTGGTGTTCCATCGGTGCCTC TGAGTTGGAAAATGTTCAGCTCGTGGCAAGGTCTCCAAATTCTCCGGAAGTTTTATCGGATCTCCATCTAAACGGGG AGGAAAGAAAAAAAAGAATTCGATGAAAATCTGCTCTCGACCAATTCGAACGTTTTTCTTTCTTTCTTTTTCTTTTT TTTTCTCTCTTTTTCGAAACGCGTAAAATATACGAGTTTTTCGAAAAATTCAAGGAAATTTAATATCCTAATGGTCG AATCAATTATATCATTCAATTCGATTAACTCTTTCTCAATCGTTAATCCGATTATCGTGCGAGTTACTAATAATTTG CGCTCACATATATTCACTTCGACAAAAGCTTAGATGTTTGCATAACACATCATTAATCGTGTTTATATGCAACGGAT CGAAGTATGCTATCGATCGTAACACAACTCGTGTACGAGAGAAAGAAACAGTGGTGTTAAAAGTAATGCACTTTTCA CGTGCATAATATTACTTTCCCGGGATATGANACGCATATTTACTGTNNTAGAGAGNAAGNAAAGAAGA no obvious ORFs; no BLASTX hits?; genomic hit is from a short internal region of RC, to unannotated region; no Dros ESTs, but a few others for same region, could be a real protein? \*************A. 
mellifera BB270020A20G6.F AATACTTCTGTTGTTAAAGGTGTTGAAAGTATAT TATCAATTAAGTTTGATCATCCTCTTCTAAAAGAATTAGTCATTGTAGAGGAACCTACATCAACCCAGGAGCCAATT GTTTCCAATGCGGCAGTAGTCTCTGAATGTTATAAAGTGACTGCCGATGTTTTACCAGTACTATCAAAATTTGGATA TGAAAAAGGAGACATTATGAAAAGAGCTGAGATCAGAAAATGTTTTACTGAATACGTGAAAGCAGAGAATCTTCAAG ATGGAAGGATACTGAAACTGAACCCGCAACTCGCAGGTATTATGAAAACTAAAGCGAATGTGGAAACTGTAATGATG GAGGATGGAATAAACAAGTTTATTGGACGTATGACGCATATGCATGAAGTTACTTTAGCAGGAAATAAATTGTTACA CACGGGTAAATTGGAACCTATTGATATGAGAGTCACTGTTCGATCCGGCGGCAAAAAGGTAACGCTAGTAAATAATT TGGAAACATTTGGCATAAATGCTAAAGAATTTAGTAAAGAATGTCAGAATATTGGAGCGAGTGCAACAATTACGGAT GAACCAGGAAAAAAAACTCCTAGTGTTCTAGTTCAAGGAAATCAAATTTTATATATCTACAAATTACTTACAGAAAA ATATCANATTAAAAAAAACTATATAAGAGGATTAGAATTCGCTCCAAAGAAACAAGGTTC long ORF encodes 220aa 11% lysine end of protein; indeed is end of ligatin <up>Drosophila melanogaster</up> by BLASTXl e-18; not in Drosophila genome set for some reason; curiously vertebrate proteins are same or better identity, and C. elegans! Appears not to be annotated properly. \************A. mellifera BB270021B10C10.F ACAATTTCTATTTAAGAAAAAAAATCTAAAAACA TGAAATTTAGATTTTTAGGTGATGGCGATTGCCCTGATTGGTTNGCTAGCCGAAATCAACACATTGTCACGTATGAC ATCCATTAAAATTAAGATATTAGGACAAACGGTTGCAAAATATCTTACGGAAGGAGAACTCGATGAAGAAAAAATAA AAAAAATTACTCAAGATGCCAAGATTGAACTTAACGATGCAAAGGCTATGGTAGCAGCTCTTGAATTAATCTTTACA TCGTCTGCTCGATATGGCGTTTCCGCCGCCGATTTAAGCAATGAATTGCAGCAACTAGGACTCCCTCGTGAGCACAG CGCTGCAATTGCCAGATTGCATACGGATTATTGTCCTCAAATTACTGCTACGCTGTCTTCCCAATCCTTGAGAGTAA GCAGATTATCGTCGATTGAAGTTTTGTCCTGTGATAATTCGTCACCTTTCTCCACGGTATCTCTTAAATTAAAGAAA TTGGATGGAAATGTGGAAGATTCTATTATTAATATTTCAAAAAAAGATGTACACGTTCTATTGGCAGAATTACGAAG AGCCAAGTCATTGATGGAAAACCTTTGAATAAAATAATGTTCTACGTATTAAACACAATTATTTTTATTAAAATAAT AAAGTATATTAAAGTTATTATCTACAATAATAAAGTACTGGCTAAAAAAAAAAGTAAAATATAAAAAAAAAAA long ORF encodes 180aa 14% leucine 12% serine protein; 48% match over full-length to similar hypothetical protein FLJ20452 <up>Homo sapiens</up>; e-21; also similar C. 
elegans protein; unannotated NEW GENE; no Drosophila ESTs, but tons of others! Strange AE003534.2 taccgacatttggtttgccctttacagAAATTCCGCTTCTGTGGCGAAGGCG ATTGCCCCGATTGGGTCCTAGCTGAGATCATATCAACACTCTCGAACTTGAGCATTGAAAACTTGGAACAACTTAGC GATTTAGTGGCACAACGAATTTGTGGAGAGACATTTGAGgtttgtaattatttgtttgaaattcataaatatacaag acttttactttcagGAAGCGAAAATAAAATCGCTGACATCCACATTAACTAATGAAGGAAAAACCGCCGTGGCATGC ATCAATTTTATGCTGACCAGCGCAGCTCGCTATAGCTGTAGTGAAAGCATTTTTGGCGAGGAGATCCAGCAATTGGG ACTTCCCAAGGACCATGCCGCAGCCATGTGCAGAGTCCTCCAAAAGCATTCCGCCACCATAAGGCAAACACTTATAA ACAAATCTTTCAGAAgttagtggtctaaacacatattaagtcttatgtgctatcttattaaggcttatatttgcaga ttaatttgcttcaattttatatttattttagTTAACGAACTGACAAGCGTCCGAGACATATCTACGCCAGGGCAAAC GCCTCCAAACTACGCCACCTTGGAACTGAAGATCTCGCAAGAACTGGTCGATGGCCTACCGAAGGATACCACCCATG TCCTCAACATTGATCGCACCCAAATGAAGGCTCTGCTGGCGGAGCTGAAATTGGCACGTGATGTTATGCAAAAATAT GAAAATAAACCAGATTCCTAAAAATGTTATTAATA translation K F R F C G E G D C P D W V L A E I I S T L S N L S I E N L E Q L S D L V A Q R I C G E T F E \-------------- \-----------0-------------------------- E A K I K S L T S T L T N E G K T A V A C I N F M L T S A A R Y S C S E S I F G E E I Q Q L G L P K D H A A A M C R V L Q K H S A T I R Q T L I N K S F R \-------------------------------------- \--------1----------------------------------------------I N E L T S V R D I S T P G Q T P P N Y A T L E L K I S Q E L V D G L P K D T T H V L N I D R T Q M K A L L A E L K L A R D V M Q K Y E N K P D S Z Can't easily find the correct N-terminus for this, but anticipate that it will be short. Need an EST! bee MKFRFLGDGDCPDWLLAEINTLSRMTSIKIKILGQTVAKYLTEGELDEEKIKKITQDAKIELNDAKAMVAALELIFT SSARYGVSAADLSNELQQLGLPREH------SAAIARLHTDYCPQITATLSSQSLRVSRLSSIEVLSCDNSSPFSTV SLKLKKLDGNVEDSIINISKKDVHVLLAELRRAKSLMEN fly KFRFCGEGDCPDWVLAEIISTLSNLSIENLEQLSDLVAQRICGETFEE--AKIKSLTSTLTNEGKTAVACINFMLTS AARYSCSESIFGEEIQQLGLPKDHAAAMCRVLQKHSATIRQTLINKSFRINELTSVRDISTPGQTPPNYATLELKIS QELVDGLPKDTTHVLNIDRTQMKALLAELKLARDVMQKYENKPDSZ \************A. 
mellifera BB270025A20A3.F AGTTCTAGATCTTGCCACTGAAACTGCTACTGCTG TAAGAGAAACAAGTAGAAGTGCTCATCGTACGATACCAAAACGCGATAGACCTCCTCGTGTGGCAAGTGGTTCTGCT GGTCTATTACCACCCTATAATCGCCAACAAGCAGAGGGCCAAGAATTTCTTTATATAATAAATGAACATAATTATTC AGAATTATTTGTGGCATATGAGTGTTTACGTAGTGGAACGGAGAATCTAAGAATTCTTGTTTCTAATGAAAGAGTTC GAGTGATTTCCGGAGGTACCAAAGGAGTTGTAACCGAAGTCAGTCTAGCGGACTTATTATATTGTCAACCAATGCAT AAGCTAGAAAGTAATGGTGTTACTTTATACTATATTGAATTAATATCTAGATCAGATTCAACGATAACCGTTAACAT GGACGGTCCAGAACTTCTAAGAAGACCTAAAGTTCGATGTGACAATGAAGAAGTAGCCAAAAGAGTATCGCAGCAAA TTAATTACGCTAAAGGAATGCACGAGGAACGTAGCTTGACTCTTTCTTCTTCGGATAATATGTTAGATGATGTACAG TACTATAAGTAGTTACAAACAATCATATATGAAAATTTATTTTGTATTTGACAAAAGTTTGGAATCACTATGTTTTT ACAAAAAATTTTTATGGGAAATTAATGCATTAAAATATTTTCATTTCAATGTTAATTCC long ORF encodes normal protein; several weak matches to Drosophila proteins, none convincing; but also several human proteins, especially KIAA0453 protein <up>Homo sapiens</up> at e-19; seems to be a missed exon of CG11003, which is also one of the weak matches above for part of it. No ESTs to help, but I think it is part of CG11003. \**************A. 
mellifera BB270028A10H8.F GCCGTGGCCCGAATTTTATCAAGAACACAATGT CGGAAGATCAAGTAAATCCACCATCTCCAATCGATGGTATTTTACCGTTTTTGCAAAGTATTGAATGGAGAGATCCA TGGCTTGCATTATTATTAACTTTTCACATTGCTGTTACTTTGACTGCATTGATGACACGAAACCATGCCAATTTTCA AATTATGTTATTTCTTGCACTATTACCTCTGGTATATTTTTCTGAAAGTATTAATGAAGTTGCTGCATCTAATTGGA TGTTGTTCTCAAGACAGCAATATTTTGATTCCAATGGTCTCTTTATATCTGTAGTATTCTCTGTGCCTATCTTGATG AATTGTATGATCATGATTGCCAGTTGGCTTTATCAGTCTAGTCAATTAATGACCAGTTTGAAAAGAGCGCAATTAAG ACAACAAGCAAGAAATCAGGAAATAGGAAATGAATCAATAAATACAAATGGCACTGCTGCAAGAGAAAAACAGGAGT AATATTTCTAGTCCAAGAACAATGAGAAATGGAAAATACTCTATAAGTAGAGTCGTATATAACGGCATTGTAAAATT CGACGAATATTTTCAACATAGTATTTTTTTAAAAGATTACTGCCGACACTTGTTATCACTGTACTTCAAGTTGATTA ATTTCACTGTCAGTTAACTATATTTCCAACTTTATGCCGTATATACATATATATCGACTATTAGAA end of ORF; weak long match to OstStt3 gene product; 23% and e=1.3; but 48% and e-23 to end of 171aa hypothetical protein DKFZp434C1714.1 \- human (fragment); genomic match is unannotated short region, NEW GENE. Tons of ESTs from cow, plants and others, but not Drosophila! This is the entire available region, assuming flanking annotations are correct. 
AE003822.2 gaaactcccgccacaagcgctttagaacagagtcctagacgagtgtggtgca cggtagggtcggtggcccgtgccacatctaaggcgccccttttttcggattaccctgctcggctttagcttcgattg cttgctcacacgtcgcccgttcgacttaataacccgaatagatttgattcgccctaaaaactacaattttgactgtt ttaaaacgaattctttgtgatatttttcggatttgttaatgttgtctactgagtcagtgaaagcgttatcgacacgt tcagactgaatgacgggcagggcgactctgcacgacaagtcggggtgggtgagaatgaggtggcacaaaatttgtaa ttgcatttatatggtgagtaatacatactaaacgaaataatagtattttgatttatgttgttatatttagcccataa aatagtaagttaggtcttacaaacagcgctaccagatccagtcaaaattgaggaagcctgaacgtctataggcctcc caaaatggcgttgccATGAAAAATGACAGCTGTTCGCTGCGAATGGCTATTTATGTTTGTTTTGACTCGGCTTTCGA ATATATCGCAAAATATATACAGGAAACATTTATATTCACAAAAATCTGTACGATGCACCCAGGGCAAATTGAGGTCA ACGAGATCAATGGCTATTGGACATTTCTGCTGAGCgtaagatactcgcctatatacaatcaaaaatcaagaatccgg caagttgtcactatttttgcagATCGATTGGAAGGATCCCTGGCTTATTGGCCTTATTTTGGCGCATATCTTAACCA CCACCACTGCGCTGCTCAGCCGGAACAGCTCCAACTTCCAGGTTTTCCTCTTCCTAGTACTGTgtacgtggacttgg cggcttccttgacttacccgataaatgactctgatttttgcatgtgcttcatcttcctcagTGCTGGCAGTCTACTT CACCGAAAGCATCAATGAGTTCGCTGCTAACAACTGGAGTTCCTTTTCCAGACAACAATACTTCGATAGCAACGGCC TGTTTATCTCGACAGTTTTCTCAATACCTATTTTGCTTAATTGCATGCTTTTGATTgtaagttatagtgtttccact gcatgaagtgtgtatttatctttgcttatttgcagGGCACTTGGCTCTACAACTCCACGCAGCTGATGGTGACTCTA AAAACAGCGCAGCTCAAGGAGCGAGCTCGCAAGGAACGCCAGACTAAGGCGGATTCGGAATCCATAGCACATAAAAA GGCAGAGTAGaacttacgcctgtattacatgcagttaaaagcacaagtagagctgtgaaattatatgttatgcttta aatggattttcctgtcatctagatgtagtttgctgcacagctctcgtctttaaaataaatttaatttagtataatca aacttatagaatttgtaaatttaggctatttttacatcctgttttacttagcgaagttacaaacctaacatgccctt catattaagcaaaaaatcacaccagttaccgttgccaccttggtaaagcagtttttactgccacctaaaattttcta tatatatcacgtaatatgaactattttgatatttttgacgaaattaacattatagatccaatcagcttattgcctgt atcaatttctgatctgtgtgccaagactgtaatttcaaattagaagctcgttggacctgtgtcattttttagtacga attcaattgggagcccttcgtcgtctggtaacactgtccaacgattttgttgttgctggcttgtgggtgtcgaagca gtgtcgcggcgcaatgttggaagtggtttttgggtaa translation M K N D S C S L R M A I Y V C F D S A F E Y I A K Y I Q E T F I F T K I C T M H P G Q I E V 
N E I N G Y W T F L L S \----- \-------------------------0--------------------------------- I D W K D P W L I G L I L A H I L T T T T A L L S R N S S N F Q V F L F L V L \-----------------------------------1------------------ \---------------------L L A V Y F T E S I N E F A A N N W S S F S R Q Q Y F D S N G L F I S T V F S I P I L L N C M L L I \----------------------------------0--------------------- G T W L Y N S T Q L M V T L K T A Q L K E R A R K E R Q T K A D S E S I A H K K A E \* This is my best guess, because two intron boundaries are unpredicted. fly MKNDSCSLRMAIYVCFDSAFEYIAKYIQETFIFTKICTMHPGQIEVNEINGYWTFLLSIDWKDPWLIGLILAHI LTTTTALLSRNSSNFQVFLFLVLLLAVYFTESINEFAANNWSSFSRQQYFDSNGLFISTVFSIPILLNCMLLIGTWL YNSTQLMVTLKTAQLKERARKERQTKADSESIAHKKAE bee RGPNFIKNTMSEDQVNPPSPIDGILPFLQSIEWRDPWLALLLTFHIA VTLTALMTRNHANFQIMLFLALLPLVYFSESINEVAASNWMLFSRQQYFDSNGLFISVVFSVPILMNCMIMIASWLY QSSQLMTSLKRAQLRQQARNQEIGNESINTNGTAAREKQE Looks good from the alignment though. \***************A. mellifera BB270030B20G7.F ACTATTCTCACCTCCGGCCGATTTCACGCCGC GTAATTCTCATTTCTTTCGACAATCGAATATCCGTCGATCACAGTGATTATTATTTACGACTTGCTGGAATAACAAT CACGCGATTAATTTGTTAAGTTTCAGTATGGAGTGTCCTGAAGCGATGGAACGAGGCAGAAACTTTCGTTTGCTTGC CAAGGAAGAACTACCTAAACTCTTGGACTTCCTTGATGGCTATTTGCCCGAATCCTTAAAGTTCCATCAAACTTTGT TGACCTATATGAATGACAGGGTATGGGATTTTATTTTCTATGTGGCTAATGACTGGCCGGATGATGAGATCTGTTTA CATTTTCCAGGCATGACGTTAGCCACATAGAAAAAAAAAGCAAC possible internal ORF; weak match to CG5750; no better BLASTX matches; but convincing genomic match for same region at 70%, to unannotated region, but could be real N-terminus of CG15628? No ESTs to help \*************A. 
mellifera BB270032B20A6.F GGAAGGGTGCGTGTCAAAGTAGTAGACACACAAC TGCTAATCTCGTGGTTACATTTTATTTTCACGAATATCTTAGGAAATGTACTTTTTCGGCACATTGCTATGTAGCAC GTAAATGAAGCGACGGCGTATAGCGCGGTCGCATATCAAAAGAATACTACCTATAGGAACGATGAGAATGGCGCGCA AATGTTGCGTGCGTAGCTGTGAGGCTGATGTGCAAGATGCGCGTGCTAAAGGGTTACCGCTTCATAAATTTCCGAAA GATATTACTTTAAGAAACAAATGGTTGACTAGTGGTGGATTTGACGCGAATTTTAAACCTTCACCAGGTCAAGTTGT TTGTCACAGACATTTTAAACGAGCTGATTACGAAGCTGCTAAAGGACATAAATTACTTCTACGTAAAGGTAGTGTTC CGTCGGTTTTTGCAGATTATGACAATCATCCGGATCCTGTAATAATGTCTGTAAAATCATCAACTTCTTATGCACAA GAAGATTTAGATCTTATTAATTCTGAAATTTTGAATTTAGAACAATCCATATCTCCATTGAATTCTGGTGCCAGAAC ACCAAAATCCGATAGCTGTGGAGAAACATGTTCTTCTCGACCAGAATCATCAGCTGATTCTTTTAATTTATTAGATT CAACAGAATTAATTGATANATGGATGTAAAACTTTGAATATGAAAGAAGAGAATATATCTCCTATGA long ORF encodes 200aa 12% serine protein; repeated weak matches to huge protein CG10631; e-05; also weak match to dJ126A5.2.2 (novel protein) (isoform 2) <up>Homo sapiens</up>; short protein at e-04; genomic match seems to be to region of N-terminus of CG10042 gene product <up>alt 1</up>; several ESTs for this match too, so could be new gene, but hard to annotate. \----------------------------------------------------------------------------- \-- 'New Genes FASTA' file >Found with R. suavis J3-A2 ATATAGTTCCATTCTGTTTTATTGGATTGAGTAAAGTTAGACAAAATGCAAGGTCTTGGTCTGCAAAGTCTTAAAAA AAA TCCAGCTTTAATTCCACTTTATGTGTGCGTTGGAGCGGGACTATTGGAGCCGTCTACTATATGGCTCGACTTGCTAC TCG TAATCCCGATGTCACTTGGAATCGCACATCAAATCCCGAACCATGGCAAGAGTACAAAGAAAAGCAATACAAGTTTT ATT CGCCTGTGAGGGATTATTCCAAAACTAAGAGTGCTGCCCCAAACTTTGATGAATAAATTACGTTTCCCTAGCAGCTG CAA TTTAAAAATGTAAAATGAAATAACTTCAAATTATAAATAAACATAGTGGATTTGAAAGCGTA >Extra1 ATGCTTAATCTCAACCTTCTAGATTGTATAGTTCCTGAGATCTCGACATTCATACAGACGGACGGACAGCGTCAGAT CGA CTCGGATATTGATCCTGATCGAGAATATATAGGCTTCATATGGTGA >Found with R. 
suavis J3-D1 ATGTATATTTATTTTCCAACAATCTTTCTCTTATTTTTGTATCCAGTAGTAGCAGTTGTCCCTCAAGGATTTACAAT TAA ACAACCAAAATGCTGGTATGTGGCAAACCCTGGACCCTGTGATGATTTTGTAAAAGTCTGGGGCTACGATTATTTGA CTA ATCGTTGCATTTTCTTTTATTATGGAGGCTGTGGTGGAAATCCAAATCGATTTTATACGAAAGAGGAGTGCTTGAAA ACA TGCCGTGTGTACAGACCTCCAAATCACGTCTGTTTGCTGCCAATCTGGGCGACGGCCATTAAGTCAAACCGTTTGAA GCA ATTTGAAAGCTACCCAGACTATGCAACATATATATTTTTACAGACTCCCTGGGTTATTTATCAACAATTTTATGTGG ATA GCGTTGCGATTCTGACAATTTTTGACATGCAATTTGCCATTTTCCATCTGCTTCAGCCGTATTTTGGGTGTGGAATT TGG CATTTTTCCGCAGGCTGCAATAAGTTTTGGCAGCGAATGCAGATGAGGCTCATGATGAGATGA >Found with R. suavis J3-A7 ATGATTGAAATATCAGATTTGCAGAAAATTGGCATCGGCTTGGCTGGTTTTGGCATTTTCTTTTTGTTTCTCGGCAT GCT GCTGCTGTTCGATAAAGGACTGCTCGCCATTGGCAATATTCTATTCATATCGGGCCTGGCCTGCGTCATTGGCGTGG AGC GCACGATGCGCTTTTTCTTCCAACGGCACAAAGTCAAAGGCACAACGGCCTTCTTAGGGGGAATCGTCATCGTCCTG CTG GGATTCCCCATCTTCGGCATGATTATTGAATCCTATGGATTTTTCGCACTCTTCAGCGGCTTCTTCCCCGTGGCCAT TAA TTTCCTAGGCCGAGTGCCTGTTTTAGGATCGCTGTTTAATTTACCATTTATACAAAAGATTGTTCAAAAACTTGGTG GAG ACGGCAACCGAACTACAGTAtaa >Found with R. suavis J3-B3 ATGGATGCACGAAAGTTTTCTACCCACATATTGGATACTTCGGTGGGAAAGGCGGCAGCCAATGTGAGAGTAACAGT TTC CAGGCTGGACGAGATTCAGGAATGGAGATCCCTTCGGGCGGCCCAAACTGATGCGGATGGTCGCTGCCTGCTCTTGG AAC CTGGTCAATTTCCCGGCGGGATCTATAAGCTGACCTTTCACGTGGGCGCCTATTACGCGGAGCGCAATGTGAGGACA CTT TATCCAGCAATTGACTTGATTGTGGATTGCAGTGAGAATCAGAACTATCACATTCCTTTGTTACTCAATCCCTTTGG GTA TTCCACATATCGTGGAACATAG >Found with A. mellifera Contig1312 ATGGACATCTCAAAGGCACCAAATCCGCGAAAACTGGAGCTGTGTCGCAAATACTTCTTTGCTGGCTTTGCATTTCT GCC CTTTGTGTGGGCCATTAACGTTTGCTGGTTTTTCACGGAGGCCTTCCATAAGCCACCATTTTCGGAGCAGAGCCAAA TAA AGAGATATGTTATATACTCTGCAGTGGGGACTCTATTCTGGCTGATAGTACTAACTGCCTGGATAATAATATTCCAG ACA AATCGCACAGCCTGGGGCGCCACAGCGGACTATATGAGCTTCATCATACCCCTAGGCAGTGCATAG >Found with A. 
mellifera Contig1481 TACCAATTACTTGTAAGCACAAAAAACAGCTGACGGCAACAAGTGGTTCGGTCCCCATCGGAATACACGTGCTCAAA ACG TGTGGGTTTTATTTGCCTTAATTGACTTAAATTCACTCGCAATAAGTGGAAATGATTCGAAAGGTGCCGCTAATTGT AGT CCTGGGCTCCACGGGCACCGGAAAGACGAAACTGTCTTTGCAACTGGCCGAACGCTTCGGAGGAGAAATAATCAGCG CTG ACTCCATGCAGGTTTACACCCACCTGGACATCGCCACCGCCAAGGCAACCAAGGAGGAGCAGTCCCGGGCACGACAT CAT CTACTGGACGTGGCCACACCGGCCGAACCCTTCACAGTCACTCACTTTCGTAACGCAGCACTGCCCATTGTGGAGCG CCT GCTCGCCAAGGACACTTCTCCGATTGTGGTGGGCGGCACGAATTACTACATAGAATCCCTACTTTGGGATATTCTGG TTG ACTCGGATGTCAAGCCGGACGAAGGCAAACATTCGGGGGAGCATCTTAAGGATGCCGAACTGAATGCTTTGTCCACC CTC GAGCTGCATCAGCACCTTGCCAAGATCGACGCAGGTAGTGCCAACCGTATTCACCCCAACAACCGGCGCAAGATCAT CCG GGCTATCGAAGTGTATCAGAGCACCGGGCAGACTTTGAGCCAGATGCTGGCGGAACAGCGGGCACAGCCGGGAGGAA ACC GCCTGGGTGGACCCCTTCGCTATCCACACATCGTTCTCCTTTGGTTGCGTTGCCAGCAGGATGTTCTAAACGAGCGA TTG GATTCCCGCGTAGATGGCATGCTGGCCCAAGGGCTGCTCCCTGAACTACGACAGTTTCACAATGCCCACCATGCTAC CAC TGTGCAAGCCTATACGTCGGGAGTTCTGCAGACGATTGGCTACAAGGAGTTTATTCCCTATCTGATCAAGTACGACC AGC AGCAGGACGAAAAGATAGAGGAGTACCTCAAAACCCATAGTTACAAGCTGCCAGGCCCAGAAAAACTGAAAGAAGAA GGT CTTCCAGATGGCTTGGAACTCCTACGCAATTGTTGCGAAGAACTAAAGTTAGTCACTCGCCGATACTCAAAGAAGCA GCT GAAGTGGATCAACAATCGATTCCTGGCCAGCAAAGATCGTCAAGTGCCGGATCTCTACGAACTGGACACCAGTGATG TGT CAGCTTGGCAGGTGGCAGTCTACAAGCGGGCAGAGACCATCATAGAAAGCTATCGAAACGAAGAGGCTTGCGAGATA CTA CCAATGGCCAAGCGGGAGCATCCTGGAGCGGATTTGGATGAGGAGACTAGCCATTTTTGTCAAATATGCGAACGGCA TTT CGTTGGGGAGTACCAATGGGGACTGCATATGAAGTCCAACAAACACAAGCGAAGAAAGGAGGGACAGCGCAAGCGGC AAA GGGATCACGAAACAATGCTCTCAACGGATCTAGCGAAGAAGCAAAAGGAGGAGAAAGAGGAGGCAGGAAAGGCGGAG ACT CAGCCACCACCCAGCCGAGTCAATGATACTGATAAGGCAATGtaa >Found with A. 
mellifera Contig2709 TTGAACACAGATGTCACTTCTACAGGGGAAAAAAGTTTAAAAACAAGTAAATCACAGAAAACGTCGTTTCCTTTTGC TAA TAGAGCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATCACCATTCTTGGGACTGCAAACAAATTCGAAAATGA GTG ACAATTTTTCAAGAACACCATATTCTGATGGGCACGCTGCAACCCATGAGGAAGCATCAAAACCCCACTACACTACC ACT ACGAGTTCTTTTAGTAGAACTCCGGTCTCGCCGTACCTCAACTATGATTCGCGATATCTGCAGCAAGCACAGCCAGA GTT CATTTTTCCCGAAGGGGCCAACAAGCAGCGTGGACGCTTCGAGTTGGCCTTCTCTCAGATAGGCACTTCGGTAATGA TTG GCGGTGGAATTGGCGGCCTAGCAGGTGTTTATAATGGTTTAAAAGTCACAAAAGCACTCGAGCAGAAGGGAAAAGTT CGT CGAACACAGTTACTTAATCACATTATGAAGCAAGGTTCCGGCACAGCTAACACATTAGGTACATTGACGGTGCTGTA TTC GGCTTGTGGAGTTTTGCTGCAGTTTTTCCGCGGAGAAGATGATCATATAAACACAGTAATTGCGGGCTCTGCCACAG GAC TATTATACAAGTCAACAGCTGGCCTTAGGACGTGTGCTTTTGGTGGAGCTATTGGGCTGGGCATCTCGTCCCTCTAT TGC TTATACCTAATAGCACAGGAAAACAGTTCGAACTCAAGTCCCAAATACCTATAG >Found with A. mellifera BB260003B20H2.F ATGGTGGATTTTTTTGAAAAGCTTAGACGCGGTCACACATTTATTTACATCGAACATATGATGGGCACGCCGGAATT AAA AATCATATTAGAATTCAGTGCAGGGGCGGAGTTACTATTTGGTAACATAAAACGCCGTGAATTGAACTTGGACGGTA AAC AAAAATGGACTATTGCTAATCTGCTTAAGTGGATGCATGCGAATATTTTAACGGAGCGTCCGGAACTTTTTCTTCAA GGA GATACTGTGCGACCTGGAATTTTAGTACTCATAAATGATACAGACTGGGAATTGCTGGGTGAACTGGACTACGAGCT GCA GCCCAACGACAATGTGTTGTTTATATCAACTTTACACGGTGGTTAA >Found with A. mellifera BB260004B10A11.F ATGGAGAAATCTGAAATACGACTGCAACGCATGTCTAATGAATATCAGTCGCAATCGAGCTATATGTACCTCCGGAC CAA GATGCTGTTAAAAATCGAGAATACCCTACTTCGAAGCCATCGTCAGCGCGAGACCACCGGTATCAAGAAACTATACA ATT CGTTTTTCGTATTGTTTTAA >Found with A. 
mellifera BB260010A20C3.F TTCGCATATTTCAGTTATTTATTTAGAAATGGGGCGATTTAAGTTATGTGCTTCGCCGAGAGAGGTTATGAAGTACG AAG ACTTTATAAAACGCATTCGAAAAAGCCTCTACTATGGCGTTGGAACACCAGACACAGAAATGTCGGTCTCCTTACCC TTT GCGGAGTACGCGGCAGATTTGTTTTCGGAGACTCATCGCGGGCATTCTTTGCATCGCCTAAGTTGCGTATCTGCTGC ACA AGTACATGCCACGCCTTGCTCTTTAATTATGGCATTGATATACCTCGATCGCTTAAACGTCATCGACTCGGGCTATA GCT GCAGAATCACACCACAGCAGCTGTTTGTTGTGTCACTAATGATTTCCACAAAATTCTACGCGGGCCACGACGAACGG TTC TATCTGGAAGACTGGGCCAGTGACGCTTGTATGACGGAAGATAGGCTCAAGGCAGTCGAGCTCGAATTTCTTTCCGC TAT GGGTTGGAATATATACATATCCAATGAGCTATTCTTTGATAAGTTAAGAAACGTTGAACGTTCTTTGGCTGAACAGC AGG GACTGCGTAGAGGTTGGCTCACTTACAGTGAGCTCGTGCAGTTGCTGCCTAGCCTTGAATGGACGAAATTCCTCGTT AAC AGCCTGTCTGTACTATCTCTAAGCTATGCGGCAAGTATTATAACATTAGCCGGAGCTTTTTTTATTGCGAGCCAAGT TCC CGGTACGTTATGGCATCGGGATGTGGAAACTGCCTCAGATTTCACCATGACAATTAGCAGTCAGGTATCCGTTTCAA ATG CATTAGAGTCCACACCTTTTATTAATGTCCAAGTATCCTCACTTTTACGTAAAACGAGTAACGTGAATGTTGAATTG ATG AATCTTGAGAAGACAAGCTGCGCCAGGGCAAGACTGAATAAAATTGAATATAAGCATCCGCGCCATCAATCAGTACC TAC GCTTTCATTCATAAGCACCTGTCCACAACTTGATTTATTGTATGCCCAAGATGGAACAAGGAATTGGCTAAATATTA AAT CGCCCAACAGCGACTACAAAAACAACAGAAACCTTTCAATAACAGTTAGATCCGTACAACTAGAAGAGCAAAAGGCT GAA AATGATTCCGTTATTTGGCAAGCCAACACCGAAGCAATGCAGTAA >Found with A. mellifera BB260019B20F2.F ATGAAGGAAGAAGGCGGCACATTGCTGGGCGATAAAGGTGTACGAAGGCATCAGTCCATGCAGCGTCTGTCAGCGGA GCA GAATGGTGGTTCAACGACTGAACAAACACATGAACACAATCCAAACGTCGTACCTGATCATAGAGGCAACTTACACA TTA CAGTTAAGAAAACCAAACCAATTTTAGGTATTGCTATCGAAGGTGGTGCTAATACAAAACACCCGCTCCCTAGGATA ATC AATATCCATGAAAATGGTGCAGCATTTGAAGCGGGCGGCTTAGAAGTCGGCCAACTCATCCTGGAGGTAGATGGAAC GAA AGTGGAGGGTCTGCATCATCAGGAGGTTGCTCGACTAATAGCCGAATGCTTTGCTAATCGTGAAAAGGCTGAAATAA CCT TCTTAGTTGTCGAAGCAAAAAAATCAAATTTGGAACCGAAGCCGACGGCGCTGATATTTTTAGAAGCCTAA >Found with A. 
mellifera BB260023A20H5.F CTCTGTTTGAGGGCGTAGTTCCAACAAGTGCTGAGCATCACAATTTTCTATTACTAAGCCCAGCTTTGCGTTGGCGC GCC CCAGAATCTCATTTTATATTTAGTTTCTGCCAGTTTAGTTAATTAGTTAGTTGATAGTGTTGTTTGTTTCTTCTGCA ACA ATTGTGTGCGATAGGAGTCGGGCAAAATGTTCCCGTCGTCGATTTTGGGGCGCAGCTATTTGCTTTTTATGCTGGTG CTC GCCGTGGGCGTGTTCGCCCAACACGAGTGGCAGGCCCGGGATGCCTTTGATGAGATAAAGAGGCAGTTCGACAAGGT GAA CGCGGATAACTGCCCCATCCAACACCATTCGGACCTTTTCATGCCCATGGACGCGGTGTCCCACAAGCCGGACATCA AGG AGATCAACGTGAATCCGGTGTTCCCCAACCGAACTGCCCTGCTGCATCTGCAGAATATGGCCCTTAGCAGAAGCTTC TTC TGGAGCTACATCCTCCAGTCGAGGTTTATTCGACCCGCCATCAACGACACCTACGATCCCGGCATGATGTACTACTT TCT GTCCACCGTAGCCGATGTATCCGCCAACCCACATATCAACGCCTCGGCCGTGTACTTCTCCCCCAACAGCTCGTATT CGT CGTCGTATCGCGGCTTCTTCAATAAGACGTTCCCCAGATTCGGGCCAAGAACCTTCAGGCTGGACGACTTCAACGAT CCC ATTCATCTGCAGAAGATATCGACGTGGAATACTTTCGATGTTCAGGATCTGGGCGCCCATCACCCGGACTCCATATC CAA GGACTACACCCACGACCTGTATAAAATAAACGAGTGGTACCGCGCCTGGCTACCAGACAACGTCGAGGGACGGCACG ATA CGAAGATCACCTACCAGGTGGAAATCCGCTATGCGAACAACACAAACGAGACGTATACCTTCCACGGACCGCCTGGC TCT GAAGAAAACCCTGGTCCGATTAAATTTACAAGGCCGTACTTCGATTGTGGCAGGTCCAACAAGTGGCTGGTGGCCGC AGT AGTGCCAATTGCGGATATCTACCCCCGACACACGCAGTTCCGTCACATTGAGTATCCCAAATACACGGCCGTTTCGG TTC TTGAGATGGACTTCGAGCGTATCGACATAAACCAGTGTCCATTGGGTGAAGGCAACAAAGGACCTAATCACTTTGCG GAT ACGGCGCGGTGTAAAAAAGAAACGACAGAGTGTGAACCATTACAAGGCTGGGGCTTTAGGCGCGGTGGCTACCAGTG CCG TTGTAAGCCAGGTTTTCGGCTGCCCAACGTAGTGCGGCGACCTTATCTGGGCGAGATTGTGGAGCGCGCATCGGCAG AAC AGTACTACAACGAGTACGACTGCCTTAAGATTGGCTGGATCCAAAAGCTTCCCATTCAGTGGGATAAGGCCTCCTAC CAC ATTCGCCAAAAGTATCTGGACCGGCATCCGGAATATCGCAACTACACCACCGGCTCGCGATCACTTCATGCTGAGCA CTT AAATATTGATCAGGCGTTGAAGTATATTCATGGAGTCAACTATCGCACTTGCAAAAACTTCCATCCGCAGGATCTGA TTC TTCGCGGTGATGTGAGCTTCGGCGCCAAGGAGCAGTTCGAGAACGAAGCCAAGATGGCCGTGAGACTGGCCAACTTT ATT AGCGCCTTTCTGCAGAGTATGCAAACTATAACACGAATATCCTCCTTACAGGTATCGGATCCCAACGAAGTGTACTC GGG CAAGCGTGTGGCCGACAAGCCGCTGACCGAGGATCAAATGATCGGCGAGACCCTTGCCATTGTCCTGGGCGACAGCA AGG TTTGGTCGGCCACAATGCTCTGGGAGCGCAACAAGTTTACCAATCGCACATATTTCGCACCCTATGCCTACAAAACT GAG 
CTCAACACAAGAAAGTTCAAGGTGGAGGACCTGGCGCGGCTCAACAAGACGCACGAACTCTACACGGAAAAGAAGTA CTT CAAGTTCCTGAAGCAGCGCTGGAACACCAACTTCGACGACCTGGAGACCTTCTACATGAAGATCAAGATCCGCCACA ATG AAACAGGTGAATACCAGCAGAAGTACGAGCACTACCCAAATTCGTACAGAGCGGCCAACATCAAGCACGGCTACTGG ACT CAACCACAATTCGACTGCGATGGATATGTGAAGAAGTGGCTGGTGACCTATGCGGTGCCCTTCTTCGGCTGGGACAG CCT GAAAGTCAAGCTGGAATTCAAGGGTGTGGTAGCTGTCTCCATGGACATGCTGCAGCTGGACATCAACCAGTGCCCGG ACT GGTACTACGAACCGAACGCCTTTAAGAACACACACAAGTGTGACGAGCAATCGTCCTACTGCGTTCCCATTATGGGT CGT GGCTATGAAACCGGAGGCTACAAGTGCGAGTGCCTGCAGGGATACGAGTATCCTTTCGAGGATCTGATTACCTACTA CGA TGGACAGCTCGTCGAGGCCGAGTACCAAAATATTGTGGCTGATGTCGAGACCCGCTACGATATGTTCAAGTGCCGAC TGG CCGGAGCTTCGGGTCTGCAATCCGCTTTGGGACTTGTGGTCGCTCTGATCGGGCTCACGCTCACCCTGCTGTATAGA TTT AGTTAA >extra2 ATGGTCAAGCAAGTGGATTTTGCGGAGGTGAAGCTCAGTGAGAAATTTCTCGGAGCTGGATCTGGTGGAGCGGTGCG CAA AGCCACCTTTCAAAATCAGGAGATTGCAGTAAAGATATTTGATTTCCTTGAGGAAACAATCAAAAAGAATGCAGAGA GGG AAATCACACATTTGTCGGAGATCGACCACGAAAACGTTATCAGGGTGATCGGGAGGGCCAGCAATGGAAAGAAGGAC TAC TTGTTGATGGAGTACCTGGAGGAGGGGTCCCTCCACAACTACCTCTATGGCGATGACAAGTGGGAGTACACCGTGGA GCA AGCGGTTCGCTGGGCACTCCAATGCGCCAAGGCCTTAGCATACTTGCATTCGTTGGATCGACCGATTGTTCACCGCG ATA TTAAGCCGCAAAACATGCTTTTATATAATCAGCATGAAGACTTAAAGATTTGTGACTTTGGCCTGGCGACGGATATG TCC AATAATAAGACCGATATGCAAGGAACATTGAGGTATATGGCTCCCGAGGCCATTAAGCACTTAAAGTATACGGCTAA GTG TGATGTGTACAGCTTTGGAATAATGCTCTGGGAGCTGATGACACGTCAATTGCCATATAGTCACTTGGAAAACCCCA ACA GCCAGTACGCCATTATGAAAGCTATCAGTTCAGGCGAAAAACTTCCAATGGAAGCAGTAAGATCCGATTGCCCAGAG GGT ATCAAGCAATTAATGGAATGTTGCATGGATATAAATCCCGAAAAGCGCCCCTCTATGAAGGAGATCGAAAAGTTCCT TGG CGAACAGTATGAATCCGGCACTGACGAGGACTTTATCAAGCCTTTGGATGAGGATACCGTGGCTGTGGTGACCTACC ATG TGGATTCGTCCGGCAGCAGGATAATGCGTGTTGATTTCTGGCGACATCAGTTGCCATCGATCCGCATGACTTTTCCG ATA GTGAAACGGGAAGCCGAAAGATTGGGAAAGACCGTTGTCAGAGAAATGGCCAAGGCGGCGGCGGATGGAGATCGGGA AGT TCGGCGGGCTGAGAAGGACACGGAGCGTGAAACCTCGAGGGCTGCCCACAATGGAGAGCGGGAAACGCGGAGAGCGG GTC AGGATGTGGGTCGTGAAACTGTACGGGCGGTCAAGAAAATAGGAAAGAAACTGCGCTTCTAA >Found with A. 
mellifera BB270004B10G5.F CTGGAACAGACCGCATTCAGTGGCTCTTATCGGTAGAAACAGCAGCACTTTTCCGAGATGTCTATCAAATCCTTGAC ATA CGTTGCGATCTTTGGCCTTTTTTGGGGCTCAATTGCGGGAACTGTAGTTGATCAGTTTGGGATATATGGTGGTTCAC CGA TTACCACCACGGAAAGGAGTAATGCGGAGTTGCGCTGCATGAACATCAATCCGCAGAACTCGGTGGACTTGGAGCAG ATG ATGGGACTCTGGTACGGCAGCGAGATTATCGTGCACAGCCAAGATTTTCCGGGCACCTACGAGTACGACTCATGTGT CAT CATTCATCTGACCGATGCCACGGATCAGATCCGTTTGAGCCAAGCAAATCGCGGCTATGGCTATGGAAATCAGGACT ACA ACCGTAACCAGAATAACTATGGACGCACCACCACCACTCAATCCTCCTATCCGGATAGCGATGAGTACCCGTTGAGA TCG ATTCAAAGCCAGCAGAAGTACCTACGTTTGATTTGGAGTGAGCGTGATAACAATCTGGAGTATACTTTCAACTATAC CAC CAGTGCACCTGGTCAGTGGTCCAACATCGGCGATCAGCGGGGATCCTTGGTCACCCTGAACACGTACACCCAGTTCA CGG GCACTGTCCAGGTGGTGAAAGCGGTCAACGATCACCTGGTGCTGACCTTCTGCGGCAACGATGTTAAGAGCTCCATA TAC ACAGTGGTTCTCACCCGCAATCGCCTTGGTCTCAGTTTAGATGAGCTGCGTAGCATCAGGAATCTGCTTTCCCGCCG TGG ACTCTACACGGAGACCATTCGCAAGGTTTGCAATGGATGTGGGCGATTGGGTGGCAGCCTCTTCGCTCTTTTAGCCC TTT TGCTGGTCGTACGTTTGGCCTGGGGGCGTGGCCAGTGA >Found with A. mellifera BB270012B20H7.F GACACAATGAACTCACAAAAAGAATACGTATCGGACTGCGAAACCGACGATGATTATTATGTCGATTTGTTAACTTC AGG CAAGGGCAGTGATAAGAGTGAAAGTGATGTGTCGGACAAGTCTGAAAATTATCCAGGCCTAAAATCAAAGCATACTG CGA AGGCATTGCGGAAAACAAGGCATTGTGACGGCGATAATAGGGAATACAGGTCTAAGGAGTGCGACGACCTTCATTCC GAA GAGGAGTCTGAAAAATCGCGGTCGGATGCTTTATGGGCCGATTTTCTTGGCGACATTGATACTAAAAGCGTAATCAA CCA AAAAACAGATTATACGGAGGGAAACGCAGCAAGTGCTACCAATACCAATACGCATGAGACTTGTAATAAATATGATA AAA ACGATACGGCAATAATAAAAACTGCACAGCAATACGATTCCAAAAGAACCACGCTTTCAGTTTCCACACTCGGAAAA ATT AAACGATCATCCGCTGAAAAGAGTATCGGTACCATGATAAATAAATTTGAAAAGAAGAAAAAATTGACAGTGCTTGA AAG GTCACAATTGGATTGGAAAATATTTAAACAAGACGAAGGCATAGACGAACTTCTGTGCTCGCATAACAAAGGCAAGG ACG GGTATTTGGACCGTCAAGACTTTTTGGAGAGAACCGATCTTAGGCAGTTTGAAATGGAAAAGAAGTTGCGGCTGTCT CGC AGGCCATACTAA >Found with A. 
mellifera BB270013A20H11.F CTTCATTTAGGCTGGTTAGGTGGTTAATTCCATTTGTCTTCGTTCTTTTGTATTATTTTTACAAAGCGATAATATTT TAA TCGTTTATGATTATTACAATATAACAAAAAGTTAACATCTTTGGAATCTTAAAAATGAGTTTTCATTTTGCTGTACT GAC CCTTATTTTAACAGCCTTCACAGTTTCTCTGTGTGCTGAACAAAAAATTACAAAGAGTGACGCAGGTGAAATACGAA TTT TCAAACGTCTTATTCCTGCCGATGTTCTACGAGATTTTCCGGGAATGTGCTTTGCTTCAACTCGATGTGCCACTGTT GAG CCTGGAAAGTCGTGGGACCTTACTCCATTCTGCGGTCGATCTACTTGTGTTCAAAATGAGGAAAATGATGCAAAGCT ATT CGAACTCGTAGAAGACTGCGGCCCATTGCCACTGGCGAATGACAAATGTAAATTGGACACAGAGAAGACTAATAAAA CCG CATCGTTTCCTTATTGCTGCCCCATCTTTACATGTGACCCCGGTGTTAAATTGGAATACCCCGAGATCGGAAAGGAT AAT GACAAAAAGAATTCTGAGTGA >Found with A. mellifera BB270028A10H8.F ATGAAAAATGACAGCTGTTCGCTGCGAATGGCTATTTATGTTTGTTTTGACTCGGCTTTCGAATATATCGCAAAATA TAT ACAGGAAACATTTATATTCACAAAAATCTGTACGATGCACCCAGGGCAAATTGAGGTCAACGAGATCAATGGCTATT GGA CATTTCTGCTGAGCATCGATTGGAAGGATCCCTGGCTTATTGGCCTTATTTTGGCGCATATCTTAACCACCACCACT GCG CTGCTCAGCCGGAACAGCTCCAACTTCCAGGTTTTCCTCTTCCTAGTACTGTTGCTGGCAGTCTACTTCACCGAAAG CAT CAATGAGTTCGCTGCTAACAACTGGAGTTCCTTTTCCAGACAACAATACTTCGATAGCAACGGCCTGTTTATCTCGA CAG TTTTCTCAATACCTATTTTGCTTAATTGCATGCTTTTGATTGGCACTTGGCTCTACAACTCCACGCAGCTGATGGTG ACT CTAAAAACAGCGCAGCTCAAGGAGCGAGCTCGCAAGGAACGCCAGACTAAGGCGGATTCGGAATCCATAGCACATAA AAA GGCAGAGTAG >Found with R. suavis J3-A2 MQGLGLQSLKKNPALIPLYVCVGAGAIGAVYYMARLATRNPDVTWNRTSNPEPWQEYKEKQYKFYSPVRDYSKTKSA APN FDE >Extra1 MLNLNLLDCIVPEISTFIQTDGQRQIDSDIDPDREYIGFIW >Found with R. suavis J3-D1 MYIYFPTIFLLFLYPVVAVVPQGFTIKQPKCWYVANPGPCDDFVKVWGYDYLTNRCIFFYYGGCGGNPNRFYTKEEC LKT CRVYRPPNHVCLLPIWATAIKSNRLKQFESYPDYATYIFLQTPWVIYQQFYVDSVAILTIFDMQFAIFHLLQPYFGC GIW HFSAGCNKFWQRMQMRLMMR >Found with R. suavis J3-A7 MIEISDLQKIGIGLAGFGIFFLFLGMLLLFDKGLLAIGNILFISGLACVIGVERTMRFFFQRHKVKGTTAFLGGIVI VLL GFPIFGMIIESYGFFALFSGFFPVAINFLGRVPVLGSLFNLPFIQKIVQKLGGDGNRTTV >Found with R. suavis J3-B3 MDARKFSTHILDTSVGKAAANVRVTVSRLDEIQEWRSLRAAQTDADGRCLLLEPGQFPGGIYKLTFHVGAYYAERNV RTL YPAIDLIVDCSENQNYHIPLLLNPFGYSTYRGT >Found with A. 
mellifera Contig1312 MDISKAPNPRKLELCRKYFFAGFAFLPFVWAINVCWFFTEAFHKPPFSEQSQIKRYVIYSAVGTLFWLIVLTAWIII FQT NRTAWGATADYMSFIIPLGSA >Found with A. mellifera Contig1481 MIRKVPLIVVLGSTGTGKTKLSLQLAERFGGEIISADSMQVYTHLDIATAKATKEEQSRARHHLLDVATPAEPFTVT HFR NAALPIVERLLAKDTSPIVVGGTNYYIESLLWDILVDSDVKPDEGKHSGEHLKDAELNALSTLELHQHLAKIDAGSA NRI HPNNRRKIIRAIEVYQSTGQTLSQMLAEQRAQPGGNRLGGPLRYPHIVLLWLRCQQDVLNERLDSRVDGMLAQGLLP ELR QFHNAHHATTVQAYTSGVLQTIGYKEFIPYLIKYDQQQDEKIEEYLKTHSYKLPGPEKLKEEGLPDGLELLRNCCEE LKL VTRRYSKKQLKWINNRFLASKDRQVPDLYELDTSDVSAWQVAVYKRAETIIESYRNEEACEILPMAKREHPGADLDE ETS HFCQICERHFVGEYQWGLHMKSNKHKRRKEGQRKRQRDHETMLSTDLAKKQKEEKEEAGKAETQPPPSRVNDTDKAM >Found with A. mellifera Contig2709 MSDNFSRTPYSDGHAATHEEASKPHYTTTTSSFSRTPVSPYLNYDSRYLQQAQPEFIFPEGANKQRGRFELAFSQIG TSV MIGGGIGGLAGVYNGLKVTKALEQKGKVRRTQLLNHIMKQGSGTANTLGTLTVLYSACGVLLQFFRGEDDHINTVIA GSA TGLLYKSTAGLRTCAFGGAIGLGISSLYCLYLIAQENSSNSSPKYL >Found with A. mellifera BB260003B20H2.F MVDFFEKLRRGHTFIYIEHMMGTPELKIILEFSAGAELLFGNIKRRELNLDGKQKWTIANLLKWMHANILTERPELF LQG DTVRPGILVLINDTDWELLGELDYELQPNDNVLFISTLHGG >Found with A. mellifera BB260004B10A11.F MEKSEIRLQRMSNEYQSQSSYMYLRTKMLLKIENTLLRSHRQRETTGIKKLYNSFFVLF >Found with A. mellifera BB260010A20C3.F MGRFKLCASPREVMKYEDFIKRIRKSLYYGVGTPDTEMSVSLPFAEYAADLFSETHRGHSLHRLSCVSAAQVHATPC SLI MALIYLDRLNVIDSGYSCRITPQQLFVVSLMISTKFYAGHDERFYLEDWASDACMTEDRLKAVELEFLSAMGWNIYI SNE LFFDKLRNVERSLAEQQGLRRGWLTYSELVQLLPSLEWTKFLVNSLSVLSLSYAASIITLAGAFFIASQVPGTLWHR DVE TASDFTMTISSQVSVSNALESTPFINVQVSSLLRKTSNVNVELMNLEKTSCARARLNKIEYKHPRHQSVPTLSFIST CPQ LDLLYAQDGTRNWLNIKSPNSDYKNNRNLSITVRSVQLEEQKAENDSVIWQANTEAMQ >Found with A. mellifera BB260019B20F2.F MKEEGGTLLGDKGVRRHQSMQRLSAEQNGGSTTEQTHEHNPNVVPDHRGNLHITVKKTKPILGIAIEGGANTKHPLP RII NIHENGAAFEAGGLEVGQLILEVDGTKVEGLHHQEVARLIAECFANREKAEITFLVVEAKKSNLEPKPTALIFLEA >Found with A. 
mellifera BB260023A20H5.F MFPSSILGRSYLLFMLVLAVGVFAQHEWQARDAFDEIKRQFDKVNADNCPIQHHSDLFMPMDAVSHKPDIKEINVNP VFP NRTALLHLQNMALSRSFFWSYILQSRFIRPAINDTYDPGMMYYFLSTVADVSANPHINASAVYFSPNSSYSSSYRGF FNK TFPRFGPRTFRLDDFNDPIHLQKISTWNTFDVQDLGAHHPDSISKDYTHDLYKINEWYRAWLPDNVEGRHDTKITYQ VEI RYANNTNETYTFHGPPGSEENPGPIKFTRPYFDCGRSNKWLVAAVVPIADIYPRHTQFRHIEYPKYTAVSVLEMDFE RID INQCPLGEGNKGPNHFADTARCKKETTECEPLQGWGFRRGGYQCRCKPGFRLPNVVRRPYLGEIVERASAEQYYNEY DCL KIGWIQKLPIQWDKASYHIRQKYLDRHPEYRNYTTGSRSLHAEHLNIDQALKYIHGVNYRTCKNFHPQDLILRGDVS FGA KEQFENEAKMAVRLANFISAFLQSMQTITRISSLQVSDPNEVYSGKRVADKPLTEDQMIGETLAIVLGDSKVWSATM LWE RNKFTNRTYFAPYAYKTELNTRKFKVEDLARLNKTHELYTEKKYFKFLKQRWNTNFDDLETFYMKIKIRHNETGEYQ QKY EHYPNSYRAANIKHGYWTQPQFDCDGYVKKWLVTYAVPFFGWDSLKVKLEFKGVVAVSMDMLQLDINQCPDWYYEPN AFK NTHKCDEQSSYCVPIMGRGYETGGYKCECLQGYEYPFEDLITYYDGQLVEAEYQNIVADVETRYDMFKCRLAGASGL QSA LGLVVALIGLTLTLLYRFS >extra2 MVKQVDFAEVKLSEKFLGAGSGGAVRKATFQNQEIAVKIFDFLEETIKKNAEREITHLSEIDHENVIRVIGRASNGK KDY LLMEYLEEGSLHNYLYGDDKWEYTVEQAVRWALQCAKALAYLHSLDRPIVHRDIKPQNMLLYNQHEDLKICDFGLAT DMS NNKTDMQGTLRYMAPEAIKHLKYTAKCDVYSFGIMLWELMTRQLPYSHLENPNSQYAIMKAISSGEKLPMEAVRSDC PEG IKQLMECCMDINPEKRPSMKEIEKFLGEQYESGTDEDFIKPLDEDTVAVVTYHVDSSGSRIMRVDFWRHQLPSIRMT FPI VKREAERLGKTVVREMAKAAADGDREVRRAEKDTERETSRAAHNGERETRRAGQDVGRETVRAVKKIGKKLRF >Found with A. mellifera BB270004B10G5.F MSIKSLTYVAIFGLFWGSIAGTVVDQFGIYGGSPITTTERSNAELRCMNINPQNSVDLEQMMGLWYGSEIIVHSQDF PGT YEYDSCVIIHLTDATDQIRLSQANRGYGYGNQDYNRNQNNYGRTTTTQSSYPDSDEYPLRSIQSQQKYLRLIWSERD NNL EYTFNYTTSAPGQWSNIGDQRGSLVTLNTYTQFTGTVQVVKAVNDHLVLTFCGNDVKSSIYTVVLTRNRLGLSLDEL RSI RNLLSRRGLYTETIRKVCNGCGRLGGSLFALLALLLVVRLAWGRGQ >Found with A. mellifera BB270012B20H7.F MNSQKEYVSDCETDDDYYVDLLTSGKGSDKSESDVSDKSENYPGLKSKHTAKALRKTRHCDGDNREYRSKECDDLHS EEE SEKSRSDALWADFLGDIDTKSVINQKTDYTEGNAASATNTNTHETCNKYDKNDTAIIKTAQQYDSKRTTLSVSTLGK IKR SSAEKSIGTMINKFEKKKKLTVLERSQLDWKIFKQDEGIDELLCSHNKGKDGYLDRQDFLERTDLRQFEMEKKLRLS RRP Y >Found with A. 
mellifera BB270013A20H11.F MSFHFAVLTLILTAFTVSLCAEQKITKSDAGEIRIFKRLIPADVLRDFPGMCFASTRCATVEPGKSWDLTPFCGRST CVQ NEENDAKLFELVEDCGPLPLANDKCKLDTEKTNKTASFPYCCPIFTCDPGVKLEYPEIGKDNDKKNSE >Found with A. mellifera BB270028A10H8.F MKNDSCSLRMAIYVCFDSAFEYIAKYIQETFIFTKICTMHPGQIEVNEINGYWTFLLSIDWKDPWLIGLILAHILTT TTA LLSRNSSNFQVFLFLVLLLAVYFTESINEFAANNWSSFSRQQYFDSNGLFISTVFSIPILLNCMLLIGTWLYNSTQL MVT LKTAQLKERARKERQTKADSESIAHKKAE \----------------------------------------------------------------------------- \--
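Because the mRNA and protein records in the 'New Genes FASTA' file share their headers, one way to sanity-check the file is to confirm that each protein occurs in a forward-frame translation of the mRNA with the same header. The following is a minimal sketch only and is not part of the original files. It assumes a conventional FASTA layout (each header on its own line starting with '>', sequence on the following lines) and the standard genetic code. The function names (read_fasta, translate_frame, encodes, check_pairs) are hypothetical.

```python
# Sketch only: assumes conventional FASTA layout and the standard genetic code.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
# Standard codon table built in TCAG order; '*' marks a stop codon.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def read_fasta(text):
    """Return (header, sequence) pairs in file order, keeping duplicate headers."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            # Drop internal spaces and normalise case (some stops are lowercase).
            records[-1][1] += "".join(line.split()).upper()
    return [(header, seq) for header, seq in records]


def translate_frame(seq, frame):
    """Translate one forward reading frame; unknown codons (e.g. with N) give 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def encodes(mrna, protein):
    """True if the protein appears in any of the three forward frames."""
    return any(protein in translate_frame(mrna, f) for f in range(3))


def check_pairs(records):
    """Pair each protein with the earlier nucleotide record of the same header."""
    mrnas, results = {}, {}
    for header, seq in records:
        if set(seq) <= set("ACGTN"):      # looks like nucleotide sequence
            mrnas[header] = seq
        elif header in mrnas:             # protein record for a known mRNA
            results[header] = encodes(mrnas[header], seq.rstrip("*Z"))
    return results


if __name__ == "__main__":
    toy = ">toy\nATGCTTAAT\nCTCAACtga\n>toy\nMLNLN\n"  # made-up record, not from the file
    print(check_pairs(read_fasta(toy)))
```

Searching all three forward frames, instead of translating from the first ATG, allows for records that carry some 5' untranslated sequence before the start codon. A record that comes back False would be worth re-examining by hand before drawing any conclusion.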