FB2026_01, released March 12, 2026
Reference Report
Reference
Citation
Robertson, H.M. (2001.1.8). FlyBase error report on Mon Jan 8 14:31:14 2001.
FlyBase ID
FBrf0133242
Publication Type
Personal communication to FlyBase
Abstract
PubMed ID
PubMed Central ID
Text of Personal Communication
Subject: Re: more annotations
Sima,
.
Let me make sure you understand what I have done. These are the results from
a small 600 EST project on a tephritid fruit fly, Rhagoletis suavis, conducted
by myself and Stewart Berlocher, and the large 20,000 honey bee EST project of
Gene Robinson's lab described in my email to Michael. The former yielded four
unannotated genes, and the latter about twelve, but many more suggestions for
improvements of existing annotations (it turned out that my previous estimate
of tens to a hundred unannotated genes was exaggerated when I went back
carefully and tried to reconstruct each of them). We did BLASTX against
Release 1 Drosophila proteins at E-05 and TBLASTX against the genome at E-06;
for the +/-9000 honey bee contigs and singletons this yielded about 150 TBLASTX
hits without corresponding BLASTX hits. I've examined all of these carefully,
and the ones I am sending you are the 50 or so that look interesting (others
are proteins only in release 2, or artifacts). In addition I was sometimes
curious about large neighboring unannotated regions in the genome and note
some interesting unannotated genes in them.
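The filtering step described above (keep queries that hit the genome by TBLASTX but have no BLASTX hit against the annotated proteins) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the (query, subject, e-value) row layout, the function names, and the sample identifiers are all hypothetical; only the E-05 and E-06 cutoffs come from the message.

```python
def significant_queries(rows, max_evalue):
    """Return the set of query IDs with at least one hit at or below max_evalue."""
    return {query for query, _subject, evalue in rows if evalue <= max_evalue}


def tblastx_only(blastx_rows, tblastx_rows, blastx_cutoff=1e-5, tblastx_cutoff=1e-6):
    """Queries with a significant TBLASTX genome hit but no significant BLASTX protein hit."""
    with_protein_hit = significant_queries(blastx_rows, blastx_cutoff)
    with_genome_hit = significant_queries(tblastx_rows, tblastx_cutoff)
    return sorted(with_genome_hit - with_protein_hit)


# Hypothetical rows: (query ID, subject ID, e-value). Not real search results.
blastx = [("ContigA", "CG0001", 1e-20), ("ContigB", "CG0002", 1e-3)]
tblastx = [
    ("ContigA", "scaffold1", 1e-25),
    ("ContigB", "scaffold2", 1e-8),
    ("ContigC", "scaffold3", 1e-2),
]

candidates = tblastx_only(blastx, tblastx)
print(candidates)
```

Each candidate still needs the manual inspection described in the message, since hits of this kind also include proteins annotated only in a later release and artifacts.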
The two files are both text files, however the New Drosophila Genes file is in
PAUP format and hence can only usefully be viewed with that program (I've
attached an old Mac version if you don't have it). Each annotation suggestion
is separated by a series of asterisks, following which is the name of the EST
or contig and its sequence, so you can do the searches for yourself to see
what I saw.
The mRNA and protein FASTA sequences are together in the second file.
I'm considering doing something similar for the other available insect EST
projects, that is, the 15,000 Bombyx mori, 6000 Anopheles gambiae, and 1300
Aedes aegypti, as well as our own 1200 Manduca sexta ESTs, and perhaps the
17,000 Anopheles gambiae GSS sequences available. I wonder if you've already
heard from the Bombyx mori and mosquito folks in this regard?
Hugh
.
Hugh M. Robertson
Professor
Department of Entomology
University of Illinois at Urbana-Champaign
------------------------------------------------------------------------------
'New Drosophila Genes' file
<up>FlyBase curator comment: original PAUP version of this 'New Drosophila
Genes' file is archived. What follows is a version containing the text
of the file with the genomic and EST sequences that are shown aligned
in the original file but does not show the actual alignments between
these sequences.</up>
New genes in the Drosophila genome identified by TBLASTX searches with tephritid
fly and honey bee ESTs that don't match Drosophila proteins by BLASTX
Turns out most are really just problems with the current Drosophila annotation,
where N and C-termini are not annotated
********R. suavis J3-A2
CAAAGTTTAAAAAAAAATCCAGCCTTAATTCCATTATATGTATGTGTTGGGGTTGGAGCGCTCGGTGCAGTTTTCTA
TACTTTGCGCTTAGCTACACGAAACCCTGATGTAACATGGAATCGTTCATCGAATCCTGAACCTTGGCAAGAATACA
AAGACAAACAATACAAGTTCTATTCACCGATAAGGGATTACAGCAACATTAAATCCCCAGCCCCTAAATATGAGGAA
TAATTTAATAAAGATGTATCTTTCGGCTAGAACATTTCGTTAGCTAATGAAATTAAAAAAAAAAAAAAAAAAA
Matches human and other organism NADH-UBIQUINONE OXIDOREDUCTASE MLRQ SUBUNIT
(COMPLEX I-MLRQ) at 52% over length of short 80aa protein
TBLASTX match to Drosophila genome is an unannotated region of a small messy
scaffold 142000013386032
AE002656.1
176729-ATATAGTTCCATTCTGTTTTATTGGATTGAGTAAAGTTAGgtaattttttataaatgtgtttctttctta
catattaaatacattttaattataacgtagACAAAATGCAAGGTCTTGGTCTGCAAAGTCTTAAAAAAAATCCAGCT
gtaagttttatgattatattaagttatctgttgattgcataacaaaaatttgttcagTTAATTCCACTTTATGTGTG
CGTTGGAGCGGGAGCTATTGGAGCCGTCTACTATATGGCTCGACTTGCTACTCGTAATCCCGATGTCACTTGGAATC
GCACATCAAATCCCGAACCATGGCAAGAGTACAAAGAAAAGCAATACAAGgttagatttgctccttaatattcatat
tctattttcatattttaatgggccaagaaaagccctcgaccgtggttttgttaataaccaaatttgcttatagatat
atgtatttttgcaattttttacttttttggcgatttaaccgacatgccataaccattacacatatatatttcacgaa
tatatactatgttgtagttgtagttatacccgttactcgtagagtaagatgttatactagattcattgaaaagtatg
taataggtagaaggcagagttttcaccatatgaagcctatatattctcgatcaggatcaatatccgagtcgatctga
cgctgtccgtccgtctgtatgaatgtcgagatctcaggaactatacaatctagaaggttgagattaagcatacagat
tctagagaaatacccgcagcgcaagtttgttggcctatgttgccacgcccacaaatcttcaaaaactgccacatttt
ttcacatttttattagttttttaaattttgattgatttcccaaaaattttattcaccaatatctatcgatatcccag
acaaatcatgaaatttcgctatggcattttaactagctgaataacgggtatctgatagtcgaggaagtcgactattt
tgtacagtgatactttttttcacttttattatttgtctcgcaaaacaattcccaataaccgtaaacaccgcctgtta
acaatctctaaacggagatatttgcgtatatcaactagctaataacaaattattaaaacaccatactcaccattctt
tatggcagaaaaaatgtatatatatatatatatatgggtaatttaggtaaataggtaaaataagtgaaattaagaaa
tgatggggtgtcattagtgcgtattaaggaataattcaacccttaaataaaaaggaagagggatatgcttcaatatg
tagtgattgcgacttaaacgccaatcaaaccctttaattttatcaattttataataatctagttatttaaaactatg
aaaaaaaaataaagatcacaaaaatgttctaacatttaatattttatagcaattcggacgcaatgcgggagaactta
tcatttattttgtcgacagtgactaaccagaaatgtagggatcgccagagatttacacagggtctaaaacgttcgct
tcagataaatttctatcaaattctatcacttcactagaatagtagattctttaaaatgcatgcaaccagttaatgga
tgccttcctgaccttaaaaattatacatattcttgaaccgagatgaacctcgtaaacctccagggaactattgaagc
tagaaagttgagactaagcatgtgtatttaagggtcaactacgcagtgaaggtacataagttccattatattaccca
caagccccgcaaacactcactgttacttaataatgttttaatttttttttgtttttgcctttcaaatttcttttgct
aggtaaacaattgtttgcgtaagtaatgtaaaactgattgcgatcgttcagtgtcagtttactaaaatctaaaaata
tttaaatctcttaattgctggtggacttataggtttactgtaattcgtgttttttagatgaaatatataatattata
atataatataataatataatataataatataatataatataataatataatataatataatataatataataatata
atataatataatataatataatatataatatattgaatcttgcattttttttcagTTTTATTCGCCTGTGAGGGATT
ATTCCAAAACTAAGAGTGCTGCCCCAAACTTTGATGAATAAATTACGTTTCCCTAGCAGCTGCAAT
GH04411.5prime
TTAG------------------------------------------------------------ACAAAATGCAAGG
TCTTGGTCTGCAAAGTCTTAAAAAAAATCCAGCT-------------------------------------------
--------------TTAATTCCACTTTATGTGTGCGTTGGAGCGGGAGCTATTGGAGCCGTCTACTATATGGCTCGA
CTTGCTACTCGTAATCCCGATGTCACTTGGAATCGCACATCAAATCCCGAACCATGGCAAGAGTACAAAGAAAAGCA
ATACAAG----------------------------------------------------------------------
-----------------------------------------------------------------------------
------------TTTTATTCGCCTGTGAGGGATTATTCCAAAACTAAGAGTGCTGCCCCAAACTTTGATGAATAAAT
TACGTTTCCCTAGCAGCTGCAATTTAAAAATGTAAAATGAAATAACTTCAAATTATAAATAAACAT
GM01687.5prime
ATATAGTTCCATTCTGTTTTATTGGATTGAGTAAAGTTAG-------------------------------------
-----------------------ACAAAATGCAAGGTCTTGGTCTGCAAAGTCTTAAAAAAAATCCAGCT-------
--------------------------------------------------TTAATTCCACTTTATGTGTGCGTTGGA
GCGGGA-CTATTGGAGCCGTCTACTATATGGCTCGACTTGCTACTCGTAATCCCGATGTCACTTGGAATCGCACATC
AAATCCCGAACCATGGCAAGAGTACAAAGAAAAGCAATACAAG----------------------------------
-----------------------------------------------------------------------------
------------------------------------------------TTTTATTCGCCTGTGAGGGATTATTCCAA
AACTAAGAGTGCTGCCCCAAACTTTGATGAATAAATTACGTTTCCCTAGCAGCTGCAATTTAAAAA
translation
M Q G L G L Q S L K K N P A --------------------------0---------
--------------------- L I P L Y V C V G A G A I G A V Y Y M
A R L A T R N P D V T W N R T S N P E P W Q E Y K
E K Q Y K ------------0----very long
F Y S P V R D Y S K T K S A A P N F D
E Z
So, translation is correct, encoding 83aa protein with 56% almost colinear
match to full-length of mammalian proteins
Drosophila ESTs indicate that there is an intron in the 5' UTR, no surprise
there.
All splice sites are predicted, except intron 2 acceptor; but there are
several good acceptors within the long intron that are ignored.
MQGLGLQSLKKNPALIPLYVCVGAGAIGAVYYMARLATRNPDVTWNRTSNPEPWQEYKEK
QYKFYSPVRDYSKTKSAAPNFDE
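The translation check for this entry can be reproduced with a few lines of code. The sketch below is illustrative only and is not part of the original analysis: it builds the standard genetic code and translates the coding part of the first exon shown in the genomic listing above (from the ATG through the splice donor). The function and variable names are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence in frame 0; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


# Coding part of the first exon, written codon by codon for readability.
first_exon = "ATG CAA GGT CTT GGT CTG CAA AGT CTT AAA AAA AAT CCA GCT".replace(" ", "")

print(translate(first_exon))  # MQGLGLQSLKKNPA
```

The output matches the first 14 residues of the protein given above. Translating the full gene would require concatenating all coding exons before calling `translate`.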
*********an extra one!
The long intron above has several potential ORF-like regions, and
indeed one matches a short predicted protein, CG13301;
Score = 45.1 bits (105), Expect = 0.003 Identities = 24/36 (66%), Positives =
27/36 (74%)
So is on reverse strand, about 400bp in:
AE002656.1
ATGCTTAATCTCAACCTTCTAGATTGTATAGTTCCTGAGATCTCGACATTCATACAGACGGACGGACAGCGTCAGAT
CGACTCGGATATTGATCCTGATCGAGAATATATAGGCTTCATATGGTGAaaactctgccttctacctattacatact
tttcaatgaatctagtataacatcttactctacgagtaacgggtataactacaactacaacatagtatatattcgtg
aaatatatatgtgtaatggttatggcatgtcggttaaatcgccaaaaaagtaaaaaattgcaaaaatacatatatct
ataagcaaatttggttattaacaaaaccacggtcgagggcttttcttggcccattaaaatatgaaaatagaatatga
atattaaggagcaaatctaac
translation
M L N L N L L D C I V P E I S T F I Q T D G Q R
Q I D S D I D P D R E Y I G F I W *
No EST matches for this or CG13301, however there is another something
out there that is similar.
Lots of room for a long 5' UTR, and a possible large intron in it.
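Because this ORF lies on the reverse strand, its sequence is the reverse complement of part of the long intron in the forward-strand listing of the previous entry. A minimal sketch of that relationship, with illustrative function and variable names, using the first 36 bases of the ORF shown above:

```python
# Map each base to its complement, preserving the exon/intron letter case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# First 36 bases of the reverse-strand ORF (ATG start included).
orf_start = "ATGCTTAATCTCAACCTTCTAGATTGTATAGTTCCT"

forward = reverse_complement(orf_start)
print(forward)  # AGGAACTATACAATCTAGAAGGTTGAGATTAAGCAT
```

The printed string occurs, in lowercase, within the long intron of the forward-strand genomic sequence given for the J3-A2 gene.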
*********R. suavis J3-D1
AGCAGTTCTTCTATGGTGGCTGTCAAGGAAATGATAATCGCTTTGACACCA
AGGAAGAGTGTGAGAAAACATGCCTTTAAGTGATGCGATCAAGTTTTATTGATAAATTATATCGATAAATTTATCGT
CGTTGGATTGAAGTCGAGGCATTTAGTTGTTGATGGCCTTTTGTAAATATGTGTTGTACCTTATGAAAATGTATTTG
TTAAAAAATTATTGTATTATCTCCGTTAAAAACTGTGCATACTAAAATATAATTTATAAAATTTTAAAAAAAAAAAA
AAAAAA
Matches various Kunitz domains; best Drosophila genome match is an
unannotated region of about 22kb!
AE003623.1
aaaccactatggtggttcgcatcgagctttgccgtggttgttgttcgatgattttgggtatagcacacggcatgaag
ttatatcgagacttaattaagtgccgggtccagaggacctaattcaactgggtcctggagctggaatggaaaatgat
tgtacggcggttactaacaaattctaacggatatgggttttgtgtaacaaactgctagttgatttttgggttttaca
tttaaatggaaaggaaaaaaaaaagaaaacaaatttaggtgctggaaaaacgtattttaagcaccactctgtgtaac
ccaatattttctatggctttatctacatctcacacctgaaaatgattaatttaaaaatttcaatgtacttcgatata
atctaggcgttcctcatcctccgctaaatatcaccattaatgtgcaataaagaattatgtagagacagttaataccc
gcgagacaaaaagcggctggcttatcttagccaccaccccgaaattcgcgtgcaaaacaattttcaactcgcttcct
tttggctgacgacgaaattttcgggatataccccgcctattcaggcatttcccccacattctgtctgccaatttacg
ttatcgttttcaaattgaaaatttgcgtaatcacattttaagtgaaaatcgctttgcaaattaaattaaatcgattt
tttcaagacgataaaatcgggctgagtgaaaatcgttctaccaaaagttgcagggcacgtaaatatacattatggcc
ttctttttatttccgtttccggtttggcattagagctttaagctctgaaatatgcatctgcagttttactttggtgt
ctgcatagtcgcgacattgcatctcaaagacccatcaacgtcatcaaatcgttagtcaaatacgagaaaagaaaaaa
tatttaagttctttgctcttcgcgcaagtctctcgtggcaataaaaatgatgaagtactttgaaagtactagtagga
ctgggctggaaaaataagagaagaagttagaaaagcttgcattcctggcatatcctttagtttttatgcatacccaa
caatgaaaaaaagggtacattgagatttcgaacgatttgaatgattaagagtagttttgattcttatgatttaaaat
aaataatagataaataaatttaaaaaaatcatttctagtaattcagctgatgtgaaatattaaaatatatttcacat
attattctgtcatactagccttgaactttcattaaatgtaaacaaaacaatttataacttgtgtccgtaattttcaa
atatttttatcATGTATATTTATTTTCCAACAATCTTTCTCTTATTTTTGTATCCAGTAGTAGCAGTTGTCCCTCAA
GGATTTACAATTAAACAACgtaggtcttagaagaaccaaaaattgcatatgaaatttaaaattgtatatattcttag
CAAAATGCTGGTATGTGGCAAACCCTGGACCCTGTGATGATTTTGTAAAAGTCTGGGGCTACGATTATTTGACTAAT
CGTTGCATTTTCTTTTATTATGGAGGCTGTGGTGGAAATCCAAATCGATTTTATACGAAAGAGGAGTGCTTGAAAAC
ATGCCGTGTGTACAGACCTCCAAATCgtaagaaaagggaagaaaatttggacgaagaagaggaggaagagtttgagg
aagatattgacaactgggacaaatgggacagcgaatgggacaggatggatctatgaccatatcaatactaagggtat
tcaagacccgacatgccgaatagacattggcttcaagttaacttttgattcgttgtgcaaacgcataatttcagttt
gaaagaaagaacttcccgtcgagttggttgtaatgcgttttcttttagccgtctccactgcattttccatcttactt
acgacggatatatatatatatccctctagACGTCTGTTTGCTGCCAATCTGGGCGACGGCCATTAAGTCAAACCGTT
TGAAGCAATTTGAAAGCTACCCAGACTATGCAACATATATATTTTTACAGACTCCCTGGGTTATTTATCAACAATTT
TATGTGGATAGCGTTGCGATTCTGACAATTTTTGACATGCAATTTGCCATTTTCCATCTGCTTCAGCCGTATTTTGG
GTGTGGAATTTGGCATTTTTCCGCAGGCTGCAATAAGTTTTGGCAGCGAATGCAGATGAGGCTCAT
translation
M Y I Y F P T I F L L F L Y P V V A V V P Q G F T I
K Q --------------------------2-------------------------------P K C W
Y V A N P G P C D D F V K V W G Y D Y L T N R C I F
F Y Y G G C G G N P N R F Y T K E E C L K T C R V Y
R P P N -----1---------------------------------------------------------
-----------------------------------------------------------------------------
-----------------H V C L L P I W A T A I K S N R L K Q F
E S Y P D Y A T Y I F L Q T P W V I Y Q Q F Y V D S
V A I L T I F D M Q F A I F H L L Q P Y F G C G I W
H F S A G C N K F W Q R M Q M R L M M R Z
Can reconstruct a nice small gene with two introns, encoding 180aa
protein with a single Kunitz domain in it.
MYIYFPTIFLLFLYPVVAVVPQGFTIKQPKCWYVANPGPCDDFVKVWGYDYLTNRCIFFYYGGCGGNPNRFYTKEEC
LKTCRVYRPPNHVCLLPIWATAIKSNRLKQFESYPDYATYIFLQTPWVIYQQFYVDSVAILTIFDMQFAIFHLLQPY
FGCGIWHFSAGCNKFWQRMQMRLMMRZ
Kunitz domain C G C WYYD C F YGGC GN N F T C
E C
Best BLASTP match is to PROTEASE INHIBITOR CARRAPATIN, a 69aa protein
from ticks; 53% over 53aa.
Neighboring genes are TOLL-4 and Or30a!
No other BLASTX matches for this 3000bp region, although the entire
region between the neighbouring genes is 22,000bp, so could be others
in it, or long 5' UTR introns.
No ESTs to help solve this one, even from the entire 22kb!
***********R. suavis J3-A7
ATCAGTTTACGTGGAAGCAATCAAATATTTGCCGAAAATAAGCGTTGATGTTTTAACTAATTACAGTCATTTTGATT
GTTTCCAAACAAACAAAAAACAGCGAAATGATTGAAGTTACAGATTTACAAAAAATCGGTATTGGGCCTGGCCTGGA
TTTGGTGTCTCATTCCTATTTTTGGGTGTTCTATTTCTGTTCGACAAAGGATTGCCTGGCCATAGGAAACTTGCTCT
TCCTAAGTGGTCTGGCATGTGTAATTGGAGTTCAACGAACATTCAGGTTTTTCTTTCAACGGCACAAAGTTAAAGGT
ACCACAGCATTTTTTGGCGGAATTTTTGTGGTTCTACTGGGATTTCCAATGATCGGAATGGTTATTGAGTTGTATGG
ATTTTTTGTGCTGTTCAGCGGATTTTTCCCCGTGGCGGTAAATTTCTTGGGGCGAGTACCGGTGCTGGGATCAATTT
TGAATACTCCATTGATTCAAAAGCTGGTACAAAAACTCGGTGGGGACGCGAATCGAACAACGGTATAGTAGAGATCC
AAACAAACTTTCAACTAAGGATAATTTTAATTTAGTTTATTCCTCCGTGAAATTAAACTTAAAGCGCGTTTTTGTAC
AAAAATACATAGATTCCACTTTCACATGA
Encodes complete protein with excellent matches to 180aa proteins from
all eukaryotes; 50% identity to CGI-141 protein <up>Homo sapiens</up>
AE003501.2 RC
atacgtattaaatggtaaaaatagttttatttaatacattttatcatttatttgcatccaaatggactgaatccaag
aggttttaaataattaatcggcagcttgatcgctttacaacactaaagcgcttgcatccctgcaaacattgtttaca
gtcgttagcacgtaactttgaatgaaagtcgaaatcagctgtttgtgctttgaaattgaattggtgtttacgtggat
ttaattttttctgaaatcaaattaaagcagagccaaatacaaaATGATTGAAATATCAGATTTGCAGAgtaagtagc
cattaatatgcgaaagccatcagcgcaactaaaccgcctattgaaatcttcccgcagAAATTGGCATCGGCTTGGCT
GGTTTTGGCATTTTCTTTTTGTTTCTCGGCATGCTGCTGCTGTTCGATAAAGGACTGCTCGCCATTGGCAATgtacg
ttacccacatgcacatgcaccatatgccccatatagcacacccttcgagctctataataatagcaatcgcctctttc
ccgcagATTCTATTCATATCGGGCCTGGCCTGCGTCATTGGCGTGGAGCGCACGATGCGCTTTTTCTTCCAACGGCA
CAAAGTCAAAGGCACAACGGCCTTCTTAGGGGGAATCGTCATCGTCCTGCTGGGATTCCCCATCTTCGGCATGATTA
TTGAATCCTATGGATTTTTCGCACTCTTCAGgttcgtagcaccctggccaagtcccagtcggattcgtttaagacca
tttgagcggggctaaccggttacgcactccacatggtcttttcgctttcccgctcttaatacaccattcgcaagtca
gggcttggcagtcaaagttgctgtcagatgacatcctttagaaatattttatattttaattgaaagactgcaagtca
tgtagatgggacaatttgacgctgtggcattaacagcaaataccaagtataattcttcagattcgcttacaaaaaaa
gctcgatcagattttatagcatttgatcagagctaaagaaagcaaaaagtatgtccttcattataactattcgctgt
ttctgtctgttttgttctgttctgtccttttatacattttttcctatctatacctctctccttttaccttttcaatt
gcatgcttcttcataagttacgaccaaaggtgcaacttaataattaccctaatatatgtaggtctttttattcatta
aaatatcttttaacgaacgttaaaaggtctgcttcatatatatcggattaatgaatggaaatctattcaaagtaggg
aaaatattcagtgaatataagaatttaacttaacttcgaaacgtgcaagatggcaatacaagtggaagtaacttctt
ttagtgatttgtcgtttacttattacttattaatgtccttctctttatattttcagCGGCTTCTTCCCCGTGGCCAT
TAATTTCCTAGGCCGAGTGCCTGTTTTAGGATCGCTGTTTAATTTACCATTTATACAAAAGgtaaggtagcatggag
cctcacaaaaaataaaaaatataacatagcaaatagcccatactcatgatcacttaacgcgctcatctcgccttgac
gcgaaagcccattgaaataaactgaagggtccacattcccaaacgaccacgcagactcgatttcatagccaattttg
gtatttatctaagctatgtatttcaacatttacaagtccaaagtagagggatttagtggcgcgtttagtggatcttt
tcccagtcctcacgctcctcgcgctccttcacgacctccttgaggtagggttccagatacttgacgtcctcctcgta
cttggtccactgctccttgggcagaatggtcttggtcatggacagatggagggccctcatgatgcggtagttacgct
catcgtacagcttcctgggcaatcggcgcacggcctccttcacatcctcgttctcatacagacaatcatcgcgatgc
agacctgcaaaaatcgatggcatcaagctcagctgttaggaatcactaggttatatagtagtcctaccgtattggtt
gaatccggagagattgtaggcccatctgcccagatttgctgtggggaaaaattcgcattacatatttacgtaatcat
aatattccggcgtactgcactacgaaatttgttgcatgcaacgtgagtccgattgttgcccgttcg
translation
M I E I S D L Q --------------------------------1--------------------
-------------K I G I G L A G F G I F F L F L G M L L L F
D K G L L A I G N ----------------------------------------------0--
--------------------------------------- I L F I S G L A C V I G V
E R T M R F F F Q R H K V K G T T A F L G G I V I
V L L G F P I F G M I I E S Y G F F A L F S--------2----
---------
G F F P V A I N F L G R V P V L G S L F N
L P F I Q K ---------0----------
Can deduce great little gene encoding 140aa protein
Drosophila protein MIEISDLQKIGIGL-AGFGIFFLFLGMLLLFDKGLLAIGNILFISGLACVIG
VERTMRFFFQRHKVKGTTAFLGGIVIVLLGFPIFGMIIESYGFFALFSGFFPVAINFLGRVPVLGSLFNLPFIQKIV
QKLGGDGNRTTV
J3-A7 MIEVTDLQKIGIGPGLDLVSHSYFWVFYFCSTKDCLAIGNLLFLSGLACVIG
VQRTFRFFFQRHKVKGTTAFFGGIFVVLLGFPMIGMVIELYGFFVLFSGFFPVAVNFLGRVPVLGSILNTPLIQKLV
QKLGGDANRTTV
The region shown here is the RC strand between the forward strand eas
and cas genes, with CG3560 being in the long 4th intron of this gene,
on the forward strand - and it is a real gene!
No ESTs to help out with this one; although there are several for
CG3560, which is a component of the cytochromes
***********R. suavis J3-B3
ATTGGCAGAACTGAATGTTGAAACAACGAATCAAGATGGACGTGTAAATCAATTAGTGAGTGCTTCGCAATACAAAG
CAGGCATCTACAAGTTGCATTTCGATGTGGCATCATATAACGCAGAACGTGGAGTTAAAAGTTTTTATCCTTTTATT
GAGATTGTCGTACAATGTGAACGGAATCAACATTATCATATTCCATTACTGTTAAATCCGTTTGGTTACACAACCTA
TAGGGGTACTTGAATGTTAAGCTGATTAATGAATTATTAAACTAATATTTGAACACACATGGAAATAAATGATGCTT
TTATATATAAAAAAAACATACCGTGACTCAAACAATGATAACGTGTAGATTTTTTTTTAAATATTCGTTATGTTTTT
CGAGCTAAACTTTTTTTGTTATTTTTCTATAAAAACACCGCCCAAGCTTTGATCAGAAACATATTAACCACAGTTTC
AAACCTTATCTAAAAATCATAAAAATTCAAGAAATTTTCATCGAAATCTAAACTGTGCGGAATGGTAGTGCGCACTT
CCAACTACATCACAGCGCTCAGCACTTGTTAGTTATATGCATGAAAAAAGCAAACGA
possible end of ORF encodes matches to end of yeast and C. elegans
120aa proteins; called probable transthyretin precursor - fission yeast
gcaggttgtgtgaccagatcctattaattacgctgttactaataccaaagacac
tagaataatagaatattctagtgtctttgctaatactttaaacaaatactgaaacatacggcatcaaacgattttta
tatttttaaaaactgttgcaaatttattcgtttatttggaattaatacttcattaaaaaaaagtatatatatgatcg
ttccaatttatttccccaaacagatttttggcttacatattattttgatatcccacttatcagtgattttcattgtt
aatgcaagatgcaatcaaagattaagataaagagtcccataaactggagcttttgctaccgtctgagataccagaga
tatctacgatctttaatcttaaagttgccctcaagATGGATGCACGAAAGTTTTCTACCCACATATTGgtaactttg
tttcaatttatttatatgtataaaatttttcgttaaccttttacagGATACTTCGGTGGGAAAGGCGGCAGCCAATG
TGAGAGTAACAGTTTCCAGGCTGGACGAGATTCAGGAATGGAGATCCCTTCGGGCGGCCCAAACTGATGCGGATGGT
CGCTGCCTGCTCTTGGAACCTGGTCAATTTCCCGGCGGGATCTATAAGCTGACCTTTCACGTGGGCGCCTATTACGC
GGAGCGCAATGTGAGGACACTTTATCCAGCAATTGACTTGATTGTGGATTGCAGTGAGAATCAGAACTATCACATTC
CTTTGTTACTCAATCCCTTTGGGTATTCCACATATCGTGGAACATAGCTCGGTTAAAACCGAATAATGGATGTTACA
CAACTTACAAAAATGTAATTGATTTGAATAAAGGTTTTTTAAATGTTTATTTGTAACATCTCGGGATCGCTTTACAC
TTCGTGCGATGTTCGTGCATGAGTAACCGTGTCATGAAATCAACTAACCTCACGCTCCC
translation
M D A R K F S T H I L --------------------------0------------------
---------- D T S V G K A A A N V R V T V S R L D E I Q
E W R S L R A A Q T D A D G R C L L L E P G Q F P G
G I Y K L T F H V G A Y Y A E R N V R T L Y P A I D
L I V D C S E N Q N Y H I P L L L N P F G Y S T Y
R G T *
Drosophila protein
MDARKFSTHILDTSVGKAAANVRVTVSRLDEIQEWRSLRAAQTDADGRCLL-LEPGQFPGGIYKLTFHVGAYYAERN
VRTLYPAIDLIVDCSENQNYHIPLLLNPFGYSTYRGT
J3B3 LAELNVETTNQDGRVNQLVS
ASQYKAGIYKLHFDVASYNAERGVKSFYPFIEIVVQCERNQHYHIPLLLNPFGYTTYRGT
No ESTs for this one
*************A. mellifera Contig1006
GGAAAACATACGTGGAGTCGACGGTTACTACTGGGATCCAGATCTATACGTCAGAGACGTTGAAGCAAGCCGAAGGT
GTGGTGAGCACAGTGAGGTGCGAGGGCAGGGAACATGCCCTCGGCCGCGGAATCAAGCGAAAGCTGGACTCCATCCA
TTCCATGCATTCTACCCTGCATGAAGACCAAGATGTAGCCGAGGCAAAGTCGGAGGAGAAGAGCCAGAGGAAACTGG
AGGTGGGTGAGCTAGTATGGGGCGCCGCAAGAGGAAGTCCGGCGTGGCCGGGCAAGGTCGAGTCTTTGGGCCCACCG
GGCACCATGACGGTGTGGGTCCGTTGGTACGGGGGCGGGGGCGGTCGGAGCCAGGTCGAGGTCAAGGCTCTCAAGTC
CCTCTCCGAAGGCCTCGAGGCGCACCACCGTGCGCGAAAAAAGTTTAGGAAAAGTCGTAAATTGAACATGCAGCTGG
AGAACGCTATACAGGAGGCGATGGCTGAGCTGGACAAGGTGACGGAGTCGAGCAAGGAGCAGAAGGTCGGCGGGAAG
TCGTGCAAGGTGTCGAGCGGAAGCAAGGAGTGCGGCAACGCAGGCTCGAAGCAGGATGGGAAGAGGTCGTCATCGAA
GAAGGCGTCTGTCGGTTCTGTGAATCCTGTCGCGGCTGAACAGAAACAGTGTCGGTGATCCGTGTCGATGGATCCTC
GTACCAGTCAATTTGATAGATCACGATGATGATGAAACGCACTGTTCCATCTGTGAACCATAATCAAACGTGTTAAT
TTTGTGATTGAGAAACACCGACACCGAAAAACTTCCTCCACAACTCTTGTTTTTTAAAATTCTCGACTGACACGTAC
ACACACACATGCGCATACATATATACATATACACACGGTGTTTCCCTTGATAACAGAAAAAAA
Matches DNA cytosine-5 methyltransferase- mammal 40%; Dros EST 51%;
Dros genome 37, 67%; seems to be in same region, but not same
translation, as sba gene?
So this one is tricky. First, it matches the extreme end of a 1498aa
human protein KIAA1461, in a region known as a PWWP domain in CDD
However in other DNA cytosine-5 methyltransferases this domain is
around 300aa in 900aa proteins. Since our best match is clearly
KIAA1461, and ours is clearly at the end of an ORF,
try searching Drosophila genome with this protein. Hopeless, since get
lots of matches, including to sba gene. So could represent an
alternative end to sba gene.
***********A. mellifera Contig1287
CCAAGATTGAGAAGGCAGACGTCCAACTCGAGCTTGGACAACGTCGCGCTCAAGCAAATTTTACATTCCAGCGAGAA
CGTCAATTCGGAAGGTGACACGTCCAAATTGGCCAGCTTCGCGAATCTGAGCAGGCAAAGCTCGGAGAAGGGGATCA
ACTTGACGTACACGGAACAGGATCGAGATGACGGGAAATCGAATATGTCCGGTAAGAAGTTTGGCCAGACGAATGGT
AATGGGAACGGTAATGAGAAGAAGACTACGTTCGCCACTCTGCCGAACACGACCACGTGGCAACAGCAGAGCAGCCA
GCAATCCCAACAGGTGGAACAACATTCTGTTGATGAAAACGGTGGTAACACCATTATGGCCTCGCAACTGAATAACA
TTAGATTGAAGCTGGAGGAGAAACGTCGGCACATAGAGAACGAGAAGAGGAGGATGGAGGTCGTGATGTCGAAACAG
CGTCAAAAAGTTGGCAAGGCTGCGTTCCTGCAAGCTGTCACGAAGGGTAAGGTTAAATCTCCCTCTTCATCAACGTC
TGGGGGGGACAGTCCGGCTGAAATTGGTCCCCCCACTTCTGTAACCTCCGGATCTTCGGGGGAGACCCCGACAAGTG
TTTCCGAGACGACCCCTGTAACCCAACAACCCTCTCAAGAAAAACCACAGAGACCCTTCTCGCTCAAGGAAATTAGT
GAAGATGTTCGAGATGTTGAACATAAATGGTTGGAACATGACGGAAATGCGCCATTTATTGAAACAAGACGTACTCC
AGATATTGAGAACATGGATATTGAACAGTATCATCAATCCATATCACAAATGAATAACAGTCTTAGTGAAATTCAAG
CTGACATACAACGTTTAGCAAATCGAGCAAATCAAATACAACAACAGCATCTAATGACCCAACACCAACAA
complete ORF encodes 13% serine and 12% glutamine protein >300aa;
BLASTX is to N-terminus of 856aa KIAA1078 protein <up>Homo sapiens</up>;
24% over 260aa and 5e-10; genomic match links CG18462, CG18459, and
CG18460!; three ESTs, but not very useful
So, this tripartite gene spans 10kb, and is probably beyond my
abilities to reconstruct. Instead suggest they search with KIAA1078
protein and see how it spans these three genes.
**********A. mellifera Contig1312
AAAATCTTAAATTTTGGTTAGACATGTAATAAGTATGGTTTTTGCATTATTCTTTCACATATCTTCTTGTTTTTATT
AATATCTTTAATAGTTTTTTAAAACTCTGTAAATACATATTCTTAAGATATTTAAAGTAATATCAAGTGTATATTTA
TATATCTGATGTAAGATACAGGTTATAATTTTGAGATTATTAATACAATGGATTTATCAAAAATTCCAAATGATAAA
AAATTATATCTTTGCAAATGGTATTTTAGAGCTGGATTTGTTTTTCTACCATTTCTTTGGGCTGTGAATGCTATTTG
GTTTGCAAAAGAAGCTTTCGTTGAACCACATTATGAGGAACAAAAACAAATTAAAAGATATGTAATATTTTCTGCAA
TTGGAGCAGCTATATGGTCAGCTGCTCTTTTAGCATGGATTGTTACATTTCAAACACAAAGAGCAGCATGGGGTGAA
TTTGCAGACTCTATTAGTTACATAATTCCAACTGGCATTCCTTGATGTTAAAATATATATCTTTGTATATTAATAAG
TGATTATTATAATATAAATTTTTATTGTACGAATTAAAATGAAATAAGACTATTCTTTGTCTTGGTATTTAATTAAA
TAATAATTGTCTTTAAGATGTAAAAAATATATTATTTATTTAGATGTATTGATTTTAAATAAGATTAAAATTTGATA
AAACATTCAAATTTTAATTAATTAAATATAATTGAACATAAAATGTAATATTTTATACAATTTATATCATTTATAAA
AATTTAAGTTACAAATAAATATTTAAGAACTTAAAAAAAAAAAAAAAAAAAAAAAAGCAAC
Could have internal ORF of 300bp, encoding 14% alanine 100aa protein;
indeed excellent BLASTX to uncharacterized hematopoietic
stem/progenitor cells protein MDS033 from human and another 100aa
protein from C. elegans, 40% and e-18;
NEW GENE in unannotated region; sadly no ESTs, but will be easy to
annotate.
AE003800.2
204127-ATGGACATCTCAAAGGCACCAAATCCGCGAAAACTGGAGCTGTGTCGCAAATACTTCTTTGgtaagagtt
actaccaatgagtaatgattggattttaaccaagttactttctatttgtctcgaacttagCTGGCTTTGCATTTCTG
CCCTTTGTGTGGGCCATTAACGTTTGCTGGTTTTTCACGGAGGCCTTCCATAAGCCACCATTTTCGGAGCAGAGCCA
AATAAAGAGATgtaagtcaatatatgaatagatgcccatgccatacagtctaatattccacaatttctttcctacct
tcctccagATGTTATATACTCTGCAGTGGGGACTCTATTCTGGCTGATAGTACTAACTGCCTGGATAATAATATTCC
AGACAAATCGCACAGCCTGGGGCGCCACAGCGGACTATATGAGCTTCATCATACCCCTAGGCAGTGCATAGACATAA
CTAGATTAATTCGTTAGCA
translation
M D I S K A P N P R K L E L C R K Y F F -----------------
-------------------1--------------------------------A G F A F L P F V
W A I N V C W F F T E A F H K P P F S E Q S Q I K
R ----------------------------------------1---------------------------------
Y V I Y S A V G T L F W L I V L T A W I I I F Q T N
R T A W G A T A D Y M S F I I P L G S A *
bee
MDLSKIPNDKKLYLCKWYFRAGFVFLPFLWAVNAIWFAKEAFVEPHYEEQKQIKRYVIFSAIGAAIWSAALLAWIVT
FQTQRAAWGEFADSISYIIPTGIP
fly
MDISKAPNPRKLELCRKYFFAGFAFLPFVWAINVCWFFTEAFHKPPFSEQSQIKRYVIYSAVGTLFWLIVLTAWIII
FQTNRTAWGATADYMSFIIPLGSA
Neat little two phase 1 intron gene.
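The intron phase noted in the translations (the 0, 1 or 2 written inside each dashed intron) is the number of coding bases preceding the intron, modulo 3. A minimal sketch, assuming the convention used in the genomic listings here (exons in uppercase, introns in lowercase) and a sequence that starts at the ATG; the toy gene below is invented for illustration and is not taken from the genome:

```python
import re


def intron_phases(gene):
    """Return the phase of each intron in a gene given from its start codon.

    Exons are uppercase and introns lowercase. The phase is the count of
    coding bases upstream of the intron, modulo 3.
    """
    phases = []
    coding_length = 0
    for piece in re.findall(r"[A-Z]+|[a-z]+", gene):
        if piece.isupper():
            coding_length += len(piece)
        else:
            phases.append(coding_length % 3)
    return phases


# Hypothetical toy gene: exons of 7, 9 and 5 bases separated by two introns.
toy_gene = "ATGGCTGgtaagtttcagCTAAAGGACgtatgtccagTGTAA"

print(intron_phases(toy_gene))  # [1, 1]
```

For the fly gene above, the first exon runs from the ATG to the base just before the first `gt`, which is 61 bases, so the first intron is phase 1, in agreement with the annotation in the translation.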
***********A. mellifera Contig1411
ACGTTTTCGTGTGTAATCAGTGAAATTTTTGTGAAAATGTTTTCCACTTCTACACTGAGTGTCCTATTAGGGACAAT
TTTACTTATCTCTTCGATTCCCGATGCAGTATCTTTTAGCAAGTACGGGAGGACGTGCAAGGACATCGGTTGCATGA
GGGATGAGGTCTGCGTGATGGCCGAGGATCCTTGTTCGATCTACCAACGAGATAACTGCGGTCGTTATCCGACTTGT
ATGAAATCTCGTCCAGGCGAGGCTAATTGTGCCAGCACTCTGTGCGGTGAAAACGAATACTGCAAAACCGAGAATGG
CGTCCCAACATGTGTGAAGAAATCAGCAGTAAATGGATTCGAGTCGGCGGGCGTTTCTTACGTGAACGGGCAGCGGG
TGAACACGGACGAGAAGCAGCAGCTCGATAAGACGACGGCCAGCAACAGCGCTAGCAATTCGAACCCTTACGCCAAT
GCTAATGCGCCACCTGCCCCCGCCGAGCCAGCGGGAGGGTATCGCCATCAAGTGAATTCCGCCACCAATTTGGGTTA
TCCACCTTATCCTAGCTCCGACACGGAGCGTTCCAAGAGCGGTGGGTATCCATCGTATCCTGCTGGCTCCAGCAACG
GGTATCCGCCTTATCCTTCCCCCAATCAAGGGAATCGCCAACAGGATTTAGGATACCCACCGTATCCAACGCACAAC
AAAATGCCGATGCCCGGACAGTCGAATTATCCCACGTATCCCGGCCAACCGGGCCACTCCAATTA
complete ORF encodes 250aa protein with 12% serine and 11% proline;
lots of weak BLASTX matches involving cysteines in the first 100aa,
hard to say anything about them; genomic match is similar but at 50%;
then there are a bunch of ESTs, including two B.mori, and to same
sequences as Genomic, so is a real gene; NEW GENE in unannotated region
TFSCVISEIFVKMFSTSTLSVLLGTILLISSIPDAVSFSKYGRTCKDIGCMRDEVCVMAEDPCSIYQRDNCGRYPTC
MKSRPGEANCASTLCGENEYCKTENGVPTCVKKSAVNGFESAGVSYVNGQRVNTDEKQQLDKTTASNSASNSNPYAN
ANAPPAPAEPAGGYRHQVNSATNLGYPPYPSSDTERSKSGGYPSYPAGSSNGYPPYPSPNQGNRQQDLGYPPYPTHN
KMPMPGQSNYPTYPGQPGHSN
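Amino acid composition figures such as the serine and proline percentages quoted for this protein can be recomputed from the sequence. The sketch below is illustrative; the function name and the choice to report the three most frequent residues are assumptions. It is run here only on the last line of the protein above, so its output will not equal the whole-protein percentages.

```python
from collections import Counter


def composition(protein, top=3):
    """Return the `top` most frequent residues as (residue, percent) pairs."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return [(aa, round(100 * n / total, 1)) for aa, n in counts.most_common(top)]


# Last 21 residues of the predicted bee protein, copied from the line above.
tail = "KMPMPGQSNYPTYPGQPGHSN"

for residue, percent in composition(tail):
    print(residue, percent)
```

To reproduce the whole-protein figures, pass the full sequence with the line breaks removed.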
TTCCCGCGGGCGAAGAGTACGGCCGTGGATGCGGGGACATTGGCTGCCTGCCCA
CCGAGGAGTGCGTCATCACCAGCGACTCGTGCAGCTACAACCAGCGTGACGGCAAGGATTGCGGCAACTATCCCACC
TGCAAACGGCGCTCCGGCGGAGGATCATCCGCCTCGAACAGCAGCCCCAACTTGGCAGCCCCCTCGGCCAATCCGTC
AGGTATTTAGCTTGGAACCCCCCACTTGATTActcattgtgcgcttccctgggggagttgtccaggcgattggcatc
gtgtttcgtaattcgaatttcgaggtttggcgaccggcgattggcggctacatctttttggttcctcaaaacaagtt
atttcgggttaaatccaacgaatttctgggggcgcaaaagctgacttcgggcgccgaagataacaatagcctctcag
agacgagaaataacacgttgcattgaatacaattcaagaaataataattttctgaaatatacaaataatataacgct
attgactgacctattgtcctaataatcacatcacgtattcacatactctttccaagcctcgaaactctgcaaggcta
taagcttaattccaaattaccgaacttatttgtcgtttctttttgtttggccccaagtatttgttaagttttgtgat
catttccatttatatgtatgtatctatgctatataactgaatgtaatatgttttatggcgatctaatacccacacca
acacactgagcgcttccgctgactcaatcaatttaacactcaattcgactctcaatcaatcaatcaaccaatcactc
gctcgctcgctcattcatcgcctcattcgctcaaggcttgcaaatattcgactgctaaccacccacccgcccccgcc
tctccccctctccaccgatcacttgtgcatgttttaggaagcaacagtacagagagaaaaaaggcaacatgcaacac
gtctatttacttctgtgtatataacatatatttcaaattcaaattcaactttattagcactaaaaatctttacaact
gcataatagcttttgtattttaagcttacacaatctagcagctgaaatgaaaataataatttaagagcaaaaccaat
ttcttaatgtccatgtcttaaaattaacatcaagaaggttacgagtacttgaattaaaatgctagaatcatgttgtt
aaggaagcttgactggtgcctgacccatcatagattaacatatatat
translation
E Y G R G C G D I G C L P T E E C V I T S D S C S Y
N Q R D G K D C G N Y P T C K R R S G G G S S A S
5' region ttgcaagcttgtaatgcgcttttgattggttacatagggcagatgcgttttttt
ttttgtttagaagcaaactgccttcaaacttgttttaactcttacgcgaaagttggccaactgaaaaaaaagtattt
ttccatcttgtactttgcagcaacttttgatccaagggcgtgacatcgatagcgagcaacaggatgctggcactgcc
tttgctgactctggcggtcctcgccagctgcggctactccgtggacgcctactccagtacgtatacgtaatatttca
ggtgggggttggctttcgtggacaccttaccaccaactattgtcctagaaagccatcaccccactgccatttgttgt
gttgtgtaatcagtactcctgaaatgagcaccgatcccttggatcggtggtgaacctttcatgccttcatcaaccct
tgcccctccatcattgacatactgaagccagatgcggtatccccgattttgagcagacttataatttgattttcttt
tttttttccgtatgattttgacccacccactctgatccacaaaacacacaccgaaacccgcaatccgcaacccgaaa
tccgaaatccgtaatccgcaacccgaaaccgtaaaccgtaatccgtaatccttgaacctaatcgaat
This is going to be horrendous, with the ESTs showing a 5' end in the
next file, about 70kb away with a bunch of genes in between! then
linking CG1735 and CG1726.
SIMA - this one is rather complicated, suggesting that there is a huge
gene linking CG1735 and CG1726, try working with the Bombyx mori ESTs
identified using ours.
***********A. mellifera Contig1463
GCAATTATCCTCTTCCTTCCCGTTTTTGTTTTCGTTTTTTTTTCCTTTCTTTCCAGTTCGCTTTTTTTTTTTTTTTC
ACTTCACTTCTCCCTCCTTGGGCATAAAGGTCAACTCGAAATTGGTTGTTCATTGTTTTTATTGTTTAACTCTCGAT
CGATCGATCGTTCGTTCGTTCGTTCGTTCGTTCGTTCGTCATGCGTGATTGCGTGCGTACTTAAATGAGTGCGTGCG
TGCGTATTCACTCACAGACTTGCAGAAGTTAGCTCCATGACTGGTAGGAGTCTTGGTACCATTCGTGATCATCGCTG
GAAACACCGCCAAGAGTGGCGGTCCTACCTCCGAATCCTATGTCACCGCCGCCACCACCACCACTGCCACCATTCAC
ACTCGCCAATCCGTACGCATTGCCCAGTGGTTGTTGGGCGAGGGGTTGGTTTCCCCAGTTGCTCTGGAAGCGACGCT
TAGATTCCCCCTGGTTCTGGTGACCCCCGTCAAATTTTCGTTTACCTGCTGGTAAACTTCCCTTGGCGCGGACCCCC
CCACGAGCGGACAGCCGCTGGACCCCACGAGCCCCTGAAGTTGCGGGGTTGCGTCCCCCCCTGACTGGCCCACGGAC
TTGGGGGCCACCGACTCGGCCACGCGGCACTACTCCACGGCCACGTCCAGCCGGTTGAGGCTGCCTGCCTCTCCCTC
TGGCAGGTGGCGGAGGCGGCGCGTAATCGAAATAGTAATCTTCGTATCGATAATAGTCATCGTAATATGGATCAC
no obvious ORFs; but in RC end of ORF encodes 21% glycine and 11%
arginine protein with 35% e-05 match to end of 600aa heterogeneous
nuclear ribonucleoprotein R <up>Homo sapiens</up>; seems real to me; genomic
match is to C-terminus of CG17838; provides real end of this protein.
***********A. mellifera Contig1481
TGTGCTTCCAATTTCTTTTTTCTTTTTTCAATTCTTTTATGCTTTGTACTTTTTAAATGAATGTTCCATTGAAATTC
TCCAATAAATATCCTATCGCAAATATCACAAAAGTGTCTTTCCTCGTTGCTACTATCACTGAATTTTTTATTCTCTA
TTGATTCGTTTAAAGGTTTTTGTTCAGGCTTTTCTCCTCTTAGCACAGCTTCGATTATTGCCACAGCTGGTTCATAT
ACACAACTGTCCCATTGATTCACATCGGTAGAATCTAATACATAAATTGGTGGTACTTGTCTGTCACTGCGACGAAG
TAGACGATTCATAACCCATTTTTTCTGTTTTTTCGCGTACCTCTTCGTGACCATTTTTAAGTCATCAATACCCCTTT
GCAATAATTCTTGTCCTTTTTTCCCTCCCTTCTCTTCTTCTGGCAACACAAGGTAATCATGAAACTCTTTGAAGCCG
ATACTTTGAAAAATGCCCTTCGTATAATCGACTGATGTGTTGGATTTAATCCGTTGCTTGTTGTATCTCCGATGAAA
GTCAAGCAGTTCCTGAACCAGACCGGTCTCCACCATGTCGTCGACCCTTCTCTCCAACCGATCCTCGAGGACTTTCA
TGTCACAATTGATCCATAATAGAATGGCATTGCGGTATCTCAAAGGACCTCCTAATCCAGAACCACCAGCTATCCTT
TGAGCTTTTAGCAATTCTGAATGCTTCACACCATGTTGCTCGAACACTTCAAGTGATCGAATGATCTTCCTCCTATT
GTTCGGATGAAATCTCTTCGCCATTTCCGGATCCACTTTAACCAACTCTTCGTAAAGCTCCTGGTTATCCTTTGTCA
TCGATCGATCCAACTCGATCTTCATCCTCTTCGTACGCGACACATTCTCGTCCAACCGGTCATCATCGTCCTTGCCG
ATCCCCGAGTCGTTCATCAGAACTTCCCAAAGGATGGACTCTATGTAATAGTTGGTGCCACCGACGATGATCGGGAG
CTTCCTCCTCGCGAGAAGATCGTTGATAATAGGTATGGCAGCATCCCTGAATTGTACCACCGTGTAGCTAGGGTTCA
GAGGGTCTACGATGTCCAACATGTGGTGAGCCGCCTTTGCTTGTTCCTCTTTCGTTACTTTCGCGGTCACGATGTCG
AGGCCTTTGTACACCTGCATACTATCGGCTGAAATAATTTCTCCGAAGAATTTACAAGCTAATTCGATGGCCAAAC
complete ORF in RC encodes tRNA isopentenylpyrophosphate transferase
<up>Homo sapiens</up>; 47% full-length e-102;
genomic match is 43%? But single ORF and unannotated! NEW GENE. One EST
Match to human protein misses just 40aa on each end, a 467aa protein
AE003749.2 TACCAATTACTTGTAAGCACAAAAAACAGCTGACGGCAACAAGTGGTTCGGTCC
CCATCGGAATACACGTGCTCAAAACGTGTGGGTTTTATTTGCCTTAATTGACTTAAATTCACTCGCAATAAGTGGAA
ATGATTCGAAAGGTGCCGCTAATTGTAGTCCTGGGCTCCACGGGCACCGGAAAGACGAAACTGTCTTTGCAACTGGC
CGAACGCTTCGGAGGAGAAATAATCAGCGCTGACTCCATGCAGGTTTACACCCACCTGGACATCGCCACCGCCAAGG
CAACCAAGGAGGAGCAGTCCCGGGCACGACATCATCTACTGGACGTGGCCACACCGGCCGAACCCTTCACAGTCACT
CACTTTCGTAACGCAGCACTGCCCATTGTGGAGCGCCTGCTCGCCAAGGACACTTCTCCGATTGTGGTGGGCGGCAC
GAATTACTACATAGAATCCCTACTTTGGGATATTCTGGTTGACTCGGATGTCAAGCCGGACGAAGGCAAACATTCGG
GGGAGCATCTTAAGGATGCCGAACTGAATGCTTTGTCCACCCTCGAGCTGCATCAGCACCTTGCCAAGATCGACGCA
GGTAGTGCCAACCGTATTCACCCCAACAACCGGCGCAAGATCATCCGGGCTATCGAAGTGTATCAGAGCACCGGGCA
GACTTTGAGCCAGATGCTGGCGGAACAGCGGGCACAGCCGGGAGGAAACCGCCTGGGTGGACCCCTTCGCTATCCAC
ACATCGTTCTCCTTTGGTTGCGTTGCCAGCAGGATGTTCTAAACGAGCGATTGGATTCCCGCGTAGATGGCATGCTG
GCCCAAGGGCTGCTCCCTGAACTACGACAGTTTCACAATGCCCACCATGCTACCACTGTGCAAGCCTATACGTCGGG
AGTTCTGCAGACGATTGGCTACAAGGAGTTTATTCCCTATCTGATCAAGTACGACCAGCAGCAGGACGAAAAGATAG
AGGAGTACCTCAAAACCCATAGTTACAAGCTGCCAGGCCCAGAAAAACTGAAAGAAGAAGGTCTTCCAGATGGCTTG
GAACTCCTACGCAATTGTTGCGAAGAACTAAAGTTAGTCACTCGCCGATACTCAAAGAAGCAGCTGAAGTGGATCAA
CAATCGATTCCTGGCCAGCAAAGATCGTCAAGTGCCGGATCTCTACGAACTGGACACCAGTGATGTGTCAGCTTGGC
AGGTGGCAGTCTACAAGCGGGCAGAGACCATCATAGAAAGCTATCGAAACGAAGAGGCTTGCGAGATACTACCAATG
GCCAAGCGGGAGCATCCTGGAGCGGATTTGGATGAGGAGACTAGCCATTTTTGTCAAATATGCGAACGGCATTTCGT
TGGGGAGTACCAATGGGGACTGCATATGAAGTCCAACAAACACAAGCGAAGAAAGGAGGGACAGCGCAAGCGGCAAA
GGGATCACGAAACAATGCTCTCAACGGATCTAGCGAAGAAGCAAAAGGAGGAGAAAGAGGAGGCAGGAAAGGCGGAG
ACTCAGCCACCACCCAGCCGAGTCAATGATACTGATAAGGCAATGtaacactagacgcggcttggcaataaatgaac
ctacgtaaatttgagtcatttgttgttgttttgaatctcaatcccaccgttttgctgctgatgcaagcggcttgagg
agtatctgataaccctacacctcgctaatggggaccacagaccgcaggggaggtcgttgcctagccagaaaagcgaa
aacgcgtaaacatgtttgtgcaccgaacaaccagcccacacaatcgccatcgcccactgactgatctcgtctttcat
ttgcatttcagttgcccagcggttcagacgcaattagagaaaccaat
LD10347.5prime
TACCAATTACTTGTAAGCACAAAAAACAGCTGACGGCAACAAGTGGTTCGGTCCCCATCGGAATACACGTGCTCAAA
ACGTGTGGGTTTTATTTGCCTTAATTGACTTAAATTCACTCGCAATAAGTGGAAATGATTCGAAAGGTGCCGCTAAT
TGTAGTCCTGGGCTCCACGGGCACCGGAAAGACGAAACTGTCTTTGCAACTGGCCGAACGCTTCGGAGGAGAAATAA
TCAGCGCTGACTCCATGCAGGTTTACACCCACCTGGACATCGCCACCGCCAAGGCAACCAAGGAGGAGCAGTCCCGG
GCACGACATCATCTACTGGACGTGGCCACACCGGCCGAACCCTTCACAGTCACTCACTTTCGTAACGCAGCACTGCC
CATTGTGGAGCGCCTGCTCGCCAAGGACACTTCTCCGATTGTGGTGGGCGGCACGAATTACTACATAGAATCCCTAC
TTTGGGATATTCTGGTTGACTCGGATGTCAAGCCGGACGAAGGCAAACATTCGGGGGAGCATCTTAAGGATGCCGAA
CTGAATGCTTTGTCCACCCTCGAGCTGCATCAGCACCTTGCCAAGATCGACGCAGGTAGTGCCAACCGTATTCACCC
CAACAACCGGCGCAAGATCATCCGGGCTATCGAAGTGTATCAGAGCACCGGGCAGACTT
M I R K V P L I V V L G S T G T G K T K L S L Q L A
E R F G G E I I S A D S M Q V Y T H L D I A T A K
A T K E E Q S R A R H H L L D V A T P A E P F T V T
H F R N A A L P I V E R L L A K D T S P I V V G G T
N Y Y I E S L L W D I L V D S D V K P D E G K H S
G E H L K D A E L N A L S T L E L H Q H L A K I D A
G S A N R I H P N N R R K I I R A I E V Y Q S T G Q
T L S Q M L A E Q R A Q P G G N R L G G P L R Y P
H I V L L W L R C Q Q D V L N E R L D S R V D G M L
A Q G L L P E L R Q F H N A H H A T T V Q A Y T S G
V L Q T I G Y K E F I P Y L I K Y D Q Q Q D E K I
E E Y L K T H S Y K L P G P E K L K E E G L P D G L
E L L R N C C E E L K L V T R R Y S K K Q L K W I N
N R F L A S K D R Q V P D L Y E L D T S D V S A W
Q V A V Y K R A E T I I E S Y R N E E A C E I L P M
A K R E H P G A D L D E E T S H F C Q I C E R H F V
G E Y Q W G L H M K S N K H K R R K E G Q R K R Q
R D H E T M L S T D L A K K Q K E E K E E A G K A E
T Q P P P S R V N D T D K A M
Amazingly, this is a single ORF, with no obvious 5' intron either.
*************A. mellifera Contig1578
GTTGCTTTTTTTTTCAGAAGTAATTATATTCTGTTATATATAATTACTTCTGAAAAAAAATTTTTCAAAATAACATT
ACCAAATCAATACATTTTTCTGTTCATTCGCAAACTGAAAATTCATAAAATTCAAAAAATGGGAATGTAAAGAGGCA
AAATTTATAAATATTTTAAGTATTCAATTAAATAATGTTTTACTTAATTAAAATCATAATCCATATTTTATTATTGA
TAATTTTTTTTTCACAAGGAGATATCAAATAGATACCTAATTTGTTCACAGGGCACCAAATGGATCAAGTTGAACTT
GATTATTTTGTGGTTGCGCAATATTCTGCTGTGGTTGCTGCTGTGGTTGTTGTTGAAGGTTAGTACCCATCATCGAA
TTTGAACTAGACATCATCATTGGTGCAGCTCCTCCTGTGACCATCATGTTACCAGGACCACCAGATATTGTGCTCAT
CATAGGTCTTATGCCTTGCATACTTTGCATGCCTATTGGCACACCTTGCATTCCCATTGGACGATATCCAGCGCCAG
TTGTAGCTGCCATAGGTTGCGGTGTCCATCCTCCAGCTGAGCCACCAGTTTTGGCAGCATTTTTAGGCGAATTCCAT
TGCATACCTTTGACTTGTTGCTGAGCACTTTTGTTGATGGTCAAATTTTGAGCAAGACTAGCAAGACTGCTATCCAA
ATCTCCAGTCAGGACTTTACCAGTAGACGCTGCATTCTGTTGTTGTCCTGCTACTGAAATCGGTTGCTTTGCCGGGG
AACCGTACCCCGCCGGGACTTGGGTGGGGATACCGTAGGCCGACCATCGGTCCG
possible end of ORF in RC encodes 130aa 14% G/Q protein; BLASTX match is
39% over 56aa, 5e-05 to clathrin assembly protein AP180 short form -
rat; and many others, indeed frog is best match;
genomic is unannotated within an intron - NEW GENE; one EST to
Anopheles gambiae of all things, but only 43%?
Turns out it is just an alternative C-terminus for the lap gene, with some
extra exons within the final large intron of the annotated gene.
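Composition figures such as "14% G/Q" or "14% leucine" in these notes can be reproduced from a translated ORF with a few lines of Python. This is a minimal sketch with a hypothetical function name; whether a figure like "14% G/Q" means the two residues combined or each separately is not stated, so the function simply sums whatever residues it is given.

```python
from collections import Counter


def composition(peptide, residues):
    """Percent of the peptide made up of the given residues (e.g. "GQ")."""
    # Spaces are dropped so that spaced-out translations can be pasted in.
    peptide = peptide.replace(" ", "").upper()
    if not peptide:
        return 0.0
    counts = Counter(peptide)
    return 100.0 * sum(counts[r] for r in residues.upper()) / len(peptide)


if __name__ == "__main__":
    # Toy peptide, not taken from the data: 3 of 4 residues are G or Q.
    print(composition("G G Q A", "GQ"))  # 75.0
```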
*************A. mellifera Contig1637
CTGTGCACGTCGCACAGGCGCCAGCTGCTCGTCATGCTGCAGAACCACAACAAGCTGCGCGACATCAGGCGTAGGTG
CACCAAGGCGAAGGAGGAGCTGTCCGTGAACATCTATCACCGGCTCAAGTGGATCATGTACGTGGAGAACAAGATGA
TGGAGGTGGACGGCAAGTTGGTCATGTATCACGAGAGCCTGAAACGTCTGAGAAGGCACCTCGAGGTGTTGCAACAG
ATCCATCTCGCGCCCCAGATGTACATGAACGCCGTGGCCGAGGTCGTTCGTAGGAGAACGTTCTCGCAAGCTTTCCT
GGTCTGGGCGAGCAACCTGGCCTGCCAATTGCTCACCGTTCACAGCGAGGAATTGGCACGTAGAAGGGAGTTTCAGA
GCAAATTCGACGGCCACTTCCTCAACACGTTGTTCCCAGGCCTCGAGGACACGCCACCGCCGTTCGCCACCCAGGCG
CCGTCCGTTTTCGACAACGGATTGCCAAAGTTGACGGCCGAGGATATGGAATCTCTGAGATCTCAGCTACCCGATCT
GGCGCTCACCATCTCGTCGCCAGATTTGAACAGCATCACCCAGTTCTTCCTGTCCAAGAGTCTCACCAGCACGGACG
AGAACAACAAGGAGAAGGACGGCGCCTCGATGCGCGTGGAC
complete ORF encodes at least 130aa 14% leucine protein; full-length
39% match at e-23 to KIAA0203 gene product [Homo sapiens] and C.
elegans and Arabidopsis;
genomic match is 46% to a region of CG1347; but annotation needs
fixing it seems - NEW ANNOTATION; one good EST
************A. mellifera Contig1793
AGATTCCTAAAATCCTTTCATGGGGCCGCGGCCAGCCGGATGATATCAGTGCGATCAATCTAGGAGATGAGAAATTC
GACCCTGACTCGGATAAAAAGCCGCGCGCAGGACAAATTCTATGGATCCGTGGTCTAACACGACTACAGACACAGGT
AATAGGTGGCGAGTTGCAGGAACGTTTGATACCAGTACCCTACAGCAAGAGTTCGACAGACCAAGCTATCCGCGTAG
TGAACGCATTTCGGCAGGGCCTAGACGCACGCTACACGAGCGAGCACAGCAGCACTACATTGGCGGAGGTACTGAGA
AAACAGTCGTCCTTAAGCAAGCGGCTCTCGCAAACGAGCAGCATTGAATACGCCGATAACAACCCAGACGAACTGAC
CATACCCGAGATAGATGTGGAAAGATTGTCAAGTCACAGTCATACAGAGACCGCTGTTTAGAATGGGAGAAGAATGG
CGGGGTCAGCAGCGATATGTTTATCAAGCTGTGTTACGTTCGATCCTCTCTGACGATATGTAAACACGAAAAAGAAG
TCATCTTCATCCTTGTTTATCTCGTGGCCAGCGCGTTCGGAGGAAAGGACAAAAAAAAAAAAAATAAATA
long ORF encodes >130aa serine-rich protein; BLASTX says it is N-terminus
of Ca2+-transporting ATPase (EC 3.6.1.38), plasma membrane isoform 1c
from rat and C. elegans; 40% over 82aa; 4e-04;
genomic match is 75% to unannotated region - NEW GENE; several ESTs.
AE003844.2 aataatatatttatttatttatatccttatattttagGTGGGGTCGCGGACATC
CCGAGGAGTACACAGATGGTATGAATCTGGGTGAGGAACGCTTTGATTCAATTGATTCTGATAAAAAGCCTAGGGCT
GGTCAAATTCTATGGATTCGTGGTCTAACTCGCTTGCAAACACAAgtaagtgtcaattcaacaacaacatcaacttg
tcttaagaaactagaactaaacaataattacctaaaaggatcaacagttatatatcaatttgtttaaataatgactt
ttttatgttgttgtgatatatttttataaatttctatttctattcattattcacatttattacattgatattttata
tgatataatgatataattaattattaaataataatataatataatataaatataataaaatattatatatatatata
tatacacatatatatacctacctactatatagannnnn---nnnnn11kb intron, at least,ttttatgac
tattttactctttcccacttatattttctgtattactttctatctccctctctttctcttcattcttaaagGTAATA
GGCGGCGAATTGCAAGAACGCTTGATTCCGGTCCCATATAGCAAGAGCAACACTGATCAAGCTgtaagccttaacaa
ttttgttttatgttatttttacagttttaaaataagtcgaattattaaattctatctcggcaaaagcgtaactataa
gaatgaacgatgatatacttgtagtacgtttgtctattcactaagaacacaatttttttaaatcgtctgtttgtccg
taataataaggtaatgaaaggcaaggcaatttaataggcgtattgtgtgtcaaccacaagcttaatttaatatgccg
aatttttaacccacctatatccaaaatatatatggttatatttcttttttaattatgatttagttggtttttcgtgc
ccactggttttgctttaaacttccatcatgtagaagaacgatatacttagttttttaatgtgtttgttcgtcaccac
ttaaataaaatcaaaaaaaggttgtcgcataatgcatgattaggcaaattaattttagattgctgattaattggtaa
attaccgagctacagtcctccgaattatgaacaaaataagcgaaatattaaaaagaataagcaactcatatgaagtg
ttgttactgattatttgtccgtctgaattaattggtatttggatatcccattattaacttaagatctatcttatcat
ttatgtgtcgcatctgctcgtagtaagaattgtcaaaataatatttgtatttaatttgaactagaaatatacataaa
agaaagtatgttcatatatgtataattggatctttagcttgaatgattaaagatgttcttcttatactattgtttat
gtcccttttgtcactgttcttggtgttgttcttttctttttactaaatgtacgcttagATACGAGTGGTAAACGCAT
TCCGCCAGGGTCTGGACGCCCGTTACGGTGATCACACCAACACATCCCTGGCAGAGGTACTGCGTAAGCAGACTTCG
TTGAGCAAACGCCTTTCGGAAACGTCTTCCATTGAGTATGCCGATAATATACCTGATGAGCTGACCATACCCGAAAT
TGATGTCGAACGTCTATCATCCCACAGTCACACTGAAACTGCAGTTTAAATTTCAGTGGCATCCATATCCATATAAA
AATAAACCGCACACATTCTCAGAAATAACAATTCTAAgaaatccttagcacagcttggaatttttataaaaaaaggt
tttgttcaggataacagcaatgtagctgtgaattggttaaaaagcattttgtaattcagcaaaaataatccgtaaaa
aaaatgtaaatttctaattttttttgttagtatgtatgctaacaaaatatataaagtacttaatattaatataattg
taatgcaagggcatacatacattgattataaccacctttaactcaaaatgtaagcggatcggttttgtctcgcacac
tgaagccattaattaatattttatcgtcttacatgtaataaatgattcaatgaataaacatgttttatttacttaca
cgtggaaaaggttagcactataataaatcgaccaaacggtgcaaaagaaacagaaaagcacggatc
GH15464.5prime
GTAAACGCAT
TCCGCCAGGGTCTGGACGCCCGTTACGGTGATCACACCAACACATCCCTGGCAGAGGTACTGCGTAAGCAGACTTCG
TTGAGCAAACGCCTTTCGGAAACGTCTTCCATTGAGTATGCCGATAATATACCTGATGAGCTGACCATACCCGAAAT
TGATGTCGAACGTCTATCATCCCACAGTCACACTGAAACTGCAGTTTAAATTTCAGTGGCATCCATATCCATATAAA
AATAAACCGCACACATTCTCAGAAATAACAATTCTAAGAAATCCTTAGCACAGCTTGGAATTTTTATAAAAAAAGGT
TTTGTTCAGGATAACAGCAATGTAGCTGTGAATTGGTTAAAAAGCATTTTGTAATTCAGCAAAAATAATCCGTAAAA
AAAATGTAAATTTCTAATTTTTTTTGTTAGTATGTATGCTAACAAAATATATAAAGTACTTAATATTAATATAATTG
TAATGCAAGGGCATACATACATTGATTATAACCACCTTTAACTCAAAATGTAAGCGGATCGGTTTTGTCTCGCACAC
TGAAGCCATTAATTAATATTTTATCGTCTTACATGTAATAAATGATTCA
LP02848.3prime RC
ACGCAT
TCCGCCAGGGTCTGGACGCCCGTTACGGTGATCACACCAACACATCCCTGGCAGAGGTACTGCGTAAGCAGACTTCG
TTGAGCAAACGCCTTTCGGAAACGTCTTCCATTGAGTATGCCGATAATATACCTGATGAGCTGACCATACCCGAAAT
TGATGTCGAACGTCTATCATCCCACAGTCACACTGAAACTGCAGTTTAAATTTCAGTGGCATCCATATCCATATAAA
AATAAACCGCACACATTCTCAGAAATAACAATTCTAAGAAATCCTTAGCACAGCTTGGAATTTTTATAAAAAAAGGT
TTTGTTCAGGATAACAGCAATGTAGCTGTGAATTGGTTAAAAAGCATTTTGTAATTCAGCAAAAATAATCCGTAAAA
AAAATGTAAATTTCTAATTTTTTTTGTTAGTATGTATGCTAACAAAATATATAAAGTACTTAATATTAATATAATTG
TAATGCAAGGGCATACATACATTGATTATAACCACCTTTAACTCAAAATGTAAGCGGATCGGTTTTGTCTCGCACAC
TGAAGCCATTAATTAATATTTTATCGTCTTACATGTAATAAATGATTCAATGAAT
translation
\-------1---------------- W G R G H P E E Y T D G M N L G E
E R F D S I D S D K K P R A G Q I L W I R G L T R L
Q T Q \---------------0--------------
V I G G E L Q E R L I P V P
Y S K S N T D Q A \---------------0------------
I R V V N A F R Q G L D A R Y G D H
T N T S L A E V L R K Q T S L S K R L S E T S S I E
Y A D N I P D E L T I P E I D V E R L S S H S H T E
T A V \*
5' region for 5' exon - 4kb more available
attcttaaatctatgtttgaggactatgaccgcgctaatcacctcccgcatggtcatatctcttgacacttcggcaa
taggccgctgcaatcgtacttatctgtagttgccacttatgctgtccggtgacatgtttattgtagcccataaatag
aacatttttatagtttgcctacttatcattatggaaatatcgacatggctagtgatcagtatcaataacatacaatt
aactttatatcgtaggatatgcgtctttctattgccagaattgtttcttttaagtattctatgaatagtaaagggtt
tattaacctgtagttttatcatatatgataattataccttttactcgtagaggaagcgcttccgacaatataaagta
tatatatttctgaccaaaacaaccacacccccacttgtcttgccaatatcggtcggtatactaaatacacttttttt
ttttaatatggctctgtggctgtccaattgattaaatgcgttcagttctcgtctttgaagagtggtttctgttctaa
aatgatggtcctgatcaagaatatatatacttaatatggtctgaaaagtttccttctgcctgttaaatacttttcaa
caaatctggtaattcttttactctcgcctaacgggtataattaattagtcatatcgggctactatatcatatagctg
ccatatagcgatcggtctgcaataaagtgtttgtatggttggcagctgcccttctctggacctaaaaggaatgttca
agaaattttataatttgctgcctatcacgtaacttcccgttgtttatttacactatgaatatgaattctactatctg
ccccctgctggctgatggcctggcgacgcccttgacaaaatatatgtaaaaataatattacaaaatgttacaacaaa
gtttgattgaagtttatttgtcttggttatctatcagtacagcaaaacattttagccgcgccccttccaaagcccac
aagtcgctcaaaactgtcatgtcaacacgtttcaatatattattttctggccatatggaatctgatagtcaaggaac
tcgactatagcattctctctttttttattcttatgttggtcatacgttatcaagttacacaa
It turns out that this is a large 1200aa protein in other species, and we
have just the C-terminus. Indeed we need to add it to CG2165, providing the
C-terminus for this gene.
*************A. mellifera Contig253 TGTCCGTACTCGTTGAAACGCATAAATACACGCTAATGAA
TAATTTTTACTGATGTATATAGGCACTTTTTTCTTCGTTCTCCTCTTTCTCTCTCTCTCTCTCCTCATTCTTTCCGT
TTCTCACTTTCTCTCTCTCTCTCTCTCTCGTTTCATTCTTCTTACGCTCTCTCTTTCTCTCGCTCATATACTATTCC
AAAACAACAAATTTGTTTATCAACCTTAAAAGGCTCCTTTTTTTTTTCCCGATAAGAAAAGCTTGCAGCTTCAAAAA
CAGATATTTTTTTGTGTTTAGACGATCGTTTTCTTAAAAGCTAAAAAACATTATGAAATCGAATCAAAAATTCACGC
CTATCATCATCTCAACGAAGAATCGGTCAAAAATCATCTTCGTTAACGGAAAGTCTTTTGATATATATAAAAAGGTA
TCGAGACAGTGACAGAAGGAATTGAAAGCGGGTTGAAGGAAACATTATGAGAACAATTACGTGCCTCTTAACCTCTA
CCAAGTCTTTTACATAGTCGCTTTGATATGGTAATAAATAGGAAATATTTCACGTTTCACGAGAGGATGTAATACAA
TAATACAGTTTAGTTGGTTAGAAGAGAATAGAGAGCTGTATGAAAACAGAAGGTAGAGGTAGAAATAGGAAAAAGAT
ATAACGACAGATAGAAGAATCTCGAGCGAGAGGCTGCTGTTACATTTTCGCATTTTCGTTCTGTTTCCGGACGTAGT
AATTCGTACTTAC
long ORF in RC encodes 17% arginine and 14% glutamic acid; 230aa
protein; BLASTX match to eukaryotic translation initiation factor 4B
[Homo sapiens] 40% e-08 and in yeast; genomic match is full-length and
clear but low at e-09 to C-terminal annotated region of CG10837, so
I think we have an additional C-terminal region of this gene; one EST
*************A. mellifera Contig2709 TGAAAATGCGAAAAATACAATAAATGGTGGAAAATATAG
CAATTTAAATATACCAGTAACATCACAACAAGGATTAGCGCCACTTAGTCCCTATTTAAATTTTGATCCTGCATATC
TTCCTCCAAGCCAACCAGAATATATATTTCCGGAAGGAGCAGCAAAACAAAGAGGAAGATTTGAATTGGCTTTTAGT
CAGATTGGTGCAGCATGTATTATAGGAGCTGGTATTGGAGGTGCTACTGGTTTGTATAGAGGCATTAAAGCAACATC
TTTAGCTGACCAAACTGGGAAACTTAGAAGAACACAATTAATCAATCATGTTATGAAAAGTGGATCGTCGTTAGCAA
ATACATTTGGAATAGTATCTGTGATGTATAGTGGATTTGGTGTGCTTTTATCTTGGGTCAGAGGTACAGATGATTCC
TTAAATACATTAGCAGCAGCAACTGGAACAGGAATGTTGTTCAAATCTACAACTGGCTTAAAAAAATGTGCATTGGG
TGGTTGTATAGGACTAGGAATAGCATCTGTATATTGCTTATGGACTAATCGAGAAGCCTTACTGGAATTGAGGCATC
GCAATATAAATCCAGCGTAAGACTGTGTGTAACAGCAAAAACTCTGAATATTCCTAAATATTTTTCTTATAGTGATT
TAAATTTCTTAGTAGTACTAACAAAGGAATGATAGAAGGGTTTGCATTCTGTAATTGATATAAATTATATATGAATG
ATTATTTTTACAAAAATAAAAAAAAA
long ORF encodes 13% glycine >200aa protein; BLASTX matches full-length
at 43% and e-31 for 200aa translocase of inner mitochondrial membrane
[Mus];
genomic match is to a small 15kb contig; several ESTs, so annotation
would be easy - NEW GENE
AE003403.2 TTGAACACAGATGTCACTTCTACAGGGGAAAAAAGTTTAAAAACA
AGTAAATCACAGAAAACGTCGTTTCCTTTTGCTAATAGAGCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATC
ACCATTCTTGGGACTGCAAACAAATTCGAAAATGAGTGACAATTTTTCAAGAACACCATATTCTGATGGGCACGCTG
CAACCCgtaagcaacaaagaatttggttacataattttattaattattaaatagattgctgttatcttttttctttg
taaattgaaatgcaatacaatatgcagATGAGGAAGCATCAAAACCCCACTACACTACCACTACGAGTTCTTTTAGT
AGAACTCCGGTCTCGCCGTACCTCAACTATGATTCGCGATATCTGCAGCAAGCACAGCCAGAGTTCATTTTTCCCGA
AGGGGCCAACAAGCAGCGTGGACGCTTCGAGTTGGCCTTCTCTCAGATAGGCACTTCGGTAATGATTGGCGGTGGAA
TTGGCGGCCTAGCAGGTGTTTATAATGGTTTAAAAGTCACAAAAGCACTCGAGCAGAAGGGAAAAGTTCGTCGAACA
CAgtaagcaattggcggaattaaaattggttgcaacctcacaactttgactcaacacgtagGTTACTTAATCACATT
ATGAAGCAAGGTTCCGGCACAGCTAACACATTAGGTACATTGACGGTGCTGTATTCGGCTTGTGGAGTTTTGCTGCA
GTTTTTCCGCGGAGAAGATGATCATATAAACACAGTAATTGCGGGCTCTGCCACAGGACTATTATACAAGTCAACAG
gttagctaaattttccatatatcgaaaaataatatttattaactagttgcattgtattttacagCTGGCCTTAGGAC
GTGTGCTTTTGGTGGAGCTATTGGGCTGGGCATCTCGTCCCTCTATTGCTTATACCTAATAGCACAGGAAAACAGTT
CGAACTCAAGTCCCAAATACCTATAGATGGCTGAAATATGTAGTACGCAGGCATTAATAGGATCACTCCTAGCCGAT
TAAAATTATAAATACGAAGTTTTAATTTTATTTTGTTTTATTGCATTTTATACTAAGCATTTTTGCATTAACTTGCT
GTTGTAGATAAAGCCATCACATTCCCCCACGCAATTTAGTTAGGAACCCAATTTCCAAACTCGCTAATAGTCCAAGT
TTTTGGTATCGGCGCCTACCATGATTGCCCTGCTGCCCCCAGCCATTTCATTATTGGCCGGAGCTTCGCAGCTGGTA
TCCTGTGTGGAGCACGTCTGGCAGTGACAGTTCACTGCCTCCATGTACTGGTACTTGCTTACGCTGTCCTCTGCTTT
AGGGTGACAATTCTTTAGGATAGCTACTACCAGCTGCCGCTGGGCGTGGACACAAACAGGATGGAACGAACGCTTGT
AAGGAAACTTCCAATCGGAAATTTCGCTTGAGTCGCATCGTCCCCAACACGACCAAACGCTTACATAGTCCCAACAC
TCGTGTCCTTGAAGATCGGACTGTGTCACTTTATACGTGTATACACGACGATGGCATCCCAAAGGCGTCACGATATG
TCCGTTGTTCATCGGCTTAATTTCCGACAAACTTGAAGATGAAACCGAAACAAGCACAACAGACGTACCTACAAAGA
TAGCTAAAGTCCTGAAAAAAATTATTCTGAGCATAGATGAACAATGC
LP07554.5prime
TTGAACACAGATGTCACTTCTACAGGGGAAAAAAGTTTAAAAACAAGTAAATCACAGAAAACGTCGTTTCCTTTTGC
TAATAGAGCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATCACCATTCTTGGGACTGCAAACAAATTCGAAAA
TGAGTGACAATTTTTCAAGAACACCATATTCTGATGGGCACGCTGCAACCC--------------------------
\------------------------------------------------------------------------ATGAG
GAAGCATCAAAACCCCACTACACTACCACTACGAGTTCTTTTAGTAGAACTCCGGTCTCGCCGTACCTCAACTATGA
TTCGCGATATCTGCAGCAAGCACAGCCAGAGTTCATTTTTCCCGAAGGGGCCAACAAGCAGCGTGGACGCTTCGAGT
TGGCCTTCTCTCATATAGGCACTTCGGTAATGATTGGCGGTGGAATTGGCGGCCTAGCAGGTGTTTATAATGGTTTA
AAAGTCACAAAAGCACTCGAGCAGAAGGGAAAAGTTCGTCGAACACA------------------------------
2----------------------------GTTACTTAATCACATTATGAAGCAAGGTTCCGGCACAGCTAACACATT
AGGTACATTGACGGTGCTGTATTCGGCTTGTGGAGTTTTGCTGCAGTTTTTCCGCGGAGAAGATGATCATATAAACA
CAGTAATTGCGGGCTCTGCCACAGGACTATTATACAAGTCAACAG--------------------------------
1-------------------------------CT
AT20116.5prime
GGCACGAGGCTTCTACAGGGGAAAAAAGTTTAAAAACAAGTAAATCACAGAAAACGTCGTTTCCTTTTGCTAATAGA
GCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATCACCATTCTTGGGACTGCAAACAAATTCGAAAATGAGTGA
CAATTTTTCAAGAACACCATATTCTGATGGGCACGCTGCAACCC---------------------------------
\-----------------------------------------------------------------ATGAGGAAGCAT
CAAAACCCCACTACACTACCACTACGAGTTCTTTTAGTAGAACTCCGGTCTCGCCGTACCTCAACTATGATTCGCGA
TATCTGCAGCAAGCACAGCCAGAGTTCATTTTTCCCGAAGGGGCCAACAAGCAGCGTGGACGCTTCGAGTTGGCCTT
CTCTCAGATAGGCACTTCGGTAATGATTGGCGGTGGAATTGGCGGCCTAGCAGGTGTTTATAATGGTTTAAAAGTCA
CAAAAGCACTCGAGCAGAAGGGAAAAGTTCGTCGAACACA------------------------------2------
\----------------------GTTACTTAATCACATTATGAAGCAAGGTTCCGGCACAGCTAACACATTAGGTACA
TTGACGGTGCTGTATTCGGCTTGTGGAGTTTTGCTGCAGTTTTTCCGCGGAGAAGATGATCATATAAACACAGTAAT
TGCGGGCTCTGCCACAGGACTATTATAACAGTCAACAG--------------------------------1------
\-------------------------CTGGCCTTAGGACGTGTGCTTTTGGTGGAGCTATTGGGCTGGGCCATTTGTC
CCTCTATTGC
GH02609.5prime
CAGGAATGCTTGAACACAGATGTCACTTCTACAGGGGAAAAAAGTTTAAAAACAAGTAAATCACAGAAAACGTCGTT
TCCTTTTGCTAATAGAGCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATCACCATTCTTGGGACTGCAAACAA
ATTCGAAAATGAGTGACAATTTTTCAAGAACACCATATTCTGATGGGCACGCTGCAACCC-----------------
\-----------------------------------------------------------------------------
\----ATGAGGAAGCATCAAAACCCCACTACACTACCACTACGAGTTCTTTTAGTAGAACTCCGGTCTCGCCGTACCT
CAACTATGATTCGCGATATCTGCAGCAAGCACAGCCAGAGTTCATTTTTCCCGAAGGGGCCAACAAGCAGCGTGGAC
GCTTCGAGTTGGCCTTCTCTCAGATAGGCACTTCGGTAATGATTGGCGGTGGAATTGGCGGCCTAGCAGGTGTTTAT
AATGGTTTAAAAGTCACAAAAGCACTCGAGCAGAAGGGAAAAGTTCGTCGAACACA---------------------
\---------2----------------------------GTTACTTAATCACATTATGAAGCAAGGTTCCGGCACAGC
TAACACATTAGGTACATTGACGGTGCTGAATTCG
translation
M S D N F S R T P Y S D G H A A T \--------------------------
\--------------------------1---------------------------------------------H E
E A S K P H Y T T T T S S F S R T P V S P Y L N Y D
S R Y L Q Q A Q P E F I F P E G A N K Q R G R F E
L A F S Q I G T S V M I G G G I G G L A G V Y N G L
K V T K A L E Q K G K V R R T Q------------------------------
2---------------------------- L L N H I M K Q G S G T A N T L
G T L T V L Y S A C G V L L Q F F R G E D D H I N
T V I A G S A T G L L Y K S T \--------------------------------
1-------------------------------A G L R T C A F G G A I G L G
I S S L Y C L Y L I A Q E N S S N S S P K Y L Z
Several intron splices are not predicted, but the cDNAs make them clear,
as do the matches. Interesting 5' UTR: it has no stop codons, but this is
the first M and it aligns okay.
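The genomic blocks in this file appear to show exons in upper case and introns or flanking sequence in lower case, with the intron phase (0, 1 or 2) written into the dashed gaps of the translations. Assuming that convention, the spliced sequence and the intron phases can be recovered as sketched below in Python. The function names and the utr5 parameter are hypothetical, and the caller has to supply the length of any upper-case 5' UTR.

```python
import re


def exons(genomic):
    """Return the upper-case runs of a mixed-case sequence, in order."""
    return re.findall(r"[ACGTN]+", genomic)


def spliced(genomic):
    """Concatenate the upper-case runs, i.e. remove lower-case introns."""
    return "".join(exons(genomic))


def intron_phases(genomic, utr5=0):
    """Phase of each intron: coding bases upstream of it, modulo 3.

    utr5 is the number of upper-case bases before the start codon
    (assumed known). Phases are only meaningful for introns that lie
    inside the coding region.
    """
    phases = []
    coding = -utr5
    for run in exons(genomic)[:-1]:
        coding += len(run)
        phases.append(coding % 3)
    return phases


if __name__ == "__main__":
    toy = "ATGGCgtaagCATGGgtagT"   # toy sequence, not taken from the data
    print(spliced(toy))            # ATGGCCATGGT
    print(intron_phases(toy))      # [2, 1]
```

A phase of 0 means the intron falls between codons; 1 or 2 means it splits a codon after its first or second base.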
***********A. mellifera Contig764 GATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCA
TGCATGTCTCAGTACATGCCGAATTAAGGTGAAACCGCGAATGGCTCATTAAATCAGTTATGGTTCATTAGATCGTG
GACACATTTACTTGGATAACTGTGGTAATTCTAGAGCTAATACATGCAAACAGAATTCCTCTCAGAGATGGGAGGAA
TGCTTTTATTAGATCAAAACCAATCGGTGGCGGACGGCTCGTCCGTTCGTCCATCGTTTGTTTTGGTGACTCTGAAT
AACTTTGTGCTGATCGCATGGTCATCTAGCACCGGCGACGCATCTTTCAAATGTCTGCCTTATCAACTGTCGATGGT
AGGTTCTGCGCCTACCATGGTTGTAACGGGTAACGGGGAATCAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACAG
CTACCACATCCAAGGAAGGCAGCAGGCGCGCAAATTACCCACTCCCGGCACGGGGAGGTAGTGACGAAAAATAACGA
TACGGGACTCATCCGAGGCCCCGTAATCGGAATGAGTACACTTTAAATCCTTTAA
no obvious ORFs; WEIRD BLASTX match for RC full of stop codons, at 70%
and many gaps and e-14 to human non-functional folate binding protein
[Homo sapiens]; Needs four stop codons included!
Is this a mouse pseudogene cDNA? No, has genomic match in Drosophila
at e-37, but to a tiny incomplete 1000bp contig - so will not be able
to reconstruct this amazing gene; AND tons of 70% B. mori and some
Drosophila ESTs with their own set of stop codons in them! What on
earth is this gene? Mitochondrial code doesn't fix problem.
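Whether an alternative genetic code removes in-frame stops can be checked by counting stops per frame under each code. The Python sketch below assumes that the mitochondrial code meant is NCBI translation table 5 (invertebrate mitochondrial), which reads TGA as Trp and so terminates only at TAA and TAG. The names are hypothetical and this is not the check actually performed for these notes.

```python
STANDARD_STOPS = {"TAA", "TAG", "TGA"}
# Assumption: NCBI translation table 5 (invertebrate mitochondrial).
# TGA codes for Trp there, so only TAA and TAG terminate.
INVERT_MITO_STOPS = {"TAA", "TAG"}


def stops_per_frame(seq, stops):
    """Count stop codons in each of the three forward frames."""
    seq = seq.upper()
    return [
        sum(seq[i:i + 3] in stops for i in range(frame, len(seq) - 2, 3))
        for frame in range(3)
    ]


if __name__ == "__main__":
    toy = "TAATGATAG"  # toy sequence: TAA, TGA, TAG in frame 0
    print(stops_per_frame(toy, STANDARD_STOPS))     # [3, 0, 0]
    print(stops_per_frame(toy, INVERT_MITO_STOPS))  # [2, 0, 0]
```

Stops that are TAA or TAG remain under either code, so only a frame whose stops are all TGA would be rescued this way.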
Contig 764 RC TTAAAGGATTTAAAGTGTACTCATTCCGATTACGGGGCCTCGGATGAGTCCCGT
ATCGTTATTTTTCGTCACTACCTCCCCGTGCCGGGAGTGGGTAATTTGCGCGCCTGCTGCCTTCCTTGGATGTGGTA
GCTGTTTCTCAGGCTCCCTCTCCGGAATCGAACCCTGATTCCCCGTTACCCGTTACAACCATGGTAGGCGCAGAACC
TACCATCGACAGTTGATAAGGCAGACATTTGAAAGATGCGTCGCCGGTGCTAGATGACCATGCGATCAGCACAAAGT
TATTCAGAGTCACCAAAACAAACGATGGACGAACGGACGAGCCGTCCGCCACCGATTGGTTTTGATCTAATAAAAGC
ATTCCTCCCATCTCTGAGAGGAATTCTGTTTGCATGTATTAGCTCTAGAATTACCACAGTTATCCAAGTAAATGTGT
CCACGATCTAATGAACCATAACTGATTTAATGAGCCATTCGCGGTTTCACCTTAATTCGGCATGTACTGAGACATGC
ATGGCTTAATCTTTGAGACAAGCATATGACTACTGGCAGGATC
translation
L K D L K C T H S D Y G A S D E S R I V I # F V T T S
P C R E W V I C A P A A F L G C G S C F S G S L S
G I E P * F P V T R Y N H G R R R T Y H R Q L I R Q
T F E R C V A G A R * P C D Q H K V I Q S H Q N K R
W T N G R A V R H R L V L I * * K H S S H L * E E
F C L H V L A L E L P Q L S K *
AE003241.2 entire AGTGATCCACCGCTTAGAGTTTTATAATTCATTTTTATATAATGTCAATTATGT
TTTTATTGAAAGAAATTAAAAATACACCATTTTACTGGCATATATCAATTCCTTCAATAAATGTATTTATATACCTA
AAATAAATGTTGCGAAATGTCTTAGTTTCATATAAGCATTATGTATCATAATAATCTGGTTGGTTATGGGGTTTGCT
ATTTTGGGTGACACATACTGCAATTTATATAAAACATTAACCTGATGGATGCCAGGTACAACATTGTTTATTTCAGG
TTGTTGCATTAGCCAACGTATGCCCATAACTAAGATGAACAATACATATTCGCAACGCGTGTATAGTAATAAATACA
CACAAATTTTAAAAATTAGTTAATATCTACCAATTATATTAACACTTATTTCGATGATTACCACACATTCGAAATTA
TTTTATTTTGATTCGACTTCCACTTTCGAATTTTGTTTTTTCGATTTTCATGTTCGAAACATTATTTTTATAGGAAA
CGCCGTTGTTGTAAGTACTCGCCACAAATACGCACAACATACATTAGAAATGTTAAAATCTTTTTATGAGGTTGCCA
AGCCCCATCTTCGTTTTATTTTGATTTTAACTTTTTGTATGAAAAGATACAAGTATTTAATCACATATAAGAACTCC
ACCGGTAATACGCTTACATACATAAAGGTATAGTACTAACCACAATTGTAAGTTGTACTACCCGTATGAAGCACAAG
TTCAACTACGAACGTTTTAACCGCAACAACTTTAATATACGCTATTGGAGCTGGAATTACCGCGGCTGCTGGCACCA
GACTTGCCCTCCAATTGGTCCTTGTTAAAGGATTTAAAGTGTACTCATTCCAATTACAGGGCCTCGGATATGAGTCC
TGTATTGTTATTTTTCGTCACTACCTCCCCGAGCTGGGAGTGGGTAATTTACGCGCCTGCTGCCTTCCTTAGATGTG
GTAGCCGTTTCTCAGGCTCCCTCTCCGGAATCGAACCCTGATTCCCCGTTACCCGTTGCAACCATGGTAGTCCTAGA
TACTACCATCAAAAGTTGATAGGGCAGACATTTGAAAGATCTGGCGTCGGTACAAGACCATACGATCTGCATGTTAT
CTAG
translation
L K D L K C T H S N Y R A S D M S P V L L F
F V T T S P S W E W V I Y A P A A F L R C G S R F S
G S L S G I E P * F P V T R C N H G S P R Y Y H Q
K L I G Q T F E R S G V G T R P Y D L H V I *
Yikes, all the cDNA matches are backwards too, that is to the RC as
with the honey bee, and even for the vertebrate clones? What on
earth can this be?
************A. mellifera BB260003A10H2.F ACAGAGAGAAAAAGCAAGCTGCCTTACCTATCGCT
GTCTTTGGAATGGAAATGGTGGAGAAATTCTATTCGAAACAATTCACGGATAAGGAAGAGGGTCTGATGCAATTGAA
AGAAGAATTAAAGACGTTTGATCCAGAAGTTTCGAAACATTCCGCGAATAAAACGGCCAGAGCTGCGATTTTGTTGT
TACACAGAGCCCTTAGGGACAAAGTTTTCAGTGTATACAGTCTAGCTGCACAATTGATTAGAATTTTTTTTTCAGAA
TTTGCAACTAGGGTATCTTCTACGGAGATTGCGAGAAGTGTGGAAAGATTGCTCCCAGAATTATTGACTAAATCAGG
GGATACCACTCCAAGGATTCATAACATGGCAGTCCACACGATACTCAGTATGGCAGATTGTAAATGTGTCCGAGAAT
TGCACATTATACCAGTGCACTTAACAAGACCTGTCAGCAGTAGCACTCATCAAAGATTGGCGTTGAGTAGGTTAGAA
ATGGTGGAGCAGTTGATTCTGAGCCATGGAATATCTACTGATAAACAAAGTGGGCTCACTTGTCGAACATTGTCGGA
ATTGGGTTCTACGGGATTGCATCATCCGGCGGAAGCAGTGAGAAAAGTTTCGGAAAGAATTCTAGTGTTGGTGTACA
AAGTGAATCCTAGATTGGTTCGCAAACAATTGCCTCCTGACGATGATATTACCAGGAGAAATTTATTGTATCGTCAA
CTTTTTCACGAATTTGATGTTT
full ORF; BLASTX glycine-, glutamate-,
thienylcyclohexylpiperidine-binding protein [Rattus]; 28%; 3-22; could
be rest of CG10137; no ESTs
Indeed, there is a human protein, KIAA0562, that is about 900aa long,
and its first part matches CG10137, while its second matches our cDNA.
Similar long proteins are known from C. elegans and Leishmania major,
so the truncated rat protein above, of 400aa, is aberrant.
*************A. mellifera BB260003A20B1.F CCTTAGATGGACCGATCTGATATGCAACCTAATA
CAAATCAATTTACTGTATGGAGTCCTGCAGGCCAAGACATTAGTACGAGGGTCTCCTGGAATTCTGATCGACGACCA
ACCGCCCGAGAGCCCGAGCGACAGTAACGAAACCGACGAGACGAATAAAAATTTCCCGTGGATGAGGGTGTTGGCCC
AGTTCGCCAACTCGTTCAACTTTTACTGCTCCCATCAGAATTTCTGCCACCCGTATTGCCACAGGCGGCAGATGCGC
GCGTGCAGCAGATTGATCAAGTCCATAAGGAAAATCTACGGGGAGGAGTTTGGCATATTGAACGGGACAGGGATATT
CGACTTGGACACGGATAAGAAGGAGGCGAGCAAAAAGGAGAAACGGAGCCGAAAAGTTTCGGAGCAGGCCAGCACCC
AAGTGTCCCCTGTGAGGAGGAAGGACAGCGTAGGGAAGAAGTACAAGTAACATTTATCTTCCTTAATTGATTCAGTG
AAAACTGACTGTTCTAATAATTTCCTGAATAAATTTAAGATAAGATATTCTTCAAACACAATTTGGATTAACATCAA
CATCTCTTCAAATTATTCTAAGATTATTTCTATATAATATATTTCTTATTATAAGATCTTATCTCTAAGATCTGTTA
AAGAAATGCATGGCTCTATTTTATTAATATTTTAAATACAATCAAACTGATAACTGGTCTTGAAAANTTATTCAAAT
GAAAAGTGGCATCTTGCGTTCCTTTCCTCTTCTTCGACAGGGTTGAAAAGAACATGGATGGGTCTTAATTAGGCAGG
CTCGCTCAACGAGACTTGTC
end of ORF encodes weak match to C. elegans and human proteins;
methyl-CpG binding protein 1 [Homo]; 32% over 84; e-0.27; unannotated
NEW GENE; no ESTs
The human protein is about 600aa long, with our match in the middle;
this region has two zinc fingers, but the methyl-CpG binding domain is
at the extreme N-terminus;
This genomic match is very good at about 55%, so see how far it can be
extended; we have a 14kb region to play with!
AE003762.2 CACCTCGGAACAGGGCATGATCAGTGGTGGCGAGGAATCACCAGGAATTCTCAG
TGACGATCAGCAGCCGGAGTCACCAACGGACTCGAATGAAAACGATGATACGGCCAAGAATATGCCGTGGCTAAAGG
CAATTATCGATCTTATGTCCAGCTATAACTACTACTGCACCCATAAAGGATATTGCCATCCATTTTGCTATAAACGG
CACATGCGATCCTGCACTCGCTTGGTCAAAGCCACCAGAAAGgtaagataccatttacggaagtcaaacttaagcca
aaaaattcgattgtttcctagGTTTATGGCGAGGAGTTTGGATTCACCTTCGATGCAGACCATCCGAATGTGGAGCC
CACTATCATCACCTCCAGTAAGCCACATACTTCTCGAGCTCGCTCCACTAGAAAAGTATCGGAGCAGAGTTCCACTC
AGACATCTCCGTCCAAGCGAAAGGATAGCTTGTCACGCAAAGATCGGTGGGTAGTATTCATAGTTAGCCAAAACATA
GCTTATATATACATGCTCACAGGA
translation
S P G I L S D D Q Q P E S P T D S N E N D D T A K N
M P W L K A I I D L M S S Y N Y Y C T H K G Y C H P
F C Y K R H M R S C T R L V K A T R K \-------------------
\---0--------------------------------- V Y G E E F G F T F D A D
H P N V E P T I I T S S K P H T S R A R S T R K V S
E Q S S T Q T S P S K R K D S L
Now I figure out that this is already recognized as CG18437, except that
for some reason the entire annotated mRNA is not translated into the
CG18437 protein?
**************A. mellifera BB260003B20H2.F GAAGATATAAAAATTATTTAACGAAAATCTAAA
TTTTTGTCGAACGATTAAAATTTTGCGGAGTTAACCTATACGTTTACATTCAATTACCAATAACGACACATTATCAT
AAAAGAATTTCCAACATTAATATCACGGAAAGATTAACGAGGTGGGGCAGAATTACTTTTCAATAAAAAAAAAAAGG
CTCGAAGTAAACTTACCTGGAGTAAACGAATGGACTATACAAAAGCTACTTTTCTGGATAACAGATAATCTGTTAAA
GGAACGACAGGAACTTTTTATACAAGGGGATACCGTGCGTCCAGGAATCTTGGTTCTTATAAATGACATCGATTGGG
AATTATTGGGCGAAAGCGATTATAAAATAAAATCAGGCGATACTATATTATTCATATCCACTTTACACGGAGGATAA
GAAAGATAAAAGTGGAAAAAGAAAAGATAAAAGTCGATGCAAAGAAGATCGAAGATAATACCGCATAACATAATATA
AGTGGAATCCGAGAAAAACGTAATTCTTCAGAAAAACATAATTTCACGGAATGTTTGTACCGATGGTGAAATACGTG
GATACGCGCGTAGCGTTAAAAAAAAAAAAAAAATAAACCGTTTATTCCTTCGTAAGATAAGTAACGATTTCTATTAG
CCCCGCCGG
no clear ORF, yet ubiquitin-like protein; Urm1p [Saccharomyces
cerevisiae]; 47% 3-11; has no start codon and a frameshift, but encodes
full-length 99aa protein;
TBLASTX is clear; unannotated in genome - NEW GENE; no ESTs at all
Only two kb available for this gene between annotated neighbors; this
is the entire region
AE003558.2 gtgttttttttaatttcataaattcacaacgaaatgtaaaatgtcttcttagag
cacagatcataatatgctaatgaaaactttacctagcgttccattggaatgccacctgtttatgtttctcaatggga
ataattataatgctgagtcccctcgtcgaaggtgaatggtaaaggtaaatttacgtacaaattataatttctttcaa
aatgcaataatttttggaagtttagagcatgtggttgccagactttttagaaattaggattctatatttggtattat
ttttgaATGGTGGATTTTTTTGAAAAGCTTAGACGCGGTCACACATTTATTTACATCGAACATATGATGGGCACGCC
GGAATTAAAAATCATATTAGAATTCAGgtatataaagccttgttcgaattattatttccaataaatgagacaatttg
taatttaatttcgtagTGCAGGGGCGGAGTTACTATTTGGTAACATAAAACGCCGTGAATTGAACTTGGACGGTAAA
CAAAAATgtatgttctaaaagatgttttaaacttgagtgaaatcggttatgaattatatatattttaaagGGACTAT
TGCTAATCTGCTTAAGTGGATGCATGCGAATATTTTAACGGAGCGTCCGGAACTTTTTCTTCAAGGAGATACTGTgt
aagtttggcttaaactataagggatacatagtctatactttctattgtatgttttcagGCGACCTGGAATTTTAGTA
CTCATAAATGATACAGACTGGGAATTGCTGgtaagtaagagtacagaatatggaattctaaaaactataattaactc
aacttctagGGTGAACTGGACTACGAGCTGCAGCCCAACGACAATGTGTTGTTTATATCAACTTTACACGGTGGTTA
AAAAACGTTCTGGAATCTAAATTATAGGAGAAAAGTTTATTTTTATACTACAACCCATTaaattgtattcaaaaaac
aaacaaaaacagtttttgactgaatcagaattacgtccctttaccagaggcaaaggcccacacgatggttcgggtag
taaagttttcatcatatcatctagagattgtaagctagcaggttttcgctatttataaaagcacagcattgaacaag
cacttgggaattgtagggaagttaaaaatagaacaatccagatgcggtttggcgggtaattcgaagtaaggacaaca
cgttagttttacattgaagcaacatttattgaatttaaattttcgcttaaaattatttatcgagttattatgaagta
tagatatatttttttaatgttcgtttacgattattttaactataggcacttaggttacatatcaactaactgtacgt
aatgaagttcgattcagatatgttgggcatcggccacgcccctttttcgggtgctgctcatggctcccatcgatttc
tgtgtgtgtgtggtgcggatcgttcgattgtgtggactaagaactagatatgtatgaaacttgcttcgacattgaca
cgctgaattcaataatttcgactgatttttccgataaacaaggcaaaacgaaaagcagcgactatgacaaattaacg
aaaactaaaaatgtaataaataataaataacaaaaataatgaaataaaattagcgatcgaacgtaatacgatgatac
tacatgggatccgatgaaaccgttctgctaaagctattaatggggatttatactatatctagatgcgat
translation
M V D F F E K L R R G H T F I Y I E H M M G T P E L
K I I L E F S--------------------------------2------------------------
\--------- A G A E L L F G N I K R R E L N L D G K Q K
\---------------------------------1-----------------------------W T I A N
L L K W M H A N I L T E R P E L F L Q G D T V---------
\-------------------------2------------------------- R P G I L V L I
N D T D W E L L \------------------------------0-----------------------
\-- G E L D Y E L Q P N D N V L F I S T L H G G \*
Lovely small compact gene with all intron boundaries predicted; encoded
protein is 121aa
Drosophila
MVDFFEKLRRGHTFIYIEHMMGTPELKIILEFSAGAELLFGNIKRRELNLDGKQKWTIANLLKWMHANILTERPELF
LQGDTVRPGILVLINDTDWELLGELDYELQPNDNVLFISTLHGG
Honey bee
RGGAELLFNKKKRLEVNLPGVNEWTIQKLLFWITDNLLKERQELFIQGDTVRPGILVLINDIDWELLGESDYKIKSG
DTILFISTLHGG
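The two proteins above end in the same residues, so a rough percent identity can be had by comparing them ungapped, right-aligned at the C-terminus. This is a minimal Python sketch under that assumption, with a hypothetical function name; it is not a substitute for a real alignment and would mislead if the proteins differed by internal insertions or deletions.

```python
# The two sequences listed above, with the line wraps rejoined.
DROS = (
    "MVDFFEKLRRGHTFIYIEHMMGTPELKIILEFSAGAELLFGNIKRRELNLDGKQKWTIANLLKWMHANILTERPELF"
    "LQGDTVRPGILVLINDTDWELLGELDYELQPNDNVLFISTLHGG"
)
BEE = (
    "RGGAELLFNKKKRLEVNLPGVNEWTIQKLLFWITDNLLKERQELFIQGDTVRPGILVLINDIDWELLGESDYKIKSG"
    "DTILFISTLHGG"
)


def cterm_identity(a, b):
    """Percent identity of two peptides compared ungapped, right-aligned."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    # Compare only the last n residues of each peptide, position by position.
    matches = sum(x == y for x, y in zip(a[-n:], b[-n:]))
    return 100.0 * matches / n


if __name__ == "__main__":
    print(f"{cterm_identity(DROS, BEE):.0f}% identity over {min(len(DROS), len(BEE))}aa")
```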
**************A. mellifera BB260004B10A11.F GTTGCGCACTCCTAGAAACGCATCGGCAACGC
GAAACAACTGGCATCAAGAAATTGTACAATAGCTTCTTCGTAATGTTCTAAGGATAACTCTTCCTTAAGGAAGTTCG
GCCCTTTATTAACCGGTGGTCGAGTGATGCACCGTGCACCTCCGAGATCATTGTCGATCCCGATCGAGGAAAATACG
CGGAGAAAACGCACAAAAGAAGGAGGACGTTTTGGTTAGTAAACGAAAGAAAAGGAACGAGCTGAGCGGAGGAAGCC
GAGAAGCGGCGAAAACAAAGAGGAAAAAAAAAAAAAAAAAAAGCCAC
tiny match 88% over 25aa at e-07. Could be end of an ORF. No BLASTX
matches though, so novel protein; one Drosophila EST too.
Match is to an unannotated region of about 5kb, so try to figure it out.
AE003830.2 cgtaaacaactactatataatgtccgtcgccatctgacagtggtccaaacaacc
agcacaaagagctagaagtcgccgcagccagtgacaagtttcaccacagcgagtgagacctgtaccgtagaaccaac
acaagacacttaagcttgcaacacgggctaaccaattcagcgataaATGGAGAAATCTGAAATACGACTGCAACGCA
TGTCTAATGAATATCAGTCGCAATCGAGCTATATGTACCTCCGGACCAAGATGCTGTTAAAAATCGAGAATACCCTA
CTTCGAAGCCATCGTCAGCGCGAGACCACCGGTATCAAGAAACTATACAATTCGTTTTTCGTATTGTTTTAATTTGC
CCCCCCGGCCAGT
translation
M E K S E I R L Q R M S N E Y Q S Q S S Y M Y L R T
K M L L K I E N T L L R S H R Q R E T T G I K K L Y
N S F F V L F *
This is all I can annotate, a simple little ORF, the second half of
which matches our honey bee EST as well as one Drosophila EST (which
has other problems, including a piece in RC).
No idea if it is real, no BLASTP matches.
*************A. mellifera BB260009A20B5.F TTTTTTTTTGAATTCTGTCCGAGATTCCATTCTT
AAAAAAGAAAACAAAAGCTAAAAGAATTAATAAGAAAAAAAATATATATATACACATATATTAAATAGTATAAATAA
ACATAACCTATAAATAGTAAAATATTCACATACATTTTAAAAGATTATTACTATTCTTAATAATAGACTTATAATGG
TTCTTCGCTATATCAATAAATTCTTATATTATTTAAATAAAATTATAGATTGAAAAAAATGATTTAAGTGAAATTAA
ATATCGAAAAAGAAAATAGAGATATCCACTTCCATACACAATATAAAGGGAAATGTAAATCGAAGTTAATTGTGATA
CAAATGCATACAAGAGAAATTAAGAAATACAATCTTATTTCATTATCATGCTATTTAAAAATACATTATATGAAAGC
AACTTTATTATGAAGTATCAAGAACTCCATTTTTATTCGTATTATATCCGGGACACATGGTTATTTTCTAACCTTAA
TAATACGTTTTAGTTTGGGTAGTGATTATTTAGAAATATTGAATGGAATATAAGATAAAAAAATCATGTCAAAGCTT
TTTGTTTTATTTGAACATGCTGCTGGCTATGCCATATTTTCTGTCAGAGAATTTGAAGAAGTGGGAATGTTATTGCC
TCAAGTTGAAGCATCTGTAACAGATTTGTCTCGTTTTAACTCAAGTGTGAAATTAATNTGGATTTTCACCTTTTAAA
ACTGGCTTAACAGCTCTAGAAAGTATAAATAATATTTCTGAAGGAATTGCCCACA
no obvious ORFs; frameshifted ORF at end of sequence (long 5' UTR or
chimeric?) has excellent match to nucleolar protein [Drosophila
subobscura]; tons of ESTs; why is it not annotated?
CG13849 is annotated for the same region, but the amino acids are
different? No, it is part of this gene CG13849.
*************A. mellifera BB260010A20C3.F GAAAAGGAAAACTCCATCAAAATTAAAGACCATG
GGTAATCATGATGACTTTTTAAATCGTATATCTAAATCGCTTTATTATGCAAAACTGCCAGTGACTGATTGCCTCAG
TTTACCTGTTACTGAATTGGCAGCAGAATTATTCACTGAAGTGAAGAGTGGTTATACACTTGAAAGATTAGATGTAG
AAGAGGCTAGTAGAATTTCTAGAAATGCATGTGTATCACCATGTTCTCTTGTTTTGGCATTGTTATATTTGGAGAGA
TTAAAAGATTGTAATCCAGAATATCTTCAACAAGTGGCACCTTCTGAGCTCTTCCTTGTTTCTTTGATGGTGGCTAG
TAAATTTTTAAACGATGAGGGAGAAGATGATGAAGTTTTCAATACTGAATGGGCACAATCAGCTGATTTGACTATAT
TACAAATAAATCGGTTAGAAAAAGATTTTCTTAAAGCTATTGATTGGACTGTTTTTGTTCATAATCAAGATTTTTGG
GAAAGATTGCAGAAATTAGAAAGAGATATAGCTTATAAGGAAGCACAAAAAAGAGGCTGGTTTTCATATACAGAATT
AAGTTGTCTAATGAATTCAATGCAATTAATTGCAGTAGCACATGCTGTAGTAAATGTATCATCTATTTGCTTAGCAA
CATATACTGCANGAGTAGTTACTCTTTTAGTTCTGCTTTAGTTGCAAGCTATCTTCCAGGAACAGTACTTAACAATC
CAAGACAAGTAACTAATTCTACAGATATTATGAAAGCAGATTTTAAATTCAAGATGGATATAACATCACCTATCGAA
ATTTATCAGAAATGTTTTACAACAGATTTATATC
long ORF encodes BLASTX match to CGI-57 protein <up>Homo sapiens</up>; e-37;
and nematode protein; these are 400aa proteins, and match is from the
N-terminus, our ORF may be frameshifted at end
TBLASTX match is to five regions in tiny unannotated scaffold; one
Drosophila EST and a Bmori EST
AE003132.1 TTCGCATATTTCAGTTATTTATTTAGAAATGGGGCGATTTAAGTTATGTGCTTC
GCCGAGAGAGgtatttaaaaagttttcacgaatatatgtattagcaaattttttaaatttccagGTTATGAAGTACG
AAGACTTTATAAAACGCATTCGAAAAAGCCTCTACTATGGCGTTGGAACACCAGACACAGAAATGTCGGTCTCCTTA
CCCTTTGCGGAGTACGCGGCAGATTTGTTTTCGGAGACTCATCGCGGGCATTCTTTGCATCGCCTAAGTTGCGTATC
TGCTGCACAAGTACATGCCACGCCTTGCTCTTTAATTATGGCATTGATATACCTCGATCGCTTAAACGTCATCGACT
CGGGCTATAGCTGCAGAATCACACCACAGCAGCTGTTTGTTGTGTCACTAgtaagtacgcactcctctataacttgc
aaactaatgcaaacaacaatatgaacgcaccgtaaaaaagtacatggctataaatgtcgaaactgtagctgctgaaa
caaatttccgttagtttcactgtcggctgaatgaaaaatgacgatgattttgatcagaaataaattgtaaaatttca
cagcggcactcactgtgtctgtacatgcactcagtcagcaagaagttttgacgtggctaccattgcttgagtccgct
ttaatacatatgtgattgtctgtttttaatatggtaattataagttgaataaatggtaattatctttacagATGATT
TCCACAAAATTCTACGCGGGCCACGACGAACGGTTCTATCTGGAAGACTGGGCCAGTGACGCTTGTATGACGGAAGA
TAGGCTCAAGGCAGTCGAGCTCGAATTTCTTTCCGCTATGgtaaactttacaatgtctaaaatacaaaaataaaata
cgttttttctagGGTTGGAATATATACATATCCAATGAGCTATTCTTTGATAAGTTAAGAAACGTTGAACGTTCTTT
GGCTGAACAGCAGGGACTGCGTAGAGGTTGGCTCACTTACAGTGAGCTCGTGCAGTTGCTGCCTAGCCTTGAATGGA
CGAAATTCCTCGTTAACAGCCTGTCTGTACTATCTCTAAGCTATGCGGCAAGTATTATAACATTAGCCGGAGCTTTT
TTTATTGCGAGCCAAGTTCCCGGTACGTTATGGCATCGGGATGTGGAAACTGCCTCAGATTTCACCATGACAATTAG
CAGTCAGGTATCCGTTTCAAATGCATTAGAGTCCACACCTTTTATTAATGTCCAAGTATCCTCACTTTTACGTAAAA
CGAGTAACGTGAATGTTGAATTGATGAATCTTGAGAAGACAAGCTGCGCCAGGGCAAGACTGAATAAAATTGAATAT
AAGCATCCGCGCCATCAATCAGTACCTACGCTTTCATTCATAAGCACCTGTCCACAACTTGATTTATTGTATGCCCA
AGATGGAACAAGGAATTGGCTAAATATTAAATCGCCCAACAGCGACTACAAAAACAACAGAAACCTTTCAATAACAG
TTAGATCCGTACAACTAGAAGAGCAAAAGGCTGAAAATGATTCCGTTATTTGGCAAGCCAACACCGAAGCAATGCAG
TAAttgtttttaccgcaaaacttaaagaggtgctaacaactgatataaaataaatatatttatattatatataatat
caatataataatattgataacaaattaaccaagcgtacgagtaatataacatgcataacagtaatacgaaactgctt
ttatttcttcacagaactaatgttcgctggcttaatcaaactgtcataaaaactataatagcacattattatatgtg
cctagaggtggatactttggatgctaaactaaaatgaacaaaataagttagattgttctatattatattaaaataaa
atgtttctgttgctctacatataaggaaatacttttttaaggaacaaaattatggccatcggacttttagttttccc
ggactttcgtccactcgaagcgtttttccgagataaataagttcgattattatacatacaagctgtaatatgttgcg
ctatacttcaaagttactgccttacactgaccgaaatcatttacaaaacaagagagaattctataatcaagttcccc
aactgtaactcagctggtgcaaagacactagaataacaagatgcgtaacggccatacattggtttg
HL02313.5prime
CATATTTCAGTTATTTATTTAGAAATGGGGCGATTTAAGTTATGTGCTTCGCCGAGAGAG-----------------
\-------------------------------------GTTATGAAGTACGAAGACTTTATAAAACGCATTCGAAAAA
GCCTCTACTATGGCGTTGGAACACCAGACACAGAAATGTCGGTCTCCTTACCCTTTGCGGAGTACGCGGCAGATTTG
TTTTCGGAGACTCATCGCGGGCATTCTTTGCATCGCCTAAGTTGCGTATCTGCTGCACAAGTACATGCCACGCCTTG
CTCTTTAATTATGGCATTGATATACCTCGATCGCTTAAACGTCATCGACTCGGGCTATAGCTGCAGAATCACACCAC
AGCAGCTGTTTGTTGTGTCACTA------------------------------------------------------
\-----------------------------------------------------------------------------
\--------------------------------------------ATGATTTCCACAAAATTCTACGCGGGCCACGAC
GAACGGTTCTATCTGGAAGACTGGGCCAGTGACGCTTGTATGACGGAAGATAGGCTCAAGGCAGTCGAGCTCGAATT
TCTTTCCGCTATG-------------------------------------------------GGTTGGAATATATAC
ATATCCAATGAGCTATTCTTTGATAAGTTAAGAAACGTTGAACGTTCTTTGGCTGAACAGCAGGGACTGCGTAGAGG
TTGGCTCACTTACAGTGAGCTCGTGCAGTTGCTGCCTAGCCTTGAATGGACGAAATTCCTCGTTAACAGCCTGTCTG
TACTATCTCTAAGCTATGCGGCAAGTATTATAACATTAGCCGGAGCTTTTTTTATTGCGAGCCAAGTTCCCGGTACG
TTATGGCATCGGGATGTGGAAACTGCCTCAGATTTCACCATGACAATTAGCAGTCAGG
translation
M G R F K L C A S P R E \-------------------------0----------------
\------------ V M K Y E D F I K R I R K S L Y Y G V G T P
D T E M S V S L P F A E Y A A D L F S E T H R G H
S L H R L S C V S A A Q V H A T P C S L I M A L I Y
L D R L N V I D S G Y S C R I T P Q Q L F V V S L \--
\--------------0-----------------------------------0--------------------------
\---------0-----------------------------------0-------------------------------
\----0-----------------------------------0-----------------------------------0
\-----------------------------------0-----------------------------------0-----
\------------------- M I S T K F Y A G H D E R F Y L E D W
A S D A C M T E D R L K A V E L E F L S A M \------------
\-----0------------------------------- G W N I Y I S N E L F F D
K L R N V E R S L A E Q Q G L R R G W L T Y S E L V
Q L L P S L E W T K F L V N S L S V L S L S Y A A S
I I T L A G A F F I A S Q V P G T L W H R D V E T
A S D F T M T I S S Q V S V S N A L E S T P F I N V
Q V S S L L R K T S N V N V E L M N L E K T S C A R
A R L N K I E Y K H P R H Q S V P T L S F I S T C
P Q L D L L Y A Q D G T R N W L N I K S P N S D Y K
N N R N L S I T V R S V Q L E E Q K A E N D S V I W
Q A N T E A M Q Z
Nice gene, with all intron termini predicted
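The genomic sequences in these notes appear to follow a case convention: exons in upper case, introns and flanking sequence in lower case. Under that assumption, the intron termini can be checked mechanically for the usual gt...ag ends. This is a rough sketch with made-up function names, and it expects the sequence as one string with line breaks removed.

```python
import re


def split_exons_introns(seq):
    """Split a mixed-case sequence into (exons, introns).

    Assumes upper case marks exons and lower case marks introns or flanks.
    """
    exons = re.findall(r"[A-Z]+", seq)
    # Only lower-case runs with upper case on both sides count as introns;
    # leading and trailing lower-case flanks are ignored.
    introns = re.findall(r"(?<=[A-Z])[a-z]+(?=[A-Z])", seq)
    return exons, introns


def has_canonical_termini(intron):
    """True if the intron starts with gt and ends with ag."""
    return intron.startswith("gt") and intron.endswith("ag")


# Fragment of the AE003132.1 sequence above, spanning its first intron.
fragment = ("GCCGAGAGAGgtatttaaaaagttttcacgaatatatgtattagcaaatttttt"
            "aaatttccagGTTATGAAGTACG")
exons, introns = split_exons_introns(fragment)
spliced = "".join(exons)
print(len(introns), all(has_canonical_termini(i) for i in introns))
```

Joining the exons gives the spliced sequence, which can then be compared with the matching EST.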
This is the available 5' region
tacgcataaaaaaaaatttatccaaatatctccaatagtttaggaggtattaatttttgtaaaaaaacaggccaaat
gccccatagtgcagcggcgtcaccatcgccgtggtcctaattgtttatgggatcatccaaaacatccgttactgtcg
agtcaaaagacgacaccccgactccagcatcgagatgacgatatcttcccgaaaagccacggacgactttaacctaa
cgcggggaggagttaacacgctaactccacgataagaccggacgcgggcgcagtaacgtcagcgcgactgcaactaa
gtcgcgaaatatgactcctgcattccaacatgcccggagcgtgtgaagcgcaatgtcagtattctgccgtgagcgct
gcttcagaagacgggctacttcatattaagcttaagttctctgtctttagtttaaaaactcatcagaacgcgcatag
tcgcataataaatctcaataattaaaattgtttgttaatttatataaggctttttattcacgttgtttctctttcca
gctcttgacttaagcttctcgacctcgataacactatcgcttgttcttaagacaagacaattaattctatcgatata
agtgttagctagtattttatatttatacaccgtgtactcttagtattttatatttatagacagcttacaaaacaaaa
aatcgaaaacttggggtttgaattaaacatttaatagtcaattatttctatttggcatatccctttttagtttttac
tgcgcttggtaacgcctaatggtgtgcattaccatacaaaaattgtatgaactaaaaaagagtgtttctcttctcat
cgtttctaaaaacctctgcatggtaatgccggcagcttgacgatttttttaaaagtaattaaaaatttattagatcg
gtatcggttttttttataggactaaatagttttatt
\**************A. mellifera BB260017A10H4.F AGCTATCATAGGTTAGGAACGATACATGTTTCA
AAACCTAGTATACAAAGACGAAGCGGAAGTGATCCTAGCGAAGTAGCAAATGTTACAACTTTAGGCGAAGCTATTGA
AAAGTTGAATACTTCAAAAGATTCGACTTTGAGAAGAAACTCTGGTGGTCACGCAACGGTAACACGAAGTCATAGTA
GCTTATATGGATTAACGAGGCCAACCGATGAACGTTTAATAACACATCCGTTAAATCGTACATCTTCTCATGGTCAC
TTAAGTTTCGAGGAAATGTGTAAAGGAAATGAAATAAAATCAAAAGTATGGAATTCAGAAGTTATAGCTCCACCTGA
TGATGTTCAGACTCGACTAGGAATAGAGATGCTTACGCAACGAGATTTAACCAAATTACAACCATTATTATGGCTTG
AACTTACTGCGGTATTTGACAAATATAACGTACCATTAAAAAAACGCAAACCAAATAAACGTCGAACAAAAGCTGGA
AATCTATTCGGGGTTTCATTATCAACTCTTTTACTTCGAGATAGTCAATTATCGTCCGAGGAGAGTAATATTCCATT
AGTATTTCAAAAACTTTTTAACGAATTAACGAGACGTGGTGTTAAAAAAAAGGGTATTCTTCGCGTTGGAGGACATA
AGCAGAAAGTTGAATCAATATGCATGCAGCTGGAAACGGATTTCTATTGTAACCT
full ORF encodes >240aa leucine and serine-rich protein; 31% over
144aa; 3-11 BLASTX match to KIAA1314 protein <up>Homo sapiens</up>;
genomic match is to an unannotated region of a short messy scaffold;
indicates that other missing genes might be similar; no ESTs
AE003032.2 cctattgtcatttaataatactaattttatgcaaaatttaatttgtttactaaa
aggaacaagggacagttacacgctcacgaagtgccactccggattccctggactctttacaaatcgATGAAGCTTGG
ACCAACAATTCTTTGTATGTATTTTCCAATCGCTTAAGCCCCGTTTCCTTACTTTAAATTTATTCAATAGACCGACT
TTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATATTTAAAACA
AAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTCGTGGTTCGCAAGTGCGAT
GTGATATACCTTTGTACTCGGCGGATGGTGTAGAATTGCTTGGATATTCACGGATTGGCACCATACAATTTCCTCGG
AACCGATCCGTCTCTGATCCATTTTGTTCTATTGGGTATGTTATTTTTAATAATTTGAGGACTCCTAAGGGGTTTGC
CTTTCTAAAGTTATGTGTTTTCAGTCGATCAAAGGAGTCAAGAAGTGAAAATGATGCACGATCACAAAAGAAAAAAT
CGAGTGAAGTGCTGTCAGCCTCAGAGAATGAATGCGGTCGCCTTTTACCGATGCCCTACAATGTCCTGAGCTTTGAA
AGTATTTGTCGAGATTCTTCTAGTCTTGATAGCTGCGAAGTCCTAGATACCTGTGATATCCCTTCAACGCTATTCAC
AGATGTCGTATTAATAAGCGAAACTGACATGAAACGTTTACAGACAATACTTTGGTTAGAGTTAGCCACAATATTTG
ATCGCAACAAAGTTTCTTTAGATAAAAGAAAACCTTTTAAGCGTCGTCGCAAAGAAGAGGGTAATCTTTTTGGGGTA
TCTATAAACGCTCTTATTCGTCGGGATCAGCAAGTTACTGGCACTGACTCATCTTTGGTCCCACTATTTTTGGAAAA
GCTAATTGGCGAACTTCTGCGACGTGGCTCTAGAGAAGAAGGATTACTTCGAATAGGTGGTCATAAACAAAAGgtta
taataataaataatgttaataatgaaaataattggtaatacaattttaacaattttctattttcagACTGAATTACT
TTATAATGAATTAGAATCAACATTTTATCAAAATCCAGATAATCTAGATAACCTCTTTCGCACAGCTACTGTTCATG
AACTTAGTTCGTTGCTAAAACGATGGCTGCGCGAACTTCCTCAACCTTTGCTTACTAATGAGCTTATACAACTGTTT
TATCAATGTCACACACTTCCATCAATAGATCAAATGAATGCACTATCGATTTTATGTCACCTGCTTCCGCCTGAAAA
TAGAAACACATTACGTTCATTATTAAGCTTTTTTAATATTATAATTAATTTAAAAGATATAAACAAAATGAATGTGC
ATAACGTAGCAACAATAATGGCACCGTCAATGTTTCCACCACGTTATATACATCCGAGTGACAATAACAGCATTGCA
GAACAAGTAAGAATGGCCGCTCAGTGTTGCCGTTTGACGAATATTTTAATCCTACGTGGCGAAAAACTTTTCCAAGT
ACCAAACAATTTAATTGTGGAGTCACAGAAAACAATGATGGTATGTGTTATTCTTGAGTTTTTAATTAACCGTAAAT
ATATATGTTTCCATAGGGTAAGAAAGGATGGCATCGGCATCGGAATTCAAATGAAATTACGGCAAAACCAAGCGGAA
AGGCGAGCAATGTCGGCGTTGGACACGACTCTACAGTTATAAATAAATACTCAACCAATTTAAAGCATTTACATCCA
TTTGTTATTTAAACAAACGGCTTCTAACGGTGCTTAAGTTGTATTATATGTTGATAAAATATTTACCTATTATTAAA
GAAAATATAAAGAATATCCTCTATAAACCGTCAAATTGAAAAAAATGTTCAATTTCAAACCAATTTCAAATGTTCAG
ATTCAAACCAAAAATTAAGTGAAGCTCAACTGAATAGTTTTGAAAACCCTTTCTAACATTTTTTTTGTTCCTTTTCA
AATTTTTGATCTTTCGAACTGCTTCAAACTCTACCTATTTTGCTGGTAAAATTATGGAAGAATATATATATT
translation
M Q N L I C L L K G T R D S Y T L T K C H S G F P G
L F T N R Z S L D Q Q F F V C I Q S L K P R F L T
L N L F N R P T F V N V Y E K N T E T A I Q C V E
Q S N E I Y L K Q N L R R T P S A P P K S G T Y A D
I F R G S Q V R C D I P L Y S A D G V E L L G Y S R
I G T I Q F P R N R S V S D P F C S I G Y V I F N
N L R T P K G F A F L K L C V F S R S K E S R S E N
D A R S Q K K K S S E V L S A S E N E C G R L L P M
P Y N V L S F E S I C R D S S S L D S C E V L D T C
D I P S T L F T D V V L I S E T D M K R L Q T I L
W L E L A T I F D R N K V S L D K R K P F K R R R K
E E G N L F G V S I N A L I R R D Q Q V T G T D S S
L V P L F L E K L I G E L L R R G S R E E G L L R
I G G H K Q K \-----------------------------------1--------------------
\-------------- T E L L Y N ELESTFYQNPDNLDNLFRTATVHELSSLLKRWLRELPQPLLTNE
LIQLFYQCHTLPSIDQMNALSILCHLLPPENRNTLRSLLSFFNIIINLKDINKMNVHNVATIMAPSMFPPRYIHPSD
NNSIAEQVRMAAQCCRLTNILILRGEKLFQVPNNLIVESQKTMMVCVILEFLINRKYICFHRVRKDGIGIGIQMKLR
QNQ A E R R A M S A L D T T L Q L Z
This turns out to be a huge complicated gene, starting with the
annotated CG17082, but extending far further over a section of nnnnns
Here are the cDNAs that match
LD04957.5prime AATTCGGCACCAAGGAAAAGAAGTTACAACCTGCTACCCAGTTGATGATTTTTTGTTG
ATGAAACTAGTAGTTTTGCACATAAGCTGTGTCTAATTTACTTAACTTTTTATAATTAAAAAAAGTGTTTGTTTATT
TTAATATGAAAAAACAACTAGATATGCGTGTTGTCATGGGTTCTTAGGTATTTTCTCCCATGGATACGAGAAAATTT
GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC
AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA
GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTG
LD16910.5prime CTGCTACCCAGTTGATGATTTTTTGTTG
ATGAAACTAGTAGTTTTGCACATAAGCTGTGTCTAATTTACTTAACTTTTTATAATTAAAAAAAGTGTTTGTTTATT
TTAATATGAAAAAACAACTAGATATGCGTGTTGTCATGGGTTCTTAGGTATTTTCTCCCATGGATACGAGAAAATTT
GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC
AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA
GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACATAACCGTGTGAACAAGGGTTAGA
LD08837.5prime
CTTTACACAAACATCTTGATATTT
GCCTTAGTAAATCTTTAGTGTATAGGTGAAATATACTGAATTTCTGTGTATTTTCTCCCATGGATACGAGAAAATTT
GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC
AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA
GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG
AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA
CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC
AGTTACACGCTCACGAAGTGCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCCT
TACCGACTTTTGTAAATGTATATGAAAAAAATACC
LD34572.5prime
TATTTTCTCCCATGGATACGAGAAAATTT
GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC
AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA
GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG
AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA
CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC
AGTTACACGCTCACGAAGTGCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT
TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT
TTAA
LD27621.5prime
CAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC
AATGAGTATTATCTGGCAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA
GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG
AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA
CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC
AGTTACACGCTCACGAAGGNCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT
TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT
TTAAAACAAAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTC
LD04154.5prime
TGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG
AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA
CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC
AGTTACACGCTCACGAAGTGCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT
TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT
TTAAAACAAAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTCGTGGTTCGCA
GTGCGATGTGATATACCTTTGTA
LD15784.5prime
AAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC
AGTTACACGCTCACGAAGGTCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT
TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT
TTAAAACAAAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTCGTGGTTCGCA
AGTGCTATGTGATATACCTTTGTACTCGGCGGATGGTGTAGAATTGCTTGGATATTCACGGATTGGCACCATACAAT
TTCCTCGGAACCGATCCGTCTCTGATCCATTTTGTTCTATTGGTCGATCAAAGGAGTCAAGAAGTGAAAATGATGCA
CGATCACAAAAGAAAAAATCGAGTGAAGTGCTGTCAGCCTCAGAGAATGAATGCGGTCGCCTTTTACCGATGCCCTA
CAATGTCCTGAGCTTTGAAAGTATTTGTCGAGATTCTTCTAGTCTTGATAGCTGCGAGTCCTAGATACCTGTGATAT
CCCTTCAACGCTATTCACAGATGTCGTATTAATAAGCGAAACTGACATGAAACGTTTACAGACAATACTTTGGTTAG
AGTTAG
Contig AATTCGGCACCAAGGAAAAGAAGTTACAACCTGCTACCCAGTTGATGATTTTTTGTTG
ATGAAACTAGTAGTTTTGCACATAAGCTGTGTCTAATTTACTTAACTTTTTATAATTAAAAAAAGTGTTTGTTTATT
TTAATATGAAAAAACAACTAGATATGCGTGTTGTCATGGGTTCTTAGGTATTTTCTCCCATGGATACGAGAAAATTT
GTTATAAAGAGACGTTTAAAATGAACAGCAATACAGATCTCCATCATTCAGATGATCAGGACTTTTCGGAATTTCTC
AATGAGTATTATCTGCAAAGCAATTCCCAAAGTATCGAACCTGAAGCAAGTTACGAAGATGGAGAAATGGAAGCAGA
GTGGCTAGTTTCTGCTGGTTATCCAGAATTGACAAAACCGTTTGAACAAGGGTTAGAGGTTTCTAAAAAGGACTTGG
AACCCATACTGACTACTTTATCGAAGCCCCATGCTGAAGCGATTGTACAACTAGTAAGGACTTTAAACAAAACAGTA
CGGGTGCGCACAAAAAGTCGGCCAAAACGGAAACCTGATATAAGGGATGTATTTCGCGAATTTGATGAACAAGGGAC
AGTTACACGCTCACGAAGGTCCACTCCGGATTCCCTGGACTCTTTACAAATCGATGAAGCTTGGACCAACAATTCTT
TACCGACTTTTGTAAATGTATATGAAAAAAATACCGAAACTGCCATACAATGTGTAGAACAATCAAACGAAATATAT
TTAAAACAAAACCTGCGACGAACCCCTAGCGCCCCGCCTAAAAGCGGCACCTATGCGGATATATTTCGTGGTTCGCA
AGTGCTATGTGATATACCTTTGTACTCGGCGGATGGTGTAGAATTGCTTGGATATTCACGGATTGGCACCATACAAT
TTCCTCGGAACCGATCCGTCTCTGATCCATTTTGTTCTATTGGTCGATCAAAGGAGTCAAGAAGTGAAAATGATGCA
CGATCACAAAAGAAAAAATCGAGTGAAGTGCTGTCAGCCTCAGAGAATGAATGCGGTCGCCTTTTACCGATGCCCTA
CAATGTCCTGAGCTTTGAAAGTATTTGTCGAGATTCTTCTAGTCTTGATAGCTGCGAGTCCTAGATACCTGTGATAT
CCCTTCAACGCTATTCACAGATGTCGTATTAATAAGCGAAACTGACATGAAACGTTTACAGACAATACTTTGGTTAG
AGTTAG
Put these together to encode a 296aa protein at 14% serine. Seems
good, but too hard to put all the introns together here.
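Composition figures such as the serine percentage quoted here are simple residue counts. A minimal sketch follows, assuming the translation is given in one-letter code, possibly space-separated as in these notes, with a trailing Z or * standing for the stop; the function name is hypothetical.

```python
from collections import Counter


def percent_composition(protein):
    """Return {residue: percent of total} for a one-letter protein string."""
    # Remove spaces and line breaks, then any trailing stop marker (Z or *).
    protein = "".join(protein.split()).rstrip("Z*")
    counts = Counter(protein)
    total = len(protein)
    return {aa: 100.0 * n / total for aa, n in counts.items()}


composition = percent_composition("S S A A Z")
print(composition["S"])
```

For a real translation, composition.get("S", 0.0) gives the serine percentage directly.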
\*****************A. mellifera BB260019B20F2.F
GGATCGTTATCCTTTCAATGACGAATAATTATTCTAAAATCGTATCAAAATTACTAACGAGTCAAAAATTTTACCTG
TGCTCGAGAAACGCGCGTAGTATAAATAAAATTGGACGGATATCCTTTCAATGACGAATAATTATTCTAAAATCAAA
ATTAACGAGTCAAAAATAAGCGTTGAAAGAGAAAGAAAATTTTAAGAGGAGAATGTAAATGGAATCGATAGATGACG
GTGGCTGTCACGATGCTGCAAGGTCGACCGGAAGCGAGACGGGGTGTTCAATTTCCGCCGAATGAAACGAAGCTGTG
CCCGTTTCAGGAACTCGCGCACAAGCTCTCGCGGAGCTTCGACATGCAGGAGGCTCAGCTCCTCGAGGAGGGTGGCT
CTGGTGCCGCTACGGCGGGTGCTGGGGGCGCGGGTGGCCCGCCACGAAGACATCACAGCGCTCAGAGATTGGCCAGG
AGCGAAATATCTGAGAGAAGGGAAGAGGACGGAGCGTTGGTGGTACCCGATCACCAGGGCAACCTGAGGATCACCGT
GAAGAAGACCAAGTCGATTTTGGGCATTGCCATCGAGGGCGGTGCCAATACCAAACATCCACTGCCCAGGATCATCA
ATATACACGATAATGGGGCAGCTTACGAGGCAGGTGGCCTCGAGGTCGGTCAACTGATCCTAGAAGTCGATGGACAC
AAAGTGGAAGGTCTGCATCATCAGGAGGTGGCAAGACTGATCGCAGAGTCGTTCGCGAGGCGCGATCGCAACG
long ORF after long 5' UTR encodes 11% G/A protein; weak but convincing
BLASTX matches to several Drosophila proteins; especially CG7151 at
e-05; strangely there is a BLASTX match to KIAA1526 protein <up>Homo
sapiens</up> which is far longer and better at 32% over 175aa; e-18;
genomic match is even better at 80% to unannotated region; NEW GENE!;
but no ESTs? Most of it is a PDZ domain.
There are three versions of the human protein annotated, 500, 700 and
900aa long, and in each we have the C-terminus, so perhaps our cDNA is
unspliced?
AE003536.2 tcgaaaccgaaactaaaattaaatgaataggagttggccacaaaactctcaaga
agtttcgatATGAAGGAAGAAGGCGGCACATTGCTGGGCGATAAAGGTGTACGAAGGCATCAGgtgagtgctactcg
aaaatagtgcaagtacactaaataaacagatatctatatcaacaaataataactgaatacagcacttgaatgccaac
aagagaaaccagtatacgtttcagctcgaaattaacagagttaaatcatattatcataagactctcgtcggtttttc
tattaaattctatacaattcggtcaccccagaacatttcatgattcattccaaacaaatcacacaaaagtttctact
ttctatctctctaattctctttatacctaactcaaaaataaacagcaactgtaccattgattattttaaactagcaa
tatactatgtaaatttgaattggcctaccaatagcttgaaccgtttcctattgtttgaacccacacgtaattaagta
caagaacccaaaacagtatacatatacaaattccttgaatcaaccaattgttaaattgcttgcctttaccaaatcac
agaatctattttatcagcttgaaccaagttacttttaatcaaaccaatacacaatttgaattgtaagctaatcctag
aagtaaagccaatggaatatttttgatcaaaccggaaaaacctactaaaatctttaatttagtttaaagatttttca
tgaacttggcctatttgtttttcagTCCATGCAGCGTCTGTCAGCGGAGCAGAATGGTGGTTCAACGACTGAACAAA
CACATGAACACAATCCAAACGTCGTACCTGATCATAGAGGCAACTTACACATTACAGTTAAGAAAACCAAACCAATT
TTAGGTATTGCTATCGAAGGTGGTGCTAATACAAAACACCCGCTCCCTAGGATAATCAATATCCATgtaagtacgca
atagatcataaaatgactttttgagactcaaatgcctggcaaatagGAAAATGGTGCAGCATTTGAAGCGGGCGGCT
TAGAAGTCGGCCAACTCATCCTGGAGGTAGATGGAACGAAAGTGGAGGGTCTGCATCATCAGgtatgttcgagctcc
cttttttttttgagttattaatttgacggttttctttttgattttccagGAGGTTGCTCGACTAATAGCCGAATGCT
TTGCTAATCGTGAAAAGGCTGAAATAACCTTCTTAGTTGTCGAAGCAAAAAAATCAAATTTGGAACCGAAGCCGACG
GCGCTGATATTTTTAGAAGCCTAAcatttgcttgccctgccggccagtgacccattggaacgacaaaaactcgagtt
cctttacgaatggggcatcgatttaacagagacgccaaaaccaatgccaacaccaataccgctaacgaaagccaaga
atccgccgccactgccccatgagttgcacaacaacatcaacagccagtatggtagcagtgcggctctcagcaatcat
caacctcatcagcatacgcatccacatccccagcagcagcagcagcaacaacaacattcaaacacaaaaacgcccaa
cacgaacagcaacaaaacacaaggaacaccaacaacgggaacgggagctgcaacaactggcagcaaacaacaacagc
aaccgggaaacaccaccaacacaccaacgaaggcgtcgcgtgaggcgactcccacaagggagcagcatc
translation
M K E E G G T L L G D K G V R R H Q \----------------0-------
\------------------------0-------------------------------0--------------------
\-----------0-------------------------------0-------------------------------0-
\------------------------------0-------------------------------0--------------
\-----------------0-------------------------------0---------------------------
\----0-------------------------------0-------------------------------0--------
\-----------------------0-------------------------------0---------------------
\----------0-------------------------------0-------------------------------0--
\-----------------------------0-------------------------------0---------------
\--------------- S M Q R L S A E Q N G G S T T E Q T H E H
N P N V V P D H R G N L H I T V K K T K P I L G I
A I E G G A N T K H P L P R I I N I H \---------------------
\-------0---------------------------- E N G A A F E A G G L E V G
Q L I L E V D G T K V E G L H H Q \------------------------0
\--------------------------------------- E V A R L I A E C F A N R
E K A E I T F L V V E A K K S N L E P K P T A L I
F L E A Z
This is highly speculative, and there's plenty of room for more 5'
Drosophila
MKEEGGTLLGDKGVRRHQSMQRLSAEQNGGSTTEQTHEHNPNVVPDHRGNLHITVKKTKPILGIAIEGGANTKHPLP
RIINIHENGAAFEAGGLEVGQLILEVDGTKVEGLHHQEVARLIAECFANREKAEITFLVVEAKKSNLEPKPTALIFL
EAZ
Honey bee from below
MPRTRAGCRRTPQHQVTTSDLALSNDDECDNQDYEDELENGRGRRHSSPGGSRGNPRDYGHHHLHPDNLELAHKLS
RSFDMQEAQLLEEGGSGAATAGAGGAGGPPRRHHSAQRLARSEISERREEDGALVVPDHQGNLRITVKKTKSILGIA
IEGGANTKHPLPRIINIHDNGAAYEAGGLEVGQLILEVDGHKVEGLHHQEVARLIAESFARRDRNEIEFLVVEAKKS
NLEPKPTALIFLEA
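Since both proteins end at the same C-terminus, a quick way to compare them is to right-align the two sequences without gaps and count identical residues. This is only a crude sketch (it is not what BLAST does, and it ignores insertions and deletions); the function name is made up.

```python
def tail_identity(seq_a, seq_b):
    """Fraction of identical residues when two sequences are right-aligned, ungapped."""
    n = min(len(seq_a), len(seq_b))
    if n == 0:
        return 0.0
    a_tail, b_tail = seq_a[-n:], seq_b[-n:]
    matches = sum(x == y for x, y in zip(a_tail, b_tail))
    return matches / n


# Last 25 residues of the Drosophila and honey bee proteins above.
fly_tail = "ITFLVVEAKKSNLEPKPTALIFLEA"
bee_tail = "IEFLVVEAKKSNLEPKPTALIFLEA"
print(tail_identity(fly_tail, bee_tail))
```

Over these 25 residues only one position differs (T versus E). Strip any trailing stop marker before comparing full-length sequences.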
\****************extras \- the first 90kb of AE003536.2 above is unannotated \-
but some possible ORFS in it, so try BLASTX searches, and find NOTHING! Truly
90kb of nothing!
\****************A. mellifera BB260020B10C4.F GGCATCCAGGATAGAGCAGCGACGATATCGC
AATCGAGAGAGTGCAATCGATGAATGTCATCAAGAAGAATGACCAACTGTTCCTTTATAGGTTCCTCCTCGATCCGT
CTCATAATTTGCCCCAGCCAATTGTTCAAATAAAGAGGATCAAACGAGGCGTCTTTGGGAAGATTTCCATCTAACGA
GCCGGAAAGGAAGCCGACGTGTTGACAGAGTACGCGAAGCAGTTCGAGACTATACGCTGATCGAGGTGTAGACGCAC
AGAGGCGAACAATCCTGAATGTGGGACAATCCAACCAGGTGGAAGCATCCTCGTAAATTCTTGACAGCAACGATGAT
TTCCCGGAATATTTACCGCCTTTTATCAATATGGGGCCATGCTTCTGTGATTTACCCGATATTAAGATCTCCCTGAT
CCTTTCAATATTCTCGTTGTTCTCGCAGGGTTTAATCTCCCTAAGAAGCGCCAGATGTACCAGACTCTCCGCATAGA
TTTCCTGCACCATCTTCTTCTTGCTCTTGATTTCTGGATCCGCTTCTATAGATGCGTTCACCAAATTCTGCACGCGA
TTCACCACCAAGCGACAAAGATGAGCCAGGTAGCCTTTATGGGATCTATTATCGGGATTAATGCCGCCTTGCTTCTG
ATCCACATCGATGTTAATTACATTGTCGGATGGGAGACGGTTGTCCACCAAAGACTTCAACGAGGCAACTGCTCTTG
TGCTCTTTTTGCCTTTCCATGTGCGTTTCACGGCTATTATGCCGTCACCTGTTTCG
full ORF on RC encodes 260aa 13% leucine protein; only very weak BLASTX
matches to Drosophila proteins; nothing more in nr;
genomic match is near N-terminus of 1240aa protein  BcDNA:GH04922  gene
product; with two ESTs, so could be alternative splice?
In fact end of this translation overlaps N-terminus of this protein, so
is probably even longer.
Also, weakly matches C. elegans T05C3.2 , which is 2340aa, from
900-1150, so Drosophila protein could be a lot longer.
In fact, connects CG17671 to  BcDNA:GH04922 , but the two Drosophila ESTs
are only for the start of the latter. I will not try to figure it all
out.
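For an ORF on the reverse complement (RC) strand, as with this EST, the read has to be reverse-complemented before it is translated. A minimal sketch, with a made-up function name, that keeps N and preserves case:

```python
# Translation table for complementing bases; N stays N, case is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

The result can then be passed to any translation routine in each of the three forward frames.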
\*****************A. mellifera BB260022B10E6.F ATGCCCAGGACCAGAGCTGGGTGCAGGAGG
ACGCCGCAGCACCAAGTGACCACCTCCGATCTCGCCCTCTCCAACGACGACGAGTGCGACAACCAGGATTACGAGGA
CGAGTTGGAGAACGGGCGTGGACGTAGGCACAGCTCGCCAGGTGGCTCGAGGGGGAACCCCAGGGATTACGGCCATC
ATCATCTTCATCCGGACAATTTGGAACTCGCGCACAAGCTCTCGCGGAGCTTCGACATGCAGGAGGCTCAGCTCCTC
GAGGAGGGTGGCTCTGGTGCCGCTACGGCGGGTGCTGGGGGCGCGGGTGGCCCGCCACGAAGACATCACAGCGCTCA
GAGATTGGCCAGGAGCGAAATATCTGAGAGAAGGGAAGAGGACGGAGCGTTGGTGGTACCCGATCACCAGGGCAACC
TGAGGATCACCGTGAAGAAGACCAAGTCGATTTTGGGCATTGCCATCGAGGGCGGTGCCAATACCAAACATCCACTG
CCCAGGATCATCAATATACACGATAATGGGGCAGCTTACGAGGCAGGTGGCCTCGAGGTCGGTCAACTGATCCTAGA
AGTCGATGGACACAAAGTGGAAGGTCTGCATCATCAGGAGGTGGCAAGACTGATCGCAGAGTCGTTCGCGAGGCGCG
ATCGCAACGAGATCGAGTTCCTGGTGGTCGAGGCGAAAAAAAGCAACCTGGAGCCGAAGCCGACAGCGCTGATATTC
CTGGAGGCGTAGCAGGAATCTCCCACCTCCGAGCCGGAACCACCGTAGCCCAACCTGACCAGAGCTCGGCCCGCGAC
ACACTTGGAGCGTGTCGACGCTGAACCGACGTCATGAGAAG
long ORF encodes G/A rich protein with only weak BLASTX match to
several Drosophila proteins; 46% e-20 match to KIAA1526 protein <up>Homo
sapiens</up>;
genomic match is 85% to at least three exons in unannotated region; NEW
GENE; no ESTs
This protein has a PDZ domain, so could be same as BB260019B20F2.F?
Indeed it is. But they differ in the 5' regions, which is probably an
intron in that one.
>BB260019B20F2.F 766 0 766 ABI GGATCGTTATCCTTTCAATGACGAATAA
TTATTCTAAAATCGTATCAAAATTACTAACGAGTCAAAAATTTTACCTGTGCTCGAGAAACGCGCGTAGTATAAATA
AAATTGGACGGATATCCTTTCAATGACGAATAATTATTCTAAAATCAAAATTAACGAGTCAAAAATAAGCGTTGAAA
GAGAAAGAAAATTTTAAGAGGAGAATGTAAATGGAATCGATAGATGACGGTGGCTGTCACGATGCTGCAAGGTCGAC
CGGAAGCGAGACGGGGTGTTCAATTTCCGCCGAATGAAACGAAGCTGTGCCCGTTTCAGGAACTCGCGCACAAGCTC
TCGCGGAGCTTCGACATGCAGGAGGCTCAGCTCCTCGAGGAGGGTGGCTCTGGTGCCGCTACGGCGGGTGCTGGGGG
CGCGGGTGGCCCGCCACGAAGACATCACAGCGCTCAGAGATTGGCCAGGAGCGAAATATCTGAGAGAAGGGAAGAGG
ACGGAGCGTTGGTGGTACCCGATCACCAGGGCAACCTGAGGATCACCGTGAAGAAGACCAAGTCGATTTTGGGCATT
GCCATCGAGGGCGGTGCCAATACCAAACATCCACTGCCCAGGATCATCAATATACACGATAATGGGGCAGCTTACGA
GGCAGGTGGCCTCGAGGTCGGTCAACTGATCCTAGAAGTCGATGGACACAAAGTGGAAGGTCTGCATCATCAGGAGG
TGGCAAGACTGATCGCAGAGTCGTTCGCGAGGCGCGATCGCAACG
ATGCCCAGGACCAGAGCTGGGTGCAGGAGGACGCCGCAGCACCAAGTGACCACCTCCGATCTCGCCCTCTCCAACG
ACGACGAGTGCGACAACCAGGATTACGAGGACGAGTTGGAGAACGGGCGTGGACGTAGGCACAGCTCGCCAGGTGGC
TCGAGGGGGAACCCCAGGGATTACGGCCATCATCATCTTCATCCGGACAATTTG-----------------------
\-----------------------------------------------------------------------------
\-----------------------------------------------------------GAACTCGCGCACAAGCTC
TCGCGGAGCTTCGACATGCAGGAGGCTCAGCTCCTCGAGGAGGGTGGCTCTGGTGCCGCTACGGCGGGTGCTGGGGG
CGCGGGTGGCCCGCCACGAAGACATCACAGCGCTCAGAGATTGGCCAGGAGCGAAATATCTGAGAGAAGGGAAGAGG
ACGGAGCGTTGGTGGTACCCGATCACCAGGGCAACCTGAGGATCACCGTGAAGAAGACCAAGTCGATTTTGGGCATT
GCCATCGAGGGCGGTGCCAATACCAAACATCCACTGCCCAGGATCATCAATATACACGATAATGGGGCAGCTTACGA
GGCAGGTGGCCTCGAGGTCGGTCAACTGATCCTAGAAGTCGATGGACACAAAGTGGAAGGTCTGCATCATCAGGAGG
TGGCAAGACTGATCGCAGAGTCGTTCGCGAGGCGCGATCGCAACGAGATCGAGTTCCTGGTGGTCGAGGCGAAAAAA
AGCAACCTGGAGCCGAAGCCGACAGCGCTGATATTCCTGGAGGCGTAGCAGGAATCTCCCACCTCCGAGCCGGAACC
ACCGTAGCCCAACCTGACCAGAGCTCGGCCCGCGACACACTTGGAGCGTGTCGACGCTGAACCGACGTCATGAGAAG
See above file. The N-terminus encoded by the 5' part of this contig
does not have a Drosophila match, but rest is clearly a good match, so
unclear which one is right or wrong.
\******************A. mellifera BB260023A20H5.F TGTACGCGCATGTGACAAATAGAAAAGAA
TTTATTTCTTATTTTTTTTATATTTTTTTTTTCTTACAATGAATCAATATAATATAATATCTATATAAAAATTTATT
AAAAGATGGAATTCATTAATTTTAAAATATATACGATAGTGATTATAAGCCTCTATTGTTTGGGCGTTGTCGGACAA
TATGAGTGGCAAGCTAGAGATGCTTTTGATGAAATCCGTTTAAAAATGGATAAGATTAATGAAGAGAATTGTCCTAT
TCAACATATCGGAGACCTCTATCTACCGGAAGATACAGTCTCTCATTTACCTGACATTAAAGATATTAATATCAATC
CTGTATTTCCAAATAGAACTGCTTTACTTCATCTTCATAATATGGCTCTTAGTAGATCATTTTTTTGGAGTTATATT
TTACAATCTCGATTCATACGTCCAGCTATCAATGATACTTATGATCCAGGCATGATGTATTATTTTTTATCAACAGT
TGCTGATGTTTCTGCAAATTCACATATAAATGCTTCTGCTATATATTTCTCACCAAATATGTCTTATTCTTCATCAT
ATAGAGGTTTTTTTAATAAAACTATGCCCAGATTTGCTCCGAGAACTTTTAGAGCTGATGATNTTAATGATCCTATA
CATTTAGAAAGAATATCCACAAGAAATACATTTAATGTGCAAGATCTTGGGGCATTTGCCAGTGGTAGTCTTGGTGA
AGATTATACAACAGATTATTATCGTATAAATGAA
Long complete ORF encodes secreted 180aa 12% isoleucine protein;
genomic match is 80% full-length e-108 to single exon in unannotated
region; NEW INSECT GENE; two ESTs
From 6-42kb is empty in this accession! Check for other genes too.
AE003732.2 RC ATATATAGAAGCAATAGCATTGAAATAGCAACATTCATTTTTTGTATTAATGTT
CGATATCAGCCAGCAACCAATATAAACAATATATCGATACTATCGGACAGGTGGCCCATCTCTGTTTGAGGGCGTAG
TTCCAACAAGTGCTGAGCATCACAATTTTCTATTACTAAGCCCAGCTTTGCGTTGGCGCGCCCCAGAATCTCATTTT
ATATTTAGTTTCTGCCAGTTTAGTTAATTAGTTAGTTGATAGTGTTGTTTGTTTCTTCTGCAACAATTGTGTGCGAT
AGGAGTCGGGCAAAATGTTCCCGTCGTCGATTTTGGGGCGCAGCTATTTGCTTTTTATGCTGGTGCTCGCCGTGGGC
GTGTTCGCCCAACACGAGTGGCAGGCCCGGGATGCCTTTGATGAGATAAAGAGGCAGTTCGACAAGGTGAACGCGGA
TAACTGCCCCATCCAACACCATTCGGACCTTTTCATGCCCATGGACGCGGTGTCCCACAAGCCGGACATCAAGGAGA
TCAACGTGAATCCGGTGTTCCCCAACCGAACTGCCCTGCTGCATCTGCAGAATATGGCCCTTAGCAGAAGCTTCTTC
TGGAGCTACATCCTCCAGTCGAGGTTTATTCGACCCGCCATCAACGACACCTACGATCCCGGCATGATGTACTACTT
TCTGTCCACCGTAGCCGATGTATCCGCCAACCCACATATCAACGCCTCGGCCGTGTACTTCTCCCCCAACAGCTCGT
ATTCGTCGTCGTATCGCGGCTTCTTCAATAAGACGTTCCCCAGATTCGGGCCAAGAACCTTCAGGCTGGACGACTTC
AACGATCCCATTCATCTGCAGAAGATATCGACGTGGAATACTTTCGATGTTCAGGATCTGGGCGCCCATCACCCGGA
CTCCATATCCAAGGACTACACCCACGACCTGTATAAAATAAACGAGTGGTACCGCGCCTGGCTACCAGACAACGTCG
AGGGACGGCACGATACGAAGATCACCTACCAGGTGGAAATCCGCTATGCGAACAACACAAACGAGACGTATACCTTC
CACGGACCGCCTGGCTCTGAAGAAAACCCTGGTCCGATTAAATTTACAAGGCCGTACTTCGATTGTGGCAGGTCCAA
CAAGTGGCTGGTGGCCGCAGTAGTGCCAATTGCGGATATCTACCCCCGACACACGCAGTTCCGTCACATTGAGTATC
CCAAgtaagataccttgaatatcccctgaataccctccttttatctactgtatcgcttttagATACACGGCCGTTTC
GGTTCTTGAGATGGACTTCGAGCGTATCGACATAAACCAGTGTCCATTGGGTGAAGGCAACAAAGGACCTAATCACT
TTGCGGATACGGCGCGGTGTAAAAAAGAAACGACAGAGTGTGAACCATTACAAGGCTGGGGCTTTAGGCGCGGTGGC
TACCAGTGCCGTTGTAAGCCAGGTTTTCGGCTGCCCAACGTAGTGCGGCGACCTTATCTGGGCGAGATTGTGGAGCG
CGCATCGGCAGAACAGTACTACAACGAGTACGACTGCCTTAAGATTGGCTgtatgttttaagtagcaatatgtaaaa
gtatgagatttgactcttgatgtttttttttagGGATCCAAAAGCTTCCCATTCAGTGGGATAAGGCCTCCTACCAC
ATTCGCCAAAAGTATCTGGACCGGCATCCGGAATATCGCAACTACACCACCGGCTCGCGATCACTTCATGCTGAGCA
CTTAAATATTGATCAGGCGTTGAAGTATATTCATGGAGTCAACTATCGCACTTGCAAAAAgtaagacacatacaaaa
cttatccagccaaggtcacttcaataaactgatcaattatgctatcgccttgacagCTTCCATCCGCAGGATCTGAT
TCTTCGCGGTGATGTGAGCTTCGGCGCCAAGGAGCAGTTCGAGAACGAAGCCAAGATGGCCGTGAGACTGGCCAACT
TTATTAGCGCCTTTCTGCAGgtaagcaaacgattcagagcaaaggattcccatcgccttcacgctaaatgaagagca
ataattgataacccgacacctattaagagccttcgacgacggctcttgaaaacttctcaagtgtaaattataatttt
ccacgcgtaattcaacttcctcgaatttcctgcattgccagtttctcggttcttaccgatgctgct
LD18575.5prime
CTCCGGTTTGAGGGCGTAG
TTCCAACAAGTGCTGAGCATCACAATTTTCTATTACTAAGCCCAGCTTTGCGTTGGCGGGCCCAAGAATCTCATTTT
ATATTAAGTTTCTGCAAGTTTAGTTAATTAGTTAGTTGATAGTGTTGTTTGTTTCTTCTGCAACAATTGTGTGCGAT
AGGAGTCGGGCAAAATGTTGCCGTCGTCGATTTTGGGGCGCAG-TATTTGCTTTTTATGCTGGTGCTCGCCGTGGGG
CTGTTCGCCATACACGAGTGGCAGGCCCGGGATGCCTTTGATGAGATAAAGAGGCAGTTCGACAAGGTGAACGCGGA
TAACTGCCCCATCCAACACCATTCGGACCTTTTCATGCCCATGGACGCGGTGTCCCACAAGCCGGACATCAAGGAGA
TCAACGTGAATCCGGTGTTCCCCAACCGAACTGCCCTGCTGCATCTGCAGAATAT
LD21417.5prime
CTAAGCT---NTNTGCGTTGGCGCGCCCCAGAATCTCATTTT
ATATTTAGTTTCTGCCAGTTTAGTTAATTAGTTAGTTGATAGTGTTGTTTGTTTCTTCTGCAACAATTGTGTGCGAT
AGGAGTCGGGCAAAATGTTCCCGTCGTCGATTTTGGGGCGCA-CTATTTGCTTTTTATGCTGGTGCTCGCCGTGGGC
\-TGTTCGCC-AACACGAGTGGCAGGCCCGGGATGCCTTTGATGAGATAAAGAGGCAGTTCGACAAGGTGAACGCGGA
TAACTGCCCCATCCAACACCATTCGGACCTTTTCATGCCCATGGACGCGGTGTCCCACAAGCCGGACATCAAGGAGA
TCAACGTGAATCCGGTGTTCCCCAACCGAACTGCCCTGCTGCATCTGCAGAATATGGCCCTTAGCAGAAGCTTCTTC
TGGAGCTACATCCTCCAGTCGAGGTTTATTCGACCCGCCATCAACGACACCTACGA
LD13768.5prime
CGCCCATCACCCGGA
CTCCATATCCAAGGACTACACCCACGACCTGTATAAAATAAACGAGTGGTACCGCGCCTGGCTACCAGACAACGTCG
AGGGACGGCACGATACGAAGATCACCTACCAGGTGGAAATCCGCTATGCGAACAACACAAACGAGACGTATACCTTC
CACGGACCGCCTGGCTCTGAAGAAAACCCTGGTCCGATTAAATTTACAAGGCCGTACTTCGATTGTGGCAGGTCCAA
CAAGTGGCTGGTGGCCGCAGTAGTGCCAATTGCGGATATCTACCCCCGACACACGCAGTTCCGTCACATTGAGTATC
CCAA----------------------------------------------------------ATACACGGCCGTTTC
GGTTCTTGAGATGGACTTCGAGCGTATCGACATAAACCAGTGTCCATTGGGTGAAGGCAACAAAGGACCTAATCACT
TTGCGGATACGGCGCGGTGTAAAAAAGAAACGACAGAGTGTGAACCATTACAAGGCTGGGGCTTTAGGCGCGGTGGC
TACCAGTGCCGTTGTAAGCCAGGTTTTCGGCTGCCCAACGTAGTGCGGCGACCTTATCTGGGCGAGATTGTGGAGCG
CCCATCGGCAGAACAGTACTACAACGAGTACGACTGCCCTAAGATTGGCT---------------------------
\---------------------------------GGAT
LD16802.5prime
CGCCCATCACCCGGA
CTCCATATCCAAGGACTACACCCACGACCTGTATAAAATAAACGAGTGGTACCGCGCCTGGCTACCAGACAACGTCG
AGGGACGGCACGATACGAAGATCACCTACCAGGTGGAAATCCGCTATG-GAACAACACAAACGAGACGTATACCTTC
CACGGACCGCCTGGCTCTGAAGAAAACCCTGGTCCGATTAAATTTACAAGGCCGTACTTCGATTGTGGCAGGTCCAA
CAAGTGGCTGGTGGCCGCAGTAGTGCCAATTGCGGATATCTACCCCCGACACACGCAGTTCCGTCACATTGAGTATC
CCAA----------------------------------------------------------ATACACGGCCGTTTC
GGTTCTTGAGATGGACTTCGAGCGTATCGACATAAACCAGTGTCCATTGGGTGAAGGCAACAAAGGACCTAATCACT
TTGCGGATACGGCGCGGTGTAAAAAAGAAACGACAGAGTGTGA
HL02444.5prime
CK00408.5prime
GH23994.5prime
Translation
M F P S S I L G R S Y L L F M L V L A V G
V F A Q H E W Q A R D A F D E I K R Q F D K V N A D
N C P I Q H H S D L F M P M D A V S H K P D I K E
I N V N P V F P N R T A L L H L Q N M A L S R S F F
W S Y I L Q S R F I R P A I N D T Y D P G M M Y Y F
L S T V A D V S A N P H I N A S A V Y F S P N S S
Y S S S Y R G F F N K T F P R F G P R T F R L D D F
N D P I H L Q K I S T W N T F D V Q D L G A H H P D
S I S K D Y T H D L Y K I N E W Y R A W L P D N V
E G R H D T K I T Y Q V E I R Y A N N T N E T Y T F
H G P P G S E E N P G P I K F T R P Y F D C G R S N
K W L V A A V V P I A D I Y P R H T Q F R H I E Y
P K----------------------------------2----------------------- Y T A V S
V L E M D F E R I D I N Q C P L G E G N K G P N H
F A D T A R C K K E T T E C E P L Q G W G F R R G G
Y Q C R C K P G F R L P N V V R R P Y L G E I V E R
A S A E Q Y Y N E Y D C L K I G \---------------------------
\-------1-------------------------W I Q K L P I Q W D K A S Y H
I R Q K Y L D R H P E Y R N Y T T G S R S L H A E H
L N I D Q A L K Y I H G V N Y R T C K N-----------------
\---------------------2---------------------------------- F H P Q D L I
L R G D V S F G A K E Q F E N E A K M A V R L A N
F I S A F L Q \-----------------------0---------------------------------
\---------0------------------------------------------0------------------------
\------------------0------------------------------------------0----
fly
MFPSSILGRSYLLFMLVLAVGVFAQHEWQARDAFDEIKRQFDKVNADNCPIQHHSDLFMPMDAVSHKPDIKEINVNP
VFPNRTALLHLQNMALSRSFFWSYILQSRFIRPAINDTYDPGMMYYFLSTVADVSANPHINASAVYFSPNSSYSSSY
RGFFNKTFPRFGPRTFRLDDFNDPIHLQKISTWNTFDVQDLGAHHPDSISKDYTHDLYKINEWYRAWLPDNVEGRHD
TKITYQVEIRYANNTNETYTFHGPPGSEENPGPIKFTRPYFDCGRSNKWLVAAVVPIADIYPRHTQFRHIEYPKYTA
VSVLEMDFERIDINQCPLGEGNKGPNHFADTARCKKETTECEPLQGWGFRRGGYQCRCKPGFRLPNVVRRPYLGEIV
ERASAEQYYNEYDCLKIGWIQKLPIQWDKASYHIRQKYLDRHPEYRNYTTGSRSLHAEHLNIDQALKYIHGVNYRTC
KNFHPQDLILRGDVSFGAKEQFENEAKMAVRLANFISAFLQSMQTITRISSLQVSDPNEVYSGKRVADKPLTEDQMI
GETLAIVLGDSKVWSATMLWERNKFTNRTYFAPYAYKTELNTRKFKVEDLARLNKTHELYTEKKYFKFLKQRWNTNF
DDLETFYMKIKIRHNETGEYQQKYEHYPNSYRAANIKHGYWTQPQFDCDGYVKKWLVTYAVPFFGWDSLKVKLEFKG
VVAVSMDMLQLDINQCPDWYYEPNAFKNTHKCDEQSSYCVPIMGRGYETGGYKCECLQGYEYPFEDLITYYDGQLVE
AEYQNIVADVETRYDMFKCRLAGASGLQSALGLVVALIGLTLTLLYRFS
honeybee
MEFINFKIYTIVIISLYCLGVVGQYEWQARDAFDEIRLKMDKINEENCPIQHIGDLYLPEDTVSHLPDIKDININPV
FPNRTALLHLHNMALSRSFFWSYILQSRFIRPAINDTYDPGMMYYFLSTVADVSANSHINASAIYFSPNMSYSSSYR
GFFNKTMPRFAPRTFRADDLMILYI-RKNIHRNTFNVQDLGAFASGSLGEDYTTDYYRINE
Amazing thing is that I can continue this gene for 3500bp with
perfectly predicted introns and lovely exons, encoding over 800aa!
Possible TM domain at end, if real.
The only similar protein in NR is CG18679, which is 179aa and has a
5C5G region that matches twice in this one, presumably an extra-cellular
disulfide-bonded domain.
Also TBLASTN matches in 75-80% range for an A. gambiae and B. mori EST
each. So is insect specific - yet quite conserved!
And now find multiple ESTs confirming beginning and end and several
introns.
At end, seems there may be another gene further on this strand, with
some ORFs and good intron splices;
But there are ESTs for genes on opposite strand, although could be
within intron of this gene (other was around, see below)
The honeybee EST translation could also be continued in alignment with
two frameshifts, so indeed is simply N-terminus of this long gene.
****************A. mellifera The remaining 3kb at the end of this gene
before the next annotation is strange.
As noted above, one might be able to construct a gene going in the same
RC direction as our new one, but there are about 30 Drosophila ESTs
from all tissues to a single exon on the forward strand, but it is a
region full of stop codons!
Could it be an interesting RNA-coding gene? No, it is simply a spliced
transcript for the start of the next gene!
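Since the notes switch between the RC and forward strands, a minimal Python sketch of reverse-complementing a sequence may help when repeating the searches. The function name `reverse_complement` is hypothetical; case is preserved on the assumption that lower case marks intron or flanking sequence, as in the genomic blocks in these notes.

```python
# Complement table for upper- and lower-case bases plus N (other symbols pass through).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of seq, keeping the case of each base."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy input: an upper-case exon end followed by a lower-case intron start.
    print(reverse_complement("ATGgtaag"))  # cttacCAT
```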
These are some of these cDNAs, the AT ones are the new adult testes
ESTs, with linkers attached!
genomic, now forward caggaaggaacatttcagtattacaacatcaaccattctgaaattgttaaaa
ttctaaaaggataaaaaaaatcatagtccaaattggaaattattcttgatatttcgtggatagaaagccgattgtga
gccgttgaatagcgcgaacctattcaagacgagccaagcgatcgagttatcgcgaatatatataagatactaatact
attggaggagaatttacgccgctcgacgattagacgggcgacgtgaatcgttttggagttttcaagaccttttgtaa
tttgttttgttctctctaaagtatcacaaattgtgatatcatttagcacttttataatttctggaaaattcaagcaa
cggattttgatctttgacctgtgcccttcgattgtaatacattcaaattgtaaagcgtgaagaaaacccacatattg
acaaggatcagttcttttggaagcaccgaaaactaacgtctcaactaacgtcagaaacactcgcatgcaaaatgaat
aagtttggtaagtcaatggggtttttactacaaaatcatatatttgaacattgtatatatggatggattgctttaaa
attataagacacttaaattctagacattagactactcaagcaggttgtaaaagttattcgattttgtctcaaggcac
ttatcactaattcagatatgtagatacataatacaaagtagactactcattggacggatgtttttgatatagtctgt
tgtgttacaaatattcagtttaaggcacaatttacacattcgatttcttctcattgctttgtactgatctacaaata
aattgcggaatgttcaggggggcaagacttccagaaacaaaaccaaaagcggacacggccagccactcgaacgtgtt
ttagcagacgcgggcatttttccaaatggaaatggaggagttgcccgaatgctgagacagttactgcccaccgctgc
tgccctgaaatgactatgaaaaatgtgtgaaaaagattttttctgcccctgtccagctacctatggctataaagttt
taatgaaaaaagctggagatttctttttgccctggtagaagaccaagtggctgctaaactggttcctgcagcgcata
gaaaaagttctcagggtacagataataaaaattcaagcgcatatgataatcaaggcgcaaaaaacaaaaagtggaaa
aaacgccgcagcggcagcagcatcgacatacatatttaattcagcaaaaaaaatcgcagccaacagaccatcgacga
tttaatataagaaaaatacgacggcaggcgttggattttttgtggcatccgttggtcggaaaaaaggtgtgtgtgcg
ggcacaaacaaccctcagctaggacctggacgacctccccgatgggtgtaggtacgcctgggcttaactgggttccg
atgttaacaggtttgcgatcgccgcacatacgcacacacagctgtgcgacttcacggacattagagaggaaggatct
tccgaagaagaaaaatacgagctacacggcatttccgtaatctgagcgcagtaggcgcggctgtttgcgcttttctg
acgatgctgctgcttttgcttcggctgctgctgctgctgcttggtcttctgccgcttttgatagaaatgacaaataa
ccagattcattgtaatagattatgtctacaacttaatcgccttgcagat
AT25734.5prime GGCACGAGG-CAGGAAGGAACATTTCAGTATTACAACATCAACCATTCTGAAATTGTTAAAA
TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA
GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT
ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA
TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA
CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG
ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCGGAAACACTCGCATGCAAAATGAAT
AAGTTTG------------------------TTTCCCACAACCTTGGTCGCGATCCATATCGGACCTTCAGCCCAGA
AATGTACCCGTTATCAAGCCCATTGGGACCGCATGGAACTGAAATGGCGGAAGGTAATGGCGAACTGTTGGATGACA
TTAACCAGAAAGCCGATGACCGTGGCGATGGCGAGCGTACAGAGGATTATCCCAAGCTGCTGGAATACGGTCTGGAC
AAGAAGGTCGCCGGCAAACTGGATGAGATCTAC
AT04521.5prime GGCACGAGG--------------------ATTACAACATCAACCATTCTGAAATTGTTAAAA
TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA
GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT
ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA
TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA
CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG
ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCAGAAACACTCGCATGCAAAATGAAT
AAGTTTGTTTCCCACAACCTTGGTCGCGATCCATATCGGACCTTCAGCCCAGAAATGTACCCGTTATCAAGCCCATT
GGGACCGCATGGAACTGAAATGGCGGAAGGTAATGGCGAACTGTTGGATGACATTAACCAGAAAGCCGATGACCGTG
GCGATGGCGAGCGTACAGAGGATTATCCCAAGCTGCTGGAATACGGTCTGGACAAG
LP04990.5prime ATCAACCATTCTGAAATTGTTAAAA
TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA
GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT
ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA
TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA
CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG
ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCAGAAACACTCGCATGCAAAATGAAT
AAGTTTG
GH18064.5prime CAACCATTCTGAAATTGTTAAAA
TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA
GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT
ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA
TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA
CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG
ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCAGAAACACTCGCATGCAAAATGAAT
AAGTTTGTTTCCCACAACCTTGGTCGCGATCCATATCGGACCTTCAGCCCAGAAATGTACCCGTTATCAAGCCCATT
GGGACCGCATGGAACTGAAATGGCGGAAGGTAATGGCGAACTGTTGGATGACATTTACCAGAAAGCCGATGACCGT
AT16285.5prime GGCACGAGG-----------------------------------ATTCTGAAATTGTTAAAA
TTCTAAAAGGATAAAAAAAATCATAGTCCAAATTGGAAATTATTCTTGATATTTCGTGGATAGAAAGCCGATTGTGA
GCCGTTGAATAGCGCGAACCTATTCAAGACGAGCCAAGCGATCGAGTTATCGCGAATATATATAAGATACTAATACT
ATTGGAGGAGAATTTACGCCGCTCGACGATTAGACGGGCGACGTGAATCGTTTTGGAGTTTTCAAGACCTTTTGTAA
TTTGTTTTGTTCTCTCTAAAGTATCACAAATTGTGATATCATTTAGCACTTTTATAATTTCTGGAAAATTCAAGCAA
CGGATTTTGATCTTTGACCTGTGCCCTTCGATTGTAATACATTCAAATTGTAAAGCGTGAAGAAAACCCACATATTG
ACAAGGATCAGTTCTTTTGGAAGCACCGAAAACTAACGTCTCAACTAACGTCAGAAACACTCGCATGCAAAATGAAT
AAGTTTGTTTCCCACAACCTTGGTCGCGATCCATATCGGACCT
So, figured it out, these start before our gene, and jump over it into
the start of CG17838, so ours is in the first long 5'UTR intron of
CG17838!
Has tons of ESTs overlapping, so this annotation needs fixing.
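The jump over our gene shows up in the aligned ESTs as a long run of dashes where the cDNA skips genomic sequence. A minimal Python sketch of listing such runs follows, assuming each EST is supplied as one aligned row (wrapped lines are joined) and that phase digits 0, 1 or 2 may sit inside a run, as in some rows of these files. The name `gap_runs` and the `min_len` cutoff of 10 are hypothetical choices, not part of the original analysis.

```python
import re

# A gap run starts with '-' and may contain embedded intron-phase digits.
GAP = re.compile(r"-[-012]*")


def gap_runs(aligned: str, min_len: int = 10):
    """Return (start_column, length) for each long internal gap run.

    Leading and trailing runs are ignored because they are alignment
    padding, and runs shorter than min_len are treated as sequencing
    indels rather than introns.
    """
    row = "".join(aligned.split())  # join wrapped alignment lines
    runs = []
    for m in GAP.finditer(row):
        internal = m.start() > 0 and m.end() < len(row)
        if internal and len(m.group()) >= min_len:
            runs.append((m.start(), len(m.group())))
    return runs


if __name__ == "__main__":
    # Toy row modelled on the AT25734.5prime alignment above.
    print(gap_runs("AAGTTTG" + "-" * 24 + "TTTCCCACAACC"))  # [(7, 24)]
```

Rows that begin with a linker followed by padding (the AT ESTs) would need the linker removed first, otherwise the padding is reported as an internal gap.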
So I scan this 30kb region from our gene to CG17838 for anything else,
besides one short exon in the middle that belongs to CG17838, and find
some ESTs, but none coding.
I can't believe there are not genes in here. There are several 500bp
ORFs! See below for one!
***************extra2 GOOD GRIEF! In this 30kb region BLASTX reveals,
about in the middle of it, a relative of the 300aa N-terminus of
TGF-beta activated-kinase 1 homolog [Drosophila], or CG18492
AE003732.2 aaataaaaattaatttgatttgcggggaagtcacaaagaataatattaagca
tttttgataagttggaattgggtgaatgacggaattatttcttatcgcgtaccgataaggtggttttatctcactta
actagcacttaatcacaactttcattgcaattcagttcacaaattgcacttcaaagcgatcgcacgtattttgctag
agATGGTCAAGCAAGTGGATTTTGCGGAGGTGAAGCTCAGTGAGgtaggttttacttgaaatattgttaaggattca
atgagacccccttttatgcttagAAATTTCTCGGAGCTGGATCTGGTGGAGCGGTGCGCAAAGCCACCTTTCAAAAT
CAGGAGATTGCAGTAAAGATATTTGATTTCCTTGAGGAAACAATCAAAAAGAATGCAGAGAGGGAAATCACACATTT
GTCGGAGATCGACCACGAAAACGTTATCAGGGTGATCGGGAGGGCCAGCAATGGAAAGAAGGACTACTTGTTGATGG
AGTACCTGGAGGAGGGGTCCCTCCACAACTACCTCTATGGCGATGACAAGTGGGAGTACACCGTGGAGCAAGCGGTT
CGCTGGGCACTCCAATGCGCCAAGgtaaagtgcaagatcgcctttccccacaatcagatacattttcggtgttttag
GCCTTAGCATACTTGCATTCGTTGGATCGACCGATTGTTCACCGCGATATTAAGCCGCAAAACATGCTTTTATATAA
TCAGCATGAAGACTTAAAGATTTGTGACTTTGGCCTGGCGACGGATATGTCCAATAATAAGACCGATATGCAAGGAA
CATTGAGGTATATGGCTCCCGAGGCCATTAAGCACTTAAAGTATACGGCTAAGTGTGATGTGTACAGCTTTGGAATA
ATGCTCTGGGAGCTGATGACACGTCAATTGCCATATAGTCACTTGGAAAACCCCAACAGCCAGTACGCCATTATGAA
AGCTATCAGTTCAGgtaattattattatatacttattttaaatcataattccttaaatttacgttaattattataaa
gaattattatattgtaagtatactccgtcccgatgaacttgagatctcggtacccaagagctgaaaggtgatatgca
gattcctatgtcgtgcaagtttgtttgcgacgttcattataaaaatgtgtcaaaaaatattcgattttagaatagga
atgatatttcttcaaatatagaaatatcaaattactaattttatttaaataaatttctgtttaatttcttttaaata
tatatatatatacatatacatgaatacatatacatatgaatattttagGCGAAAAACTTCCAATGGAAGCAGTAAGA
TCCGATTGCCCAGAGGGTATCAAGCAATTAATGGAATGTTGCATGGATATAAATCCCGAAAAGCGCCCCTCTATGAA
GGAGATCGAAAAGTTCCTTGGCGAACAGTATGAATCCGGCACTGACGAGGACTTTATCAAGCCTTTGGATGAGGATA
CCGTGGCTGTGGTGACCTACCATGTGGATTCGTCCGGCAGCAGGATAATGCGTGTTGATTTCTGGCGACATCAGTTG
CCATCGATCCGCATGACTTTTCCGATAGTGAAACGGGAAGCCGAAAGATTGGGAAAGACCGTTGTCAGAGAAATGGC
CAAGGCGGCGGCGGATGGAGATCGGGAAGTTCGGCGGGCTGAGAAGGACACGGAGCGTGAAACCTCGAGGGCTGCCC
ACAATGGAGAGCGGGAAACGCGGAGAGCGGGTCAGGATGTGGGTCGTGAAACTGTACGGGCGGTCAAGAAAATAGGA
AAGAAACTGCGCTTCTAACCAGAAATA
translation
M V K Q V D F A E V K L S E \------------------------------0-----
\-------------------- K F L G A G S G G A V R K A T F Q N Q
E I A V K I F D F L E E T I K K N A E R E I T H L S
E I D H E N V I R V I G R A S N G K K D Y L L M E
Y L E E G S L H N Y L Y G D D K W E Y T V E Q A V R
W A L Q C A K \---------------------------------0------------------- A
L A Y L H S L D R P I V H R D I K P Q N M L L Y N Q
H E D L K I C D F G L A T D M S N N K T D M Q G T
L R Y M A P E A I K H L K Y T A K C D V Y S F G I M
L W E L M T R Q L P Y S H L E N P N S Q Y A I M K A
I S S \----------1--------------------------1--------------------------1-
\-------------------------1--------------------------1------------------------
\--1--------------------------1--------------------------1--------------------
\------1--------------------------1--------------------------1----------------
\----------1--------------------------1-------G E K L P M E A V R S
D C P E G I K Q L M E C C M D I N P E K R P S M K E
I E K F L G E Q Y E S G T D E D F I K P L D E D T
V A V V T Y H V D S S G S R I M R V D F W R H Q L P
S I R M T F P I V K R E A E R L G K T V V R E M A K
A A A D G D R E V R R A E K D T E R E T S R A A H
N G E R E T R R A G Q D V G R E T V R A V K K I G K
K L R F *
So is a neat little kinase gene, with no ESTs, and only the N-terminal
kinase domain having matches!
So this is in the second intron of CG17838! Are there more genes in
here or in intron 1? I could do this endlessly, but without BLASTX or
ESTs to guide it, it is useless.
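The translation above can be rechecked from the genomic block itself, where exons are upper case and introns lower case. A minimal Python sketch, assuming the first upper-case run begins at the start codon (true for this kinase gene, not for blocks that include 5'UTR in upper case) and using the standard genetic code. The function names are hypothetical; intron phase is taken as the cumulative coding length modulo 3.

```python
import re
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def exons_and_phases(genomic: str):
    """Split mixed-case genomic text into exons and the phase of each intron."""
    seq = "".join(genomic.split())      # join wrapped lines
    exons = re.findall(r"[A-Z]+", seq)  # upper-case runs are exons
    phases, coding_len = [], 0
    for exon in exons[:-1]:             # one intron follows every exon but the last
        coding_len += len(exon)
        phases.append(coding_len % 3)
    return exons, phases


def translate(cds: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons containing N etc.
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # First exon, first intron and the start of the second exon from the block above.
    genomic = """agATGGTCAAGCAAGTGGATTTTGCGGAGGTGAAGCTCAGTGAGgtaggttttacttgaaatattgttaaggattca
atgagacccccttttatgcttagAAATTTCTCGGAGCTGGA"""
    exons, phases = exons_and_phases(genomic)
    print(phases)                      # [0]
    print(translate("".join(exons)))   # MVKQVDFAEVKLSEKFLGAG
```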
***************A. mellifera BB260023B20H4.F AGTACGGGAGGACGTGCAAGGACATCGGTTGC
ATGAGGGATGAGGTCTGCGTGATGGCCGAGGATCCTTGTTCGATCTACCAACGAGATAACTGCGGTCGTTATCCGAC
TTGTATGAAATCTCGTCCAGGCGAGGCTAATTGTGCCAGCACTCTGTGCGGTGAAAACGAATACTGCAAAACCGAGA
ATGGCGTCCCAACATGTGTGAAGAAATCAGCAGTAAATGATGACGGATTGTTCGCAGAGTTCGATGGAACAAGCAAC
AGCTTATTGAGAAAAAGACGGGGGACCGGCGATGAAGCGGATTCCACATCGGATGACACGCAATCGGTTAAGACGAT
GGACTCTCGATACCCATCCGGAAGTGGATATCCGTCCTCCGAAACTCGACCAAAAGCTGACACCTCGGTCAAATCAT
CCGGTTACCCATCCAGTTCCGGTTATCCATCGAACTCTGGTTATCCCTCCACCGGCTCATCCGGTTATCCATCCAGT
TCTGGTTACCCATCCAGATCGTCCGGATATCCTTCAGAATCGGCAAGTTCCGGTTACCCGTCGAGAAGCTCTGGCTA
TCCTTCAGAGGCTGGATATCCGTCGAGAAGCTCGTCCGGTTATCCATCACAATCTGAATATCCTTCGAAAAGCTCTT
CAGGTTATCCATCAGAGTCTGGCTACCCTTCGAGAAGCTCTTATCCATCAGGATCTAGTTATCCCAGTGGCTCATAT
CCTTCATCAAGTTCAAGGGGATATCCATCGACCTCCGTCNATGAGGCAAGTGTGAAAAGGGTGGATGCAAACTCTTT
AATGCTG
long ORF encodes wacky protein with SSGRY repeats at end; weak BLASTX
matches to long Drosophila proteins; nothing better in nr;
genomic match is weak, but to the least wacky part of the protein, and
then there are many ESTs, and two good B.mori ESTs to this region, so I
think it is a real protein; NEW INSECT GENE
GH20482.5prime TGAAGGTGTCAGTTGGGCTCCCGACGGTTTAATTTTTAGCTCCCACAACGAA
CGCAGTTCTCAGCGATTTTGACGCAAATTATAAACGTCAGCAACTTTTGATCCAAGGGCGTGACATCGATAGCGAGC
AACAGGATGCTGGCACTGCCTTTGCTGACTCTGGCGGTCCTCGCCAGCTGCGGCTACTCCGTGGACGCCTACTCCAA
GTACGGCCGTGGATGCGGGGACATTGGCTGCCTGCCCACCGAGGAGTGCGTCATCACCAGCGACTCGTGCAGCTACA
ACCAGCGTGACGGCAAGGATTGCGGCAACTATCCCACCTGCAAACGGCGCTCCGGCGGAGGATCATCCGCCTCGAAC
AGCAGCCCCAACTTGGCAGCCCCCTCGGCCAATCCGTCAGAGGTGAGCCATAACGCATACGCCCCGAATGCCCCAAG
TGCCCCGAGTGCCCCGTTGCCGGAAGCGGATGCGAGCGGTGGTGCTGGCTACGGTGGTGCTGCGGGCGGTGGTGGAA
GCGGCGGATATGGTG
GH16618.5prime TGAAGGTGTCAGTTGGGCTCCCGACGGTTTAATTTTTAGCTCCCACAACGAA
CGCAGTTCTCAGCGATTTTGACGCAAATTATAAACGTCAGCAACTTTTGATCCAAGGGCGTGACATCGATAGCGAGC
AACAGGATGCTGGCACTGCCTTTGCTGACTCTGGCGGTCCTCGCCAGCTGCGGCTACTCCGTGGACGCCTACTCCAA
GTACGGCCGTGGATGCGGGGACATTGGCTGCCTGCCCACCGAGGAGTGCGTCATCACCAGCGACTCGTGCAGCTACA
ACCAGCGTGACGGCAAGGATTGCGGCAACTATCCCACCTGCAAACGGCGCTCCGGCGGAGGATCATCCGCCTCGAAC
AGCAGCCCCAACTTGGCAGCCCCCTCGGCCAATCCGTCAGAGGTGAGCCATAACGCATACGCCCCGAATGCCCCAAG
TGCCCCGAGTGCCCCGTTGCCGGAAGCGGATGCGAGCGGTGGTGCTGGCTACGGTGGTGCTGCGGGCGGTGGTGGAA
GCGGCGGATATGGTGGTGGTTTCTCGGCTGGCGGCCACTCCCTGTACCCCAGCCTACCCAACTCAAACGGCGGCGGC
GH10109.5prime AAGGTGTCAGTTGGGCTCCCGACGGTTTAATTTTTAGCTCCCACAACGAA
CGCAGTTCTCAGCGATTTTGACGCAAATTATAAACGTCAGCAACTTTTGATCCAAGGGCGTGACATCGATAGCGAGC
AACAGGATGCTGGCACTGCCTTTGCTGACTCTGGCGGTCCTCGCCAGCTGCGGCTACTCCGTGGACGCCTACTCCAA
GTACGGCCGTGGATGCGGGGACATTGGCTGCCTGCCCACCGAGGAGTGCGTCATCACCAGCGACTCGTGCAGCTACA
ACCAGCGTGACGGCAAGGATTGCGGCAACTATCCCACCTGCAAACGGCGCCCCGGCGGAGGATCATCCGCCTCGAAC
AGCAGCCCCAACTTGGCAGCCCCCTCGGCCAATCCGTCAGAGGTGAGCCATAACGCATACGCCCCGAATGCCCCAAG
TGCCCCGAGTGCCCCGTTGCCGGAAGCGGATGCGAGCGGTGGTGCTGGCTACGGTGGTGCTGCGGGCGGTGGTGGAA
GCGGCGGATATGGTGGTGGTTTCTCGGCTGGCGGCCACTCCCTGTACCCCAGCCTACCCAACTCAAACGGCGGCTGC
GGTGGTGCGGCTCCCTACAATCCATATGGCAATGGTGGCGGA
This is outrageous! This transcript, and there are many more ESTs to
confirm it, starts with 90bp 52994bp upstream in the next segment of
the scaffold, jumps several genes, then has a series of exons with
introns with more genes in them! I can't possibly put it all together!
The C-terminus appears to be CG1726, however, so is not a new gene.
fly
MLALPLLTLAVLASCGYSVDAYSKYGRGCGDIGCLPTEECVITSDSCSYNQRDGKDCGNYPTCK-RRSGGGSSASNS
SPNLAAPSANPSEVSHNAYAPNAPSAPSAPLPEADASGGAGYGGAAGGGGSGGYGGGFSAGGHSLYPSLPNSNGGG
bee
YGRTCKDIGCMRDEVCVMAEDPCSIYQRD--NCGRYPTCMKSRPGEANCASTLCGENEYCKTENGVPTCVKKSAVN
DDGLFAEFDGTSNSLLRKRRGTGDEADSTSDDTQSVKTMDSRYPSGSGYPSSETRPKADTSVKSSGYPSSSGYPSNS
GYPSTGSSGYPSSSGYPSRSSGYPSESASSGYPSRSSGYPSEAGYPSRSSSGYPSQSEYPSKSSSGYPSESGYPSRS
SYPSGSSYPSGSYPSSSSRGYPSTSVMRQV
So fly protein is a little longer at the N-terminus, that is, bee cDNA
is truncated. Fly protein looks secreted.
***************A. mellifera BB270004B10G5.F GTCCCCCGCCGATTGATTTGGCGCGCGCGTGA
CGGGAAAGAGGAACACGGGACTTTGGAGGTATTCCAAACGGCGGATATACACATCGTGGACGATGAAGTGGTGCTTA
GTGATGCTGATTCTGGCCGGTGTCACGAGGGCCGACAATTCGGTCGACGCGGACTATTCGATCCTCAAGTGTCCGGA
CCTGAATTCCCAAGAGGAGATCGATTTGAACGAGATAATGGGCAAGTGGTACGTGGTCGAGGTGTTGGAGCACAAAG
TCGATCCATCGAAGCCCAACGGCTCGTACAAGGTCAATTCCTGCCCGATCGTCAAGCTGAGAGCGGTCGAGAACACG
TCCAAGTACCTCTCCTCGTTGAGGCTGTTGTGGACCGAGGAGATCGGCGACCTCGAGTACACTTTCCGGATACCGGA
CGTATCCAGGAAGCCGGGCTTTTGGATCTCCACCTCTGTGCAAAATGGCACACTGGTGGAGAGGGGGTACAAGCAAT
TCAGCGGGAACGTGCACGTGATGAAGGCCGTCGCCTCGGACATGGTGCTGACATTCTGTTCCCGGAACCCGGACAAT
CAGCTGTACTCGTTGCTACTCTCGCGGGAGCACATCTTGCAAAAGAGCGACAAGCGAGGGGTGCACAATCTGCTCGG
CCGCCGCGGCCTCAAGATCGTCAATATCCGGG
Long ORF encodes 200aa 13% leucine N-terminus; weak full-length match
to insecticyanin A - tobacco hornworm; BLASTX 22% over 197aa; p=3.2!;
this is the genomic match, and one EST, and it is unannotated; NEW GENE.
AE003553.2 TTCATCTGGAACAGACCGCATTCAGTGGCTCTTATCGGTAGAAACAGCAGCA
CTTTTCCGAGATGTCTATCAAATCCTTGACATACGTTGCGATCTTTGGCCTTTTTTGGGGCTCAATTGCGGGAACTG
TAGTTGATCAGTTTGGGATATATGGTGGTTCACCGATTACCACCACGGAAAGGAGTAATGCGGAGTTGCGCTGCATG
AACATCAATCCGCAGAACTCGGTGGACTTGGAGCAGgtacgatatgataagataatatccttgagtgacaggaaata
ttagctacacccaagatgaaaatgtaagctcatttgaatgcggagagctaatgcaattatggtcgatttgttctaat
taaccctttgcgccctgagtgcgcccaatgtcaatgctgtacagatctgagccctgtttatttagttattttttttt
ttcattgtgggcatttaattgaacgcacagcgagtggcatgcaaatagcacaaatgacccccattcgctcctacatg
agtgtgtaataatgcccaacctctctgcatatatagATGATGGGACTCTGGTACGGCAGCGAGATTATCGTGCACAG
CCAAGATTTTCCGGGCACCTACGAGTACGACTCATGTGTCATCATTCATCTGACCGATGCCACGGATCAGgtcagtt
gccatcgaaattcaaaacatttagatactgattttaatcatatgaaatattacagATCCGTTTGAGCCAAGCAAATC
GCGGCTATGGCTATGGAAATCAGGACTACAACCGTAACCAGAATAACTATGGACGCACCACCACCACTCAATCCTCC
TATCCGGATAGCGATGAGTACCCGTTGAGATCGATTCAAAGCCAGCAGAAGTACCTACGTTTGATTTGGAGTGAGCG
TGATAACAATCTGGAGTATACTTTCAACTATACCACCAGTGCACCTGGTCAGTGGTCCAACATCGGCGATCAGCGGG
GATCCTTGGTCACCCTGAACACGTACACCCAGTTCACGGGCACTGTCCAGGTGGTGAAAGCGGTCAACGATCACCTG
GTGCTGACCTTCTGCGGCAACGATGTTAAGAGCTCCATATACACAGTGGTTCTCACCCGCAATCGCCTTGGTCTCAG
TTTAGATgtgagtaaagtagtcttctttcaatttcttttccagaaagttaaatctttgggtaattgtagtttgatta
agtgtaaagaacactttttatttatctttaaatcagacctttttatttctcgtaaaatttgtttgtttcttaacaaa
gcttaatatttgtttacaagctgcgtcaagtaataatatgcaaataatttttttagtaaatactgatagtgatgagc
aaatctatatttgagaggtaaaaagaggttacattgtttacaattttacgtatatgctggtaacaaataagaatgag
gattgtaggaatcgtatatgtatatattaaacctaattaatcattttttccaattttccagGAGCTGCGTAGCATCA
GGAATCTGCTTTCCCGCCGTGGACTCTACACGGAGACCATTCGCAAGGTTTGCAATGGATGTGGGCGATTGGGTGGC
AGCCTCTTCGCTCTTTTAGCCCTTTTGCTGGTCGTACGTTTGGCCTGGGGGCGTGGCCAGTGAGCTGGAGGGGATCG
TCAGAGTGTCGAAGTACAGCCggcgtatttaatgagtccagccatattacgttaatttatgtataattttcccataa
gcaatacacaagcgagatcgccgagtgctctccccgaaaactaactcacattgcctccgttttaattgctcgtcttg
tcaattaaagtcaattacgaataaggcagggctgacttaagtgggcattagccgttggctagttgtaggcgttaagt
gtttgcttaattcaattagcagggatctccacccgctcattggaatttcggtacaactaatgtcggaaaaaattcgg
ttgatattgctgcaacgttcgcttcgtatggatctgcagaccccaaaagtaggcaacaaaaagtgtgctgctgaaat
tgatacatataaattactcatacgccatggtgacacagtcacacacatcgaaacccaatgccacttgcctctgtctg
attacgttgttaacatttcgtttttttttacactatcagcactaaccgaaaaacgagcagatgatc
GH25183.5prime CTGGAACAGACCGCATTCAGTGGCTCTTATCGGTAGAAACAGCAGCA
CTTTTCCGAGATGTCTATCAAATCCTTGACATACGTTGCGATCTTTGGCCTTTTTTGGGGCTCAATTGCGGGAACTG
TAGTTGATCAGTTTGGGATATATGGTGGTTCACCGATTACCACCACGGAAAGGAGTAATGCGGAGTTGCGCTGCATG
AACATCAATCCGCAGAACTCGGTGGACTTGGAGCAG---------------------0-------------------
\---------------------------0----------------------------------------------0--
\--------------------------------------------0--------------------------------
\--------------0----------------------------------------------0---------------
\-------------------------------0----ATGATGGGACTCTGGTACGGCAGCGAGATTATCGTGCACAG
CCAAGATTTTCCGGGCACCTACGAGTACGACTCATGTGTCATCATTCATCTGACCGATGCCACGGATCAG-------
\---------------------------0---------------------------ATCCGTTTGAGCCAAGCAAATC
GCGGCTATGGCTATGGAAATCAGGACTACAACCGTAACCAGAATAACTATGGACGCACCACCACCACTCAATCCTCC
TATCCGGATAGCGATGAGTACCCGTTGAGATCGATTCAAAGCCAGCAGAAGTACCTACGTTTGATTTGGAGTGAGCG
TGATAACAATCTGGAGTATACTTTCAACTATACCACCAGTGCACCTGGTCAGTGGTCCAACATCGGCGATCAGCGG
translation
M S I K S L T Y V A I F G L F W G S I A G T V V D Q
F G I Y G G S P I T T T E R S N A E L R C M N I N P
Q N S V D L E Q \---------------------0------------------------------
\----------------0----------------------------------------------0-------------
\---------------------------------0-------------------------------------------
\---0----------------------------------------------0--------------------------
\--------------------0---- M M G L W Y G S E I I V H S Q D F
P G T Y E Y D S C V I I H L T D A T D Q \------------------
\----------------0--------------------------- I R L S Q A N R G Y G
Y G N Q D Y N R N Q N N Y G R T T T T Q S S Y P D S
D E Y P L R S I Q S Q Q K Y L R L I W S E R D N N
L E Y T F N Y T T S A P G Q W S N I G D Q R G S L V
T L N T Y T Q F T G T V Q V V K A V N D H L V L T F
C G N D V K S S I Y T V V L T R N R L G L S L D \----
\--------------0------------------------------------0-------------------------
\-----------0------------------------------------0----------------------------
\--------0------------------------------------0-------------------------------
\-----0------------------------------------0----------------------------------
\--0------------------------------------0---------- E L R S I R N L L
S R R G L Y T E T I R K V C N G C G R L G G S L F A
L L A L L L V V R L A W G R G Q Z
fly
MSIKSLTYVAIFGLFWGSIAGTVVDQFGIYGGSPITTTERSNAELRCMNINPQNSVDLEQMMGLWYGSEIIVHSQDF
PGTYEYDSCVIIHLTDATDQIRLSQANRGYGYGNQDYNRNQNNYGRTTTTQSSYPDSDEYPLRSIQSQQKYLRLIWS
ERDNNLEYTFNYTTSAPGQWSNIGDQRGSLVTLNTYTQFTGTVQVVKAVNDHLVLTFCGNDVKSSIYTVVLTRNRLG
LSLDELRSIRNLLSRRGLYTETIRKVCNGCGRLGGSLFALLALLLVVRLAWGRGQZ
bee
MKWCLVMLILAGVTRADNSVDADYSILKCPDLNSQEEIDLNEIMGKWYVVEVLEHKVDPSKPNGSYKVNSCPIVKLR
AVENTSKYLSSLRLLWTEEIGDLEYTFRIPDVSRKPGFWISTSVQNGTLVERGYKQFSGNVHVMKAVASDMVLTFCS
RNPDNQLYSLLLSREHILQKSDKRGVHNLLGRRGLKIVNIR
So is a reasonably nice small gene, with several phase 0 introns.
Protein at 250aa is a little long, and it doesn't match insecticyanin!
Rapidly evolving insect proteins
***************A. mellifera BB270005A10H10.F GCAACGCTGGAAGATAATTTGCAAGTCGAAT
CCAGACCAAGTTCCCAAGACGCGATTCAGTTTCTTCATCGGAGGACCACTATGCGGTGTCGTGGCATCGTCGAAGGC
TCGCTCGTTCGACGAGGGGAACAGCGCTCGAGAGATTTCCGATAAGTTTGGCCGCGACGAGGATGGAGAGAATCCGC
GGTCTGCCGGAACAGAGTTACGGAAATGCGCGCCGCCCCGGAACGCGACGACGACCTCGACGAGGCGACTGGCGTTC
GTTTCGAGTTTCAAAGAGATCCCGGAGCGAGGTTCAAGTTCGCGTCGACCGGTGATTCGAGGATAAGGCACGGTTTG
ATCGCGGCCCGCGTTGAAAATCGCTGAAAATCGTGAAGGATCGTGACCGATTGGATGGAAGATAGGGACGACGGGAC
GATCGTGAAGAAGAGATGCCTGGTATCGTGGTATTCCGACGCCGATGGAGCGTCGGCAGTGACGACCTCGTCGTTCC
CGGCGCTTTCCTCTTCATCCTTCATCTGATATGGATGACGGTATTGAGCGTTCTACTAGGGATATTTAAATGGGATT
GCAATGTTATGTGCATCCTTCTACTGTGGAGATACATTGTCGGTTATTTGGTAATTTTCGTGATCTCCATGATCGTG
GAGTTTTCCATCTGCTTTCTGGCCACTAGGGGTAGCATCCTGG
unobvious internal ORF after long 5'UTR has e-05 match to N-terminus of
a 680aa C. elegans predicted protein; this is the genomic match, and is
to an unannotated region, but just before an annotated gene.
No good ESTs, but some for same region to Brugia? Not sure about this
one, would be hard to annotate.
AE003493.2 ATGCCTGGACTTGTGGTCTTCAGACGTCGCTGGTCTGTGGGCTCTGATGATC
TCGTGGTGCCGGGCGCATTTCTCCTGACGATTCATTTTATATGgtaagtagccacgtaccattttacttagtcatcg
attattgttcataattctatttcgtgttttttttcttcttctttttcttcaacaacaaaataaaaacaaaaaaacga
tgtgatttctatgacagttttgtgattgttagcgtctcgttggttatctttgagtataatacacgaattttaagcgt
aaaattattgttctatcatctaataggctacttgttgatactattttgtaagtatactgaatcactttctgtggatt
ctgtattgatattgatcctattttgcagtttcaatatgtgtagaaataggtatatgtgtgatctcgATGCGTGGCAG
TATTCTGGATGCCGAGGCGCGCACCTCAATCAACATTTGGATATATCTCAAGAGCTGTAAGTTAGCACTAATAGAAT
AATCCTATATGTATGTAGTATAC
translation
M P G L V V F R R R W S V G S D D L V V P G A F L L
T I H F I W-------------------2------------------
bee MPGIVVFRRRWSVGSDDLVVPGAFLFILHLIWMTVLSVLLGIFKWDCNVMCILLLWRYIVGYLVIFVISMIV
EFSICFLATRGSIL
fly MPGLVVFRRRWSVGSDDLVVPGAFLLTIHFIW
I can't really figure it out, but I think this is the first exon of
gene CG11102
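How well the bee and fly N-termini agree can be put as a number. A minimal Python sketch of ungapped percent identity over the shorter of the two sequences; `percent_identity` is a hypothetical helper, and an ungapped comparison is an assumption that only suits a short gap-free stretch like this one.

```python
def percent_identity(a: str, b: str):
    """Return (matches, compared_length, percent) for an ungapped comparison.

    Only the first min(len(a), len(b)) residues are compared.
    """
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("both sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a[:n], b[:n]))
    return matches, n, 100.0 * matches / n


if __name__ == "__main__":
    bee = ("MPGIVVFRRRWSVGSDDLVVPGAFLFILHLIWMTVLSVLLGIFKWDCNVMCILLLWRYIVGYLVIFVISMIV"
           "EFSICFLATRGSIL")
    fly = "MPGLVVFRRRWSVGSDDLVVPGAFLLTIHFIW"
    print(percent_identity(bee, fly))  # (27, 32, 84.375)
```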
***************A. mellifera BB270012B20H7.F ATGACACCAAAACCTAAGCAACAAAATATAAC
GAATAAATCTAAAGAGAGATCCCCATCTATAGAAAAGCCAAAAGCGGAAGAAAAAGTAAAAATAACTAAAGTATTTG
AATTTGCCGGTGAAGAAGTAAAAGTAGAAAAAGAAGTCTCTATAGATTCAGCAGAAGCAAGAATATCTCTATCCTCC
GCTGAGAATTCTGAGAAAACAGGAAATTCTGGATCTCTCGCGGGTAGAGGATCTGGAAGAGGTAGAGGTTTCAAACG
AGCTGGTTTAGGAGGTATTTCTTCTGTCCTTGGTCAATTAGGGAAGAAGGCGAAAATTAGTACGTTAGAAAAATCCA
AACTAGATTGGGATAATTATAAGAAACAAGAGAATTTGGAGGAAGAAATTAGTACTCATAACAAAGGCAAGGATGGA
TATTTAGAACGTCAAGATTTCTTACAAAGAGCAGATTTGCGACAATTTGAAATTGAAAAACAATTACGTAATGCAAA
CAGACGTAGTACACGGTGAATTTATAATTTTATGTATATATATTTTATATATATATATCTTCCTTAATAATGAGAAC
CAGAAATGGTATTGAGAAAAATATCTTATTAGAATTGCCAATTATTGGCGCTTGTAGCACATATTTACTCGAATTGT
TTAAACTCTTACTAAATATCTCATGCCATGTATAAAAATGTGATTGCCTCGCTTGGCTTGACATCGGCTGG
end of ORF encodes 14%K; 11%E 170aa; 50% match to end of 300aa
craniofacial development protein 1 [Mus musculus] e-33; genomic match
is to unannotated short scaffold; one Dros EST; NEW GENE
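Composition figures like the 14%K and 11%E quoted here can be recomputed from any of the protein sequences in these notes. A minimal Python sketch, assuming a trailing Z or * is a stop marker, as the translations in this file use it, rather than a residue. The name `composition` is hypothetical.

```python
from collections import Counter


def composition(protein: str):
    """Return {residue: percent} ordered from most to least frequent.

    Whitespace is removed and a trailing stop marker (Z or *) is dropped
    before counting; percentages are rounded to one decimal place.
    """
    seq = "".join(protein.split()).upper().rstrip("Z*")
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: round(100.0 * n / len(seq), 1) for aa, n in counts.most_common()}


if __name__ == "__main__":
    # Toy input, not one of the sequences above.
    print(composition("K K E A Z"))  # {'K': 50.0, 'E': 25.0, 'A': 25.0}
```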
AE003220.2 GACACAATGAACTCACAAAAAGAATACGTATCGGACTGCGAAACCGACGATG
ATTATTATGTCGATTTGTTAACTTCAGGCAAGGGCAGTGATAAGAGTGAAAGTGATGTGTCGGACAAGTCTGAAAAT
TATCCAGGCCTAAAATCAAAGCATACTGCGAAGGCATTGCGGAAAACAAGGCATTGTGACGGCGATAATAGGGAATA
CAGGTCTAAGGAGTGCGACGACCTTCATTCCGAAGAGGAGTCTGAAAAATCGCGGTCGGATGCTTTATGGGCCGATT
TTCTTGGCGACATTGATACTAAAAGCGTAATCAACCAAAAAACAGATTATACGGAGGGAAACGCAGCAAGTGCTACC
AATACCAATACGCATGAGACTTGTAATAAATATGATAAAAACGATACGGCAATAATAAAAACTGCACAGCAATACGA
TTCCAAAAGAACCACGCTTTCAGTTTCCACACTCGGAAAAATTAAACGATCATCCGCTGAAAAGAGTATCGGTACCA
TGATAAATAAATTTGAAAAGAAGAAAAAATTGACAGTGCTTGAAAGGTCACAATTGGATTGGAAAATATTTAAACAA
GACGAAGGCATAGACGAACTTCTGTGCTCGCATAACAAAGGCAAGGACGGgtgagtttggaagaagaagaagaagag
tatttaaatggataaacttaaatttattacccaatgatttagGTATTTGGACCGTCAAGACTTTTTGGAGAGAACCG
ATCTTAGGCAGTTTGAAATGGAAAAGAAGTTGCGGCTGTCTCGCAGGCCATACTAACGGCTTAACCAACG
GH01620.5prime GACACAATGAACTCACAAAAAGAATACGTATCGGACTGCGAAACCGACGATG
ATTATTATGTCGATTTGTTAACTTCAGGCAAGGGCAGTGATAAGAGTGAAAGTGATGTGTCGGACAAGTCTGAAAAT
TATCCAGGCCTAAAATCAAAGCATACTGCGAAGGCATTGCGGAAAACAAGGCATTGTGACGGCGATAATAGGGAATA
CAGGTCTAAGGAGTGCGACGACCTTCATTCCGAAGAGGAGTCTGAAAAATCGCGGTCGGATGCTTTATGGGCCGATT
TTCTTGGCGACATTGATACTAAAAGCGTAATCAACCAAAAAACAGATTATACGGAGGGAAACGCAGCAAGTGCTACC
AATACCAAT-CGCATGAGACTTGTAATAAATATGATAAAAACGATACGGCAATAATAAAAACTGCACAGCAATACGA
TTCCAAAAGAACCACGCTTTCAGTTTCCACACTCGGAAAAATTAAACGATCATCCGCTGAAAAGAGTATCGGTACCA
TGATAAATAAATTTGAAAAGAAGAAAAAATTGACAGTGCTTGAAAGGTCACAATTGGATTGGAAAATATTTAAACAA
GACGAAGGCATAGACGAACTTCTGTGCTCGCATAACAAAGGCAAGGACGG---------------------------
\------------------------------------------GTATTTGGACCGTCAAG
translation
M N S Q K E Y V S D C E T D D D Y Y V D L L T S G K
G S D K S E S D V S D K S E N Y P G L K S K H T A K
A L R K T R H C D G D N R E Y R S K E C D D L H S
E E E S E K S R S D A L W A D F L G D I D T K S V I
N Q K T D Y T E G N A A S A T N T N T H E T C N K Y
D K N D T A I I K T A Q Q Y D S K R T T L S V S T
L G K I K R S S A E K S I G T M I N K F E K K K K L
T V L E R S Q L D W K I F K Q D E G I D E L L C S H
N K G K D G--------------------------------2--------------------------
\---------- Y L D R Q D F L E R T D L R Q F E M E K K L
R L S R R P Y Z
fly only 200aa
MNSQKEYVSDCETDDDYYVDLLTSGKGSDKSESDVSDKSENYPGLKSKHTAKALRKTRHCDGDNREYRSKECDDLHS
EEESEKSRSDALWADFLGDIDTKSVINQKTDYTEGNAASATNTNTHETCNKYDKNDTAIIKTAQQYDSKRTTLSVST
LGKIKRSSAEKSIGTMINKFEKKKKLTVLERSQLDWKIFKQDEGIDELLCSHNKGKDGYLDRQDFLERTDLRQFEME
KKLRLSRRPYZ
bee
MTPKPKQQNITNKSKERSPSIEKPKAEEKVKITKVFEFAGEEVKVEKEVSIDSAEARISLSSAENSEKTGNSGSLAG
RGSGRGRGFKRAGLGGISSVLGQLGKKAKISTLEKSKLDWDNYKKQENLEEEISTHNKGKDGYLERQDFLQRADLRQ
FEIEKQLRNANRRSTRZ
There is about 15kb before this in this short contig; several EST
matches, but only one is perfect, and then no ORFs, so seems is
repetitive DNA. Indeed the sole perfect EST is for RT.
****************A. mellifera BB270013A20H11.F TTTTCCCGGTATGTGCTTTGCCTCGACAAG
ATGTGCCACTATTGAACCAACAAAATCTTGGGAATTGACACCATTTTGTGGCCGTTCTACTTGCGTACCTGCTGATG
ACAACTCTGGTCGACTTTTCGAACTTGTCGAAGACTGTGGACCACTTCCAAAAGCTAATCCGAAATGCAAACTCTCA
GATAAAACTAATAAGACCGCTGCATTCCCTAATTGCTGTCCCATTTTCGAATGCGAAGAAGGAGCAAAACTTGAATA
TCCAGAAATTCCAACTTTACCACCACCCACGGAAATTATAGAGACCGAGAAAACTTCAGAAGAAGTTCCGACAAAAG
CTTAAATTCTAAAAAAACAGATTATAATCTTTACAAATTAAATTGAAAAATCGATTAAATTGAAACAGAAATTAAAG
ATTTATTAATTATAATCTGAAATAATAAATTTAATTAAAAATATATATATATACTTCGTTAAAAAAAATATATTTTT
ATCGAAAGTAAAAAAAAAATTTTACTTCTAACGAAAAATGTTATTTCATTCATTATATGTATACTGAAATATATAAA
ATATATTTCTTATATTTATGCAATGATACAAATATAAAATTGCAAACTTACATTATATAAATAAATATATGCATACT
AGTAAAATCATCAGAAACTTCGGGTATCCGTTCTAAAATATTGAATTTCTTCNATTTCCTAGTCCCGGAACC
end of ORF encodes 13% proline; 11% glutamic acid; BLASTX match to
Manduca sexta pMsmaD211! 77% and e-35;
genomic match is similar and to unannotated region; ESTs from four
insects! Clearly NEW INSECT-SPECIFIC YET REASONABLY CONSERVED GENE
AE003844.2 RC CTTCATTTAGGCTGGTTAGGTGGTTAATTCCATTTGTCTTCGTTCTTTTGTA
TTATTTTTACAAAGCGATAATATTTTAATCGTTTATGATTATTACAATATAACAAAAAGTTAACATCTTTGGAATCT
TAAAAATGAGTTTTCATTTTGCTGTACTGACCCTTATTTTAACAGCCTTCACAGTTTCTCTGTGTGCTGAACAAAAA
ATTACAAAGAGTGACGCAGGTGAAATACGAATTTTCAAACGTCTTATTCCTGCCGATGTTCTACGAGgtaagtatgg
caatcatcagatttagaaattttccattattaaaagttacaagttcaatataagtatatctaaaacggcatgttgtt
aaatcgggtgacacgcgtatagttttaagtaacataaaaggtatgggctagtgtaacgcaaaaaaaaaacaacaact
aaatatccctctcctttctcaaggtattaattttggccacaaaaggtatcattcagcctatggtgaacatttatcga
gtgtttttgcttttgatgtatacgtgatctattatatagtttccacagaaacagcccgaaaattaattggtctgtga
gtgtattccaattattaacgtaggttcaatagtgtttcaaagctcgcgttttatctggccttgcggcttgaatattc
cctcgcacttcctttcaaaacattttaataactcttcagATTTTCCGGGAATGTGCTTTGCTTCAACTCGATGTGCC
ACTGTTGAGCCTGGAAAGTCGTGGGACCTTACTCCATTCTGCGGTCGATCTACTTGTGTTCAAAATGAGGAAAATGA
TGCAAAgtaaacaaatttcagttaatatatatttaataaacaaatgcctaatatacattatttatagGCTATTCGAA
CTCGTAGAAGACTGCGGCCCATTGCCACTGGCGAATGACAAATGTAAATTGGACACAGAGAAGACTAATAAAACCGC
ATCGTTTCCTTATTGCTGCCCCATCTTTACATGTGACCCCGGTGTTAAATTGGAATACCCCGAGATCGGAAAGGATA
ATGACAAAAAGAATTCTGAGTGATTCAAAACAAATATATTATGAAAACGTCTGTCAATACAATAAAAACATTTGTTG
CTTTAGTCAAAAAGAACATTT
LP07557.5prime CTTCATTTAGGCTGGTTAGGTGGTTAATTCCATTTGTCTTCGTTCTTTTGTA
TTATTTTTACAAAGCGATAATATTTTAATCGTTTATGATTATTACAATATAACAAAAAGTTAACATCTTTGGAATCT
TAAAAATGAGTTTTCATTTTGCTGTACTGACCCTTATTTTAACAGCCTTCACAGTTTCTCTGTGTGCTGAACAAAAA
ATTACAAAGAGTGACGCAGGTGAAATACGAATTTTCAAACGTCTTATTCCTGCCGATGTTCTACGAG----------
\--------------1------------------------------------------1-------------------
\-----------------------1------------------------------------------1----------
\--------------------------------1------------------------------------------1-
\-----------------------------------------1-----------------------------------
\-------1------------------------------------------1--------------------------
\----------------1----------------------ATTTTCCGGGAATGTGCTTTGCTTCAACTCGATGTGCC
ACTGTTGAGCCTGGAAAGTCGTGGGACCTTACTCCATTCTGCGGTCGATCTACTTGTGTTCAAAATGAGGAAAATGA
TGCAAA----------------------------------2--------------------------GCTATTCGAA
CTCGTAGAAGACTGCGGCCCATTGCCACTGGCGAATGACAAATGTAAATTGGACACAGAGAAGACTAATAAAACCGC
ATCGTTTCCTTATTGCTGCCCCATCTTTACATGTGACCCCGGTGTTAAATTGGAATACCCCGAGATCGGAAAGGATA
ATGACAAAAAGAATTCTGAGTGATTCAAAACAAATATATTATGAAAACGTCTGTCAATACAATAAAAACATTT
GH25016.5prime ATTCCATTTGTCTTCGTTCTTTTGTA
TTATTTTTACAAAGCGATAATATTTTAATCGTTTATGATTATTACAATATAACAAAAAGTTAACATCTTTGGAATCT
TAAAAATGAGTTTTCATTTTGCTGTACTGACCCTTATTTTAACAGCCTTCACAGTTTCTCTGTGTGCTGAACAAAAA
ATTACAAAGAGTGACGCAGGTGAAATACGAATTTTCAAACGTCTTATTCCTGCCGATGTTCTACGAG----------
\--------------1------------------------------------------1-------------------
\-----------------------1------------------------------------------1----------
\--------------------------------1------------------------------------------1-
\-----------------------------------------1-----------------------------------
\-------1------------------------------------------1--------------------------
\----------------1----------------------ATTTTCCGGGAATGTGCTTTGCTTCAACTCGATGTGCC
ACTGTTGAGCCTGGAAAGTCGTGGGACCTTACTCCATTCTGCGGTCGATCTACTTGTGTTCAAAATGAGGAAAATGA
TGCAAA----------------------------------2--------------------------GCTATTCGAA
CTCGTAGAAGACTGCGGCCCATTGCCACTGGCGAATGACAAATGTAAATTGGACACAGAGAAGACTAATAAAACCGC
ATCGTTTCCTTATTGGCTG
translation
M S F H F A V L T L I L T A F T V S
L C A E Q K I T K S D A G E I R I F K R L I P A D V
L R \------------------------1------------------------------------------1--
\----------------------------------------1------------------------------------
\------1------------------------------------------1---------------------------
\---------------1------------------------------------------1------------------
\------------------------1------------------------------------------1---------
\---------------------------------1----------------------D F P G M C F
A S T R C A T V E P G K S W D L T P F C G R S T C V
Q N E E N D A K----------------------------------2-------------------
\------- L F E L V E D C G P L P L A N D K C K L D T E
K T N K T A S F P Y C C P I F T C D P G V K L E Y P
E I G K D N D K K N S E Z
The first intron contains a set of NNNNNNNNNNs
bee
FPGMCFASTRCATIEPTKSWELTPFCGRSTCVPADDNSGRLFELVEDCGPLPKANPKC
KL-SDKTNKTAAFPNCCPIFECEEGAKLEYPEIPTLPPPTEIIETEKTSEEVPTKA
fly
MSFHFAVLTLILTAFTVSLCAEQKITKSDAGEIRIFKRLIPADVLRDFPGMCFASTRCATVEPGKSWDLTPFCGRST
CVQNEENDAKLFELVEDCGPLPLANDKCKLDTEKTNKTASFPYCCPIFTCDPGVKLEYPEIGKDNDKKNSEZ
\************There is 20kb to the next gene and, YIKES, each 10kb half
contains at least one huge gene: one ORF is 5kb, the other is 7kb!
Amazingly there are no ESTs, and the few BLASTX matches are poor,
BUT the 7kb one matches the entire TES domain of lacunin and
nothing else, that is, no Kunitz or thrombospondin domains. What on
earth is it?
SIMA \- these have got to be genes, I can't see how one could have
several kb of ORF without it encoding something selected.
The latter TES region is a threonine/glutamic acid/serine rich region of
a moth protein we described, but this is not its ortholog in the fly
genome; that is elsewhere and already annotated.
\**************A. mellifera BB270014A20B4.F TATTCTACAGGTTCCACAGCACCGTTTGCTTGT
GCGAGGTCTGGAAAAGAGCGTACAAGGTGAAGCCTACATACATGTGACGTAAGGGGTCGCGTGAGGTCGTGTGCAAG
ACGGCGGCTGCTCCTCAACGACCGAAAACCCTGCCGGCCAGGCTGGACGCCATCGCGGAAAACGATCCGATTTACTT
TATGTAAGAGGGAGAGGGAGAGATGCGAGATTCTCTTTCGCCTCTCTTCGTCGCTCGAGGTTCTTCAAGCTCTGCGA
AACTGCGAATATATATATATATATATATATATATCGAGTATATCGTTCACCAGGCGTTGTTCCATAGATATTGATTC
GTGCGACGAGCGATGGACGAGCCTCGAGTTTGAGAAGGAATGGCGGTTTGTTTATTATTATTCTTTTTTCTCGAGAA
GTTGCGTTTATTTATATTATTTAGATAATANNAATGATTGTACAAGGTTNTATAGTCGTCCTCGCGGTCTATGGAGA
GAAGAANAGATCCCCATGCGCGATGGTTTTTACATACACACTACCATACATACACCACACGGCGATT
The match is barely the end of an ORF, which encodes the end of a
family of proteins. The best genomic match is not properly annotated,
but there is a gene in the region; there are many related lower matches
in Drosophila and C. elegans, e.g. CG3332, and there is an EST for this
one.
Leave it for now
\**************A. mellifera BB270014B20D9.F ATAATTTCTTCTTCTTCTCCATTTCCATTTCTT
CGATGTCGAGGGTGTGATGCCGTGCGAGCCAGTGTCCGCGAGCGACTTTGTCGAATTTTCGAATTTCATGTCGAGGG
ACCGGAAGTACTCGAGGCACTGCGGCCAACAGAAGGAGTTCGACGTGAACAGCGACAGAAAGTTCTTTCGCGTGACG
TTCAAGAGCAACGATAGATACGACGGGACAGGATTCAACGCTAGCTACGTGTTCGTGGATGACGAGGGAAATTACAC
GACGAAGCCGCCGACGAGTAACGCGTCAACGTTAAAAGGTGCAACGATGATGATGCTGCTGCTGCTGCTCGTGTTCA
CGGATCCTCTTCTCCTCCGTTCCGGCCGAGTTTCACCACGCTTTAATCACGATCAGTAAGTTGGACGGGATGGTCTT
GGCTCAATTTACAACGAACTATAACTTGGAAGACGGGGAGCCATCCGAAATCTTCCATTATACTCGGCTAGCCGAGG
AAAGTGGCCGCATCGTTTGCATCGATGTGAAAATCGGGGGGACACATCTTCGAGGGACGCGTTCGTGATTCGTTTAT
TATTATCATCGTTATCACGGGAACCTTGTCTCCAACCAAACGTTATTATTATTATTATTATCATCTCGTTCACGTTT
CGTAGTCGTTGGTTGGGCGACAGAAGAACGACAAATATATATATATATATATATATAAATATAAATACTAT
no obvious ORFs; BLASTX match to C. elegans C15A11.3 37% over 74aa;
7e-09; genomic match is same region at 50% and e-19; needs to be added
on to CG4940? no EST unfortunately
\**************A. mellifera BB270018B10D12.F GTAAGCCAACTTCATCGACCTTAAGCGGATTA
CGACGCTGTCGAGCATTGTATGATTGCGAACAGATAACGAGGATGACTATCATTCCGGGAAGGAGAGGGATCTCGTA
ACAAATGAACAAACCGACGACGACAACGGATGGAAGGCGCTCTTGAACGAGCGCCTGAAAGAAGAGGAATGTTCCCG
ATTAGTTTCGTGCACATGTTGCAAGATTAACATTCGGTAAGCCTGAACTCATTGGCAAGTGTCTCCAGTACTGCCAG
CTCCGTGAGCATTGCTAGTCTCGGCAGGAAATCTTGGACGAGGAAACCGAAAGATTGAACAGAAAGGATCAAGCATT
CAGTTCCATAATTTTTAAAAAGTCGTGGATTCCTTCTCTTGAAATTATACGCAATTTGAAATAATTCCTCAATTTAT
AACTTGAAACAATTCCAGAGAAGCACATTGATTTGCTAAAAGAAGAGATTTTAATAATTATTATATGAGAAAATACT
AAAAAATATTTGTTTAAATTTGAGAGGAGGAAATCATAAATATTGAAGAATTCATTAAAAAAAAAAATAATATCTGA
CTAAAAAAAATCAATATGAAAGTTGAGAAAGTTTGTCTTTGAGAATCATTAGAAACATCATAGAGAAGAGAAGTAGT
TACTATAGCCTGAGCAAAATAAATTGGCTATTACAAGTTACAAATAGATAAACTCTCATCAT
no obvious ORFs; short stretch matches end to ADP-ribosylation
factor-directed GTPase activating protein in mammals; could be
C-terminus of CG2226?; there is one Droso EST!
Indeed this region is included in the annotation, but not the protein.
SIMA \- This seems to be a common problem that I don't understand: why are
the annotated mRNAs not always completely translated?
\*************A. mellifera BB270019A20H6.F TTTCCACTCGAATTTTTCTACGCTCGTGTACGGT
CTTTTCACTCTCTCTCTCTATTTCTCTCTTTCTCTCTGTTTCTCCCCTTAATCGAGAGAAGTTGAAGTAGCGGACAG
AACGTTTTTTTTATGGAATTTCCAGTTAAAAAATAATCGTTTTCAAACTCACCTCGTTGGTGTTCCATCGGTGCCTC
TGAGTTGGAAAATGTTCAGCTCGTGGCAAGGTCTCCAAATTCTCCGGAAGTTTTATCGGATCTCCATCTAAACGGGG
AGGAAAGAAAAAAAAGAATTCGATGAAAATCTGCTCTCGACCAATTCGAACGTTTTTCTTTCTTTCTTTTTCTTTTT
TTTTCTCTCTTTTTCGAAACGCGTAAAATATACGAGTTTTTCGAAAAATTCAAGGAAATTTAATATCCTAATGGTCG
AATCAATTATATCATTCAATTCGATTAACTCTTTCTCAATCGTTAATCCGATTATCGTGCGAGTTACTAATAATTTG
CGCTCACATATATTCACTTCGACAAAAGCTTAGATGTTTGCATAACACATCATTAATCGTGTTTATATGCAACGGAT
CGAAGTATGCTATCGATCGTAACACAACTCGTGTACGAGAGAAAGAAACAGTGGTGTTAAAAGTAATGCACTTTTCA
CGTGCATAATATTACTTTCCCGGGATATGANACGCATATTTACTGTNNTAGAGAGNAAGNAAAGAAGA
no obvious ORFs; no BLASTX hits?; genomic hit is from a short internal
region of RC, to unannotated region; no Dros ESTs, but a few others for
same region, could be a real protein?
\*************A. mellifera BB270020A20G6.F AATACTTCTGTTGTTAAAGGTGTTGAAAGTATAT
TATCAATTAAGTTTGATCATCCTCTTCTAAAAGAATTAGTCATTGTAGAGGAACCTACATCAACCCAGGAGCCAATT
GTTTCCAATGCGGCAGTAGTCTCTGAATGTTATAAAGTGACTGCCGATGTTTTACCAGTACTATCAAAATTTGGATA
TGAAAAAGGAGACATTATGAAAAGAGCTGAGATCAGAAAATGTTTTACTGAATACGTGAAAGCAGAGAATCTTCAAG
ATGGAAGGATACTGAAACTGAACCCGCAACTCGCAGGTATTATGAAAACTAAAGCGAATGTGGAAACTGTAATGATG
GAGGATGGAATAAACAAGTTTATTGGACGTATGACGCATATGCATGAAGTTACTTTAGCAGGAAATAAATTGTTACA
CACGGGTAAATTGGAACCTATTGATATGAGAGTCACTGTTCGATCCGGCGGCAAAAAGGTAACGCTAGTAAATAATT
TGGAAACATTTGGCATAAATGCTAAAGAATTTAGTAAAGAATGTCAGAATATTGGAGCGAGTGCAACAATTACGGAT
GAACCAGGAAAAAAAACTCCTAGTGTTCTAGTTCAAGGAAATCAAATTTTATATATCTACAAATTACTTACAGAAAA
ATATCANATTAAAAAAAACTATATAAGAGGATTAGAATTCGCTCCAAAGAAACAAGGTTC
long ORF encodes 220aa 11% lysine end of protein; indeed is end of
ligatin [Drosophila melanogaster] by BLASTX, e-18; not in Drosophila
genome set for some reason;
curiously vertebrate proteins are same or better identity, and C.
elegans! Appears not to be annotated properly.
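Composition figures such as the "11% lysine" quoted in these notes can be recomputed from any translated ORF. The following is a minimal Python sketch, not the method actually used here; the function name residue_percentages is hypothetical, and it assumes that a trailing Z or * is the stop symbol, as in the translations in this report.

```python
from collections import Counter

def residue_percentages(protein):
    """Return {residue: percent of sequence}, most abundant residue first.

    Assumption: a trailing 'Z' or '*' marks the stop codon and is not counted.
    """
    seq = protein.upper().rstrip("Z*")
    counts = Counter(seq)
    # most_common() orders residues from most to least frequent
    return {aa: 100.0 * n / len(seq) for aa, n in counts.most_common()}

if __name__ == "__main__":
    # Toy input, not taken from the report
    for aa, pct in residue_percentages("MKKLSKZ").items():
        print(f"{aa}: {pct:.1f}%")
```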
\************A. mellifera BB270021B10C10.F ACAATTTCTATTTAAGAAAAAAAATCTAAAAACA
TGAAATTTAGATTTTTAGGTGATGGCGATTGCCCTGATTGGTTNGCTAGCCGAAATCAACACATTGTCACGTATGAC
ATCCATTAAAATTAAGATATTAGGACAAACGGTTGCAAAATATCTTACGGAAGGAGAACTCGATGAAGAAAAAATAA
AAAAAATTACTCAAGATGCCAAGATTGAACTTAACGATGCAAAGGCTATGGTAGCAGCTCTTGAATTAATCTTTACA
TCGTCTGCTCGATATGGCGTTTCCGCCGCCGATTTAAGCAATGAATTGCAGCAACTAGGACTCCCTCGTGAGCACAG
CGCTGCAATTGCCAGATTGCATACGGATTATTGTCCTCAAATTACTGCTACGCTGTCTTCCCAATCCTTGAGAGTAA
GCAGATTATCGTCGATTGAAGTTTTGTCCTGTGATAATTCGTCACCTTTCTCCACGGTATCTCTTAAATTAAAGAAA
TTGGATGGAAATGTGGAAGATTCTATTATTAATATTTCAAAAAAAGATGTACACGTTCTATTGGCAGAATTACGAAG
AGCCAAGTCATTGATGGAAAACCTTTGAATAAAATAATGTTCTACGTATTAAACACAATTATTTTTATTAAAATAAT
AAAGTATATTAAAGTTATTATCTACAATAATAAAGTACTGGCTAAAAAAAAAAGTAAAATATAAAAAAAAAAA
long ORF encodes 180aa 14% leucine 12% serine protein; 48% match over
full-length to similar hypothetical protein FLJ20452 [Homo sapiens];
e-21; also similar C. elegans protein;
unannotated NEW GENE; no Drosophila ESTs, but tons of others! Strange
AE003534.2 taccgacatttggtttgccctttacagAAATTCCGCTTCTGTGGCGAAGGCG
ATTGCCCCGATTGGGTCCTAGCTGAGATCATATCAACACTCTCGAACTTGAGCATTGAAAACTTGGAACAACTTAGC
GATTTAGTGGCACAACGAATTTGTGGAGAGACATTTGAGgtttgtaattatttgtttgaaattcataaatatacaag
acttttactttcagGAAGCGAAAATAAAATCGCTGACATCCACATTAACTAATGAAGGAAAAACCGCCGTGGCATGC
ATCAATTTTATGCTGACCAGCGCAGCTCGCTATAGCTGTAGTGAAAGCATTTTTGGCGAGGAGATCCAGCAATTGGG
ACTTCCCAAGGACCATGCCGCAGCCATGTGCAGAGTCCTCCAAAAGCATTCCGCCACCATAAGGCAAACACTTATAA
ACAAATCTTTCAGAAgttagtggtctaaacacatattaagtcttatgtgctatcttattaaggcttatatttgcaga
ttaatttgcttcaattttatatttattttagTTAACGAACTGACAAGCGTCCGAGACATATCTACGCCAGGGCAAAC
GCCTCCAAACTACGCCACCTTGGAACTGAAGATCTCGCAAGAACTGGTCGATGGCCTACCGAAGGATACCACCCATG
TCCTCAACATTGATCGCACCCAAATGAAGGCTCTGCTGGCGGAGCTGAAATTGGCACGTGATGTTATGCAAAAATAT
GAAAATAAACCAGATTCCTAAAAATGTTATTAATA
translation
K F R F C G E G D C P D W V L A E I I S T L S N L S
I E N L E Q L S D L V A Q R I C G E T F E \--------------
\-----------0-------------------------- E A K I K S L T S T L T N
E G K T A V A C I N F M L T S A A R Y S C S E S I F
G E E I Q Q L G L P K D H A A A M C R V L Q K H S
A T I R Q T L I N K S F R \--------------------------------------
\--------1----------------------------------------------I N E L T S V R
D I S T P G Q T P P N Y A T L E L K I S Q E L V D
G L P K D T T H V L N I D R T Q M K A L L A E L K L
A R D V M Q K Y E N K P D S Z
Can't easily find the correct N-terminus for this, but anticipate that
it will be short. Need an EST!
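The spaced translation above can be collapsed into a contiguous protein for use in searches. The sketch below is one way to do that in Python; the function name collapse_translation is hypothetical. It assumes that the runs of dashes are introns and that the single digit embedded in a run records the intron phase, which is consistent with the exon boundaries in the genomic sequence above but is an interpretation of the layout rather than something stated in the file.

```python
import re

def collapse_translation(lines):
    """Collapse a spaced, intron-interrupted translation.

    Returns (protein, phases): the residues joined into one string, and the
    digits found inside dash runs (assumed here to be intron phases).
    """
    text = " ".join(lines)
    # Keep residue letters and the stop symbol (Z or *); drop dashes,
    # digits, backslashes and spaces.
    protein = "".join(re.findall(r"[A-Z*]", text))
    # A digit flanked by dashes is taken to be an intron phase marker.
    phases = re.findall(r"-(\d)-", text)
    return protein, phases

if __name__ == "__main__":
    sample = [
        r"I E N L E Q L S D L V A Q R I C G E T F E \--------------",
        r"\-----------0-------------------------- E A K I K S L T S T L T N",
    ]
    print(collapse_translation(sample))
```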
bee
MKFRFLGDGDCPDWLLAEINTLSRMTSIKIKILGQTVAKYLTEGELDEEKIKKITQDAKIELNDAKAMVAALELIFT
SSARYGVSAADLSNELQQLGLPREH------SAAIARLHTDYCPQITATLSSQSLRVSRLSSIEVLSCDNSSPFSTV
SLKLKKLDGNVEDSIINISKKDVHVLLAELRRAKSLMEN
fly
KFRFCGEGDCPDWVLAEIISTLSNLSIENLEQLSDLVAQRICGETFEE--AKIKSLTSTLTNEGKTAVACINFMLTS
AARYSCSESIFGEEIQQLGLPKDHAAAMCRVLQKHSATIRQTLINKSFRINELTSVRDISTPGQTPPNYATLELKIS
QELVDGLPKDTTHVLNIDRTQMKALLAELKLARDVMQKYENKPDSZ
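Percent identity for a bee/fly alignment like the one above can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name percent_identity is hypothetical, both rows must first be joined into single strings of equal length, and columns containing a gap ("-") are skipped, which is only one of several common conventions and may differ from what BLAST reports.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identical residues over the ungapped columns of two aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Only compare columns where neither row has a gap
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

if __name__ == "__main__":
    # First residues of the bee and fly rows above, with the bee's leading M dropped
    print(round(percent_identity("KFRFLGDGDCPDW", "KFRFCGEGDCPDW"), 1))
```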
\************A. mellifera BB270025A20A3.F AGTTCTAGATCTTGCCACTGAAACTGCTACTGCTG
TAAGAGAAACAAGTAGAAGTGCTCATCGTACGATACCAAAACGCGATAGACCTCCTCGTGTGGCAAGTGGTTCTGCT
GGTCTATTACCACCCTATAATCGCCAACAAGCAGAGGGCCAAGAATTTCTTTATATAATAAATGAACATAATTATTC
AGAATTATTTGTGGCATATGAGTGTTTACGTAGTGGAACGGAGAATCTAAGAATTCTTGTTTCTAATGAAAGAGTTC
GAGTGATTTCCGGAGGTACCAAAGGAGTTGTAACCGAAGTCAGTCTAGCGGACTTATTATATTGTCAACCAATGCAT
AAGCTAGAAAGTAATGGTGTTACTTTATACTATATTGAATTAATATCTAGATCAGATTCAACGATAACCGTTAACAT
GGACGGTCCAGAACTTCTAAGAAGACCTAAAGTTCGATGTGACAATGAAGAAGTAGCCAAAAGAGTATCGCAGCAAA
TTAATTACGCTAAAGGAATGCACGAGGAACGTAGCTTGACTCTTTCTTCTTCGGATAATATGTTAGATGATGTACAG
TACTATAAGTAGTTACAAACAATCATATATGAAAATTTATTTTGTATTTGACAAAAGTTTGGAATCACTATGTTTTT
ACAAAAAATTTTTATGGGAAATTAATGCATTAAAATATTTTCATTTCAATGTTAATTCC
long ORF encodes normal protein; several weak matches to Drosophila
proteins, none convincing; but also several human proteins, especially
KIAA0453 protein [Homo sapiens] at e-19;
seems to be a missed exon of CG11003! Which is also one of the weak
matches above for part of it. No ESTs to help, but I think it is part
of CG11003.
\**************A. mellifera BB270028A10H8.F GCCGTGGCCCGAATTTTATCAAGAACACAATGT
CGGAAGATCAAGTAAATCCACCATCTCCAATCGATGGTATTTTACCGTTTTTGCAAAGTATTGAATGGAGAGATCCA
TGGCTTGCATTATTATTAACTTTTCACATTGCTGTTACTTTGACTGCATTGATGACACGAAACCATGCCAATTTTCA
AATTATGTTATTTCTTGCACTATTACCTCTGGTATATTTTTCTGAAAGTATTAATGAAGTTGCTGCATCTAATTGGA
TGTTGTTCTCAAGACAGCAATATTTTGATTCCAATGGTCTCTTTATATCTGTAGTATTCTCTGTGCCTATCTTGATG
AATTGTATGATCATGATTGCCAGTTGGCTTTATCAGTCTAGTCAATTAATGACCAGTTTGAAAAGAGCGCAATTAAG
ACAACAAGCAAGAAATCAGGAAATAGGAAATGAATCAATAAATACAAATGGCACTGCTGCAAGAGAAAAACAGGAGT
AATATTTCTAGTCCAAGAACAATGAGAAATGGAAAATACTCTATAAGTAGAGTCGTATATAACGGCATTGTAAAATT
CGACGAATATTTTCAACATAGTATTTTTTTAAAAGATTACTGCCGACACTTGTTATCACTGTACTTCAAGTTGATTA
ATTTCACTGTCAGTTAACTATATTTCCAACTTTATGCCGTATATACATATATATCGACTATTAGAA
end of ORF; weak long match to OstStt3 gene product; 23% and e=1.3; but
48% and e-23 to end of 171aa hypothetical protein DKFZp434C1714.1 \-
human (fragment);
genomic match is unannotated short region, NEW GENE. Tons of ESTs
from cow, plants and others, but not Drosophila!
This is the entire available region, assuming flanking annotations are
correct.
AE003822.2 gaaactcccgccacaagcgctttagaacagagtcctagacgagtgtggtgca
cggtagggtcggtggcccgtgccacatctaaggcgccccttttttcggattaccctgctcggctttagcttcgattg
cttgctcacacgtcgcccgttcgacttaataacccgaatagatttgattcgccctaaaaactacaattttgactgtt
ttaaaacgaattctttgtgatatttttcggatttgttaatgttgtctactgagtcagtgaaagcgttatcgacacgt
tcagactgaatgacgggcagggcgactctgcacgacaagtcggggtgggtgagaatgaggtggcacaaaatttgtaa
ttgcatttatatggtgagtaatacatactaaacgaaataatagtattttgatttatgttgttatatttagcccataa
aatagtaagttaggtcttacaaacagcgctaccagatccagtcaaaattgaggaagcctgaacgtctataggcctcc
caaaatggcgttgccATGAAAAATGACAGCTGTTCGCTGCGAATGGCTATTTATGTTTGTTTTGACTCGGCTTTCGA
ATATATCGCAAAATATATACAGGAAACATTTATATTCACAAAAATCTGTACGATGCACCCAGGGCAAATTGAGGTCA
ACGAGATCAATGGCTATTGGACATTTCTGCTGAGCgtaagatactcgcctatatacaatcaaaaatcaagaatccgg
caagttgtcactatttttgcagATCGATTGGAAGGATCCCTGGCTTATTGGCCTTATTTTGGCGCATATCTTAACCA
CCACCACTGCGCTGCTCAGCCGGAACAGCTCCAACTTCCAGGTTTTCCTCTTCCTAGTACTGTgtacgtggacttgg
cggcttccttgacttacccgataaatgactctgatttttgcatgtgcttcatcttcctcagTGCTGGCAGTCTACTT
CACCGAAAGCATCAATGAGTTCGCTGCTAACAACTGGAGTTCCTTTTCCAGACAACAATACTTCGATAGCAACGGCC
TGTTTATCTCGACAGTTTTCTCAATACCTATTTTGCTTAATTGCATGCTTTTGATTgtaagttatagtgtttccact
gcatgaagtgtgtatttatctttgcttatttgcagGGCACTTGGCTCTACAACTCCACGCAGCTGATGGTGACTCTA
AAAACAGCGCAGCTCAAGGAGCGAGCTCGCAAGGAACGCCAGACTAAGGCGGATTCGGAATCCATAGCACATAAAAA
GGCAGAGTAGaacttacgcctgtattacatgcagttaaaagcacaagtagagctgtgaaattatatgttatgcttta
aatggattttcctgtcatctagatgtagtttgctgcacagctctcgtctttaaaataaatttaatttagtataatca
aacttatagaatttgtaaatttaggctatttttacatcctgttttacttagcgaagttacaaacctaacatgccctt
catattaagcaaaaaatcacaccagttaccgttgccaccttggtaaagcagtttttactgccacctaaaattttcta
tatatatcacgtaatatgaactattttgatatttttgacgaaattaacattatagatccaatcagcttattgcctgt
atcaatttctgatctgtgtgccaagactgtaatttcaaattagaagctcgttggacctgtgtcattttttagtacga
attcaattgggagcccttcgtcgtctggtaacactgtccaacgattttgttgttgctggcttgtgggtgtcgaagca
gtgtcgcggcgcaatgttggaagtggtttttgggtaa
translation
M K N D S C S L
R M A I Y V C F D S A F E Y I A K Y I Q E T F I F T
K I C T M H P G Q I E V N E I N G Y W T F L L S \-----
\-------------------------0--------------------------------- I D W K D P
W L I G L I L A H I L T T T T A L L S R N S S N F Q
V F L F L V L \-----------------------------------1------------------
\---------------------L L A V Y F T E S I N E F A A N N W S
S F S R Q Q Y F D S N G L F I S T V F S I P I L L N
C M L L I \----------------------------------0--------------------- G T
W L Y N S T Q L M V T L K T A Q L K E R A R K E R
Q T K A D S E S I A H K K A E \*
This is my best guess, because two intron boundaries are unpredicted.
fly
MKNDSCSLRMAIYVCFDSAFEYIAKYIQETFIFTKICTMHPGQIEVNEINGYWTFLLSIDWKDPWLIGLILAHI
LTTTTALLSRNSSNFQVFLFLVLLLAVYFTESINEFAANNWSSFSRQQYFDSNGLFISTVFSIPILLNCMLLIGTWL
YNSTQLMVTLKTAQLKERARKERQTKADSESIAHKKAE
bee
RGPNFIKNTMSEDQVNPPSPIDGILPFLQSIEWRDPWLALLLTFHIA
VTLTALMTRNHANFQIMLFLALLPLVYFSESINEVAASNWMLFSRQQYFDSNGLFISVVFSVPILMNCMIMIASWLY
QSSQLMTSLKRAQLRQQARNQEIGNESINTNGTAAREKQE
Looks good from the alignment though.
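A proposed gene model of this kind can be checked by splicing out the exons and translating them. The Python sketch below assumes the case convention used in the genomic sequence above (upper case for coding exons, lower case for introns and flanking sequence) and the standard genetic code; the names splice_exons and translate are hypothetical, and the stop codon is written as * rather than Z.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def splice_exons(genomic):
    """Keep only upper-case bases (assumed to be the coding exons)."""
    return "".join(ch for ch in genomic if ch.isupper())

def translate(cds):
    """Translate in frame 1; unknown codons become X, a trailing partial codon is dropped."""
    codons = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

if __name__ == "__main__":
    # Start of the AE003822.2 gene model quoted above
    fragment = "caaaatggcgttgccATGAAAAATGACAGCTGTTCGCTGCGAATGG"
    print(translate(splice_exons(fragment)))
```

Translating the spliced exons of a whole entry and comparing the result with the listed protein is a quick check that the intron boundaries keep the reading frame.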
\***************A. mellifera BB270030B20G7.F ACTATTCTCACCTCCGGCCGATTTCACGCCGC
GTAATTCTCATTTCTTTCGACAATCGAATATCCGTCGATCACAGTGATTATTATTTACGACTTGCTGGAATAACAAT
CACGCGATTAATTTGTTAAGTTTCAGTATGGAGTGTCCTGAAGCGATGGAACGAGGCAGAAACTTTCGTTTGCTTGC
CAAGGAAGAACTACCTAAACTCTTGGACTTCCTTGATGGCTATTTGCCCGAATCCTTAAAGTTCCATCAAACTTTGT
TGACCTATATGAATGACAGGGTATGGGATTTTATTTTCTATGTGGCTAATGACTGGCCGGATGATGAGATCTGTTTA
CATTTTCCAGGCATGACGTTAGCCACATAGAAAAAAAAAGCAAC
possible internal ORF; weak match to CG5750; no better BLASTX matches;
but convincing genomic match for same region at 70%, to unannotated
region, but could be real N-terminus of CG15628? No ESTs to help
\*************A. mellifera BB270032B20A6.F GGAAGGGTGCGTGTCAAAGTAGTAGACACACAAC
TGCTAATCTCGTGGTTACATTTTATTTTCACGAATATCTTAGGAAATGTACTTTTTCGGCACATTGCTATGTAGCAC
GTAAATGAAGCGACGGCGTATAGCGCGGTCGCATATCAAAAGAATACTACCTATAGGAACGATGAGAATGGCGCGCA
AATGTTGCGTGCGTAGCTGTGAGGCTGATGTGCAAGATGCGCGTGCTAAAGGGTTACCGCTTCATAAATTTCCGAAA
GATATTACTTTAAGAAACAAATGGTTGACTAGTGGTGGATTTGACGCGAATTTTAAACCTTCACCAGGTCAAGTTGT
TTGTCACAGACATTTTAAACGAGCTGATTACGAAGCTGCTAAAGGACATAAATTACTTCTACGTAAAGGTAGTGTTC
CGTCGGTTTTTGCAGATTATGACAATCATCCGGATCCTGTAATAATGTCTGTAAAATCATCAACTTCTTATGCACAA
GAAGATTTAGATCTTATTAATTCTGAAATTTTGAATTTAGAACAATCCATATCTCCATTGAATTCTGGTGCCAGAAC
ACCAAAATCCGATAGCTGTGGAGAAACATGTTCTTCTCGACCAGAATCATCAGCTGATTCTTTTAATTTATTAGATT
CAACAGAATTAATTGATANATGGATGTAAAACTTTGAATATGAAAGAAGAGAATATATCTCCTATGA
long ORF encodes 200aa 12% serine protein; repeated weak matches to
huge protein CG10631; e-05; also weak match to dJ126A5.2.2 (novel
protein) (isoform 2) [Homo sapiens]; short protein at e-04;
genomic match seems to be to region of N-terminus of CG10042 gene
product [alt 1]; several ESTs for this match too, so could be new gene,
but hard to annotate.
\-----------------------------------------------------------------------------
\--
'New Genes FASTA' file
>Found with R. suavis J3-A2
ATATAGTTCCATTCTGTTTTATTGGATTGAGTAAAGTTAGACAAAATGCAAGGTCTTGGTCTGCAAAGTCTTAAAAA
AAA
TCCAGCTTTAATTCCACTTTATGTGTGCGTTGGAGCGGGACTATTGGAGCCGTCTACTATATGGCTCGACTTGCTAC
TCG
TAATCCCGATGTCACTTGGAATCGCACATCAAATCCCGAACCATGGCAAGAGTACAAAGAAAAGCAATACAAGTTTT
ATT
CGCCTGTGAGGGATTATTCCAAAACTAAGAGTGCTGCCCCAAACTTTGATGAATAAATTACGTTTCCCTAGCAGCTG
CAA
TTTAAAAATGTAAAATGAAATAACTTCAAATTATAAATAAACATAGTGGATTTGAAAGCGTA
>Extra1
ATGCTTAATCTCAACCTTCTAGATTGTATAGTTCCTGAGATCTCGACATTCATACAGACGGACGGACAGCGTCAGAT
CGA
CTCGGATATTGATCCTGATCGAGAATATATAGGCTTCATATGGTGA
>Found with R. suavis J3-D1
ATGTATATTTATTTTCCAACAATCTTTCTCTTATTTTTGTATCCAGTAGTAGCAGTTGTCCCTCAAGGATTTACAAT
TAA
ACAACCAAAATGCTGGTATGTGGCAAACCCTGGACCCTGTGATGATTTTGTAAAAGTCTGGGGCTACGATTATTTGA
CTA
ATCGTTGCATTTTCTTTTATTATGGAGGCTGTGGTGGAAATCCAAATCGATTTTATACGAAAGAGGAGTGCTTGAAA
ACA
TGCCGTGTGTACAGACCTCCAAATCACGTCTGTTTGCTGCCAATCTGGGCGACGGCCATTAAGTCAAACCGTTTGAA
GCA
ATTTGAAAGCTACCCAGACTATGCAACATATATATTTTTACAGACTCCCTGGGTTATTTATCAACAATTTTATGTGG
ATA
GCGTTGCGATTCTGACAATTTTTGACATGCAATTTGCCATTTTCCATCTGCTTCAGCCGTATTTTGGGTGTGGAATT
TGG
CATTTTTCCGCAGGCTGCAATAAGTTTTGGCAGCGAATGCAGATGAGGCTCATGATGAGATGA
>Found with R. suavis J3-A7
ATGATTGAAATATCAGATTTGCAGAAAATTGGCATCGGCTTGGCTGGTTTTGGCATTTTCTTTTTGTTTCTCGGCAT
GCT
GCTGCTGTTCGATAAAGGACTGCTCGCCATTGGCAATATTCTATTCATATCGGGCCTGGCCTGCGTCATTGGCGTGG
AGC
GCACGATGCGCTTTTTCTTCCAACGGCACAAAGTCAAAGGCACAACGGCCTTCTTAGGGGGAATCGTCATCGTCCTG
CTG
GGATTCCCCATCTTCGGCATGATTATTGAATCCTATGGATTTTTCGCACTCTTCAGCGGCTTCTTCCCCGTGGCCAT
TAA
TTTCCTAGGCCGAGTGCCTGTTTTAGGATCGCTGTTTAATTTACCATTTATACAAAAGATTGTTCAAAAACTTGGTG
GAG
ACGGCAACCGAACTACAGTAtaa
>Found with R. suavis J3-B3
ATGGATGCACGAAAGTTTTCTACCCACATATTGGATACTTCGGTGGGAAAGGCGGCAGCCAATGTGAGAGTAACAGT
TTC
CAGGCTGGACGAGATTCAGGAATGGAGATCCCTTCGGGCGGCCCAAACTGATGCGGATGGTCGCTGCCTGCTCTTGG
AAC
CTGGTCAATTTCCCGGCGGGATCTATAAGCTGACCTTTCACGTGGGCGCCTATTACGCGGAGCGCAATGTGAGGACA
CTT
TATCCAGCAATTGACTTGATTGTGGATTGCAGTGAGAATCAGAACTATCACATTCCTTTGTTACTCAATCCCTTTGG
GTA
TTCCACATATCGTGGAACATAG
>Found with A. mellifera Contig1312
ATGGACATCTCAAAGGCACCAAATCCGCGAAAACTGGAGCTGTGTCGCAAATACTTCTTTGCTGGCTTTGCATTTCT
GCC
CTTTGTGTGGGCCATTAACGTTTGCTGGTTTTTCACGGAGGCCTTCCATAAGCCACCATTTTCGGAGCAGAGCCAAA
TAA
AGAGATATGTTATATACTCTGCAGTGGGGACTCTATTCTGGCTGATAGTACTAACTGCCTGGATAATAATATTCCAG
ACA
AATCGCACAGCCTGGGGCGCCACAGCGGACTATATGAGCTTCATCATACCCCTAGGCAGTGCATAG
>Found with A. mellifera Contig1481
TACCAATTACTTGTAAGCACAAAAAACAGCTGACGGCAACAAGTGGTTCGGTCCCCATCGGAATACACGTGCTCAAA
ACG
TGTGGGTTTTATTTGCCTTAATTGACTTAAATTCACTCGCAATAAGTGGAAATGATTCGAAAGGTGCCGCTAATTGT
AGT
CCTGGGCTCCACGGGCACCGGAAAGACGAAACTGTCTTTGCAACTGGCCGAACGCTTCGGAGGAGAAATAATCAGCG
CTG
ACTCCATGCAGGTTTACACCCACCTGGACATCGCCACCGCCAAGGCAACCAAGGAGGAGCAGTCCCGGGCACGACAT
CAT
CTACTGGACGTGGCCACACCGGCCGAACCCTTCACAGTCACTCACTTTCGTAACGCAGCACTGCCCATTGTGGAGCG
CCT
GCTCGCCAAGGACACTTCTCCGATTGTGGTGGGCGGCACGAATTACTACATAGAATCCCTACTTTGGGATATTCTGG
TTG
ACTCGGATGTCAAGCCGGACGAAGGCAAACATTCGGGGGAGCATCTTAAGGATGCCGAACTGAATGCTTTGTCCACC
CTC
GAGCTGCATCAGCACCTTGCCAAGATCGACGCAGGTAGTGCCAACCGTATTCACCCCAACAACCGGCGCAAGATCAT
CCG
GGCTATCGAAGTGTATCAGAGCACCGGGCAGACTTTGAGCCAGATGCTGGCGGAACAGCGGGCACAGCCGGGAGGAA
ACC
GCCTGGGTGGACCCCTTCGCTATCCACACATCGTTCTCCTTTGGTTGCGTTGCCAGCAGGATGTTCTAAACGAGCGA
TTG
GATTCCCGCGTAGATGGCATGCTGGCCCAAGGGCTGCTCCCTGAACTACGACAGTTTCACAATGCCCACCATGCTAC
CAC
TGTGCAAGCCTATACGTCGGGAGTTCTGCAGACGATTGGCTACAAGGAGTTTATTCCCTATCTGATCAAGTACGACC
AGC
AGCAGGACGAAAAGATAGAGGAGTACCTCAAAACCCATAGTTACAAGCTGCCAGGCCCAGAAAAACTGAAAGAAGAA
GGT
CTTCCAGATGGCTTGGAACTCCTACGCAATTGTTGCGAAGAACTAAAGTTAGTCACTCGCCGATACTCAAAGAAGCA
GCT
GAAGTGGATCAACAATCGATTCCTGGCCAGCAAAGATCGTCAAGTGCCGGATCTCTACGAACTGGACACCAGTGATG
TGT
CAGCTTGGCAGGTGGCAGTCTACAAGCGGGCAGAGACCATCATAGAAAGCTATCGAAACGAAGAGGCTTGCGAGATA
CTA
CCAATGGCCAAGCGGGAGCATCCTGGAGCGGATTTGGATGAGGAGACTAGCCATTTTTGTCAAATATGCGAACGGCA
TTT
CGTTGGGGAGTACCAATGGGGACTGCATATGAAGTCCAACAAACACAAGCGAAGAAAGGAGGGACAGCGCAAGCGGC
AAA
GGGATCACGAAACAATGCTCTCAACGGATCTAGCGAAGAAGCAAAAGGAGGAGAAAGAGGAGGCAGGAAAGGCGGAG
ACT
CAGCCACCACCCAGCCGAGTCAATGATACTGATAAGGCAATGtaa
>Found with A. mellifera Contig2709
TTGAACACAGATGTCACTTCTACAGGGGAAAAAAGTTTAAAAACAAGTAAATCACAGAAAACGTCGTTTCCTTTTGC
TAA
TAGAGCGCCTGAATTCGGTGGAAATAGCAAAAATAATATATCACCATTCTTGGGACTGCAAACAAATTCGAAAATGA
GTG
ACAATTTTTCAAGAACACCATATTCTGATGGGCACGCTGCAACCCATGAGGAAGCATCAAAACCCCACTACACTACC
ACT
ACGAGTTCTTTTAGTAGAACTCCGGTCTCGCCGTACCTCAACTATGATTCGCGATATCTGCAGCAAGCACAGCCAGA
GTT
CATTTTTCCCGAAGGGGCCAACAAGCAGCGTGGACGCTTCGAGTTGGCCTTCTCTCAGATAGGCACTTCGGTAATGA
TTG
GCGGTGGAATTGGCGGCCTAGCAGGTGTTTATAATGGTTTAAAAGTCACAAAAGCACTCGAGCAGAAGGGAAAAGTT
CGT
CGAACACAGTTACTTAATCACATTATGAAGCAAGGTTCCGGCACAGCTAACACATTAGGTACATTGACGGTGCTGTA
TTC
GGCTTGTGGAGTTTTGCTGCAGTTTTTCCGCGGAGAAGATGATCATATAAACACAGTAATTGCGGGCTCTGCCACAG
GAC
TATTATACAAGTCAACAGCTGGCCTTAGGACGTGTGCTTTTGGTGGAGCTATTGGGCTGGGCATCTCGTCCCTCTAT
TGC
TTATACCTAATAGCACAGGAAAACAGTTCGAACTCAAGTCCCAAATACCTATAG
>Found with A. mellifera BB260003B20H2.F
ATGGTGGATTTTTTTGAAAAGCTTAGACGCGGTCACACATTTATTTACATCGAACATATGATGGGCACGCCGGAATT
AAA
AATCATATTAGAATTCAGTGCAGGGGCGGAGTTACTATTTGGTAACATAAAACGCCGTGAATTGAACTTGGACGGTA
AAC
AAAAATGGACTATTGCTAATCTGCTTAAGTGGATGCATGCGAATATTTTAACGGAGCGTCCGGAACTTTTTCTTCAA
GGA
GATACTGTGCGACCTGGAATTTTAGTACTCATAAATGATACAGACTGGGAATTGCTGGGTGAACTGGACTACGAGCT
GCA
GCCCAACGACAATGTGTTGTTTATATCAACTTTACACGGTGGTTAA
>Found with A. mellifera BB260004B10A11.F
ATGGAGAAATCTGAAATACGACTGCAACGCATGTCTAATGAATATCAGTCGCAATCGAGCTATATGTACCTCCGGAC
CAA
GATGCTGTTAAAAATCGAGAATACCCTACTTCGAAGCCATCGTCAGCGCGAGACCACCGGTATCAAGAAACTATACA
ATT
CGTTTTTCGTATTGTTTTAA
>Found with A. mellifera BB260010A20C3.F
TTCGCATATTTCAGTTATTTATTTAGAAATGGGGCGATTTAAGTTATGTGCTTCGCCGAGAGAGGTTATGAAGTACG
AAG
ACTTTATAAAACGCATTCGAAAAAGCCTCTACTATGGCGTTGGAACACCAGACACAGAAATGTCGGTCTCCTTACCC
TTT
GCGGAGTACGCGGCAGATTTGTTTTCGGAGACTCATCGCGGGCATTCTTTGCATCGCCTAAGTTGCGTATCTGCTGC
ACA
AGTACATGCCACGCCTTGCTCTTTAATTATGGCATTGATATACCTCGATCGCTTAAACGTCATCGACTCGGGCTATA
GCT
GCAGAATCACACCACAGCAGCTGTTTGTTGTGTCACTAATGATTTCCACAAAATTCTACGCGGGCCACGACGAACGG
TTC
TATCTGGAAGACTGGGCCAGTGACGCTTGTATGACGGAAGATAGGCTCAAGGCAGTCGAGCTCGAATTTCTTTCCGC
TAT
GGGTTGGAATATATACATATCCAATGAGCTATTCTTTGATAAGTTAAGAAACGTTGAACGTTCTTTGGCTGAACAGC
AGG
GACTGCGTAGAGGTTGGCTCACTTACAGTGAGCTCGTGCAGTTGCTGCCTAGCCTTGAATGGACGAAATTCCTCGTT
AAC
AGCCTGTCTGTACTATCTCTAAGCTATGCGGCAAGTATTATAACATTAGCCGGAGCTTTTTTTATTGCGAGCCAAGT
TCC
CGGTACGTTATGGCATCGGGATGTGGAAACTGCCTCAGATTTCACCATGACAATTAGCAGTCAGGTATCCGTTTCAA
ATG
CATTAGAGTCCACACCTTTTATTAATGTCCAAGTATCCTCACTTTTACGTAAAACGAGTAACGTGAATGTTGAATTG
ATG
AATCTTGAGAAGACAAGCTGCGCCAGGGCAAGACTGAATAAAATTGAATATAAGCATCCGCGCCATCAATCAGTACC
TAC
GCTTTCATTCATAAGCACCTGTCCACAACTTGATTTATTGTATGCCCAAGATGGAACAAGGAATTGGCTAAATATTA
AAT
CGCCCAACAGCGACTACAAAAACAACAGAAACCTTTCAATAACAGTTAGATCCGTACAACTAGAAGAGCAAAAGGCT
GAA
AATGATTCCGTTATTTGGCAAGCCAACACCGAAGCAATGCAGTAA
>Found with A. mellifera BB260019B20F2.F
ATGAAGGAAGAAGGCGGCACATTGCTGGGCGATAAAGGTGTACGAAGGCATCAGTCCATGCAGCGTCTGTCAGCGGA
GCA
GAATGGTGGTTCAACGACTGAACAAACACATGAACACAATCCAAACGTCGTACCTGATCATAGAGGCAACTTACACA
TTA
CAGTTAAGAAAACCAAACCAATTTTAGGTATTGCTATCGAAGGTGGTGCTAATACAAAACACCCGCTCCCTAGGATA
ATC
AATATCCATGAAAATGGTGCAGCATTTGAAGCGGGCGGCTTAGAAGTCGGCCAACTCATCCTGGAGGTAGATGGAAC
GAA
AGTGGAGGGTCTGCATCATCAGGAGGTTGCTCGACTAATAGCCGAATGCTTTGCTAATCGTGAAAAGGCTGAAATAA
CCT
TCTTAGTTGTCGAAGCAAAAAAATCAAATTTGGAACCGAAGCCGACGGCGCTGATATTTTTAGAAGCCTAA
>Found with A. mellifera BB260023A20H5.F
CTCTGTTTGAGGGCGTAGTTCCAACAAGTGCTGAGCATCACAATTTTCTATTACTAAGCCCAGCTTTGCGTTGGCGC
GCC
CCAGAATCTCATTTTATATTTAGTTTCTGCCAGTTTAGTTAATTAGTTAGTTGATAGTGTTGTTTGTTTCTTCTGCA
ACA
ATTGTGTGCGATAGGAGTCGGGCAAAATGTTCCCGTCGTCGATTTTGGGGCGCAGCTATTTGCTTTTTATGCTGGTG
CTC
GCCGTGGGCGTGTTCGCCCAACACGAGTGGCAGGCCCGGGATGCCTTTGATGAGATAAAGAGGCAGTTCGACAAGGT
GAA
CGCGGATAACTGCCCCATCCAACACCATTCGGACCTTTTCATGCCCATGGACGCGGTGTCCCACAAGCCGGACATCA
AGG
AGATCAACGTGAATCCGGTGTTCCCCAACCGAACTGCCCTGCTGCATCTGCAGAATATGGCCCTTAGCAGAAGCTTC
TTC
TGGAGCTACATCCTCCAGTCGAGGTTTATTCGACCCGCCATCAACGACACCTACGATCCCGGCATGATGTACTACTT
TCT
GTCCACCGTAGCCGATGTATCCGCCAACCCACATATCAACGCCTCGGCCGTGTACTTCTCCCCCAACAGCTCGTATT
CGT
CGTCGTATCGCGGCTTCTTCAATAAGACGTTCCCCAGATTCGGGCCAAGAACCTTCAGGCTGGACGACTTCAACGAT
CCC
ATTCATCTGCAGAAGATATCGACGTGGAATACTTTCGATGTTCAGGATCTGGGCGCCCATCACCCGGACTCCATATC
CAA
GGACTACACCCACGACCTGTATAAAATAAACGAGTGGTACCGCGCCTGGCTACCAGACAACGTCGAGGGACGGCACG
ATA
CGAAGATCACCTACCAGGTGGAAATCCGCTATGCGAACAACACAAACGAGACGTATACCTTCCACGGACCGCCTGGC
TCT
GAAGAAAACCCTGGTCCGATTAAATTTACAAGGCCGTACTTCGATTGTGGCAGGTCCAACAAGTGGCTGGTGGCCGC
AGT
AGTGCCAATTGCGGATATCTACCCCCGACACACGCAGTTCCGTCACATTGAGTATCCCAAATACACGGCCGTTTCGG
TTC
TTGAGATGGACTTCGAGCGTATCGACATAAACCAGTGTCCATTGGGTGAAGGCAACAAAGGACCTAATCACTTTGCG
GAT
ACGGCGCGGTGTAAAAAAGAAACGACAGAGTGTGAACCATTACAAGGCTGGGGCTTTAGGCGCGGTGGCTACCAGTG
CCG
TTGTAAGCCAGGTTTTCGGCTGCCCAACGTAGTGCGGCGACCTTATCTGGGCGAGATTGTGGAGCGCGCATCGGCAG
AAC
AGTACTACAACGAGTACGACTGCCTTAAGATTGGCTGGATCCAAAAGCTTCCCATTCAGTGGGATAAGGCCTCCTAC
CAC
ATTCGCCAAAAGTATCTGGACCGGCATCCGGAATATCGCAACTACACCACCGGCTCGCGATCACTTCATGCTGAGCA
CTT
AAATATTGATCAGGCGTTGAAGTATATTCATGGAGTCAACTATCGCACTTGCAAAAACTTCCATCCGCAGGATCTGA
TTC
TTCGCGGTGATGTGAGCTTCGGCGCCAAGGAGCAGTTCGAGAACGAAGCCAAGATGGCCGTGAGACTGGCCAACTTT
ATT
AGCGCCTTTCTGCAGAGTATGCAAACTATAACACGAATATCCTCCTTACAGGTATCGGATCCCAACGAAGTGTACTC
GGG
CAAGCGTGTGGCCGACAAGCCGCTGACCGAGGATCAAATGATCGGCGAGACCCTTGCCATTGTCCTGGGCGACAGCA
AGG
TTTGGTCGGCCACAATGCTCTGGGAGCGCAACAAGTTTACCAATCGCACATATTTCGCACCCTATGCCTACAAAACT
GAG
CTCAACACAAGAAAGTTCAAGGTGGAGGACCTGGCGCGGCTCAACAAGACGCACGAACTCTACACGGAAAAGAAGTA
CTT
CAAGTTCCTGAAGCAGCGCTGGAACACCAACTTCGACGACCTGGAGACCTTCTACATGAAGATCAAGATCCGCCACA
ATG
AAACAGGTGAATACCAGCAGAAGTACGAGCACTACCCAAATTCGTACAGAGCGGCCAACATCAAGCACGGCTACTGG
ACT
CAACCACAATTCGACTGCGATGGATATGTGAAGAAGTGGCTGGTGACCTATGCGGTGCCCTTCTTCGGCTGGGACAG
CCT
GAAAGTCAAGCTGGAATTCAAGGGTGTGGTAGCTGTCTCCATGGACATGCTGCAGCTGGACATCAACCAGTGCCCGG
ACT
GGTACTACGAACCGAACGCCTTTAAGAACACACACAAGTGTGACGAGCAATCGTCCTACTGCGTTCCCATTATGGGT
CGT
GGCTATGAAACCGGAGGCTACAAGTGCGAGTGCCTGCAGGGATACGAGTATCCTTTCGAGGATCTGATTACCTACTA
CGA
TGGACAGCTCGTCGAGGCCGAGTACCAAAATATTGTGGCTGATGTCGAGACCCGCTACGATATGTTCAAGTGCCGAC
TGG
CCGGAGCTTCGGGTCTGCAATCCGCTTTGGGACTTGTGGTCGCTCTGATCGGGCTCACGCTCACCCTGCTGTATAGA
TTT
AGTTAA
>extra2
ATGGTCAAGCAAGTGGATTTTGCGGAGGTGAAGCTCAGTGAGAAATTTCTCGGAGCTGGATCTGGTGGAGCGGTGCG
CAA
AGCCACCTTTCAAAATCAGGAGATTGCAGTAAAGATATTTGATTTCCTTGAGGAAACAATCAAAAAGAATGCAGAGA
GGG
AAATCACACATTTGTCGGAGATCGACCACGAAAACGTTATCAGGGTGATCGGGAGGGCCAGCAATGGAAAGAAGGAC
TAC
TTGTTGATGGAGTACCTGGAGGAGGGGTCCCTCCACAACTACCTCTATGGCGATGACAAGTGGGAGTACACCGTGGA
GCA
AGCGGTTCGCTGGGCACTCCAATGCGCCAAGGCCTTAGCATACTTGCATTCGTTGGATCGACCGATTGTTCACCGCG
ATA
TTAAGCCGCAAAACATGCTTTTATATAATCAGCATGAAGACTTAAAGATTTGTGACTTTGGCCTGGCGACGGATATG
TCC
AATAATAAGACCGATATGCAAGGAACATTGAGGTATATGGCTCCCGAGGCCATTAAGCACTTAAAGTATACGGCTAA
GTG
TGATGTGTACAGCTTTGGAATAATGCTCTGGGAGCTGATGACACGTCAATTGCCATATAGTCACTTGGAAAACCCCA
ACA
GCCAGTACGCCATTATGAAAGCTATCAGTTCAGGCGAAAAACTTCCAATGGAAGCAGTAAGATCCGATTGCCCAGAG
GGT
ATCAAGCAATTAATGGAATGTTGCATGGATATAAATCCCGAAAAGCGCCCCTCTATGAAGGAGATCGAAAAGTTCCT
TGG
CGAACAGTATGAATCCGGCACTGACGAGGACTTTATCAAGCCTTTGGATGAGGATACCGTGGCTGTGGTGACCTACC
ATG
TGGATTCGTCCGGCAGCAGGATAATGCGTGTTGATTTCTGGCGACATCAGTTGCCATCGATCCGCATGACTTTTCCG
ATA
GTGAAACGGGAAGCCGAAAGATTGGGAAAGACCGTTGTCAGAGAAATGGCCAAGGCGGCGGCGGATGGAGATCGGGA
AGT
TCGGCGGGCTGAGAAGGACACGGAGCGTGAAACCTCGAGGGCTGCCCACAATGGAGAGCGGGAAACGCGGAGAGCGG
GTC
AGGATGTGGGTCGTGAAACTGTACGGGCGGTCAAGAAAATAGGAAAGAAACTGCGCTTCTAA
>Found with A. mellifera BB270004B10G5.F
CTGGAACAGACCGCATTCAGTGGCTCTTATCGGTAGAAACAGCAGCACTTTTCCGAGATGTCTATCAAATCCTTGAC
ATA
CGTTGCGATCTTTGGCCTTTTTTGGGGCTCAATTGCGGGAACTGTAGTTGATCAGTTTGGGATATATGGTGGTTCAC
CGA
TTACCACCACGGAAAGGAGTAATGCGGAGTTGCGCTGCATGAACATCAATCCGCAGAACTCGGTGGACTTGGAGCAG
ATG
ATGGGACTCTGGTACGGCAGCGAGATTATCGTGCACAGCCAAGATTTTCCGGGCACCTACGAGTACGACTCATGTGT
CAT
CATTCATCTGACCGATGCCACGGATCAGATCCGTTTGAGCCAAGCAAATCGCGGCTATGGCTATGGAAATCAGGACT
ACA
ACCGTAACCAGAATAACTATGGACGCACCACCACCACTCAATCCTCCTATCCGGATAGCGATGAGTACCCGTTGAGA
TCG
ATTCAAAGCCAGCAGAAGTACCTACGTTTGATTTGGAGTGAGCGTGATAACAATCTGGAGTATACTTTCAACTATAC
CAC
CAGTGCACCTGGTCAGTGGTCCAACATCGGCGATCAGCGGGGATCCTTGGTCACCCTGAACACGTACACCCAGTTCA
CGG
GCACTGTCCAGGTGGTGAAAGCGGTCAACGATCACCTGGTGCTGACCTTCTGCGGCAACGATGTTAAGAGCTCCATA
TAC
ACAGTGGTTCTCACCCGCAATCGCCTTGGTCTCAGTTTAGATGAGCTGCGTAGCATCAGGAATCTGCTTTCCCGCCG
TGG
ACTCTACACGGAGACCATTCGCAAGGTTTGCAATGGATGTGGGCGATTGGGTGGCAGCCTCTTCGCTCTTTTAGCCC
TTT
TGCTGGTCGTACGTTTGGCCTGGGGGCGTGGCCAGTGA
>Found with A. mellifera BB270012B20H7.F
GACACAATGAACTCACAAAAAGAATACGTATCGGACTGCGAAACCGACGATGATTATTATGTCGATTTGTTAACTTC
AGG
CAAGGGCAGTGATAAGAGTGAAAGTGATGTGTCGGACAAGTCTGAAAATTATCCAGGCCTAAAATCAAAGCATACTG
CGA
AGGCATTGCGGAAAACAAGGCATTGTGACGGCGATAATAGGGAATACAGGTCTAAGGAGTGCGACGACCTTCATTCC
GAA
GAGGAGTCTGAAAAATCGCGGTCGGATGCTTTATGGGCCGATTTTCTTGGCGACATTGATACTAAAAGCGTAATCAA
CCA
AAAAACAGATTATACGGAGGGAAACGCAGCAAGTGCTACCAATACCAATACGCATGAGACTTGTAATAAATATGATA
AAA
ACGATACGGCAATAATAAAAACTGCACAGCAATACGATTCCAAAAGAACCACGCTTTCAGTTTCCACACTCGGAAAA
ATT
AAACGATCATCCGCTGAAAAGAGTATCGGTACCATGATAAATAAATTTGAAAAGAAGAAAAAATTGACAGTGCTTGA
AAG
GTCACAATTGGATTGGAAAATATTTAAACAAGACGAAGGCATAGACGAACTTCTGTGCTCGCATAACAAAGGCAAGG
ACG
GGTATTTGGACCGTCAAGACTTTTTGGAGAGAACCGATCTTAGGCAGTTTGAAATGGAAAAGAAGTTGCGGCTGTCT
CGC
AGGCCATACTAA
>Found with A. mellifera BB270013A20H11.F
CTTCATTTAGGCTGGTTAGGTGGTTAATTCCATTTGTCTTCGTTCTTTTGTATTATTTTTACAAAGCGATAATATTT
TAA
TCGTTTATGATTATTACAATATAACAAAAAGTTAACATCTTTGGAATCTTAAAAATGAGTTTTCATTTTGCTGTACT
GAC
CCTTATTTTAACAGCCTTCACAGTTTCTCTGTGTGCTGAACAAAAAATTACAAAGAGTGACGCAGGTGAAATACGAA
TTT
TCAAACGTCTTATTCCTGCCGATGTTCTACGAGATTTTCCGGGAATGTGCTTTGCTTCAACTCGATGTGCCACTGTT
GAG
CCTGGAAAGTCGTGGGACCTTACTCCATTCTGCGGTCGATCTACTTGTGTTCAAAATGAGGAAAATGATGCAAAGCT
ATT
CGAACTCGTAGAAGACTGCGGCCCATTGCCACTGGCGAATGACAAATGTAAATTGGACACAGAGAAGACTAATAAAA
CCG
CATCGTTTCCTTATTGCTGCCCCATCTTTACATGTGACCCCGGTGTTAAATTGGAATACCCCGAGATCGGAAAGGAT
AAT
GACAAAAAGAATTCTGAGTGA
>Found with A. mellifera BB270028A10H8.F
ATGAAAAATGACAGCTGTTCGCTGCGAATGGCTATTTATGTTTGTTTTGACTCGGCTTTCGAATATATCGCAAAATA
TAT
ACAGGAAACATTTATATTCACAAAAATCTGTACGATGCACCCAGGGCAAATTGAGGTCAACGAGATCAATGGCTATT
GGA
CATTTCTGCTGAGCATCGATTGGAAGGATCCCTGGCTTATTGGCCTTATTTTGGCGCATATCTTAACCACCACCACT
GCG
CTGCTCAGCCGGAACAGCTCCAACTTCCAGGTTTTCCTCTTCCTAGTACTGTTGCTGGCAGTCTACTTCACCGAAAG
CAT
CAATGAGTTCGCTGCTAACAACTGGAGTTCCTTTTCCAGACAACAATACTTCGATAGCAACGGCCTGTTTATCTCGA
CAG
TTTTCTCAATACCTATTTTGCTTAATTGCATGCTTTTGATTGGCACTTGGCTCTACAACTCCACGCAGCTGATGGTG
ACT
CTAAAAACAGCGCAGCTCAAGGAGCGAGCTCGCAAGGAACGCCAGACTAAGGCGGATTCGGAATCCATAGCACATAA
AAA
GGCAGAGTAG
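The records in this file are hard-wrapped, so each sequence is spread over alternating long and short lines. A minimal Python sketch for reading them back into whole sequences follows; the function name read_wrapped_fasta is hypothetical. It returns a list of (header, sequence) pairs rather than a dictionary because the same header (for example "Found with R. suavis J3-A2") is used for both the mRNA and the protein record.

```python
def read_wrapped_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    All lines between one '>' header and the next are joined, so any
    line wrapping inside a record is undone.
    """
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

if __name__ == "__main__":
    # Toy records, not taken from the file
    demo = ">rec1\nATGAAA\nTAA\n>rec1\nMK\n"
    for name, seq in read_wrapped_fasta(demo):
        print(name, len(seq), seq)
```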
>Found with R. suavis J3-A2
MQGLGLQSLKKNPALIPLYVCVGAGAIGAVYYMARLATRNPDVTWNRTSNPEPWQEYKEKQYKFYSPVRDYSKTKSA
APN
FDE
>Extra1
MLNLNLLDCIVPEISTFIQTDGQRQIDSDIDPDREYIGFIW
>Found with R. suavis J3-D1
MYIYFPTIFLLFLYPVVAVVPQGFTIKQPKCWYVANPGPCDDFVKVWGYDYLTNRCIFFYYGGCGGNPNRFYTKEEC
LKT
CRVYRPPNHVCLLPIWATAIKSNRLKQFESYPDYATYIFLQTPWVIYQQFYVDSVAILTIFDMQFAIFHLLQPYFGC
GIW
HFSAGCNKFWQRMQMRLMMR
>Found with R. suavis J3-A7
MIEISDLQKIGIGLAGFGIFFLFLGMLLLFDKGLLAIGNILFISGLACVIGVERTMRFFFQRHKVKGTTAFLGGIVI
VLL
GFPIFGMIIESYGFFALFSGFFPVAINFLGRVPVLGSLFNLPFIQKIVQKLGGDGNRTTV
>Found with R. suavis J3-B3
MDARKFSTHILDTSVGKAAANVRVTVSRLDEIQEWRSLRAAQTDADGRCLLLEPGQFPGGIYKLTFHVGAYYAERNV
RTL
YPAIDLIVDCSENQNYHIPLLLNPFGYSTYRGT
>Found with A. mellifera Contig1312
MDISKAPNPRKLELCRKYFFAGFAFLPFVWAINVCWFFTEAFHKPPFSEQSQIKRYVIYSAVGTLFWLIVLTAWIII
FQT
NRTAWGATADYMSFIIPLGSA
>Found with A. mellifera Contig1481
MIRKVPLIVVLGSTGTGKTKLSLQLAERFGGEIISADSMQVYTHLDIATAKATKEEQSRARHHLLDVATPAEPFTVT
HFR
NAALPIVERLLAKDTSPIVVGGTNYYIESLLWDILVDSDVKPDEGKHSGEHLKDAELNALSTLELHQHLAKIDAGSA
NRI
HPNNRRKIIRAIEVYQSTGQTLSQMLAEQRAQPGGNRLGGPLRYPHIVLLWLRCQQDVLNERLDSRVDGMLAQGLLP
ELR
QFHNAHHATTVQAYTSGVLQTIGYKEFIPYLIKYDQQQDEKIEEYLKTHSYKLPGPEKLKEEGLPDGLELLRNCCEE
LKL
VTRRYSKKQLKWINNRFLASKDRQVPDLYELDTSDVSAWQVAVYKRAETIIESYRNEEACEILPMAKREHPGADLDE
ETS
HFCQICERHFVGEYQWGLHMKSNKHKRRKEGQRKRQRDHETMLSTDLAKKQKEEKEEAGKAETQPPPSRVNDTDKAM
>Found with A. mellifera Contig2709
MSDNFSRTPYSDGHAATHEEASKPHYTTTTSSFSRTPVSPYLNYDSRYLQQAQPEFIFPEGANKQRGRFELAFSQIG
TSV
MIGGGIGGLAGVYNGLKVTKALEQKGKVRRTQLLNHIMKQGSGTANTLGTLTVLYSACGVLLQFFRGEDDHINTVIA
GSA
TGLLYKSTAGLRTCAFGGAIGLGISSLYCLYLIAQENSSNSSPKYL
>Found with A. mellifera BB260003B20H2.F
MVDFFEKLRRGHTFIYIEHMMGTPELKIILEFSAGAELLFGNIKRRELNLDGKQKWTIANLLKWMHANILTERPELF
LQG
DTVRPGILVLINDTDWELLGELDYELQPNDNVLFISTLHGG
>Found with A. mellifera BB260004B10A11.F
MEKSEIRLQRMSNEYQSQSSYMYLRTKMLLKIENTLLRSHRQRETTGIKKLYNSFFVLF
>Found with A. mellifera BB260010A20C3.F
MGRFKLCASPREVMKYEDFIKRIRKSLYYGVGTPDTEMSVSLPFAEYAADLFSETHRGHSLHRLSCVSAAQVHATPC
SLI
MALIYLDRLNVIDSGYSCRITPQQLFVVSLMISTKFYAGHDERFYLEDWASDACMTEDRLKAVELEFLSAMGWNIYI
SNE
LFFDKLRNVERSLAEQQGLRRGWLTYSELVQLLPSLEWTKFLVNSLSVLSLSYAASIITLAGAFFIASQVPGTLWHR
DVE
TASDFTMTISSQVSVSNALESTPFINVQVSSLLRKTSNVNVELMNLEKTSCARARLNKIEYKHPRHQSVPTLSFIST
CPQ
LDLLYAQDGTRNWLNIKSPNSDYKNNRNLSITVRSVQLEEQKAENDSVIWQANTEAMQ
>Found with A. mellifera BB260019B20F2.F
MKEEGGTLLGDKGVRRHQSMQRLSAEQNGGSTTEQTHEHNPNVVPDHRGNLHITVKKTKPILGIAIEGGANTKHPLP
RII
NIHENGAAFEAGGLEVGQLILEVDGTKVEGLHHQEVARLIAECFANREKAEITFLVVEAKKSNLEPKPTALIFLEA
>Found with A. mellifera BB260023A20H5.F
MFPSSILGRSYLLFMLVLAVGVFAQHEWQARDAFDEIKRQFDKVNADNCPIQHHSDLFMPMDAVSHKPDIKEINVNP
VFP
NRTALLHLQNMALSRSFFWSYILQSRFIRPAINDTYDPGMMYYFLSTVADVSANPHINASAVYFSPNSSYSSSYRGF
FNK
TFPRFGPRTFRLDDFNDPIHLQKISTWNTFDVQDLGAHHPDSISKDYTHDLYKINEWYRAWLPDNVEGRHDTKITYQ
VEI
RYANNTNETYTFHGPPGSEENPGPIKFTRPYFDCGRSNKWLVAAVVPIADIYPRHTQFRHIEYPKYTAVSVLEMDFE
RID
INQCPLGEGNKGPNHFADTARCKKETTECEPLQGWGFRRGGYQCRCKPGFRLPNVVRRPYLGEIVERASAEQYYNEY
DCL
KIGWIQKLPIQWDKASYHIRQKYLDRHPEYRNYTTGSRSLHAEHLNIDQALKYIHGVNYRTCKNFHPQDLILRGDVS
FGA
KEQFENEAKMAVRLANFISAFLQSMQTITRISSLQVSDPNEVYSGKRVADKPLTEDQMIGETLAIVLGDSKVWSATM
LWE
RNKFTNRTYFAPYAYKTELNTRKFKVEDLARLNKTHELYTEKKYFKFLKQRWNTNFDDLETFYMKIKIRHNETGEYQ
QKY
EHYPNSYRAANIKHGYWTQPQFDCDGYVKKWLVTYAVPFFGWDSLKVKLEFKGVVAVSMDMLQLDINQCPDWYYEPN
AFK
NTHKCDEQSSYCVPIMGRGYETGGYKCECLQGYEYPFEDLITYYDGQLVEAEYQNIVADVETRYDMFKCRLAGASGL
QSA
LGLVVALIGLTLTLLYRFS
>extra2
MVKQVDFAEVKLSEKFLGAGSGGAVRKATFQNQEIAVKIFDFLEETIKKNAEREITHLSEIDHENVIRVIGRASNGK
KDY
LLMEYLEEGSLHNYLYGDDKWEYTVEQAVRWALQCAKALAYLHSLDRPIVHRDIKPQNMLLYNQHEDLKICDFGLAT
DMS
NNKTDMQGTLRYMAPEAIKHLKYTAKCDVYSFGIMLWELMTRQLPYSHLENPNSQYAIMKAISSGEKLPMEAVRSDC
PEG
IKQLMECCMDINPEKRPSMKEIEKFLGEQYESGTDEDFIKPLDEDTVAVVTYHVDSSGSRIMRVDFWRHQLPSIRMT
FPI
VKREAERLGKTVVREMAKAAADGDREVRRAEKDTERETSRAAHNGERETRRAGQDVGRETVRAVKKIGKKLRF
>Found with A. mellifera BB270004B10G5.F
MSIKSLTYVAIFGLFWGSIAGTVVDQFGIYGGSPITTTERSNAELRCMNINPQNSVDLEQMMGLWYGSEIIVHSQDF
PGT
YEYDSCVIIHLTDATDQIRLSQANRGYGYGNQDYNRNQNNYGRTTTTQSSYPDSDEYPLRSIQSQQKYLRLIWSERD
NNL
EYTFNYTTSAPGQWSNIGDQRGSLVTLNTYTQFTGTVQVVKAVNDHLVLTFCGNDVKSSIYTVVLTRNRLGLSLDEL
RSI
RNLLSRRGLYTETIRKVCNGCGRLGGSLFALLALLLVVRLAWGRGQ
>Found with A. mellifera BB270012B20H7.F
MNSQKEYVSDCETDDDYYVDLLTSGKGSDKSESDVSDKSENYPGLKSKHTAKALRKTRHCDGDNREYRSKECDDLHS
EEE
SEKSRSDALWADFLGDIDTKSVINQKTDYTEGNAASATNTNTHETCNKYDKNDTAIIKTAQQYDSKRTTLSVSTLGK
IKR
SSAEKSIGTMINKFEKKKKLTVLERSQLDWKIFKQDEGIDELLCSHNKGKDGYLDRQDFLERTDLRQFEMEKKLRLS
RRP
Y
>Found with A. mellifera BB270013A20H11.F
MSFHFAVLTLILTAFTVSLCAEQKITKSDAGEIRIFKRLIPADVLRDFPGMCFASTRCATVEPGKSWDLTPFCGRST
CVQ
NEENDAKLFELVEDCGPLPLANDKCKLDTEKTNKTASFPYCCPIFTCDPGVKLEYPEIGKDNDKKNSE
>Found with A. mellifera BB270028A10H8.F
MKNDSCSLRMAIYVCFDSAFEYIAKYIQETFIFTKICTMHPGQIEVNEINGYWTFLLSIDWKDPWLIGLILAHILTT
TTA
LLSRNSSNFQVFLFLVLLLAVYFTESINEFAANNWSSFSRQQYFDSNGLFISTVFSIPILLNCMLLIGTWLYNSTQL
MVT
LKTAQLKERARKERQTKADSESIAHKKAE
--------------------------------------------------------------------------------
Other Information
    Language of Publication
    English
    Data From Reference