Subject: Help Swiss-prot - Cs protein sequence

Dear Dr. Ayala,

I am an annotator (curator) at the SWISS-PROT protein sequence database (http://www.expasy.ch/sprot or http://www.ebi.ac.uk/sprot) and I have a question I am hoping you can answer.

While curating the SWISS-PROT entry G3790084 (see http://www.expasy.ch/cgi-bin/get-full-entry?TREMBL_NEW-ID:'G3790084') I encountered the following problem:

The sequence you submitted is very different from that submitted by Eveleth and Marsh (Molec. General Genet. 209: 290-298 (1987)). They suggest there are several potential open reading frames in the Cs region, none of which shows a strong codon bias. Their sequence is only 245 amino acids long but does have two regions that are identical to regions of your 504 amino acid sequence.

Do you know the origin of the sequence discrepancies? Could they be due to alternative splicing, frameshifts, etc.?

I would be very grateful if you could spare the time to answer this question.

With my best regards,
Eleanor Whitfield.

>

Dear Dr. Whitfield,

The nucleotide sequence of Cs in D. melanogaster submitted by Eveleth and Marsh (1987) differs from our sequence (Tatarenkov, Saez, Ayala) by several deletions and insertions (indels), with consequent shifts in reading frame. We decided to re-sequence Cs in D. melanogaster for verification purposes during our work with Scaptodrosophila lebanonensis, when we found that the sequences of the two species are highly similar along a 1.5 kb region at the nucleotide level, but only in a few stretches at the amino acid level. This region is an ORF in S. lebanonensis. Our Cs sequence of D. melanogaster differs from the published sequence by the occurrence of indels, which are predicted by the alignment of the previously published sequence with the Cs sequence of S. lebanonensis. The encoded peptide sequence obtained by translating the corrected sequence of Cs in D. melanogaster is 78% identical to that of S. lebanonensis.

We also sequenced Cs in D.
simulans, which is closely related to D. melanogaster. We found that the ORF arrangement in D. simulans is identical to that in our sequence of D. melanogaster, but different from that in the previously published sequence of Eveleth and Marsh. In both species we found a long ORF that extends for 1507 bp from the intron determined in D. melanogaster by Eveleth and Marsh. The longest ORF previously proposed is 735 bp. Thus, the coding region of Cs is about twice as long as previously thought.

The results of our study will be published in GENE under the title 'A compact gene cluster in Drosophila: the unrelated Cs gene is compressed between duplicated amd and Ddc' by A. Tatarenkov, A.G. Saez, and F.J. Ayala.

Please let me know if you have any other questions.

Sincerely yours,
Andrei Tatarenkov.

Subject: anons needed please.

Hi,

Could you please add anon designations to a couple of lone ORFs shown in Fig 1, pg 291 of:

Eveleth, D.D., Marsh, J.L.
Overlapping transcription units in Drosophila: sequence and structure of the Cs gene.
Molec. gen. Genet. 1987 209:290--298
FBrf0046992

I have written to an author of a subsequent paper and he has confirmed there are many sequencing errors in this paper, such that 3 of the 5 ORFs shown in Fig 1 are actually l(2)37Cs, the remaining 2 are not. Those two are the 4th and 5th ORF, if you read them from left to right (the remaining three include 2 that show the presence of an intron and the third is the longest shown).

thanks
Ele

>

Subject: Re: anons needed please.

> Hi,
>
> Could you please add anon designations to a couple of lone ORFs shown in
> Fig 1, pg 291 of:
> Eveleth, D.D., Marsh, J.L.
> Overlapping transcription units in Drosophila: sequence and structure of
> the Cs gene.
> Molec. gen. Genet. 1987 209:290--298
> FBrf0046992
>
> I have written to an author of a subsequent paper and he has confirmed
> there are many sequencing errors in this paper, such that 3 of the 5
> ORFs shown in Fig 1 are actually l(2)37Cs, the remaining 2 are not.
> Those two are the 4th and 5th ORF, if you read them from left to right
> (the remaining three include 2 that show the presence of an intron
> and the third is the longest shown).

OK. Here is a picture of Fig 1 in the paper to make sure I get this right:

_/\       |_________________________
 1                    2

_/\__________       |___
      3               4

|__________
     5

so I think that you're saying that 1, 2 and 3 are actually l(2)37Cs?
while 4 and 5 aren't? are 4 and 5 actually open reading frames, but just not of l(2)37Cs?

I will make this a pc from you to FlyBase.

The anons from Fig 1 will be:

anon-37Cb for transcript 5 and anon-37Cc for transcript 4

(unless I got them wrong and 4 and 5 aren't the anon ORFs)

Gillian

>

Subject: Re: anons needed please.

>so I think that you're saying that 1, 2 and 3 are actually l(2)37Cs?

indeed

>while 4 and 5 aren't? are 4 and 5 actually open reading frames, but
>just not of l(2)37Cs?

indeed

>I will make this a pc from you to FlyBase.

Well, strictly it is information provided by Andrei Tatarenkov to me; is there any way to represent that? I can forward you the emails if you wish.

thanks again
Ele