Subject: EG vs CG: some more...

Hello Michael and all:

Here are some more cases you asked me to check:

1. EG:34F3.1 + EG:34F3.2 > CG12467
   Comment: reported by Michael
   Takis  : These genes are mere predictions. EDGP reported Genefinder
   predictions as such; Celera reported (presumably) Genie ones.
   However, EG:34F3.1 contains motif PS00012 (PHOSPHOPANTETHEINE), whereas
   EG:34F3.2 contains PF00169 (PH domain) and PF00784 (domain in myosin
   and kinesin tails). Could the motifs in the two genes be connected? (I
   don't know)

2. CG11381 + CG14624 compared to EG:BACR42I17.9
   Comment: reported by Michael
   Takis  : no evidence for either case. EDGP reports the longest gene.
   EDGP reports one gene that constitutes the intact Genefinder
   prediction. Celera reports two genes, presumably two different
   Genie/Genscan predictions. EDGP also misses one exon at the 5' end of
   the gene (reported by Celera).

3. CG18166 + CG18273 compared to EG:171D11.6
   Comment: reported by Michael; Takis did the CLUSTAL analysis
   Takis  : Celera's genes are wrongly reported. EDGP followed the
   predictions of Genscan and Genefinder as well as EST matches. Wherever
   the two predictions didn't agree, EDGP adopted the longest exon, if it
   agreed with the EST hits. Based on that, the last exons were
   discarded, since the EST matches support the presence of genes on the
   other strand. Most curiously, gene CG18166 seems to be part of gene
   CG18273, as shown in the attached alignment.

All the best,
t;)

--------------------------------------------------------------------------------

CLUSTAL W (1.7) multiple sequence alignment

>.....................................................................................
(truncated part; Celera's genes do not match EG:171D11.6)
>.....................................................................................
CG18166|FBan0018166|CT40990|FB ----------------------------------------MVRRSQEPEK
CG18273|FBan0018273|CT41446|FB --------------------------------------------------
EG_171D11.6                    REEQKKRNESESSEKTKAEPKVDHKKKNRDPETAKIQELEDNGKQRQPKL

CG18166|FBan0018166|CT40990|FB LLEENNSKTVRPLVTRLNYRDANATRLLNVALTHRQRLHLDPDEIEFVLS
CG18273|FBan0018273|CT41446|FB ----------------LNYRDANATRLLNVALTHRQRLHLDPDEIEFVLS
EG_171D11.6                    IDENFRRICKIYIGHSLNYRDANATRLLNVALTHRQRLHLDPDEIEFVLS

CG18166|FBan0018166|CT40990|FB SYWRQLNTDIEVGDTFASSALDCLEPAIKLILGYKTNEDFLLLLRRLSSQ
CG18273|FBan0018273|CT41446|FB SYWRQLNTDIEVGDTFSSSALDCLEPAIKLIIGYKTNEDFLLLLHRLSSQ
EG_171D11.6                    SYWRQLNTDIEVGDTFSSSALDCLEPAIKLIIGYKTNEDFLLLLHRLSSQ

CG18166|FBan0018166|CT40990|FB VDMLDVDIKHLISHGGSWQHRQPDCNSTTQ--------------------
CG18273|FBan0018273|CT41446|FB VEQMARPASQTEHTALQNVLTLLALFAKCSLSSVKGAMLNEHFEVISVSV
EG_171D11.6                    VEQMARPASQTEHTALQNVLTLLALFAKCSLSSVKGAMLNEHFEVISVSV
                               *: : .: . . :. .

CG18166|FBan0018166|CT40990|FB --------------------------------------------------
CG18273|FBan0018273|CT41446|FB ALRLPEPKDLAYSGHALRLLEAQRNLAGNRTVPLTGESLDCLLSSMLDID
EG_171D11.6                    ALRLPEPKDLAYSGHALRLLEAQRNLAGNRTVPLTGESLDCLLSSMLDID

CG18166|FBan0018166|CT40990|FB --------------------------------------------------
CG18273|FBan0018273|CT41446|FB IKHLISHGGSWQQFVDLYSALTDNLIVLLKQHSNLMSDRAAQLSVLCQDL
EG_171D11.6                    IKHLISHGGSWQQFVDLYSALTDNLIVLLKQHSNLMSDRAAQLSVLCQDL

CG18166|FBan0018166|CT40990|FB --------------------------------------------------
CG18273|FBan0018273|CT41446|FB IQAVVGYRAERKQTQDISETELDGLADLGLKLATVMATVRATQALAVKRV
EG_171D11.6                    IQAVVGYRAERKQTQDISETELDGLADLGLKLATVMATVRATQALAVKRV

CG18166|FBan0018166|CT40990|FB --------------------------------------------------
CG18273|FBan0018273|CT41446|FB APFLLIFTIRQMVATERPTTLFEKIKVHIVRVCHELIGICDHRAGHFILR
EG_171D11.6                    APFLLIFTIRQMVATERPTTLFEKVCFTLRERHWPSRSQNSSFAAFRRID

CG18166|FBan0018166|CT40990|FB -----------------------------------
CG18273|FBan0018273|CT41446|FB SSNEAGARMYEGLVKDHEKYHKFRGKV--------
EG_171D11.6                    QRFLFRVKKYMVLYSLRSNNCHFFILCVHVVYIFL

--------------------------------------------------------------------------------

Subject: Re: EG vs CG: some more...

Hi All,

In response to Takis' mail I only have one thing to add:

>1. EG:34F3.1 + EG:34F3.2 > CG12467
>   Comment: reported by Michael
>   Takis  : These genes are mere predictions. EDGP reported Genefinder
>   predictions as such; Celera reported (presumably) Genie ones.
>   However, EG:34F3.1 contains motif PS00012 (PHOSPHOPANTETHEINE), whereas
>   EG:34F3.2 contains PF00169 (PH domain) and PF00784 (domain in myosin
>   and kinesin tails). Could the motifs in two genes be connected? (I
>   don't know)

In Swiss-Prot there are no known genes that carry all three motifs
mentioned above (apart from TrEMBL entry Q9W5D0 for CG12467!). Myosin
proteins can carry PH domains, so I suggest that this region remain
annotated as two genes and that the CDS feature for CG12467 be split into
two potential reading frames.

thanks
Nellie
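
The remark in case 3 that CG18166 seems to be part of CG18273 can be checked by counting identical residues column by column in the alignment. The following is only a minimal sketch and not part of the original analysis: `column_identity` is a hypothetical helper name, and the two row pairs are copied from the second and third blocks of the CLUSTAL W output above, with leading gaps written as `"-" * 16`.

```python
def column_identity(row_a: str, row_b: str) -> tuple[int, int]:
    """Return (matches, compared) for two rows of the same alignment block.

    Columns where either row has a gap ('-') are skipped, so the result
    describes only the region where both sequences are present.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # gap in one of the rows: nothing to compare
        compared += 1
        if a == b:
            matches += 1
    return matches, compared


# Rows copied from the second alignment block (assumed transcribed correctly).
BLOCK2_CG18166 = "LLEENNSKTVRPLVTRLNYRDANATRLLNVALTHRQRLHLDPDEIEFVLS"
BLOCK2_CG18273 = "-" * 16 + "LNYRDANATRLLNVALTHRQRLHLDPDEIEFVLS"

# Rows copied from the third alignment block.
BLOCK3_CG18166 = "SYWRQLNTDIEVGDTFASSALDCLEPAIKLILGYKTNEDFLLLLRRLSSQ"
BLOCK3_CG18273 = "SYWRQLNTDIEVGDTFSSSALDCLEPAIKLIIGYKTNEDFLLLLHRLSSQ"

if __name__ == "__main__":
    for label, a, b in [
        ("block 2", BLOCK2_CG18166, BLOCK2_CG18273),
        ("block 3", BLOCK3_CG18166, BLOCK3_CG18273),
    ]:
        matches, compared = column_identity(a, b)
        print(f"{label}: {matches} identical of {compared} compared columns")
```

Skipping gap columns keeps the count focused on the overlap between the two Celera genes. A high identity there is consistent with one prediction being a fragment of the other, but it does not by itself say which annotation is correct.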