Reference
Citation
Benos, T. (2000.12.13). FlyBase error report on Wed Dec 13 15:13:37 2000. 
FlyBase ID
FBrf0133223
Publication Type
Personal communication to FlyBase
Text of Personal Communication
From benos@XXXX Wed Dec 13  15:14:47  2000
Envelope-to: gm119@XXXX
Delivery-date: Wed, 13 Dec 2000 15:14:47 +0000
Date: Wed, 13 Dec 2000 15:13:37 +0000 (GMT)
From: Takis Benos <benos@XXXX>
To: flybase-updates@XXXX
cc: Eleanor Whitfield <eleanor@XXXX>
Subject: CG_but_not_EG genes....
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY='1053048513-877754332-976720417=:45284'
Content-Length: 23802
Sorry to bother you again....
Here is an analysis I performed on div. 1-3 genes reported by Celera (in
version 1.0 of the Drosophila genome) but with no corresponding EDGP
prediction.
Some are small and/or underpredictions. The rest are wrong (I think!).
Please, let me know if there are any questions, etc...
Best regards,
takis ;)
From benos@XXXX Wed Dec 13  15:08:39  2000
Date: Wed, 13 Dec 2000 09:08:14 -0600
From: Takis Benos <benos@XXXX>
To: benos@XXXX
----- Begin Included Message -----
>From ma11@XXXX Wed Oct 11  19:37:22  2000
Envelope-to: ma11@XXXX
Delivery-date: Wed, 11 Oct 2000 19:37:22 +0100
To: benos@XXXX
Subject: CG but not EDGP
Cc: ma11XXXX, m.gattXXXX
X-Sun-Charset: US-ASCII
From: Michael Ashburner (Genetics) <ma11@XXXX>
Date: Wed, 11 Oct 2000 19:37:25 +0100
Content-Length: 3583
Takis
Mellanie and I have just realised that the CG genes not predicted
by EDGP are non-random. There are runs of sequential or nearly
sequential CG numbers, which means that the genes are very close.
This strikes us as peculiar and odd and suspicious. Have you any ideas?
Some are indeed very small and must be wrong, but by NO means all!
M & M
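The observation above (runs of sequential or nearly sequential CG numbers among the unmatched genes) could be checked with a minimal sketch like the following. The function name `group_runs` and the `max_gap` threshold are illustrative assumptions, not part of the original analysis; the identifiers in the usage example are taken from the list below.

```python
def group_runs(cg_ids, max_gap=3):
    """Group CG identifiers into runs of sequential or nearly sequential numbers.

    max_gap is an assumed tolerance: two numbers belong to the same run
    when they differ by at most max_gap.
    """
    numbers = sorted({int(cg_id[2:]) for cg_id in cg_ids})
    runs = []
    for n in numbers:
        if runs and n - runs[-1][-1] <= max_gap:
            runs[-1].append(n)  # extend the current run
        else:
            runs.append([n])    # start a new run
    # Report only runs with at least two members.
    return [[f"CG{n}" for n in run] for run in runs if len(run) > 1]


ids = ["CG14770", "CG14771", "CG14772", "CG14778", "CG13376", "CG13373", "CG3080"]
print(group_runs(ids))
```

With this tolerance, CG14778 and CG3080 are left out because no other listed number is within three of them.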
CG genes not obviously predicted by EDGP:
(genes between '===' lines are tandemly arranged in the genome)
================================================================================
CG13376 X CT32708 1 213 (-) GeneScene
Start:  cnt_1:25974  (frame=-2)
PVB: CG13376 small (60-70 a.a.) with low Genefinder score (sc=11.28).
Similarly low score for the Genscan prediction.
================================================================================
CG13373 X CT32703 3 1038 (+) GeneScene
Start:  cnt_1:230772  (frame=+3)
PVB: CG13373 is in a partially triplicated region.
WARNING: possible mis-assembly in Celera's sequence; in JG, gene CG13373
was supposed to be close to CG13376 (see above)!
CG18275 X CT41450 1 187 (-) GeneScene
Start:  cnt_1:231936  (frame=-1)
PVB: CG18275 is in the same partially triplicated region.
CG3176 X receptor CT10645 1 65 (+) GeneScene
Start:  cnt_1:235283  (frame=+2)
PVB: CG3176 is in the same partially triplicated region.
CG18273 X CT41446 3 1449 (-) GeneScene
PVB: CG18273 is gene EG:171D11.6.
CG18166 X CT40990 2 638 (-) GeneScene
PVB: CG18166 is a partial duplication of gene EG:171D11.6 (see above).
================================================================================
CG13365 X CT32692 1 494 (-) GeneScene
Start:  cnt_1:365582  (frame=-2)
PVB: CG13365 is in a partially duplicated region (as shown by the
partially duplicated EST:AI294113).
================================================================================
CG13362 X CT32687 1 768 (-) GeneScene
PVB: No comment. I couldn't locate it. Too many partial hits.
The 'gene' is obviously from a repetitive region.
>CG13362
MLPGAQYDPYGMTSYAAGRRHDSVSRQEVTAPTDLSNAPSSSRTTSTTP
APTTTTTTTTTTTPAPTTRPSTTTTSTTPPPVPPPPQSTSSSFSEGPTS
SLLRFGEYPAYNRRVNNLYNARPQYPYPDYFNYQPQQQTVVSEQGVSSN
SRIQFVPCMCPVSMPSFVSSSTAATLPSQLSTSSTSFVSQPAARHIEGQ
ELEAEVDGETDNEGEEEDEDEEQGQGQEQDQGQSQSHSLEGIAIKAQTE
RQDITSDSPV
CG13361 X CT32686 2 1335 (-) GeneScene
Start:  cnt_1:497318  (frame=-2)
PVB: CG13361 probably corresponds to predictions BACR19J1.e (Genefinder;
score=34.92) and/or BACR19J1.gs.3 (Genscan). EST:AA140953, which
covers one of the predicted exons, extends into a region with stop
codons in all three frames. This led me to believe that the EST lies
in a UTR; thus there should be no (coding) gene CG13361.
================================================================================
CG14635 X CT34396 1 360 (+) GeneScene
Start:  cnt_1:583083  (frame=+3)
PVB: CG14635 is small and I have no prediction in this area (not on this
strand anyway).
================================================================================
CG14633 X CT34393 1 831 (-) GeneScene
Start:  cnt_1:619985  (frame=-2)
PVB: CG14633 corresponds to prediction BACR7A4.q (Genefinder;
score=33.27). It was not reported due to lack of additional evidence.
CG11663 X CT34392 2 573 (+) GeneScene
Start:  cnt_1:632851  (frame=+1)
PVB: The only prediction in the area is BACR7A4.u (Genefinder; sc=27.99).
The prediction was not reported due to low score and lack of any
other supporting evidence. Moreover, the conceptually translated
peptide is of very low complexity.
>CG11663
MANPKSSGGNKSKGKGHQHRQSQQNSHQQQQQQQQQQSQQSQQPQMQTQI
TPAPVASTNLNTPTATPLASHPSEDTLALAAAVAASIPAAPLARPLPDRR
TTTPAVVTTTSNSSSETRNASENLATSRTASAAVAASENRRGILQRLFGW
SS
CG14632 X CT34391 4 1676 (-) GeneScene
Start:  cnt_1:632537  (part of it; frame=-2)
PVB: No predictions in this area; but the gene is of low complexity.
>CG14632
MSDEVPLGRLSHIFDTLTNLQQQQHLRSQEQLHSQQHPHSQLQPEPQQS
SAEIRRRSASSSPSPSASASASTSGRATPSLGEVAGSGYLHTFPSHFYH
HQVHHLQQHSQPPSLPTQLGAARGSQSLQGSPLLAKRATSFSGQIPLAQ
GRFTASGTTAASGAIGLPASTPNSPRLLPRRAPRPPPIPAKPNQVKADQ
QSKDAQARNSTTTTVQATVNPVLAALDAPDAPWPHFSTLTEHLDVHQVN
NYGQALPQINWQERCLELQLELHRSKNQAGRIRDMLREKETLFS
================================================================================
CG11639 X transcription CT34386 2 405 (-) GeneScene
Start:  cnt_1:749792  (frame=-2)
PVB: CG11639 is gene EG:BACR7A4.7.
CG14631 X CT34389 1 399 (+) GeneScene
Start:  cnt_1:750360  (frame=+3)
PVB: CG14631 corresponds to prediction BACR7A4.ag (Genefinder sc=27.98).
It was not reported due to low score and lack of other supportive
evidence.
================================================================================
CG11393 X CT31813 1 353 (-) GeneScene
Start:  cnt_1:881702  (frame=-2)
PVB: CG11393 is a very small gene (52 a.a.). From the coordinates of the
hit, it must lie within the EG:BACR42I17.1 region.
I am very confident about EG:BACR42I17.1. It is supported by protein
and EST hits and it contains motif PS00813 (IF4E). I don't know
what's wrong with Celera's prediction.
================================================================================
CG11381 X transcription CT31772 1 1365 (+) GeneScene
PVB: CG11381 is (part of) gene BACR42I17.9 (reported).
================================================================================
CG14770 X CT34578 2 694 (+) GeneScene
Start:  cnt_1:1086807  (frame=+3)
PVB: CG14770 corresponds to Genscan prediction 132E8.gs.2. The Genscan
score is low; there is no Genefinder prediction in this region and
no other supportive evidence. Thus, it was not reported.
================================================================================
CG14771 X transcription CT34579 3 2318 (+) GeneScene
CG14772 X CT34580 2 673 (+) GeneScene
PVB: There are two huge predictions in this region
(cnt_1:1095100-1120000), one from Genefinder and one from Genscan.
Both are very atypical for Drosophila genes (many small exons, big introns).
The two predictions disagree on many of the exons, and neither
includes the region indicated by an EST cluster (e.g. EST:AI062494).
The general genome organisation and evidence made me suspicious, so
I reported only the exons I was confident about, based on
protein similarity hits (gene EG:132E8.4).
================================================================================
CG14778 X CT34588 4 743 (+) GeneScene
Start:  cnt_1:1187350  (frame=+1)
PVB: CG14778 corresponds to Genefinder prediction 80H7.i. It was not
reported due to low score (sc=34.6) and lack of further supportive
evidence(s).
================================================================================
CG3080 X CT10352 3 2540 (+) GeneScene
Start:  cnt_1:1314702  (frame=+3)
PVB: I am not sure what CG3080 corresponds to. It looks like it's the
'tail' of a Genscan prediction. But there is no supportive evidence
in the whole region.
CG3729 X CT12497 2 845 (-) GeneScene
PVB: Small (101 a.a.) and repetitive. I cannot locate it. It should be
Genscan prediction 25D2.gs.1, which has a very low score and no
supporting evidence.
>CG3729
MLCYVSLTIRRLHSLAPHCQLDAALDAVHWPLAPGPCPPSAIWHPPSPL
IQMLCRSAAIKITRQTTAAEQLKQKKKKKKEKEKRSGKRQQRKRKSGRG
G
================================================================================
CG14797 X CT34609 3 787 (+) GeneScene
Start:  cnt_1:1397298  (frame=+3)
PVB: CG14797 probably corresponds to (non-reported) prediction 9D2.gs.1
(Genscan). This prediction, as well as a similar one (in all but one
exon) from Genefinder (pred. gene 9D2.j; sc=27.95), was not reported
due to low score and lack of any other supporting evidence.
================================================================================
CG14798 X CT34610 2 376 (+) GeneScene
Start:  cnt_1:1406487  (frame=+3)
PVB: CG14798 probably corresponds to the (non-reported) predicted gene
9D2.n (Genefinder; sc=23.26). The prediction was not reported due to
the low score, the absence of other supporting evidence(s) and the
overlap of its third exon with (reported) gene EG:9D2.3 (on the
reverse strand).
================================================================================
CG14810 X CT34623 1 556 (-) GeneScene
Start:  cnt_1:1528333  (frame=-3)
PVB: CG14810 probably corresponds to one of the (non-reported) predictions
30B7.gs.6 (Genscan) or 30B7.b (Genefinder; sc=19.37). The gene is
small, with low score and no other supporting evidence. Moreover it
is located in a region with a number of small tandem and inverted
repeats (and remnants of transp. elements).
CG14811 X CT34624 1 651 (-) GeneScene
Start:  cnt_1:1530386  (frame=-2)
PVB: CG14811 probably corresponds to (non-reported) predictions 30B7.a
(Genefinder; sc=26.47) and 30B7.gs.7 (Genscan). The prediction was
not reported due to the low score and the absence of other supporting
evidence(s).
CG14799 X CT34612 2 1018 (+) GeneScene
Start:  cnt_1:1533897  (frame=+3)
PVB: CG14799 corresponds to (non-reported) predictions 30B7.h (Genefinder;
sc=8.34) and 30B7.gs.8 (Genscan). The prediction was not reported
due to the low score and the absence of other supporting evidence(s).
================================================================================
CG14800 X CT34613 3 1260 (+) GeneScene
Start:  cnt_1:1551750  (frame=+3)
PVB: The closest prediction to CG14800 is 131F2.gs.1 (Genscan). Only the
last exon(s) of this prediction were reported (EG:131F2.2), based on
protein similarity hits (e.g. SW:BCT5_BOVIN).
================================================================================
CG14806 X CT34619 3 663 (+) GeneScene
Start:  cnt_1:1588425  (frame=+3)
PVB: The only predictions I can find in this area are 63B12.e (Genefinder;
sc=15.36) and 63B12.gs.10. The predictions were not reported due to
the low score and the absence of other supporting evidence(s).
================================================================================
CG14819 X CT34632 5 2022 (-) GeneScene
Start:  cnt_1:1610637  (frame=-1)
PVB: CG14819 is probably a 'merged' prediction of 86E4.s (Genefinder;
sc=18.67) and 86E4.s (Genefinder; sc=18.94). The predictions were
not reported due to the low score and the absence of other supporting
evidence(s). Moreover, EST:AA695846 does not agree with prediction
86E4.s.
================================================================================
CG18082 X CT40582 1 222 (+) GeneScene
Start:  cnt_1:1919235  (frame=+3)
PVB: CG18082 is small (~70 a.a.), with a low score and no supporting
evidence (pred. gene 30B8.a; Genefinder sc=10.05).
CG14052 X CT33613 3 1785 (+) GeneScene
Start:  cnt_1:1919541  (frame=+3)
PVB: CG14052 corresponds to prediction 30B8.b (Genefinder; sc=24.88). The
prediction was not reported due to the low score and the absence of
other supporting evidence(s). Also, the predicted gene is of low
complexity.
CG18850
Start:  cnt_1:1921912  (frame=-3)
PVB: CG18850 corresponds to prediction 30B8.t (Genefinder; sc=18.05). The
prediction was not reported due to the low score and the absence of
other supporting evidence(s). Also, the predicted gene is of low
complexity.
WARNING!
Most importantly, in the EDGP sequence THE PREDICTION OVERLAPS WITH GENE
EG:30B8.4 (also known as pecanex; pcx; FBgn0003048)! However, in
the JG this gene is predicted between the previous two.
================================================================================
CG3091 X binding or CT9997 6 1492 (-) GeneScene
Start:  cnt_1:1949751  (frame=-1)
PVB: CG3091 is (partly) predicted gene 30B8.l (Genefinder; sc=34.63).
This prediction was supported by EST(s), but it was not reported
because it consists of a partial duplication of gene EG:30B8.3
(without the protein similarity though; different translation frame).
================================================================================
CG3073 X enzyme CT9957 3 1834 (-) GeneScene
Start:  cnt_1:1957696  (frame=-3)
PVB: CG3073 probably consists of the joint Genscan predictions
25E8.gs.1 and 30B8.gs.8. This gene is supported by EST hits,
therefore I SHOULD HAVE REPORTED IT. The reason I missed it is that
it is located between two cosmids.
================================================================================
CG14049 X CT33608 2 402 (-) GeneScene
Start:  cnt_1:2019302  (frame=-2)
PVB: CG14049 corresponds to prediction BACH48C10.gs.3 and (partly) to
prediction BACH48C10.e (Genefinder; sc=11.99). The predictions were
not reported due to the low score and the absence of other supporting
evidence(s).
================================================================================
CG7894 X cell adhesion CT23737 3 6037 (-) GeneScene
Start:  cnt_1:2129993  (frame=-2)
PVB: CG7894 probably consists of the joint Genefinder predictions
BACR25B3.n (sc=37.86) and BACH59J11.gs.6 (sc=9.99). This gene is
supported by EST hits and (weak) protein similarity hits, therefore I
SHOULD HAVE REPORTED IT. The reason I missed it is that it is
located between two BACs.
================================================================================
CG8310 X transporter CT24372 3 821 (-) GeneScene
Start:  cnt_1:2257964  (frame=-2)
PVB: CG8310 is gene EG:BACR25B3.4.
CG8636 X translation CT25021 2 959 (-) GeneScene
Start:  cnt_1:2288033  (frame=-2)
PVB: CG8636 corresponds to prediction BACR7C10.n (Genefinder; sc=61.87).
It is supported by (weak) protein hits (corresponding to translation
factors) as well as ESTs. However, one of the EST hits (AA820554) is
in close proximity (100bp) to the end of the transposable element
Burdock (EG:BACR7C10.5) that was also found there. Both the nature
of the protein hits and the close proximity to the transposable
element made me suspect that this may not be a 'real' Drosophila
gene. Thus, I did not report it.
================================================================================
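Several comments above describe a predicted peptide as being of low complexity. One common way to put a number on that is the Shannon entropy of the amino-acid composition; the sketch below is an illustrative assumption and not the method used in the original analysis. The function names and the choice of `k` are hypothetical, and no particular cutoff is implied.

```python
import math
from collections import Counter


def shannon_entropy(peptide: str) -> float:
    """Shannon entropy (bits per residue) of the amino-acid composition.

    Lower values mean the peptide is dominated by fewer residue types.
    """
    counts = Counter(peptide)
    total = len(peptide)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def top_residue_fraction(peptide: str, k: int = 3) -> float:
    """Fraction of the peptide made up of its k most frequent residues."""
    counts = Counter(peptide)
    return sum(n for _, n in counts.most_common(k)) / len(peptide)


# First line of the CG11663 translation quoted above.
cg11663_start = "MANPKSSGGNKSKGKGHQHRQSQQNSHQQQQQQQQQQSQQSQQPQMQTQI"
print(round(shannon_entropy(cg11663_start), 2))
print(round(top_residue_fraction(cg11663_start), 2))
```

A peptide using all 20 amino acids equally would score log2(20), about 4.32 bits per residue, so values well below that point to a skewed composition.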
Other Information
    Language of Publication
    English