Category 1: genes that we have no clue about - no sign of them in any jam page, no mention anywhere.

Leyla's method:
BLASTP (at BDGP) of the entry in aa_embl_dros against the Predicted proteins (AA) set. Check the match. If there is any ambiguity, do BLASTN at NCBI using the relevant gene accession against the scaffold accessions. If necessary, view pairwise alignments using the SEAN tool (especially useful for aligning e.g. clusters like Met75Ca and Met75Cb annotated on genomic DNA with scaffold accessions).

Genes to which a CG can be assigned:

Amy-p (FBgn0000079)=CG18640|FBan0018640|CT42306|
BcDNA:GH03186 (FBgn0028502)=CG6446|FBan0006446|CT20096|
Cyt-P450-rBF6-2 (FBgn0013773)=CG10240 (invalid gene symbol in GadFly Cyp6a22)
EG:190E7.1 (FBgn0024368)=CG18091|FBan0018091|CT40606|FBan0018091
EG:49E4.1 (FBgn0025392)=CG3064|FBan0003064|CT10298| (intron-exon structure has to be corrected)
Grip75 (FBgn0026431)=CG6176|FBan0006176|CT19386|FBan0006176
Gtp-bp (FBgn0010391)=CG2522|FBan0002522|CT3545|FBan0002522
l(2)03659 (FBgn0010549)=CG11803|FBan0011803|CT8643|FBan0011803
l(2)k10201 (FBgn0016970)=CG8803|FBan0008803|CT8653| (translation starts further upstream, with a CTG start, than annotated on CG8803)
l(2)tid (FBgn0002174)=CG5504|FBan0005504|CT17450|FBan0005504 (note: CG5504 is erroneously annotated as l(2)dtl in public view of GadFly; CG11295 is correctly annotated as l(2)dtl)
Met75Cb (FBgn0028415)=CG18064|FBan0018064|CT40477|FBan0018064 (note: CG18064 is incorrectly identified as Met75Ca; sequence corresponding to Met75Ca exists but is unannotated. This is by alignment of Acc. AJ249253 (which has both genes) to AE003520.)
msopa (FBgn0004414)=CG14560|FBan0014560|CT34291 (several indels give rise to a frameshift in the CG protein product)
Mst98Ca (FBgn0002865)=CG11719|FBan0011719|CT5056|
Mst98Cb (FBgn0004171)=CG18396|FBan0018396|CT41800|
Nckx30C (FBgn0028704)=CG4106|FBan0004106|CT13634|
prc (FBgn0028573)=CG5700|FBan0005700|CT17960|FBan0005700 (intron-exon structure has to be corrected)
Prosα1 (FBgn0026781)=CG18495|FBan0018495|CT42044|CT42048| (note the 2 transcripts)
Sgs7 (FBgn0003377)=CG18087|FBan0018087|CT40586|
spen (FBgn0016977)=CG18497|FBan0018497|CT36331|CT42170| (CT36331 translation is extremely truncated wrt known CDS. Missing from GadFly public view.)
______________________________________________________________________________
Not as good a match as above, but a good bet; save for reannotation if you wish:

IM2 (FBgn0025583) is most likely CG18106|FBan0018106|CT40691, despite an indel that causes a difference in the middle of the protein wrt translation of AF074003. This is one of 3 closely related genes close to each other (on AE003799).
______________________________________________________________________________
Genes where there is a sequence match that is not annotated:

BcDNA:GH02384 (FBgn0027612)
  there is matching sequence but it is not annotated (matches AE003666: 223903(ATG)-222438(TAA), several exons; should be between CG11017 and CG2508)
Orf5C (FBgn0011819)
  there is matching sequence but it is not annotated (Orf5C is adjacent to Act5C)
Ste (FBgn0003523)
  there is matching sequence but it is not annotated (it should be in the region next to ben; ben is correctly annotated as CG18319 but no neighbors are showing in GadFly; note: 'Ste locus contains two major size classes of a tandemly repeated gene'; the 2 accessions in FlyBase align near each other)
unknown-telomeric-protein-gene (FBgn0027101)
  there is matching sequence but it is not annotated (AF103941 aligns with coordinates 42123-42832 of AE003163)
______________________________________________________________________________
Ambiguous (no good GadFly BLAST hits, no good uninterrupted BLASTN alignment to NCBI contigs; should be rechecked when reannotating):

Crg-1
  no CG annotated
  3' end of Crg-1 transcripts aligns to Acc. AE003327 (this is a tiny contig)
Dbp80
  no CG annotated
  BLAST hits fall on 3 short scaffolds (AE003058, AE002987, AE003078); 1 scaffold has an annotation for another gene, the other 2 have no annotations
Lcp6
  no unambiguous matches (note this from FB and paper: The Lcp6 protein may be encoded by an extra copy of the 'Lcp-b' gene (Lcp65Ab3) present in the Canton S strain. i.e. it may not be in all strains.)
______________________________________________________________________________
Genes where the symbol has changed (valid symbol already has a CG number):

BcDNA:GH03016 is a synonym of Rh50 CG7499
BcDNA:GH04413 is a synonym of Appl CG7727
BcDNA:GH08312 is a synonym of Sap-r CG12070
______________________________________________________________________________
Merges:

Ser5 is probably a fragment of Ser99Da (Ser5 and Ser6 are PCR fragments). (Will send alignment separately to MA and AG)
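
The assignment records under "Genes to which a CG can be assigned" follow a regular shape (symbol, FBgn in parentheses, "=", then "|"-separated CG/FBan/CT identifiers and an optional note), so they could be machine-checked before reannotation. The following is only a minimal sketch in Python: the record pattern, the function names parse_record and fban_consistent, and the rule that an FBan number is the CG number zero-padded to 7 digits are all assumptions inferred from the entries in these notes, not a documented FlyBase or GadFly format.

```python
import re

# Assumed record shape, inferred from the entries in these notes:
#   symbol (FBgn0000000)=CG1234|FBan0001234|CT5678| (optional note)
RECORD = re.compile(
    r"^\s*(?P<symbol>.+?)\s*\((?P<fbgn>FBgn\d{7})\)\s*=\s*"
    r"(?P<ids>[A-Za-z0-9|]+)\s*(?P<note>\(.*\))?\s*$"
)


def parse_record(line):
    """Split one assignment line into its parts; return None if it does not fit."""
    m = RECORD.match(line)
    if m is None:
        return None
    tokens = [t for t in m.group("ids").split("|") if t]
    return {
        "symbol": m.group("symbol"),
        "fbgn": m.group("fbgn"),
        "cg": [t for t in tokens if t.startswith("CG")],
        # Some entries repeat the FBan identifier; keep each one once.
        "fban": sorted({t for t in tokens if t.startswith("FBan")}),
        "ct": [t for t in tokens if t.startswith("CT")],
        "note": m.group("note"),
    }


def fban_consistent(record):
    """Check the assumed rule: FBan number = CG number zero-padded to 7 digits."""
    expected = {"FBan" + cg[2:].zfill(7) for cg in record["cg"]}
    return set(record["fban"]) <= expected


if __name__ == "__main__":
    rec = parse_record("l(2)03659(FBgn0010549)=CG11803|FBan0011803|CT8643|FBan0011803")
    print(rec["symbol"], rec["cg"], rec["fban"], fban_consistent(rec))
```

Free-text entries such as the IM2 line do not fit the pattern and come back as None, which makes it easy to route them to manual review rather than guess at their structure.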