Here are the matches of the nc2 class to CGs where possible, and to the
scaffolds where there was no CG.

cheers,
sima

> nc2: genes with no SWP but with either TREMBL or PIR (47). Quite a
> few of these have a PIR entry and no nucleic acid one -- no idea
> what that means. Again some of these (commented) have already been
> looked at.

> X-Sun-Data-Name: nc2

> Acp98AB - Cat 1 residue
=new exon in first intron of CG12879 CT32023 FBan0012879 FBgn0039540
>gb|AE003762.1|AE003762 Drosophila melanogaster genomic scaffold 142000013386035 section 87 of 105, complete sequence
 Length = 230995

 Score = 52.3 bits (123), Expect = 2e-07
 Identities = 24/26 (92%), Positives = 24/26 (92%)
 Frame = +2

Query: 2     EFPNPVLSRIGRSLRTNKGTHYQRMT 27
             E  NPVLSRIGRSLRTNKGTHYQRMT
Sbjct: 33911 EVTNPVLSRIGRSLRTNKGTHYQRMT 33988

 Database: Drosophila Genome
 Posted date: Apr 10, 2000 4:44 PM
 Number of letters in database: 122,680,987
 Number of sequences in database: 1181

> Acph-1 (438aa)
=CG7899|FBan0007899|CT6619|FBan0007899 last_updated:000321 (438aa)
95% id over 438aa

> anon-63BC-T3
O18384(31aa) by BLASTP against predicted proteins at BDGP
=CG14965|FBan0014965|CT34811|FBan0014965 last_updated:000321 (537aa)

 Score = 65 (22.9 bits), Expect = 0.049, P = 0.048
 Identities = 11/11 (100%), Positives = 11/11 (100%)

Query: 21 MHFFKFPVKDP 31
          MHFFKFPVKDP
Sbjct: 1  MHFFKFPVKDP 11

missing first 20aa in CG

> anon-67Ea - Cat 9 residue, BLASTN finds CGless seq
CAA48385(170aa) by BLASTP against predicted proteins at BDGP
=no protein in GadFly
GenBank:X68309(2578bp) by BLASTN against all at BDGP
=gb|AE003547|Drosophila melanogaster genomic scaffold, 142000013386050 section 34 of 54, complete sequence
very approximately (need sim4 alignment)
~1987-900 = 254926-256070,
163-865 = 256739-256036,
2568-1973 = 254288-254883,
164-68 = 256797-256893,
71-1 = 257060-257130

> Crlbp
S65395(11aa) by Pattern Search against predicted proteins at BDGP
=FBan0001668=CG1668|FBan0012467|CT4586=Pbprp2=FBgn0011280

> ect
A61047(280aa) by BLASTP
against predicted proteins at BDGP
=CG11965|FBan0011965|CT37129|FBan0011965 last_updated:000321 (429aa)

 Score = 168 (59.1 bits), Expect = 5.1e-19, Sum P(2) = 5.1e-19
 Identities = 38/50 (76%), Positives = 38/50 (76%)

Query: 1 MKFVIXXXXXXXXXXQIEASPLQRLSRSERAVSPQQTVNNEVAPAVRPAS 50
         MKFVI          QIEASPLQRLSRSERAVS QQTVNNEVAPAV PAS
Sbjct: 1 MKFVILLCVLSALLLQIEASPLQRLSRSERAVSRQQTVNNEVAPAVAPAS 50

 Score = 75 (26.4 bits), Expect = 5.1e-19, Sum P(2) = 5.1e-19
 Identities = 17/41 (41%), Positives = 17/41 (41%)

Query: 155 LDEXXXXXXXXXXXGISDGVDLPAESXXXXXXXXXXXXXDE 195
           LDE           GISDGVDLPAES             DE
Sbjct: 155 LDEVAPAAASSVSAGISDGVDLPAESGNAAEVLAAGNAADE 195

> EG:34F3.2 - part of CG12467
O77432(504aa) by BLASTP against predicted proteins at BDGP
=CG12467|FBan0012467|CT32681|FBan0012467 last_updated:000321 (1376aa)
77-524 of CG12467 corresponds to 38-485 of EG:34F3.2
40-47  of    '                   1-8       '

> Gprk3
C41615(55aa) by BLASTP against predicted proteins at BDGP
=CG8224|FBan0008224|CT8241|FBan0008224 last_updated:000321 (601aa)

 Score = 268 (94.3 bits), Expect = 8.0e-24, P = 8.0e-24
 Identities = 51/55 (92%), Positives = 54/55 (98%)

Query: 1   VVYRDLKSKNILVKSNLSCAIGDLGLAVRHVEKNDSVDIPSTHRVGTKRYMAPEI 55
           + +RDLKSKNILVKSNLSCAIGDLGLAVRHVEKNDSVDIPSTHRVGTKRYMAPE+
Sbjct: 427 IAHRDLKSKNILVKSNLSCAIGDLGLAVRHVEKNDSVDIPSTHRVGTKRYMAPEV 481

> Gprk4
D41615(49aa) by BLASTP against predicted proteins at BDGP
=CG3051|FBan0003051|CT10258|FBan0003051 last_updated:000321 (582aa)

 Score = 222 (78.1 bits), Expect = 7.1e-19, P = 7.1e-19
 Identities = 43/50 (86%), Positives = 46/50 (92%)

Query: 1   VVYRDLKPKNLLLDHNMHAKIADFGLSNMMLDGEFLRTSA-SPNYMAPEV 49
           +V+RDLKP+NLLLDHNMH KIADFGLSNMMLDGEFLRTS  SPNY APEV
Sbjct: 147 IVHRDLKPENLLLDHNMHVKIADFGLSNMMLDGEFLRTSCGSPNYAAPEV 196

> GstD21
D46681(214aa) by BLASTP against predicted proteins at BDGP
=CG4181|FBan0004181|CT13804|FBan0004181 last_updated:000321 (215aa)
214/214 (100%) id

> GstD23
E46681(214aa) by BLASTP against predicted proteins at BDGP
=CG11512|FBan0011512|CT36385|FBan0011512 last_updated:000321 (215aa)
213/214 (99%) id

> GstD24
C46681(215aa) by BLASTP against predicted proteins at BDGP
=CG12242|FBan0012242|CT13898|FBan0012242 last_updated:000321 (210aa)
209/215 (97%) id (gap of 6aa in subject)

> GstD25
B46681(214aa) by BLASTP against predicted proteins at BDGP
=CG4423|FBan0004423|CT13946|FBan0004423 last_updated:000321 (215aa)
214/214 (100%) id

> GstD26
H46681(170aa) by BLASTP against predicted proteins at BDGP
=CG4371|FBan0004371|CT13952|FBan0004371 last_updated:000321 (224aa)
170/170 (100%) id

> GstD27
JQ1378(212aa) by BLASTP against predicted proteins at BDGP
=CG4421|FBan0004421|CT13954|FBan0004421 last_updated:000321 (212aa)
211/212 (99%) id

> Gta
D46036(150aa) by BLASTP against predicted proteins at BDGP
=CG4268|FBan0004268|CT13850|FBan0004268 last_updated:000321 (952aa)

 Score = 705 (248.2 bits), Expect = 5.5e-71, P = 5.5e-71
 Identities = 147/152 (96%), Positives = 147/152 (96%)

Query: 1   VVYRAKDKRTNEIEALKRLKMEKEKEGFPITSRREINTLLKGGQHPNIVTVREIVVGSNM 60
           VVYRAKDKRTNEI ALKRLKMEKEKEGFPITS REINTLLKG QHPNIVTVREIVVGSNM
Sbjct: 571 VVYRAKDKRTNEIVALKRLKMEKEKEGFPITSLREINTLLKG-QHPNIVTVREIVVGSNM 629

Query: 61  DKIFIVMDYVEHDLKSLMETMKNRKQSFFPGEVKCLTQQL-RAVAHLHDN-ILHRDLKTS 118
           DKIFIVMDYVEHDLKSLMETMKNRKQSFFPGEVKCLTQQL RAVAHLHDN ILHRDLKTS
Sbjct: 630 DKIFIVMDYVEHDLKSLMETMKNRKQSFFPGEVKCLTQQLLRAVAHLHDNWILHRDLKTS 689

Query: 119 NLLLSHKGILKVGDFGLAREYGSPIKK-TSLVV 150
           NLLLSHKGILKVGDFGLAREYGSPIKK TSLVV
Sbjct: 690 NLLLSHKGILKVGDFGLAREYGSPIKKYTSLVV 722

> Hmx
Q24016(34aa) by BLASTP against predicted proteins at BDGP
=CG5832|FBan0005832|CT18293|FBan0005832 last_updated:000321 (263aa)

 Score = 169 (59.5 bits), Expect = 3.5e-14, P = 3.5e-14
 Identities = 34/34 (100%), Positives = 34/34 (100%)

Query: 1   FDLKRYLSSSERAGLAASLRLTETQVKIWFQNRR 34
           FDLKRYLSSSERAGLAASLRLTETQVKIWFQNRR
Sbjct: 158 FDLKRYLSSSERAGLAASLRLTETQVKIWFQNRR 191

> Hrb85CD - probably same gene as Hrb87F
Q24486(386aa) by BLASTP against predicted proteins at BDGP
=CG12749|FBan0012749|CT27250|FBan0012749 last_updated:000321 (385aa)
198/198 (100%) id

> ImpE3
A61046(331aa) by BLASTP against predicted proteins at BDGP
=CG2723|FBan0002723|CT9257|FBan0002723 last_updated:000321 (325aa)
188/191 (98%) id

> l(2)rot - Cat 2 residue, BLASTN finds CGless seq
S42089(542aa) by BLASTP against predicted proteins at BDGP
=no proteins in GadFly
X95246(2818bp) by BLASTN against all at BDGP
=overlaps on opposite strand CG5504
CG5504|FBan0005504|CT17450|FBan0005504 last_updated:000321 (1635bp)
very approximately (need sim4 alignment)
~2632-1240, 1100-843 = 1-1365, 1378-1635bp of CG
=gadfly| SEG:AE003461 |gb|AE003461|Drosophila melanogaster genomic scaffold 142000013386038 section 10 of 15, complete sequence.
2818-1 = 228045-230862 of scaffold

> l(3)73Ah
PIR:JC4296 (222aa) by BLASTP against predicted proteins at BDGP
=CG4195|FBan0004195|CT13794|FBan0004195 last_updated:000321 (222aa)
221/222 (99%) id

> Lcp6 - Cat 1 residue, BLASTN finds CGless seq
SPTREMBL:P92184 (104aa) by BLASTP against predicted proteins at BDGP
=no proteins in GadFly
U84756(473bp) by BLASTN against all at BDGP
=gb|AE003563|Drosophila melanogaster genomic scaffold 142000013386050 section 50 of 54, complete sequence.
very approximately (need sim4 alignment)
330-13 = 242870-243193,
poor but 462-334 = 27439-27567

> Lcp65Ab1 - Cat 2 residue, BLASTN finds CGless seq
SPTREMBL:P92192 (104aa) by BLASTP against predicted proteins at BDGP
=no proteins in GadFly
U84747(947bp) by BLASTN against all at BDGP
=gb|AE003563|Drosophila melanogaster genomic scaffold 142000013386050 section 50 of 54, complete sequence.
very approximately (need sim4 alignment)
946-1 = 242514-243462

> Lcp65Ab2 - Cat 2 residue, BLASTN finds CGless seq
SPTREMBL:P92192 (104aa) by BLASTP against predicted proteins at BDGP
=no proteins in GadFly
U84746(855bp) by BLASTN against all at BDGP
=gb|AE003563|Drosophila melanogaster genomic scaffold 142000013386050 section 50 of 54, complete sequence.
very approximately (need sim4 alignment)
855-1 = 245477-246316

> Mat89Ba
SPTREMBL:Q27924 (545aa) by BLASTP against predicted proteins at BDGP
=CG6814|FBan0006814|CT21143|FBan0006814 last_updated:000321 (689aa)
CG has extra 27aa at N-term, a few polymorphisms

> Mst40 - possibly untranslated? (Aubrey Apr 4)
SPTREMBL:Q24437 (23aa) by BLASTP against predicted proteins at BDGP
=no proteins in GadFly
Z22588(1387bp) by BLASTN against all at BDGP
=gb|AE003466|Drosophila melanogaster genomic scaffold 142000013386038 section 15 of 15, complete sequence.
very approximately (need sim4 alignment)
1-1387 = 161926-163314
1-799 = 163309-164114, ~800-1387 = 164130-164722
(two closely related genes near each other)

> NaCP37B - Cat 1 residue
SPTREMBL:Q24308 (113aa) by BLASTP against predicted proteins at BDGP
=no proteins in GadFly
X84408(384bp) by BLASTN against all at BDGP
=CG9071|FBan0009071|CT24831|FBan0009071 last_updated:000321 (7375bp)
very approximately (need sim4 alignment)
1-240 = 6438-6677 of CG,
241-377 = 6700-6823 of CG

> ph-d
PIR:S23632 (1589aa) by BLASTP against predicted proteins at BDGP
=CG3895|FBan0003895|CT12875|FBan0003895 last_updated:000321 (1211aa)
from 130-594 of CG3895, CG missing exon, then a few other
polymorphisms till end of CG, then
=CG18414|FBan0018414|CT41888|FBan0018414 last_updated:000321 (290aa)
from 1300 to 1589 of S23632 for all of CG

> Pk1
SPTREMBL:Q24057 (36aa) by BLASTP against predicted proteins at BDGP
=no proteins in GadFly
U23827(110bp) by BLASTN against all at BDGP
=gb|AE002760|Drosophila melanogaster genomic scaffold 142000013386034, complete sequence.
1-110 = 24362-24471

> PpD19
SPTREMBL:Q26247 (24aa) by BLASTP against predicted proteins at BDGP
=CG10930|FBan0010930|CT30615|FBan0010930 last_updated:000321 (314aa)
24/24 (100%) id

> PpD3
PIR:AAB22463 (25aa) by BLASTP against predicted proteins at BDGP
=CG8402|FBan0008402|CT24679|FBan0008402 last_updated:000321
25/25 (100%) id

> PpD33
PIR:AAB22469 (24aa) by BLASTP against predicted proteins at BDGP
=CG9842|FBan0009842|CT27780|FBan0009842 last_updated:000321 (570aa)
24/24 (100%) id

> PpD5
PIR:AAB22464 (24aa) by BLASTP against predicted proteins at BDGP
=CG10138|FBan0010138|CT10817|FBan0010138 last_updated:000321 (346aa)
24/24 (100%) id

> PpD6
PIR:AAB22465 (24aa) by BLASTP against predicted proteins at BDGP
=CG8822|FBan0008822|CT25390|FBan0008822 last_updated:000321 (336aa)
24/24 (100%) id

> prd3
PIR:AAA28839 (38aa) by BLASTP against predicted proteins at BDGP
=no protein in GadFly
M14551(114bp) by BLASTN against all at BDGP
=CG10037|FBan0010037|CT28091|FBan0010037 last_updated:000321 (2556bp)
1-114 = 2122-2235

> Rbp10
PIR:I48110 (44aa) by BLASTP against predicted proteins at BDGP
=CG3151|FBan0003151|CT38165|FBan0003151 last_updated:000321 (68aa)
=CG3151|FBan0003151|CT10570|FBan0003151 last_updated:000321 (673aa)
43/43 (100%) id, 114-156 of CG

> Rbp11
PIR:A47752 (39aa) by BLASTP against predicted proteins at BDGP
=no protein in GadFly
closest match: CG17136|FBan0017136|CT38058|FBan0017136 last_updated:000321
 Length = 135

 Score = 152 (53.5 bits), Expect = 2.2e-12, P = 2.2e-12
 Identities = 29/39 (74%), Positives = 30/39 (76%)

Query: 1  FVGNLAPRRSKPRDRSAFAKYGPLRNVWVARNPPGFAFV 39
          +VGNL    SK     AFAKYGPLRNVWVARNPPGFAFV
Sbjct: 14 YVGNLGSSASKHEIEGAFAKYGPLRNVWVARNPPGFAFV 52

S51740(117bp) by BLASTN against all at BDGP
=CG17136|FBan0017136|CT38058|FBan0017136 last_updated:000321 (511bp)

 Plus Strand HSPs:

 Score = 490 (73.5 bits), Expect = 2.3e-16, P = 2.3e-16
 Identities = 110/117 (94%), Positives = 110/117 (94%), Strand = Plus / Plus

Query: 1   TTCGTGGGGAACCTGG-CTCCTCGGCGCTCCAAGCCACGAGATAGAAG-CGCATTTGCCA 58
           T CGTGGG AACCTGG CTCCTCGGCG TCCAAGC ACGAGATAGAAG CGCATTTGCCA
Sbjct: 93  TACGTGGGAAACCTGGGCTCCTCGGCG-TCCAAGC-ACGAGATAGAAGGCGCATTTGCCA 150

Query: 59  AATATGGACCCCTGCGAAACGTGTGGGTGGCCCGCAATCCACCAGGGTTCGCTTTCGTC 117
           AATATGGACCCCTGCGAAACGTGTGGGTGGCCCGCAATCCACCAGG TTCGC TT GTC
Sbjct: 151 AATATGGACCCCTGCGAAACGTGTGGGTGGCCCGCAATCCACCAGGTTTCGCCTTTGTC 209

> Rbp12
PIR:B47752 (41aa) by BLASTP against predicted proteins at BDGP
=CG5422|FBan0005422|CT17178|FBan0005422 last_updated:000321 (464aa)
=CG5422|FBan0005422|CT17194|FBan0005422 last_updated:000321 (464aa)
39/41 (95%) id

> Rbp2
PIR:B48110 (43aa) by BLASTP against predicted proteins at BDGP
=CG4429|FBan0004429|CT14414|FBan0004429 last_updated:000321 (325aa)
40/43 (93%) id, 32-74 of CG

> Rbp3
PIR:C48110 (44aa) by BLASTP against predicted proteins at BDGP
=CG17791|FBan0017791|CT39414|FBan0017791 last_updated:000321 (378aa)
31/32 (96%) id, 149-180 of CG

> Rbp5
PIR:E48110 (48aa) by BLASTP against predicted proteins at BDGP
=no protein in GadFly
S51706(144bp) by BLASTN against all at BDGP
overlaps opposite strand of CG3373|FBan0003373|CT11347|FBan0003373 last_updated:000321 (2291bp)

 Minus Strand HSPs:

 Score = 635 (95.3 bits), Expect = 2.6e-22, P = 2.6e-22
 Identities = 135/141 (95%), Positives = 135/141 (95%), Strand = Minus / Plus

Query: 143  ACGAAACCGAA-CCCACTACCAAGACGACAACTACTACAACCACGCCAAAGCCCACTACC 85
            AC A  CCGAA CCCACTACCAAGACGACAACTACTACAACCACGCCAAAGCCCACTACC
Sbjct: 1470 ACCACGCCGAAACCCACTACCAAGACGACAACTACTACAACCACGCCAAAGCCCACTACC 1529

Query: 84   ACAACGACAACCAAGAAGCCAACGACCACTACCACCACTACGACAACCACGCCGAAGCCA 25
            ACAACGACAACCAAGAAGCCAACGACCACTACCACCACTACGACAACCACGCCGAAGCCA
Sbjct: 1530 ACAACGACAACCAAGAAGCCAACGACCACTACCACCACTACGACAACCACGCCGAAGCCA 1589

Query: 24   ACAACTACTTAGTCCGCCAACA 3
            ACAACTACT AG CCGCC ACA
Sbjct: 1590 ACAACTACTAAG-CCGCCGACA 1610

> Rbp6
PIR:F48110 (44aa) by BLASTP against predicted proteins at BDGP
=no protein in GadFly
S51715(132bp) by BLASTN against
all at BDGP
=gb|AE003524|Drosophila melanogaster genomic scaffold 142000013386050 section 11 of 54, complete sequence.

 Plus Strand HSPs:

 Score = 430 (64.5 bits), Expect = 9.1e-13, P = 9.1e-13
 Identities = 90/95 (94%), Positives = 90/95 (94%), Strand = Plus / Plus

Query: 30     TCCAGAGAGCTTACGCGATTACTTCGGACGTTACGGTGATATCTCAGAGGCTATGGTCAT 89
              | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 314913 TTCAGAGAGCTTACGCGATTACTTCGGACGTTACGGTGATATCTCAGAGGCTATGGTCAT 314972

Query: 90     GAAGGATCCCACGACGCGCAGATCCAGAGGTTTTG 124
              |||||||||||||||||||||||||||  |  |||
Sbjct: 314973 GAAGGATCCCACGACGCGCAGATCCAGGTGAGTTG 315007

> Rbp7
PIR:G48110 (44aa) by BLASTP against predicted proteins at BDGP
=CG10377|FBan0010377|CT29098|FBan0010377 last_updated:000321 (421aa)
44/44 (100%) id, 10-53 of CG

> Rbp8
PIR:H48110 (36aa) by BLASTP against predicted proteins at BDGP
=CG10851|FBan0010851|CT41998|FBan0010851 last_updated:000321 (132aa)
=CG10851|FBan0010851|CT30371|FBan0010851 last_updated:000321 (329aa)
31/36 (86%) id

> repo - part of CG8045
PIR:A54282 (612aa) by BLASTP against predicted proteins at BDGP
=CG8045|FBan0008045|CT24072|FBan0008045 last_updated:000321 (612aa)

> Rya-r76CD - misassigned to CG10844
PIR:B49131 (508aa) by BLASTP against predicted proteins at BDGP
=CG10844|FBan0010844|CT30357|FBan0010844 last_updated:000321 (5107aa)
C terminal 4599-5107aa of CG
Z18536(1528bp) by BLASTN against all at BDGP
=CG10844|FBan0010844|CT30357|FBan0010844 last_updated:000321 (15606bp)
1-1528 = 13794-15325
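
For anyone who wants to pull the CG assignments out of this list programmatically, here is a minimal Python sketch. It only handles the most common hit form used above, `CG...|FBan...|CT...|FBan... last_updated:YYMMDD (NNNaa)`. The names `GadFlyHit` and `parse_hit` are made up for illustration and are not part of any BDGP or GadFly tool. Lines such as `=no protein in GadFly`, or the differently shaped Crlbp entry, simply return `None`.

```python
import re
from typing import NamedTuple, Optional


class GadFlyHit(NamedTuple):
    """One 'CG|FBan|CT|FBan' hit line (hypothetical record type)."""
    cg: str
    fban: str
    ct: str
    updated: Optional[str]   # six-digit date string, e.g. "000321"
    length: Optional[int]    # length in residues or bases, if given
    unit: Optional[str]      # "aa" or "bp", if given


# Assumed pattern, inferred only from the entries in this message.
_HIT = re.compile(
    r"(?P<cg>CG\d+)\|(?P<fban>FBan\d+)\|(?P<ct>CT\d+)\|FBan\d+"
    r"(?:\s+last_updated:(?P<updated>\d{6}))?"
    r"(?:\s+\((?P<length>\d+)(?P<unit>aa|bp)\))?"
)


def parse_hit(line: str) -> Optional[GadFlyHit]:
    """Return the parsed hit, or None if the line has no hit in this form."""
    m = _HIT.search(line)
    if m is None:
        return None
    length = m.group("length")
    return GadFlyHit(
        cg=m.group("cg"),
        fban=m.group("fban"),
        ct=m.group("ct"),
        updated=m.group("updated"),
        length=int(length) if length is not None else None,
        unit=m.group("unit"),
    )


if __name__ == "__main__":
    print(parse_hit("=CG7899|FBan0007899|CT6619|FBan0007899 last_updated:000321 (438aa)"))
    print(parse_hit("=no protein in GadFly"))
```

The length and date are optional in the pattern because some entries above (PpD3, for example) give no length after `last_updated`.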