Subject: FlyBase query

Dear Dr. Goldstein,

I am curating your paper for FlyBase:

Bowman et al., 1999, J. Cell Biol. 146(1): 165--180

I have a couple of questions about the 'l(2)k10408' line.

In your paper you discovered that this line is actually a deficiency that takes out robl and several other genes. This line is one of the Berkeley P-element lines and according to our files has a P{lacW} insertion at 54C. Do you know where this P{lacW} insertion is in relation to the deficiency that you discovered in the line? Is the P{lacW} insertion at one of the deficiency breakpoints, or is the deficiency a separate event from the P{lacW} insertion? Do you know which gene the P{lacW} insertion is in? From looking at Figure 1 of your paper, if the P{lacW} insertion is at one of the deficiency breakpoints, it looks to me as though it might be in the transcription unit of 'GENE 1'.

Any information you provide about this line would be recorded in FlyBase as a personal communication from you to FlyBase.

I look forward to hearing from you,

Gillian

--------------------------------------------------------------
Gillian Millburn.

FlyBase (Cambridge),
--------------------------------------------------------------

Subject: Re: FlyBase query

Regarding your inquiry about our paper 'Bowman et al., 1999, J. Cell Biol. 146(1): 165--180' and the 'l(2)k10408' line:

We found that the P-element is located at the deficiency breakpoint. The P-element is oriented such that the sequence off the 3' end of the P-element (using the orientation defined by Berkeley) is the sequence to the left of the breakpoint indicated in Figure 1 of our paper, while the sequence flanking the 5' end of the P-element reads past the distal (right) breakpoint (this breakpoint is not shown in our paper). I will attempt to diagram this below (in the same orientation as Figure 1):

--gene 1 --| (proximal/left break) 3' ..... l(2)k10408 ............ 5' (distal/right break) |----------

The closest gene we found to the right of the P-element (and thus also to the right of the deficiency) is a gene related to a '5'-nucleotidase precursor', and the closest gene at the right/distal break (which is within the deficiency) is p62GAP (which is also called QKR54B).

The deficiency associated with robl k10408 is about 36,000 basepairs and has the following approximate genomic structure:

gene 1, (breakpoint), gene 2, gene 3, robl, gene 4, QKR54B, (breakpoint), 5' nucleotidase precursor, ......

I also have the following information regarding the identity of these genes.

gene 1 is a putative novel G-protein coupled receptor and is also encoded by an EST (GM02553). The left breakpoint of l(2)k10408 is about 30 basepairs to the right of the putative start codon for this gene.

gene 2 is a putative trypsin-like serine protease.

gene 3 is a robl-like gene which we believe may be a pseudogene.

gene 4 is a transposase-like gene which is represented by multiple ESTs. It is also unusual in that it appears to have a ~5 kb insertion within the middle of the gene, encoding a B104 retrotransposon and sequence which is found at multiple sites along the Drosophila genome (e.g. 35A). This sequence appears to have been distributed throughout the Drosophila genome, perhaps by the B104 retrotransposon, which appears to flank this sequence at all its locations within the genome. Gene 4 appears to splice over this unusual B104 retrotransposon-containing sequence.

QKR54B has been described in the literature.

I hope this information helps. If you have any other questions let me know.

Sincerely,

Aaron Bowman

Subject: Re: FlyBase query

Dear Aaron,

Thank you for your reply; it's really great and will help clarify things in the database. I will record the information in it as a personal communication from you to FlyBase.

I have a few questions about the 'k10408' deficiency just to clarify things further.

1. Do you have names/symbols for 'gene 1', 'gene 2' etc. that are in Figure 1 of your paper? At the moment I have given them an 'anonymous' designation of the format 'anon-54Ba', 'anon-54Bb' etc. This is what we do with transcripts that are shown on a molecular map but aren't really named (and then when they are named properly we change the name to the 'proper' name). However, if you already have symbols for them then I could use those.

Also, do you have a symbol for the '5' nucleotidase precursor' that is to the right of the deficiency?

For gene 1, since you gave me an EST number, I can call the gene 'BEST:GM02553' for 'Berkeley EST GM02553' as a temporary name until it gets a 'proper' symbol.

2.
Could you tell me which ESTs match gene 4?

3. >QKR54B has been described in the literature.

I'm not sure if I've managed to find which gene you mean. Is it 'qkr54B' ('quaking related 54B'), which was described in:

Fyrberg et al., 1998, Biochem. Genet. 36(1-2): 51--64

Gillian

Subject: Re: FlyBase query

I've pasted each of your questions below, with an answer below each.

Aaron

One note before I start: all the genes shown in the figure are contained (at least in part) within the genomic sequence that has been submitted to GenBank for the robl genomic interval (Accession # AF141921).

>1. do you have names/symbols for 'gene 1', 'gene 2' etc. that are in
>figure 1 of your paper. At the moment I have given them an 'anonymous'
>designation of the format 'anon-54Ba', 'anon-54Bb' etc. - this is what
>we do with transcripts that are shown on a molecular map but aren't
>really named (and then when they are named properly we change the name
>to the 'proper' name). However if you already have symbols for them
>then I could use those.

I don't actually have any names or symbols for the genes, as they just show similarity to non-Drosophila genes but don't seem to be exact homologs of any specific named gene. Given the similarity of gene 3 to roadblock, perhaps I should choose a name for this one. How about: roadblock similar 54B (robls54B)

Also, it should be noted that we have not identified transcripts for gene 2 and gene 3 (though we haven't really tried either); their existence as 'genes' is based solely on homology between genomic sequence and other genes.

>Also do you have a symbol for the '5' nucleotidase precursor' that is
>to the right of the deficiency ?
I do not, but apparently other 5'-nucleotidase precursors (from rat, human and other species) are given the symbol (5'-NT), (CD73 Antigen) or just (CD73).

>For gene 1, since you gave me an EST number, then I can call the gene
>'BEST:GM02553' for 'Berkeley EST GM02553' as a temporary name until it
>gets a 'proper' symbol.

OK. Please note I have sequenced this entire EST. This sequence has not been deposited anywhere. I'll paste the sequence of this gene at the end of this message.

>2. Could you tell me which ESTs match gene 4 ?

There are a total of 7 ESTs which I had identified. A group of 5 overlap with each other and are at the 5' end of the gene (leftmost on my map in the paper). These then appear to overlap with another two ESTs which are at the 3' end of the gene (and are off the right side of my map).

The ESTs are:
Group of 5 = LD10890, LD15733, LD09893, LD02955, LD03707
The other two = LD13441 and GM03740

The order of the overlap is as follows: 5 ESTs, LD13441, GM03740

Sequencing of the 3' end of LD03707 (one of the 5 ESTs) identified sequence overlapping with GM03740, which placed all these ESTs as belonging to the same gene.

The EST sequence which seems to contain the most coding sequence (the basis of saying the gene encodes a transposase-like protein) is LD13441, so if you will be naming the gene based upon the ESTs this is probably the one to name the gene after.

>3. QKR54B has been described in the literature.
>
>I'm not sure if I've managed to find which gene you mean - is it
>'qkr54B' - 'quaking related 54B' which was described in:
>
>Fyrberg et al., 1998, Biochem. Genet. 36(1-2): 51--64

This is the correct gene.
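
The reasoning in the answer to question 2, that one bridging overlap places all seven ESTs in the same gene, can be sketched as a connectivity check. This is only an illustration: the thread gives the group of 5, the stated order, and the LD03707/GM03740 overlap, but not the individual pairwise overlaps, so the chain within the group of 5 and the LD13441/GM03740 link are assumptions, and the function names are hypothetical.

```python
from collections import defaultdict

# ESTs named in the thread for gene 4.
FIVE_PRIME_GROUP = ["LD10890", "LD15733", "LD09893", "LD02955", "LD03707"]
BRIDGE = ("LD03707", "GM03740")  # overlap found by sequencing the 3' end of LD03707


def build_overlaps(include_bridge=True):
    """Return overlap pairs; only BRIDGE is stated pairwise in the thread."""
    # Assumption: the group of 5 is modelled as a simple chain of overlaps.
    edges = list(zip(FIVE_PRIME_GROUP, FIVE_PRIME_GROUP[1:]))
    # Assumption: LD13441 and GM03740 overlap, as read from the stated order.
    edges.append(("LD13441", "GM03740"))
    if include_bridge:
        edges.append(BRIDGE)
    return edges


def connected_groups(edges):
    """Group ESTs that are linked, directly or indirectly, by overlaps."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


if __name__ == "__main__":
    print(len(connected_groups(build_overlaps())))                      # with the bridge
    print(len(connected_groups(build_overlaps(include_bridge=False))))  # without it
```

Under these assumptions, dropping the LD03707/GM03740 overlap leaves the 5' group separate from the two 3' ESTs, which is why that one sequencing result matters.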
--------------------------------------------------------------------------------------------
Sequence of EST GM02553 (1875 basepairs)

CAGTCGGAAATGCGAATAGTTATTGGATCGTTCACCGCATTTCTTTTGCTGTTATTGCAAAACTCAAATGCCGAAATTCCCGGTTGCGACTTCTTCGACACCGTAGATATTTCAAAAGCGCCAAGATTCTCGAACGGATCGTACCTCTACGAAGGCTTGCTGATCCCCGCCCATTTGACAGCTGAATATGACTACAAGCTCCTGGCCGACGATTCGAAGGAGAAGGTGGCGAGCCACGTACGAGGATGTGCCTGCCACCTCAGGCCATGCATTCGGTTTTGTTGC

Aaron Bowman

Subject: Re: FlyBase query

Dear Aaron,

Thanks for your reply about the names of the genes in Figure 1 of your paper and the extra info about ESTs etc.

Here's what I think is the best solution for naming each gene:

1. Gene 1 - this will be called 'BEST:GM02553' for the EST that matches it. This name will be replaced once it is named 'properly' by someone.

2. Gene 2 - since no EST has been identified for this, it will get the 'anonymous' name 'anon-54Ba' until it is named properly.

3. Gene 3 - since this is similar to robl, I think the symbol and name you suggested, 'roadblock similar 54B (robls54B)', is fine. I will record as a comment in your paper that this may be a pseudogene.

4. Gene 4 - I will name this 'BEST:LD13441' as you suggested, since the EST LD13441 contains the most coding sequence for it.

5. 5' nucleotidase precursor - I am reluctant to name this after the rat and human genes. I think at the moment I will give the gene the valid symbol '5'-nucleotidase-precursor', i.e. the symbol will be the same as the full name. We do this as a temporary measure when people give long names to genes but not a short symbol. Then when the gene gets a symbol the symbol can change.

I hope all the symbols are OK with you. I'll curate the information in your messages as a personal communication from you to FlyBase.

That's it, thanks for taking so much time over this,

Gillian
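
The five naming decisions in the last message can be summarised as a small rule, sketched here in Python. This is an illustration only, not FlyBase's actual procedure: the function name, its parameters, the order of preference between the cases, and the hyphenation step (generalised from the single '5'-nucleotidase-precursor' example) are all assumptions.

```python
from typing import Optional


def provisional_symbol(
    author_symbol: Optional[str] = None,
    best_est: Optional[str] = None,
    full_name: Optional[str] = None,
    anon_designation: Optional[str] = None,
) -> str:
    """Pick a temporary gene symbol along the lines used in the thread.

    The precedence below is an assumption made for this sketch.
    """
    if author_symbol:
        # Gene 3: the author suggested a symbol, so it is used directly.
        return author_symbol
    if best_est:
        # Genes 1 and 4: named after the matching Berkeley EST.
        return f"BEST:{best_est}"
    if full_name:
        # 5' nucleotidase precursor: the symbol mirrors the full name.
        # Assumption: spaces become hyphens, as in the one example given.
        return full_name.replace(" ", "-")
    if anon_designation:
        # Gene 2: no EST and no name, so the anonymous designation stays.
        return anon_designation
    raise ValueError("no information available to build a symbol")


if __name__ == "__main__":
    print(provisional_symbol(best_est="GM02553"))
    print(provisional_symbol(anon_designation="anon-54Ba"))
    print(provisional_symbol(author_symbol="robls54B"))
    print(provisional_symbol(best_est="LD13441"))
    print(provisional_symbol(full_name="5' nucleotidase precursor"))
```

Each call corresponds to one of the numbered decisions above; all of these symbols are described in the thread as temporary until the gene is named properly.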