by robkulathinal » Thu Sep 04, 2008 11:52 am
Hi Sandra,
I hope that you are doing well. So let me get this straight: from a provided set of Dmel R5 genomic coordinates (chr start end), you are looking for corresponding genomic coordinates from all the other species?
At this moment, FlyBase does not contain mapped genome-to-genome alignments from AAA (either from the UCSC genome multi-Z alignments or the Mavid/Mercator alignment pipeline of Lior Pachter).
Chado does, however, include an inventory of 1-1 orthology relationships between proteins, taken from the AAA group. (These relationships were generated by the Mike Eisen and Andy Clark groups using fuzzy BLAST criteria and filtered multiple alignments.) Since only those Dmel proteins that have a single ortholog in *each* species of the 6-species melanogaster subgroup (or the 12-species Drosophila genus) are included, only a subset (albeit a large one) of the entire proteome is represented.
So, if you go through FlyBase, you will need a combination of Perl and SQL. First, map your Dmel genomic coordinates to a protein. Second, use Chado to pull out the orthologous protein, *if* it is represented. Lastly, extract the genomic coordinates of the gene that encodes it from the non-Dmel assembly.
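To make the three steps concrete, here is a rough sketch of the logic. It is written in Python rather than Perl and SQL, and small in-memory tables stand in for the Chado queries, so every identifier and coordinate below (`DMEL_PROTEINS`, `ORTHOLOGS`, `Dsim_P1`, and so on) is made up for illustration and is not real FlyBase data or schema.

```python
# Hypothetical toy data standing in for results of Chado queries.
# (protein_id, chromosome arm, gene start, gene end) on the Dmel R5 assembly.
DMEL_PROTEINS = [
    ("Dmel_P1", "2L", 1000, 5000),
    ("Dmel_P2", "2L", 8000, 9000),
]

# 1-1 orthology table: only proteins with an ortholog in each species appear.
ORTHOLOGS = {
    "Dmel_P1": {"Dsim": "Dsim_P1", "Dyak": "Dyak_P1"},
}

# Gene coordinates of each ortholog on its own (non-Dmel) assembly.
ORTHOLOG_GENE_COORDS = {
    "Dsim_P1": ("2L", 1200, 5150),
    "Dyak_P1": ("2L", 30500, 34400),
}


def proteins_overlapping(chrom, start, end):
    """Step 1: find Dmel proteins whose gene overlaps the query interval."""
    return [
        pid
        for pid, c, s, e in DMEL_PROTEINS
        if c == chrom and s <= end and start <= e
    ]


def orthologous_coordinates(chrom, start, end):
    """Steps 2 and 3: look up orthologs, then their gene coordinates."""
    results = {}
    for pid in proteins_overlapping(chrom, start, end):
        # A protein missing from the 1-1 table simply yields nothing.
        for species, orth in ORTHOLOGS.get(pid, {}).items():
            coords = ORTHOLOG_GENE_COORDS[orth]
            results.setdefault(species, []).append((orth, coords))
    return results


print(orthologous_coordinates("2L", 4000, 4500))
# {'Dsim': [('Dsim_P1', ('2L', 1200, 5150))], 'Dyak': [('Dyak_P1', ('2L', 30500, 34400))]}
print(orthologous_coordinates("2L", 8500, 8600))
# {} because Dmel_P2 is not in the 1-1 orthology table
```

Note that what comes back is the coordinate range of the whole orthologous gene, not a base-by-base mapping of your original interval, and that any region falling outside a represented protein returns nothing.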
Of course, the other option is to use the UCSC chain files to pull the coordinates directly from the other species.
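In practice you would probably just hand the chain file to UCSC's liftOver utility, but to show what a chain file encodes, here is a minimal Python sketch that maps a single 0-based position through the ungapped blocks of a chain. The chain below (`TOY_CHAIN`, with `chr2L` and `scaf_1`) is invented for illustration, and the sketch assumes the reference strand is "+" and that the Dmel assembly is the reference ("target") side of the file.

```python
# A made-up chain in UCSC chain format. Header fields:
# chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
# Block lines: "size dt dq" (ungapped size, then gap in target, gap in query);
# the last block of a chain has only "size".
TOY_CHAIN = """\
chain 1000 chr2L 1000 + 100 250 scaf_1 800 + 50 210 1
50 10 20
90
"""


def map_position(chain_text, chrom, pos):
    """Map a 0-based reference position to the other assembly.

    Returns (name, position, strand) with the position on the forward
    strand of the other assembly, or None if the position falls in a gap
    or outside every chain.
    """
    header = None
    t_cur = q_cur = 0
    for line in chain_text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the current chain
            continue
        if fields[0] == "chain":
            header = fields
            t_cur, q_cur = int(fields[5]), int(fields[10])
            continue
        if header is None or header[2] != chrom:
            continue
        size = int(fields[0])
        if t_cur <= pos < t_cur + size:
            q_pos = q_cur + (pos - t_cur)
            q_name, q_size, q_strand = header[7], int(header[8]), header[9]
            if q_strand == "-":
                # Minus-strand coordinates count from the other end.
                q_pos = q_size - 1 - q_pos
            return q_name, q_pos, q_strand
        if len(fields) == 3:
            # Skip the block plus the gap that follows it on each side.
            t_cur += size + int(fields[1])
            q_cur += size + int(fields[2])
    return None


print(map_position(TOY_CHAIN, "chr2L", 120))  # ('scaf_1', 70, '+')
print(map_position(TOY_CHAIN, "chr2L", 170))  # ('scaf_1', 130, '+')
print(map_position(TOY_CHAIN, "chr2L", 155))  # None (falls in a gap)
```

This returns the first chain that covers the position and ignores chain scores, so it is only meant to illustrate the format, not to replace liftOver for real coordinate conversion.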
I hope that this helps.
Rob.