Does Chado contain information about sequence alignments?


Post by sandra » Fri Aug 22, 2008 7:11 am

Hi all,

I am wondering whether there is a way, using Chado, to find out the following:

Given:
Dmel, chr arm, start, end

Query for:
The homologous region(s) of DNA and their chromosomal coordinates
in the other 11 Drosophila species (as derived from the AAA whole-genome (gapped) alignments)?

Thanks a lot

-Sandra

Re: Does Chado contain information about sequence alignments?

Post by robkulathinal » Thu Sep 04, 2008 11:52 am

Hi Sandra,

I hope that you are doing well. So let me get this straight: from a provided set of Dmel R5 genomic coordinates (chr start end), you are looking for corresponding genomic coordinates from all the other species?

At the moment, FlyBase does not contain mapped genome-to-genome alignments from AAA (either the UCSC multiz genome alignments or the output of Lior Pachter's MAVID/Mercator alignment pipeline).

Chado does, however, include an inventory of 1-1 orthology relationships between proteins, taken from the AAA group. (These relationships were generated by the Mike Eisen and Andy Clark groups using fuzzy BLAST criteria and filtered multiple alignments.) Since only those Dmel proteins that have a single ortholog in *each* species of the 6-species melanogaster subgroup (or the 12-species Drosophilid genus) are included, only a subset (albeit a large subset) of the entire proteome is represented.

So, if you use FlyBase, you would need a combination of Perl and SQL. First, map your Dmel genomic coordinates to a protein. Second, use Chado to pull out the orthologous protein, *if* it is represented. Lastly, extract the genomic coordinates of the gene encoding that ortholog from the non-Dmel assembly.
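As a rough illustration of those three steps, here is a minimal sketch in Python (rather than Perl) against a toy in-memory SQLite database. The table names `feature`, `featureloc` and `feature_relationship` are real Chado tables, but everything else is an assumption made for brevity: the plain-text `species` and `type` columns (real Chado uses the `organism` and `cvterm` tables), the `orthologous_to` relationship name and its direction, the made-up rows, and the shortcut of linking genes directly instead of going through proteins. Check the actual FlyBase schema before adapting it.

```python
import sqlite3

# Toy stand-in for three Chado tables. Real Chado keeps types in cvterm and
# species in organism; plain text columns are a simplification made here.
SCHEMA = """
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT,
                      species TEXT, type TEXT);
CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                         fmin INTEGER, fmax INTEGER, strand INTEGER);
CREATE TABLE feature_relationship (subject_id INTEGER, object_id INTEGER,
                                   type TEXT);
"""

# Made-up rows for illustration only; none of these are real FlyBase records.
FEATURES = [
    (1, "toy_arm", "Dmel", "chromosome_arm"),
    (2, "dmel_gene_A", "Dmel", "gene"),
    (3, "toy_scaffold", "Dsim", "scaffold"),
    (4, "dsim_gene_A", "Dsim", "gene"),
]
LOCS = [(2, 1, 1000, 2000, 1), (4, 3, 500, 1500, -1)]
RELS = [(2, 4, "orthologous_to")]  # relationship name is an assumption

# Step 1: Dmel genes overlapping the query region.
# Step 2: follow the orthology relationship, if one is recorded.
# Step 3: report where the ortholog sits on its own assembly.
QUERY = """
SELECT dg.uniquename, og.species, og.uniquename,
       osrc.uniquename, ol.fmin, ol.fmax, ol.strand
FROM feature dg
JOIN featureloc dl   ON dl.feature_id = dg.feature_id
JOIN feature dsrc    ON dsrc.feature_id = dl.srcfeature_id
JOIN feature_relationship fr
                     ON fr.subject_id = dg.feature_id
                    AND fr.type = 'orthologous_to'
JOIN feature og      ON og.feature_id = fr.object_id
JOIN featureloc ol   ON ol.feature_id = og.feature_id
JOIN feature osrc    ON osrc.feature_id = ol.srcfeature_id
WHERE dg.species = 'Dmel' AND dg.type = 'gene'
  AND dsrc.uniquename = ?
  AND dl.fmin < ? AND dl.fmax > ?
"""


def build_toy_db():
    """Create the in-memory toy database and load the made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)", FEATURES)
    conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?)", LOCS)
    conn.executemany("INSERT INTO feature_relationship VALUES (?, ?, ?)", RELS)
    return conn


def orthologous_regions(conn, arm, start, end):
    """Return ortholog locations for Dmel genes overlapping [start, end).

    Chado featureloc uses interbase (0-based, half-open) coordinates, so two
    intervals overlap when fmin < end and fmax > start.
    """
    return conn.execute(QUERY, (arm, end, start)).fetchall()


if __name__ == "__main__":
    for row in orthologous_regions(build_toy_db(), "toy_arm", 1500, 1600):
        print(row)
```

A region that overlaps no gene with a recorded ortholog simply returns an empty list, which matches the caveat that only a subset of the proteome is covered.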

Of course, the other option is to use the UCSC chain files to pull the coordinates from the other species directly.

I hope that this helps.

Rob.

Re: Does Chado contain information about sequence alignments?

Post by caseybergman » Mon Oct 06, 2008 12:46 am

Hi Sandra -

Following up on Rob's suggestion, you can try the UCSC Genome Browser LiftOver tool <http://genome.ucsc.edu/cgi-bin/hgLiftOver>. It was originally intended for mapping coordinates between different releases of the same species, but it can also map coordinates from D. mel to most of the other species. The Apr 2004 (dm2 = Release 4) version has more of the 12 genomes than the Apr 2006 (dm3 = Release 5) version, so you can lift over from R5->R4->genomeX for all species but D. willistoni. You can then get sequences for these coordinates in each species using the Table Browser <http://genome.ucsc.edu/cgi-bin/hgTables?>. Alternatively, this can be done locally by downloading the relevant UCSC chain files and the liftOver utility.
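For the local route, the two-hop R5->R4->genomeX mapping could be scripted roughly as below. This is only a sketch: it assumes the input coordinates are 1-based and inclusive (BED is 0-based, half-open), that UCSC chromosome names carry a `chr` prefix, and that the chain file names shown in the usage note are placeholders for whatever files you actually download. The function names are hypothetical helpers. The only external call is the liftOver utility with its standard four positional arguments (input file, chain file, output file, unmapped file), and the sketch just builds the command lines without running them.

```python
def to_bed_line(arm, start, end, name="region"):
    """Convert an assumed 1-based inclusive region to a BED line.

    BED is 0-based and half-open, so only the start is shifted by one.
    """
    return f"chr{arm}\t{start - 1}\t{end}\t{name}"


def liftover_commands(bed_in, chains, prefix="lifted"):
    """Build one liftOver command per chain file, feeding each hop's
    output into the next hop.

    Each command has the form: liftOver oldFile map.chain newFile unMapped
    """
    commands = []
    current = bed_in
    for step, chain in enumerate(chains, start=1):
        out = f"{prefix}.step{step}.bed"
        unmapped = f"{prefix}.step{step}.unmapped"
        commands.append(["liftOver", current, chain, out, unmapped])
        current = out
    return commands


if __name__ == "__main__":
    # Placeholder chain file names; substitute the files you downloaded.
    chains = ["R5_to_R4.over.chain", "R4_to_genomeX.over.chain"]
    print(to_bed_line("2L", 1000, 2000))
    for cmd in liftover_commands("query.bed", chains):
        print(" ".join(cmd))
```

Each printed command can be run by hand or passed to `subprocess.run(cmd, check=True)`. Regions that fail to map at either hop end up in the corresponding `.unmapped` file, so it is worth inspecting those before trusting the final coordinates.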

Best,
Casey

Casey Bergman, Ph.D.
Faculty of Life Sciences
University of Manchester
Michael Smith Building
Oxford Road, M13 9PT
Manchester, UK
http://www.bioinf.manchester.ac.uk/bergman/

