Chado query HOWTO: Location information by annotation ID

A forum for discussing Power User related features of FlyBase such as using Chado, GFF, FASTA files, etc...

Chado query HOWTO: Location information by annotation ID

Postby Josh Goodman » Thu May 08, 2008 11:00 am

The purpose of this query is to fetch location information for genes when all you have is the annotation ID. Please note that the annotation IDs are distinct from what we consider FlyBase IDs (FBgn0000490). Annotation IDs in Dmel start with CG or CR and are followed by a numeric suffix (eg. CG9885). Annotation IDs in the other species have different 2 letter prefixes (see Gene and Annotation IDs for more information).

Given an annotation ID this query returns the FlyBase ID, gene symbol, the supplied annotation ID, the chromosome the gene is localized on, the start/stop and strand information, and the cytogenetic map.

The Query
Code: Select all
select gene.uniquename as ID, gene.name as symbol, dbx.accession as annotation_id,
       arm.uniquename as chromosome, (fl.fmin + 1 ) as start, fl.fmax as stop, fl.strand as strand,
       fp.value as cytogenetic_map

       from feature gene, feature arm, featureloc fl, featureprop fp, feature_dbxref fdbx,
            dbxref dbx, db, cvterm fp_type, cvterm gene_type

       where dbx.accession='CG9885' and fdbx.is_current=true and db.name='FlyBase Annotation IDs' and
             gene_type.name='gene' and gene.is_analysis=false and gene.is_obsolete=false and fp_type.name='cyto_range' and
             dbx.db_id=db.db_id and dbx.dbxref_id=fdbx.dbxref_id and fdbx.feature_id=gene.feature_id and
             gene.type_id=gene_type.cvterm_id and gene.feature_id=fp.feature_id and fp.type_id=fp_type.cvterm_id and
             gene.feature_id=fl.feature_id and fl.srcfeature_id=arm.feature_id;


If you prefer more of a batch mode you can do this
Code: Select all
select gene.uniquename as ID, gene.name as symbol, dbx.accession as annotation_id,
       arm.uniquename as chromosome, (fl.fmin + 1 ) as start, fl.fmax as stop, fl.strand as strand,
       fp.value as cytogenetic_map

       from feature gene, feature arm, featureloc fl, featureprop fp, feature_dbxref fdbx,
            dbxref dbx, db, cvterm fp_type, cvterm gene_type

       where dbx.accession ~ '^C[GR][0-9]+$' and fdbx.is_current=true and db.name='FlyBase Annotation IDs' and
             gene_type.name='gene' and gene.is_analysis=false and gene.is_obsolete=false and fp_type.name='cyto_range' and
             dbx.db_id=db.db_id and dbx.dbxref_id=fdbx.dbxref_id and fdbx.feature_id=gene.feature_id and
             gene.type_id=gene_type.cvterm_id and gene.feature_id=fp.feature_id and fp.type_id=fp_type.cvterm_id and
             gene.feature_id=fl.feature_id and fl.srcfeature_id=arm.feature_id;


Possible problems
This query only fetches information for records whose current annotation ID matches the supplied ID. To work around this you may want to try removing the fdbx.is_current=true filter. This will result in duplications so you should test for this and flag them for spot checks.
Josh Goodman
Site Admin
 
Posts: 64
Joined: Mon Nov 26, 2007 2:39 pm

Return to Power Users

cron