query performance anomoly

A forum for discussing Power User related features of FlyBase such as using Chado, GFF, FASTA files, etc...

query performance anomoly

Postby malcook » Fri May 29, 2009 4:26 pm

Hi,

I am at a loss to explain why the example Clone Report query you provide as a Flybase Mapping Table (http://gmod.org/wiki/FlyBase_Field_Mapp ... one_Report) for "(Computed) gene" is so FAST, but when I make a TINY CHANGE to the query, and replace "uniquename" with "name", it is now so SLOW. I have examined the indices on the underlying tables and don't understand why there is such a radical change in postresql's query plan that yields this performance degradation.

Any help explaining why would be much appreciated.

To be clear.

Here is the FAST QUERY from your pages:

SELECT DISTINCT g.name
FROM feature c, feature evidence, feature tr, feature g, feature_relationship fr1,
feature_relationship fr2, feature_relationship fr3, cvterm evidence_type, cvterm fr2_type, cvterm g_type
WHERE c.uniquename='FBcl0000001' AND evidence_type.name IN ('cDNA','EST') AND
fr2_type.name='supports' AND g_type.name = 'gene' AND
c.feature_id=fr1.object_id AND fr1.subject_id=evidence.feature_id AND
evidence.type_id=evidence_type.cvterm_id AND evidence.feature_id=fr2.subject_id AND
fr2.type_id=fr2_type.cvterm_id AND fr2.object_id=tr.feature_id AND
tr.feature_id=fr3.subject_id AND fr3.object_id=g.feature_id AND
g.type_id=g_type.cvterm_id;

and here is the slow query (with a very different query plan)

SELECT DISTINCT g.name
FROM feature c, feature evidence, feature tr, feature g, feature_relationship fr1,
feature_relationship fr2, feature_relationship fr3, cvterm evidence_type, cvterm fr2_type, cvterm g_type
WHERE c.name='UUGC0315' AND evidence_type.name IN ('cDNA','EST') AND
fr2_type.name='supports' AND g_type.name = 'gene' AND
c.feature_id=fr1.object_id AND fr1.subject_id=evidence.feature_id AND
evidence.type_id=evidence_type.cvterm_id AND evidence.feature_id=fr2.subject_id AND
fr2.type_id=fr2_type.cvterm_id AND fr2.object_id=tr.feature_id AND
tr.feature_id=fr3.subject_id AND fr3.object_id=g.feature_id AND
g.type_id=g_type.cvterm_id;


I am trying to assist a colleague to compose a query that associates clone identifiers, clone IDs (FBclxxxx) with flybase gene names

Thanks,

Malcolm Cook
Malcolm Cook - Stowers Institute for Medical Research
malcook
 
Posts: 8
Joined: Mon Mar 31, 2008 12:19 pm
Location: kansas city

Re: query performance anomoly

Postby Josh Goodman » Fri May 29, 2009 9:27 pm

Hi Malcom,

I've seen the query planner do some strange things before but this one is really bizarre. Here is a first attempt to force the query planner to make better use of the indices. This works but I'm not happy with the fact that the original query had to be twisted around so much.

Josh

Code: Select all
SELECT (SELECT DISTINCT g.name
        FROM   feature evidence,
               feature tr,
               feature g,
               feature_relationship fr1,
               feature_relationship fr2,
               feature_relationship fr3,
               cvterm evidence_type,
               cvterm fr2_type,
               cvterm g_type
        WHERE  evidence_type.name IN ('cDNA','EST')
               AND fr2_type.name = 'supports'
               AND g_type.name = 'gene'
               AND c.feature_id = fr1.object_id
               AND fr1.subject_id = evidence.feature_id
               AND evidence.type_id = evidence_type.cvterm_id
               AND evidence.feature_id = fr2.subject_id
               AND fr2.type_id = fr2_type.cvterm_id
               AND fr2.object_id = tr.feature_id
               AND tr.feature_id = fr3.subject_id
               AND fr3.object_id = g.feature_id
               AND g.type_id = g_type.cvterm_id)
FROM   feature c
WHERE  c.name = 'UUGC0315';
Josh Goodman
Site Admin
 
Posts: 64
Joined: Mon Nov 26, 2007 2:39 pm


Return to Power Users