FB2017_01, released February 14, 2017

A Database of Drosophila Genes & Genomes

TermLink and controlled vocabularies

You will have noticed the box near the top of the FlyBase Home Page called TermLink. What is this? Why should you care?

The information you retrieve from a FlyBase query - no matter how simple - is stored in a relational database. For data to be retrievable it must be 'structured', and a very important part of data structure is the language that FlyBase uses in its database.

Let us imagine that you, as a user, want to find all genes that FlyBase records as being expressed in the mouth parts of the larva. You may know these structures as the 'mouth parts', but others may, with equal validity, call them the 'cephalopharyngeal sclerites' or the 'cephalopharyngeal plates'. If FlyBase used each of these synonymous names with abandon, then a simple query on 'mouth parts' would retrieve some, but by no means all, of the relevant records. It makes your life much easier if FlyBase always uses the same term for these structures (we use 'cephalopharyngeal sclerites' in this case). So, one aspect of the terms listed in TermLink is that they represent controlled vocabularies. The other names for the cephalopharyngeal sclerites are listed as synonyms.
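To make the idea concrete, here is a minimal sketch of how a controlled vocabulary with synonyms might resolve a query. It is an illustration only, not FlyBase's implementation: the table names, the function genes_expressed_in and the gene names geneA and geneB are all made up for the example.

```python
# Hypothetical controlled vocabulary: one canonical term and its synonyms.
SYNONYMS = {
    "cephalopharyngeal sclerites": ["mouth parts", "cephalopharyngeal plates"],
}

# Invert the table so that any name, canonical or synonym,
# resolves to the single canonical term.
CANONICAL = {}
for term, synonyms in SYNONYMS.items():
    CANONICAL[term] = term
    for name in synonyms:
        CANONICAL[name] = term

# Made-up expression records, always stored under the canonical term.
EXPRESSION = {
    "cephalopharyngeal sclerites": {"geneA", "geneB"},
}


def genes_expressed_in(name):
    """Resolve a user-supplied name to its canonical term, then look up genes."""
    term = CANONICAL.get(name.strip().lower())
    if term is None:
        return set()  # the name is not in the vocabulary
    return EXPRESSION.get(term, set())
```

Because every record is stored under one term, genes_expressed_in("mouth parts") and genes_expressed_in("cephalopharyngeal plates") return the same set of genes.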

But the cephalopharyngeal sclerites have other properties: they develop from the ventral ectoderm and they are part of the larval head. You might want to know what genes are expressed in the ventral ectoderm and its derivatives. On the other hand you might want to know the genes expressed in the larval head, in which case you would expect to retrieve those expressed in the cephalopharyngeal sclerites.

It is quite easy to allow these queries if we organise the terms of the controlled vocabulary so that these relationships are made explicit: the cephalopharyngeal sclerites develop from the ventral ectoderm, and the cephalopharyngeal sclerites are part of the larval head.

This is what we do with the 'trees' you see in TermLink for the anatomy of the fly, for its cellular components, and for its development. These trees look like classical hierarchies, but in fact they are more complex, because they also express relationships like 'develops from' or 'part of'.
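One way to picture such a structure is as a list of labelled edges between terms rather than a plain tree. The sketch below assumes a hypothetical representation (the EDGES list and the related_terms function are illustrative names, not FlyBase's schema) and contains only the two relationships mentioned above.

```python
# Each edge is (child term, relationship, parent term). Only the two
# relationships stated in the text are included; a real vocabulary
# would hold many more terms and edges.
EDGES = [
    ("cephalopharyngeal sclerites", "develops_from", "ventral ectoderm"),
    ("cephalopharyngeal sclerites", "part_of", "larval head"),
]


def related_terms(term, relation, edges=EDGES):
    """Return every term reachable from `term` by following `relation` edges upward."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, rel, parent in edges:
            if child == current and rel == relation and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found
```

A single term can have parents of different kinds: related_terms("cephalopharyngeal sclerites", "part_of") gives the larval head, while the same call with "develops_from" gives the ventral ectoderm. That is what makes the structure richer than a simple hierarchy.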

By always using these controlled vocabulary terms in FlyBase, and by doing some simple computation on the trees, FlyBase offers a simple and powerful way in which complex data about genes, their alleles, transcripts, proteins etc. can be retrieved by the user.
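The 'simple computation on the trees' can be sketched as follows: expand the queried term to include every term linked to it by the chosen relationships, then collect the genes annotated with any of those terms. This is an assumption about the general technique, not a description of FlyBase's code; the gene names and the functions terms_under and genes_for are hypothetical.

```python
from collections import defaultdict

# (child term, relationship, parent term), as stated in the text.
EDGES = [
    ("cephalopharyngeal sclerites", "develops_from", "ventral ectoderm"),
    ("cephalopharyngeal sclerites", "part_of", "larval head"),
]

# Made-up annotations: gene -> terms in which it is recorded as expressed.
ANNOTATIONS = {
    "geneA": {"cephalopharyngeal sclerites"},
    "geneB": {"larval head"},
    "geneC": {"ventral ectoderm"},
}


def terms_under(term, relations):
    """Return `term` plus every term linked to it, directly or indirectly,
    by one of the given relationships."""
    children = defaultdict(set)
    for child, rel, parent in EDGES:
        if rel in relations:
            children[parent].add(child)
    seen = {term}
    stack = [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def genes_for(term, relations=("part_of",)):
    """Genes annotated with `term` or with any term beneath it."""
    wanted = terms_under(term, relations)
    return {gene for gene, terms in ANNOTATIONS.items() if terms & wanted}
```

With this toy data, a query on the larval head also retrieves the gene annotated only with the cephalopharyngeal sclerites, and a query on the ventral ectoderm using the 'develops_from' relationship retrieves genes expressed in its derivatives.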

But it is better than this. For the properties of gene products, FlyBase shares a structured controlled vocabulary of terms, the Gene Ontology, with all of the other Model Organism Databases. Gene Ontology terms can be accessed through TermLink by searching for a term (try 'kinase'). This allows you to find, for example, all genes that encode transcription factors, or genes whose products are involved in the organization of the synapse. Since all the other Model Organism Databases use the Gene Ontology, you can ask exactly the same question of the mouse, zebrafish or worm, though for this you need to go to a more specialized browser like AmiGO.