FB2014_06, released November 12th, 2014
 
 

A Database of Drosophila Genes & Genomes

Reference Manual F - Links to and from FlyBase

Last updated: 17 December 2009

FlyBase provides stable links into FlyBase for use by other databases, as well as links from FlyBase to other databases. Links to FlyBase data items, and links between data items in FlyBase and those in other databases, are described in the sections that follow. Drosophila Resources includes a linked list of additional databases likely to be of interest to users of FlyBase.

F.1. FlyBase Identifier Numbers

FlyBase assigns unique identifier numbers to several classes of object within the database. One reason for this is to allow unambiguous cross-references both within FlyBase and between FlyBase and other databases.

FlyBase identifier numbers have the general form FBxxnnnnnnn where xx is an alphabetical code for the identifier class and nnnnnnn is a 7 digit number, padded with leading zeros.
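The identifier format described above lends itself to a simple pattern check. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the class code is always two lowercase letters, as in codes such as FBgn and FBal.

```python
import re

# "FB", a two-letter class code, then exactly seven digits.
# Assumption: class codes are lowercase ASCII letters.
FLYBASE_ID = re.compile(r"FB([a-z]{2})(\d{7})", re.ASCII)

def parse_flybase_id(identifier):
    """Return (class_code, number) for a well-formed ID, or None otherwise."""
    match = FLYBASE_ID.fullmatch(identifier)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

def format_flybase_id(class_code, number):
    """Build an ID string, padding the number to 7 digits with leading zeros."""
    return f"FB{class_code}{number:07d}"
```

Parsing and formatting are inverses for well-formed identifiers, so a value can be round-tripped between the string form and the (class code, number) pair.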

The following classes of object are now publicly available in FlyBase data:

  • FBab - aberration
  • FBal - allele
  • FBba - balancer/genotype variant
  • FBcl - clone
  • FBgn - gene
  • FBim - image
  • FBmc - molecular construct
  • FBms - molecular segment
  • FBpp - polypeptide
  • FBrf - reference
  • FBst - stock
  • FBti - transposable element insertion
  • FBtp - transgenic construct or natural transposon
  • FBtr - transcript

Each object has a single Primary identifier number that is used to uniquely identify it in the database.

An object may also have any number of Secondary identifier numbers. If an object has a secondary identifier number, it generally indicates that at some point an entry has been merged with or split from other entries in the database. This may have occurred due to more data becoming available in the literature or due to correction of previous errors in the database.

The rules for when primary identifier numbers become secondary are complex. Some examples are included below:

A merge:

If two entries A and B are found to refer to the same object, then a new primary identifier number will be given to the merged entry, and the old identifier numbers of entries A and B will be listed under this merged entry as secondary identifier numbers.

A split (case 1):

If one entry is found to correspond to two (or more) objects, e.g., entry A does, in fact, refer to objects X and Y, then X and Y, as new objects, each get new primary identifier numbers and the old primary identifier number of the suppressed entry A is listed as a secondary identifier number under both X and Y.

A split (case 2):

If one entry is found to correspond to two (or more) objects, e.g. entry A refers to objects A and X, then the existing entry for A and the new entry for X each get a new primary identifier number and the old primary identifier number of A is listed as a secondary identifier number under both A and X.

If an object is simply renamed, i.e. its valid symbol in FlyBase is changed without there being a merge or a split, its primary identifier number does not change.
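The merge, split and rename examples above can be modelled in a few lines of Python. This is only a sketch of the bookkeeping, not FlyBase's implementation: the `Entry` class, the function names and the ID allocator (which hands out made-up numbers starting at 9000001) are all hypothetical. It also assumes that a merged entry inherits any secondary identifiers its parents already carried, whereas the split follows the text literally and records only the old primary identifier.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical allocator: real identifiers are assigned by FlyBase.
_next_number = count(9000001)

def new_primary_id(class_code):
    """Return a fresh, made-up identifier for illustration."""
    return f"FB{class_code}{next(_next_number):07d}"

@dataclass
class Entry:
    symbol: str
    primary_id: str
    secondary_ids: set = field(default_factory=set)

def merge(a, b, symbol):
    """Merge A and B: the result gets a new primary ID and lists the
    old IDs of both entries as secondary."""
    class_code = a.primary_id[2:4]
    old_ids = {a.primary_id, b.primary_id} | a.secondary_ids | b.secondary_ids
    return Entry(symbol, new_primary_id(class_code), old_ids)

def split(a, symbols):
    """Split A: every resulting entry gets a new primary ID and lists
    A's old primary ID as secondary (this covers both split cases)."""
    class_code = a.primary_id[2:4]
    return [Entry(s, new_primary_id(class_code), {a.primary_id}) for s in symbols]

def rename(entry, new_symbol):
    """Rename only: the primary ID does not change."""
    entry.symbol = new_symbol
    return entry
```

In this model both split cases reduce to the same operation: whether or not one of the resulting objects keeps the name A, each result receives a new primary identifier and the old one becomes secondary.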

The following classes of identifier were previously used in FlyBase but are no longer in use as identifier numbers in the database:

  • FBan - annotation

F.2. Links to external databases

FlyBase includes "pointers" to data kept by other databases in two different ways.

FlyBase-curated links

These are accession numbers that are incorporated into the FlyBase database, for sequence and certain other molecular data, and for reference data.

Linkouts

These links derive from linking tables that are maintained and provided to FlyBase by the external database. Linkouts are combined with FlyBase data for reporting on FlyBase web pages.

F.2.1. FlyBase-curated links

Accession numbers from the following databases are currently incorporated into FlyBase records as FlyBase-curated links:

  • DDBJ/EMBL/GenBank - the nucleic acid sequence databases of Japan, Europe, and the U.S.
  • EPD - Eukaryotic Promoter Database (Bucher)
  • GPCRDB - The G protein-coupled receptor database
  • InterPro - a database of protein families, domains and functional sites
  • MEROPS - Protease database
  • miRBase - microRNA data
  • MitoDrome - A database of annotated Dmel nuclear genes encoding mitochondrial proteins.
  • PDB - Protein Data Bank (Brookhaven)
  • PubMed - biomedical literature citations and abstracts
  • Rfam - RNA families database of alignments and CMs
  • TRANSFAC - A database of transcription factors and their binding sites
  • UniProtKB/Swiss-Prot - UniProt Knowledgebase, Swiss-Prot section
  • UniProtKB/TrEMBL - UniProt Knowledgebase, TrEMBL section

F.2.2. Linkouts

FlyBase supports linkouts from any FlyBase object that has a stable FlyBase ID (e.g. FBxx[0-9]+) and a web report. Databases suitable for this kind of linking to FlyBase are those with mature data structures whose data are expressed in terms of FlyBase genetic objects that carry stable identifiers or as sequences that can be mapped to the reference sequence of a Drosophila species. FlyBase currently accepts linkout data in a simple spreadsheet table (see below), plus a summary record for the external database with link information and name. We are happy to consider additional linkout databases. Please contact us if you would like to contribute links to your database.

FlyBase-curated links and linkouts are displayed on the Report Pages in the most appropriate section of the Report. Linkouts are indicated by a Linkout label in parentheses after the field label. In addition, on the Gene Report, all FlyBase-curated links and linkouts are also grouped together in a single EXTERNAL CROSSREFERENCES & LINKOUTS section.

Linkout requirements

  • The linkout link targets (the web reports that the URLs redirect to) must provide data that isn't available in the FlyBase report.
  • Linkout links can only be established for the subset of FlyBase objects that you have additional data for. Links cannot lead to an error page, a blank report or a report that provides no additional data about the FlyBase object that is being linked from.
  • Linkout data must be updated once a year. Linkout data that has not been updated in over a year will be dropped from FlyBase.
  • FlyBase IDs must be validated using our ID Converter tool to ensure that you are using the current FlyBase IDs. Linkout links that refer to old FlyBase IDs will be automatically dropped.

FlyBase reserves the right to reject or remove linkouts if these requirements are not met.

How to establish linkouts

  1. Contact us with a brief description of your database and links to your website. Please be sure to include links to your main site as well as the report pages that you would like us to link to.
  2. If accepted, you will be given an FTP login account that will be used to upload your linkouts to FlyBase.
  3. Validate your FlyBase IDs using our ID Converter tool.
  4. Construct your linkout link table and database information files making sure that you meet the guidelines set forth in Linkout Requirements.
  5. Log in to ftp.flybase.org using your FTP account and deposit the appropriately named files.
  6. Contact us to let us know that you have completed your submission.
  7. Update your links at least once a year from the time of your previous submission.

Please note that if you are establishing a single type of linkout between FlyBase and your site, then only one linking table and one database information file are required. If you want to establish multiple types of linkouts, then you need to submit a linking table and a database information file for each type.

When will my linkouts appear in FlyBase?

FlyBase performs 10 releases a year. The exact date of each release is posted in our forum. We generally skip one month in the summer and the month of December. For your linkouts to be included in a particular release, the necessary linkout files must be uploaded to our FTP site no later than 3 weeks before our published release date.
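As a worked example of the three-week rule, the latest upload date can be computed from a published release date. The function name below is hypothetical, and the sketch assumes "3 weeks" means exactly 21 calendar days.

```python
from datetime import date, timedelta

def upload_deadline(release_date):
    """Latest upload date: three weeks (21 days) before the release date."""
    return release_date - timedelta(weeks=3)

# The FB2014_06 release was dated 12 November 2014.
print(upload_deadline(date(2014, 11, 12)))  # 2014-10-22
```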

Link table

The link table format is a simple 4 column tab delimited file. The description of the columns, in order, is shown below. The filename of this file must use the form

<dbname>_linkout.txt

Replace <dbname> with the value used in column 2 of the same file.

Column 1 - FlyBase ID

A valid FlyBase ID matching this regular expression: '^FB\w\w\d+\t'

Column 2 - DBNAME

A unique, standard name for the external database, using only word characters ('\w+': letters, digits, and underscores). If you are submitting more than one linking table, you must ensure that the DbName is unique to each file. Reusing a DbName that is already used in another linking table is not permitted.

For example, if a group named "FLYLAB" wanted to establish links between FlyBase gene reports and 2 different types of analysis on their web site they could use "FLYLAB_EX1" and "FLYLAB_EX2" for the DbName column in their linkout files.

Column 3 - DBID

External database object id. This field cannot contain spaces and is limited to 255 characters.

Column 4 - DBURL

Relative link to external database web report. This is the text that will be appended to the base URL parameter that is defined in the database information file.
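Before uploading, the column rules above can be checked locally. The following Python sketch is one possible pre-submission check, not a tool provided by FlyBase: the function names are hypothetical, and the patterns are applied in ASCII mode on the assumption that only ASCII word characters and digits are intended.

```python
import re

FBID = re.compile(r"FB\w\w\d+", re.ASCII)   # column 1
DBNAME = re.compile(r"\w+", re.ASCII)       # column 2

def validate_link_row(line):
    """Return a list of problems found in one link table line (empty if none)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 4:
        return [f"expected 4 tab-separated columns, found {len(columns)}"]
    fbid, dbname, dbid, dburl = columns
    problems = []
    if not FBID.fullmatch(fbid):
        problems.append(f"column 1 is not a FlyBase ID: {fbid!r}")
    if not DBNAME.fullmatch(dbname):
        problems.append(f"column 2 has non-word characters: {dbname!r}")
    if " " in dbid:
        problems.append("column 3 must not contain spaces")
    if len(dbid) > 255:
        problems.append("column 3 is longer than 255 characters")
    return problems

def linkout_filename(dbname):
    """File name required for the link table of a given DBNAME."""
    return f"{dbname}_linkout.txt"

def build_link(base_url, dburl):
    """Full hyperlink: DBURL appended to the base URL from the database information file."""
    return base_url + dburl
```

The check covers only the format of each row. Whether an ID is current still has to be confirmed with the ID Converter tool.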

Database information file

The database information file contains the DbName that it corresponds to, the base URL to use for linkout hyperlinks, the homepage URL for your site and a brief description of your database. The filename of this file must use the form

<dbname>_dbinfo.txt

Replace <dbname> with the value used in column 2 of the link table file that this file corresponds to.

The format of this file uses a simple FIELD<TAB>VALUE<NEWLINE> format. The field names are as follows:

Line 1 - DBNAME

The DBNAME value used in column 2 of the link table.

Line 2 - BASEURL

The base URL to use when constructing links to your database.

Line 3 - HOMEURL

The homepage URL that represents the front page of your database.

Line 4 - DESC

A brief description of your database.

Line 5 - EMAIL

The email address to use should we need to contact you.
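The five-line layout described above is straightforward to read and write programmatically. The Python sketch below is an illustration with hypothetical function names. It assumes the fields must appear in exactly the order listed and ignores blank lines, neither of which the format description states explicitly.

```python
EXPECTED_FIELDS = ["DBNAME", "BASEURL", "HOMEURL", "DESC", "EMAIL"]

def parse_dbinfo(text):
    """Parse FIELD<TAB>VALUE lines into a dict, checking field names and order."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != len(EXPECTED_FIELDS):
        raise ValueError(f"expected {len(EXPECTED_FIELDS)} lines, found {len(lines)}")
    info = {}
    for expected, line in zip(EXPECTED_FIELDS, lines):
        name, sep, value = line.partition("\t")
        if sep != "\t" or name != expected:
            raise ValueError(f"expected {expected}<TAB>value, got {line!r}")
        info[name] = value.strip()
    return info

def write_dbinfo(info):
    """Serialise a dict back to the FIELD<TAB>VALUE<NEWLINE> layout."""
    return "".join(f"{name}\t{info[name]}\n" for name in EXPECTED_FIELDS)

def dbinfo_filename(info):
    """File name required for this database information file."""
    return f"{info['DBNAME']}_dbinfo.txt"
```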

File examples

Example 1-

DBNAME  GENBANK
BASEURL http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=
HOMEURL http://www.ncbi.nlm.nih.gov/
DESC    A genetic sequence database.
EMAIL   johndoe@nowhere.com

#FlyBase ID	DBNAME	DBID	DBURL
FBgn0259750	GENBANK	AAA86639	AAA86639
FBgn0005561	GENBANK	AAB70249	AAB70249

Example 2-

DBNAME  UNIPROT
BASEURL http://www.uniprot.org/
HOMEURL http://www.uniprot.org/
DESC    A database of protein sequence and functional information.
EMAIL   johndoe@nowhere.com

#FlyBase ID	DBNAME	DBID	DBURL
FBgn0259750	UNIPROT	O16117	entry/O16117
FBgn0005561	UNIPROT	O16804	entry/O16804