FB2014_06, released November 12th, 2014
 
 

A Database of Drosophila Genes & Genomes

Report on the November 2007 FlyBase survey

Report on the results of the FlyBase survey undertaken in November 2007.

We were very pleased to learn that respondents find FlyBase a valuable resource: 65% use FlyBase once a day or more, and 80% find FlyBase invaluable or very helpful in their genetic research. We were also pleased that 88% of respondents who had contacted us for help rated our response as good or very good. For a detailed breakdown of the results, please see the accompanying PDF.

More importantly, your responses have been key in setting our priorities for the future, and we urge you to continue to suggest how FlyBase could better assist your research. We need to understand FlyBase users' needs in order to make the best use of our limited resources. We received too many suggestions to discuss them all in this summary; although the FlyBase team is considering every point raised, we would like to highlight here a few of the lessons we learned from the survey.

  • Formalizing the structure of the data can result in reports that are hard to read and understand. On the one hand, users want to learn the basics of what is known about a gene from a short summary written in plain English. On the other hand, using "controlled vocabularies" or "ontologies" is essential for a database that can be searched effectively: if different terms are used to describe the same thing, you would have to search with every possible term to find all the entries for that thing, whereas if the same term is always used, a single search is comprehensive. These contrasting needs were reflected in the responses we received. For example, when we asked "Are there data that are difficult to interpret because of formatting or presentation?", we received these two opposing responses:
    • "The automated descriptions of gene function are very poor. I reckon that's because GO doesn't work. GO and controlled vocabulary is great for informatics, but awful for the average grad student wanting to do genetics."
    • "The textual descriptions of gene expression are utterly useless for the computational community. Someone should sit down and translate them into the official vocabulary."
    We recognize that both approaches are essential: controlled vocabularies for effective searching, and textual descriptions for summarizing subtle aspects of genetic, phenotypic and expression information. No single approach will solve all problems, so we will provide a variety of summaries. We will transfer the "Red Book" summaries for the classical markers and visible phenotypes in the adult into FlyBase. Many users found Tom Brody's Interactive Fly a good source of information, and he has kindly agreed to supply FlyBase with his gene summaries. In addition, we will set up a gene wiki so that users can contribute to gene summaries, either as attributed contributions or by adding to a collective summary. We are investigating other approaches as well and welcome your further suggestions.
  • We are working hard to eliminate the problem of clicking on a data topic only to discover that there is nothing in that category (the "empty matryoshka"). If you haven't already noticed it, you can use the Profile Manager to choose which parts of the gene report are open by default.
  • We are significantly accelerating the incorporation of the literature into FlyBase. We will describe our detailed plans separately, but one important feature will be to seek your help: we will ask authors to provide key pieces of information from their papers through a data-entry website. A number of users also asked for protein interaction data and for information about useful reagents, such as antibodies. We will begin to incorporate these data types in the next several months.
  • Phenotypic and expression data need to be improved, and methods need to be developed that make it easier to search for genes with similar phenotypes or expression patterns.
  • Many of you would like FlyBase search tools to guess, "like Google", what you are looking for when you type something that is not in the database. FlyBase already provides the equivalent of Google spell-checking for symbols by storing an extensive set of symbol synonyms in the database: if you search for a known variant of a gene or other symbol, the record will be found. We plan to provide additional help by offering Google-style suggestions that complete the letters you have already typed. We will not, however, be able to provide search tools that test for all possible variants of what you have typed (even Google doesn't do that).
  • Information about orthologs and gene families needs to be improved. We are replacing the current ortholog identifiers with gene symbols (and, where possible, gene names) from the other species. We are also evaluating the best ways to present alignments and relationships among orthologs and gene families, and we expect the first results of these efforts to be publicly available in the next several months.
  • We had been concerned that FlyBase users far from our servers in Indiana might experience much slower access; if so, this would have argued for establishing FlyBase mirrors around the globe. Here is how respondents from countries with substantial numbers of survey responses rated the speed of the website (number of respondents, with the percentage of that country's total in parentheses):
     
    Country   Total   Fast        Acceptable   Slow
    Canada       29   13 (45%)    13 (45%)     3 (10%)
    France       58   20 (34%)    36 (62%)     2 (3%)
    Germany      52   23 (44%)    26 (50%)     3 (6%)
    Spain        32   14 (44%)    17 (53%)     1 (3%)
    UK          116   41 (35%)    61 (52%)     14 (12%)
    USA         481   138 (29%)   276 (57%)    63 (13%)
    There is no apparent correlation between perceived speed and country of access: respondents in the USA, closest to our servers, were no more likely than those in Europe or Canada to rate the site as fast, and no less likely to rate it as slow. We therefore concluded that mirrors would not improve the speed of access to FlyBase. Rather, we expect that speed of access reflects a complex set of factors, such as local network conditions, the capacity and configuration of individual desktop computers, and browser choice, which are largely outside our control.
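The two search behaviors described in the survey response on searching, finding a record through a stored synonym and suggesting completions of partially typed text, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not FlyBase's implementation: the symbols "Hyp1", "Hyp2" and "HypA", the `SYNONYMS` table and the `lookup`/`suggest` functions are all hypothetical. Exact lookup normalizes the query and checks it against the stored synonyms; suggestions use binary search over the sorted synonym keys to find those that begin with the typed prefix.

```python
from __future__ import annotations

import bisect

# Hypothetical synonym table: each known variant (normalized to lower case)
# maps to one current symbol. Invented for illustration; not FlyBase data.
SYNONYMS = {
    "hyp1": "Hyp1",
    "hyp-1": "Hyp1",
    "hypothetical 1": "Hyp1",
    "hyp2": "Hyp2",
    "hyp-2": "Hyp2",
    "hypa": "HypA",
}


def normalize(text: str) -> str:
    """Trim and lower-case, so 'HYP-1 ' and 'hyp-1' are treated alike."""
    return text.strip().lower()


def lookup(query: str) -> str | None:
    """Return the current symbol for a stored variant, or None if unknown."""
    return SYNONYMS.get(normalize(query))


# Sorted keys let bisect jump to the first key that could share a prefix.
_SORTED_KEYS = sorted(SYNONYMS)


def suggest(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` distinct current symbols whose synonyms start with `prefix`."""
    p = normalize(prefix)
    start = bisect.bisect_left(_SORTED_KEYS, p)
    results: list[str] = []
    for key in _SORTED_KEYS[start:]:
        if not key.startswith(p):
            break  # sorted order: no later key can share the prefix
        symbol = SYNONYMS[key]
        if symbol not in results:
            results.append(symbol)
        if len(results) == limit:
            break
    return results


if __name__ == "__main__":
    print(lookup("HYP-1 "))   # a stored variant resolves to its current symbol
    print(suggest("hyp-"))    # completions of the letters typed so far
    print(lookup("hpy1"))     # an unstored misspelling is not found
```

Only variants that are actually stored can be found or suggested: a misspelling such as "hpy1" returns nothing, which mirrors the limitation noted in the survey response (the tool cannot test every possible variant of what was typed).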