GMOD

Interface test

Meeting Report: Model Organism Database User Interface Caucus

January 18, 2007, Town and Country Hotel, San Diego, CA, USA

The user interface (UI) is the most visible aspect of a MOD, and arguably has the most direct impact on the satisfaction of its users. On the first day of the January 2007 GMOD Meeting, we shared experiences, discussed lessons learned, and identified unsolved problems in the field of MOD user interface design. To drive the discussion, representatives of several MODs (including both model and multi-organism databases) presented aspects of their MOD’s UI that related to a common set of use cases: searching by gene name and viewing a gene report, searching for data related to a broader biological concept, constructing customized reports to answer more specific questions, extracting data in bulk for thousands of genes, and using one MOD in concert with other MODs.

Executive Summary

The MOD user interface session brought to light some very useful topics that many MODs (new and old) can benefit from discussing.

General UI Lessons

Complexity is an inherent problem

Ideas for future development

more here ..

In attendance

- attendance list

Individual Presentations

Slides and text summaries of individual presentations.

Detailed Report

Addressing Six MOD Use Cases

Commonalities, good ideas, strengths and weaknesses

  1. Many simple/quick/global searches
    • Most MODs have some variation of a “search everything” option that is the primary search entry.
    • Simple searches at MODs vary principally in the details of data classes (which are searched and whether or not they need to be specified) and how terms are matched (exact, partial, phrase and/or wildcard).
    • These types of simple searches must balance ease of use with relevance of results.
    • How results can be handled also varies between MODs. Various MODs allow searches and results to be saved, refined, downloaded and/or exported to other tools.
    • One challenge is meeting user expectations that “simple” searches return quality answers. The TAIR presentation details a several-year process of using user feedback to determine the best functionality for simple search: it takes effort to develop “simple and quick” searches that return what biologists are really looking for.
  2. Gene page reports
    • Most MODs agree that users like short summaries and graphical presentations of data.
    • Individual MODs vary widely in deciding how much detail is displayed in the default view.
    • Individual MODs struggle with how to direct users, in an obvious manner, toward deeper levels of information.
  3. Advanced/attribute searches.
    • Many MODs allow some sort of advanced search where users can specify search criteria over multiple data types.
    • Some MODs (e.g. NCBI, ApiDB) allow query histories to be combined, enabling complex, refined searches and results; others (e.g. FlyMine) provide this functionality through set operations on ‘bags’ of objects.
  4. User choices in data reports
    • This is an acknowledged weak aspect of many MODs; most have yet to develop systems that allow users to customize reports and instead expect users to parse information from current, defined formats (which can vary wildly between databases). Some MODs (e.g. FlyMine) allow choice of output columns and their order.
  5. UIs for bulk data handling.
    • Many MODs allow some sort of bulk query, although the allowed data types vary.
    • Some MODs (e.g. FlyMine) provide a more complex query UI that can operate on large lists (e.g. all genes), supporting pre-defined or user-defined data export formats. In addition, intermediate results can be saved in ‘bags’, and these lists can be combined and/or used in subsequent queries.
    • BioMart is used at two or more MODs for bulk data search and retrieval and is in development at others.
  6. Cross-site facilitation
    • Some MODs allow searches for IDs and/or names that might be found elsewhere.
    • Many MODs use ontologies (e.g. GO) or orthologies to link to other databases.
    • Computed relations between databases (NCBI/others?), e.g. homology/orthology, ontology/literature attributes, ..
    • Problem: keeping these relations up to date across databases.
    • Some MODs (e.g. FlyMine) maintain lists of orthologues for many species. This allows a set of genes from one species to be imported, and the list of corresponding genes from another, more data-rich species to be derived and then explored.
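The ‘bag’ operations and orthologue mapping mentioned in the list above can be illustrated with ordinary set operations. The following is a minimal sketch in Python, not FlyMine's implementation; the gene identifiers, the `orthologues` table, and the function `map_to_species` are all hypothetical.

```python
# Hypothetical saved result lists ("bags") from two earlier queries.
kinase_genes = {"geneA", "geneB", "geneC"}
chromosome_2_genes = {"geneB", "geneC", "geneD"}

# Combining bags with set operations.
in_both = kinase_genes & chromosome_2_genes      # intersection
in_either = kinase_genes | chromosome_2_genes    # union
kinase_only = kinase_genes - chromosome_2_genes  # difference

# Hypothetical orthologue table: gene in one species -> genes in another.
orthologues = {
    "geneB": {"orthB1", "orthB2"},
    "geneC": {"orthC1"},
}


def map_to_species(bag, table):
    """Return the orthologues of every gene in the bag.

    Genes with no entry in the table are silently skipped.
    """
    mapped = set()
    for gene in bag:
        mapped |= table.get(gene, set())
    return mapped


# Derive the corresponding gene list in the second species.
mapped_bag = map_to_species(in_both, orthologues)
```

The derived bag can then be fed into further queries in the same way as any other saved list.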

User Interface Development

Lessons Learned from Experience

Gathering and Analyzing User Feedback

Balancing Completeness and Simplicity

Too much information and too many choices can overwhelm users, but restricting choices and hiding data limits the usefulness of a MOD. This dilemma, and how best to handle it, was a common theme in the presentations. “Easier to use” versus “does more things” is a good way to express it. Achieving a good balance requires a great deal of thought and user input.

An illustrative example can be found in map displays and reports, where detail sections are hidden but available through linked pages or dynamic web displays. At SGD, a sidebar of menu choices was found to hide too much from users. It is being replaced with a web page that openly exposes all choices.

A related issue is providing adequate information on what the different choices available actually do, and the provenance of data: what data are present and where they came from.

Community Participation

Wikipedia provides an excellent example of the power of community participation in science documentation. Many new genomics and biology wikis are springing up, running on the reusable software and documentation provided by Wikipedia. Members of this new generation of wikis include: http://wiki.gmod.org/ (an outcome of the GMOD meeting), http://genomewiki.ucsc.edu/ , http://www.bioperl.org/wiki/ , http://www.wormbase.org/wiki/ , http://wiki.dictybase.org/dictywiki/ , http://rana.lbl.gov/drosophila/wiki/ , http://www.nescent.org/wikis.php , http://openwetware.org/wiki/ , http://darwin.nerc-oxford.ac.uk/gc_wiki/ , http://wiki2.germonline.org/wiki/ , http://www.biodirectory.com/biowiki/ . This growing list of wikis offers scientists a common, well-documented user interface that is expected to facilitate expanded use, as experience gained by participating at one site carries over to others.

Client-side Scripting

Dynamic web page content, user preferences and histories are becoming more widely available at MODs. These are used for showing or hiding content (easing the dilemma of supporting both beginner and advanced users), for reordering map tracks, for retaining a history of user queries and answers, and for other purposes.

Using Virtualization for MOD Snapshots

Providing stable “snapshots” of the data in a MOD is important for reproducing results in publications that cite the MOD. Many MODs provide snapshots in the form of large data dumps created at specific time intervals, which can be used to laboriously reconstruct the state of the MOD’s data at a given point in time. WormBase takes this idea a step further, using virtualization technology to capture the complete state of the WormBase site for each snapshot. These snapshot images can be accessed via the web in the same manner as the main WormBase site, or can be downloaded for playback on any computer that supports the free VMware Player. This practice greatly facilitates reproduction of results from papers that cite WormBase.

Using Google and Other General Search Engines

A number of MODs represented at the meeting used Google to provide some of the search functionality on their site. It was generally agreed that if an external search engine is allowed to fully index a MOD’s pages, it can usually provide very useful full-text search results. However, since generalized search engines do not have specific knowledge of the structure of a MOD’s data, it may not be possible to obtain relevant results for very specific searches. For example, it would probably not be possible to use a generalized search engine for searching genes based on their exact physical locations in a genome.
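By contrast, a search that knows the structure of the data can answer a location query directly. The following is a minimal sketch in Python, assuming genes are stored with a chromosome name and 1-based inclusive start and end coordinates; the `Gene` class, the function `genes_in_region`, and the example records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_region(genes, chromosome, start, end):
    """Return the genes on `chromosome` that overlap the interval [start, end]."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


# Hypothetical gene records.
genes = [
    Gene("geneA", "chr1", 100, 500),
    Gene("geneB", "chr1", 450, 900),
    Gene("geneC", "chr2", 100, 500),
]

# Genes overlapping positions 480-600 on chr1.
hits = genes_in_region(genes, "chr1", 480, 600)
```

A full-text index has no notion of coordinates or overlap, which is why this kind of query usually needs a search built on the MOD's own data model.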

It was briefly mentioned that some standard search engine optimization techniques may be used to improve results with external search engines, particularly providing a “site map” page with deep links to all or most of the pages in your site.
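A site map page of this kind can be generated from the list of identifiers a MOD already holds. The following is a minimal sketch in Python; the base URL, the `/gene/` path layout, and the function `site_map_page` are assumptions made for illustration, not a convention used by any particular MOD.

```python
from html import escape
from urllib.parse import quote


def site_map_page(base_url, gene_ids):
    """Build a simple HTML page with one deep link per gene report."""
    items = []
    for gene_id in gene_ids:
        # Percent-encode the identifier so unusual characters stay valid in a URL.
        url = f"{base_url}/gene/{quote(gene_id, safe='')}"
        items.append(f'<li><a href="{escape(url)}">{escape(gene_id)}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"


# Hypothetical site and identifiers.
page = site_map_page("http://mod.example.org", ["geneA", "geneB"])
```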

One concern that was raised was that the “crawlers” used by external search engines for indexing web pages sometimes impose unacceptable demands on a MOD’s web servers, particularly when crawlers from multiple search engines are indexing a site at once. Some solutions to this were suggested, including carefully tuning your site’s robots.txt file to avoid computation-intensive pages, and simply buying more servers and/or optimizing your site’s code to better handle the load. Googlebot and perhaps other robots can be told to reduce their hit rate to an acceptable level.
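As an illustration of the robots.txt approach, the sketch below defines a small robots.txt that keeps all crawlers away from two computation-intensive paths and then checks it with Python's standard `urllib.robotparser` module. The paths, the host name, and the crawler name `ExampleBot` are hypothetical; a real site would list its own expensive pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block expensive dynamic pages, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/blast
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Static gene report pages remain crawlable.
gene_page_allowed = parser.can_fetch(
    "ExampleBot", "http://mod.example.org/gene/geneA"
)

# Computation-intensive pages are excluded.
blast_allowed = parser.can_fetch(
    "ExampleBot", "http://mod.example.org/cgi-bin/blast?seq=ACGT"
)
```

Checking the file this way before deployment helps confirm that the rules block only the intended pages and leave the gene reports indexable.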

A different strategy pursued by some MODs is to use generalized search software on their own servers, such as Lucene or LuceGene, a Lucene variant customized for indexing many types of biological data. This approach offers more control over the indexing, searching, and result presentation than using an external search engine.
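The core data structure behind this kind of local indexing is an inverted index, which maps each term to the records that contain it. The following is a toy sketch in Python that shows the idea only; it does not use the Lucene or LuceGene APIs, and the record identifiers and descriptions are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical gene descriptions keyed by record id.
documents = {
    "gene:1": "protein kinase involved in cell cycle",
    "gene:2": "transcription factor regulating cell fate",
    "gene:3": "kinase of unknown function",
}
index = build_index(documents)
```

Because the site controls tokenization and which fields are indexed, it can, for example, index gene symbols and synonyms separately from free-text descriptions, which is the kind of control an external search engine does not offer.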

User Interface Conventions

One subject that was raised, but was not fully discussed in the time available, was the idea of developing common user interface conventions among MODs. Attendees noted several cases of “convergent evolution” among the MOD user interfaces:

Todd Harris proposed developing a convention for common URLs for bulk downloading of genome data, but the subject was not fully discussed.

Dr. Peter Karp presented a useful list of common elements that every MOD should be sure to include:

General Discussion

Implementation Techniques
