Interface test
Meeting Report Model Organism Database User Interface
Caucus January
18, 2007 Town and Country Hotel, San Diego, CA, USA
The user interface (UI) is the most visible aspect of a MOD, and
arguably has the most direct impact on the satisfaction of its users. On
the first day of the January 2007 GMOD
Meeting, we
shared experiences, discussed lessons learned, and identified unsolved
problems in the field of MOD user interface design. To drive the
discussion, representatives of several MODs (including both model and
multi-organism databases) presented aspects of their MOD’s UI that
related to a common set of use cases: searching by gene name and viewing
a gene report, searching for data related to a broader biological
concept, constructing customized reports to answer more specific
questions, extracting data in bulk for thousands of genes, and using one
MOD in concert with other MODs.
Executive Summary
The MOD user interface session brought to light some very useful topics
that many MODs (new and old) can benefit from discussing.
General UI Lessons
- In all cases, it should be clear what actions are required from the
user and what results those actions will produce.
- Contextual examples and “help” links are very useful to users, but are
underutilized. Utilizing help facilities should not require navigation
away from the user’s current page or tool.
- Appearance is less important to users than responsiveness (speed) and
functionality.
- Developing good UIs takes sustained work, including feedback and
community testing.
Complexity is an inherent problem
- MODs deal with rich, complex data that is constantly expanding and
changing. It is critical to recognize that this complexity “comes with
the territory.” A central challenge for a MOD’s user interface to this
data is to make common tasks easy and complex tasks possible.
Oversimplification is not helpful, but neither is overwhelming users
with unfiltered, unorganized information. A MOD’s UI must strive to
balance the need for simplicity with the need for complete
information.
- This problem is addressed through:
- User interface design
- Engineering of site infrastructure
- User education and documentation
Ideas for future development
- More dynamic web content options and graphical summaries to help
manage information.
- Genuine auto-complete in text boxes to help guide searches (can be
implemented with javascript doing asynchronous database lookups)
- Google can be harnessed to aid, but is not solely sufficient for,
searching MOD data.
- Virtualization of “server snapshots” is a useful mechanism for keeping
older database versions available.
- There is a need for broader availability of ‘power-user’ interfaces
allowing
- complex queries to be constructed
- gathering / uploading / combining / operating on sets of e.g. genes
in one step
- flexible configuration of data output formats
more here ..
Contents
In attendance
- attendance list
Slides and text summaries of individual presentations.
Detailed Report
Addressing Six MOD Use Cases
Commonalities, good ideas, strengths and weaknesses
- Many simple/quick/global searches
- Most MODs have some variation of a “search everything” option that
is the primary search entry.
- Simple searches at MODs vary principally in the details of data
classes (which were searched and whether or not they needed to be
specified) and how terms are matched (exact, partial, phrase
and/or wildcard).
- These types of simple searches must balance ease of use with
relevance of results.
- How results can be handled also varies between MODs. Various MODs
allow searches and results to be saved, refined, downloaded and/or
exported to other tools.
- One challenge is how to support user expectations to make “simple”
searches that return quality answers. The TAIR presentation
details a several-year process to determine best functionality of
simple search from user feedback: it takes effort to develop
“simple and quick” searches that return what biologists really
looking for.
- Gene page reports
- Most MODs agree that users like short summaries and graphical
presentations of data.
- Individiual MODs vary widely in deciding how much detail is
displayed in the default view.
- Individual MODs struggle with how to direct users, in an obvious
manner, toward deeper levels of information
- Advanced/attribute searches.
- Many MODs allow some sort of advanced search where users can
specify search criteria over multiple data types.
- Some MODs (e.g. NCBI, ApiDB) allowed query histories to be
combined, allowing for complex, refined searches and results, and
others (FlyMine) provide this functionality through set operations
on ‘bags’ of objects.
- User choices in data reports
- This is an aknowledged weak aspect many MODs; most have yet to
develop systems that allow users to customize reports and instead
expect users to parse information from current, defined formats
(which can vary wildly between databases). Some MODs (e.g.
FlyMine) allow choice of output columns and their order.
- UIs for bulk data handling.
- Many MODs allow some sort of bulk query, although the allowed data
types varys.
- Some MODs (e.g. FlyMine) provide a more complex query UI that can
operate on large lists (e.g. all genes), supporting pre-defined or
user-defined data export formats. In addition intermediate results
can be saved in ‘bags’ and these lists combined and/ or used in
subsequent queries.
- BioMart is used at two or more MODs for bulk data search/retreive
and is in development at others.
- Cross-site facilitation
- Some MODs allow searches for IDs and/or names that might be found
elsewhere
- Many MODs use ontologies (e.g. GO) or orthologies to link to other
databases
- computed relations between databases (ncbi/others?) e.g.
homology/orthology, ontology/literature attributes, ..
- problem: keeping up-to-date across dbs
- Some MODs (e.g. FlyMine) maintain lists of orthologues for many
species. This allows a set of genes from one species to be
imported, and the list of corresponding genes from another, more
data rich species, to be derived and then explored.
User Interface Development
Lessons Learned from Experience
- Clear (better yet, obvious) input actions and requirements are
important. For example, “What, exactly, are people expected to type
into a ‘quick search’ box?” and “If the ‘log on’ box is at the top of
the homepage, does that mean I have to register and log on to use the
site?”
- Reliable results are important.
- Fast results are good.
- What users ask for isn’t always what they really want (see the ApiDB
presentation).
- Appearance is less important than functionality.
- UI components and report sections should be clearly labeled and
include tips, sample queries and default values. Results should
include brief summaries whenever possible.
- Use terms the bio-community understands and try to stay away from
those that you might want them to learn, but that aren’t strictly
necessary for usage (e.g. the word “Boolean” at ApiDB )
- Don’t change what works for users; even if you develop a “better” more
sophisticated tool, the users might still prefer the one they know how
to use (e.g. the Batch Sequences tool at Wormbase). MODs generally
thought that we should avoid rapid turnover in UI, but many felt that
frequent, small changes and additions can be useful to community (if
clearly explaned)
Gathering and Analyzing User Feedback
- User community testing, interviews, surveys, HCI principles
- Ask users early about new prototypes, rather than after UI
development. Having an existing UI may limit the suggestions and
feedback.
- Users don’t necessarily want what they ask for. Make sure that
you’re meeting their need over and above their request.
- Concept “card sorting” test is another tool that lets users show
best organization & grouping of MOD topics.
- Watching users interact with MOD UI can tell a better, more
informative story than user opinions and surveys. Paired,
video/keystroke captured, user testing can be tedious to analyze but
extremely useful (ApiDB and ZFin both have experience with these
techniques). Using “paired” teams (matched for status/gender) allows
you to hear people explaining, to each other, how to use your site.
- Log site usage, moniter it enough to recognize common “failed”
queries, and figure out how to address and prevent such errors
- Support for User preferences/sessions/configurations (e.g. “MyNCBI”,
FlyMine’s “MyMine”)
- default UI dilemma: simple/dumb for general audience, or
sophisticated/complex for focused users
- variable (low) use of preference options?
- needs transparency to user (avoid logins/extra effort)
Balancing Completeness and Simplicity
Too much information and too many choices can overwhelm users, but
restricting choices and hiding data limits the usefulness of a MOD. This
dilemma and how best to handle it was a common theme in presentations.
Easier to Use versus Does More Things is a good way to express this.
Achieving a good balance requires a great deal of thought and user
input.
An illustrative example can be found in map displays and reports, where
detail sections are hidden, but available through linked pages or
dynamic web displays. At SGD, a sidebar of menu choices was found to
hide too much from users. It is being replaced with an web page that
openly exposes all choices.
A related issue is providing adequate information on what the different
choices available actually do, and the provenance of data: what data are
present and where they came from.
New Trends in MOD UIs
Wikipedia provides an excellent example of the power of community
participation in science documentation. Many new genomics and biology
wikis are springing up, running on the reusable software and
documentation provided by
Wikipedia.
Members of this new generation of wikis include:
http://wiki.gmod.org/ , an outcome of the GMOD
meeting , http://genomewiki.ucsc.edu/ ,
http://www.bioperl.org/wiki/ ,
http://www.wormbase.org/wiki/ ,
http://wiki.dictybase.org/dictywiki/ ,
http://rana.lbl.gov/drosophila/wiki/ ,
http://www.nescent.org/wikis.php ,
http://openwetware.org/wiki/ ,
http://darwin.nerc-oxford.ac.uk/gc_wiki/ ,
http://wiki2.germonline.org/wiki/ ,
http://www.biodirectory.com/biowiki/ . This growing
list of wikis offer scientists a common, well-documented user interface
that is expected to facilitate expanded use, as experience in
participation one site carries over to others.
Client-side Scripting
Dynamic web page content, user preferences and histories are becoming
more widely available at MODs. These are used for showing or hiding
contents (aiding the dilemma of supporting both the beginner and
advanced user), for map track reordering, retaining history of user
queries and answers, and other uses.
- more graphical summaries of long lists/tables are possible; can be
useful
- improving comparative genomics aspects to aid broader audience
- include curator/community edits/updates to contents (part of same
reading UI) (ZFIN,SGN,)
Using Virtualization for MOD Snapshots
Providing stable “snapshots” of the data in a MOD is important for
reproducing results in publications that cite the MOD. Many MODs provide
snapshots in the form of large data dumps created at specific time
intervals, which could be used to laboriously reconstruct the state of
the MOD’s data at a given point in time. WormBase takes this idea a step
further, using virtualization technology to capture the complete state
of the WormBase site for each snapshot. These snapshot images can be
accessed via the web in the same manner as the main WormBase site, or
can be downloaded for playback on any computer that supports the free
VMWare player. This practice greatly facilitates
reproduction of results from papers that cite WormBase.
Using Google and Other General Search Engines
A number of MODs represented at the meeting used Google to provide some
of the search functionality on their site. It was generally agreed that
if an external search engine is allowed to fully index a MOD’s pages, it
can usually provide very useful full-text search results. However, since
generalized search engines do not have specific knowledge of the
structure of a MOD’s data, it may not be possible to obtain relevant
results for very specific searches. For example, it would probably not
be possible to use a generalized search engine for searching genes based
on their exact physical locations in a genome.
It was briefly mentioned that some standard search engine optimization
techniques may be used to improve results with external search engines,
particularly providing a “site map” page with deep links to all or most
of the pages in your site.
One concern that was raised was that the “crawlers” used by external
search engines for indexing web pages sometimes impose unacceptable
demands on a MOD’s web servers, particularly when crawlers from multiple
search engines are indexing a site at once. Some solutions to this were
suggested, including carefully tuning your site’s robots.txt
file to
avoid computation-intensive pages, and simply buying more servers and/or
optimizing your site’s code to better handle the load. Googlebot and
perhaps other robots can be told to reduce their hit rate to an
acceptable level.
A different strategy pursued by some MODs is to use generalized search
software on their own servers, such as
Lucene or
LuceGene, a Lucene variant customized for indexing
many types of biological data. This approach offers more control over
the indexing, searching, and result presentation than using an external
search engine.
User Interface Conventions
One subject that was raised, but was not fully discussed in the time
available, was the idea of developing common user interface conventions
among MODs. Attendees notes several cases of “convergent evolution”
among the MOD user interfaces:
- most MODs seem to have some sort of unified search feature for all
data in the MOD
- wide adoption of GMOD’s GBrowse software
- existing tools for bulk data download, movement toward more flexible
search and reporting tools such as BioMart and InterMine
Todd Harris proposed developing a convention for common URLs for bulk
downloading of genome data, but the subject was not fully discussed.
Dr. Peter Karp presented a useful list of common elements that every MOD
should be sure to include:
- procedures for citing the MOD in publications
- “contact us” links
- facilities for downloading datasets and software
- community news
- MOD data cross-referenced to publications
- summary statistics of data in the MOD
- if possible, a history of these statistics over time
- update history (a.k.a. changelog) of MOD
- central credits list for MOD contributors
General Discussion
- User community testing, interviews, surveys, HCI principles
- Differing views on value of types and stages in user testing (Is
user testing necessary? What about releasing software as a means to
get feedback?)
- It may be that user testing is most valuable when attempting to
solve specific problems, as opposed to “design”. User testing is
probably most useful in the earliest phase, looking at prototypes
or prototypes on paper.
- Monitoring and logging is important, you must know what occupies
the user the most.
- Dilemma of too much or too little information/choice
- Consensus: complexity goes with the territory, but you have to make
the most common things easy and the hard things possible, and have
good user education and documentation.
- Is it true that complexity should be concealed? How does one strike
the balance between providing for the expert and for the novice?
- Google is your enemy/friend …
- Differing views on advantage/disadavantages of Google as adjunct to
MOD search systems (No consensus that using Google is advantageous;
Techniques for guiding Google/Bots below)
- Good new ideas
- More dynamic content, graphics summaries may be good
- Can we make more use of graphics to replace text? E.g. a generated
image replaces expression information.
- Word or string completion as user asks questions is useful UI
technique (see Javascript and AJAX below)
- Common UIs across MODs?
Not discussed (sigh :()
Implementation Techniques
- Easy way to direct Google/Bots to best or full content (e.g. gene
pages) is with a site map with links to everything you want to expose.
- detecting user agent, query referal string, etc., in web query lets
report software alter output (e.g. give Google all of report to
index, highlight Google user queries)
- Google offers indexing tips, directions to update or hide specific
parts
- Dynamic content (Javascript/AJAX):
- Word or string completion using Javascript/AJAX and dedicated,
simple databases of words to complete work well (Wormbase, EBI
Ontology Lookup Service, others?).
- use to hide and show sections of pages; reorder tracks in a genome
browser;
- RSS is probably underutilized, but would it be widely used if
available?
- An XML representation of an “entry” makes a “diff” or change easy to
assess, and reliable alerts could be constructed.
Categories:
Namespaces
Navigation
Documentation