GMOD Community Meeting July 16-17, 2008 University of Toronto
The July 2008 GMOD community meeting was held on July 16-17, 2008 at the University of Toronto, immediately before BOSC and ISMB 2008 (also in Toronto), and just a few days after the 2008 GMOD Summer School. The meeting was attended by over 30 people representing more than 20 different groups.
Time | Topic | Who | Presentation |
---|---|---|---|
9:00 | Introductions | Scott Cain | |
9:30 | The State of GMOD | Scott Cain | <a href="https://raw.githubusercontent.com/GMOD/gmod.github.io/main/mediawiki/images/f/fb/Gmod_meeting_7_2008.ppt" class="internal" title="Gmod meeting 7 2008.ppt">PPT</a> |
10:10 | break | | |
10:40 | MediaWiki/TableEdit Roundtripping | Jim Hu | |
11:00 | More MediaWiki enhancements | Sheldon McKay | Links… |
| | SGN Community Annotation | Lukas Mueller | PDF unavailable |
| | WikiMods & Chado API | Brad Arshinoff | <a href="https://raw.githubusercontent.com/GMOD/gmod.github.io/main/mediawiki/images/6/65/PerlChadoGMOD2008.pdf" class="internal" title="PerlChadoGMOD2008.pdf">PDF</a> |
11:30 | Lunch | | |
1:30 | GMOD Help Desk | Dave Clements | <a href="https://raw.githubusercontent.com/GMOD/gmod.github.io/main/mediawiki/images/1/19/HelpDeskGMOD2008.pdf" class="internal" title="HelpDeskGMOD2008.pdf">PDF</a> PPT |
2:15 | Rearchitecting Apollo and the need for a database independent Biological API layer | Ed Lee | |
2:50 | break | | |
3:20 | InterMine and Chado | Richard Smith | PDF |
3:50 | Show and Tell | “What I did with my Summer” | |
| | CMap | Ben Faga | |
Time | Topic | Who | Presentation |
---|---|---|---|
9:00 | New things for GBrowse 1.69 | Sheldon McKay | |
| | GBrowse 2.0 and Roadmap | Lincoln Stein | |
9:30 | New things for GBrowse 3.0 | Ian Holmes | <a href="https://raw.githubusercontent.com/GMOD/gmod.github.io/main/mediawiki/images/2/23/GBrowse3GMOD2008.pdf" class="internal" title="GBrowse3GMOD2008.pdf">PDF</a> |
10:00 | break | | |
10:30 | The need for a computable common gene page (Don Gilbert’s proposal) | Scott Cain, Lincoln Stein | <a href="https://raw.githubusercontent.com/GMOD/gmod.github.io/main/mediawiki/images/8/80/Common_gene_page.ppt" class="internal" title="Common gene page.ppt">PPT</a> |
11:30 | Lunch | | |
1:30 | More Show and Tell or a mini hackathon or go see Toronto | | |
| | Traits at SGN | Lukas Mueller | |
| | CellFrame | Yunchen Gong | |
| | Matching Gene Names to Articles at Xenbase | Jeff Bowes | |
| | Django and Chado - A user interface exploration | Victor de Jager | |
This section covers discussion about the software components in GMOD. For a summary of talks and discussion on how those components are used at particular databases, see the GMOD User Community section.
Scott Cain spoke on Chado.
The GMOD 1.1 release is in the works. There are no schema changes yet.
Joshua Orvis requested better typing / the use of controlled vocabularies in the Chado Companalysis Module to better represent scores that are currently in the analysisfeature table. Without it there is no way to keep track of what the scores mean. This issue was also raised by Brett Whitty at the 2008 GMOD Summer School the week before.
Joshua also proposed (again) the addition of a type_id field to the analysisfeature table. The use case is to distinguish between the types of features involved in an analysis. The most direct examples are ‘input_of’ and ‘created_by’, which allow the user to query a feature’s role in the analysis. This has been brought up in previous meetings and on the GMOD mailing list and seems to have had general approval.
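As a rough illustration of what the proposed type_id would make possible, here is a minimal sketch in Python using an in-memory SQLite database. The tables are heavily simplified stand-ins for the Chado tables of the same names (real Chado has many more columns and constraints), the type_id column is the proposed field rather than an existing one, and the sample rows and the helper names `build_demo_db` and `features_with_role` are made up for the example.

```python
import sqlite3

# Heavily simplified stand-ins for Chado tables. The type_id column on
# analysisfeature is the PROPOSED field, not part of the released schema.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, program TEXT NOT NULL);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    analysis_id INTEGER NOT NULL REFERENCES analysis(analysis_id),
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?)",
        [(1, "input_of"), (2, "created_by")],
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?)",
        [(10, "contig_1"), (11, "gene_model_1")],
    )
    conn.execute("INSERT INTO analysis VALUES (100, 'hypothetical_gene_finder')")
    conn.executemany(
        "INSERT INTO analysisfeature (analysis_id, feature_id, type_id) "
        "VALUES (?, ?, ?)",
        [(100, 10, 1), (100, 11, 2)],
    )
    return conn


def features_with_role(conn, analysis_id, role):
    """Return the uniquenames of features playing `role` in an analysis."""
    rows = conn.execute(
        """
        SELECT f.uniquename
        FROM analysisfeature af
        JOIN feature f ON f.feature_id = af.feature_id
        JOIN cvterm t ON t.cvterm_id = af.type_id
        WHERE af.analysis_id = ? AND t.name = ?
        ORDER BY f.uniquename
        """,
        (analysis_id, role),
    ).fetchall()
    return [row[0] for row in rows]
```

With a type_id in place, asking "which features were inputs to analysis 100?" becomes a single join against cvterm, which is not possible when the table only records that a feature and an analysis are related.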
Action Items
Dave Clements discussed the Chado Natural Diversity Module. It was developed at NESCent and NCSU to enable Chado to better support natural diversity studies. This has been lying dormant for a while and it would be nice to get it in use by more groups so that we can better generalize it and make it an official Chado module. Lincoln pointed out there was grant money to do exactly this.
Action Items:
Scott Cain also spoke about his work on the Community Annotation System (CAS). The next release of CAS, 1.1, will feature
cas-utils is a set of tools that tie together GBrowse, Apollo and Chado. This includes
cas-utils is now available for download.
Jim Hu spoke about progress on TableEdit, currently at release 0.8.
Ed Lee, lead developer for Apollo, spoke about enhancements to Apollo that have happened since he started working on it last September:
Richard Smith spoke about InterMine, a query-optimized data warehouse system for biological data. It can create precomputed tables (a la materialized views) at any time, including from the GUI, in response to popular query patterns. It also supports query templates, which are fill-in-the-blank versions of popular queries.
InterMine is written in Java. It has one class per Sequence Ontology (SO) term, and uses Java class inheritance for is_a relationships. part_of relationships are implemented with Java references and collections.
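The mapping from ontology relationships to object-oriented constructs can be sketched as follows. InterMine itself is Java; this is only a Python analogy, and the classes, attributes, and the particular terms chosen are illustrative assumptions, not InterMine's actual model classes.

```python
class SequenceFeature:
    """Root class; every SO term in the sketch is a subclass of this."""

    so_term = "sequence_feature"

    def __init__(self, identifier):
        self.identifier = identifier


class Gene(SequenceFeature):
    so_term = "gene"

    def __init__(self, identifier):
        super().__init__(identifier)
        # part_of, seen from the whole: a collection of the parts.
        self.transcripts = []


class Transcript(SequenceFeature):
    so_term = "transcript"

    def __init__(self, identifier, gene=None):
        super().__init__(identifier)
        # part_of, seen from the part: a reference to the whole.
        self.gene = gene
        self.exons = []
        if gene is not None:
            gene.transcripts.append(self)


class MRNA(Transcript):
    """is_a is expressed by inheritance: an mRNA is a transcript."""

    so_term = "mRNA"


class Exon(SequenceFeature):
    so_term = "exon"

    def __init__(self, identifier, transcript=None):
        super().__init__(identifier)
        self.transcript = transcript
        if transcript is not None:
            transcript.exons.append(self)
```

Because MRNA inherits from Transcript, any query or code written against transcripts also accepts mRNAs, which is the practical benefit of modelling is_a as inheritance.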
Ben Faga gave a talk on what’s new in CMap. Some highlights:
Three talks gave us the GBrowse roadmap. Talks covered the next incremental release (1.69), and the next two major releases (2 and 3).
Sheldon McKay and Lincoln Stein spoke about recent enhancements to GBrowse. These features are available in the current development version (“stable”) of GBrowse and will be included in the upcoming (some would say imminent) 1.69 release of GBrowse.
Lincoln Stein talked about GBrowse 2, the next major release of GBrowse. This release focuses on performance and stability. GBrowse 2 will be cluster aware:
Our experience is that the database is usually the bottleneck with existing GBrowse installations.
GBrowse 3 was renamed JBrowse after this meeting.
Ian Holmes presented his group’s work on GBrowse 3, a complete rewrite of GBrowse using a Web 2.0 style interface. Mitch Skinner has done most of the coding work on this.
Most tracks are now rendered in the client using JavaScript. Tracks such as wiggle tracks can also still be rendered on the server.
GBrowse 3 uses nested containment lists to quickly determine what features to display. These are 5 to 500x faster than R-trees. The group is using the modENCODE project as a target test audience.
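For readers unfamiliar with the data structure, the following is a minimal Python sketch of a nested containment list. It is not the GBrowse 3 code (which is JavaScript); it assumes half-open intervals given as `(start, end)` tuples, and the class and method names are made up for the example. The idea is that any interval fully contained in another is moved into that interval's sublist, so within each list both starts and ends are in ascending order and the first overlapping entry can be found by binary search.

```python
def _first_ending_after(nodes, position):
    """Index of the first node whose end is > position (ends ascend)."""
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid][1] > position:
            hi = mid
        else:
            lo = mid + 1
    return lo


class NCList:
    """Nested containment list over half-open (start, end) intervals."""

    def __init__(self, intervals):
        # Sort by start ascending, then end descending, so that a
        # containing interval always precedes the intervals it contains.
        ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
        self.top = []  # each node is [start, end, sublist]
        stack = []     # chain of containers that are still "open"
        for start, end in ordered:
            # Drop containers that do not fully contain this interval.
            while stack and end > stack[-1][1]:
                stack.pop()
            node = [start, end, []]
            (stack[-1][2] if stack else self.top).append(node)
            stack.append(node)

    def overlapping(self, qstart, qend):
        """Return all stored intervals overlapping [qstart, qend)."""
        hits = []
        self._collect(self.top, qstart, qend, hits)
        return hits

    def _collect(self, nodes, qstart, qend, hits):
        i = _first_ending_after(nodes, qstart)
        while i < len(nodes) and nodes[i][0] < qend:
            start, end, children = nodes[i]
            hits.append((start, end))
            # Contained intervals can only overlap if their container does.
            self._collect(children, qstart, qend, hits)
            i += 1
```

A query only descends into the sublists of intervals that themselves overlap the query region, which is why the structure suits a genome browser that repeatedly asks "which features fall in the visible window?".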
Ian made the observation that when you are asking for guidance on GUIs, you need large sample sizes. Small sample sizes lead to a large set of suggestions with very little overlap between users. Large sample sizes enable you to identify a core set of requests.
Ian would like to move GBrowse 3 in the direction of being a genome wiki.
Genome Wiki is about people sharing tracks, not so much about individual genes.
They are not currently working on a Chado adaptor. They hope to do that, but probably not soon.
At the 2008 GMOD Summer School there were several requests for a GBrowse glyphs page that
Lincoln believes that there is already similar documentation in the GBrowse distribution.
Action Items:
Scott Cain eerily yet effectively channeled Don Gilbert on the topic of a Common Gene Page.
This is not the gene page that people see when they come to your web site. Rather, it is some minimal set of information about a gene in your organism, stored in XML format, that can be easily accessed and parsed by other organizations. It is meant to enable easy sharing of information about genes between GMOD users.
If you’ve been around GMOD for a while you know that the concept of a common gene page is almost as old as GMOD itself. We might have actually moved forward on this at this meeting.
There was discussion on what should be included in the gene page. The consensus was to keep track of only the minimal amount of information. See Scott’s presentation for the list we settled on.
UniProt XML may be suitable for this.
Lincoln proposed a CGI script that has a set of predefined hooks for populating the XML. This could be a Perl program with methods for fetching data and then passing it to another routine that places the data into an XML format. Each organization would write the classes called by the hooks to get the data from wherever they keep it. This provides a framework that can be used across multiple organizations and that will always produce structurally identical XML, no matter how the data is originally stored.
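The hook idea can be sketched roughly as below. The proposal was for a Perl CGI script; this sketch uses Python only for illustration. The element names (`gene`, `symbol`, `organism`, `synonyms`) and all class and method names are hypothetical and are not the field list agreed at the meeting.

```python
import xml.etree.ElementTree as ET


class CommonGenePage:
    """Framework side: fixes the XML structure and calls data hooks."""

    # Hooks that each organization overrides to reach its own data store.
    def fetch_symbol(self, gene_id):
        raise NotImplementedError

    def fetch_organism(self, gene_id):
        raise NotImplementedError

    def fetch_synonyms(self, gene_id):
        return []  # optional hook with a default

    def to_xml(self, gene_id):
        """Build the same XML layout no matter where the data came from."""
        root = ET.Element("gene", {"id": gene_id})
        ET.SubElement(root, "symbol").text = self.fetch_symbol(gene_id)
        ET.SubElement(root, "organism").text = self.fetch_organism(gene_id)
        synonyms = ET.SubElement(root, "synonyms")
        for name in self.fetch_synonyms(gene_id):
            ET.SubElement(synonyms, "synonym").text = name
        return ET.tostring(root, encoding="unicode")


class DictBackedSite(CommonGenePage):
    """Site side: a toy implementation backed by an in-memory dict."""

    def __init__(self, records):
        self.records = records

    def fetch_symbol(self, gene_id):
        return self.records[gene_id]["symbol"]

    def fetch_organism(self, gene_id):
        return self.records[gene_id]["organism"]

    def fetch_synonyms(self, gene_id):
        return self.records[gene_id].get("synonyms", [])
```

A site that stores genes in Chado and one that stores them in a custom schema would each supply their own subclass, while `to_xml` stays shared, which is what keeps the output structurally identical.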
Rob Buels from SGN produced a prototype of this program while at the meeting.
Action Items:
We also discussed the Gene Wiki project. This project has created around 7,000 human gene pages in Wikipedia. Wikipedia asked
Someone might eventually be able to create a MODGeneWiki from GMOD Common Web Pages.
Sheldon McKay spoke about MediaWiki related work he’s been doing for the modENCODE project.
FCKEditor is a WYSIWYG editor for MediaWiki, but if you use it off the shelf it becomes hard for your users to use any other editor, including the default MediaWiki editor, which they may already be familiar with. Sheldon has extended FCKEditor to make it optional. Users now see both “edit” and “rich edit” links and tabs.
Action Items:
Sheldon has also created an extension for creating popup balloons in a MediaWiki Web Site. See Popup Balloons for details. This extension is installed on the GMOD web site.
Does what it says - enables users to collapse and expand sections on pages in MediaWiki.
A set of extensions were created to
These use the Yahoo autocompletion library.
Brad Arshinoff from XanthusBase (soon to be WikiMods, see below) gave a talk titled Perl based Schema Abstraction Layer for Chado. Brad’s talk (slides unavailable) gave an overview of a Perl middleware package for Chado that was developed at XanthusBase.
Q: Modware is a Perl-based Chado API that already exists. Why not use it?
A: Thought this would be less work and a lot less SQL than Modware. May or may not have worked out that way.
Eric Just, the developer of Modware, is no longer at DictyBase. Someone has replaced him, but we don’t know if that person is supporting Modware.
It seems that we have a lot of Perl and Java APIs to Chado, perhaps too many. What should we do about that? Lincoln Stein suggested that we document them all and provide a list of pros and cons for each. That will allow new users to make the best informed choice about what they want to do.
Action Items:
Ed Lee presented a talk on the need for a Java interface to the Chado schema. He’s going to be rewriting the Apollo data model to clearly define biological concepts and to map well to any of Apollo’s potential data sources, including Chado.
This could be a way to enforce/encourage Chado Best Practices. A current problem for tool developers (such as the Apollo team) is writing code to work with Chado, when not all Chado users represent the same biological concepts in similar ways.
Having a cleanly designed, biological level (as opposed to DBMS table level) API for Java would help organizations follow best practices when using Chado. It also would make tool development much easier.
Lukas Mueller from SGN spoke about community annotation at the Sol Genomics Network.
Lukas also talked about SGN’s traits (phenotypes) database. SGN uses a custom database design for their phenotypic data. (They do not use the Chado Phenotype Module. Suzi Lewis indicated that her group is working on a new phenotype module for Chado which will address issues with the current design.)
Brad Arshinoff from XanthusBase introduced the WikiMods web site, a collection of MODs for prokaryotes with small research communities. This will replace the existing XanthusBase site and add an additional organism in the process. It is scheduled to launch on July 30, 2008 with these sites:
They have migrated Chado from Oracle to MySQL.
Yunchen Gong gave a talk about CellFrame, a web site about cell biology and the construction of cell perturbation networks.
Jeff Bowes of Xenbase talked about automatic loading, linking, and indexing publication abstracts. Xenbase downloads information for every Xenopus related publication. The abstract is then scanned for gene names/symbols and other controlled vocabulary terms. The publication is then associated with those terms and genes in Xenbase.
Xenbase has extended the schema to support this indexing scheme and uses DB2 Net Extender for indexing (but any indexing tool could be used). Xenbase also scrapes images from each journal they have an agreement with. They use a Java class for journals, and every journal has its own subclass.
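The abstract-scanning step can be sketched in a few lines of Python. This is not Xenbase's implementation; it is a minimal illustration assuming a simple case-insensitive whole-word match, and the function names, gene symbols, and identifiers are hypothetical.

```python
import re


def build_symbol_pattern(symbols):
    """Compile one regex matching any known symbol as a whole token."""
    # Longest symbols first, so a longer symbol wins over its prefix.
    ordered = sorted(symbols, key=len, reverse=True)
    alternatives = "|".join(re.escape(symbol) for symbol in ordered)
    # The lookarounds stop 'pax6' from matching inside 'pax6a' or 'x-pax6'.
    return re.compile(
        r"(?<![\w-])(?:%s)(?![\w-])" % alternatives, re.IGNORECASE
    )


def index_abstract(abstract, symbol_to_gene_id):
    """Return the sorted gene ids whose symbols occur in the abstract."""
    if not symbol_to_gene_id:
        return []
    pattern = build_symbol_pattern(symbol_to_gene_id)
    lookup = {s.lower(): gid for s, gid in symbol_to_gene_id.items()}
    found = {lookup[m.group(0).lower()] for m in pattern.finditer(abstract)}
    return sorted(found)
```

The returned identifiers are what would then be stored as publication-to-gene associations. A production pipeline would also need to handle synonyms and symbols that collide with ordinary English words, which this sketch ignores.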
Victor de Jager of the University of Nijmegen and the Centre for Molecular and Biomolecular Informatics gave a talk on using the Django web framework with Chado (see Chado Django HOWTO for more). A Django based web site could be layered on top of the BioObjects proposed by Ed Lee in his talk.
Last year a Google Summer of Code student worked with Lincoln and Hilmar Lapp (at NESCent) on a project to add phylogenetic information to GBrowse. Lincoln and Hilmar liked it enough that they recommend the program. Lincoln cautions that being a mentor in the program is a lot of work.
Action Items:
At the end of the GMOD Help Desk talk (see below), Dave asked what else he should be working on. The number one response was creating GMOD packages that could be installed with Linux package installers.
Everyone agreed this was an excellent idea, and that it was hard to do, particularly to keep the packages up to date for all the distributions you want to support. BioPackages.net would be the place to put them, if we did this.
Lincoln mentioned that there are one-year infrastructure grants for this sort of thing. That would get us where we want to be for a year, but not after that.
Action Items:
Dave Clements gave a talk on his first 10 months at the GMOD Help Desk, and what he is planning to do in the coming months.
Dave plans to use TableEdit to make parts of the GMOD web site database driven. The plan is to have the same core set of data and a web page for each user. The core data set will describe which components they use and how, and will be implemented in TableEdit tables. We’ll then be able to use that information to show which users use a component on each component page, as well as a complete list of users.
This is a continuation of the community portal idea that was started in the past 10 months. This will help new and existing users get a handle on who is using which components for what kind of biology.
We can’t possibly describe or maintain HOWTO pages for all possible combinations of operating system (in all their versions), external software (BioPerl, Java, libgd,… - in all their versions), and GMOD Components (in all their possible versions and combinations).
However, if we made it easy for GMOD users to record their experiences installing whatever combination they are using, then that might be a useful approximation. New users would then be able to find several possible workarounds when they, for example, can’t get libgd to work. Maybe one of the workarounds will even be for their Linux distribution.
We already have several such logs on the web site.
Dave will create a plan for
ZFIN’s current logo was designed several years ago by Kari Pape, a student in a University of Oregon design class. Judy Sprague, ZFIN’s manager, worked with the professor and the students to communicate what ZFIN was all about and at the end of the quarter we had about 20 designs to pick from, and most of them were spectacularly good.
Many GMOD user databases, web sites, and GMOD components don’t have snazzy logos. Dave offered to contact the same department and the local community college as well, and ask if they would be interested in doing something similar for the GMOD community. This time around I would propose that each student or team get a different database/web site/component.
This was clearly the most popular idea Dave has ever had during his time at GMOD. I’ll investigate ASAP. (See GMOD Logo Program.)
The Help Desk now offers to review grant proposals prior to submission, to help applicants fully state how much they can use GMOD components and thus avoid reinventing the wheel for their project.
We will also start suggesting that grants that propose using GMOD components also include a limited amount of funding for GMOD in the grant. This could either be core project funding, funding for existing components or funding for new components to become part of GMOD.