Difference between revisions of "November 2007 GMOD Meeting"

From GMOD
m (Attendees)
 
(32 intermediate revisions by 5 users not shown)
Line 1: Line 1:
GMOD's November 2007 meeting was held November 5, 1:30PM to November 7, 12:00PM at [http://www.cshl.edu/ Cold Spring Harbor Laboratory] following the [http://meetings.cshl.edu/meetings/info07.shtml Genome Informatics] meeting.  
+
GMOD's November 2007 meeting was held November 5, 1:30PM to November 7, 12:00PM at [http://www.cshl.edu/ Cold Spring Harbor Laboratory] following the [http://meetings.cshl.edu/meetings/info07.shtml Genome Informatics] meeting.
  
 +
{{TocRight}}
 
== Pre-Meeting Information ==
 
== Pre-Meeting Information ==
  
Line 9: Line 10:
 
*community annotation - FlyBase seconds this topic
 
*community annotation - FlyBase seconds this topic
 
*Chado standard on ortholog/paralog/synteny storage.
 
*Chado standard on ortholog/paralog/synteny storage.
*The state of GFF tools in BioPerl. Some of the auditing and examples are on a [http://bioperl.org/wiki/GFF_code_audit Bioperl wiki page].
+
*The state of [[GFF]] tools in [[BioPerl]]. Some of the auditing and examples are on a [http://bioperl.org/wiki/GFF_code_audit Bioperl wiki page].
 
*GMOD releases and packaging
 
*GMOD releases and packaging
 
** How hard would it be to heap together specific releases of popular GMOD components into a named/numbered release that has gone through some level of compatibility testing?
 
** How hard would it be to heap together specific releases of popular GMOD components into a named/numbered release that has gone through some level of compatibility testing?
Line 30: Line 31:
 
* Tim Burgis, Imperial College, London
 
* Tim Burgis, Imperial College, London
 
* [[User:Scott|Scott Cain]], GMOD Coordinator
 
* [[User:Scott|Scott Cain]], GMOD Coordinator
* Mike Caudy, CSHL
+
* [[User:Mcaudy|Mike Caudy]], CSHL
* [[User:Clements|Dave Clements]], GMOD Help Desk, [http://nescent.org NESCent]
+
* [[User:Clements|Dave Clements]], [[GMOD Help Desk]], [http://nescent.org NESCent]
 +
* Norie de la Cruz, WormBase
 
* Quenfen Dong, Indiana University
 
* Quenfen Dong, Indiana University
 
* Dave Emmert, [http://flybase.org FlyBase]
 
* Dave Emmert, [http://flybase.org FlyBase]
* Ben Faga, CSHL
+
* [[User:Faga|Ben Faga]], CSHL
 
* Kathleen Falls, [http://flybase.org FlyBase]
 
* Kathleen Falls, [http://flybase.org FlyBase]
* Steve Fischer, [http://apidb.org ApiDB]
+
* [[User:Stevef|Steve Fischer]], [http://apidb.org ApiDB]
 
* [[User:Dongilbert|Don Gilbert]]
 
* [[User:Dongilbert|Don Gilbert]]
 
* [[User:Jogoodma|Josh Goodman]], FlyBase - Indiana University
 
* [[User:Jogoodma|Josh Goodman]], FlyBase - Indiana University
Line 44: Line 46:
 
* Kevin Galens, JCVI
 
* Kevin Galens, JCVI
 
* [[User:GreggHelt2|Gregg Helt]], DAS/2
 
* [[User:GreggHelt2|Gregg Helt]], DAS/2
* Chris Hemmerich, [http://flybase.org FlyBase]
+
* [[User:Chemmeri|Chris Hemmerich]], [http://flybase.org FlyBase]
 
* Hideya Kiwaji, [http://www.riken.go.jp/ Riken]
 
* Hideya Kiwaji, [http://www.riken.go.jp/ Riken]
* Ed Lee, Lawrence Berkeley Labs
+
* [[User:Elee|Ed Lee]], Lawrence Berkeley Labs
 
* Suzi Lewis, [http://bioontology.org/ National Center for Biomedical Ontology]
 
* Suzi Lewis, [http://bioontology.org/ National Center for Biomedical Ontology]
 
* [[User:Mckays|Sheldon McKay]], WormBase/modENCODE - Cold Spring Harbor Laboratory
 
* [[User:Mckays|Sheldon McKay]], WormBase/modENCODE - Cold Spring Harbor Laboratory
* Lukas Mueller, [http://soldb.cit.cornell.edu/ SOL Genomics Network]
+
* Lukas Mueller, [http://soldb.cit.cornell.edu/ Sol Genomics Network]
* Joshua Orvis, University of Maryland Medical Center
+
* [[User:Jorvis|Joshua Orvis]], University of Maryland Medical Center
 
* Suzanne Paley, [http://ecocyc.org EcoCyc]
 
* Suzanne Paley, [http://ecocyc.org EcoCyc]
 
* Chinmay Patel, GeneDB, Sanger Institute
 
* Chinmay Patel, GeneDB, Sanger Institute
Line 56: Line 58:
 
* Andy Schroeder, [http://flybase.org FlyBase]
 
* Andy Schroeder, [http://flybase.org FlyBase]
 
* Taner Sen, [http://maizegdb.org MaizeGDB]
 
* Taner Sen, [http://maizegdb.org MaizeGDB]
* [[User:Sperling|Linda Sperling]], ParameciumDB - CNRS
+
* [[User:Sperling|Linda Sperling]], [[ParameciumDB]] - CNRS
 
* [[User:Stajich|Jason Stajich]]
 
* [[User:Stajich|Jason Stajich]]
* Lincoln Stein, CSHL
+
* [[User:Lstein|Lincoln Stein]], CSHL
 
* Victor Strelets, [http://flybase.org FlyBase]
 
* Victor Strelets, [http://flybase.org FlyBase]
* Haiming Wang [http://apidb.org ApiDB.org]  
+
* Haiming Wang [http://apidb.org ApiDB.org]
 
* Robert Wilson, [http://flybase.org FlyBase]
 
* Robert Wilson, [http://flybase.org FlyBase]
 
* Haiyan Zhang, [http://flybase.org FlyBase]
 
* Haiyan Zhang, [http://flybase.org FlyBase]
Line 78: Line 80:
 
**Chado data storage
 
**Chado data storage
 
** See [[Chado Comparative Schema]].
 
** See [[Chado Comparative Schema]].
*BioPerl and GFF(2/3)
+
*BioPerl and [[GFF|GFF(2/3)]]
**GFF Questions
+
**[[GFF]] Questions
  
*Postgres Tuning / Materialized Views
+
*Postgres Tuning / [[Materialized views]]
**Performance Stratagies
+
**Performance Strategies
 
*Apollo-Chado Connection
 
*Apollo-Chado Connection
**Perfomance - See [[PostgreSQL Performance Tips]].
+
**Performance - See [[PostgreSQL Performance Tips]].
 
**Too many JDBC Adaptors
 
**Too many JDBC Adaptors
 
*Chado
 
*Chado
Line 92: Line 94:
  
 
*What Should GMOD Focus On (What's Missing)
 
*What Should GMOD Focus On (What's Missing)
**Genome Analysis (Galaxy, Ergatis, ...)
+
**Genome Analysis ([[Galaxy]], [[Ergatis]], ...)
***Lightweight annotation [http://www.yandell-lab.org/maker/index.html MAKER pipeline] from Mark Yandell
+
***Lightweight annotation [http://www.yandell-lab.org/maker/index.html MAKER pipeline] from Mark Yandell<br />(''2008/05/13: [[MAKER]] has since been folded in to GMOD.'')
 
**MicroArrays
 
**MicroArrays
 
**What is the GMOD Community and how best can we serve them?
 
**What is the GMOD Community and how best can we serve them?
 
**Is there a need for individual MODs?
 
**Is there a need for individual MODs?
*What should GMOD help desk do?
+
*What should [[GMOD Help Desk]] do?
 
**UIs: Picture Intensive
 
**UIs: Picture Intensive
 
*What should be the outcome of this meeting?
 
*What should be the outcome of this meeting?
 
  
 
===November 5===
 
===November 5===
Line 131: Line 132:
 
1:00 Standards and applications for storing comparative genome data
 
1:00 Standards and applications for storing comparative genome data
  
* Steve Fischer - GBrowse: SynView and the Generic database adaptor
+
* [[User:Stevef|Steve Fischer]] - [[GBrowse]]: [[SynView]] and the Generic database adaptor
 
* Victor Strelets - FlyBase Orthoview (GBrowse)
 
* Victor Strelets - FlyBase Orthoview (GBrowse)
  
Line 148: Line 149:
 
9:15 BioPerl
 
9:15 BioPerl
  
*GFF3 tools
+
*[[GFF3]] tools
 
*SeqFeatures/FeatureIO
 
*SeqFeatures/FeatureIO
 
*Sequence Ontology
 
*Sequence Ontology
Line 162: Line 163:
 
* WormBase update, [[User:Tharris|Todd Harris]]; Slides: [http://dev.wormbase.org/presentations/2007/2007.11-GMOD-WormBase/2007.11-GMOD-WormBase.key.tgz Keynote], [http://dev.wormbase.org/presentations/2007/2007.11-GMOD-WormBase/2007.11-GMOD-WormBase.ppt Powerpoint], [http://dev.wormbase.org/presentations/2007/2007.11-GMOD-WormBase/2007.11-GMOD-WormBase.pdf PDF], [http://dev.wormbase.org/presentations/2007/2007.11-GMOD-WormBase/2007.11-GMOD-WormBase.mov Mov]
 
* WormBase update, [[User:Tharris|Todd Harris]]; Slides: [http://dev.wormbase.org/presentations/2007/2007.11-GMOD-WormBase/2007.11-GMOD-WormBase.key.tgz Keynote], [http://dev.wormbase.org/presentations/2007/2007.11-GMOD-WormBase/2007.11-GMOD-WormBase.ppt Powerpoint], [http://dev.wormbase.org/presentations/2007/2007.11-GMOD-WormBase/2007.11-GMOD-WormBase.pdf PDF], [http://dev.wormbase.org/presentations/2007/2007.11-GMOD-WormBase/2007.11-GMOD-WormBase.mov Mov]
 
* [http://mango.ctegd.uga.edu/jkissingLab/presentations/GMOD_Nov_2007.ppt ApiDB GBrowse update] slides, Haiming Wang
 
* [http://mango.ctegd.uga.edu/jkissingLab/presentations/GMOD_Nov_2007.ppt ApiDB GBrowse update] slides, Haiming Wang
* [[Media:2007_11_05_CMap_GMOD.ppt|CMap/CMAE Progress Report]], Ben Faga
+
* [[Media:2007_11_05_CMap_GMOD.ppt|CMap/CMAE Progress Report]], [[User:Faga|Ben Faga]]
 
* [[Media:Gbrowse_syn.pdf|Gbrowse_syn]] Sheldon McKay
 
* [[Media:Gbrowse_syn.pdf|Gbrowse_syn]] Sheldon McKay
 
* [[Media:CommunityAnnotationNov2007.pdf|Community Annotation]] [[User:Sperling|Linda Sperling]]
 
* [[Media:CommunityAnnotationNov2007.pdf|Community Annotation]] [[User:Sperling|Linda Sperling]]
 
* [[Media:Workshop.pdf|Community Annotation]] Chinmay Patel
 
* [[Media:Workshop.pdf|Community Annotation]] Chinmay Patel
* [[Media:syntenyModeling.pdf|Modeling and Displaying Synteny w/ SynView]] Steve Fischer
+
* [[Media:syntenyModeling.pdf|Modeling and Displaying Synteny w/ SynView]] [[User:Stevef|Steve Fischer]]
 +
* [[Media:GMOD-Nov-2007.ppt|Recent Developments in Pathway Tools]], Suzanne Paley
  
 
== Meeting Minutes ==
 
== Meeting Minutes ==
Line 184: Line 186:
 
==== GMOD's Role ====
 
==== GMOD's Role ====
  
Don Gilbert pointed out that cheap short sequencers are now available.  Lots of people have inexpensive sequences, but there still is no way to do cheap annotation.
+
Don Gilbert pointed out that cheap short sequencers are now available.  Lots of people have inexpensive sequences, but there still is no way to do cheap annotation.
  
 
Current GMOD clients are species or family centered. Want to make it easy to integrate multiple species.  ApiDB is at the point of opening new species databases and web sites with relatively little effort.
 
Current GMOD clients are species or family centered. Want to make it easy to integrate multiple species.  ApiDB is at the point of opening new species databases and web sites with relatively little effort.
Line 194: Line 196:
 
How does GMOD want to deal with integration issues?
 
How does GMOD want to deal with integration issues?
  
How close to the sequencer does GMOD want to get?  We don't want to pull the data off the sequencer.
+
How close to the sequencer does GMOD want to get?  We don't want to pull the data off the sequencer.
  
 
Should we position GMOD as something that can feed data into places like Ensembl?  Ensembl does not have curation expertise of the MODs.  Even if NCBI is wonderful at consolidation, they won't have quality curation.  GMOD sits right there, supporting curation.  So, we doubt that Ensembl or NCBI will swallow us whole.
 
Should we position GMOD as something that can feed data into places like Ensembl?  Ensembl does not have curation expertise of the MODs.  Even if NCBI is wonderful at consolidation, they won't have quality curation.  GMOD sits right there, supporting curation.  So, we doubt that Ensembl or NCBI will swallow us whole.
Line 201: Line 203:
 
==== Releases and Bundles ====
 
==== Releases and Bundles ====
  
We need to figure out what components we want and what we are pushing.  If we focus on a core set of packages then life gets easier for the project.
+
We need to figure out what components we want and what we are pushing.  If we focus on a core set of packages then life gets easier for the project.
  
 
There was discussion of better release management for components, and the VMWare Community Annotation Server package.  Are GMOD bundles the way of the future?  Believe that binary packages are generally not going to work for GMOD unless someone is willing to put a lot of time into maintaining them.
 
There was discussion of better release management for components, and the VMWare Community Annotation Server package.  Are GMOD bundles the way of the future?  Believe that binary packages are generally not going to work for GMOD unless someone is willing to put a lot of time into maintaining them.
Line 225: Line 227:
 
===== New Development =====
 
===== New Development =====
  
Work has resumed on developing Apollo.  Ed formerly of TIGR/JCVI started working for Suzi Lewis at Berkeley this fall and is working on it.  Work is being done on
+
Work has resumed on developing [[Apollo]].  [[User:Elee|Ed Lee]], formerly of TIGR/JCVI, started working for Suzi Lewis at Berkeley this fall and is working on it.  Work is being done on
  
* A GFF3 adaptor
+
* A [[GFF3]] adapter
* Speeding up Apollo when it uses Chado as a backend (or, just speeding up Chado).
+
* Speeding up Apollo when it uses [[Chado]] as a backend (or, just speeding up Chado).
 
* Communicating with more than one Chado instance.
 
* Communicating with more than one Chado instance.
 
* Undo/Redo support.
 
* Undo/Redo support.
 
  
 
===== ID Generation and JDBC Drivers =====
 
===== ID Generation and JDBC Drivers =====
  
Apollo can talk directly to a database or it can use XML files instead.  FlyBase, VectorBase, BeeBase, and BovineBase are all believed to take the XML approach.
+
Apollo can talk directly to a database or it can use XML files instead.  FlyBase, VectorBase, BeeBase, and BovineBase are all believed to take the XML approach.
  
 
Apollo currently has two choices for database adaptors:
 
Apollo currently has two choices for database adaptors:
Line 244: Line 245:
 
The trigger version is used in the Community Annotation Server and on the Dolan-Rice project.  We could not think of anywhere else it was used.  The triggerless version is used everywhere else that we knew of.
 
The trigger version is used in the Community Annotation Server and on the Dolan-Rice project.  We could not think of anywhere else it was used.  The triggerless version is used everywhere else that we knew of.
  
The trigger version is Postgres specific.  The triggerless version stores multiple copies of shared exons.  
+
The trigger version is Postgres specific.  The triggerless version stores multiple copies of shared exons.
  
 
Notes from Tuesday: Decided to actively discourage use of the trigger version.  Best thing may be to go through trigger code and externalize the logic.
 
Notes from Tuesday: Decided to actively discourage use of the trigger version.  Best thing may be to go through trigger code and externalize the logic.
Line 256: Line 257:
 
==== BioPerl, GFF ====
 
==== BioPerl, GFF ====
  
There was a discussion of BioPerl and how it relates to GMOD.
+
There was a discussion of [[BioPerl]] and how it relates to GMOD.
  
Jason Stajich created a slimmed down feature Perl package based on arrays instead of hashes: Bio::SeqFeature::Slim.  This is 70% faster for reading a GFF file.  Bio::Feature::IO only supports GFF3.  It is slow, uses heavy objects, and is strongly typed.  Jason wants to spend more time on middleware speed.  He also wants a converter into a common object model and code to get it back out to any supported format.
+
Jason Stajich created a slimmed down feature Perl package based on arrays instead of hashes: Bio::SeqFeature::Slim.  This is 70% faster for reading a [[GFF]] file.  Bio::Feature::IO only supports [[GFF3]].  It is slow, uses heavy objects, and is strongly typed.  Jason wants to spend more time on middleware speed.  He also wants a converter into a common object model and code to get it back out to any supported format.
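
The arrays-instead-of-hashes idea can be illustrated with a small sketch. This is a Python analogy only, not Jason's Perl code and not the Bio::SeqFeature::Slim API; the function names, slot layout, and sample line are hypothetical, and no speed figure is implied.

```python
# Index constants name each slot of the array-backed record, so no
# per-feature hash (dict) of field names has to be allocated.
SEQID, SOURCE, TYPE, START, END, STRAND = range(6)


def slim_feature_from_gff(line):
    """Build a list-backed feature from one tab-separated GFF line."""
    cols = line.rstrip("\n").split("\t")
    return [cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), cols[6]]


def feature_length(feature):
    """Length in bases; GFF coordinates are 1-based and inclusive."""
    return feature[END] - feature[START] + 1


# Hypothetical sample line, made up for illustration.
slim = slim_feature_from_gff("chr1\tdemo\tgene\t100\t250\t.\t+\t.\tID=g1")
print(slim[TYPE], feature_length(slim))
```

The trade-off is readability: positional slots are cheaper to create than keyed records, but every accessor has to agree on the slot order.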
  
 
6 to 8 people are currently contributing to BioPerl.
 
6 to 8 people are currently contributing to BioPerl.
  
  
GFF3 has an ID field.  ID is not clear in earlier versions.  GFF2 supports arbitrary feature types.  GFF3 requires SO types (but you can always ignore that).  Keep detailed alignment data in a separate database, not in GFF3.  Indicate in GFF3 that data is stored elsewhere.  Could store cigar strings in GFF3 and spec supports that.
+
[[GFF3]] has an ID field.  ID is not clear in earlier versions.  [[GFF2]] supports arbitrary feature types.  GFF3 requires SO types (but you can always ignore that).  Keep detailed alignment data in a separate database, not in GFF3.  Indicate in GFF3 that data is stored elsewhere.  Could store cigar strings in GFF3 and spec supports that.
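
As a rough illustration of the GFF3 points above, the following Python sketch splits one GFF3 line into its nine columns and reads the ID attribute and a CIGAR-like Gap attribute from column 9. The function name and the sample line are assumptions made up for this example, not data from the meeting.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict; column 9 becomes a sub-dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    feature = dict(zip(GFF3_COLUMNS, fields))
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    attributes = {}
    for pair in feature["attributes"].split(";"):
        if not pair:
            continue
        tag, _, value = pair.partition("=")
        # Multiple values are comma-separated; each one is percent-encoded.
        attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    feature["attributes"] = attributes
    return feature


# Hypothetical alignment feature carrying its alignment in the Gap attribute.
sample = ("ctg123\tblat\tEST_match\t1050\t1072\t.\t+\t.\t"
          "ID=match00001;Target=EST23 1 21;Gap=M8 D3 M6 I1 M6")
feature = parse_gff3_line(sample)
print(feature["attributes"]["ID"], feature["attributes"]["Gap"])
```

A feature whose detailed alignment lives in a separate database could carry only a pointer attribute instead of Gap; the name of such an attribute would be a local convention.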
  
  
Line 272: Line 273:
 
There was a request to make Chado more database neutral, rather than Postgres-specific.
 
There was a request to make Chado more database neutral, rather than Postgres-specific.
  
The slowness of Chado databases came up in several contexts.  David from UMD Medical Center started a Postgres performance page on the wiki.
+
The slowness of Chado databases came up in several contexts.  David from UMD Medical Center started a Postgres performance page on the wiki.
  
 
Scott described a potential way to implement materialized views in Chado that gets us most of the benefits of DBMS-supported materialized views.  Store
 
Scott described a potential way to implement materialized views in Chado that gets us most of the benefits of DBMS-supported materialized views.  Store
* the SQL to create it in a table,  
+
* the SQL to create it in a table,
 
* a run time schedule for when the table should be rebuilt,
 
* a run time schedule for when the table should be rebuilt,
* an enabled/disabled flag that is disabled by default.  
+
* an enabled/disabled flag that is disabled by default.
  
 
Question was raised if genome metadata fits into the current Chado.  The belief was that it does not.
 
Question was raised if genome metadata fits into the current Chado.  The belief was that it does not.
Line 289: Line 290:
  
  
===== Chado Validator =====  
+
===== Chado Validator =====
  
We discussed if a Chado database validator would be worthwhile.  A validator would check a Chado database to see if it conforms to the canonical model for a Chado database.  There was no consensus on the value or practicality of this.  There was consensus that no one was willing to volunteer to write it.
+
We discussed if a [[Chado]] database validator would be worthwhile.  A validator would check a Chado database to see if it conforms to the canonical model for a Chado database.  There was no consensus on the value or practicality of this.  There was consensus that no one was willing to volunteer to write it.
  
Ben suggested that if and when we do this, we use the GFF3 to Chado validator as a starting point.
+
Ben suggested that if and when we do this, we use the [[GFF3]] to Chado validator as a starting point.
  
  
Line 304: Line 305:
 
===== Postgres Performance =====
 
===== Postgres Performance =====
  
Slow performance of Chado Postgres implementations came up repeatedly.
+
Slow performance of Chado Postgres implementations came up repeatedly.
  
 
Some bits:
 
Some bits:
Line 316: Line 317:
 
==== CMap ====
 
==== CMap ====
  
'''Presentation:''' [[Media:2007_11_05_CMap_GMOD.ppt|CMap Progress Report]], Ben Faga
+
'''Presentation:''' [[Media:2007_11_05_CMap_GMOD.ppt|CMap Progress Report]], [[User:Faga|Ben Faga]]
  
 
New CMap release (1.0) is on its way.  Will have an assembly editor.    Includes a dot plot, new glyphs, and an install script based on the GBrowse install script.
 
New CMap release (1.0) is on its way.  Will have an assembly editor.    Includes a dot plot, new glyphs, and an install script based on the GBrowse install script.
Line 324: Line 325:
 
==== Community Annotation ====
 
==== Community Annotation ====
  
This was a popular motif in the meeting.
+
This was a popular motif in the meeting.
  
  
Line 331: Line 332:
 
'''Presentation:''' [[Media:CommunityAnnotationNov2007.pdf|Community Annotation]], [[User:Sperling|Linda Sperling]]
 
'''Presentation:''' [[Media:CommunityAnnotationNov2007.pdf|Community Annotation]], [[User:Sperling|Linda Sperling]]
  
Linda Sperling discussed ParameciumDB.  Paramecium is a small community with few resources and no dedicated curators.
+
Linda Sperling discussed ParameciumDB.  Paramecium is a small community with few resources and no dedicated curators.
  
 
Paramecium curators are a small set of people that must do their annotation from fixed IP addresses.  Curator annotations are kept in addition to existing Genoscope predictions.  These annotations are not validated when they are submitted.  Annotators cannot change annotations made by other people.  There are two databases: one backing the website, and one where annotation goes.  Once a month the new annotation is pushed to the web site.  Validation happens prior to release.
 
Paramecium curators are a small set of people that must do their annotation from fixed IP addresses.  Curator annotations are kept in addition to existing Genoscope predictions.  These annotations are not validated when they are submitted.  Annotators cannot change annotations made by other people.  There are two databases: one backing the website, and one where annotation goes.  Once a month the new annotation is pushed to the web site.  Validation happens prior to release.
Line 346: Line 347:
 
===== Community Annotation at SGN =====
 
===== Community Annotation at SGN =====
  
Lukas Mueller discussed SGN.
+
Lukas Mueller discussed [[:Category:SGN|SGN]].
  
 
SGN has data for tomato, potato, eggplant, and many other species.  SGN is locus centric.  Each locus has (or can have) a single person who is the editor/owner of that locus.  The locus editor can change anything about that locus that they want.  The name of the locus editor is displayed on the locus page.  Every locus has a "request editor privileges" link, whether or not that locus has been assigned.
 
SGN has data for tomato, potato, eggplant, and many other species.  SGN is locus centric.  Each locus has (or can have) a single person who is the editor/owner of that locus.  The locus editor can change anything about that locus that they want.  The name of the locus editor is displayed on the locus page.  Every locus has a "request editor privileges" link, whether or not that locus has been assigned.
Line 353: Line 354:
  
 
SGN supports tagging of loci.  Tags are free text that are rationalized after they are created.  The tagging metaphor for curation also came up in several contexts during the Genome Informatics meeting.
 
SGN supports tagging of loci.  Tags are free text that are rationalized after they are created.  The tagging metaphor for curation also came up in several contexts during the Genome Informatics meeting.
 
 
 
  
 
==== Community Annotation Server (CAS) ====
 
==== Community Annotation Server (CAS) ====
Line 364: Line 362:
 
** Picked Ubuntu LTS over CentOS because LTS stands for ''long term support'' and it will be supported for a while.
 
** Picked Ubuntu LTS over CentOS because LTS stands for ''long term support'' and it will be supported for a while.
 
* Postgres
 
* Postgres
* A Chado database with DictyBase data in it.
+
* A [[Chado]] database with DictyBase data in it.
 
* An empty Chado database
 
* An empty Chado database
* Modware
+
* [[Modware]]
* Apoolo - Uses the JDBC adaptor with triggers.  This is a Java WebStart version.
+
* [[Apollo]] - Uses the JDBC adaptor with triggers.  This is a Java WebStart version.
* GBrowse
+
* [[GBrowse]]
 
* MediaWiki - includes Cite, ProcessCite and TableEdit extensions.
 
* MediaWiki - includes Cite, ProcessCite and TableEdit extensions.
 
** Cite extensions make it easy to provide literature annotations.  Provide PubMed ID and it finds and grabs extract from PubMed.
 
** Cite extensions make it easy to provide literature annotations.  Provide PubMed ID and it finds and grabs extract from PubMed.
  
Note that it does not include Turnkey and/or GMODWeb.  Lincoln would like to add GMODweb, Testpresso and BioMart to that list.
+
Note that it does not include Turnkey and/or [[GMODWeb]].  [[User:Lstein|Lincoln]] would like to add GMODweb, [[Textpresso]] and [[BioMart]] to that list.
  
This can run on any Intel machine, inlcuding Apple.  Very little performance hit is caused by virtualization.
+
This can run on any Intel machine, including Apple.  Very little performance hit is caused by virtualization.
  
 
An online trial version of the Community Annotation Server was requested and was already on the way.
 
An online trial version of the Community Annotation Server was requested and was already on the way.
  
  
==== Distributed Annotation Server/2 (DAS/2) ====
+
==== Distributed Annotation System/2 (DAS/2) ====
  
Gregg Helt attended with the goal of bringing DAS/2 into the GMOD family.
+
Gregg Helt attended with the goal of bringing the [[Distributed Annotation System]], version 2 (DAS/2) into the GMOD family.
  
 
Preserving DAS/1 Strengths in DAS/2
 
Preserving DAS/1 Strengths in DAS/2
 
* Keep focus on location-based annotation of biological sequences.
 
* Keep focus on location-based annotation of biological sequences.
* Protocol, not an implementatoin.
+
* Protocol, not an implementation.
 
** HTTP for transport,
 
** HTTP for transport,
 
** URLs for queries
 
** URLs for queries
Line 393: Line 391:
 
* Couple XML response to URL request formats.
 
* Couple XML response to URL request formats.
 
* XML has been shortened, but big gain comes from client-server content format negotiation, including binary.  Empty elements dropped.
 
* XML has been shortened, but big gain comes from client-server content format negotiation, including binary.  Empty elements dropped.
* Uses HTTP caching in the client.
+
* Uses HTTP caching in the client.
* IGB - reference client for DAS2.  Integrated Genome Browser
+
* [[IGB]] - reference client for DAS2.  Integrated Genome Browser
  
Allen Day built a DAS2 server on top of chado.  That is in CVS.
+
Allen Day built a DAS2 server on top of [[Chado]].  That is in CVS.
  
 
There is a validation suite for server responses to different queries.
 
There is a validation suite for server responses to different queries.
Line 402: Line 400:
 
Spec has not changed in over a year.
 
Spec has not changed in over a year.
  
Scott would like that when someone installs Chado, they also get BioMart and DAS2.  That is, they get access by default.  Gregg would like to see GBrowse get a DAS/2 adapter.
+
Scott would like that when someone installs Chado, they also get [[BioMart]] and DAS2.  That is, they get access by default.  Gregg would like to see GBrowse get a DAS/2 adapter.
 
+
 
+
  
 
==== GBrowse ====
 
==== GBrowse ====
Line 411: Line 407:
 
===== Roadmap =====
 
===== Roadmap =====
  
Lincoln Stein talked about upcoming releases of GBrowse.
+
[[User:Lstein|Lincoln Stein]] talked about upcoming releases of [[GBrowse]].
  
* 1.69  
+
* 1.69
 
** Is in pre-release state.
 
** Is in pre-release state.
 
** Has
 
** Has
Line 428: Line 424:
 
** Major performance and scalability enhancements.
 
** Major performance and scalability enhancements.
 
*** e.g., each track can be drawn by different server or CPU.
 
*** e.g., each track can be drawn by different server or CPU.
* 3.0
+
* 3.0 ''(subsequently renamed to [[JBrowse]])''
 
** Released sometime in 2008
 
** Released sometime in 2008
 
** Google maps type interface.
 
** Google maps type interface.
 
*** e.g., zooming and panning via mouse.
 
*** e.g., zooming and panning via mouse.
  
Version 3.0 is a fork of the code and version 2 and 3 are expected to co-exist 'forever'.  Some shops won't have the horsepower to power version 3, and Lincoln wants to keep it as an easy to install tool.
+
Version 3.0 (now called [[JBrowse]]) is a fork of the code and version 2 and 3 are expected to co-exist 'forever'.  Some shops won't have the horsepower to power version 3, and Lincoln wants to keep it as an easy to install tool.
 
+
 
+
  
 
===== Performance =====
 
===== Performance =====
  
Chado is usually too slow to run GBrowse on top of.  Consider using Bio::DB::GFF instead.  (Can't run GBrowse on top of BioMart.  No adapter exists because of BioMart's flexible schema.)
+
[[Chado]] is usually too slow to run [[GBrowse]] on top of.  Consider using Bio::DB::GFF instead.  (Can't run GBrowse on top of [[BioMart]].  No adapter exists because of BioMart's flexible schema.)
  
 
Jason S argues that GBrowse slows down when it does BioPerl object creation.  These are relatively heavyweight objects.  He has just written a Slim version that is up to 70% faster.
 
Jason S argues that GBrowse slows down when it does BioPerl object creation.  These are relatively heavyweight objects.  He has just written a Slim version that is up to 70% faster.
Line 451: Line 445:
 
'''Presentation:''' [http://eugenes.org/gmod/docs/gmod-update-07nov.ppt GMOD Indiana update] slides, Don Gilbert
 
'''Presentation:''' [http://eugenes.org/gmod/docs/gmod-update-07nov.ppt GMOD Indiana update] slides, Don Gilbert
  
Don Gilbert spoke about Genome Grid.
+
Don Gilbert spoke about [[Genome grid]].
  
Genome Grid is middleware to enable easy use of TeraGrid for genome analysis tasks.  Don is looking for genomes that need compute intensive analysis.  He is also interested in applying BioMart and Ergatis to these problems.
+
Genome Grid is middleware to enable easy use of TeraGrid for genome analysis tasks.  Don is looking for genomes that need compute intensive analysis.  He is also interested in applying [[BioMart]] and [[Ergatis]] to these problems.
  
 
==== Help Desk ====
 
==== Help Desk ====
  
Dave Clements introduced himself and the goals of the GMOD Help Desk position.
+
Dave Clements introduced himself and the goals of the [[GMOD Help Desk]] position.
  
Dave will make the help desk more visible on the web site, and add a GMOD News column to the home page.
+
Dave will make the help desk more visible on the web site, and add a [[GMOD News]] column to the [[Main Page|home page]].
  
 +
==== Pathway Tools ====
  
 +
'''Presentation:''' [[Media:GMOD-Nov-2007.ppt|Recent Developments in Pathway Tools]]
  
==== Pathway Tools ====
+
Suzanne Paley talked about recent developments in [[Pathway Tools]], including:
 
+
Suzanne Paley talked about recent developments in Pathway Tools, including:
+
  
 
* Advanced Query Form
 
* Advanced Query Form
Line 471: Line 465:
 
* Pathlogic over-infers pathways.  Pathways now have to be tagged to be shown.
 
* Pathlogic over-infers pathways.  Pathways now have to be tagged to be shown.
 
* Dataset diffs and incremental updates.
 
* Dataset diffs and incremental updates.
 
 
  
 
==== SynView ====
 
==== SynView ====
  
'''Presentation:''' [[Media:syntenyModeling.pdf|Modeling and Displaying Synteny w/ SynView]], Steve Fischer
+
'''Presentation:''' [[Media:syntenyModeling.pdf|Modeling and Displaying Synteny w/ SynView]], [[User:Stevef|Steve Fischer]]
  
Steve Fischer of ApiDB (see below) spoke about SynView.  SynView is a synteny browser based on GBrowse.  It is described in a [http://bioinformatics.oxfordjournals.org/cgi/content/full/22/18/2308 Bioinformatics paper].
+
[[User:Stevef|Steve Fischer]] of ApiDB (see below) spoke about SynView.  SynView is a synteny browser based on GBrowse.  It is described in a [http://bioinformatics.oxfordjournals.org/cgi/content/full/22/18/2308 Bioinformatics paper].
  
 
His talk raised a number of issues that have come up with recent extensions to SynView.
 
His talk raised a number of issues that have come up with recent extensions to SynView.
Line 494: Line 486:
 
These are all web interface layers that sit on top of Chado databases.
 
These are all web interface layers that sit on top of Chado databases.
  
GMODWeb is currently not working, we think because SQLTranslator has not been upgraded to deal with recent versions of Postgres.  Ben Faga agreed to actively work on this.
+
[[GMODWeb]] is currently not working, we think because SQLTranslator has not been upgraded to deal with recent versions of Postgres.  [[User:Faga|Ben Faga]] agreed to actively work on this.
  
 
Michael Caudy argued that even if GMODWeb did work right now that it is not extensible enough to support complex queries and presentation.  Mike presented Drupal, Drupal Views, and PHPTemplate as an alternative web framework for providing a web interface to Chado databases.  Mike demonstrated a prototype called DrupalFly that presents FlyBase data in an alternative organization.
 
Michael Caudy argued that even if GMODWeb did work right now that it is not extensible enough to support complex queries and presentation.  Mike presented Drupal, Drupal Views, and PHPTemplate as an alternative web framework for providing a web interface to Chado databases.  Mike demonstrated a prototype called DrupalFly that presents FlyBase data in an alternative organization.
  
Lincoln has an opening in Toronto for a full time programmer.  Lincoln will talk with Brian about GMODWeb's future.  We will put something on web site asking for volunteers to take on GMODweb.
+
[[User:Lstein|Lincoln]] has an opening in Toronto for a full time programmer.  Lincoln will talk with Brian about [[GMODWeb]]'s future.  We will put something on web site asking for volunteers to take on GMODweb.
  
  
Line 512: Line 504:
 
'''Presentation:''' [http://mango.ctegd.uga.edu/jkissingLab/presentations/GMOD_Nov_2007.ppt ApiDB GBrowse update], Haiming Wang
 
'''Presentation:''' [http://mango.ctegd.uga.edu/jkissingLab/presentations/GMOD_Nov_2007.ppt ApiDB GBrowse update], Haiming Wang
  
Steve Fischer talked about ApiDB.  ApiDB uses GUS as their schema.  They do multispecies comparative analysis.  They have a database adapter link from GBrowse to GUS.  It is based on the Chado adapter.  They use materialized views in Oracle 10G and it is still relatively slow.
+
[[User:Stevef|Steve Fischer]] talked about ApiDB.  ApiDB uses GUS as their schema.  They do multispecies comparative analysis.  They have a database adapter link from GBrowse to GUS.  It is based on the Chado adapter.  They use materialized views in Oracle 10G and it is still relatively slow.
  
  
Line 522: Line 514:
 
Syntenic maps at ApiDB are produced with Mercator.  The maps are based on gene orthology.  Gene orthologs are generated using OrthoMCL.  All alignments are pairwise, rather than multiple.  Orthology is represented outside standard GUS schema.  In the synteny schema, everything is defined relative to the reference sequence.  Also need a table to define anchors.
 
Syntenic maps at ApiDB are produced with Mercator.  The maps are based on gene orthology.  Gene orthologs are generated using OrthoMCL.  All alignments are pairwise, rather than multiple.  Orthology is represented outside standard GUS schema.  In the synteny schema, everything is defined relative to the reference sequence.  Also need a table to define anchors.
  
Steve Fischer showed an 11 track page, which has about 5000 popups in it.
+
[[User:Stevef|Steve Fischer]] showed an 11 track page, which has about 5000 popups in it.
  
 
ApiDB has a release cycle.  They discard and recalculate synteny with every new release.
 
ApiDB has a release cycle.  They discard and recalculate synteny with every new release.
 +
 +
  
 
==== Berkeley National Labs ====
 
==== Berkeley National Labs ====
  
The Berkeley group is actively involved in supporting and developing Chado, GO, SO, OBO-Edit, Phenote, Apollo, and the new AJAX GBrowse.
+
The Berkeley group is actively involved in supporting and developing [[Chado]], GO, SO, OBO-Edit, [[Phenote]], [[Apollo]], and the new [[Glossary#AJAX|AJAX]] [[GBrowse]].
 
+
  
 
==== FlyBase ====
 
==== FlyBase ====
  
Flybase has migrated their production databases to the Chado database schema.  FlyBase uses:
+
[[:Category:FlyBase|FlyBase]] has migrated their production databases to the [[Chado]] [[Glossary#Database Schema|database schema]].  FlyBase uses:
  
* Chado
+
* [[Chado]]
* GMOD XORT
+
* GMOD [[XORT]]
* Chado XML
+
* [[Chado XML]]
* Apollo
+
* [[Apollo]]
* BioMart
+
* [[BioMart]]
  
  
Line 545: Line 538:
 
===== Synteny at FlyBase =====
 
===== Synteny at FlyBase =====
  
Victor Strelets talked about OrthoView, an extension to GBrowse for viewing synteny.
+
Victor Strelets talked about OrthoView, an extension to [[GBrowse]] for viewing [[synteny]].
 
+
Victor also presented the genetic interactions viewer, a fast way of visualizing gene interactions.  It does not run directly off of the Chado database.
+
 
+
  
 +
Victor also presented the genetic interactions viewer, a fast way of visualizing gene interactions.  It does not run directly off of the [[Chado]] database.
  
 
==== GeneDB, Sanger ====
 
==== GeneDB, Sanger ====
Line 555: Line 546:
 
'''Presentation:''' [[Media:Workshop.pdf|Community Annotation]], Chinmay Patel
 
'''Presentation:''' [[Media:Workshop.pdf|Community Annotation]], Chinmay Patel
  
Chinmay Patel spoke about a week-long annotation project at Sanger involving 40 people all annotating the same genome.
+
Chinmay Patel spoke about a week-long annotation project at Sanger involving 40 people all annotating the same genome.
  
They used the Artemis annotation editor (instead of Apollo), but Artemis was talking to a Chado database using an Artemis-Chado Ibatis-based (instead of Hibernate-based) adapter.  The adapter is not yet released.
+
They used the [[Artemis]] annotation editor (instead of [[Apollo]]), but Artemis was talking to a [[Chado]] database using an Artemis-Chado Ibatis-based (instead of Hibernate-based) adapter.  The adapter is not yet released. (''But it is now: see [[Artemis-Chado Integration Tutorial]].'')
  
 
==== Imperial College London ====
 
==== Imperial College London ====
Line 587: Line 578:
 
* Apollo
 
* Apollo
 
* Turnkey
 
* Turnkey
 +
* GBrowse
  
  
Paramecium is an odd critter:
+
Paramecium is an odd critter (unicellular eukaryote, ciliate clade):
 
* 72 Mbp
 
* 72 Mbp
 
* 40K gene models
 
* 40K gene models
 
** 12,500 computationally identified potential errors.
 
** 12,500 computationally identified potential errors.
 
* At least 3 whole genome duplication events.
 
* At least 3 whole genome duplication events.
* Genome different in germ and somatic cells.
+
* Nuclear dimorphism. Germline nucleus (not yet sequenced) and somatic nucleus (sequenced) which is a rearranged version of the germline nucleus, streamlined for gene expression.
  
Fewer than 15 paramecium labs in the world.  Database supported with 1.5 staff.
+
Fewer than 20 paramecium molecular biology labs in the world.  Database supported with 1.5 staff.
  
It is important that people be able to click on a link, launch Apollo, add some curation and save it.  Their Apollo talks directly to Chado.  See Community Annotation above for more.
+
It is important that people be able to click on a link, launch Apollo, add some curation and save it.  Their Apollo talks directly to Chado (no triggers).  See Community Annotation above for more.
  
 
==== Riken ====
 
==== Riken ====
Line 609: Line 601:
 
==== University of Maryland Medical Center ====
 
==== University of Maryland Medical Center ====
  
Use Chado as a backend, a lot.  Use Sybil for comparative genomics, and are a mix of Postgres and Oracle.
+
Use [[Chado]] as a backend, a lot.  Use [[Sybil]] for [[:Category:Comparative Genomics|comparative genomics]], and are a mix of [[PostgreSQL]] and Oracle.
 
+
 
+
 
+
  
 
==== WormBase / CSHL ====
 
==== WormBase / CSHL ====
Line 624: Line 613:
 
===== GBrowse_Syn =====
 
===== GBrowse_Syn =====
  
'''Presentation:''' [[Media:Gbrowse_syn.pdf|Gbrowse_syn]], Sheldon McKay
+
'''Presentation:''' [[Media:Gbrowse_syn.pdf|Gbrowse_syn]], [[User:Mckays|Sheldon McKay]]
  
Sheldon McKay talked about GBrowse_syn, a prototype extension to GBrowse for viewing synteny.  Goal is to have a ''sequence'' alignment viewer that can look at more than two species at a time.  GBrowse_syn is based purely on sequence alignments.  It does not know about genes or orthologs per se.
+
Sheldon McKay talked about [[GBrowse_syn]], a prototype extension to [[GBrowse]] for viewing [[synteny]].  Goal is to have a ''sequence'' alignment viewer that can look at more than two species at a time.  GBrowse_syn is based purely on sequence alignments.  It does not know about genes or orthologs per se.
  
  
 
Used PECAN for the alignments.  Maps are precomputed in a very CPU-intensive step.
 
Used PECAN for the alignments.  Maps are precomputed in a very CPU-intensive step.
  
Chado may or may not support multiple alignments.
+
[[Chado]] may or may not support multiple alignments.
 
+
 
+
 
+
 
+
  
 +
[[Category:ApiDB]]
 +
[[Category:Apollo]]
 +
[[Category:Community Annotation]]
 +
[[Category:GBrowse]]
 +
[[Category:GBrowse syn]]
 +
[[Category:GMODWeb]]
 
[[Category:Meetings]]
 
[[Category:Meetings]]
 +
[[Category:ParameciumDB]]
 +
[[Category:Turnkey]]
 +
[[Category:SGN]]
 +
[[Category:JBrowse]]
 +
[[Category:DAS]]

Latest revision as of 05:54, 5 January 2011

GMOD's November 2007 meeting was held November 5, 1:30PM to November 7, 12:00PM at Cold Spring Harbor Laboratory following the Genome Informatics meeting.

Pre-Meeting Information

Possible topics

A list of suggested topics, raised in advance by GMOD community members.

  • community annotation - FlyBase seconds this topic
  • Chado standard on ortholog/paralog/synteny storage.
  • The state of GFF tools in BioPerl. Some of the auditing and examples are on a Bioperl wiki page.
  • GMOD releases and packaging
    • How hard would it be to heap together specific releases of popular GMOD components into a named/numbered release that has gone through some level of compatibility testing?
    • How much pain does a lack of such a release currently cause users?
    • how much might the community annotation server help with this?

Registration

There was a $25 registration fee to cover meals and other costs associated with the meeting. Please contact Scott Cain cain@cshl.edu if you need a receipt for your payment.

Location

The meeting was held at Cold Spring Harbor Laboratory's Woodbury building, which is not on the main CSHL campus.


Attendees

Agenda

Discussion Topics

We spent some time on our first day discussing what topics attendees would like to discuss. This list of topics helped shape the meeting agenda.

  • Postgres Tuning / Materialized views
    • Performance Strategies
  • Apollo-Chado Connection
  • Chado
    • ID Generation
    • Moving away from Postgres
    • Missing Chado pieces (phylogenetics)
  • What Should GMOD Focus On (What's Missing)
    • Genome Analysis (Galaxy, Ergatis, ...)
      • Lightweight annotation MAKER pipeline from Mark Yandell
        (2008/05/13: MAKER has since been folded in to GMOD.)
    • MicroArrays
    • What is the GMOD Community and how best can we serve them?
    • Is there a need for individual MODs?
  • What should GMOD Help Desk do?
    • UIs: Picture Intensive
  • What should be the outcome of this meeting?

November 5

1:00 Shuttle from Grace Auditorium to Woodbury

1:30 Introductions

2:30 Coffee

5:30 Shuttle from Woodbury to Grace Auditorium

November 6

8:50 Shuttle from Grace Auditorium to Woodbury

9:15 ? Scott

10:15 Community Annotation

  • Linda Sperling - ParameciumDB
  • Lukas Muller - SGN

10:30 Coffee

  • Michael Caudy - FlyBase Drupal

12:00 Lunch

1:00 Standards and applications for storing comparative genome data

2:30 Coffee

  • Sheldon McKay - gbrowse_syn


5:30 Shuttle from Woodbury to Grace Auditorium

November 7

8:50 Shuttle from Grace Auditorium to Woodbury

9:15 BioPerl

  • GFF3 tools
  • SeqFeatures/FeatureIO
  • Sequence Ontology

10:30 Coffee

12:00 Shuttle from Woodbury to Grace Auditorium


Presentations

Meeting Minutes

The minutes here are based on Dave Clements' notes from the meeting. They are far from complete and you are encouraged to expand and correct them.

The minutes are not chronological. Rather, they are broken up into 3 sections: Big Picture, GMOD Components / Functions, and GMOD Participating Organizations.

Big Picture

We had several discussions about the big picture.


GMOD's Role

Don Gilbert pointed out that cheap short-read sequencers are now available. Lots of people have inexpensive sequences, but there still is no way to do cheap annotation.

Current GMOD clients are species or family centered. Want to make it easy to integrate multiple species. ApiDB is at the point of opening new species databases and web sites with relatively little effort.

Comparative genomics came up over and over again, both across species and within species.

As data grows and is consolidated, issues of who owns the data and who's responsible for the annotation become more problematic.

How does GMOD want to deal with integration issues?

How close to the sequencer does GMOD want to get? We don't want to pull the data off the sequencer.

Should we position GMOD as something that can feed data into places like Ensembl? Ensembl does not have curation expertise of the MODs. Even if NCBI is wonderful at consolidation, they won't have quality curation. GMOD sits right there, supporting curation. So, we doubt that Ensembl or NCBI will swallow us whole.


Releases and Bundles

We need to figure out what components we want and what we are pushing. If we focus on a core set of packages then life gets easier for the project.

There was discussion of better release management for components, and the VMWare Community Annotation Server package. Are GMOD bundles the way of the future? Believe that binary packages are generally not going to work for GMOD unless someone is willing to put a lot of time into maintaining them.


Comparative Genomics

Comparative genomics came up over and over again, both across species and within species. The GBrowse_syn talk in particular spawned a discussion on this.

First, can Chado represent relationships that have more than two members? Yes. The featureloc table has a rank column, so a single feature can carry several ranked locations. Do we want collections in Chado?
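
As a rough illustration of the rank idea, the sketch below uses Python's sqlite3 module with heavily simplified stand-ins for Chado's feature and featureloc tables. The species, chromosome names, coordinates, and the `syn_block_1` feature are all hypothetical, and real Chado tables have more columns and run on PostgreSQL. It only shows how one feature can be located on three source sequences by giving each location a different rank.

```python
import sqlite3

# Simplified stand-ins for Chado's feature and featureloc tables.
# Only the columns needed for this illustration are included.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER,
    fmax          INTEGER,
    strand        INTEGER,
    locgroup      INTEGER NOT NULL DEFAULT 0,
    rank          INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, locgroup, rank)
);
""")

# Hypothetical data: three chromosomes and one syntenic block feature.
conn.executemany(
    "INSERT INTO feature (feature_id, uniquename) VALUES (?, ?)",
    [(1, "speciesA_chr1"), (2, "speciesB_chr3"),
     (3, "speciesC_chr2"), (4, "syn_block_1")],
)

# One featureloc row per member of the relationship, distinguished by rank.
conn.executemany(
    "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, strand, rank) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(4, 1, 1000, 5000, 1, 0),
     (4, 2, 200, 4100, 1, 1),
     (4, 3, 7000, 11200, -1, 2)],
)


def members(conn, uniquename):
    """Return (rank, source name, fmin, fmax, strand) for every location of a feature."""
    return conn.execute(
        """
        SELECT fl.rank, src.uniquename, fl.fmin, fl.fmax, fl.strand
        FROM featureloc fl
        JOIN feature f   ON f.feature_id = fl.feature_id
        JOIN feature src ON src.feature_id = fl.srcfeature_id
        WHERE f.uniquename = ?
        ORDER BY fl.rank
        """,
        (uniquename,),
    ).fetchall()


block_members = members(conn, "syn_block_1")
for row in block_members:
    print(row)
```

Whether this is the right convention for storing comparative data is exactly the open question the proposed working group was meant to settle.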

Jason suggested a working group on how to do this. Dave from UMD volunteered to manage a wiki page on this, with the end goal of establishing a document that defines how to store comparative genomes.

Talks on synteny are spread throughout this document.


GMOD Components / Functions

Apollo

New Development

Work has resumed on developing Apollo. Ed Lee formerly of TIGR/JCVI started working for Suzi Lewis at Berkeley this fall and is working on it. Work is being done on

  • A GFF3 adapter
  • Speeding up Apollo when it uses Chado as a backend (or, just speeding up Chado).
  • Communicating with more than one Chado instance.
  • Undo/Redo support.
ID Generation and JDBC Drivers

Apollo can talk directly to a database or it can use XML files instead. FlyBase, VectorBase, BeeBase, and BovineBase are all believed to take the XML approach.

Apollo currently has two choices for database adaptors:

  1. One that uses Postgres database triggers to set IDs.
  2. One that does not.

The trigger version is used in the Community Annotation Server and on the Dolan-Rice project. We could not think of anywhere else it was used. The triggerless version is used everywhere else that we knew of.

The trigger version is Postgres specific. The triggerless version stores multiple copies of shared exons.

Notes from Tuesday: Decided to actively discourage use of the trigger version. Best thing may be to go through trigger code and externalize the logic.

Notes from Wednesday: Apollo - Chado - No short term decision. Long term probably move to Crabtree.

As you may have noticed, those notes disagree.


BioPerl, GFF

There was a discussion of BioPerl and how it relates to GMOD.

Jason Stajich created a slimmed down feature Perl package based on arrays instead of hashes: Bio::SeqFeature::Slim. This is 70% faster for reading a GFF file. Bio::FeatureIO only supports GFF3. It is slow, uses heavy objects, and is strongly typed. Jason wants to spend more time on middleware speed. He also wants a converter into a common object model and code to get it back out to any supported format.

6 to 8 people are currently contributing to BioPerl.


GFF3 has an ID field. ID is not clear in earlier versions. GFF2 supports arbitrary feature types. GFF3 requires SO types (but you can always ignore that). Keep detailed alignment data in a separate database, not in GFF3. Indicate in GFF3 that data is stored elsewhere. Could store cigar strings in GFF3 and spec supports that.
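
To make the ID and CIGAR points concrete, here is a minimal Python sketch that parses one GFF3 line and pulls the ID and Gap attributes out of the ninth column. It is not BioPerl code and not a full GFF3 parser; the function names are made up for this example, and the sample line is an illustrative alignment record, not data from the meeting.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine columns and decode column 9."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    attributes = {}
    if attrs != ".":
        for pair in attrs.split(";"):
            if not pair:
                continue
            tag, _, value = pair.partition("=")
            # Multiple values for one tag are comma separated; values are URL-escaped.
            attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]

    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "phase": phase,
        "attributes": attributes,
    }


def parse_gap(gap):
    """Turn a Gap attribute such as 'M8 D3 M6' into (operation, length) pairs."""
    return [(op[0], int(op[1:])) for op in gap.split()]


example = "chr3\t.\tmatch\t1\t23\t.\t.\t.\tID=Match1;Target=EST23 1 21;Gap=M8 D3 M6 I1 M6"
record = parse_gff3_line(example)
gap_ops = parse_gap(record["attributes"]["Gap"][0])

print(record["attributes"]["ID"])  # the ID field that GFF3 makes explicit
print(gap_ops)
```

Keeping only a compact Gap string in GFF3 and the full alignment elsewhere matches the recommendation above to store detailed alignment data in a separate database.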



Chado

There was a request to make Chado more database neutral, rather than Postgres-specific.

The slowness of Chado databases came up in several contexts. David from UMD Medical Center started a Postgres performance page on the wiki.

Scott described a potential way to implement materialized views in Chado that gets us most of the benefits of DBMS-supported materialized views. Store

  • the SQL to create it in a table,
  • a run time schedule for when the table should be rebuilt,
  • an enabled/disabled flag that is disabled by default.

Question was raised if genome metadata fits into the current Chado. The belief was that it does not.

Jason Stajich wants a better idea of who is responsible for what in terms of Chado modules. Dave C will take this on.


Chado Documentation

The table level and column level documentation for Chado is in a good state. Enhanced basic, big picture documentation was requested. Josh Goodman is thinking of providing a mapping from Chado DB columns to FlyBase report columns. Mike Caudy pointed out we should have multiple examples of implementation, not just FlyBase.


Chado Validator

We discussed if a Chado database validator would be worthwhile. A validator would check a Chado database to see if it conforms to the canonical model for a Chado database. There was no consensus on the value or practicality of this. There was consensus that no one was willing to volunteer to write it.

Ben suggested that if and when we do this, we use the GFF3 to Chado validator as a starting point.


DBMS Choice

There was a request to make Chado more database neutral, rather than Postgres-specific. Someone also asked if there was an SQLite adapter for GBrowse.


Postgres Performance

Slow performance of Chado Postgres implementations came up repeatedly.

Some bits:

  • Specify locale. ASCII-US runs fast. UTF-8 is slow and that is the default. Specified for the server, at server start.
  • A lot of time has been spent on making the queries go fast.
  • RTree indexes are in the core.
  • Allen's FRange functions are in the DB, but aren't used by default queries.


CMap

Presentation: CMap Progress Report, Ben Faga

New CMap release (1.0) is on its way. Will have an assembly editor. Includes a dot plot, new glyphs, and an install script based on the GBrowse install script.

Ben will ask users to do beta testing, and hopes to start with that before end of 2007. Ben is looking for a project that is doing large scale assembly, to test CMap for doing assembly correction.

Community Annotation

This was a popular motif in the meeting.


Community Annotation at ParameciumDB

Presentation: Community Annotation, Linda Sperling

Linda Sperling discussed ParameciumDB. Paramecium is a small community with few resources and no dedicated curators.

Paramecium curators are a small set of people that must do their annotation from fixed IP addresses. Curator annotations are kept in addition to existing Genoscope predictions. These annotations are not validated when they are submitted. Annotators cannot change annotations made by other people. There are two databases: one backing the website, and one where annotation goes. Once a month the new annotation is pushed to the web site. Validation happens prior to release.

They are also using ParameciumDB to teach annotation at two colleges, and some annotation comes from that. The bulk of annotations come from 2 curators, with the other curators all making a small number of annotations.

Uses Java WebStart version of Apollo. Annotators click on link and Apollo starts up. Apollo talks directly to Chado, using the triggerless database adapter.

Community Annotation at JGI

Don Gilbert briefly described community annotation at JGI. They have a web interface for simple annotations and use Apollo for complex annotations. Anyone can promote any gene model, but they can't delete other models. Use the Wikipedia model: Whoever annotates last is correct.


Community Annotation at SGN

Lukas Mueller discussed SGN.

SGN has data for tomato, potato, eggplant, and many other species. SGN is locus centric. Each locus has (or can have) a single person who is the editor/owner of that locus. The locus editor can change anything about that locus that they want. The name of the locus editor is displayed on the locus page. Every locus has a "request editor privileges" link, whether or not that locus has been assigned.

All edits are logged, and nothing is ever truly deleted. 'Deleted' items are retained but flagged as obsolete and are no longer shown.
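
The log-everything, delete-nothing policy can be sketched as below. This is not SGN's schema: the table names, column names, and functions are hypothetical, and SQLite is used only to keep the example self-contained. "Deleting" sets an obsolete flag and writes a log entry, and display queries filter on the flag.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE locus_annotation (
    annotation_id INTEGER PRIMARY KEY,
    locus         TEXT NOT NULL,
    note          TEXT NOT NULL,
    obsolete      INTEGER NOT NULL DEFAULT 0   -- 1 means hidden, never removed
);
CREATE TABLE edit_log (
    log_id        INTEGER PRIMARY KEY,
    annotation_id INTEGER NOT NULL REFERENCES locus_annotation(annotation_id),
    editor        TEXT NOT NULL,
    action        TEXT NOT NULL
);
""")


def add_annotation(conn, locus, note, editor):
    """Insert an annotation and log who created it."""
    cur = conn.execute(
        "INSERT INTO locus_annotation (locus, note) VALUES (?, ?)", (locus, note)
    )
    conn.execute(
        "INSERT INTO edit_log (annotation_id, editor, action) VALUES (?, ?, ?)",
        (cur.lastrowid, editor, "create"),
    )
    return cur.lastrowid


def delete_annotation(conn, annotation_id, editor):
    """Flag the annotation as obsolete instead of deleting the row, and log it."""
    conn.execute(
        "UPDATE locus_annotation SET obsolete = 1 WHERE annotation_id = ?",
        (annotation_id,),
    )
    conn.execute(
        "INSERT INTO edit_log (annotation_id, editor, action) VALUES (?, ?, ?)",
        (annotation_id, editor, "obsolete"),
    )


def visible_notes(conn, locus):
    """Return only the notes that should still be displayed for a locus."""
    rows = conn.execute(
        "SELECT note FROM locus_annotation "
        "WHERE locus = ? AND obsolete = 0 ORDER BY annotation_id",
        (locus,),
    )
    return [r[0] for r in rows]


first_id = add_annotation(conn, "locus_1", "fruit shape phenotype", "editor_a")
add_annotation(conn, "locus_1", "maps to chromosome 2", "editor_a")
delete_annotation(conn, first_id, "editor_a")

shown = visible_notes(conn, "locus_1")
stored = conn.execute("SELECT COUNT(*) FROM locus_annotation").fetchone()[0]
logged = conn.execute("SELECT COUNT(*) FROM edit_log").fetchone()[0]
```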

SGN supports tagging of loci. Tags are free text that are rationalized after they are created. The tagging metaphor for curation also came up in several contexts during the Genome Informatics meeting.

Community Annotation Server (CAS)

Scott Cain spoke about this. It is almost ready to go. The Community Annotation Server (CAS) is meant to be "GMOD in a box". Currently it consists of:

  • A VMWare image, containing
  • Ubuntu Linux, version 6.10 LTS.
    • Picked Ubuntu LTS over CentOS because LTS stands for long term support and it will be supported for a while.
  • Postgres
  • A Chado database with DictyBase data in it.
  • An empty Chado database
  • Modware
  • Apollo - Uses the JDBC adaptor with triggers. This is a Java WebStart version.
  • GBrowse
  • MediaWiki - includes Cite, ProcessCite and TableEdit extensions.
    • Cite extensions make it easy to provide literature annotations. Provide PubMed ID and it finds and grabs extract from PubMed.

Note that it does not include Turnkey and/or GMODWeb. Lincoln would like to add GMODweb, Textpresso and BioMart to that list.

This can run on any Intel machine, including Apple. Very little performance hit is caused by virtualization.

An online trial version of the Community Annotation Server was requested and was already on the way.


Distributed Annotation System/2 (DAS/2)

Gregg Helt attended with the goal of bringing the Distributed Annotation System, version 2 (DAS/2) into the GMOD family.

Preserving DAS/1 Strengths in DAS/2

  • Keep focus on location-based annotation of biological sequences.
  • Protocol, not an implementation.
    • HTTP for transport,
    • URLs for queries
    • XML for responses
    • REST-like style.
  • No required central authority.
  • Couple XML response to URL request formats.
  • XML has been shortened, but big gain comes from client-server content format negotiation, including binary. Empty elements dropped.
  • Uses HTTP caching in the client.
  • IGB - reference client for DAS2. Integrated Genome Browser

Allen Day built a DAS2 server on top of Chado. That is in CVS.

There is a validation suite for server responses to different queries.

Spec has not changed in over a year.

Scott would like that when someone installs Chado, they also get BioMart and DAS2. That is, they get access by default. Gregg would like to see GBrowse get a DAS/2 adapter.

GBrowse

Roadmap

Lincoln Stein talked about upcoming releases of GBrowse.

  • 1.69
    • Is in pre-release state.
    • Has
      • popups
      • drag tracks vertically
      • quantitative data
      • multiple alignment and conservation tracks.
  • 1.7
    • Release by end of year
    • Rubberbanding (zoom by selecting a rectangle with mouse)
    • Autocomplete
  • 2.0
    • Release in early 2008
    • Major performance and scalability enhancements.
      • e.g., each track can be drawn by different server or CPU.
  • 3.0 (subsequently renamed to JBrowse)
    • Released sometime in 2008
    • Google maps type interface.
      • e.g., zooming and panning via mouse.

Version 3.0 (now called JBrowse) is a fork of the code, and versions 2 and 3 are expected to co-exist 'forever'. Some shops won't have the horsepower to power version 3, and Lincoln wants to keep version 2 as an easy to install tool.

Performance

Chado is usually too slow to run GBrowse on top of. Consider using Bio::DB::GFF instead. (Can't run GBrowse on top of BioMart. No adapter exists because of BioMart's flexible schema.)

Jason S argues that GBrowse slows down when it does BioPerl object creation. These are relatively heavyweight objects. He has just written a Slim version that is up to 70% faster.

Browser speed was also the number one issue (with all browsers) at the Genome Browsers Birds-of-a-Feather meeting at Genome Informatics.


Genome Grid

Presentation: GMOD Indiana update slides, Don Gilbert

Don Gilbert spoke about Genome grid.

Genome Grid is middleware to enable easy use of TeraGrid for genome analysis tasks. Don is looking for genomes that need compute intensive analysis. He is also interested in applying BioMart and Ergatis to these problems.

Help Desk

Dave Clements introduced himself and the goals of the GMOD Help Desk position.

Dave will make the help desk more visible on the web site, and add a GMOD News column to the home page.

Pathway Tools

Presentation: Recent Developments in Pathway Tools

Suzanne Paley talked about recent developments in Pathway Tools, including:

  • Advanced Query Form
  • Richer representation of regulation
  • Pathlogic over-infers pathways. Pathways now have to be tagged to be shown.
  • Dataset diffs and incremental updates.

SynView

Presentation: Modeling and Displaying Synteny w/ SynView, Steve Fischer

Steve Fischer of ApiDB (see below) spoke about SynView. SynView is a synteny browser based on GBrowse. It is described in a Bioinformatics paper.

His talk raised a number of issues that have come up with recent extensions to SynView.

TableEdit

This is a MediaWiki extension by Jim Hu. It does two things. First, it makes it easier to update tables in MediaWiki, by presenting a nicer interface for altering wiki tables. Secondly, it supports synchronizing MediaWiki tables from database tables and vice versa.



Turnkey, GMODweb, DrupalFly

These are all web interface layers that sit on top of Chado databases.

GMODWeb is currently not working, we think because SQLTranslator has not been upgraded to deal with recent versions of Postgres. Ben Faga agreed to actively work on this.

Michael Caudy argued that even if GMODWeb did work right now, it is not extensible enough to support complex queries and presentation. Mike presented Drupal, Drupal Views, and PHPTemplate as an alternative web framework for providing a web interface to Chado databases. Mike demonstrated a prototype called DrupalFly that presents FlyBase data in an alternative organization.

Lincoln has an opening in Toronto for a full time programmer. Lincoln will talk with Brian about GMODWeb's future. We will put something on the web site asking for volunteers to take on GMODWeb.



GMOD Participating Organizations

A number of organizations talked about their recent work.


ApiDB

Presentation: ApiDB GBrowse update, Haiming Wang

Steve Fischer talked about ApiDB. ApiDB uses GUS as their schema. They do multispecies comparative analysis. They have a database adapter link from GBrowse to GUS. It is based on the Chado adapter. They use materialized views in Oracle 10G and it is still relatively slow.


Synteny at ApiDB

See SynView above for details on SynView.

Syntenic maps at ApiDB are produced with Mercator. The maps are based on gene orthology. Gene orthologs are generated using OrthoMCL. All alignments are pairwise, rather than multiple. Orthology is represented outside standard GUS schema. In the synteny schema, everything is defined relative to the reference sequence. Also need a table to define anchors.

Steve Fischer showed an 11 track page, which has about 5000 popups in it.

ApiDB has a release cycle. They discard and recalculate synteny with every new release.


Berkeley National Labs

The Berkeley group is actively involved in supporting and developing Chado, GO, SO, OBO-Edit, Phenote, Apollo, and the new AJAX GBrowse.

FlyBase

FlyBase has migrated their production databases to the Chado database schema. FlyBase uses:


Synteny at FlyBase

Victor Strelets talked about OrthoView, an extension to GBrowse for viewing synteny.

Victor also presented the genetic interactions viewer, a fast way of visualizing gene interactions. It does not run directly off of the Chado database.

GeneDB, Sanger

Presentation: Community Annotation, Chinmay Patel

Chinmay Patel spoke about a week-long annotation project at Sanger involving 40 people all annotating the same genome.

They used the Artemis annotation editor (instead of Apollo), but Artemis was talking to a Chado database using an Artemis-Chado Ibatis-based (instead of Hibernate-based) adapter. The adapter is not yet released. (But it is now: see Artemis-Chado Integration Tutorial.)

Imperial College London

Using GMOD to support a fungal sequencing project. Using:

  • Chado
  • GBrowse
  • Apollo


JCVI (nee TIGR)

Using Chado as database schema.


MaizeGDB

Taner Sen from MaizeGDB was at the meeting. Maize has multiple groups generating different gene models. It would be nice to display each group in a separate track. MaizeGDB is evaluating genome browsers and is considering using GBrowse.

ParameciumDB

Presentation: Community Annotation, Linda Sperling

Use GMOD for almost everything:

  • Chado
  • Apollo
  • Turnkey
  • GBrowse


Paramecium is an odd critter (unicellular eukaryote, ciliate clade):

  • 72 Mbp
  • 40K gene models
    • 12,500 computationally identified potential errors.
  • At least 3 whole genome duplication events.
  • Nuclear dimorphism. Germline nucleus (not yet sequenced) and somatic nucleus (sequenced) which is a rearranged version of the germline nucleus, streamlined for gene expression.

Fewer than 20 paramecium molecular biology labs in the world. Database supported with 1.5 staff.

It is important that people be able to click on a link, launch Apollo, add some curation and save it. Their Apollo talks directly to Chado (no triggers). See Community Annotation above for more.

Riken

Riken uses GBrowse.



University of Maryland Medical Center

Use Chado as a backend, a lot. Use Sybil for comparative genomics, and are a mix of PostgreSQL and Oracle.

WormBase / CSHL

Presentation: Keynote, Powerpoint, PDF, Mov, Todd Harris

WormBase is migrating to Chado slowly. There is currently very little Chado there.


GBrowse_Syn

Presentation: Gbrowse_syn, Sheldon McKay

Sheldon McKay talked about GBrowse_syn, a prototype extension to GBrowse for viewing synteny. Goal is to have a sequence alignment viewer that can look at more than two species at a time. GBrowse_syn is based purely on sequence alignments. It does not know about genes or orthologs per se.


Used PECAN for the alignments. Maps are precomputed in a very CPU-intensive step.

Chado may or may not support multiple alignments.