GMOD Evo Hackathon


Revision as of 02:01, 21 December 2010

Tools for Evolutionary Biology Hackathon (https://www.nescent.org/wg_gmodevohackathon/)
November 8-12, 2010
NESCent, Durham, North Carolina, USA



GMOD will be holding a hackathon November 8-12, 2010, at the National Evolutionary Synthesis Center (NESCent) in Durham, North Carolina. This hackathon will focus on improving GMOD's support for evolutionary biology.

The Open Call for Participation went out on August 1, 2010, and remained open until August 25. Participants have been selected and notified of their status.

Synopsis

This hackathon will fill critical gaps in the capabilities of the Generic Model Organism Database (GMOD) toolbox that currently limit its utility for evolutionary research. Specifically, we will focus on tools for 1) viewing comparative genomics data; 2) visualizing phylogenomic data; and 3) supporting population diversity data and phenotype annotation.

The event will bring together a group of about 20 software developers, end-user representatives, and documentation experts who would otherwise not meet. The participants will include key developers of GMOD components that currently lack features critical for emerging evolutionary biology research, developers of informatics tools in evolutionary research that lack GMOD integration, and informatics-savvy biologists who can represent end-user requirements.

This hackathon will provide a unique opportunity to infuse the community of GMOD developers with a heightened awareness of unmet needs in evolutionary biology that GMOD components have the potential to fill, and for tool developers in evolutionary biology to better understand how best to extend or integrate with already existing GMOD components.

Background

The GMOD project is a confederation of intercompatible open-source projects developing software tools for storing, managing, curating, and publishing biological data. Although the GMOD project originated from the goal of developing a generic tool set for common needs among model organism databases, GMOD tools are now used by many large and small, collaborative and single-investigator biological database projects for the dissemination of experimental results and curated knowledge.

GMOD's software tools provide a powerful and feature-rich basis for working with biological data, in particular genomic and other molecular data. However, due to GMOD's historical emphasis on single-genome projects, many GMOD tools still lack features that are critical to effectively support the comparative, phylogenetic, and natural diversity-oriented questions frequently asked in evolutionary research.

Recent developments have given rise to a window of opportunity for forging collaborations towards filling this gap. In particular, the cost of collecting comparative molecular data on a large or even genomic scale has recently dropped dramatically, primarily thanks to next-generation high-throughput sequencing technologies. This has enabled evolutionary researchers to bring genome-scale molecular data to bear on key evolutionary questions. It has also allowed single organism-focused molecular biology labs, who represent GMOD's traditional user base, to broaden out to multi-organism comparative approaches. Bringing these two communities with increasingly shared interests and complementary scientific and technical expertise together offers an opportunity to start filling GMOD's gaps in these areas while building on its existing strengths. In addition, such direct interaction will heighten future awareness of needs of evolutionary researchers among GMOD developers who have so far mostly supported its traditional user base, and can in the long term increase the ranks of GMOD contributors from a field it was not originally designed to serve.

The hackathon format is ideally suited to realize this opportunity. Its strengths lie in facilitating face-to-face interaction among people with complementary expertise, and collaborative work on tangible products that can form the basis of continued partnerships long beyond the end of the meeting.

Specific objectives

Organizers have identified the following broad themes for focusing work at the event. Before and at the hackathon, the participants will refine and distill these and other options into concrete implementation targets. The participants will develop criteria for prioritization, such as maturity of a target for implementation, availability of test data, and potential for completing or making significant progress towards the target during the hackathon. Further ideas and discussion topics can be found on the Supplemental Information page.

Viewing tools for comparative genomics data

GBrowse_syn is a popular GMOD component for viewing comparative genomics data, particularly for viewing synteny between genomes. It does not currently support the next-generation sequencing (NGS) data increasingly available for comparative genomics and emerging model systems. Support for NGS data was identified by the EMS working group as a high priority.

In particular, GBrowse_syn lacks support for the Sequence Alignment/Map (SAM) format, its mechanism for storing genome comparisons does not scale beyond a few organisms, and the means for tracking the necessary alignment metadata in Chado are insufficient.

In addition to filling those gaps, GBrowse_syn would also particularly stand to benefit from the event by gaining a more sustainable developer base.

Visualization of phylogenetic data and trees

The GMOD toolkit at present does not include web-based alignment viewers, nor can the increasingly popular JBrowse genome browser (the designated successor of GBrowse) display multiple sequence alignments. GMOD also lacks a phylogenetic tree widget.

Implementing these from scratch would be far beyond a suitable hackathon target. However, SGN has a relatively mature web-based multiple alignment and tree browser that could be extracted from SGN's codebase and transformed into a GMOD component, an add-on for JBrowse. Current Java-based tree viewers (such as Archaeopteryx or PhyloWidget) could be used as the basis for a JavaScript-based tree viewer (or an applet that can be controlled through JavaScript) that integrates with JBrowse.

Population Diversity and Phenotype support

GMOD's capabilities for managing phenotype and natural diversity data are scattered across partially redundant and outdated modules, do not support modern ontology-based entity-quality data, and lack a web interface. The sophisticated phenotype annotation tools that do exist cannot interface with Chado, GMOD's central relational data model. Yet, phenotypic and genetic diversity data are central to many evolutionary research questions.

A Natural Diversity Module initiative to address at least the deficiencies within Chado formed earlier this year. Several key developers (one of the original developers of the module, and the developer of Phenex, a phenotype curation tool) are already local to NESCent, and so the hackathon provides a unique opportunity to review and refine the natural diversity data model face-to-face, and to integrate it with an updated and reconciled phenotype module. A recently reported prototype of a Chado data adapter for Phenote, GMOD's phenotype annotation tool, could be generalized to become the data persistence interface for such data.

Aside from the data model deficiencies, the ANISEED project has started efforts to generalize its sophisticated atlas/image-based web interface for phenotype data, and to make it operate on top of Chado. The hackathon could harness this synergy to help this effort leap forward, which could ultimately provide GMOD with the currently missing web interface for such data.

Hackathon Structure

Before the Event

Discussion of ideas and sometimes even design actually starts well before the hackathon, on mailing lists, wiki pages, and conference calls set up among accepted attendees. This advance work lays the foundation for participants to be productive from the very first day. This also means that participants should be willing to contribute some time in advance of the hackathon itself to participate in this preparatory discussion.

During the Event

Typically, hackathon participants use the morning of the first day of the event to organize themselves into working groups of between 3 and 6 people, each with a focused implementation objective. Ideas and objectives are discussed, and attendees coalesce around the projects in which they have the most experience or interest.

Deliverables / Event Results

The hackathon will use a wiki hosted at NESCent during the event. Once the hackathon is done, relevant content will be copied from the NESCent wiki to the GMOD wiki. Each working group during the event will typically have its own wiki page, linked from the main hackathon wiki page, where it documents its minutes and design notes, and provides links to the code and documentation it produces. Also, since GMOD and NESCent are both committed to open source principles, all code and documentation produced by participants during the hackathon must be published under an OSI-approved open source license. As contributions to existing GMOD tools, all hackathon products will most likely satisfy this requirement automatically.

Participant Funding

NESCent is sponsoring this hackathon, and has made funds available to defray costs for qualified participants.

Timeline

June 3, 2010: Proposal submitted to NESCent
June 10, 2010: Funding approved
August 1, 2010: Open call for participants, applications open
August 25, 2010: Open call application deadline
September 16, 2010: Applicants notified
September 24, 2010: Deadline for participant attendance commitment

Sponsorship

This event is sponsored by the US National Evolutionary Synthesis Center (NESCent) through its Informatics Whitepapers program. NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries.

Organizing Committee