GMOD Evo Hackathon Proposal Supplemental Information

From GMOD

Revision as of 15:12, 30 April 2010

THIS IS STILL A DRAFT

On this page you will find additional information related to the GMOD Evo Hackathon Proposal.

Other topics of secondary importance

Natural Diversity / Population Genetics / Multidimensional Data Visualization in a Genomic Context

The Barley1K project (Eyal Fridman group, The Hebrew University) is an example of the kind of dataset GMOD should be able to support. The group collected a thousand wild barley samples using a hierarchical sampling design: 51 sites, each containing five microsites on different slopes or niches within the site. They also recorded many local environmental conditions and collected detailed phenotype data on a portion of this collection. This includes phenotypes for a diverse set of interspecific hybrids derived from a genetically well-defined core collection, which was genotyped on the Illumina GoldenGate platform with the Barley OPA array (BOPA1). The Natural Diversity module will let us store this type of data, together with the accumulated allelic variation obtained from microsatellites (SSRs), the BOPA1 array, and next-generation sequencing of cDNA libraries.

However, we lack tools to visualize such multidimensional data in a genomic context (e.g., in GBrowse, JBrowse, and GBrowse_syn), including genome-phenotype (phenome) associations. This could be addressed either with new, purpose-built glyphs and plugins or with generic interfaces to statistical, geolocation, and image-based visualization packages.
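As a rough illustration of the hierarchical sampling design, the sketch below models each sample with its site, microsite, environmental measurements, and marker genotypes, then summarizes allele frequencies at either level of the hierarchy. The `Sample` record and `allele_frequencies` function are hypothetical and do not reflect the Chado Natural Diversity schema. The sketch also assumes one (homozygous) allele call per sample and marker, a simplification that seems reasonable for a largely self-pollinating species such as barley.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One wild collection sample (hypothetical record layout)."""
    sample_id: str
    site: str          # collection site, the top level of the hierarchy
    microsite: str     # slope or niche within the site
    environment: dict = field(default_factory=dict)  # e.g. {"slope": "north"}
    genotypes: dict = field(default_factory=dict)    # marker name -> allele call


def allele_frequencies(samples, marker, level="site"):
    """Allele frequencies for one marker, grouped by 'site' or 'microsite'.

    Samples with no call for the marker are skipped.
    """
    if level not in ("site", "microsite"):
        raise ValueError("level must be 'site' or 'microsite'")
    counts = defaultdict(Counter)
    for sample in samples:
        allele = sample.genotypes.get(marker)
        if allele is not None:
            # Group by the requested level of the sampling hierarchy.
            counts[getattr(sample, level)][allele] += 1
    result = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        result[group] = {allele: n / total for allele, n in counter.items()}
    return result
```

Passing `level="microsite"` gives the finer-grained view. Phenotypes or environmental variables could be grouped the same way, which is the kind of multidimensional summary a genome-browser glyph would need to draw next to a marker.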

There is also work currently under way to extend GFF3 to handle variant information. Several existing GMOD tools will need to be modified to recognize this data.
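The sketch below shows what "recognizing" variant features might involve. It parses one GFF3 line, decoding percent-escaped attribute values and splitting multi-valued attributes on commas as the GFF3 specification describes. Because the extension was still in progress, the attribute names `Reference_seq` and `Variant_seq` are an assumption borrowed from the Sequence Ontology's GVF format. The `parse_gff3_line` and `variant_alleles` helpers are hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine columns and decode attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols
    attributes = {}
    for pair in attr_text.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # Split on literal commas first; escaped commas (%2C) stay inside a value.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": unquote(seqid),
        "source": source,
        "type": ftype,
        "start": int(start),  # 1-based, inclusive
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


def variant_alleles(feature):
    """Return (reference, [variant alleles]) for a variant feature, else None."""
    attrs = feature["attributes"]
    if "Variant_seq" not in attrs:
        return None
    reference = attrs.get("Reference_seq", [None])[0]
    return reference, attrs["Variant_seq"]
```

A tool that already reads GFF3 would mainly need the second step: noticing the variant attributes and choosing an appropriate display, rather than treating the line as an ordinary feature.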

Web-based display of Chado Phenotype/Natural Diversity data

Tripal is a Drupal-based web interface to Chado databases. It supports interfaces for several popular data types, but does not currently support phylogenies, phenotypes, expression, or natural diversity data. We could extend it to evolutionary data types as part of the hackathon.


Support for pangenomes and core genomes

The concepts of the pangenome and the core genome are becoming common in the analysis of bacterial genomes but are more broadly applicable. The pangenome is the union of the gene sets of all strains of a species, while the core genome is the intersection of those sets, i.e., the genes shared by every strain. In both cases, a gene or feature is a generalization of its instances in multiple genomes. Like a gene in an inferred ancestor, a pangenome gene has no physical location, but it may have one or more contextual locations in a syntenic block of sequence found in some or all of the strains.
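The union and intersection definitions translate directly into code. The sketch below assumes genes have already been grouped into cross-strain families (for example by an orthology-clustering step, not shown). For each family, it records contextual locations as hypothetical (strain, syntenic block, position) triples instead of a single physical coordinate. All names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class PanGene:
    """A gene family generalized across strains; it has no single physical location."""
    family_id: str
    # Contextual locations: (strain, syntenic block id, position within block).
    locations: list = field(default_factory=list)


def build_pangenome(strain_genes):
    """strain_genes maps strain -> {family_id: (block_id, position)}.

    Returns (pangenome, core_genome, pan_genes): the first two are sets of
    family ids, and pan_genes maps each family id to a PanGene.
    """
    family_sets = [set(genes) for genes in strain_genes.values()]
    pangenome = set().union(*family_sets)                            # union over strains
    core = set.intersection(*family_sets) if family_sets else set()  # shared by all
    pan_genes = {fid: PanGene(fid) for fid in sorted(pangenome)}
    for strain, genes in strain_genes.items():
        for fid, (block_id, position) in genes.items():
            pan_genes[fid].locations.append((strain, block_id, position))
    return pangenome, core, pan_genes
```

A core gene ends up with one contextual location per strain, while an accessory gene has locations only in the strains that carry it.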

Support for annotation tools based on phylogenetic analysis, such as PAINT

The RefGenome project of the GO Consortium is developing PAINT, a system for inferring GO annotations from the distribution of curated annotations within clades and outgroups. GMOD tools and schemas need to be prepared to handle this kind of annotation. For example, ancestral nodes in PANTHER trees will have accessions, and these accessions will require versioning. The analysis changes as descendants receive new annotations, as new genomes add descendants or change their placement, and as the orthology analysis is revised.
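The sketch below shows one possible way to meet the versioning requirement. Each ancestral node keeps a stable accession plus a version number, and the version is bumped only when a fingerprint of its descendants' annotations and its subtree changes. This is not how PAINT or PANTHER assign identifiers. The `AncestorNode` class, the `ANC:` accession prefix, and the use of a Newick string to capture descendant placement are all hypothetical.

```python
import hashlib


class AncestorNode:
    """An inferred ancestral node with a stable accession and a content-based version."""

    def __init__(self, accession):
        self.accession = accession
        self.version = 0
        self._fingerprint = None

    def update(self, descendant_terms, subtree_newick=""):
        """Record the current analysis state and return the versioned accession.

        descendant_terms: descendant id -> iterable of GO term ids annotated to it.
        subtree_newick: the node's subtree, so re-placing descendants also
        changes the fingerprint.
        """
        # Sort everything so the fingerprint does not depend on input order.
        canonical = repr((
            sorted((d, sorted(set(terms))) for d, terms in descendant_terms.items()),
            subtree_newick,
        ))
        fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if fingerprint != self._fingerprint:
            self._fingerprint = fingerprint
            self.version += 1
        return f"{self.accession}.{self.version}"
```

Because the fingerprint is order-independent, re-running an unchanged analysis keeps the same version. Adding a genome, changing a descendant's annotations, or moving a descendant in the tree produces a new one.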

Linking xrate (and other phylo-aware annotation tools) to JBrowse

The conservation track is a staple of the UCSC Genome Browser, and EvoFold predictions form another useful track. Tools like xrate (http://biowiki.org/XRATE) allow these kinds of phylogenetic hidden Markov models (HMMs) and stochastic context-free grammars (SCFGs) to be automated and generalized. Development would focus on linking such tools into existing GMOD browsers (e.g., JBrowse).
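The sketch below computes a simple conservation track with a two-state (conserved versus neutral) HMM, in the spirit of phylo-HMM conservation scores such as phastCons, and writes the per-column posterior probabilities as bedGraph lines. It is not xrate. The per-column likelihoods under each state are assumed to come from phylogenetic models computed elsewhere, and the transition probability, prior, and function names are hypothetical choices. Converting the output into whatever format the target browser loads is not shown.

```python
def conservation_posteriors(lik_cons, lik_neut, p_stay=0.99, prior_cons=0.5):
    """Posterior P(conserved) per alignment column from a two-state HMM.

    lik_cons[i], lik_neut[i]: positive likelihoods of column i under the
    conserved and neutral phylogenetic models (assumed precomputed).
    Uses scaled forward-backward to avoid underflow on long alignments.
    """
    n = len(lik_cons)
    if n != len(lik_neut):
        raise ValueError("likelihood lists must have equal length")
    if n == 0:
        return []
    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]  # state 0 = conserved
    emit = [(lik_cons[i], lik_neut[i]) for i in range(n)]

    # Forward pass: normalize each column and keep the scale factors.
    fwd, scale = [], []
    for i in range(n):
        if i == 0:
            col = [prior_cons * emit[0][0], (1 - prior_cons) * emit[0][1]]
        else:
            col = [sum(fwd[i - 1][r] * trans[r][s] for r in range(2)) * emit[i][s]
                   for s in range(2)]
        c = sum(col)
        fwd.append([x / c for x in col])
        scale.append(c)

    # Backward pass, reusing the forward scale factors.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(trans[s][t] * emit[i + 1][t] * bwd[i + 1][t] for t in range(2))
                  / scale[i + 1] for s in range(2)]

    # Combine and renormalize to get the posterior of the conserved state.
    post = []
    for i in range(n):
        cons = fwd[i][0] * bwd[i][0]
        post.append(cons / (cons + fwd[i][1] * bwd[i][1]))
    return post


def to_bedgraph(chrom, start, scores):
    """bedGraph lines (0-based, half-open), one per column beginning at `start`."""
    return [f"{chrom}\t{start + i}\t{start + i + 1}\t{score:.3f}"
            for i, score in enumerate(scores)]
```

The sticky transition probability (`p_stay`) is what lets the model smooth isolated columns into conserved or neutral runs. A generalized tool such as xrate would let the states, emissions, and grammar be specified rather than hard-coded as they are here.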