Common Gene Page

From GMOD
Revision as of 17:15, 14 July 2008 by Dongilbert

Common Gene Page Rationale

Model organism/genome databases (MODs) produce gene pages with similar gene data, and may benefit from unifying these pages around a common structure, labelling, etc.

A list of common gene page attributes

* Names, symbols/IDs, synonyms
* Map locations
* Sequences
* Reagents
* Gene ontology
* Similar Genes
* Database cross-refs, External links
* Alleles, Transcripts
* Proteins, Structure and Domains
* Expression and Mutant Phenotypes
* Gene Interactions
* Literature references
* Summary Text

Notes for Discussion 2008

From Dongilbert 13:15, 14 July 2008 (EDT) :

In hopes there will be a lively discussion on this topic at the July 2008 GMOD meeting, here are some of my thoughts. I would like to attend, but will instead be at the ISMB 2008 meeting in Toronto later that week, and hope to hear some of the outcomes.

It seems to me the only real issue in moving forward with a common gene page is how to convince MOD projects to adopt the same gene summary format for our many shared customers. I'd like to see an agreement among 2+ genome data providers to actually produce and deploy a common gene report within the coming year.

Genome informatics has a history of each project doing its own thing, even where projects share common genome data and common customer needs. That history lowers expectations for this and other GMOD common goals/recommendations, e.g. a simple Standard URL for genome data. Some efforts do achieve common usage and consensus: GFF(3) format, GBrowse, Chado schema/db, Apollo annotator, and others.

The common gene report concept to date is to provide consumers of genome data with a common format, both for web display and for simple computing. It is aimed at simple summaries of gene data, structured in a common way across organisms, suitable for bioscientists and students to read and use as web pages and data files (XML), and to do simple computing on if desired.

One can see it as an alternative to a MOD project's full, project-specific documents. It isn't aimed at full, complex data exchange among databases. Other formats/methods exist for that.

Although each project will have engineering details to work out, implementing this isn't likely to be more than a small effort. We were able to use simple web-page scraping software to convert existing MOD gene reports into a common format (see http://eugenes.org/gmod/gene-report-examples/).

User-interface and web page design/display aspects can be tuned to each MOD's desires. The main thrust of a common gene page is having common data labelled in a similar way. Agreement on an XML notation should follow in a straightforward way from common data fields.
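To make the step from common data fields to an XML notation concrete, here is a minimal sketch in Python. The element names (`gene_report`, `symbol`, `map_location`, and so on) are hypothetical labels loosely based on the attribute list on this page; they are not the actual UGP-XML notation, and the gene used in the example is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical field labels, loosely following the common attribute list
# on this page. They are NOT the element names of the real UGP-XML format.
COMMON_FIELDS = [
    "name", "symbol", "synonym", "map_location",
    "gene_ontology", "dbxref", "summary",
]

def gene_report_xml(gene_id, fields):
    """Build a small XML gene report from a dict of common fields.

    `fields` maps a label to a string or a list of strings. Labels that
    are not in COMMON_FIELDS are ignored, so every report is labelled
    the same way regardless of which database supplied the data.
    """
    root = ET.Element("gene_report", {"id": gene_id})
    for label in COMMON_FIELDS:
        values = fields.get(label, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, label).text = value
    return ET.tostring(root, encoding="unicode")

# Invented example data, for illustration only.
print(gene_report_xml("GENE0001", {
    "symbol": "abc1",
    "synonym": ["abc", "example-1"],
    "summary": "Placeholder summary text.",
}))
```

Under this sketch, two databases that agree on the labels produce reports a reader or a script can treat identically, while each remains free to style the display as it likes.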

I (dgg) will be happy to work with any group of 2+ MODs agreeing to deploy a common gene report. The software and example UGP-XML cases can be adapted to help with this. Background at http://gmod.org/Common_Gene_Page

Example uses

Early documents and samples

See this folder for some discussion, documents and examples for MOD gene pages from 2004: http://eugenes.org/gmod/gene-report-examples/

More discussion and samples

See this blog entry on a 2005 meeting discussion: http://blog.gmod.org/common_gene_pages

Daphnia genome database use case

An example implementation is at Daphnia-base, where the gene reports are structured XML, displayed with a style sheet. For example, see this gene page: http://wfleabase.org/lucegene/lookup?id=NCBI_GNO_292134 (view the page source to see the structured gene page XML). Or see these screenshots: daphnia gene page and gene page xml.

There is a simple Perl tool, bin/gff2ugpxml.pl, to turn annotated GFF data into this gene page XML, suitable for search and display. It is available in GMOD genepages in CVS or at http://eugenes.org/gmod/gene-report-examples/
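As a rough illustration of the GFF-to-XML idea, the Python sketch below converts a single GFF3 `gene` line into a small XML record. It is not a port of gff2ugpxml.pl and does not reproduce its output; the function names and XML element names are assumptions made for this example, and only the standard GFF3 column layout and reserved attribute tags (`ID`, `Name`, `Alias`, `Dbxref`, `Ontology_term`) are taken as given.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

def parse_gff3_attributes(column9):
    """Split GFF3 column 9 ("key=v1,v2;key2=v") into a dict of lists."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" not in pair:
            continue
        key, value = pair.split("=", 1)
        # Split on commas first, then undo GFF3 percent-encoding.
        attrs[key] = [unquote(v) for v in value.split(",")]
    return attrs

def gff_gene_to_xml(gff_line):
    """Return XML text for a GFF3 'gene' line, or None for other lines.

    The element names are hypothetical, not the real gene page XML.
    """
    cols = gff_line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "gene":
        return None
    seqid, _source, _type, start, end, _score, strand, _phase, col9 = cols
    attrs = parse_gff3_attributes(col9)
    gene_id = attrs.get("ID", ["unknown"])[0]
    root = ET.Element("gene_report", {"id": gene_id})
    ET.SubElement(root, "map_location").text = f"{seqid}:{start}..{end}:{strand}"
    for key in ("Name", "Alias", "Dbxref", "Ontology_term"):
        for value in attrs.get(key, []):
            ET.SubElement(root, key.lower()).text = value
    return ET.tostring(root, encoding="unicode")

# Invented GFF3 line, for illustration only.
example = "scaffold_1\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=abc1;Alias=a1,a2"
print(gff_gene_to_xml(example))
```

A real converter would also gather child features (transcripts, proteins) and any annotation attached to them; this sketch handles only the gene line itself.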

Search and display is then provided by the GMOD LuceGene tool, detailed at LuceGene_for_Daphnia_genome.