Difference between revisions of "Chado Natural Diversity Module Working Group"

From GMOD
m (Observational Unit?)
m (debug wiki syntax)
 
(38 intermediate revisions by 7 users not shown)
Line 1: Line 1:
{{UnderConstruction|}}
+
The '''[[Chado Natural Diversity Module]] Working Group''' was established with the aim of getting the Chado Natural Diversity Module into the production version of [[Chado]].
 
+
The [[Chado]] Natural Diversity Working Group has been established with the aim of getting the Chado Natural Diversity Module into the production version of [[Chado]].
+
 
+
 
+
= Background and Timeline =
+
 
+
This section describes important events in the development of the module.  Detailed discussion of functionality is a separate section below.
+
 
+
== 2007 ==
+
 
+
The [http://sourceforge.net/projects/heliconiusdb/develop initial version] of the Natural Diversity Module was developed by several people associated with [http://nescent.org NESCent].  The initial application was ''Heliconius'' research.  This first version (like subsequent versions) was directly inspired by the [http://www.maizegenetics.net/gdpdm/ Genomic Diversity and Phenotype Data Model (GDPDM)], which comes out of Cornell.  The GDPDM has great documentation and is also described in [[:Image:GDPDM_GMOD_presentation20060630.ppt|this presentation]].
+
 
+
== 2009 ==
+
 
+
In the fall of 2009, Sook Jung of Washington State University downloaded the initial version (becoming the first known user outside ''Heliconius'') and started looking at it with the goal of using it for [http://www.rosaceae.org/ GDR], a plant genome database.  Sook found that a number of things weren't clear, and her input led to a rethinking of the design and to the formation of this working group.
+
 
+
== 2010 ==
+
 
+
=== [[January 2010 GMOD Meeting#Satellite Meetings|January 2010 GMOD Satellite Meeting]] ===
+
 
+
Several working group participants met twice during [[PAG 2010]], immediately before the [[January 2010 GMOD Meeting]].
+
 
+
  
  
 
= Discussion =
 
  
At this time (January 2010), most of this discussion is about making changes relative to the version that was created in 2007 with ''Heliconius'' in mind.  This version is referred to below as '''''HDB'''''.
+
See the [[Talk:Chado Natural Diversity Module Working Group|discussion page]] for notes on what we've talked about and where we are heading.  Once the discussion settles, a summary of decisions will appear here.
  
== Observational Taxonomy ==
+
* See [http://www.ncbi.nlm.nih.gov/pubmed/22120662 the publication].
  
HDB has several different levels of biological unit, each represented with a different set of tables:
 
  
* Organism - This already exists and comes from the [[Chado Organism Module]].  It defines a species.
+
= History =
* [http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/diversity.html#biotype Biotype]
+
* [http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/diversity.html#stock Stock] (which is different from the stock table already in Chado)
+
* [http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/diversity.html#individual Individual]
+
* [http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/diversity.html#crossexperiment Crossexperiment]
+
* [http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/diversity.html#specimen Specimen]
+
  
There are also many relationships between these tables:
+
This section describes important events in the development of the module.  Detailed discussion of functionality is a separate section below.
  
{| class="wikitable"
+
== 2007 ==
| Organism
+
| M:M
+
| Biotype
+
|-
+
| Biotype
+
| 1:M
+
| Stock
+
| there are 3 different 1:M rels
+
|-
+
| Stock
+
| 1:M
+
| Individual
+
|-
+
| Crossexperiment
+
| 1:M
+
| Individual
+
|-
+
| Individual
+
| 1:M
+
| Crossexperiment
+
|-
+
| Individual
+
| 1:M
+
| Specimen
+
|-
+
| Biotype
+
| M:M
+
| Individual
+
|}
+
 
+
All of these tables describe some unit or group of biology/life, ranging from species (organism) down to tissue in hand (specimen).  The HDB design has several ''structurally identical'' tables at the various levels for different types of data (phenotype, images, ...).  This hierarchy is also specific to butterflies.
+
 
+
=== Observational Taxonomy Proposal ===
+
 
+
When Sook Jung mapped the HDB version to tree biology, a number of issues came up, many of which boil down to:
+
* Species/biotype/stock/individual/cross hierarchy doesn't work for trees (living trees, not abstract ones).
+
* Lineage doesn't work for trees.
+
 
+
This highlights that HDB is not a very Chadoesque design.  We need to genericize the design to support arbitrary hierarchies, lineages, and mating types.  This will support many more users and allow them to store images, phenotypes, genotypes, properties, etc. for whatever level of the hierarchy they have data for.
+
 
+
We can't touch Organism, as it's a key table in every Chado instance out there.  However, everything else is open to change.
+
 
+
==== Observational Unit ====
+
 
+
<div class="quotebox">
+
The GDPDM has ''observational units,'' which represent whatever level of sample you have data for.  I find that name descriptive, but awkward.  Unfortunately, I can't think of a better name.  Suggestions are welcome.
+
</div>
+
 
+
Specifics:
+
# Try to combine biotype, stock, individual, and crossexperiment into a single table, tentatively called obs_unit (with a nod to GDPDM).
+
# Investigate also folding specimen into obs_unit.
+
# An observational unit's place in the observational taxonomy will be indicated by a new column in obs_unit that points to the CV table.  For butterflies, the possible values might be "species", "biotype", "stock", "individual", and possibly "specimen".
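
The single-table idea in the list above can be sketched as follows. This is only an illustration, not the working group's schema: it uses an in-memory SQLite database from Python instead of Chado's PostgreSQL, the <code>cvterm</code> table is reduced to an id and a name, and every column name in <code>obs_unit</code> other than the table name itself is an assumption made for the example.

```python
import sqlite3


def build_obs_unit_schema(conn):
    """Create a reduced cvterm table and a hypothetical obs_unit table."""
    conn.executescript("""
        -- Stand-in for Chado's cvterm table (only the columns needed here).
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE
        );
        -- Hypothetical merged table; type_id records the unit's level in
        -- the observational taxonomy. REFERENCES documents the link; SQLite
        -- enforces it only when foreign keys are switched on.
        CREATE TABLE obs_unit (
            obs_unit_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            type_id     INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
        );
    """)


def load_levels(conn, levels):
    """Register the taxonomy levels a project wants to use as CV terms."""
    conn.executemany("INSERT INTO cvterm (name) VALUES (?)",
                     [(level,) for level in levels])


def add_unit(conn, name, level):
    """Insert one observational unit at the named level and return its id."""
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?",
                       (level,)).fetchone()
    if row is None:
        raise ValueError(f"unknown taxonomy level: {level}")
    cur = conn.execute("INSERT INTO obs_unit (name, type_id) VALUES (?, ?)",
                       (name, row[0]))
    return cur.lastrowid


def units_at_level(conn, level):
    """Return the names of all units stored at one level, sorted by name."""
    rows = conn.execute(
        "SELECT u.name FROM obs_unit u "
        "JOIN cvterm t ON t.cvterm_id = u.type_id "
        "WHERE t.name = ? ORDER BY u.name", (level,))
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
build_obs_unit_schema(conn)
# Levels taken from the butterfly example in the text.
load_levels(conn, ["biotype", "stock", "individual", "specimen"])
add_unit(conn, "example stock", "stock")
add_unit(conn, "example individual", "individual")
print(units_at_level(conn, "individual"))
```

A project with a different hierarchy (plots or broods, for instance) would only load different level names; the table structure would stay the same.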
+
 
+
==== Observational Unit Relationships ====
+
 
+
We need to support arbitrary M:M relationships between different levels of the observational taxonomy, and within the same level as well.  For example, we may want a complete chain from species to individual (or plot or brood or ...), and that individual may have resulted from crossing 2 other individuals (or from cloning one, or ...).
+
 
+
The common solution is to create a bridge/mapping/intersection table to implement M:M relationships between obs_unit and itself.  This table would define the standard "subject relationship object" triple, where the subject and object are rows in obs_unit and the relationship is a CV term.
+
  
This also deals with complications in lineage and mating types.  You can represent ''T. thermophila'', which has 7 mating types (any 2 will do), ''C. elegans'', which has hermaphrodites and outcrossing, ''E. coli'', which is asexual, ...
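
The "subject relationship object" triple described above could look roughly like this. It is a sketch under stated assumptions: the table name <code>obs_unit_relationship</code>, the relationship terms ("offspring_of", "member_of", "clone_of"), and the helper functions are all hypothetical, and SQLite stands in for PostgreSQL. The <code>subject_id</code> / <code>type_id</code> / <code>object_id</code> column naming follows the pattern of existing Chado relationship tables such as <code>feature_relationship</code>.

```python
import sqlite3

SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE obs_unit (
    obs_unit_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    type_id     INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
-- Hypothetical bridge table: obs_unit related to itself, M:M.
CREATE TABLE obs_unit_relationship (
    obs_unit_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES obs_unit (obs_unit_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    object_id  INTEGER NOT NULL REFERENCES obs_unit (obs_unit_id),
    UNIQUE (subject_id, type_id, object_id)
);
"""


def make_db():
    """Open an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def term(conn, name):
    """Return the id of a CV term, creating the term if it is missing."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    return conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?",
                        (name,)).fetchone()[0]


def add_unit(conn, name, level):
    """Insert an observational unit at the named level and return its id."""
    cur = conn.execute("INSERT INTO obs_unit (name, type_id) VALUES (?, ?)",
                       (name, term(conn, level)))
    return cur.lastrowid


def relate(conn, subject_id, relationship, object_id):
    """Store one (subject, relationship, object) triple."""
    conn.execute(
        "INSERT INTO obs_unit_relationship (subject_id, type_id, object_id) "
        "VALUES (?, ?, ?)",
        (subject_id, term(conn, relationship), object_id))


def objects_of(conn, subject_id, relationship):
    """Names of all units the subject points to via the relationship."""
    rows = conn.execute(
        "SELECT o.name FROM obs_unit_relationship r "
        "JOIN cvterm t ON t.cvterm_id = r.type_id "
        "JOIN obs_unit o ON o.obs_unit_id = r.object_id "
        "WHERE r.subject_id = ? AND t.name = ? ORDER BY o.name",
        (subject_id, relationship))
    return [r[0] for r in rows]


conn = make_db()
stock = add_unit(conn, "stock A", "stock")
mother = add_unit(conn, "individual 1", "individual")
father = add_unit(conn, "individual 2", "individual")
child = add_unit(conn, "individual 3", "individual")
# A cross: the child is related to each parent within the same level.
for parent in (mother, father):
    relate(conn, child, "offspring_of", parent)
# A link across levels of the taxonomy.
relate(conn, child, "member_of", stock)
# An asexual lineage needs only one triple.
clone = add_unit(conn, "individual 4", "individual")
relate(conn, clone, "clone_of", child)
print(objects_of(conn, child, "offspring_of"))
```

Because nothing in the table fixes the number or kind of parents, the same structure covers two-parent crosses, clones, and organisms with more than two mating types; the constraints live in the CV terms rather than in the schema.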
+
The [http://sourceforge.net/projects/heliconiusdb/develop initial version] of the Natural Diversity Module was developed by several people associated with [http://nescent.org NESCent].  The initial application was ''Heliconius'' research.  This first version was directly inspired by the [http://www.maizegenetics.net/gdpdm/ Genomic Diversity and Phenotype Data Model (GDPDM)], which comes out of Cornell.  The GDPDM has great documentation and is also described in [[:Image:GDPDM_GMOD_presentation20060630.ppt|this presentation]].
  
= Links =
+
=== Links ===
  
 
* The [http://sourceforge.net/projects/heliconiusdb/develop beta natural diversity module] is on [http://sourceforge.net/projects/heliconiusdb/develop SourceForge].
 
Line 111: Line 25:
 
* The module is based heavily on the [http://www.maizegenetics.net/gdpdm/ Genomic Diversity and Phenotype Data Model (GDPDM)], which comes out of Cornell, and also has nice documentation.  The GDPDM is also described in [[:Image:GDPDM_GMOD_presentation20060630.ppt|this presentation]].
 
  
= Membership =
+
== 2009 ==
 +
 
 +
In the fall of 2009, Sook Jung of Washington State University downloaded the initial version (becoming the first known user outside ''Heliconius'') and started looking at it with the goal of using it for [http://www.rosaceae.org/ GDR], a plant genome database.  Sook found that a number of things weren't clear, and her input led to a rethinking of the design and to the formation of this working group.
 +
 
 +
== 2010 ==
 +
 
 +
Several working group participants met at a [[January 2010 GMOD Meeting#Satellite Meetings|January 2010 GMOD Satellite Meeting]] during [[PAG 2010]].  Discussion has continued in teleconferences and on the {{MailingListLink|Chado|Chado mailing list}}.  The design has changed considerably during this time.  It has become more ''Chadoesque'':  it is a very flexible and generic design.
 +
 
 +
As of July 2010, the schema is undergoing final tweaks before it is moved into production [[Chado]].
 +
 
  
 +
= Participation =
 
If you are interested, please add your name below.  (Either update this page directly, or send your contact info to [mailto:clements@nescent.org Dave Clements].)
 
  
Line 179: Line 103:
 
|
 
|
  
 +
|-
 +
|
 +
Genevieve DeClerk
 +
|
 +
 +
|
 +
Cornell  / Gramene
 +
|
 +
 +
|-
 +
|
 +
Bob MacCallum
 +
|
 +
r.maccallum#imperial.ac.uk
 +
|
 +
[http://vectorbase.org VectorBase]
 +
|
 +
mosquitoes, ticks and other nasties...
 +
 +
|-
 +
|
 +
Seth Redmond
 +
|
 +
seth.redmond * imperial.ac.uk
 +
|
 +
Imperial / [http://vectorbase.org VectorBase]
 +
|
 +
 +
|-
 +
|
 +
[[User:NaamaMenda | Naama Menda]]
 +
|
 +
naama.menda * cornell.edu
 +
|
 +
Sol Genomics Network / [http://solgenomics.net SGN]
 +
|
 +
 +
|-
 +
|
 +
Maren Friesen
 +
|
 +
 +
|
 +
University of Southern California
 +
|
 +
Medicago ecological genomics
 +
|-
 +
|
 +
Yuri Bendana
 +
|
 +
 +
|
 +
University of Southern California
 +
|
 +
Medicago ecological genomics
 +
|-
 +
|
 +
Pantelis Topalis
 +
|
 +
topalis*imbb.forth.gr
 +
|
 +
[http://vectorbase.org VectorBase] @IMBB
 +
|
 +
Ontology developer
 +
|-
 +
|
 +
Emmanuel Dialynas
 +
|
 +
ed * imbb.forth.gr
 +
|
 +
[http://vectorbase.org VectorBase] @IMBB
 +
|
 +
Ontology/Insecticide Resistance database developer
 +
|-
 +
|
 +
Dan Bolser
 +
|
 +
dan.bolser@gmail.com
 +
|
 +
Ensembl Genomes, Plants division
 +
|
 +
We need to store millions of SNPs from hundreds of thousands of varieties. An unknown volume of phenotyping data to follow...
  
 
|-class='sortbottom'
 
Line 185: Line 191:
 
<!--box uid=90f302545d6d2678756d3936319fd651.1541.T4b2076aa353c3--></protect>
 
<!--box uid=90f302545d6d2678756d3936319fd651.1541.T4b2076aa353c3--></protect>
 
{{ThisIsATET}}
 
 +
 +
= Stock Relationship Ontology =
 +
 +
[[Stock Relationship Ontology]]
  
 
[[Category:Meetings]]
 
 
[[Category:Chado Modules]]
 
 +
[[Category:Natural Diversity]]

Latest revision as of 15:04, 4 February 2012

The Chado Natural Diversity Module Working Group was established with the aim of getting the Chado Natural Diversity Module into the production version of Chado.


Discussion

See the discussion page for notes on what we've talked about and where we are heading. Once the discussion settles, a summary of decisions will appear here.


History

This section describes important events in the development of the module. Detailed discussion of functionality is a separate section below.

2007

The initial version of the Natural Diversity Module was developed by several people associated with NESCent. The initial application was Heliconius research. This first version was directly inspired by the Genomic Diversity and Phenotype Data Model (GDPDM), which comes out of Cornell. The GDPDM has great documentation and is also described in this presentation.

Links

2009

In the fall of 2009, Sook Jung of Washington State University downloaded the initial version (becoming the first known user outside Heliconius) and started looking at it with the goal of using it for GDR, a plant genome database. Sook found that a number of things weren't clear, and her input led to a rethinking of the design and to the formation of this working group.

2010

Several working group participants met at a January 2010 GMOD Satellite Meeting during PAG 2010. Discussion has continued in teleconferences and on the Chado mailing list. The design has changed considerably during this time. It has become more Chadoesque: it is a very flexible and generic design.

As of July 2010, the schema is undergoing final tweaks before it is moved into production Chado.


Participation

If you are interested, please add your name below. (Either update this page directly, or send your contact info to Dave Clements.)

{| class="wikitable"
! Name !! Email !! Affiliation !! Comments
|-
| Dave Clements (organizer) || clements@nescent.org || NESCent, GMOD || Please let me know if you are interested in participating in this group, or if you have any questions.
|-
| Sook Jung || sook * bioinfo.wsu.edu || Washington State University, GDR ||
|-
| Meg Staton || mestato * yahoo.com || CUGI ||
|-
| Stephen Ficklin || ficklin * clemson.edu || CUGI ||
|-
| Dorrie Main || dorrie * wsu.edu || Washington State University, GDR ||
|-
| Scott Cain || || OICR / GMOD ||
|-
| Genevieve DeClerk || || Cornell / Gramene ||
|-
| Bob MacCallum || r.maccallum#imperial.ac.uk || VectorBase || mosquitoes, ticks and other nasties...
|-
| Seth Redmond || seth.redmond * imperial.ac.uk || Imperial / VectorBase ||
|-
| Naama Menda || naama.menda * cornell.edu || Sol Genomics Network / SGN ||
|-
| Maren Friesen || || University of Southern California || Medicago ecological genomics
|-
| Yuri Bendana || || University of Southern California || Medicago ecological genomics
|-
| Pantelis Topalis || topalis*imbb.forth.gr || VectorBase @IMBB || Ontology developer
|-
| Emmanuel Dialynas || ed * imbb.forth.gr || VectorBase @IMBB || Ontology/Insecticide Resistance database developer
|-
| Dan Bolser || dan.bolser@gmail.com || Ensembl Genomes, Plants division || We need to store millions of SNPs from hundreds of thousands of varieties. An unknown volume of phenotyping data to follow...
|}



Stock Relationship Ontology

Stock Relationship Ontology