This page or section is under construction.

The Chado Natural Diversity Working Group has been established with the aim of getting the Chado Natural Diversity Module into the production version of Chado.

Background and Timeline

This section describes important events in the development of the module. Detailed discussion of functionality is in a separate section below.

2007

The initial version of the Natural Diversity Module was developed by several people associated with NESCent. The initial application was Heliconius research. This first version, like subsequent versions, is directly inspired by the Genomic Diversity and Phenotype Data Model (GDPDM), which comes out of Cornell. The GDPDM has great documentation and is also described in this presentation.

2009

In the fall of 2009, Sook Jung of Washington State downloaded the initial version (becoming the first known user outside Heliconius) and started looking at it with the goal of using it for GDR, a plant genome database. Sook found that a number of things weren't clear. Her input led to a rethinking of the design and to the formation of this working group.

2010

January 2010 GMOD Satellite Meeting

Several working group participants met twice during PAG 2010, immediately before the January 2010 GMOD Meeting.

Discussion

At this time (January 2010), most of this discussion is about making changes relative to the version that was created in 2007 with Heliconius in mind. This is referred to below as HDB.

We will also use what came up at the PAG meeting as a starting point for the discussion. This will change over time, as things settle down.

Responsibilities

Note: These responsibilities are flexible. They are just what we decided at the PAG meetings. They are wide open to discussion (or just add your name below).

Sook Jung will take the lead on schema changes/development. Sook is interested and is motivated to produce a working schema as soon as possible.

Dave Clements will lead documentation efforts. Dave will produce wiki documentation for the new tables, and also hopes to create schema diagrams (probably using Power Architect). Dave is also keenly interested in how phenotypes are represented both in this module and in the rest of Chado.

Add your name here ...

Observational Taxonomy

HDB has several different levels of biological unit, each represented with a different set of tables:

Organism - This already exists and comes from the Chado Organism Module. It defines a species.
Biotype
Stock (which is different from the stock table already in Chado)
Individual
Crossexperiment
Specimen

And there are a bevy of relationships between these tables.

From	Relationship	To	Notes
Organism	M:M	Biotype
Biotype	1:M	Stock	There are 3 different 1:M relationships
Stock	1:M	Individual
Crossexperiment	1:M	Individual
Individual	1:M	Crossexperiment
Individual	1:M	Specimen
Biotype	M:M	Individual

All of these tables describe some unit or group of biology/life, ranging from species (organism) down to tissue in hand (specimen). The HDB design repeats structurally identical tables at the various levels for each type of data (phenotypes, images, ...). This particular hierarchy is also specific to butterflies.

Stock

Both the HDB version and production Chado have a stock table. The Chado Stock Module was added to production Chado either while the HDB version was being developed or afterwards.

The Chado Stock module is about keeping track of lines in your lab/community. Someone needs to take a look at it and determine how the natural diversity module should interact with it.

Observational Taxonomy Proposal

When Sook Jung mapped the HDB version to tree biology, a number of issues came up, many of which boil down to:

Species/biotype/stock/individual/cross hierarchy doesn't work for trees (living trees, not abstract ones).
Lineage doesn't work for trees.

This highlights that HDB is not a very Chadoesque design. We need to genericize the design to support arbitrary hierarchies, lineages, and mating types. This will support many more users and allow them to store images, phenotypes, genotypes, properties, etc. at whatever level of the hierarchy they have data for.

We can't touch Organism, as it's a key table in every Chado instance out there. However, everything else is open to change.

Observational Unit

The GDPDM has observational units, which represent whatever level of sample you have data for. I find that name descriptive, but awkward. Unfortunately, I can't think of a better name. Suggestions are welcome.

Specifics:

Try to combine biotype, stock, individual, and crossexperiment into a single table, tentatively called obs_unit (with a nod to GDPDM).
Investigate also folding specimen into obs_unit.
An observational unit's place in the observational taxonomy will be indicated by a new column in obs_unit that points to the CV table. For butterflies, the possible values might be "species", "biotype", "stock", "individual", and possibly "specimen".
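
The idea of a single obs_unit table whose level is given by a CV term can be sketched as follows. This is only an illustration using an in-memory SQLite database from Python, not the working group's schema: the cvterm table here is a simplified stand-in (the real Chado cvterm table has more columns, such as cv_id and dbxref_id), and the column names obs_unit_id and type_id, the helper functions, and the sample unit names are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified stand-ins for illustration only; column names are assumptions.
conn.executescript("""
    CREATE TABLE cvterm (
        cvterm_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE obs_unit (
        obs_unit_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        type_id     INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
    );
""")

# Levels of the observational taxonomy suggested above for butterflies.
LEVELS = ["species", "biotype", "stock", "individual", "specimen"]
conn.executemany("INSERT INTO cvterm (name) VALUES (?)",
                 [(level,) for level in LEVELS])


def add_unit(name, level):
    """Insert an observational unit at the given level; return its id."""
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?",
                       (level,)).fetchone()
    if row is None:
        raise ValueError(f"unknown level: {level}")
    cur = conn.execute("INSERT INTO obs_unit (name, type_id) VALUES (?, ?)",
                       (name, row[0]))
    return cur.lastrowid


def units_at_level(level):
    """Return the names of all units at one level, in insertion order."""
    rows = conn.execute(
        """SELECT u.name
             FROM obs_unit u
             JOIN cvterm t ON t.cvterm_id = u.type_id
            WHERE t.name = ?
            ORDER BY u.obs_unit_id""",
        (level,))
    return [r[0] for r in rows]


# Made-up sample data.
add_unit("biotype A", "biotype")
add_unit("individual 1", "individual")
add_unit("individual 2", "individual")
print(units_at_level("individual"))
```

Under this sketch, a group working on trees could load a different set of level terms (for example "plot") into the CV without changing any table definitions.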

Observational Unit Relationships

We need to support arbitrary M:M relationships between different levels of the observational taxonomy, and within the same level as well. For example, we may want a complete chain from species to individual (or plot or brood or ...), and that individual may have resulted from crossing 2 other individuals (or from cloning one, or ...).

The common solution is to create a bridge/mapping/intersection table to implement M:M relationships between obs_unit and itself. This table would define the standard "subject relationship object" triple, where the subject and object are obs_units and the relationship is a CV term.

This also deals with complications in lineage and mating types. You can represent T. thermophila, which has 7 mating types (any 2 will do), C. elegans, which has hermaphrodites and outcrossing, E. coli, which is asexual, ...
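
As a rough sketch of the "subject relationship object" triple described above, the following Python example builds a self-referencing bridge table in an in-memory SQLite database. The table name obs_unit_relationship, its column names, the relationship terms ("offspring_of", "clone_of", "member_of"), and the unit names are all hypothetical and chosen only for illustration. The cvterm and obs_unit tables are reduced to the bare minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Minimal, hypothetical tables; names and columns are assumptions.
conn.executescript("""
    CREATE TABLE cvterm (
        cvterm_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE obs_unit (
        obs_unit_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE obs_unit_relationship (
        obs_unit_relationship_id INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES obs_unit (obs_unit_id),
        type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
        object_id  INTEGER NOT NULL REFERENCES obs_unit (obs_unit_id),
        UNIQUE (subject_id, type_id, object_id)
    );
""")


def term(name):
    """Return the id of a CV term, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    return conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?",
                        (name,)).fetchone()[0]


def unit(name):
    """Return the id of an observational unit, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO obs_unit (name) VALUES (?)", (name,))
    return conn.execute("SELECT obs_unit_id FROM obs_unit WHERE name = ?",
                        (name,)).fetchone()[0]


def relate(subject, relationship, obj):
    """Store one subject-relationship-object triple."""
    conn.execute(
        "INSERT INTO obs_unit_relationship (subject_id, type_id, object_id) "
        "VALUES (?, ?, ?)",
        (unit(subject), term(relationship), unit(obj)))


def objects_of(subject, relationship):
    """Return names of all objects related to subject by relationship."""
    rows = conn.execute(
        """SELECT o.name
             FROM obs_unit_relationship r
             JOIN obs_unit s ON s.obs_unit_id = r.subject_id
             JOIN cvterm   t ON t.cvterm_id   = r.type_id
             JOIN obs_unit o ON o.obs_unit_id = r.object_id
            WHERE s.name = ? AND t.name = ?
            ORDER BY o.name""",
        (subject, relationship))
    return [r[0] for r in rows]


# A cross has two parents, a clone has one, and the same table also
# records membership in a higher level of the hierarchy.
relate("offspring_1", "offspring_of", "parent_a")
relate("offspring_1", "offspring_of", "parent_b")
relate("clone_1", "clone_of", "parent_a")
relate("offspring_1", "member_of", "stock_x")

print(objects_of("offspring_1", "offspring_of"))
print(objects_of("clone_1", "clone_of"))
```

Because the number of parents is just the number of rows, nothing in the table itself assumes two sexes or two parents. Any constraints on mating types would have to live in the CV terms or in application logic.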

Project/Experiment/Study Hierarchy

The current Project table is defined in the General Module. The HDB design links to it extensively. However, other modules hardly use it at all.

The GDR group needs the ability to define projects/studies more robustly, and to introduce a hierarchy of substudies/projects as well.

Phenotypes

Phenotypes are not particularly well defined in Chado. Scott says that there are two sets of phenotype tables in Chado. One is a first rough draft that snuck in (and is used by some), and the other is a more robust set, which is used by others (including FlyBase). To make things worse, which tables belong to which set is not currently clearly defined.

Dave C. will do some research into:

1. What is currently going on in Chado?
- Which tables are in the old and new implementations?
- How are those tables currently used, and by whom?
2. What are best practices for representing phenotypes in a generic database like Chado?

If items #1 and #2 don't line up, and there are not a lot of current users, then I would like to look into:

reimplementing phenotypes in Chado, and
providing migration paths for what users we do have.

Assays, Images

HDB includes support for images and assays. We should probably have a general purpose solution that is usable for all images and assays, not just those in the natural diversity module.

Links

The beta natural diversity module is in SourceForge:
- documentation directory
- schema definitions
- Table Documentation
The module is based heavily on the Genomic Diversity and Phenotype Data Model (GDPDM), which comes out of Cornell, and also has nice documentation. The GDPDM is also described in this presentation.

Membership

If you are interested, please add your name below. (Either update this page directly, or send your contact info to Dave Clements.)

Name	Email	Affiliation	Comments
Dave Clements (organizer)	clements@nescent.org	NESCent, GMOD	Please let me know if you are interested in participating in this group, or if you have any questions.
Sook Jung	sook * bioinfo.wsu.edu	Washington State University, GDR
Meg Staton	mestato * yahoo.com	CUGI
Stephen Ficklin	ficklin * clemson.edu	CUGI
Dorrie Main	dorrie * wsu.edu	Washington State University, GDR
Scott Cain		OICR / GMOD
Genevieve DeClerk		Cornell / Gramene

Chado Natural Diversity Module Working Group
