The Chado Natural Diversity Module is an extension to the Chado schema to better support natural diversity data.
Eventually this page will resemble the other Chado Module pages, with an overview followed by a detailed explanation of the tables, columns, and relationships in the module. However, while the module is under development, this page will have an alternative structure.
Recently, a lot of work was done on the ND module at the GMOD Tools for Evolutionary Biology hackathon; see https://www.nescent.org/wg_gmodevohackathon/Natural_Diversity_and_Phenotypes_Subgroup.
The Natural Diversity module allows storing data from multiple experiments on the same accessions/lines/strains, collected, treated, and evaluated in various locations, environments, and times. Each accession/line/strain can be scored for a large number of phenotypic traits and genotyped with an array of genetic markers. In addition to storing data from experiments performed on existing accessions/lines/strains, the tables in this module and other interacting modules allow storing data from experiments that generate new accessions/lines/strains and experimental samples, such as field collections, crosses, and treatments.
The original stock module was designed to store information about stock collections in a laboratory. The concept of the ‘stock’ table has since been expanded to accommodate entities that a ‘stock’ belongs to and entities that are derived from a ‘stock’. The ‘stock’ table can therefore store a hierarchy of entities (population, strain/line/accession, individual, clone, and sample), with relationships between ‘stocks’ defined in the ‘stock_relationship’ table. For example, a plant accession can belong to a population (e.g. the progeny of a particular cross) and can have multiple ‘children’ when used as a parent in a cross. A plant accession can also have a ‘clone’ when it is clonally propagated. The ‘stock’ table also stores the ‘samples’ or ‘observational units’ of individual experiments. For example, when a plant is treated with different amounts of a chemical before various phenotyping experiments are performed, each sample treated with a specific amount is stored in a distinct row of the ‘stock’ table. Since a population can be defined as a group of any entities, a population can be composed of multiple species (e.g. a group of insects collected in a field). To accommodate this, the ‘NOT NULL’ constraint on organism_id has been dropped.
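The stock hierarchy described above can be sketched with a small in-memory database. This is a minimal illustration only: it uses Python's `sqlite3` rather than PostgreSQL, keeps just a few columns of `stock` and `stock_relationship`, and replaces the `type_id` foreign keys to cvterm with plain text. The stock names and the relationship terms (`member_of`, `clone_of`, `sample_of`) are hypothetical, not terms defined by Chado.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (
    stock_id    INTEGER PRIMARY KEY,
    uniquename  TEXT NOT NULL,
    type        TEXT NOT NULL,   -- stands in for type_id -> cvterm
    organism_id INTEGER          -- nullable: a population may span several species
);
CREATE TABLE stock_relationship (
    subject_id INTEGER NOT NULL REFERENCES stock(stock_id),
    object_id  INTEGER NOT NULL REFERENCES stock(stock_id),
    type       TEXT NOT NULL     -- stands in for type_id -> cvterm
);
""")

conn.executemany(
    "INSERT INTO stock (stock_id, uniquename, type, organism_id) VALUES (?, ?, ?, ?)",
    [
        (1, "field_insects", "population", None),       # multi-species group, no organism
        (2, "cross_A_x_B_progeny", "population", 10),
        (3, "accession_001", "accession", 10),
        (4, "accession_001_clone_1", "clone", 10),
        (5, "accession_001_treated", "sample", 10),
    ],
)
conn.executemany(
    "INSERT INTO stock_relationship (subject_id, object_id, type) VALUES (?, ?, ?)",
    [
        (3, 2, "member_of"),   # the accession belongs to a population
        (4, 3, "clone_of"),    # a clone derived from the accession
        (5, 3, "sample_of"),   # a treated sample derived from the accession
    ],
)

def related_stocks(conn, object_name):
    """Return (uniquename, relationship type) for every stock pointing at object_name."""
    return conn.execute(
        """
        SELECT s.uniquename, r.type
        FROM stock_relationship r
        JOIN stock s ON s.stock_id = r.subject_id
        JOIN stock o ON o.stock_id = r.object_id
        WHERE o.uniquename = ?
        ORDER BY s.stock_id
        """,
        (object_name,),
    ).fetchall()

derived = related_stocks(conn, "accession_001")
members = related_stocks(conn, "cross_A_x_B_progeny")
```

Here the derived entity is always the subject and the entity it belongs to is the object, which matches the direction used for populations in the use case further down this page.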
nd_experiment is the core table of the Natural Diversity module. Each row represents an individual assay (NB: this is usually *not* an entire experiment). The experiment type (nd_experiment.type_id) is a cvterm that determines which records are expected in other tables. The types of experiments that are performed on, or give rise to, ‘stocks’ include, but are not restricted to, cross, field collection, sample treatment, phenotyping, and genotyping experiments. Any cvterm may be used, but the module was designed with terms such as [phenotype_assay, genotype_assay, field_collection, cross_experiment, sample_treatment] in mind.
For experiments of type ‘phenotype_assay’ or ‘genotype_assay’, each nd_experiment generally gives rise to a single genotype or phenotype. Although there is no longer a one-to-one restriction, phenotypes and genotypes are expected to be the result of a single assay. One exception to the one-to-one relationship is the genotype of a heterozygote, where the genotype of each allele is stored in a distinct row of the genotype table. Conversely, a single phenotype or genotype record can be linked to multiple experiments, since multiple stock entries can produce the same phenotype and/or genotype.
Each nd_experiment of type ‘phenotype_assay’ or ‘genotype_assay’ should be linked to a single stock entry via the nd_experiment_stock table. When multiple samples from the same line/accession/strain are used for multiple phenotype assays, a separate nd_experiment record (with its own nd_experiment_id) should be created for each unique combination of sample and phenotype assay. The sample treatments performed to generate a sample from a line/accession/strain can be stored in a separate nd_experiment row of type ‘sample_treatment’. Collections of assays that relate to each other can be linked via the same record in the ‘project’ table and/or the same record in the ‘stock’ table.
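The "one nd_experiment per sample and assay" rule can be illustrated as follows. This is a sketch under simplifying assumptions: `sqlite3` stands in for PostgreSQL, the tables are reduced to a few columns (the NOT NULL `nd_geolocation_id` of the real table is omitted), `type` is plain text instead of a cvterm reference, and the stock ids, assay names, and the `assayed_sample` term are made up.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nd_experiment (
    nd_experiment_id INTEGER PRIMARY KEY,
    type             TEXT NOT NULL   -- stands in for type_id -> cvterm
);
CREATE TABLE nd_experiment_stock (
    nd_experiment_stock_id INTEGER PRIMARY KEY,
    nd_experiment_id       INTEGER NOT NULL REFERENCES nd_experiment(nd_experiment_id),
    stock_id               INTEGER NOT NULL,
    type                   TEXT NOT NULL
);
""")

sample_stock_ids = [101, 102]                              # hypothetical sample stocks
assays = ["fruit_weight", "leaf_length", "plant_height"]   # hypothetical assays

# One nd_experiment row, linked to exactly one stock, per (sample, assay) pair.
experiment_for = {}
for stock_id, assay in product(sample_stock_ids, assays):
    exp_id = conn.execute(
        "INSERT INTO nd_experiment (type) VALUES ('phenotype_assay')"
    ).lastrowid
    conn.execute(
        "INSERT INTO nd_experiment_stock (nd_experiment_id, stock_id, type) "
        "VALUES (?, ?, 'assayed_sample')",
        (exp_id, stock_id),
    )
    experiment_for[(stock_id, assay)] = exp_id

# Largest number of stocks attached to any single experiment.
max_stocks = conn.execute(
    "SELECT MAX(n) FROM (SELECT COUNT(*) AS n FROM nd_experiment_stock "
    "GROUP BY nd_experiment_id)"
).fetchone()[0]
```

Two samples and three assays give six experiment records, each linked to a single stock. In a full schema the assay itself would be recorded through the linked phenotype rather than in a Python dictionary.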
For experiments of type ‘field_collection’ and ‘cross_experiment’, the stock entries that are used in or generated by the experiment can be linked via the nd_experiment_stock table. In a cross experiment, the parental stocks and the progeny are stored in the ‘stock’ table, and their roles in the cross can be recorded using an appropriate cvterm for nd_experiment_stock.type_id. The progeny of a cross and the stocks collected in a field collection can be stored in the ‘stock’ table as a population or as individual lines/accessions/strains/samples, and then linked to the nd_experiment accordingly.
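A cross experiment with role-typed stock links might look like the sketch below. It again assumes `sqlite3` and heavily reduced tables, with text in place of cvterm references. The role names (`female_parent`, `male_parent`, `progeny`), the stock names, and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (
    stock_id   INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type       TEXT NOT NULL
);
CREATE TABLE nd_experiment (
    nd_experiment_id INTEGER PRIMARY KEY,
    type             TEXT NOT NULL
);
CREATE TABLE nd_experiment_stock (
    nd_experiment_stock_id INTEGER PRIMARY KEY,
    nd_experiment_id       INTEGER NOT NULL REFERENCES nd_experiment(nd_experiment_id),
    stock_id               INTEGER NOT NULL REFERENCES stock(stock_id),
    type                   TEXT NOT NULL   -- role of the stock in the experiment
);
""")

def record_cross(conn, female, male, progeny_population):
    """Store one cross_experiment and link each stock to it with its role."""
    exp_id = conn.execute(
        "INSERT INTO nd_experiment (type) VALUES ('cross_experiment')"
    ).lastrowid
    participants = [
        (female, "accession", "female_parent"),
        (male, "accession", "male_parent"),
        (progeny_population, "population", "progeny"),
    ]
    for name, stock_type, role in participants:
        stock_id = conn.execute(
            "INSERT INTO stock (uniquename, type) VALUES (?, ?)", (name, stock_type)
        ).lastrowid
        conn.execute(
            "INSERT INTO nd_experiment_stock (nd_experiment_id, stock_id, type) "
            "VALUES (?, ?, ?)",
            (exp_id, stock_id, role),
        )
    return exp_id

def stocks_by_role(conn, exp_id):
    """Map each role in the experiment to the uniquename of the stock playing it."""
    rows = conn.execute(
        "SELECT es.type, s.uniquename FROM nd_experiment_stock es "
        "JOIN stock s ON s.stock_id = es.stock_id WHERE es.nd_experiment_id = ?",
        (exp_id,),
    ).fetchall()
    return dict(rows)

cross_id = record_cross(conn, "line_A", "line_B", "A_x_B_F1")
roles = stocks_by_role(conn, cross_id)
```

The progeny is stored here as a single population stock; storing each progeny individual as its own stock row would work the same way, with one nd_experiment_stock link per individual.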
The detailed procedure of an experiment can be described via one or more protocols (nd_protocol). Reagents used in a protocol can be stored in the nd_reagent table. The same protocol can be linked to multiple experiments, and the same reagent to multiple protocols. Metadata of experiments, such as date and experimenters, can be stored as properties (nd_experimentprop).
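Storing experiment metadata as properties can be sketched as below. The uniqueness of (nd_experiment_id, type_id, rank) follows the nd_experimentprop table definition on this page; everything else is a simplification: `sqlite3` instead of PostgreSQL, a text `type` column instead of a cvterm reference, and a hypothetical `add_prop` helper with made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE nd_experimentprop (
    nd_experimentprop_id INTEGER PRIMARY KEY,
    nd_experiment_id     INTEGER NOT NULL,
    type                 TEXT NOT NULL,      -- stands in for type_id -> cvterm
    value                TEXT NOT NULL,
    rank                 INTEGER NOT NULL,
    UNIQUE (nd_experiment_id, type, rank)
)
""")

def add_prop(conn, nd_experiment_id, prop_type, value):
    """Store a property value under the next free rank and return that rank."""
    (rank,) = conn.execute(
        "SELECT COALESCE(MAX(rank) + 1, 0) FROM nd_experimentprop "
        "WHERE nd_experiment_id = ? AND type = ?",
        (nd_experiment_id, prop_type),
    ).fetchone()
    conn.execute(
        "INSERT INTO nd_experimentprop (nd_experiment_id, type, value, rank) "
        "VALUES (?, ?, ?, ?)",
        (nd_experiment_id, prop_type, value, rank),
    )
    return rank

date_rank = add_prop(conn, 1, "date", "2010-07-15")
first_rank = add_prop(conn, 1, "experimenter", "person_A")
second_rank = add_prop(conn, 1, "experimenter", "person_B")

# All experimenters of experiment 1, in rank order.
experimenters = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM nd_experimentprop "
        "WHERE nd_experiment_id = 1 AND type = 'experimenter' ORDER BY rank"
    )
]
```

The rank column is what allows a property such as the experimenter to hold several values for the same experiment; inserting a second value with an already used rank violates the unique constraint.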
There are, as yet, no standard flat file formats or loading scripts to load data into this module. Custom scripts will need to be written to insert your data in the database.
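As a rough idea of what such a custom script involves, the sketch below loads a small comma-separated file of phenotype scores. Everything about it is an assumption made for illustration: the file layout (`plot,location,trait,value`), the `load_phenotypes` function, the `assayed_stock` term, the use of `sqlite3` in place of PostgreSQL, and the reduced table definitions with text columns where Chado uses cvterm references.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE NOT NULL);
CREATE TABLE nd_geolocation (nd_geolocation_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE nd_experiment (
    nd_experiment_id  INTEGER PRIMARY KEY,
    nd_geolocation_id INTEGER NOT NULL REFERENCES nd_geolocation(nd_geolocation_id),
    type              TEXT NOT NULL
);
CREATE TABLE nd_experiment_stock (
    nd_experiment_stock_id INTEGER PRIMARY KEY,
    nd_experiment_id       INTEGER NOT NULL REFERENCES nd_experiment(nd_experiment_id),
    stock_id               INTEGER NOT NULL REFERENCES stock(stock_id),
    type                   TEXT NOT NULL
);
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    observable   TEXT NOT NULL,
    value        TEXT
);
CREATE TABLE nd_experiment_phenotype (
    nd_experiment_phenotype_id INTEGER PRIMARY KEY,
    nd_experiment_id           INTEGER NOT NULL REFERENCES nd_experiment(nd_experiment_id),
    phenotype_id               INTEGER NOT NULL REFERENCES phenotype(phenotype_id),
    UNIQUE (nd_experiment_id, phenotype_id)
);
"""

def load_phenotypes(conn, flat_file):
    """Load rows of plot, location, trait, value; one nd_experiment per row."""
    loaded = 0
    for row in csv.DictReader(flat_file):
        # Reuse the stock and geolocation rows if they already exist.
        conn.execute("INSERT OR IGNORE INTO stock (uniquename) VALUES (?)", (row["plot"],))
        (stock_id,) = conn.execute(
            "SELECT stock_id FROM stock WHERE uniquename = ?", (row["plot"],)
        ).fetchone()
        found = conn.execute(
            "SELECT nd_geolocation_id FROM nd_geolocation WHERE description = ?",
            (row["location"],),
        ).fetchone()
        if found is None:
            geo_id = conn.execute(
                "INSERT INTO nd_geolocation (description) VALUES (?)", (row["location"],)
            ).lastrowid
        else:
            geo_id = found[0]
        exp_id = conn.execute(
            "INSERT INTO nd_experiment (nd_geolocation_id, type) "
            "VALUES (?, 'phenotype_assay')",
            (geo_id,),
        ).lastrowid
        conn.execute(
            "INSERT INTO nd_experiment_stock (nd_experiment_id, stock_id, type) "
            "VALUES (?, ?, 'assayed_stock')",
            (exp_id, stock_id),
        )
        phenotype_id = conn.execute(
            "INSERT INTO phenotype (observable, value) VALUES (?, ?)",
            (row["trait"], row["value"]),
        ).lastrowid
        conn.execute(
            "INSERT INTO nd_experiment_phenotype (nd_experiment_id, phenotype_id) "
            "VALUES (?, ?)",
            (exp_id, phenotype_id),
        )
        loaded += 1
    conn.commit()
    return loaded

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
example = io.StringIO(
    "plot,location,trait,value\n"
    "plot_1,Field A,fruit_weight,52.1\n"
    "plot_1,Field A,fruit_color,red\n"
    "plot_2,Field A,fruit_weight,47.8\n"
)
n_loaded = load_phenotypes(conn, example)
```

A real loader would additionally resolve cvterm ids for the experiment, stock, and phenotype types, and would normally run inside a single transaction against the PostgreSQL Chado database.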
Use case: Sol Genomics Network (SGN)
A collection of 479 tomato accessions was used in several locations and years for assaying traits of breeding significance. The collection is an unstructured population of tomato varieties.
**Chado Stock Module** The stock table is used for storing any plant accession, collection, germplasm, or plants in a field plot. Each plant accession, and its derived germplasm, is stored in the stock table, with relationships between accessions defined in the stock_relationship table. For example, a plant accession can belong to a ‘parent’ population and have multiple ‘children’, such as individual plants in a field plot.
In this data set, the stock table holds a tomato cultivars population and all the related accessions. The relationship between the accession stocks and the population stock is defined in stock_relationship (the population is the object and all accessions are subjects). A similar relationship is defined between each accession and the plants in the experiment’s field plots. For example, accession Heinz 1706 belongs to the cultivars population and has several field plots.
**Chado Natural Diversity Module** Each phenotyping event is stored in the nd_experiment table. An experiment, in the Natural Diversity module, is usually defined as a single genotyping event, or a phenotyping event performed on one or more plants, at the same time and location, for one or multiple traits. The experiment is then linked to each participating stock through the nd_experiment_stock table. In this example, the phenotyping was performed on plants in field plots, so nd_experiment_stock links the nd_experiment_id with the stock_id of the field plot. This allows storing multiple experiments on the same accessions, evaluated in different locations, environments, and years. Each plant can be scored for a large number of phenotypic traits and genotyped with an array of genetic markers (see the SGN genotyping experiment below).
Each phenotyping and genotyping experiment is also tied to a geolocation (nd_geolocation table). Metadata of experiments, such as date, environmental conditions, and person recording the data, are stored as properties (nd_experimentprop).
All the scored phenotype values and attributes are stored in the phenotype table, and genotypes in the genotype table. The two linking tables, nd_experiment_phenotype and nd_experiment_genotype, provide a one-to-one or one-to-many relationship between phenotypes/genotypes and experiments.
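The path from an accession to its scored phenotypes can be sketched as a single query. This is an illustration with strong simplifications: `sqlite3` instead of PostgreSQL, reduced tables with text columns instead of cvterm references, and invented plot names, locations, and values. It also assumes that the field plot is the subject and the accession the object of the stock relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (stock_id INTEGER PRIMARY KEY, uniquename TEXT, type TEXT);
CREATE TABLE stock_relationship (subject_id INTEGER, object_id INTEGER, type TEXT);
CREATE TABLE nd_geolocation (nd_geolocation_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE nd_experiment (
    nd_experiment_id INTEGER PRIMARY KEY,
    nd_geolocation_id INTEGER NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE nd_experiment_stock (
    nd_experiment_id INTEGER NOT NULL,
    stock_id INTEGER NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, observable TEXT, value TEXT);
CREATE TABLE nd_experiment_phenotype (
    nd_experiment_id INTEGER NOT NULL,
    phenotype_id INTEGER NOT NULL,
    UNIQUE (nd_experiment_id, phenotype_id)
);

-- Made-up example rows: one accession with two field plots in two locations.
INSERT INTO stock VALUES (1, 'Heinz 1706', 'accession');
INSERT INTO stock VALUES (2, 'plot_17', 'plot');
INSERT INTO stock VALUES (3, 'plot_42', 'plot');
INSERT INTO stock_relationship VALUES (2, 1, 'plot_of');
INSERT INTO stock_relationship VALUES (3, 1, 'plot_of');
INSERT INTO nd_geolocation VALUES (1, 'Location A');
INSERT INTO nd_geolocation VALUES (2, 'Location B');
INSERT INTO nd_experiment VALUES (10, 1, 'phenotype_assay');
INSERT INTO nd_experiment VALUES (11, 2, 'phenotype_assay');
INSERT INTO nd_experiment_stock VALUES (10, 2, 'assayed_stock');
INSERT INTO nd_experiment_stock VALUES (11, 3, 'assayed_stock');
INSERT INTO phenotype VALUES (100, 'fruit_weight', '52.1');
INSERT INTO phenotype VALUES (101, 'fruit_weight', '47.8');
INSERT INTO nd_experiment_phenotype VALUES (10, 100);
INSERT INTO nd_experiment_phenotype VALUES (11, 101);
""")

def phenotypes_for_accession(conn, accession):
    """Return (plot, location, trait, value) for every assay on the accession's plots."""
    return conn.execute(
        """
        SELECT plot.uniquename, g.description, p.observable, p.value
        FROM stock acc
        JOIN stock_relationship r ON r.object_id = acc.stock_id
        JOIN stock plot ON plot.stock_id = r.subject_id
        JOIN nd_experiment_stock es ON es.stock_id = plot.stock_id
        JOIN nd_experiment e ON e.nd_experiment_id = es.nd_experiment_id
        JOIN nd_geolocation g ON g.nd_geolocation_id = e.nd_geolocation_id
        JOIN nd_experiment_phenotype ep ON ep.nd_experiment_id = e.nd_experiment_id
        JOIN phenotype p ON p.phenotype_id = ep.phenotype_id
        WHERE acc.uniquename = ?
        ORDER BY e.nd_experiment_id
        """,
        (accession,),
    ).fetchall()

scores = phenotypes_for_accession(conn, "Heinz 1706")
```

The same join pattern applies to genotypes by swapping nd_experiment_phenotype and phenotype for nd_experiment_genotype and genotype.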
Each genotyping experiment has a unique nd_experiment_id and is linked to the stock via nd_experiment_stock and to the genotype via the nd_experiment_genotype table. In this use case SNP and SSR markers were assayed, yielding a genotype that is either a SNP or a number of repeats.
A simplified schema diagram by N. Menda and R. Buels
This will be populated using the process outlined in Chado Tables to Wiki.
This is the core table for the natural diversity module, representing each individual assay that is undertaken (NB: this is usually *not* an entire experiment). Each nd_experiment should give rise to a single genotype or phenotype and be described via one (or more) protocols. Collections of assays that relate to each other should be linked to the same record in the project table.
The experiment type (nd_experiment.type_id) is a cvterm that defines which records are expected in other tables. Any CV may be used, but the table was designed with terms such as [phenotype_assay, genotype_assay, field_collection, cross_experiment] in mind.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_id | serial | PRIMARY KEY |
| nd_geolocation | nd_geolocation_id | integer | NOT NULL |
| cvterm | type_id | integer | NOT NULL |
public.nd_experiment Structure
Tables referencing this one via Foreign Key Constraints:
Primary contact / submitter of these nd_experiments (NB: where assays are not submitted separately, this may be better stored in project_contact).
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_contact_id | serial | PRIMARY KEY |
| nd_experiment | nd_experiment_id | integer | NOT NULL |
| contact | contact_id | integer | NOT NULL |
public.nd_experiment_contact Structure
Cross-reference experiment to accessions, images, etc
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_dbxref_id | serial | PRIMARY KEY |
| nd_experiment | nd_experiment_id | integer | NOT NULL |
| dbxref | dbxref_id | integer | NOT NULL |
public.nd_experiment_dbxref Structure
Linking table: experiments to the genotypes they produce. Though there is no longer a one-to-one restriction, it is expected that genotypes will be the result of a single assay. The restriction was lifted to allow items such as: a single chromosome staining giving values for inversions: 2La/+, 2Rbc - (whilst these are technically a single genotype users may wish to store these separately such that all 2La/+ individuals can be easily ascertained).
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_genotype_id | serial | PRIMARY KEY |
| nd_experiment | nd_experiment_id | integer | UNIQUE#1 NOT NULL |
| genotype | genotype_id | integer | UNIQUE#1 NOT NULL |
public.nd_experiment_genotype Structure
Linking table: experiments to the phenotypes they produce. In most cases this will either be a single record, or an alternative (quantitative / qualitative?) description of the same phenotype (e.g. “wing length: 12mm” / “wing length: increased”). In rare cases it may suit the user to link a single qualitative phenotype to multiple experiments.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_phenotype_id | serial | PRIMARY KEY |
| nd_experiment | nd_experiment_id | integer | UNIQUE#1 NOT NULL |
| phenotype | phenotype_id | integer | UNIQUE#1 NOT NULL |
public.nd_experiment_phenotype Structure
Used to group together related nd_experiment records. All nd_experiments should be linked to at least one project.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_project_id | serial | PRIMARY KEY |
| project | project_id | integer | NOT NULL |
| nd_experiment | nd_experiment_id | integer | NOT NULL |
public.nd_experiment_project Structure
Linking table: experiments to the protocols they involve.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_protocol_id | serial | PRIMARY KEY |
| nd_experiment | nd_experiment_id | integer | NOT NULL |
| nd_protocol | nd_protocol_id | integer | NOT NULL |
public.nd_experiment_protocol Structure
Linking nd_experiment(s) to publication(s)
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_pub_id | serial | PRIMARY KEY |
| nd_experiment | nd_experiment_id | integer | UNIQUE#1 NOT NULL |
| pub | pub_id | integer | UNIQUE#1 NOT NULL |
public.nd_experiment_pub Structure
Part of a stock or a clone of a stock that is used in an experiment
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_stock_id | serial | PRIMARY KEY |
| | nd_experiment_id | integer | NOT NULL |
| | stock_id | integer | NOT NULL. Stock used in the extraction or the corresponding stock for the clone. |
| | type_id | integer | NOT NULL |
public.nd_experiment_stock Structure
Tables referencing this one via Foreign Key Constraints:
Cross-reference experiment_stock to accessions, images, etc
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_stock_dbxref_id | serial | PRIMARY KEY |
| nd_experiment_stock | nd_experiment_stock_id | integer | NOT NULL |
| dbxref | dbxref_id | integer | NOT NULL |
public.nd_experiment_stock_dbxref Structure
Property/value associations for experiment_stocks. This table can store properties such as treatment.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experiment_stockprop_id | serial | PRIMARY KEY |
| | nd_experiment_stock_id | integer | UNIQUE#1 NOT NULL. The experiment_stock to which the property applies. |
| | type_id | integer | UNIQUE#1 NOT NULL. The name of the property as a reference to a controlled vocabulary term. |
| | value | character varying(255) | The value of the property. |
| | rank | integer | UNIQUE#1 NOT NULL. The rank of the property value, if the property has an array of values. |
public.nd_experiment_stockprop Structure
Tag-value properties - follows standard chado model.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_experimentprop_id | serial | PRIMARY KEY |
| nd_experiment | nd_experiment_id | integer | UNIQUE#1 NOT NULL |
| cvterm | type_id | integer | UNIQUE#1 NOT NULL |
| | value | character varying(255) | NOT NULL |
| | rank | integer | UNIQUE#1 NOT NULL |
public.nd_experimentprop Structure
The geo-referencable location of the stock. NOTE: This entity is subject to change as a more general and possibly more OpenGIS-compliant geolocation module may be introduced into Chado.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_geolocation_id | serial | PRIMARY KEY |
| | description | character varying(255) | A textual representation of the location, if this is the original georeference. Optional if the original georeference is available in lat/long coordinates. |
| | latitude | real | The decimal latitude coordinate of the georeference, using positive and negative sign to indicate N and S, respectively. |
| | longitude | real | The decimal longitude coordinate of the georeference, using positive and negative sign to indicate E and W, respectively. |
| | geodetic_datum | character varying(32) | The geodetic system on which the geo-reference coordinates are based. For geo-references measured between 1984 and 2010, this will typically be WGS84. |
| | altitude | real | The altitude (elevation) of the location in meters. If the altitude is only known as a range, this is the average, and altitude_dev will hold half of the width of the range. |
public.nd_geolocation Structure
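The altitude convention in the table above (average of the range, with altitude_dev holding half of its width) amounts to the following arithmetic. The function name and its arguments are hypothetical, and altitude_dev is only mentioned in the column description, not listed as a column on this page.

```python
def altitude_from_range(low_m, high_m):
    """Convert an altitude range in meters to (altitude, altitude_dev).

    altitude is the midpoint of the range and altitude_dev is half of its
    width, following the description of the altitude column.
    """
    if high_m < low_m:
        raise ValueError("high_m must not be smaller than low_m")
    altitude = (low_m + high_m) / 2.0
    altitude_dev = (high_m - low_m) / 2.0
    return altitude, altitude_dev

# A location known only to lie between 1200 m and 1400 m.
altitude, altitude_dev = altitude_from_range(1200, 1400)
```

For this range the stored altitude would be 1300 m with a deviation of 100 m; an exactly known altitude corresponds to a deviation of 0.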
Tables referencing this one via Foreign Key Constraints:
Property/value associations for geolocations. This table can store properties such as location and environment.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_geolocationprop_id | serial | PRIMARY KEY |
| | nd_geolocation_id | integer | UNIQUE#1 NOT NULL |
| | type_id | integer | UNIQUE#1 NOT NULL. The name of the property as a reference to a controlled vocabulary term. |
| | value | character varying(250) | The value of the property. |
| | rank | integer | UNIQUE#1 NOT NULL. The rank of the property value, if the property has an array of values. |
public.nd_geolocationprop Structure
A protocol can be anything that is done as part of the experiment.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_protocol_id | serial | PRIMARY KEY |
| | name | character varying(255) | UNIQUE NOT NULL. The protocol name. |
public.nd_protocol Structure
Tables referencing this one via Foreign Key Constraints:
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_protocol_reagent_id | serial | PRIMARY KEY |
| nd_protocol | nd_protocol_id | integer | NOT NULL |
| nd_reagent | reagent_id | integer | NOT NULL |
| cvterm | type_id | integer | NOT NULL |
public.nd_protocol_reagent Structure
Property/value associations for protocol.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_protocolprop_id | serial | PRIMARY KEY |
| | nd_protocol_id | integer | UNIQUE#1 NOT NULL. The protocol to which the property applies. |
| | type_id | integer | UNIQUE#1 NOT NULL. The name of the property as a reference to a controlled vocabulary term. |
| | value | character varying(255) | The value of the property. |
| | rank | integer | UNIQUE#1 NOT NULL. The rank of the property value, if the property has an array of values. |
public.nd_protocolprop Structure
A reagent such as a primer, an enzyme, an adapter oligo, a linker oligo. Reagents are used in genotyping experiments, or in any other kind of experiment.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_reagent_id | serial | PRIMARY KEY |
| | name | character varying(80) | NOT NULL. The name of the reagent. The name should be unique for a given type. |
| | type_id | integer | NOT NULL. The type of the reagent, for example linker oligomer, or forward primer. |
| | feature_id | integer | If the reagent is a primer, the feature that it corresponds to. More generally, the corresponding feature for any reagent that has a sequence that maps to another sequence. |
public.nd_reagent Structure
Tables referencing this one via Foreign Key Constraints:
Relationships between reagents. Some reagents form a group. i.e., they are used all together or not at all. Examples are adapter/linker/enzyme experiment reagents.
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_reagent_relationship_id | serial | PRIMARY KEY |
| | subject_reagent_id | integer | NOT NULL. The subject reagent in the relationship. In parent/child terminology, the subject is the child. For example, in "linkerA 3prime-overhang-linker enzymeA" linkerA is the subject, 3prime-overhang-linker is the type, and enzymeA is the object. |
| | object_reagent_id | integer | NOT NULL. The object reagent in the relationship. In parent/child terminology, the object is the parent. For example, in "linkerA 3prime-overhang-linker enzymeA" linkerA is the subject, 3prime-overhang-linker is the type, and enzymeA is the object. |
| | type_id | integer | NOT NULL. The type (or predicate) of the relationship. For example, in "linkerA 3prime-overhang-linker enzymeA" linkerA is the subject, 3prime-overhang-linker is the type, and enzymeA is the object. |
public.nd_reagent_relationship Structure
| FK | Name | Type | Description |
|---|---|---|---|
| | nd_reagentprop_id | serial | PRIMARY KEY |
| nd_reagent | nd_reagent_id | integer | UNIQUE#1 NOT NULL |
| cvterm | type_id | integer | UNIQUE#1 NOT NULL |
| | value | character varying(255) | |
| | rank | integer | UNIQUE#1 NOT NULL |
public.nd_reagentprop Structure