Difference between revisions of "Notes on simplified nd schema and Use Cases"

From GMOD
Jump to: navigation, search
(Notes on the tables)
(tree fruit breeding data)
Line 176: Line 176:
 
**Where do we store the definition of 1 through 5? Many phenotypes have values of 1 through 5 and have their own definitions. In phenotype table or in nd_protocolprop?
 
**Where do we store the definition of 1 through 5? Many phenotypes have values of 1 through 5 and have their own definitions. In phenotype table or in nd_protocolprop?
 
**Some breeders may record fruit size from 1 thru 3 instead of 1 thru 5, then those could be stored as different protocols (fruit_size_protocol_1, fruit_size_protocol_2, etc), but under same phenotype.attr_id (fruit size). Then the definition of 1 through 5 should be stored somewhere in nd_protocolprop
 
**Some breeders may record fruit size from 1 thru 3 instead of 1 thru 5, then those could be stored as different protocols (fruit_size_protocol_1, fruit_size_protocol_2, etc), but under same phenotype.attr_id (fruit size). Then the definition of 1 through 5 should be stored somewhere in nd_protocolprop
 +
 +
-[Naama] this should be stored in the cvterm. If 'fruit size' has a 1-5 scale , the scale might mean 1=smallest, 5 = largest, then you store the scale in the definition of the 'fruit size' cvterm, and the numeric value in the phenotype table.
 +
If 1-5 translate to descriptive definition (1=round, 2= long, etc.) then you could store these as synonyms for the cvterm , or cvtermprops. If some breeder uses a different scale, then there should be a different cvterm for that.

Revision as of 02:32, 13 May 2010

SQL

https://gmod.svn.sourceforge.net/svnroot/gmod/schema/branches/integrate_natdiv/chado/modules/natural_diversity/natural_diversity.sql


Schema Drawing

http://gmod.org/w/index.php?title=Talk:Chado_Natural_Diversity_Module_Working_Group#April_2010

Notes on the tables

stock

  • Stores all types (groups or individuals) of germplasm, and their relationship is defined through stock_relationship table
  • Stock Relationship Ontology under development in the page below

http://gmod.org/wiki/Stock_Relationship_Ontology


nd_assay

  • The type_id would store whether the assay is phenotype assay, genotype assay, crossexperiment, field collection study. or any other potential assays.
  • In case of phenotype/genotype assay, the assay_id is unique per specific sample of a stock (part or clone of a stock under specific treatment), per collection date (eg. postharvest phenotype assay), per assay date, per type of protocol (eg. a specific molecular marker in case of genotype assay)


nd_assay_phenotype

  • linking table between nd_assay and phenotype
  • Question: The SQL document says that there is one to one relationship between assay_id and phenotype_id but multiple samples can have the same phenotypic value so it would be many to one relationship. For example, the phenotypic value of firmness (eg. apple) varies from 1 to 5 (1 being very soft and 5 being very firm). Multiple assay_ids, therefore, can be linked to phenotype_id with the attr_id 'firmness' and value 1.

- [Naama] From my understanding every time you measure a phenotype you store it it the phenotype table with the relevant attributes. I don't think you are supposed to 'reuse' these records, even if you get a similar measurement.


nd_assay_genotype

  • linking table between nd_assay and genotype
  • Question: For heterozygotes, do we store the genotype of an individual or an allele? In another word, when SSR results in 200 in one allele 230 in another for a locus of an individual, do we store 200/230 in genotype table or 200 and 230 separately? If we store the genotype of an allele in the genotype table, one nd_assay_id should be able to linked to multiple genotype_id. So nd_assay to genotype will be many to many in that case.

-[Naama] I would say for individuals you store the heterozygous genotype (200/230) and for an allele entry you store the actual allele genotype. If your individual is in the stock table, I guess you could link it to the allele and retrieve the genotype from there.

nd_assayprop

  • Stores all the assay related data using cvterm_id and value
  • The sample_id (or clone_id, tree_id, etc) of a stock, when part of a stock was used as a sample or a stock was propagated to produce multiple clones.
  • Any treatment or environmental condition applied to the sample (fertilizer, salt conc, temp, etc), which are not part of a specific phenotyping/genotyping protocol
  • Any property of a sample (eg. number of fruits picked per sample, fresh vs. stored, etc)
  • Dates (collection date/assay date for phenoytping assay, assay date for genotyping assay, cross date for cross experiment and collection start/end date for field collection experiment)
  • cross name/ID and cross type (F1, F2, etc) for cross experiment
  • Notes and comments for a specific assay
  • Question: Experimenter_id could be stored in cvterm_id and value and link to contact table OR do we better have nd_assay_contact table?

-[Naama] experimenter_id should be in a contact table (contact_id) .


nd_assay_project

  • Links individual assays/crosses to a bigger project
  • Related tables: projectprop and project_relationship


nd_assay_stock

  • Linking table between nd_assay and stock
  • New stocks can be generated by nd_assay in case of field collection or cross experiment (progeny)
  • Or stock can be used for nd_assay in case of phenotype assay, genotype assay, and cross experiment (parent)
  • The type_id can record whether the stock is a female parent, male parent, parent for mutation, or progeny in case of cross experiment
  • The rootstocks that are used in planting fruit trees can be recorded in nd_assay_stock and the type_id could represent 'root stock'.


nd_assay_stockprop

  • Since nd_assay_stock is only a linking table now, not a table to store a specific sample of a stock, the IDs and the treatments done to the sample of a stock can be stored in nd_assayprop
  • What could be stored here?


nd_protocol

  • stores phenotyping/genotyping protocols
  • For genotyping assays, the protocol would be equivalent to molcular markers


nd_protocolprop

  • Any property of a protocol


nd_assay_protocol

  • linking table between nd_assay and nd_protocol
  • Many to one relationship between assay_id and protocol_id


nd_reagent

  • A reagent such as a primer, an enzyme, an adapter oligo, a linker oligo used in genotyping protocol or any other protocol
  • feature_id links reagent with DNA sequences (eg. primer) to an entry in feature table


nd_reagentprop

  • Any property of reagents


nd_reagent_relationship

  • relationship between reagents


nd_protocol_reagent

  • linking table between nd_protocol and nd_reagent


nd_geolocation


nd_geolocationprop

Use Cases

tree fruit breeding data

A. Cross

Data:

  • Cross name/ID, location, female and male parent, progeny, experimenter, project


nd module:

  • Cross type (F1, etc) and cross location in nd_assay (geolocation_id)
  • Cross name/ID in nd_assayprop (cvterm_id and value)
  • parent and progeny in stock table, linked to nd_assay via nd_assay_stock
  • The whole progeny could be stored as a group in stock table and linked to nd_assay
  • nd_assay_stock.type_id is for cvterms such as 'is a female parent', 'is a progeny', etc
  • The relationship (between the parent and progeny) could be recorded again in stock_relationship
  • Individual crosses can be linked to a bigger project via nd_assay_project


B. Phenotype Evaluation

  • The progeny (planted using a certain rootstock) are tested (1st stage), 50 progeny (distinct genotype) are selected, propagated (15 trees per genotype), planted using a certain rootstock, and tested in multiple orchards (2nd Stage), and more selected and tested (3rd Stage).
  • The trees (or the fruits harvested from the tree) are evaluated multiple times for various phenotypic evaluations
  • Fruits harvested at the same date can be tested on two (or more?) different dates (to compare fresh vs stored)


Data:

  • Tree ID (eg. wsu001_1, gala_1), the identity of the corresponding progeny/control stock (eg. wsu001, gala, etc), rootstock (eg. rootstock_1), location of the trees (eg. orchard A), sub_location (eg. Lot 001), fertilizer treatment (eg. fertilizer A)
  • project, stage, collection date, assay date, sample property (fresh vs stored; number of fruits collected per tree), phenotyping protocol, phenotypic value
  • An example of phenotype is 'fruit size' and their value can be 1 through 5 (1=very small; 2=small; 3=medium; 4=large; 5= very large)


in nd module:

  • The progeny, controls, rootstocks in stock table, linked to nd_assay via nd_assay_stock
  • nd_assay_stock.type_id would be cvterms for 'rootstock', or 'stock'?
  • Tree ID (wsu001_1, gala_1) in nd_assayprop using clone_id (?) as a cvterm
  • collection date, assay date, and any other sample properties in nd_assay also in nd_assayprop using cvterm and value
  • The three valules in nd_assayprop (clone_id, collection date, assay date) should be unique per unique assay_id
  • Data related to location (orchard), sub_location (lot number) and any other (county, state, etc) in nd_geolocation, and nd_geolocationprop tables
  • Project and stage information in project, linked to nd_assay via nd_assay_project
  • Relationship between stages and larger projects in project_relationship
  • phenotyping protocol, such as fruit_size_protocol(?), in nd_protocol
  • phenotype and the value , such as fruit size (attr_id) and 1 (value), in phenotype table
  • Questions:
    • Where do we store the definition of 1 through 5? Many phenotypes have values of 1 through 5 and have their own definitions. In phenotype table or in nd_protocolprop?
    • Some breeders may record fruit size from 1 thru 3 instead of 1 thru 5, then those could be stored as different protocols (fruit_size_protocol_1, fruit_size_protocol_2, etc), but under same phenotype.attr_id (fruit size). Then the definition of 1 through 5 should be stored somewhere in nd_protocolprop

-[Naama] this should be stored in the cvterm. If 'fruit size' has a 1-5 scale , the scale might mean 1=smallest, 5 = largest, then you store the scale in the definition of the 'fruit size' cvterm, and the numeric value in the phenotype table. If 1-5 translate to descriptive definition (1=round, 2= long, etc.) then you could store these as synonyms for the cvterm , or cvtermprops. If some breeder uses a different scale, then there should be a different cvterm for that.