Difference between revisions of "Notes on simplified nd schema and Use Cases"
NaamaMenda (Talk | contribs) (→Notes on the tables) |
NaamaMenda (Talk | contribs) (→tree fruit breeding data) |
||
Line 176: | Line 176: | ||
**Where do we store the definition of 1 through 5? Many phenotypes have values of 1 through 5 and have their own definitions. In phenotype table or in nd_protocolprop? | **Where do we store the definition of 1 through 5? Many phenotypes have values of 1 through 5 and have their own definitions. In phenotype table or in nd_protocolprop? | ||
**Some breeders may record fruit size from 1 thru 3 instead of 1 thru 5, then those could be stored as different protocols (fruit_size_protocol_1, fruit_size_protocol_2, etc), but under same phenotype.attr_id (fruit size). Then the definition of 1 through 5 should be stored somewhere in nd_protocolprop | **Some breeders may record fruit size from 1 thru 3 instead of 1 thru 5, then those could be stored as different protocols (fruit_size_protocol_1, fruit_size_protocol_2, etc), but under same phenotype.attr_id (fruit size). Then the definition of 1 through 5 should be stored somewhere in nd_protocolprop | ||
+ | |||
+ | -[Naama] this should be stored in the cvterm. If 'fruit size' has a 1-5 scale , the scale might mean 1=smallest, 5 = largest, then you store the scale in the definition of the 'fruit size' cvterm, and the numeric value in the phenotype table. | ||
+ | If 1-5 translate to descriptive definition (1=round, 2= long, etc.) then you could store these as synonyms for the cvterm , or cvtermprops. If some breeder uses a different scale, then there should be a different cvterm for that. |
Revision as of 02:32, 13 May 2010
SQL
Schema Drawing
http://gmod.org/w/index.php?title=Talk:Chado_Natural_Diversity_Module_Working_Group#April_2010
Notes on the tables
stock
- Stores all types (groups or individuals) of germplasm, and their relationship is defined through stock_relationship table
- Stock Relationship Ontology under development in the page below
http://gmod.org/wiki/Stock_Relationship_Ontology
nd_assay
- The type_id would store whether the assay is phenotype assay, genotype assay, crossexperiment, field collection study. or any other potential assays.
- In case of phenotype/genotype assay, the assay_id is unique per specific sample of a stock (part or clone of a stock under specific treatment), per collection date (eg. postharvest phenotype assay), per assay date, per type of protocol (eg. a specific molecular marker in case of genotype assay)
nd_assay_phenotype
- linking table between nd_assay and phenotype
- Question: The SQL document says that there is one to one relationship between assay_id and phenotype_id but multiple samples can have the same phenotypic value so it would be many to one relationship. For example, the phenotypic value of firmness (eg. apple) varies from 1 to 5 (1 being very soft and 5 being very firm). Multiple assay_ids, therefore, can be linked to phenotype_id with the attr_id 'firmness' and value 1.
- [Naama] From my understanding every time you measure a phenotype you store it it the phenotype table with the relevant attributes. I don't think you are supposed to 'reuse' these records, even if you get a similar measurement.
nd_assay_genotype
- linking table between nd_assay and genotype
- Question: For heterozygotes, do we store the genotype of an individual or an allele? In another word, when SSR results in 200 in one allele 230 in another for a locus of an individual, do we store 200/230 in genotype table or 200 and 230 separately? If we store the genotype of an allele in the genotype table, one nd_assay_id should be able to linked to multiple genotype_id. So nd_assay to genotype will be many to many in that case.
-[Naama] I would say for individuals you store the heterozygous genotype (200/230) and for an allele entry you store the actual allele genotype. If your individual is in the stock table, I guess you could link it to the allele and retrieve the genotype from there.
nd_assayprop
- Stores all the assay related data using cvterm_id and value
- The sample_id (or clone_id, tree_id, etc) of a stock, when part of a stock was used as a sample or a stock was propagated to produce multiple clones.
- Any treatment or environmental condition applied to the sample (fertilizer, salt conc, temp, etc), which are not part of a specific phenotyping/genotyping protocol
- Any property of a sample (eg. number of fruits picked per sample, fresh vs. stored, etc)
- Dates (collection date/assay date for phenoytping assay, assay date for genotyping assay, cross date for cross experiment and collection start/end date for field collection experiment)
- cross name/ID and cross type (F1, F2, etc) for cross experiment
- Notes and comments for a specific assay
- Question: Experimenter_id could be stored in cvterm_id and value and link to contact table OR do we better have nd_assay_contact table?
-[Naama] experimenter_id should be in a contact table (contact_id) .
nd_assay_project
- Links individual assays/crosses to a bigger project
- Related tables: projectprop and project_relationship
nd_assay_stock
- Linking table between nd_assay and stock
- New stocks can be generated by nd_assay in case of field collection or cross experiment (progeny)
- Or stock can be used for nd_assay in case of phenotype assay, genotype assay, and cross experiment (parent)
- The type_id can record whether the stock is a female parent, male parent, parent for mutation, or progeny in case of cross experiment
- The rootstocks that are used in planting fruit trees can be recorded in nd_assay_stock and the type_id could represent 'root stock'.
nd_assay_stockprop
- Since nd_assay_stock is only a linking table now, not a table to store a specific sample of a stock, the IDs and the treatments done to the sample of a stock can be stored in nd_assayprop
- What could be stored here?
nd_protocol
- stores phenotyping/genotyping protocols
- For genotyping assays, the protocol would be equivalent to molcular markers
nd_protocolprop
- Any property of a protocol
nd_assay_protocol
- linking table between nd_assay and nd_protocol
- Many to one relationship between assay_id and protocol_id
nd_reagent
- A reagent such as a primer, an enzyme, an adapter oligo, a linker oligo used in genotyping protocol or any other protocol
- feature_id links reagent with DNA sequences (eg. primer) to an entry in feature table
nd_reagentprop
- Any property of reagents
nd_reagent_relationship
- relationship between reagents
nd_protocol_reagent
- linking table between nd_protocol and nd_reagent
nd_geolocation
nd_geolocationprop
Use Cases
tree fruit breeding data
A. Cross
Data:
- Cross name/ID, location, female and male parent, progeny, experimenter, project
nd module:
- Cross type (F1, etc) and cross location in nd_assay (geolocation_id)
- Cross name/ID in nd_assayprop (cvterm_id and value)
- parent and progeny in stock table, linked to nd_assay via nd_assay_stock
- The whole progeny could be stored as a group in stock table and linked to nd_assay
- nd_assay_stock.type_id is for cvterms such as 'is a female parent', 'is a progeny', etc
- The relationship (between the parent and progeny) could be recorded again in stock_relationship
- Individual crosses can be linked to a bigger project via nd_assay_project
B. Phenotype Evaluation
- The progeny (planted using a certain rootstock) are tested (1st stage), 50 progeny (distinct genotype) are selected, propagated (15 trees per genotype), planted using a certain rootstock, and tested in multiple orchards (2nd Stage), and more selected and tested (3rd Stage).
- The trees (or the fruits harvested from the tree) are evaluated multiple times for various phenotypic evaluations
- Fruits harvested at the same date can be tested on two (or more?) different dates (to compare fresh vs stored)
Data:
- Tree ID (eg. wsu001_1, gala_1), the identity of the corresponding progeny/control stock (eg. wsu001, gala, etc), rootstock (eg. rootstock_1), location of the trees (eg. orchard A), sub_location (eg. Lot 001), fertilizer treatment (eg. fertilizer A)
- project, stage, collection date, assay date, sample property (fresh vs stored; number of fruits collected per tree), phenotyping protocol, phenotypic value
- An example of phenotype is 'fruit size' and their value can be 1 through 5 (1=very small; 2=small; 3=medium; 4=large; 5= very large)
in nd module:
- The progeny, controls, rootstocks in stock table, linked to nd_assay via nd_assay_stock
- nd_assay_stock.type_id would be cvterms for 'rootstock', or 'stock'?
- Tree ID (wsu001_1, gala_1) in nd_assayprop using clone_id (?) as a cvterm
- collection date, assay date, and any other sample properties in nd_assay also in nd_assayprop using cvterm and value
- The three valules in nd_assayprop (clone_id, collection date, assay date) should be unique per unique assay_id
- Data related to location (orchard), sub_location (lot number) and any other (county, state, etc) in nd_geolocation, and nd_geolocationprop tables
- Project and stage information in project, linked to nd_assay via nd_assay_project
- Relationship between stages and larger projects in project_relationship
- phenotyping protocol, such as fruit_size_protocol(?), in nd_protocol
- phenotype and the value , such as fruit size (attr_id) and 1 (value), in phenotype table
- Questions:
- Where do we store the definition of 1 through 5? Many phenotypes have values of 1 through 5 and have their own definitions. In phenotype table or in nd_protocolprop?
- Some breeders may record fruit size from 1 thru 3 instead of 1 thru 5, then those could be stored as different protocols (fruit_size_protocol_1, fruit_size_protocol_2, etc), but under same phenotype.attr_id (fruit size). Then the definition of 1 through 5 should be stored somewhere in nd_protocolprop
-[Naama] this should be stored in the cvterm. If 'fruit size' has a 1-5 scale , the scale might mean 1=smallest, 5 = largest, then you store the scale in the definition of the 'fruit size' cvterm, and the numeric value in the phenotype table. If 1-5 translate to descriptive definition (1=round, 2= long, etc.) then you could store these as synonyms for the cvterm , or cvtermprops. If some breeder uses a different scale, then there should be a different cvterm for that.