Notes on simplified nd schema and Use Cases

Revision as of 00:09, 13 May 2010 by Sook (Talk | contribs)




Notes on the tables


  • Stores all types of germplasm (groups or individuals); the relationships between them are defined through the stock_relationship table
  • Stock Relationship Ontology under development in the page below


  • The type_id would store whether the assay is a phenotype assay, genotype assay, cross experiment, field collection study, or any other potential assay.
  • In the case of a phenotype/genotype assay, the assay_id is unique per specific sample of a stock (part or clone of a stock under a specific treatment), per collection date (eg. postharvest phenotype assay), per assay date, and per type of protocol (eg. a specific molecular marker in the case of a genotype assay)


  • linking table between nd_assay and phenotype
  • Question: The SQL document says that there is a one-to-one relationship between assay_id and phenotype_id, but multiple samples can have the same phenotypic value, so it would be a many-to-one relationship. For example, the phenotypic value of firmness (eg. apple) varies from 1 to 5 (1 being very soft and 5 being very firm). Multiple assay_ids, therefore, can be linked to the single phenotype_id with the attr_id 'firmness' and value 1.
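
The many-to-one reading raised in the question above can be sketched with a toy in-memory database. The table layouts below are simplified assumptions for illustration (for instance, a plain text `attr` column stands in for attr_id), not the actual nd module DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    attr  TEXT NOT NULL,   -- stands in for attr_id (a cvterm) in this sketch
    value TEXT NOT NULL,
    UNIQUE (attr, value)   -- one row per distinct (attribute, value) pair
);
CREATE TABLE nd_assay (nd_assay_id INTEGER PRIMARY KEY);
CREATE TABLE nd_assay_phenotype (
    -- UNIQUE on nd_assay_id: each assay points at one phenotype,
    -- but nothing stops many assays from sharing that phenotype.
    nd_assay_id  INTEGER NOT NULL UNIQUE REFERENCES nd_assay (nd_assay_id),
    phenotype_id INTEGER NOT NULL REFERENCES phenotype (phenotype_id)
);
""")

# One phenotype row: firmness = 1 (very soft).
conn.execute(
    "INSERT INTO phenotype (phenotype_id, attr, value) VALUES (1, 'firmness', '1')"
)

# Three hypothetical assays that all scored firmness 1.
for assay_id in (101, 102, 103):
    conn.execute("INSERT INTO nd_assay (nd_assay_id) VALUES (?)", (assay_id,))
    conn.execute(
        "INSERT INTO nd_assay_phenotype (nd_assay_id, phenotype_id) VALUES (?, 1)",
        (assay_id,),
    )

assays_for_firmness_1 = [
    row[0]
    for row in conn.execute(
        "SELECT nd_assay_id FROM nd_assay_phenotype "
        "WHERE phenotype_id = 1 ORDER BY nd_assay_id"
    )
]
```

Under these assumptions the phenotype table holds a single row for 'firmness' = 1, and the linking table carries one row per assay that observed it.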


  • linking table between nd_assay and genotype
  • Question: For heterozygotes, do we store the genotype of an individual or of an allele? In other words, when an SSR gives 200 for one allele and 230 for the other at a locus of an individual, do we store 200/230 in the genotype table, or 200 and 230 separately? If we store the genotype of an allele in the genotype table, one nd_assay_id should be able to link to multiple genotype_ids. So nd_assay to genotype would be many-to-many in that case.
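
The two storage options in the question above can be compared in a small sketch. The tables are simplified assumptions (only a `description` column on genotype, integer assay ids with no nd_assay table), and the third assay with alleles 200 and 215 is a hypothetical added to show an allele shared between assays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genotype (
    genotype_id INTEGER PRIMARY KEY,
    description TEXT NOT NULL UNIQUE
);
CREATE TABLE nd_assay_genotype (
    nd_assay_id INTEGER NOT NULL,
    genotype_id INTEGER NOT NULL REFERENCES genotype (genotype_id),
    PRIMARY KEY (nd_assay_id, genotype_id)
);
""")


def genotype_id_for(description):
    """Return the id of the genotype row, creating the row if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO genotype (description) VALUES (?)", (description,)
    )
    return conn.execute(
        "SELECT genotype_id FROM genotype WHERE description = ?", (description,)
    ).fetchone()[0]


def link(assay_id, descriptions):
    """Link one assay to each of the given genotype descriptions."""
    for description in descriptions:
        conn.execute(
            "INSERT INTO nd_assay_genotype (nd_assay_id, genotype_id) VALUES (?, ?)",
            (assay_id, genotype_id_for(description)),
        )


# Option A: store the genotype of the individual as one row.
link(1, ["200/230"])

# Option B: store each allele as its own row.
link(2, ["200", "230"])
link(3, ["200", "215"])  # hypothetical second individual sharing allele 200

genotypes_per_assay = dict(
    conn.execute(
        "SELECT nd_assay_id, COUNT(*) FROM nd_assay_genotype GROUP BY nd_assay_id"
    )
)
assays_with_allele_200 = [
    row[0]
    for row in conn.execute(
        "SELECT ag.nd_assay_id FROM nd_assay_genotype ag "
        "JOIN genotype g ON g.genotype_id = ag.genotype_id "
        "WHERE g.description = '200' ORDER BY ag.nd_assay_id"
    )
]
```

With option A each assay has a single genotype row. With option B an assay links to two rows and one allele row is linked from several assays, which is the many-to-many case. Note that the composite primary key used here would reject a homozygote stored as the same allele twice, so option B would need some further convention for that case.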


  • Stores all the assay related data using cvterm_id and value
  • The sample_id (or clone_id, tree_id, etc) of a stock, when part of a stock was used as a sample or a stock was propagated to produce multiple clones.
  • Any treatment or environmental condition applied to the sample (fertilizer, salt conc, temp, etc), which are not part of a specific phenotyping/genotyping protocol
  • Any property of a sample (eg. number of fruits picked per sample, fresh vs. stored, etc)
  • Dates (collection date/assay date for phenotyping assay, assay date for genotyping assay, cross date for cross experiment and collection start/end date for field collection experiment)
  • cross name/ID and cross type (F1, F2, etc) for cross experiment
  • Notes and comments for a specific assay
  • Question: The experimenter_id could be stored using cvterm_id and value and linked to the contact table, or would it be better to have an nd_assay_contact table?


  • Links individual assays/crosses to a bigger project
  • Related tables: projectprop and project_relationship


  • Linking table between nd_assay and stock
  • New stocks can be generated by nd_assay in case of field collection or cross experiment (progeny)
  • Or a stock can be used by nd_assay in case of phenotype assay, genotype assay, and cross experiment (parent)
  • The type_id can record whether the stock is a female parent, male parent, parent for mutation, or progeny in case of cross experiment
  • The rootstocks that are used in planting fruit trees can be recorded in nd_assay_stock and the type_id could represent 'root stock'.


  • Since nd_assay_stock is only a linking table now, not a table to store a specific sample of a stock, the IDs and the treatments done to the sample of a stock can be stored in nd_assayprop
  • What could be stored here?


  • Stores phenotyping/genotyping protocols
  • For genotyping assays, the protocol would be equivalent to molecular markers


  • Any property of a protocol


  • linking table between nd_assay and nd_protocol
  • Many to one relationship between assay_id and protocol_id


  • A reagent such as a primer, an enzyme, an adapter oligo, or a linker oligo used in a genotyping protocol or any other protocol
  • feature_id links a reagent that has a DNA sequence (eg. primer) to an entry in the feature table


  • Any property of reagents


  • relationship between reagents


  • linking table between nd_protocol and nd_reagent



Use Cases

tree fruit breeding data

A. Cross


  • Cross name/ID, location, female and male parent, progeny, experimenter, project

nd module:

  • Cross type (F1, etc) and cross location in nd_assay (geolocation_id)
  • Cross name/ID in nd_assayprop (cvterm_id and value)
  • parent and progeny in stock table, linked to nd_assay via nd_assay_stock
  • The whole progeny could be stored as a group in stock table and linked to nd_assay
  • nd_assay_stock.type_id is for cvterms such as 'is a female parent', 'is a progeny', etc
  • The relationship (between the parent and progeny) could be recorded again in stock_relationship
  • Individual crosses can be linked to a bigger project via nd_assay_project
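
The cross mapping listed above can be sketched as follows. The table layouts are simplified assumptions (text columns in place of cvterm ids), and the names 'parent_b', 'cross_001' and 'cross_001_progeny' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (
    stock_id   INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE
);
CREATE TABLE nd_assay (
    nd_assay_id INTEGER PRIMARY KEY,
    type        TEXT NOT NULL          -- stands in for type_id
);
CREATE TABLE nd_assayprop (
    nd_assay_id INTEGER NOT NULL REFERENCES nd_assay (nd_assay_id),
    cvterm      TEXT NOT NULL,         -- stands in for cvterm_id
    value       TEXT
);
CREATE TABLE nd_assay_stock (
    nd_assay_id INTEGER NOT NULL REFERENCES nd_assay (nd_assay_id),
    stock_id    INTEGER NOT NULL REFERENCES stock (stock_id),
    type        TEXT NOT NULL          -- stands in for type_id
);
""")

# Two parents, plus the whole progeny stored as one group stock.
conn.executemany(
    "INSERT INTO stock (stock_id, uniquename) VALUES (?, ?)",
    [(1, "gala"), (2, "parent_b"), (3, "cross_001_progeny")],
)

# The cross itself is one nd_assay; its name goes in nd_assayprop.
conn.execute("INSERT INTO nd_assay (nd_assay_id, type) VALUES (1, 'cross experiment')")
conn.execute(
    "INSERT INTO nd_assayprop (nd_assay_id, cvterm, value) "
    "VALUES (1, 'cross name', 'cross_001')"
)

# nd_assay_stock.type records the role each stock plays in the cross.
conn.executemany(
    "INSERT INTO nd_assay_stock (nd_assay_id, stock_id, type) VALUES (1, ?, ?)",
    [(1, "is a female parent"), (2, "is a male parent"), (3, "is a progeny")],
)


def stocks_by_role(assay_id):
    """Map each role in the given assay to the stock that fills it."""
    rows = conn.execute(
        "SELECT nas.type, s.uniquename FROM nd_assay_stock nas "
        "JOIN stock s ON s.stock_id = nas.stock_id "
        "WHERE nas.nd_assay_id = ? ORDER BY s.stock_id",
        (assay_id,),
    )
    return {role: name for role, name in rows}


roles = stocks_by_role(1)

# Parent/progeny pairs that could be recorded again in stock_relationship.
parent_progeny_pairs = [
    (roles[role], roles["is a progeny"])
    for role in ("is a female parent", "is a male parent")
]
```

In this sketch the parent/progeny pairs are derived from the roles in nd_assay_stock, which suggests that recording them again in stock_relationship duplicates information and would need to be kept in sync.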

B. Phenotype Evaluation

  • The progeny (planted using a certain rootstock) are tested (1st Stage). Then 50 progeny (distinct genotypes) are selected, propagated (15 trees per genotype), planted using a certain rootstock, and tested in multiple orchards (2nd Stage). A further selection is then made and tested (3rd Stage).
  • The trees (or the fruits harvested from the trees) are evaluated multiple times for various phenotypic traits
  • Fruits harvested at the same date can be tested on two (or more?) different dates (to compare fresh vs stored)


  • Tree ID (eg. wsu001_1, gala_1), the identity of the corresponding progeny/control stock (eg. wsu001, gala, etc), rootstock (eg. rootstock_1), location of the trees (eg. orchard A), sub_location (eg. Lot 001), fertilizer treatment (eg. fertilizer A)
  • project, stage, collection date, assay date, sample property (fresh vs stored; number of fruits collected per tree), phenotyping protocol, phenotypic value
  • An example of a phenotype is 'fruit size', whose value can be 1 through 5 (1=very small; 2=small; 3=medium; 4=large; 5=very large)

in nd module:

  • The progeny, controls, rootstocks in stock table, linked to nd_assay via nd_assay_stock
  • nd_assay_stock.type_id would be cvterms for 'rootstock', or 'stock'?
  • Tree ID (wsu001_1, gala_1) in nd_assayprop using clone_id (?) as a cvterm
  • Collection date, assay date, and any other sample properties of an nd_assay also in nd_assayprop using cvterm and value
  • The three values in nd_assayprop (clone_id, collection date, assay date) should together be unique per assay_id
  • Data related to location (orchard), sub_location (lot number) and any other (county, state, etc) in nd_geolocation, and nd_geolocationprop tables
  • Project and stage information in project, linked to nd_assay via nd_assay_project
  • Relationship between stages and larger projects in project_relationship
  • phenotyping protocol, such as fruit_size_protocol(?), in nd_protocol
  • Phenotype and its value, such as fruit size (attr_id) and 1 (value), in phenotype table
  • Questions:
    • Where do we store the definition of 1 through 5? Many phenotypes have values of 1 through 5 and have their own definitions. In phenotype table or in nd_protocolprop?
    • Some breeders may record fruit size from 1 thru 3 instead of 1 thru 5, then those could be stored as different protocols (fruit_size_protocol_1, fruit_size_protocol_2, etc), but under same phenotype.attr_id (fruit size). Then the definition of 1 through 5 should be stored somewhere in nd_protocolprop
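
One way to picture the nd_protocolprop option raised in the questions above is the sketch below. The table layouts are simplified assumptions, the `score_N` property names are hypothetical, and the meanings of the 1 through 3 scale are invented for illustration (only the 1 through 5 definitions come from the notes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nd_protocol (
    nd_protocol_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE
);
CREATE TABLE nd_protocolprop (
    nd_protocol_id INTEGER NOT NULL REFERENCES nd_protocol (nd_protocol_id),
    cvterm         TEXT NOT NULL,      -- stands in for cvterm_id
    value          TEXT NOT NULL,
    UNIQUE (nd_protocol_id, cvterm)
);
""")

# Both protocols measure the same phenotype attribute (fruit size),
# but each carries its own definition of the scores.
scales = {
    "fruit_size_protocol_1": {
        1: "very small", 2: "small", 3: "medium", 4: "large", 5: "very large",
    },
    # Hypothetical meanings for a 1 through 3 scale.
    "fruit_size_protocol_2": {1: "small", 2: "medium", 3: "large"},
}

for name, scale in scales.items():
    protocol_id = conn.execute(
        "INSERT INTO nd_protocol (name) VALUES (?)", (name,)
    ).lastrowid
    for score, meaning in scale.items():
        conn.execute(
            "INSERT INTO nd_protocolprop (nd_protocol_id, cvterm, value) "
            "VALUES (?, ?, ?)",
            (protocol_id, f"score_{score}", meaning),
        )


def describe(protocol_name, score):
    """Look up what a score means under a given protocol, or None."""
    row = conn.execute(
        "SELECT pp.value FROM nd_protocolprop pp "
        "JOIN nd_protocol p ON p.nd_protocol_id = pp.nd_protocol_id "
        "WHERE p.name = ? AND pp.cvterm = ?",
        (protocol_name, f"score_{score}"),
    ).fetchone()
    return row[0] if row else None
```

Here `describe("fruit_size_protocol_1", 3)` and `describe("fruit_size_protocol_2", 3)` give different meanings for the same stored value 3, which is why the definition would have to travel with the protocol rather than with the phenotype row.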