Notes on simplified nd schema and Use Cases
- 1 SQL
- 2 Schema Drawing
- 3 Notes on the tables
- 4 Use Cases
Notes on the tables
- Stores all types (groups or individuals) of germplasm, and their relationship is defined through stock_relationship table
- Stock Relationship Ontology under development in the page below
- The type_id would store whether the assay is phenotype assay, genotype assay, crossexperiment, field collection study. or any other potential assays.
- In case of phenotype/genotype assay, the assay_id is unique per specific sample of a stock (part or clone of a stock under specific treatment), per collection date (eg. postharvest phenotype assay), per assay date, per type of protocol (eg. a specific molecular marker in case of genotype assay)
- linking table between nd_assay and phenotype
- Question: The SQL document says that there is one to one relationship between assay_id and phenotype_id but multiple samples can have the same phenotypic value so it would be many to one relationship. For example, the phenotypic value of firmness (eg. apple) varies from 1 to 5 (1 being very soft and 5 being very firm). Multiple assay_ids, therefore, can be linked to phenotype_id with the attr_id 'firmness' and value 1.
- [Naama] From my understanding every time you measure a phenotype you store it it the phenotype table with the relevant attributes. I don't think you are supposed to 'reuse' these records, even if you get a similar measurement.
- [Sook] Perhaps chado users could choose either way? If we check 1000 flies and 500 of them had white eyes and the rest of them had red eyes, we can only have two rows in phenotype table - red or white. Otherwise we'll have 1000 rows each connected with individual assay.. I think ParameciumDB store distinct phenotypes and link to multiple stocks. http://paramecium.cgm.cnrs-gif.fr/db/Phenotype/5
- linking table between nd_assay and genotype
- Question: For heterozygotes, do we store the genotype of an individual or an allele? In another word, when SSR results in 200 in one allele 230 in another for a locus of an individual, do we store 200/230 in genotype table or 200 and 230 separately? If we store the genotype of an allele in the genotype table, one nd_assay_id should be able to linked to multiple genotype_id. So nd_assay to genotype will be many to many in that case.
-[Naama] I would say for individuals you store the heterozygous genotype (200/230) and for an allele entry you store the actual allele genotype. If your individual is in the stock table, I guess you could link it to the allele and retrieve the genotype from there.
- Stores all the assay related data using cvterm_id and value
- The sample_id (or clone_id, tree_id, etc) of a stock, when part of a stock was used as a sample or a stock was propagated to produce multiple clones.
- Any treatment or environmental condition applied to the sample (fertilizer, salt conc, temp, etc), which are not part of a specific phenotyping/genotyping protocol
- Any property of a sample (eg. number of fruits picked per sample, fresh vs. stored, etc)
- Dates (collection date/assay date for phenoytping assay, assay date for genotyping assay, cross date for cross experiment and collection start/end date for field collection experiment)
- cross name/ID and cross type (F1, F2, etc) for cross experiment
- Notes and comments for a specific assay
- Question: Experimenter_id could be stored in cvterm_id and value and link to contact table OR do we better have nd_assay_contact table?
-[Naama] experimenter_id should be in a contact table (contact_id) .
-[Sook] Then can we add assay_contact table?
- Links individual assays/crosses to a bigger project
- Related tables: projectprop and project_relationship
- Linking table between nd_assay and stock
- New stocks can be generated by nd_assay in case of field collection or cross experiment (progeny)
- Or stock can be used for nd_assay in case of phenotype assay, genotype assay, and cross experiment (parent)
- The type_id can record whether the stock is a female parent, male parent, parent for mutation, or progeny in case of cross experiment
- The rootstocks that are used in planting fruit trees can be recorded in nd_assay_stock and the type_id could represent 'root stock'.
- Since nd_assay_stock is only a linking table now, not a table to store a specific sample of a stock, the IDs and the treatments done to the sample of a stock can be stored in nd_assayprop
- What could be stored here?
- stores phenotyping/genotyping protocols
- For genotyping assays, the protocol would be equivalent to molcular markers
- Any property of a protocol
- linking table between nd_assay and nd_protocol
- Many to one relationship between assay_id and protocol_id
- A reagent such as a primer, an enzyme, an adapter oligo, a linker oligo used in genotyping protocol or any other protocol
- feature_id links reagent with DNA sequences (eg. primer) to an entry in feature table
- Any property of reagents
- relationship between reagents
- linking table between nd_protocol and nd_reagent
tree fruit breeding data
- Cross name/ID, location, female and male parent, progeny, experimenter, project
- Cross type (F1, etc) and cross location in nd_assay (geolocation_id)
- Cross name/ID in nd_assayprop (cvterm_id and value)
- parent and progeny in stock table, linked to nd_assay via nd_assay_stock
- The whole progeny could be stored as a group in stock table and linked to nd_assay
- nd_assay_stock.type_id is for cvterms such as 'is a female parent', 'is a progeny', etc
- The relationship (between the parent and progeny) could be recorded again in stock_relationship
- Individual crosses can be linked to a bigger project via nd_assay_project
- The progeny (planted using a certain rootstock) are tested (1st stage), 50 progeny (distinct genotype) are selected, propagated (15 trees per genotype), planted using a certain rootstock, and tested in multiple orchards (2nd Stage), and more selected and tested (3rd Stage).
- The trees (or the fruits harvested from the tree) are evaluated multiple times for various phenotypic evaluations
- Fruits harvested at the same date can be tested on two (or more?) different dates (to compare fresh vs stored)
- Tree ID (eg. wsu001_1, gala_1), the identity of the corresponding progeny/control stock (eg. wsu001, gala, etc), rootstock (eg. rootstock_1), location of the trees (eg. orchard A), sub_location (eg. Lot 001), fertilizer treatment (eg. fertilizer A)
- project, stage, collection date, assay date, sample property (fresh vs stored; number of fruits collected per tree), phenotyping protocol, phenotypic value
- An example of phenotype is 'fruit size' and their value can be 1 through 5 (1=very small; 2=small; 3=medium; 4=large; 5= very large)
in nd module:
- The progeny, controls, rootstocks in stock table, linked to nd_assay via nd_assay_stock
- nd_assay_stock.type_id would be cvterms for 'rootstock', or 'stock'?
- Tree ID (wsu001_1, gala_1) in nd_assayprop using clone_id (?) as a cvterm
- collection date, assay date, and any other sample properties in nd_assay also in nd_assayprop using cvterm and value
- The three valules in nd_assayprop (clone_id, collection date, assay date) should be unique per unique assay_id
- Data related to location (orchard), sub_location (lot number) and any other (county, state, etc) in nd_geolocation, and nd_geolocationprop tables
- Project and stage information in project, linked to nd_assay via nd_assay_project
- Relationship between stages and larger projects in project_relationship
- phenotyping protocol, such as fruit_size_protocol(?), in nd_protocol
- phenotype and the value , such as fruit size (attr_id) and 1 (value), in phenotype table
- Where do we store the definition of 1 through 5? Many phenotypes have values of 1 through 5 and have their own definitions. In phenotype table or in nd_protocolprop?
- Some breeders may record fruit size from 1 thru 3 instead of 1 thru 5, then those could be stored as different protocols (fruit_size_protocol_1, fruit_size_protocol_2, etc), but under same phenotype.attr_id (fruit size). Then the definition of 1 through 5 should be stored somewhere in nd_protocolprop
-[Naama] this should be stored in the cvterm. If 'fruit size' has a 1-5 scale , the scale might mean 1=smallest, 5 = largest, then you store the scale in the definition of the 'fruit size' cvterm, and the numeric value in the phenotype table. If 1-5 translate to descriptive definition (1=round, 2= long, etc.) then you could store these as synonyms for the cvterm , or cvtermprops. If some breeder uses a different scale, then there should be a different cvterm for that.
mosquito field collection + karyotyping
- Collection site, catch method, time, project
- ND Module
- Collection Site -> nd_geolocation (lat/long/geodetic datum recorded directly in table. Any or all of GAZeteer IDs, postcodes, etc recorded as nd_geolocationprop CVterm IDs)
- Catch Method -> nd_protocol / nd_protocolprop (catch method recorded in IDOMAL and placed in nd_protocolprop)
- Time -> nd_assay (recorded in nd_assayprop, format depending on whether a date/time range or cv term)
- Project -> Project
- Karyotyping method, karyotype, project
- ND Module
- Karyotyping method -> nd_protocol (which CV to record in? or just record as freetext under protocolprop?)
- Karyotype -> genotype/feature (relies on karyptypes being clearly defined in feature module)
No obvious way to link a project to either a contact (i.e. experimenter) or a publication. project_pub / project_contact experiments needed?