Difference between revisions of "Chado Natural Diversity Module/natdiv schema changes call"

Revision as of 00:30, 1 June 2011

Conference call to resolve the latest proposed changes to natdiv module.

Date

Thursday, May 26, 6pm BST / 1pm EST / 10am PST

Participants

Agenda

  1. Triage proposed changes into the following categories:
    • implement before paper publishing
    • implement after paper publishing
    • do not implement
  2. Bio::Chado::Schema update
    • can someone do one after the changes have been made? Maccallr 14:37, 26 May 2011 (UTC)

Proposed changes

Prop table in genotype module

  • change: addition of (vanilla) prop table to genotype module [cvterm_id, value, rank]
    • proposer: Seth Redmond / Vectorbase
    • reason: enables us to store ontology terms for current genotypes, e.g. presence/absence of specific inversions - impossible under current schema
    • Did I understand correctly that for a genotypeprop table that cvterm_id would allow NULL? Scott 17:17, 26 May 2011 (UTC)
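
The proposed table could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 rather than PostgreSQL; the cut-down cvterm and genotype tables, the term name 'inversion present' and the genotype name are all made up. It assumes the column listed above as cvterm_id follows the usual Chado prop convention and is called type_id, and it makes type_id NOT NULL while leaving value nullable, which is one possible answer to the NULL question and not a decision from the call.

```python
import sqlite3

# Hypothetical, cut-down tables; the real Chado cvterm and genotype tables
# have more columns and live in PostgreSQL.
DDL = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE genotype (
    genotype_id INTEGER PRIMARY KEY,
    uniquename  TEXT NOT NULL UNIQUE
);
CREATE TABLE genotypeprop (
    genotypeprop_id INTEGER PRIMARY KEY,
    genotype_id     INTEGER NOT NULL REFERENCES genotype (genotype_id),
    type_id         INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value           TEXT,  -- nullable: the ontology term alone may carry the meaning
    rank            INTEGER NOT NULL DEFAULT 0,
    UNIQUE (genotype_id, type_id, rank)
);
"""


def build_demo():
    """Create the sketch schema in memory and attach one term to one genotype."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO cvterm (cvterm_id, name) VALUES (1, 'inversion present')")
    conn.execute("INSERT INTO genotype (genotype_id, uniquename) VALUES (1, 'demo genotype')")
    conn.execute("INSERT INTO genotypeprop (genotype_id, type_id) VALUES (1, 1)")
    return conn


def genotype_props(conn, uniquename):
    """Return (term name, value, rank) for every property of a genotype."""
    return conn.execute(
        """
        SELECT t.name, p.value, p.rank
        FROM genotype g
        JOIN genotypeprop p ON p.genotype_id = g.genotype_id
        JOIN cvterm t ON t.cvterm_id = p.type_id
        WHERE g.uniquename = ?
        ORDER BY p.rank
        """,
        (uniquename,),
    ).fetchall()
```

Calling genotype_props(build_demo(), 'demo genotype') lists the ontology terms attached to that genotype, which is the presence/absence use case given as the reason for the change.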

Hackathon changes

Yuri's proposals

  • Wouldn't it be preferable to give at least workable solutions to the two significant flaws of the phenotype module before publishing the paper? From the last call, these are:
    • Phenotype description and value are 1:1 and in the same table. Solution: Use nd_experiment_phenotypeprop to store values.
      • It is not necessarily 1:1. The phenotype description is stored in the cvterm table. The phenotype table stores the value (value or cvalue_id) and has foreign keys (observable_id, attr_id). We could choose to use either one foreign key to connect to one phenotype descriptor, or use both. So the current schema can store a one-to-many relationship between phenotype descriptor and value. (Sook)
    • No (straightforward) way available to do post-composition of phenotype descriptions. Solution: Use phenotypeprop with cvalue_id
      • I think post-composition can be done in the cvterm_relationship table. All of the terms 'stem diameter at harvest in mm', 'Stem_diameter', 'at harvest', and 'mm' can be stored in the cvterm table. The cvterm_relationship table can then store the relationships between 'stem diameter at harvest' and 'stem diameter' with the term 'part of', 'belongs to', 'unit', or whatever relationship term is appropriate. (Sook)
    • These solutions are easy to implement and can always be refined at some later date.
  • Add environmentprop. This is useful when creating phenstatements.
    • Example phenstatement: “The mean of the phenotype root length in genotype TN7.4 given an environment of NaCl treatment of 100 millimolar is 10.5 mm”
    • environment: uniquename='100 NaCl'
    • environmentprop: type_id='NaCl treatment', value=100, cvalue_id='mM'
      • How about '100 mM NaCl' as environment.uniquename? It can be linked to cvterm via the environment_cvterm table. '100 mM NaCl', 'NaCl treatment' and 'mM' can all be stored separately in the cvterm table and associated via the cvterm_relationship table (see above). I think cvalue_id is for storing qualitative values that can be stored in the cvterm table, not for units. (Sook)
  • Add phenstatementprop. This is useful when creating phenstatements.
    • phenstatement: type_id = 'summary statistic', phenotype_id='flower number', genotype_id='TN7.4', pub_id='experimental result'
    • phenstatementprop: type_id='mean', value = 10.5, cvalue_id='mm'
  • Add nd_experiment_protocolprop. I use this to store protocol values specific to an nd_experiment.
    • Eg: nd_protocol.type_id='NaCl treatment', nd_experiment_protocolprop:{type_id='treatment amount', value=100, cvalue_id='mM'}
    • +1, could definitely use this for the same reasons (e.g. same insecticide resistance assay protocol, but with different insecticides, exposure times, and/or concentrations; currently using nd_experimentprop) Maccallr 13:24, 30 May 2011 (UTC)
      • Can we make multiple protocols, such as NaCl 100 mM, NaCl 10 mM, etc. (or insecticide resistance assay 1, 2, etc.) and link them to the nd_experiment table? If we want to group similar protocols, we could use protocolprop (type_id = protocol_type, value = insecticide resistance assay protocol). The details can also be stored in protocolprop (type_id = exposure time, value = 1 hr; type_id = concentration, value = 10 mM; etc.). The insecticide can also be stored in the reagent table. (Sook)
      • I also think prop tables for linking tables are not consistent with the rest of the Chado tables and make the Chado schema too complicated. (Sook)
  • Add nd_experiment_phenotypeprop. I use this to store phenotype observations specific to an nd_experiment.
    • Eg: phenotype.observable_id='root length', nd_experiment_phenotypeprop:{type_id='observation', value=10.5, cvalue_id='mm'}
      • 10.5 can be stored in phenotype.value and the unit can be associated with the cvterm itself in the cvterm table (Sook).
  • Add cvalue_id to NatDiv property tables and related property tables like projectprop. This allows for postcomposition of cvterms like units to the property type_id.
    • Eg: type_id='my experimental bucket color', cvalue_id='purple'
    • Clarification: I didn't propose this originally but Naama brought up the concern that the property tables weren't consistent if some have cvalue_id and others don't.
    • Chado has some way to post-compose cvterms (http://gmod.org/wiki/Chado_CV_Module#Post-coordinating_Terms) which Maccallr 11:56, 17 May 2011 (UTC) doesn't understand.
      • It looks rather complex. --Yuri
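
As a rough illustration of what the cvalue_id proposals above would look like in practice, here is the environmentprop example ('100 NaCl', type_id='NaCl treatment', value=100, cvalue_id='mM') as a small Python/sqlite3 sketch. The table definitions are cut-down assumptions, not the committed schema, and using cvalue_id for the unit is the contested part of the proposal (see the replies above).

```python
import sqlite3

# Assumed, simplified tables: only the columns needed for the example.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE environment (
    environment_id INTEGER PRIMARY KEY,
    uniquename     TEXT NOT NULL UNIQUE
);
CREATE TABLE environmentprop (
    environmentprop_id INTEGER PRIMARY KEY,
    environment_id     INTEGER NOT NULL REFERENCES environment (environment_id),
    type_id            INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    value              TEXT,
    cvalue_id          INTEGER REFERENCES cvterm (cvterm_id),  -- proposed column
    rank               INTEGER NOT NULL DEFAULT 0
);
"""


def load_example():
    """Load the '100 NaCl' environment with one property, as in the example above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cvterm (cvterm_id, name) VALUES (?, ?)",
        [(1, "NaCl treatment"), (2, "mM")],
    )
    conn.execute("INSERT INTO environment (environment_id, uniquename) VALUES (1, '100 NaCl')")
    conn.execute(
        "INSERT INTO environmentprop (environment_id, type_id, value, cvalue_id) "
        "VALUES (1, 1, '100', 2)"
    )
    return conn


def describe_environment(conn, uniquename):
    """Render each property as 'type = value unit'; the unit is omitted if cvalue_id is NULL."""
    rows = conn.execute(
        """
        SELECT t.name, p.value, u.name
        FROM environment e
        JOIN environmentprop p ON p.environment_id = e.environment_id
        JOIN cvterm t ON t.cvterm_id = p.type_id
        LEFT JOIN cvterm u ON u.cvterm_id = p.cvalue_id
        WHERE e.uniquename = ?
        ORDER BY p.rank
        """,
        (uniquename,),
    ).fetchall()
    return [f"{t} = {v} {u}" if u else f"{t} = {v}" for t, v, u in rows]
```

The same three-column pattern (type_id, value, cvalue_id) is what phenstatementprop, nd_experiment_protocolprop and nd_experiment_phenotypeprop would share under these proposals.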

(Sook) I think that the solution is to store the phenotypic value in the phenotype table and store the cvterm_id of the post-composed phenotypic descriptor in the phenotype table. The further-up cvterms can be associated via cvterm_relationship table. We only use 'attr_id' to store the final post-composed phenotypic descriptor. It might be better to have descriptor_id in the phenotype table so that users who use both 'attr_id' and 'observable_id' can keep their practice.
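
A minimal sketch of this alternative, again in Python/sqlite3 with cut-down tables: the post-composed descriptor is a single cvterm referenced from phenotype.attr_id, and its components hang off it in cvterm_relationship. Which relationship term goes with which component is an arbitrary choice here, and the value 12.0 is invented for the demo.

```python
import sqlite3

# Simplified versions of cvterm, cvterm_relationship and phenotype.
SCHEMA = """
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE cvterm_relationship (
    cvterm_relationship_id INTEGER PRIMARY KEY,
    type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    subject_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    object_id  INTEGER NOT NULL REFERENCES cvterm (cvterm_id)
);
CREATE TABLE phenotype (
    phenotype_id INTEGER PRIMARY KEY,
    uniquename   TEXT NOT NULL UNIQUE,
    attr_id      INTEGER REFERENCES cvterm (cvterm_id),
    value        TEXT
);
"""

COMPOSED = "stem diameter at harvest in mm"
# (relationship term, component term); the pairing is illustrative only.
COMPONENTS = [("part of", "Stem_diameter"), ("belongs to", "at harvest"), ("unit", "mm")]


def term_id(conn, name):
    """Look up a cvterm_id by term name."""
    return conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)).fetchone()[0]


def load_example():
    """Store the composed term, its components, and one phenotype value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    names = [COMPOSED] + [n for pair in COMPONENTS for n in pair]
    conn.executemany("INSERT INTO cvterm (name) VALUES (?)", [(n,) for n in names])
    composed_id = term_id(conn, COMPOSED)
    for rel, obj in COMPONENTS:
        conn.execute(
            "INSERT INTO cvterm_relationship (type_id, subject_id, object_id) VALUES (?, ?, ?)",
            (term_id(conn, rel), composed_id, term_id(conn, obj)),
        )
    # 12.0 is a made-up measurement for the demo.
    conn.execute(
        "INSERT INTO phenotype (uniquename, attr_id, value) VALUES (?, ?, ?)",
        ("demo observation", composed_id, "12.0"),
    )
    return conn


def decompose(conn, uniquename):
    """Return the phenotype value plus the (relationship, component) pairs of its descriptor."""
    value, attr_id = conn.execute(
        "SELECT value, attr_id FROM phenotype WHERE uniquename = ?", (uniquename,)
    ).fetchone()
    parts = conn.execute(
        """
        SELECT rel.name, obj.name
        FROM cvterm_relationship r
        JOIN cvterm rel ON rel.cvterm_id = r.type_id
        JOIN cvterm obj ON obj.cvterm_id = r.object_id
        WHERE r.subject_id = ?
        ORDER BY r.cvterm_relationship_id
        """,
        (attr_id,),
    ).fetchall()
    return value, parts
```

Under this layout no new prop tables are needed; the cost is that every post-composed descriptor has to exist as its own cvterm before a value can be stored against it.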

Bob's proposals

Just looking at the NatDiv prop tables, saw some inconsistencies:

  • nd_geolocationprop.value is varchar(250) while others in NatDiv are 255. Rest of chado is type 'text'. Propose change to text.

This means we need to change the value type in all nd prop tables to text. (Naama)

  • nd_experimentprop.value is NOT NULL while all others (in NatDiv) allow NULL (rest of chado is mixed). Propose all allow NULL.

This was already fixed. I committed the SQL a couple of weeks ago (Naama)

    • I just haven't rolled it into the default_schema.sql yet Scott 17:10, 26 May 2011 (UTC)
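
For reference, the two changes above amount to a pair of ALTER statements per prop table. The helper below is a hedged sketch that only generates PostgreSQL statements as strings; the table list is an assumption about which NatDiv prop tables exist and should be checked against the schema actually deployed, and this is not the SQL that was committed.

```python
# Assumed list of NatDiv prop tables; verify against the deployed schema.
ND_PROP_TABLES = (
    "nd_geolocationprop",
    "nd_experimentprop",
    "nd_protocolprop",
    "nd_reagentprop",
    "nd_experiment_stockprop",
)


def migration_statements(tables=ND_PROP_TABLES):
    """Return PostgreSQL statements that make value a nullable text column."""
    statements = []
    for table in tables:
        # Widen varchar(250)/varchar(255) to text, matching the rest of Chado.
        statements.append(f"ALTER TABLE {table} ALTER COLUMN value TYPE text;")
        # Allow NULL values, as proposed for all NatDiv prop tables.
        statements.append(f"ALTER TABLE {table} ALTER COLUMN value DROP NOT NULL;")
    return statements


if __name__ == "__main__":
    print("\n".join(migration_statements()))
```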