Chado Natural Diversity Module/natdiv schema changes call

From GMOD
Revision as of 17:25, 1 June 2011 by Ybendana (Talk | contribs)

Jump to: navigation, search

Conference call to resolve the latest proposed changes to natdiv module.

Date

Thursday, May 26, 6pm BST / 1pm EST / 10am PST

Meeting notes

Chado_Natural_Diversity_Module/natdiv_call_notes

Participants

Agenda

  1. Triage proposed changes into the following categories:
    • implement before paper publishing
    • implement after paper publishing
    • do not implement
  2. Bio::Chado::Schema update
    • can someone do one after the changes have been made? Maccallr 14:37, 26 May 2011 (UTC)

Proposed changes

Prop table in genotype module

  • change: addition of (vanilla) prop table to genotype module [cvterm_id, value, rank]
    • proposer: Seth Redmond / Vectorbase
    • reason: enables us to store ontology terms for current genotypes, e.g. presence/absence of specific inversions - impossible under current schema
    • Did I understand correctly that for a genotypeprop table that cvterm_id would allow NULL? Scott 17:17, 26 May 2011 (UTC)

Hackathon changes

Yuri's proposals

  • Wouldn't it be preferable to give at least workable solutions to the two significant flaws of the phenotype module before publishing the paper? From the last call, these are:
    • Phenotype description and value are 1:1 and in the same table. Solution: Use nd_experiment_phenotypeprop to store values.
      • It is not necessarily 1:1. Phenotype description is stored in cvterm table. The phenotype table stores the value (value or cvalue_id) and has foreign keys (observable_id, attr_id). We could choose to use either one foreign key to connect to one phenotype descriptor or use both. So current schema can store 1 to M, phenotype descriptor to value. (Sook)
        • I think the phenotype table should store the description. --Yuri
    • No (straightforward) way available to do post-composition of phenotype descriptions. Solution: Use phenotypeprop with cvalue_id
      • I think post composition can be done in cvterm_relationship table. All of those terms 'stem diameter at harvest in mm', 'Stem_diameter', 'at harvest', and 'mm' can be stored in cvterm table. The cvterm_relationship table then can store the relationships between 'stem diameter at harvest' and 'stem diameter' with the term 'part of ' 'belongs to' 'unit' or whatever relationship term appropriate.(Sook)
        • You would need a way to link from the phenotype table to the cvterm_relationship table. --Yuri
    • These solutions are easy to implement and can always be refined at some later date.
  • Add environmentprop. This is useful when creating phenstatements.
    • Example phenstatement: “The mean of the phenotype root length in genotype TN7.4 given an environment of NaCl treatment of 100 millimolar is 10.5 mm”
    • environment: uniquename='100 NaCl'
    • environmentprop: type_id='NaCl treatment', value=100, cvalue_id='mM'
      • How about '100 mM NaCl' as environment.uniquename? It can be linked to cvterm via environment_cvterm table.'100 mM NaCl', 'NaCl treatment' and 'mM' can all be separately stored in cvterm table and associated via cvterm_relationship table (see above). I think cvalue_id is to store qualitative values that can be stored in cvterm table, not for the units(Sook).
      • or just store '100 mM' in the value field. Or the cvterm should have the unit as part of its definition 'NaCl in mM' --NaamaMenda 13:58, 1 June 2011 (UTC)
  • Add phenstatementprop. This is useful when creating phenstatements.
    • phenstatement: type_id = 'summary statistic', phenotype_id='flower number', genotype_id='TN7.4', pub_id='experimental result'
    • phenstatementprop: type_id='mean', value = 10.5, cvalue_id='mm'
      • this is the actual phenotype value, and should be in the phenotype table , or whatever other table we choose for values or post-composed terms.
  • Add nd_experiment_protocolprop. I use this to store protocol values specific to an nd_experiment.
    • Eg: nd_protocol.type_id='NaCl treatment', nd_experiment_protocolprop:{type_id='treatment amount', value=100, cvalue_id='mM'}
    • +1, could definitely use this for same reasons (e,g, same insecticide resistance assay protocol, but with different insecticides, exposure times, and/or concentrations; currently using nd_experimentprop) Maccallr 13:24, 30 May 2011 (UTC)
      • Can we make multiple protocols, such as NaCl 100mM, NaCl 10mM, etc, (or insecticide resistance assay 1, 2, etc) and link to nd_experiment table? If we want to group similar protocols, we could use protocolprop (type_id = protocol_type, value = insecticide resistance assay protocol). The details can be stored in protocolprop (type_id=exposure time, value=1 hr: type_id=concentration, value= 10 mM), etc). The insecticide can also be stored in reagent table(Sook).
      • Aren't these different protocols if you are using different amounts? I don't think the amounts are properties of the protocol. --NaamaMenda 13:58, 1 June 2011 (UTC)
      • I also think prop tables for linking tables are not consistent with the rest of chado tables and make chado schema too complicated (Sook).
  • Add nd_experiment_phenotypeprop. I use this to store phenotype observations specific to an nd_experiment.
    • Eg: phenotype.observable_id='root length', nd_experiment_phenotypeprop:{type_id='observation', value=10.5, cvalue_id='mm'}
      • 10.5 can be stored in phenotype.value and the unit can be associated with the cvterm itself in the cvterm table (Sook).
  • Add cvalue_id to NatDiv property tables and related property tables like projectprop. This allows for postcomposition of cvterms like units to the property type_id.
    • Eg: type_id='my experimental bucket color', cvalue_id='purple'
      • phenotype_cvterm with a type_id column should do. Should we add this to the sql file in a new svn branch ? --NaamaMenda 13:58, 1 June 2011 (UTC)
    • Clarification: I didn't propose this originally but Naama brought up the concern that the property tables weren't consistent if some have cvalue_id and others don't.
      • The issue here is whether the prop tables are the right place for this. I think we can add non-uniform colums to prop tables, assuming it is necessary, and Chado has no better way to answer. I think broad usage of prop tables should be limited. Look at props as a place to add some unstructured metadata (dates, names of places, nicknames, maybe synonyms, or anything else you do not wish to structure as a cvterm). If the data does need structure, it is better to store it in a designated well defined table. This is also better for querying the database. With cvalue_id you need to ask does my prop has a cvalue? what is the meaning of my cvalue? etc. --NaamaMenda 13:58, 1 June 2011 (UTC)
    • Chado has some way to post-compose cvterms which Maccallr 11:56, 17 May 2011 (UTC) doesn't understand.
      • It looks rather complex. --Yuri

(Sook) I think that the solution is to store the phenotypic value in the phenotype table and store the cvterm_id of the post-composed phenotypic descriptor in the phenotype table. The further-up cvterms can be associated via cvterm_relationship table. We only use 'attr_id' to store the final post-composed phenotypic descriptor. It might be better to have descriptor_id in the phenotype table so that users who use both 'attr_id' and 'observable_id' can keep their practice.

  • It was decided in the meeting to discuss these major changes to the phenotype module in another discussion topic. It was also proposed to see if phenotype_cvterm with a type_id field would be adequate. It is clear the phenotype table does not address well post-composing terms, yet we'd like to avoid making the same mistake again of adding multiple columns, which renders the schema not normalized. Also prop tables are not an ideal place for phenotype values (see more in the Chado_Natural_Diversity_Module/natdiv_call_notes ). We should keep this discussion related directly to the ND schema, and hammer out the phenotype module as a second stage. --NaamaMenda 01:14, 1 June 2011 (UTC)
    • I started a discussion for this page to address the phenotype module proposed changes. (click on the 'discussion' tab at the top of the page) --NaamaMenda 14:08, 1 June 2011 (UTC)

Bob's proposals

Just looking at the NatDiv prop tables, saw some inconsistencies:

  • nd_geolocationprop.value is varchar(250) while others in NatDiv are 255. Rest of chado is type 'text'. Propose change to text.

this means we need to change the value type in all nd prop tables to text (Naama)

  • nd_experimentprop.value is NOT NULL while all others (in NatDiv) allow NULL (rest of chado is mixed). Propose all allow NULL.

This was already fixed. I committed the SQL a couple of weeks ago (Naama)

    • I just haven't rolled it into the default_schema.sql yet Scott 17:10, 26 May 2011 (UTC)