Difference between revisions of "Chado Post-Composed Phenotypes"


Revision as of 15:32, 19 August 2015

Overview

Increasingly, phenotypes are described with more than a single, pre-composed term. A particular phenotype (or phene) can be described with an EAV (entity-attribute-value) statement, or with a more complex EQ statement (an Entity-Quality statement in which the Entity and Quality parts may themselves contain several terms), and even more complex statements can be expected in the future. In addition to containing multiple terms, these statements have a specific syntax that is critical to their meaning.

Our goal was to make minimal changes to Chado; some of the changes take the form of recommendations to deprecate existing table fields.

Update, Mar 2015: After running a trial of option 2 (see Older proposals), we found that the group table greatly increased the complexity of loading and querying the data. Rather than permitting an arbitrarily deep hierarchy of statement structure, we decided to enforce a maximum of one level of term grouping. This is expected to be sufficient for most, if not all, statement structures currently in use.

Proposal

[Figure: Chado phenotype proposal.clause.jpg]

Older proposals

See Talk:Chado_Post-Composed_Phenotypes for the older versions of this schema proposal

New and Modified Tables in Phenotype Module

 - Add phenotypeprop table.
 - Add phenotype_clause table, used for grouping phenotype_cvterm records into clauses within a statement.
 - Add type_id field to phenotype_cvterm to indicate role of term in a phenotype statement.
 - Add optional phenotype_clause_id field to phenotype_cvterm to permit grouping phenotype_cvterm records into clauses within a statement.
 CREATE TABLE phenotypeprop (
    phenotypeprop_id SERIAL PRIMARY KEY,
    phenotype_id INT NOT NULL,
       FOREIGN KEY (phenotype_id) REFERENCES phenotype (phenotype_id) ON DELETE CASCADE INITIALLY DEFERRED,
    type_id INT NOT NULL,
       FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
    value TEXT NULL,
    rank INT NOT NULL DEFAULT 0,

    -- One value per (phenotype, property type, rank), as in other Chado prop tables
    CONSTRAINT phenotypeprop_c1 UNIQUE (phenotype_id, type_id, rank)
 );
 COMMENT ON TABLE phenotypeprop IS 'This table can be used to attach additional information to a phenotype or trait that is not part of the term or post-composed term. For example, heritability of a trait, dominant/recessive, et cetera.';
 CREATE TABLE phenotype_clause (
    phenotype_clause_id SERIAL PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INT NOT NULL,
       FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
    rank INT NOT NULL DEFAULT 0
 );
 COMMENT ON TABLE phenotype_clause IS 'Used to group phenotype_cvterm records into clauses, as are used in EQ statements where, for example, the primary entity may be a clause constructed with up to 3 terms.';
 -- Note: adding a NOT NULL column without a DEFAULT fails if the table already has rows
 ALTER TABLE phenotype_cvterm
   ADD COLUMN type_id INT NOT NULL,
   ADD FOREIGN KEY (type_id)
      REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
   ADD COLUMN phenotype_clause_id INT,
   ADD FOREIGN KEY (phenotype_clause_id)
      REFERENCES phenotype_clause (phenotype_clause_id) ON DELETE CASCADE INITIALLY DEFERRED;
 COMMENT ON COLUMN phenotype_cvterm.type_id IS 'Role of this cvterm in a post-composed term.';
 COMMENT ON COLUMN phenotype_cvterm.phenotype_clause_id IS 'If this term is part of a clause within a statement, this field identifies the clause.';
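
To illustrate how the clause grouping might be loaded and queried, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the tables are cut down to the columns needed here and SERIAL becomes INTEGER PRIMARY KEY. The column name phenotype_clause_id follows the table list above. The statement itself (leaf part_of seedling: yellow) and the uniquenames are made up for illustration; they are not taken from the proposal.

```python
import sqlite3

# Simplified stand-in schema (assumption: only the columns this sketch needs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phenotype (phenotype_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE phenotype_clause (
    phenotype_clause_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    rank INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE phenotype_cvterm (
    phenotype_cvterm_id INTEGER PRIMARY KEY,
    phenotype_id INTEGER NOT NULL REFERENCES phenotype (phenotype_id),
    cvterm_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    rank INTEGER NOT NULL DEFAULT 0,
    type_id INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
    phenotype_clause_id INTEGER REFERENCES phenotype_clause (phenotype_clause_id)
);
""")

# Role terms come from the EQ list on this page; the ontology terms are hypothetical.
names = ["Primary Entity", "Primary Entity 1", "Primary Entity 1 Relationship",
         "Primary Entity 2", "Quality", "leaf", "part_of", "seedling", "yellow"]
conn.executemany("INSERT INTO cvterm (name) VALUES (?)", [(n,) for n in names])
term_id = {name: cid for cid, name in conn.execute("SELECT cvterm_id, name FROM cvterm")}

# One phenotype record holds the whole statement.
phenotype_id = conn.execute(
    "INSERT INTO phenotype (uniquename) VALUES (?)", ("example-eq-1",)).lastrowid

# One clause groups the three terms that make up the primary entity.
clause_id = conn.execute(
    "INSERT INTO phenotype_clause (uniquename, type_id) VALUES (?, ?)",
    ("example-eq-1:primary-entity", term_id["Primary Entity"])).lastrowid

# (term, role, clause, rank): the quality term sits outside any clause.
parts = [
    ("leaf", "Primary Entity 1", clause_id, 0),
    ("part_of", "Primary Entity 1 Relationship", clause_id, 1),
    ("seedling", "Primary Entity 2", clause_id, 2),
    ("yellow", "Quality", None, 3),
]
conn.executemany(
    "INSERT INTO phenotype_cvterm "
    "(phenotype_id, cvterm_id, type_id, phenotype_clause_id, rank) "
    "VALUES (?, ?, ?, ?, ?)",
    [(phenotype_id, term_id[t], term_id[r], c, k) for t, r, c, k in parts])

# Reassemble the statement in order, with each term's role and clause type.
statement = conn.execute("""
    SELECT role.name, term.name, clause_type.name
    FROM phenotype_cvterm pc
    JOIN cvterm term ON term.cvterm_id = pc.cvterm_id
    JOIN cvterm role ON role.cvterm_id = pc.type_id
    LEFT JOIN phenotype_clause cl
           ON cl.phenotype_clause_id = pc.phenotype_clause_id
    LEFT JOIN cvterm clause_type ON clause_type.cvterm_id = cl.type_id
    WHERE pc.phenotype_id = ?
    ORDER BY pc.rank
""", (phenotype_id,)).fetchall()

for role, term, clause in statement:
    print(role, term, clause)
```

Because grouping is limited to one level, a single LEFT JOIN to phenotype_clause is enough to recover the structure; no recursive query is needed.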


Recommended Deprecated Fields

 COMMENT ON TABLE phenotype IS 'Columns observable_id and assay_id
 are deprecated to break the connection between the phenotype value and the
 trait. The phenotype table should be used to store precomposed terms and the
 phenotype value. Use the phenotype_cvterm table to store the trait(s) associated
 with the phenotype.';

Controlled Vocabularies

The parts of a post-composed statement will need to be described in a cv. These terms could go into a new cv for each type of statement, or into a general post-composed_term cv.

For EQ statements:
Primary Entity
Primary Entity 1
Primary Entity 1 Relationship
Primary Entity 2
Quality
Qualifier
Secondary Entity
Secondary Entity 1
Secondary Entity 1 Relationship
Secondary Entity 2
...
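
As a rough sketch of the second layout (one general cv), the role terms listed above could be registered as follows. It uses Python's sqlite3 module in place of PostgreSQL, and the cv and cvterm tables are simplified: the real Chado cvterm table also requires a dbxref_id, which is omitted here. The helper name load_roles is hypothetical, and only the roles spelled out above are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified cv / cvterm tables (assumption: dbxref_id and other columns omitted).
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv (cv_id),
    name TEXT NOT NULL,
    UNIQUE (cv_id, name)
);
""")

# Roles for EQ statements, exactly as listed on this page.
EQ_ROLES = [
    "Primary Entity", "Primary Entity 1", "Primary Entity 1 Relationship",
    "Primary Entity 2", "Quality", "Qualifier",
    "Secondary Entity", "Secondary Entity 1", "Secondary Entity 1 Relationship",
    "Secondary Entity 2",
]

def load_roles(conn, cv_name, roles):
    """Create the cv if needed and add any role terms it does not yet contain."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (cv_name,)).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO cvterm (cv_id, name) VALUES (?, ?)",
        [(cv_id, role) for role in roles])
    return cv_id

cv_id = load_roles(conn, "post-composed_term", EQ_ROLES)
load_roles(conn, "post-composed_term", EQ_ROLES)  # a second run adds nothing

role_count = conn.execute(
    "SELECT COUNT(*) FROM cvterm WHERE cv_id = ?", (cv_id,)).fetchone()[0]
print(role_count)
```

For the other layout, with one cv per statement type, the same helper could be called with a different cv name for each type of statement.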