Difference between revisions of "Chado Post-Composed Phenotypes"

From GMOD
Jump to: navigation, search
(New Tables)
(Overview)
 
(17 intermediate revisions by 2 users not shown)
Line 5: Line 5:
 
Our goal was to make minimal changes to Chado, and some of those are in the form of recommending deprecating some exiting table fields.
 
Our goal was to make minimal changes to Chado, and some of those are in the form of recommending deprecating some exiting table fields.
  
The proposal below makes use of the newly-proposed Group Module (in 2014) to show syntax structure, and some new proposed tables to be added to the Phenotype Module.
+
'''Update, August 2015'''
 +
The schema changes bellow offer the least changes and will cover storing EAV and EQ statements, post and pre-composed cvterms.
 +
These changes require only adding a new table, phenotype_clause, and a number of new rows in phenotype_cvterm.
 +
We will attempt to introduce these changes in Chado v1.3
  
== Example ==
 
  
An example of an EQ statement and two options for how it could be stored in the proposed revised Phenotype Module are shown below. Both examples used the proposed [[Chado_Group_Module|Group Module]]. New and modified tables are indicated in green.
+
'''Update, Mar 2015:'''
 +
After running a trial on option 2 below, we found that the group table greatly increased the complexity of loading and querying the data, so we decided that rather than permitting an arbitrary level of statement structure hierarchy to force a maximum of 1 level of term grouping. This is expected to be sufficient for most if not all statement structures currently in use.
  
=== Option 1: Group Module is an intrinsic part of the phenotype_cvterm table ===
+
== Proposal ==
[[File:SampleEQstatementInChado.png]]
+
  
===Option 2: Group Module is decoupled from the phenotype_cvterm table===
+
[[File:Chado_phenotype_proposal.clause.jpg]]
[[File:Chado_phenotype_proposal.op2.png]]
+
  
== New and Modified Tables ==
+
 
 +
== New and Modified Tables in Phenotype Module ==
 +
 
 +
  - Add phenotypeprop table.
 +
  - Add phenotype_clause table, used for grouping phenotype_cvterm records into clauses within a statement.
 +
  - Add type_id field to phenotype_cvterm to indicate role of term in a phenotype statement.
 +
  - Add optional phenotype_clause_id field to phenotype_cvterm to permit grouping phenotype_cvterm records into clauses within a statement.
  
 
   CREATE TABLE phenotypeprop (
 
   CREATE TABLE phenotypeprop (
      phenotypeprop_id SERIAL PRIMARY KEY,
+
    phenotypeprop_id SERIAL PRIMARY KEY,
      phenotype_id INT NOT NULL,
+
    phenotype_id INT NOT NULL,
        FOREIGN KEY (phenotype_id) REFERENCES phenotype (phenotype_id) ON DELETE CASCADE INITIALLY DEFERRED,
+
        FOREIGN KEY (phenotype_id) REFERENCES phenotype (phenotype_id) ON DELETE CASCADE INITIALLY DEFERRED,
      type_id INT NOT NULL,
+
    type_id INT NOT NULL,
        FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
+
        FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
      value TEXT NULL,
+
    value TEXT NULL,
      rank INT NOT NULL DEFAULT 0,
+
    rank INT NOT NULL DEFAULT 0,
 
+
      CONSTRAINT phenotypeprop_c1 UNIQUE (phenotypeprop_id,type_id,rank)
+
    CONSTRAINT phenotypeprop_c1 UNIQUE (phenotypeprop_id,type_id,rank)
 +
  );
 +
  COMMENT ON TABLE phenotypeprop IS "This table can be used to attach additional information to a phenotype or trait that is not part of the term or post-composed term. For example, heritability of a trait, dominant/recessive, et cetera.";
 +
 
 +
  CREATE TABLE phenotype_clause (
 +
    phenotype_clause_id SERIAL PRIMARY KEY,
 +
    uniquename TEXT NOT NULL,
 +
    type_id INT NOT NULL,
 +
        FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
 +
    rank INT NOT NULL DEFAULT 0,
 
   );
 
   );
 +
  COMMENT ON TABLE phenotype_clause IS "Used to group phenotype_cvterm records into clauses, as are used in EQ statements where, for example, the primary entity may be a clause constructed with up to 3 terms";
  
  ALTER TABLE phenotype_cvterm  
+
  ALTER TABLE phenotype_cvterm  
 
     ADD COLUMN type_id INT NOT NULL,
 
     ADD COLUMN type_id INT NOT NULL,
        FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
+
      FOREIGN KEY type_id  
  ;
+
        REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
 +
    ADD COLUMN phenotypeclause_id INT,
 +
      FOREIGN KEY (grp_id) REFERENCES grp (grp_id) ON DELETE CASCADE INITIALLY DEFERRED,
 +
  ;
 +
  COMMENT ON COLUMN type_id IS "Name of this cvterm's role in a post-composed term";
 +
  COMMENT ON COLUMN phenotypeclause_id IS "If this term is part of a clause within a statement, this field identifies the clause.";
 +
 
  
 
== Recommended Deprecated Fields ==
 
== Recommended Deprecated Fields ==
Line 42: Line 65:
 
   trait. The phenotype table should be used to store precomposed terms and the  
 
   trait. The phenotype table should be used to store precomposed terms and the  
 
   phenotype value. Use tables phenotype_cvterm to store the trait(s) associated  
 
   phenotype value. Use tables phenotype_cvterm to store the trait(s) associated  
   with the phenotype. Post-composed terms should use the Group Module tables
+
   with the phenotype.';
  to indicate the statement syntax';
+
  
 
== Controlled Vocabularies ==
 
== Controlled Vocabularies ==
Line 61: Line 83:
 
Secondary Entity 2 <br>
 
Secondary Entity 2 <br>
 
... <br>
 
... <br>
 +
 +
 +
== Older proposals ==
 +
 +
See [[Talk:Chado_Post-Composed_Phenotypes]]  for the older versions of this schema proposal

Latest revision as of 23:12, 19 August 2015

Overview

Increasingly phenotypes are rarely indicated with single, pre-composed term. A particular phenotype (or phene) can be described with an EAV statement (entity-attribute-value), or perhaps with more complex EQ statements (Entity-Quality statements in which the Entity and Quality parts themselves may contain several terms), and one expects even more complex statements in the future. In addition to containing multiple terms, these statements have a specific syntax that is critical to the meaning of the statement.

Our goal was to make minimal changes to Chado, and some of those are in the form of recommending deprecating some exiting table fields.

Update, August 2015 The schema changes bellow offer the least changes and will cover storing EAV and EQ statements, post and pre-composed cvterms. These changes require only adding a new table, phenotype_clause, and a number of new rows in phenotype_cvterm. We will attempt to introduce these changes in Chado v1.3


Update, Mar 2015: After running a trial on option 2 below, we found that the group table greatly increased the complexity of loading and querying the data, so we decided that rather than permitting an arbitrary level of statement structure hierarchy to force a maximum of 1 level of term grouping. This is expected to be sufficient for most if not all statement structures currently in use.

Proposal

Chado phenotype proposal.clause.jpg


New and Modified Tables in Phenotype Module

 - Add phenotypeprop table.
 - Add phenotype_clause table, used for grouping phenotype_cvterm records into clauses within a statement.
 - Add type_id field to phenotype_cvterm to indicate role of term in a phenotype statement.
 - Add optional phenotype_clause_id field to phenotype_cvterm to permit grouping phenotype_cvterm records into clauses within a statement.
 CREATE TABLE phenotypeprop (
    phenotypeprop_id SERIAL PRIMARY KEY,
    phenotype_id INT NOT NULL,
       FOREIGN KEY (phenotype_id) REFERENCES phenotype (phenotype_id) ON DELETE CASCADE INITIALLY DEFERRED,
    type_id INT NOT NULL,
       FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
    value TEXT NULL,
    rank INT NOT NULL DEFAULT 0,

    CONSTRAINT phenotypeprop_c1 UNIQUE (phenotypeprop_id,type_id,rank)
 );
 COMMENT ON TABLE phenotypeprop IS "This table can be used to attach additional information to a phenotype or trait that is not part of the term or post-composed term. For example, heritability of a trait, dominant/recessive, et cetera.";
 CREATE TABLE phenotype_clause (
    phenotype_clause_id SERIAL PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INT NOT NULL,
       FOREIGN KEY (type_id) REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
    rank INT NOT NULL DEFAULT 0,
  );
 COMMENT ON TABLE phenotype_clause IS "Used to group phenotype_cvterm records into clauses, as are used in EQ statements where, for example, the primary entity may be a clause constructed with up to 3 terms";
 ALTER TABLE phenotype_cvterm 
   ADD COLUMN type_id INT NOT NULL,
      FOREIGN KEY type_id 
        REFERENCES cvterm (cvterm_id) ON DELETE CASCADE INITIALLY DEFERRED,
   ADD COLUMN phenotypeclause_id INT,
      FOREIGN KEY (grp_id) REFERENCES grp (grp_id) ON DELETE CASCADE INITIALLY DEFERRED,
 ;
 COMMENT ON COLUMN type_id IS "Name of this cvterm's role in a post-composed term";
 COMMENT ON COLUMN phenotypeclause_id IS "If this term is part of a clause within a statement, this field identifies the clause.";


Recommended Deprecated Fields

 COMMENT ON TABLE phenotype IS 'Columns observable_id, assay_id 
 are deprecated to break the connection between the phenotype value and the
 trait. The phenotype table should be used to store precomposed terms and the 
 phenotype value. Use tables phenotype_cvterm to store the trait(s) associated 
 with the phenotype.';

Controlled Vocabularies

The parts of a post-composed statement will need to be described in a cv. This could go into a new cv for each type of statement, or go into a general, post-composed_term cv.

For EQ statements:
Primary Entity
Primary Entity 1
Primary Entity 1 Relationship
Primary Entity 2
Quality
Qualifier
Secondary Entity
Secondary Entity 1
Secondary Entity 1 Relationship
Secondary Entity 2
...


Older proposals

See Talk:Chado_Post-Composed_Phenotypes for the older versions of this schema proposal