Chado Expression Module

From GMOD
Revision as of 19:13, 19 February 2007 by Bosborne

Introduction

This module describes how curated expression data is stored in Chado. It is entirely dependent on the sequence module: objects in the genetic module can connect to expression data only by going via the sequence module. We assume that we'll always have a controlled vocabulary for expression data.

Here is a simple example of the sort of data that FlyBase curates. The dpp transcript is expressed in embryonic stages 13-15 in the cephalic segment, as reported in a paper by Blackman et al. in 1991. This would be implemented in the expression module by linking the dpp transcript feature to an expression via feature_expression (we would add a pub_id column to feature_expression to link to the publication in the pub table). We would then link the following cvterms to the expression using expression_cvterm:

  • embryonic stage 13 where the cvterm_type would be stage and the rank=0
  • embryonic stage 14 where the cvterm_type would be stage and the rank=1
  • embryonic stage 15 where the cvterm_type would be stage and the rank=1
  • cephalic segment where the cvterm_type would be anatomy and the rank=0
  • in situ hybridization where the cvterm_type would be assay and the rank=0
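
The dpp example above can be sketched with a few heavily simplified tables. This is only an illustration, not the actual Chado DDL: it uses Python's sqlite3 module with an in-memory database, the columns are cut down to the ones the example mentions, and the feature name, the citation string, the citation column and the helper expression_terms are assumptions made for the sketch. The rank values are copied from the list above.

```python
import sqlite3

# In-memory database with simplified stand-ins for the tables named above.
# Real Chado tables carry more columns and constraints than shown here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, citation TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE expression (expression_id INTEGER PRIMARY KEY);
CREATE TABLE feature_expression (
    feature_id INTEGER REFERENCES feature(feature_id),
    expression_id INTEGER REFERENCES expression(expression_id),
    pub_id INTEGER REFERENCES pub(pub_id)  -- the proposed pub_id column
);
CREATE TABLE expression_cvterm (
    expression_id INTEGER REFERENCES expression(expression_id),
    cvterm_id INTEGER REFERENCES cvterm(cvterm_id),
    cvterm_type TEXT,   -- free-text slot name, as in the list above
    rank INTEGER
);
""")

# One feature, one publication, one expression, linked by feature_expression.
conn.execute("INSERT INTO feature VALUES (1, 'dpp transcript')")
conn.execute("INSERT INTO pub VALUES (1, 'Blackman et al. 1991')")
conn.execute("INSERT INTO expression VALUES (1)")
conn.execute("INSERT INTO feature_expression VALUES (1, 1, 1)")

# (term name, cvterm_type, rank), copied from the list above.
annotations = [
    ("embryonic stage 13", "stage", 0),
    ("embryonic stage 14", "stage", 1),
    ("embryonic stage 15", "stage", 1),
    ("cephalic segment", "anatomy", 0),
    ("in situ hybridization", "assay", 0),
]
for cvterm_id, (name, cvterm_type, rank) in enumerate(annotations, start=1):
    conn.execute("INSERT INTO cvterm VALUES (?, ?)", (cvterm_id, name))
    conn.execute(
        "INSERT INTO expression_cvterm VALUES (1, ?, ?, ?)",
        (cvterm_id, cvterm_type, rank),
    )


def expression_terms(conn, feature_name):
    """Return (cvterm_type, term name, rank) rows for one feature."""
    rows = conn.execute(
        """
        SELECT ec.cvterm_type, c.name, ec.rank
        FROM feature f
        JOIN feature_expression fe ON fe.feature_id = f.feature_id
        JOIN expression_cvterm ec ON ec.expression_id = fe.expression_id
        JOIN cvterm c ON c.cvterm_id = ec.cvterm_id
        WHERE f.name = ?
        ORDER BY ec.cvterm_type, ec.rank, c.name
        """,
        (feature_name,),
    )
    return rows.fetchall()


for row in expression_terms(conn, "dpp transcript"):
    print(row)
```

Calling expression_terms(conn, "dpp transcript") walks from the feature through feature_expression to the five linked cvterms, which is the path any query from a sequence feature to its curated expression pattern would take in this simplified layout.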

Note that we would change the cvterm_type column to cvterm_type_id and use a cvterm_id for a particular expression slot (i.e. stage, anatomy, assay, 'subcellular location'), and that cvterms from different OBO ontologies can share the same cvterm_type.
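
A minimal sketch of the proposed cvterm_type_id column, again using Python's sqlite3 module and cut-down tables rather than the real Chado schema. The cv name "expression slot", the placeholder vocabularies "ontology A" and "ontology B", the placeholder term in ontology B, and the helpers term_id and cvs_for_slot are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ask SQLite to enforce the REFERENCES clauses below.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER REFERENCES cv(cv_id),
    name TEXT,
    UNIQUE (cv_id, name)
);
CREATE TABLE expression_cvterm (
    expression_id INTEGER,
    cvterm_id INTEGER REFERENCES cvterm(cvterm_id),
    -- the slot is itself a cvterm instead of a free-text label
    cvterm_type_id INTEGER REFERENCES cvterm(cvterm_id),
    rank INTEGER DEFAULT 0
);
""")


def term_id(conn, cv_name, term_name):
    """Return the cvterm_id for (cv, term), creating both if needed."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (cv_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO cvterm (cv_id, name) VALUES (?, ?)",
        (cv_id, term_name),
    )
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
        (cv_id, term_name),
    ).fetchone()[0]


# The slot name lives in its own (hypothetical) vocabulary.
anatomy_type = term_id(conn, "expression slot", "anatomy")

# Two terms from different placeholder ontologies share the same slot.
term_a = term_id(conn, "ontology A", "cephalic segment")
term_b = term_id(conn, "ontology B", "placeholder anatomy term")
for cvterm_id in (term_a, term_b):
    conn.execute(
        "INSERT INTO expression_cvterm (expression_id, cvterm_id, cvterm_type_id) "
        "VALUES (1, ?, ?)",
        (cvterm_id, anatomy_type),
    )


def cvs_for_slot(conn, slot_name):
    """List the vocabularies that contribute terms to one expression slot."""
    rows = conn.execute(
        """
        SELECT DISTINCT cv.name
        FROM expression_cvterm ec
        JOIN cvterm t ON t.cvterm_id = ec.cvterm_type_id
        JOIN cvterm c ON c.cvterm_id = ec.cvterm_id
        JOIN cv ON cv.cv_id = c.cv_id
        WHERE t.name = ?
        ORDER BY cv.name
        """,
        (slot_name,),
    )
    return [name for (name,) in rows]


print(cvs_for_slot(conn, "anatomy"))
```

Because the slot is a foreign key into cvterm, a misspelled slot name cannot silently create a new type, which is the practical difference from the free-text cvterm_type column.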


Tables

expression_cvterm

WARNING open question

What are the possibilities of combination when more than one cvterm is used in a field?

For example (in here):

  <t> E | early <a> <p> anterior & dorsal

If the two terms used in a particular field are co-equal (both from the same CV), is the relation always "&"? May we find "or"? Obviously another case is when a body part term and a body part qualifier term are used in a specific field, e.g.:

  <t> L | third instar <a> larval antennal segment sensilla | subset <p>

WRT the three-part <t><a><p> statements, are the values in the different parts *always* from different vocabularies in proforma.CV? If not, we'll need to have some kind of type qualifier telling us whether the cvterm used is <t>, <a>, or <p>.

Yes, we should have a type qualifier, as a cvterm can come from different vocabularies; e.g. blastoderm can be both a body part term and a stage term in the Drosophila anatomy. However, cvterm_type_id needs to be drawn from a cv instead of being a free-text type.
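
To make the question concrete, here is a small Python sketch that splits a three-part statement like the ones quoted above into its <t>, <a> and <p> fields and keeps the "|" and "&" separators without interpreting them, since their meaning is exactly what is still open. The function names parse_statement and typed_terms are hypothetical, and the blastoderm statement used at the end is a made-up example, not curated data.

```python
import re

# Matches the three field tags used in the statements above.
TAG_RE = re.compile(r"<([tap])>")


def parse_statement(statement):
    """Split a '<t> ... <a> ... <p> ...' statement into its fields.

    Returns {tag: {"terms": [...], "operators": [...]}}. The operators
    ('|' or '&') are recorded as found; no meaning is assigned to them.
    """
    parts = TAG_RE.split(statement)
    # parts looks like [prefix, tag1, body1, tag2, body2, ...]
    slots = {}
    for tag, body in zip(parts[1::2], parts[2::2]):
        tokens = [tok.strip() for tok in re.split(r"([|&])", body)]
        slots[tag] = {
            "terms": [tok for tok in tokens[0::2] if tok],
            "operators": tokens[1::2],
        }
    return slots


def typed_terms(statement):
    """Flatten a statement into (field tag, term) pairs."""
    return [
        (tag, term)
        for tag, slot in parse_statement(statement).items()
        for term in slot["terms"]
    ]


print(parse_statement("<t> E | early <a> <p> anterior & dorsal"))
# Made-up statement: the same term name in two different fields.
print(typed_terms("<t> blastoderm <a> blastoderm"))
```

Keeping the field tag next to each term is what lets the two blastoderm entries stay distinguishable, which is the role the type qualifier would play in expression_cvterm.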