Difference between revisions of "Chado Manual"

From GMOD
m (New page: ==Introduction== ===A Feature is a Sequence=== Chado does not distinguish between a sequence and a sequence feature, on the theory that a feature is a piece of a sequence, and a piece o...)
 
m (Modules)
Line 1: Line 1:
 
 
==Introduction==
 
==Introduction==
  
Line 13: Line 12:
 
===Modules===
 
===Modules===
  
We organised the tables into distinct modular components with tightly
+
We organised the tables into distinct modular components with tightly defined dependencies. This is recogised as good software engineering practice, it allows different software components to focus on the specific data compartments required. It allows for extensibility and schema evolution within specific modules without disrupting the rest of the schema. Finally, it allows for a mix and match approach - it is the authors' hope that the schema modules will be adopted by other model organism and bioinformatics groups; these groups may want to swap in their own table variants within specific modules, or add modules of their own.
defined dependencies. This is recogised as good software engineering
+
practice, it allows different software components to focus on the
+
specific data compartments required. It allows for extensibility and
+
schema evolution within specific modules without disrupting the rest
+
of the schema. Finally, it allows for a mix and match approach - it is
+
the authors' hope that the schema modules will be adopted by other
+
model organism and bioinformatics groups; these groups may want to
+
swap in their own table variants within specific modules, or add
+
modules of their own.
+
 
    
 
    
general      - General/Core
+
* [[Chado Audit Module|Audit]] - for database audits
audit
+
* [[Chado Companalysis Module|Companalysis]] - for data from computational analysis
companalysis
+
* [[Chado Contact Module|Contact]] - for people, groups, and organizations
contact
+
* [[Chado CV Module|Controlled Vocabulary (cv)]] - for controlled vocabularies and ontologies
cv           - Controlled Vocabularies / Ontologies
+
* [[Chado Expression Module|Expression]] - for RNA and protein expression
expression  - RNA and protein expression, including space/time localisation
+
* [[Chado General Module|General]] - for identifiers
genetic Alleles, and relationships between alleles and phenotypes
+
* [[Chado Genetic Module|Genetic]] - for genetic data and genotypes
library
+
* [[Chado Library Module|Library]] - for descriptions of molecular libraries
map
+
* [[Chado Map Module|Map]] - for maps without sequence
organism    - Data related to taxonomy / species
+
* [[Chado Organism Module|Organism]] - for taxonomic data
phenotype
+
* [[Chado Phenotype Module|Phenotype]] - for phenotypic data
phylogeny
+
* [[Chado Phylogeny Module|Phylogeny]] - for organisms and phylogenetic trees
pub         - Publications, bibliographies, and references
+
* [[Chado Publication Module|Publication (pub)]] - for publications and references
sequence    - Biological sequences and annotation
+
* [[Chado Sequence Module|Sequence]] - for sequences and sequence features
stock
+
* [[Chado Stock Module|Stock]] - for specimens and biological collections
www
+
* [[Chado WWW Module|WWW]] -
 
   
 
   
  
DEPENDENCIES
+
====Module Dependencies====
  
 
general:    NO DEPENDENCIES
 
general:    NO DEPENDENCIES
Line 54: Line 44:
  
  
INTER MODULE LINKING TABLES
+
====Inter-module Linking Tables====
  
these can be thought of as floating outside of the respective modules
+
These can be thought of as floating outside of the respective modules they bridge, although they are generally bundled with one or the other
they bridge, although they are generally bundled with one or the other
+
module.
module
+
  
 
<not complete>
 
<not complete>
  
MODULE          MODULE          TABLE
+
{| border="1" cellspacing="0"
------          ------          -----
+
!Module
sequence        expression      feature_expression
+
!Module
cv              expression      expression_cvterm
+
!Table
pub            expression      expression_pub
+
|-
cv              genetic phenotype_cvterm
+
|sequence        |expression      |feature_expression
sequence        genetic feature_genotype
+
|-
general        organism        organism_dbxref
+
|cv              |expression      |expression_cvterm
general        pub            pub_dbxref
+
|-
general pub journal_dbxref
+
|pub            |expression      |expression_pub
pub            sequence        featureprop_pub
+
|-
general        sequence        feature_dbxref
+
|cv              |genetic |phenotype_cvterm
cv              sequence        feature_cvterm
+
|-
organism        sequence        feature_organism
+
|sequence        |genetic |feature_genotype
general sequence feature_synonym
+
|-
general sequence gene_synonym
+
|general        |organism        |organism_dbxref
 +
|-
 +
|general        |pub            |pub_dbxref
 +
|-
 +
|general |pub |journal_dbxref
 +
|-
 +
|pub            |sequence        |featureprop_pub
 +
|-
 +
|general        |sequence        |feature_dbxref
 +
|-
 +
|cv              |sequence        |feature_cvterm
 +
|-
 +
|organism        |sequence        |feature_organism
 +
|-
 +
|general |sequence |feature_synonym
 +
|-
 +
|general |sequence |gene_synonym
 +
|-
 +
|}
  
 
  
 +
===Naming Conventions===
  
1.1 Schema
 
  
 
+
===Design Patterns===
naming convention
+
 
+
design patterns
+
  
  
Line 95: Line 98:
  
 
Module Metadata
 
Module Metadata
 
  
 
===View Layers===
 
===View Layers===

Revision as of 16:34, 14 February 2007

Introduction

A Feature is a Sequence

Chado does not distinguish between a sequence and a sequence feature, on the theory that a feature is a piece of a sequence, and a piece of a sequence is a sequence. Both are represented as a row in the feature table.
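
To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table below is a heavily simplified, hypothetical stand-in for the real feature table (the column subset, the plain-text type column, and the names chr_demo and gene_demo are all assumptions made for illustration). It only shows that a whole sequence and a piece of that sequence are stored the same way, as rows in one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical subset of the feature table's columns.
conn.execute("""
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        uniquename TEXT NOT NULL,
        type       TEXT NOT NULL,  -- stands in for a reference to a cv term
        residues   TEXT
    )
""")

# A chromosome (a "sequence") and a gene (a "sequence feature") are
# both just rows in the same table.
conn.executemany(
    "INSERT INTO feature (uniquename, type, residues) VALUES (?, ?, ?)",
    [
        ("chr_demo", "chromosome", "ATGGCGTTAACC"),
        ("gene_demo", "gene", "GGCGTT"),
    ],
)

rows = conn.execute(
    "SELECT uniquename, type FROM feature ORDER BY feature_id"
).fetchall()
print(rows)
```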

Feature types

Feature types are taken from the SO controlled vocabulary (see also the Controlled Vocabulary section in this document). A selection of Chado-relevant types from SO is shown below:


Modules

We organised the tables into distinct modular components with tightly defined dependencies. This is recognised as good software engineering practice: it allows different software components to focus on the specific data compartments they require. It also allows for extensibility and schema evolution within specific modules without disrupting the rest of the schema. Finally, it allows for a mix-and-match approach: it is the authors' hope that the schema modules will be adopted by other model organism and bioinformatics groups, who may want to swap in their own table variants within specific modules, or add modules of their own.


Module Dependencies

general: NO DEPENDENCIES
organism: general
pub: general
cv: general pub
sequence: cv general pub
genetic: sequence cv general pub
expression: sequence cv general pub
map: sequence cv general pub
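
One practical use of these dependencies is working out an order in which module schemas can be loaded. The sketch below is an illustration only, not part of Chado: it encodes the list above as a Python dictionary (reading the genetic entry as depending on sequence, cv, general and pub) and uses the standard library's graphlib.TopologicalSorter to produce an order in which every module comes after the modules it depends on. The names DEPENDENCIES and load_order are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Direct dependencies as listed above: module -> modules it depends on.
DEPENDENCIES = {
    "general": set(),
    "organism": {"general"},
    "pub": {"general"},
    "cv": {"general", "pub"},
    "sequence": {"cv", "general", "pub"},
    "genetic": {"sequence", "cv", "general", "pub"},
    "expression": {"sequence", "cv", "general", "pub"},
    "map": {"sequence", "cv", "general", "pub"},
}


def load_order(deps):
    """Return a module order in which each module follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = load_order(DEPENDENCIES)
print(order)
```

Several valid orders exist; the only guarantee is that general comes first and that no module appears before something it depends on.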


Inter-module Linking Tables

These can be thought of as floating outside of the respective modules they bridge, although they are generally bundled with one or the other module.

<not complete>

Module      Module        Table
sequence    expression    feature_expression
cv          expression    expression_cvterm
pub         expression    expression_pub
cv          genetic       phenotype_cvterm
sequence    genetic       feature_genotype
general     organism      organism_dbxref
general     pub           pub_dbxref
general     pub           journal_dbxref
pub         sequence      featureprop_pub
general     sequence      feature_dbxref
cv          sequence      feature_cvterm
organism    sequence      feature_organism
general     sequence      feature_synonym
general     sequence      gene_synonym
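
As a rough illustration of how such a linking table bridges two modules, the sketch below builds simplified stand-ins for feature (sequence module), cvterm (cv module) and feature_cvterm in an in-memory SQLite database. The column sets are deliberately cut down and the sample rows are invented; the real Chado tables carry more columns and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for one table from each module.
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT);
    CREATE TABLE cvterm  (cvterm_id  INTEGER PRIMARY KEY, name TEXT);

    -- Linking table: one row per (feature, cvterm) association.
    CREATE TABLE feature_cvterm (
        feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
        cvterm_id  INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
        UNIQUE (feature_id, cvterm_id)
    );

    -- Invented sample data.
    INSERT INTO feature VALUES (1, 'gene_demo');
    INSERT INTO cvterm  VALUES (10, 'kinase activity'), (11, 'nucleus');
    INSERT INTO feature_cvterm VALUES (1, 10), (1, 11);
""")

# Follow the link from a feature to its cv terms.
terms = [
    row[0]
    for row in conn.execute("""
        SELECT c.name
        FROM feature f
        JOIN feature_cvterm fc ON fc.feature_id = f.feature_id
        JOIN cvterm c ON c.cvterm_id = fc.cvterm_id
        WHERE f.uniquename = 'gene_demo'
        ORDER BY c.cvterm_id
    """)
]
print(terms)
```

Neither feature nor cvterm needs a column pointing at the other; the association lives entirely in the linking table, which is why it can be thought of as sitting between the two modules.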


Naming Conventions

Design Patterns

1.1.1 Module System


Module Metadata

View Layers

Views can be thought of as virtual tables. They provide a powerful abstraction layer over the database. All views should be portable across all DBMSs.

Views in Chado are defined on a per-module basis. View definitions are maintained in the chado/modules/MODULE-NAME/views directory.

Included in the view directory are report views. These can usually be found in a file called chado/modules/MODULE-NAME/views/MODULE-NAME-report.sql

Collections of view definitions are bundled into packages; each package is a .sql file.
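
The following sketch shows what "virtual table" means in practice. It is not taken from any Chado view package: the cut-down feature table, the sample rows and the view name feature_type_report are all assumptions, and SQLite is used only because it ships with Python. The view is defined once and then queried exactly like a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical version of the feature table.
    CREATE TABLE feature (
        feature_id INTEGER PRIMARY KEY,
        uniquename TEXT,
        type       TEXT,
        residues   TEXT
    );
    INSERT INTO feature (uniquename, type, residues) VALUES
        ('gene_a', 'gene', 'ATGC'),
        ('gene_b', 'gene', 'ATGCATGC'),
        ('exon_a', 'exon', 'AT');

    -- Hypothetical report view: feature count and total length per type.
    CREATE VIEW feature_type_report AS
        SELECT type,
               COUNT(*)              AS n_features,
               SUM(LENGTH(residues)) AS total_length
        FROM feature
        GROUP BY type;
""")

# The view is queried like an ordinary table.
report = conn.execute(
    "SELECT type, n_features, total_length FROM feature_type_report ORDER BY type"
).fetchall()
print(report)
```

The view definition uses only plain SELECT, COUNT, SUM and GROUP BY, in keeping with the aim that views stay portable across DBMSs.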


Inter-schema Bridges

GODB Bridge


BioSQL Bridge


DBMS Functions

DBMS Functions in Chado are entirely optional.

Functions in Chado are defined on a per-module basis. Function definitions are maintained in the chado/modules/MODULE-NAME/functions directory.

Collections of function definitions are bundled into packages. Each package comes with an interface description and one or more implementations.


Function Interface Definitions

The interface descriptions are stored in a *.sqlapi file. The syntax used is a variant of SQL and is intended primarily as a consistent way of providing information for humans, although it should be parseable by software.

Here is an example, taken from the top of the chado/modules/sequence/functions/subsequence.sqlapi package. This package provides basic subsequencing functions. It has dependencies on two other function packages, declared at the top of the file. The package declares multiple functions, only the first of which is shown here: a function for extracting subsequences from the sequence of a feature.

<sql>
IMPORT reverse_complement(TEXT) FROM 'sequtil';
IMPORT get_feature_relationship_type_id(TEXT) FROM 'sequence-cv-helper';

-- basic subsequencing functions --

DECLARE FUNCTION subsequence(
  srcfeature_id INT REFERENCES feature(feature_id),
  fmin          INT,
  fmax          INT,
  strand        INT
)
RETURNS TEXT;

COMMENT ON FUNCTION subsequence(INT,INT,INT,INT) IS 'extracts a subsequence from a feature referenced by srcfeature_id, within the interbase boundaries determined by fmin and fmax, reverse complementing if strand = -1. The sequence can be DNA or AA. Strand must always be >0 for AA sequences';
</sql>
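
The behaviour described in that comment can be sketched in a few lines of Python. This is an illustration of the documented semantics only, not the Chado implementation: it takes the residue string directly instead of looking it up by srcfeature_id, handles only the bases A, C, G and T, and the bounds check is an added assumption. Interbase coordinates count the boundaries between residues, so the range from fmin to fmax corresponds directly to a Python slice.

```python
# Complement table for plain DNA bases; ambiguity codes are not handled.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def subsequence(residues, fmin, fmax, strand):
    """Extract residues between interbase positions fmin and fmax.

    The result is reverse complemented when strand is -1. For AA
    sequences the caller is expected to pass a strand greater than 0.
    """
    if not 0 <= fmin <= fmax <= len(residues):
        raise ValueError("fmin/fmax outside the source sequence")
    sub = residues[fmin:fmax]
    return reverse_complement(sub) if strand == -1 else sub


print(subsequence("AAACCGTT", 2, 5, 1))   # ACC
print(subsequence("AAACCGTT", 2, 5, -1))  # GGT
```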



Function Implementations

The goal is to provide implementations for different dialects of procedural SQL. Currently only the PostgreSQL dialect (PL/pgSQL) is supported. The PL/pgSQL implementations are stored in *.plpgsql files.