Chado Phylogeny Module

From GMOD

Revision as of 03:48, 2 March 2007

Introduction

This module is for representing phylogenetic trees. A tree may represent the phylogeny of some kind of sequence feature (protein or nucleotide), or it may be an actual organism taxonomy tree.

This module relies heavily on the sequence module. In particular, all the leaf nodes in a tree correspond to features; these will usually be features of type SO:protein or SO:polypeptide (but other trees are possible, e.g. intron trees). If it is desirable to store a multiple alignment for each non-leaf node, then each node can be mapped to a feature of type SO:match. Please refer to the sequence module docs for details on storing multiple alignments.

Annotating nodes

Each node can have a feature attached; for non-leaf nodes, this 'feature' is the multiple alignment. It is these features that are annotated, rather than the nodes themselves. This has several advantages: we can piggyback off the sequence module and reuse its tables, and the leaf nodes may already have annotations attached (for example, GO associations). It is even possible to annotate ranges along an alignment; this would entail creating a new feature that has a featureloc on the alignment feature.

The nested set tree implementation comes by way of Joe Celko; see the excellent introduction by Aaron Mackey at http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html.
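
A minimal sketch of how nested-set indexes can be assigned, assuming the tree is available as a hypothetical mapping from each node label to its child labels (this input format and the function names are illustrative, not part of Chado). A depth-first walk numbers each node once on entry (left_idx) and once on exit (right_idx), so every descendant's pair falls between its ancestor's pair.

```python
from typing import Dict, List, Tuple


def assign_nested_set(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Return {node: (left_idx, right_idx)} via a depth-first traversal.

    `children` is a hypothetical adjacency mapping; leaves may be absent or map to [].
    """
    indexes: Dict[str, Tuple[int, int]] = {}
    counter = 1  # numbering starts at 1 and has no gaps

    def visit(node: str) -> None:
        nonlocal counter
        left = counter          # number assigned on the way down
        counter += 1
        for child in children.get(node, []):
            visit(child)
        indexes[node] = (left, counter)  # number assigned on the way back up
        counter += 1

    visit(root)
    return indexes


def descendants(indexes: Dict[str, Tuple[int, int]], node: str) -> List[str]:
    """All nodes whose interval lies strictly inside `node`'s interval."""
    left, right = indexes[node]
    return sorted(n for n, (l, r) in indexes.items() if left < l and r < right)


if __name__ == "__main__":
    tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1"]}
    idx = assign_nested_set(tree, "root")
    print(idx["root"])            # (1, 12)
    print(descendants(idx, "A"))  # ['a1', 'a2']
```

With gapless numbering, a tree of n nodes uses the indexes 1 to 2n, and every leaf ends up with right_idx = left_idx + 1. The recursion is fine for typical trees but would need to be made iterative for very deep ones.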

Tables

Table: phylonode

This is the most pervasive element in the phylogeny module, cataloging the "phylonodes" of tree graphs. Edges are implied by the parent_phylonode_id reflexive closure. In the nested set implementation, every node's left and right indexes lie *between* its parent's left and right indexes.

phylonode Structure
| F-Key | Name | Type | Description |
| | phylonode_id | serial | PRIMARY KEY |
| phylotree | phylotree_id | integer | UNIQUE#1 UNIQUE#2 NOT NULL |
| phylonode | parent_phylonode_id | integer | Root phylonode can have null parent_phylonode_id value. |
| | left_idx | integer | UNIQUE#1 NOT NULL |
| | right_idx | integer | UNIQUE#2 NOT NULL |
| cvterm | type_id | integer | Type: e.g. root, interior, leaf. |
| feature | feature_id | integer | Phylonodes can have optional features attached to them, e.g. a protein or nucleotide sequence, usually attached to a leaf of the phylotree. For non-leaf nodes, the feature may be an instance of SO:match; this feature is the alignment of all leaf features beneath it. |
| | label | character varying(255) | |
| | distance | double precision | |

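Tables referencing this one via Foreign Key Constraints:
phylonode, phylonode_dbxref, phylonode_organism, phylonode_pub, phylonode_relationship, phylonodeprop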
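
A minimal sketch of the queries the nested set layout makes cheap, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table is a simplified, hypothetical subset of phylonode (no type_id, feature_id or distance, and no phylotree table to reference), with composite UNIQUE constraints mirroring UNIQUE#1 and UNIQUE#2 above; leaves_under and nesting_violations are illustrative names. It assumes the indexes were assigned without gaps, so a leaf is any node with right_idx = left_idx + 1.

```python
import sqlite3

# Simplified, hypothetical phylonode table (SQLite stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE phylonode (
           phylonode_id INTEGER PRIMARY KEY,
           phylotree_id INTEGER NOT NULL,
           parent_phylonode_id INTEGER REFERENCES phylonode(phylonode_id),
           left_idx INTEGER NOT NULL,
           right_idx INTEGER NOT NULL,
           label TEXT,
           UNIQUE (phylotree_id, left_idx),
           UNIQUE (phylotree_id, right_idx)
       )"""
)

# root(1,12) -> A(2,7) -> a1(3,4), a2(5,6); root -> B(8,11) -> b1(9,10)
rows = [
    (1, 1, None, 1, 12, "root"),
    (2, 1, 1, 2, 7, "A"),
    (3, 1, 2, 3, 4, "a1"),
    (4, 1, 2, 5, 6, "a2"),
    (5, 1, 1, 8, 11, "B"),
    (6, 1, 5, 9, 10, "b1"),
]
conn.executemany("INSERT INTO phylonode VALUES (?, ?, ?, ?, ?, ?)", rows)


def leaves_under(conn, label):
    """Leaf labels in the subtree rooted at `label`, in left-to-right order."""
    sql = """
        SELECT child.label
        FROM phylonode AS parent
        JOIN phylonode AS child
          ON child.phylotree_id = parent.phylotree_id
         AND child.left_idx BETWEEN parent.left_idx AND parent.right_idx
        WHERE parent.label = ?
          AND child.right_idx = child.left_idx + 1
        ORDER BY child.left_idx
    """
    return [r[0] for r in conn.execute(sql, (label,))]


def nesting_violations(conn):
    """Labels of nodes whose indexes are not strictly between their parent's."""
    sql = """
        SELECT child.label
        FROM phylonode AS child
        JOIN phylonode AS parent ON parent.phylonode_id = child.parent_phylonode_id
        WHERE NOT (parent.left_idx < child.left_idx AND child.right_idx < parent.right_idx)
    """
    return [r[0] for r in conn.execute(sql)]


print(leaves_under(conn, "root"))  # ['a1', 'a2', 'b1']
print(nesting_violations(conn))    # []
```

A single range condition on left_idx replaces a recursive walk over parent_phylonode_id; the trade-off is that inserting a node means renumbering the indexes to its right.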

Table: phylonode_dbxref

Used, for example, for orthology and paralogy group identifiers; it could also be used for NCBI taxonomy. For sequences, refer to phylonode_feature and the dbxrefs associated with the feature.

phylonode_dbxref Structure
| F-Key | Name | Type | Description |
| | phylonode_dbxref_id | serial | PRIMARY KEY |
| phylonode | phylonode_id | integer | UNIQUE#1 NOT NULL |
| dbxref | dbxref_id | integer | UNIQUE#1 NOT NULL |


Table: phylonode_organism

This linking table should only be used for nodes in taxonomy trees; it provides a mapping between the node and an organism. One node can have zero or one organism; one organism can have zero or more nodes (although typically it should have only one in the standard NCBI taxonomy tree).

phylonode_organism Structure
| F-Key | Name | Type | Description |
| | phylonode_organism_id | serial | PRIMARY KEY |
| phylonode | phylonode_id | integer | UNIQUE NOT NULL. One phylonode cannot refer to >1 organism. |
| organism | organism_id | integer | NOT NULL |
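
A small sketch of the cardinality described above, using Python's sqlite3 module as a stand-in for PostgreSQL and a simplified, hypothetical version of the table. Because the UNIQUE constraint covers phylonode_id alone, a node accepts at most one organism, while the same organism can be linked to several nodes. The link helper and the ID values are illustrative.

```python
import sqlite3

# Simplified, hypothetical phylonode_organism table: UNIQUE on phylonode_id only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE phylonode_organism (
           phylonode_organism_id INTEGER PRIMARY KEY,
           phylonode_id INTEGER NOT NULL UNIQUE,
           organism_id INTEGER NOT NULL
       )"""
)


def link(conn, phylonode_id, organism_id):
    """Map a node to an organism; return False if the node already has one."""
    try:
        conn.execute(
            "INSERT INTO phylonode_organism (phylonode_id, organism_id) VALUES (?, ?)",
            (phylonode_id, organism_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(link(conn, 10, 1))  # True: node 10 -> organism 1
print(link(conn, 11, 1))  # True: one organism may have several nodes
print(link(conn, 10, 2))  # False: node 10 already has an organism
```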


Table: phylonode_pub

phylonode_pub Structure
| F-Key | Name | Type | Description |
| | phylonode_pub_id | serial | PRIMARY KEY |
| phylonode | phylonode_id | integer | UNIQUE#1 NOT NULL |
| pub | pub_id | integer | UNIQUE#1 NOT NULL |


Table: phylonode_relationship

This is for exotic relationships that are not strictly hierarchical; for example, horizontal gene transfer. Use of this table would be highly unusual; most phylogenetic trees are strictly hierarchical. Nevertheless, it is here for completeness.

phylonode_relationship Structure
| F-Key | Name | Type | Description |
| | phylonode_relationship_id | serial | PRIMARY KEY |
| phylonode | subject_id | integer | UNIQUE#1 NOT NULL |
| phylonode | object_id | integer | UNIQUE#1 NOT NULL |
| cvterm | type_id | integer | UNIQUE#1 NOT NULL |
| | rank | integer | |
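
To illustrate why such links are kept out of parent_phylonode_id, this plain-Python sketch (hypothetical in-memory rows, illustrative function names and type label) merges the parent links and the phylonode_relationship rows into one undirected edge set. It assumes the parent links already connect every node to a single root; a connected graph on n nodes is a tree exactly when it has n - 1 edges, so one extra horizontal gene transfer link is enough to break the strict hierarchy.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

# Hypothetical rows: phylonode_id -> parent_phylonode_id (None for the root),
# and phylonode_relationship rows as (subject_id, object_id, type label).
phylonode_parents: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 1, 4: 2, 5: 3}
phylonode_relationships = [(5, 4, "horizontal_gene_transfer")]  # illustrative label


def undirected_edges(
    parents: Dict[int, Optional[int]],
    relationships: Iterable[Tuple[int, int, str]],
) -> Set[FrozenSet[int]]:
    """Collect parent links and extra relationship links as unordered node pairs."""
    edges: Set[FrozenSet[int]] = set()
    for node, parent in parents.items():
        if parent is not None:
            edges.add(frozenset((node, parent)))
    for subject_id, object_id, _type in relationships:
        edges.add(frozenset((subject_id, object_id)))
    return edges


def is_tree(
    parents: Dict[int, Optional[int]],
    relationships: Iterable[Tuple[int, int, str]],
) -> bool:
    """Assumes a connected graph: then it is a tree iff it has n - 1 edges."""
    return len(undirected_edges(parents, relationships)) == len(parents) - 1


print(is_tree(phylonode_parents, []))                       # True
print(is_tree(phylonode_parents, phylonode_relationships))  # False
```

The edge check ignores direction on purpose, so it does not depend on how subject_id and object_id are interpreted for a given relationship type.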


Table: phylonodeprop

phylonodeprop Structure
| F-Key | Name | Type | Description |
| | phylonodeprop_id | serial | PRIMARY KEY |
| phylonode | phylonode_id | integer | UNIQUE#1 NOT NULL |
| cvterm | type_id | integer | UNIQUE#1 NOT NULL. type_id could designate phylonode hierarchy relationships, for example: species taxonomy (kingdom, order, family, genus, species), "ortholog/paralog", "fold/superfold", etc. |
| | value | text | UNIQUE#1 NOT NULL DEFAULT ''::text |
| | rank | integer | UNIQUE#1 NOT NULL |


Table: phylotree

Global anchor for a phylogenetic tree.

phylotree Structure
| F-Key | Name | Type | Description |
| | phylotree_id | serial | PRIMARY KEY |
| dbxref | dbxref_id | integer | NOT NULL |
| | name | character varying(255) | |
| cvterm | type_id | integer | Type: protein, nucleotide, taxonomy, for example. The type should be any SO type, or "taxonomy". |
| | comment | text | |

Tables referencing this one via Foreign Key Constraints:
phylonode, phylotree_pub

Table: phylotree_pub

Tracks citations global to the tree, e.g. the multiple sequence alignment supporting tree construction.

phylotree_pub Structure
| F-Key | Name | Type | Description |
| | phylotree_pub_id | serial | PRIMARY KEY |
| phylotree | phylotree_id | integer | UNIQUE#1 NOT NULL |
| pub | pub_id | integer | UNIQUE#1 NOT NULL |