Chado Phylogeny Module
Introduction
This module is for representing phylogenetic trees. A tree represents either the phylogeny of some kind of sequence feature (mainly proteins) or an actual organism taxonomy.
This module relies heavily on the sequence module. In particular, all the leaf nodes in a tree correspond to features; these will usually be features of type SO:protein or SO:polypeptide (but other trees are possible, e.g. intron trees). If it is desirable to store multiple alignments for each non-leaf node, then each node can be mapped to a feature of type SO:match. Please refer to the sequence module docs for details on storing multiple alignments.
Annotating nodes
Each node can have a feature attached; for non-leaf nodes, this 'feature' is the multiple alignment. It is these features that are annotated, rather than the nodes themselves. This has several advantages: we can piggyback off the sequence module and reuse its tables, and the leaf nodes may already have annotations attached (for example, GO associations). It is even possible to annotate ranges along an alignment; this would entail creating a new feature that has a featureloc on the alignment feature.
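A minimal in-memory sketch of this indirection: a node never carries annotations itself, it only points at a feature (a polypeptide for a leaf, an SO:match alignment for an internal node), and annotations are gathered from those features. The `Feature` and `Node` classes, their fields, the helper functions, and the example GO IDs are hypothetical simplifications, not Chado's actual tables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Feature:
    # Hypothetical stand-in for a sequence-module feature row.
    feature_id: int
    type_name: str  # e.g. "polypeptide" for leaves, "match" for alignments
    annotations: List[str] = field(default_factory=list)  # e.g. GO term IDs


@dataclass
class Node:
    # Hypothetical stand-in for a phylonode row; it holds no annotations itself.
    phylonode_id: int
    parent_id: Optional[int]
    feature_id: Optional[int]


def node_annotations(node: Node, features: Dict[int, Feature]) -> List[str]:
    """Annotations are read from the node's attached feature, never from the node."""
    if node.feature_id is None:
        return []
    return list(features[node.feature_id].annotations)


def subtree_annotations(root_id: int, nodes: Dict[int, Node],
                        features: Dict[int, Feature]) -> List[str]:
    """Collect the feature annotations of a node and all of its descendants."""
    children: Dict[int, List[int]] = {}
    for n in nodes.values():
        if n.parent_id is not None:
            children.setdefault(n.parent_id, []).append(n.phylonode_id)
    found: List[str] = []
    stack = [root_id]
    while stack:
        current = stack.pop()
        found.extend(node_annotations(nodes[current], features))
        stack.extend(children.get(current, []))
    return found


features = {
    10: Feature(10, "polypeptide", ["GO:0005524"]),
    11: Feature(11, "polypeptide", ["GO:0016301"]),
    20: Feature(20, "match"),  # alignment feature for the internal node
}
nodes = {
    1: Node(1, None, 20),
    2: Node(2, 1, 10),
    3: Node(3, 1, 11),
}
print(sorted(subtree_annotations(1, nodes, features)))  # ['GO:0005524', 'GO:0016301']
```

The internal node contributes nothing here because its alignment feature has no annotations yet; annotating it later would only touch the feature, not the tree.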
The tree is stored using the nested set implementation described by Joe Celko; see the excellent introduction by Aaron Mackey at http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html.
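A minimal sketch of nested set numbering, assuming each node gets a (left, right) pair from a single depth-first walk that starts counting at 1. The function names and the dictionary-of-children input are hypothetical; only the left/right interval idea comes from the nested set model.

```python
from typing import Dict, List, Tuple


def nested_set(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (left_idx, right_idx) to every node with one depth-first walk.

    The counter advances on entering and on leaving each node, so every
    node's interval strictly contains the intervals of all its descendants.
    """
    idx: Dict[str, Tuple[int, int]] = {}
    counter = 1

    def visit(node: str) -> None:
        nonlocal counter
        left = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        idx[node] = (left, counter)
        counter += 1

    visit(root)
    return idx


def is_descendant(idx: Dict[str, Tuple[int, int]], node: str, ancestor: str) -> bool:
    """True if node's interval lies strictly inside ancestor's interval."""
    left, right = idx[node]
    a_left, a_right = idx[ancestor]
    return a_left < left and right < a_right


tree = {"root": ["A", "B"], "A": ["a1", "a2"]}
idx = nested_set(tree, "root")
print(idx["root"], idx["A"], idx["a1"])  # (1, 10) (2, 7) (3, 4)
print(is_descendant(idx, "a2", "A"), is_descendant(idx, "B", "A"))  # True False
```

With this numbering a leaf always has right = left + 1, and a node has (right - left - 1) / 2 descendants, so subtree questions become interval comparisons rather than recursive walks. The walk itself is recursive, so a very deep tree could hit Python's recursion limit; an explicit stack would avoid that.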
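Tables
In the field listings below, UNIQUE marks columns that take part in a table's unique constraint, which is often composite; for example, phylonode_pub is unique on the (phylonode_id, pub_id) pair, not on each column separately.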
phylotree
Global anchor for a phylogenetic tree.
Field Name | Data Type | Size | Default Value | Other | Foreign Key
phylotree_id | integer | 11 | | PRIMARY KEY, UNIQUE, NOT NULL |
dbxref_id | integer | 10 | | NOT NULL | dbxref.dbxref_id
name | varchar | 255 | NULL | |
type_id | integer | 10 | | type: protein, nucleotide, taxonomy, ??? | cvterm.cvterm_id
comment | text | 64000 | NULL | REMOVED BY cjm; this is implicit from indexing - see phylonode |
phylotree_pub
Tracks citations global to the tree, e.g. the multiple sequence alignment supporting tree construction.
Field Name | Data Type | Size | Default Value | Other | Foreign Key
phylotree_pub_id | integer | 11 | | PRIMARY KEY, NOT NULL |
phylotree_id | integer | 10 | | UNIQUE, NOT NULL | phylotree.phylotree_id
pub_id | integer | 10 | | UNIQUE, NOT NULL | pub.pub_id
phylonode
This is the most pervasive element in the phylogeny module, cataloging the 'phylonodes' of tree graphs. Edges are implied by the self-referencing parent_phylonode_id column: each non-root node points to its parent node.
Field Name | Data Type | Size | Default Value | Other | Foreign Key
phylonode_id | integer | 11 | | PRIMARY KEY, NOT NULL |
phylotree_id | integer | 10 | | UNIQUE, NOT NULL | phylotree.phylotree_id
parent_phylonode_id | integer | 10 | NULL | root phylonode can have null parent_phylonode_id value | phylonode.phylonode_id
left_idx | integer | 10 | | UNIQUE, NOT NULL, nested set implementation |
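The two ways of reading a tree out of phylonode can be sketched with SQLite: parent_phylonode_id gives the direct edges, and the nested set indexes give a whole subtree in one query. This is a simplified, hypothetical table: the right_idx column is an assumption (standard nested sets need both bounds, but the listing above stops at left_idx), the UNIQUE markers are read as per-tree composite constraints, and the remaining phylonode columns are omitted.

```python
import sqlite3

# Hypothetical, cut-down phylonode table. right_idx is an assumption (the
# listing stops at left_idx); feature and other columns are omitted.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE phylonode (
        phylonode_id        INTEGER PRIMARY KEY,
        phylotree_id        INTEGER NOT NULL,
        parent_phylonode_id INTEGER REFERENCES phylonode (phylonode_id),
        left_idx            INTEGER NOT NULL,
        right_idx           INTEGER NOT NULL,
        UNIQUE (phylotree_id, left_idx),
        UNIQUE (phylotree_id, right_idx)
    )""")

# Tree 1: node 1 is the root; 2 and 3 are its children; 4 and 5 are children of 2.
conn.executemany(
    "INSERT INTO phylonode VALUES (?, ?, ?, ?, ?)",
    [(1, 1, None, 1, 10), (2, 1, 1, 2, 7), (4, 1, 2, 3, 4),
     (5, 1, 2, 5, 6), (3, 1, 1, 8, 9)])


def children(node_id):
    """Direct edges: rows whose parent_phylonode_id points at node_id."""
    rows = conn.execute(
        "SELECT phylonode_id FROM phylonode"
        " WHERE parent_phylonode_id = ? ORDER BY left_idx", (node_id,))
    return [r[0] for r in rows]


def descendants(node_id):
    """Whole subtree: nodes whose interval lies strictly inside node_id's."""
    rows = conn.execute("""
        SELECT d.phylonode_id
        FROM phylonode AS a
        JOIN phylonode AS d
          ON d.phylotree_id = a.phylotree_id
         AND d.left_idx > a.left_idx
         AND d.right_idx < a.right_idx
        WHERE a.phylonode_id = ?
        ORDER BY d.left_idx""", (node_id,))
    return [r[0] for r in rows]


print(children(1))     # [2, 3]
print(descendants(1))  # [2, 4, 5, 3]
print(descendants(4))  # []  (node 4 is a leaf)
```

The parent column is cheap to maintain but needs recursion for deep questions; the interval join answers them in one pass at the cost of renumbering when the tree changes.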
phylonode_dbxref
For example, orthology or paralogy group identifiers; this table could also be used for NCBI taxonomy identifiers. For sequences, use the dbxrefs associated with the node's feature (see 'phylonode_feature') instead.
Field Name | Data Type | Size | Default Value | Other | Foreign Key
phylonode_dbxref_id | integer | 11 | | PRIMARY KEY, NOT NULL |
phylonode_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id
dbxref_id | integer | 10 | | UNIQUE, NOT NULL | dbxref.dbxref_id
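As a sketch of how a group identifier might be used, the query below finds every node linked to one dbxref accession, assuming the UNIQUE markers mean a composite (phylonode_id, dbxref_id) constraint. The dbxref table is cut down to two columns, the helpers are hypothetical, and the accessions GROUP_A and GROUP_B are made-up placeholders for orthology group identifiers.

```python
import sqlite3

# Hypothetical, cut-down tables; GROUP_A / GROUP_B stand in for real
# orthology or paralogy group accessions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dbxref (
        dbxref_id INTEGER PRIMARY KEY,
        accession TEXT NOT NULL
    );
    CREATE TABLE phylonode_dbxref (
        phylonode_dbxref_id INTEGER PRIMARY KEY,
        phylonode_id        INTEGER NOT NULL,
        dbxref_id           INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
        UNIQUE (phylonode_id, dbxref_id)
    );
    INSERT INTO dbxref VALUES (1, 'GROUP_A'), (2, 'GROUP_B');
    INSERT INTO phylonode_dbxref (phylonode_id, dbxref_id)
        VALUES (10, 1), (11, 1), (12, 2);
""")


def link(phylonode_id, dbxref_id):
    """Attach a dbxref to a node; return False if that exact pair already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO phylonode_dbxref (phylonode_id, dbxref_id) VALUES (?, ?)",
                (phylonode_id, dbxref_id))
        return True
    except sqlite3.IntegrityError:
        return False


def nodes_in_group(accession):
    """All nodes linked to the dbxref with the given accession."""
    rows = conn.execute("""
        SELECT pd.phylonode_id
        FROM phylonode_dbxref AS pd
        JOIN dbxref AS d ON d.dbxref_id = pd.dbxref_id
        WHERE d.accession = ?
        ORDER BY pd.phylonode_id""", (accession,))
    return [r[0] for r in rows]


print(nodes_in_group("GROUP_A"))  # [10, 11]
print(link(10, 1))                # False: pair already present
print(link(10, 2))                # True: a node may carry several dbxrefs
print(nodes_in_group("GROUP_B"))  # [10, 12]
```

The same linking pattern applies to the pub tables, with pub_id in place of dbxref_id.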
phylonode_pub
Field Name | Data Type | Size | Default Value | Other | Foreign Key
phylonode_pub_id | integer | 11 | | PRIMARY KEY, NOT NULL |
phylonode_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id
pub_id | integer | 10 | | UNIQUE, NOT NULL | pub.pub_id
phylonode_organism
This linking table should only be used for nodes in taxonomy trees; it provides a mapping between a node and an organism. One node can have zero or one organisms; one organism can have zero or more nodes (although typically it should have only one in the standard NCBI taxonomy tree; should we enforce one only, or allow competing taxonomy trees?).
One phylonode cannot refer to more than one organism.
Field Name | Data Type | Size | Default Value | Other | Foreign Key
phylonode_organism_id | integer | 11 | | PRIMARY KEY, NOT NULL |
phylonode_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id
organism_id | integer | 10 | | NOT NULL | organism.organism_id
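The one-organism-per-node rule can be enforced with a plain UNIQUE constraint on phylonode_id, while organism_id stays non-unique so that an organism can still appear in more than one tree. A minimal SQLite sketch; the `link()` and `nodes_for_organism()` helpers and the numeric IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE phylonode_organism (
        phylonode_organism_id INTEGER PRIMARY KEY,
        phylonode_id          INTEGER NOT NULL UNIQUE,  -- at most one organism per node
        organism_id           INTEGER NOT NULL          -- deliberately not unique
    )""")


def link(phylonode_id, organism_id):
    """Map a node to an organism; return False if the node is already mapped."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO phylonode_organism (phylonode_id, organism_id)"
                " VALUES (?, ?)", (phylonode_id, organism_id))
        return True
    except sqlite3.IntegrityError:
        return False


def nodes_for_organism(organism_id):
    """All nodes (possibly in different trees) mapped to one organism."""
    rows = conn.execute(
        "SELECT phylonode_id FROM phylonode_organism"
        " WHERE organism_id = ? ORDER BY phylonode_id", (organism_id,))
    return [r[0] for r in rows]


print(link(100, 7))           # True
print(link(100, 8))           # False: node 100 already has an organism
print(link(200, 7))           # True: same organism, node in another tree
print(nodes_for_organism(7))  # [100, 200]
```

Adding UNIQUE to organism_id as well would be one way to answer the open question in favour of a single taxonomy tree.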
phylonodeprop
For example, type_id could designate phylonode hierarchy relationships such as species taxonomy (kingdom, order, family, genus, species), "ortholog/paralog", or "fold/superfold".
Field Name | Data Type | Size | Default Value | Other | Foreign Key
phylonodeprop_id | integer | 11 | | PRIMARY KEY, NOT NULL |
phylonode_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id
type_id | integer | 10 | | UNIQUE, NOT NULL | cvterm.cvterm_id
value | text | 64000 | | UNIQUE, NOT NULL |
rank | integer | 10 | 0 | UNIQUE, NOT NULL, not sure how useful the rank concept is here, but I'll leave it in for now |
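A sketch of the taxonomy use mentioned above, assuming type_id points at a cvterm named after the level (e.g. 'genus') and value holds the taxon name; that reading, the cut-down cvterm table, and the `add_prop()`/`nodes_with_type()` helpers are all assumptions. The UNIQUE markers are modelled as one composite constraint over (phylonode_id, type_id, value, rank).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE phylonodeprop (
        phylonodeprop_id INTEGER PRIMARY KEY,
        phylonode_id     INTEGER NOT NULL,
        type_id          INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
        value            TEXT NOT NULL,
        rank             INTEGER NOT NULL DEFAULT 0,
        UNIQUE (phylonode_id, type_id, value, rank)
    );
    INSERT INTO cvterm (cvterm_id, name) VALUES (1, 'genus'), (2, 'species');
""")


def add_prop(node_id, type_name, value, rank=0):
    """Attach a typed property; return False if the identical row already exists."""
    row = conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (type_name,)).fetchone()
    if row is None:
        raise KeyError(f"unknown property type: {type_name}")
    try:
        with conn:
            conn.execute(
                "INSERT INTO phylonodeprop (phylonode_id, type_id, value, rank)"
                " VALUES (?, ?, ?, ?)", (node_id, row[0], value, rank))
        return True
    except sqlite3.IntegrityError:
        return False


def nodes_with_type(type_name):
    """(phylonode_id, value) pairs for every property of the given type."""
    return conn.execute("""
        SELECT p.phylonode_id, p.value
        FROM phylonodeprop AS p
        JOIN cvterm AS c ON c.cvterm_id = p.type_id
        WHERE c.name = ?
        ORDER BY p.phylonode_id, p.rank""", (type_name,)).fetchall()


print(add_prop(5, "genus", "Drosophila"))                 # True
print(add_prop(6, "species", "Drosophila melanogaster"))  # True
print(add_prop(5, "genus", "Drosophila"))                 # False: duplicate row
print(nodes_with_type("genus"))                           # [(5, 'Drosophila')]
```

Because rank is part of the constraint, the only way to store the same node, type, and value twice is to give the second row a different rank, which is about all rank buys in this table.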
phylonode_relationship
This table is for exotic relationships that are not strictly hierarchical, for example horizontal gene transfer. Use of this table should be highly unusual, since most phylogenetic trees are strictly hierarchical; nevertheless, it is included for completeness.
Field Name | Data Type | Size | Default Value | Other | Foreign Key
phylonode_relationship_id | integer | 11 | | PRIMARY KEY, NOT NULL |
subject_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id
object_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id
type_id | integer | 10 | | UNIQUE, NOT NULL | cvterm.cvterm_id
rank | integer | 10 | | |
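To show how this table sits beside the ordinary hierarchy, the sketch below lists the tree edges implied by parent_phylonode_id together with any extra links from phylonode_relationship. The cvterm name 'horizontal_transfer', the cut-down tables, and the composite UNIQUE (subject_id, object_id, type_id) reading are assumptions; subject/object are reported exactly as stored, since no direction convention is stated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phylonode (
        phylonode_id        INTEGER PRIMARY KEY,
        parent_phylonode_id INTEGER REFERENCES phylonode (phylonode_id)
    );
    CREATE TABLE phylonode_relationship (
        phylonode_relationship_id INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES phylonode (phylonode_id),
        object_id  INTEGER NOT NULL REFERENCES phylonode (phylonode_id),
        type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
        rank       INTEGER,
        UNIQUE (subject_id, object_id, type_id)
    );
    INSERT INTO cvterm VALUES (1, 'horizontal_transfer');  -- hypothetical term
    INSERT INTO phylonode VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 3);
    INSERT INTO phylonode_relationship (subject_id, object_id, type_id)
        VALUES (5, 4, 1);
""")


def edges():
    """Tree edges as (child, parent, 'tree') plus extra links as (subject, object, type)."""
    return conn.execute("""
        SELECT phylonode_id, parent_phylonode_id, 'tree'
        FROM phylonode WHERE parent_phylonode_id IS NOT NULL
        UNION ALL
        SELECT r.subject_id, r.object_id, c.name
        FROM phylonode_relationship AS r
        JOIN cvterm AS c ON c.cvterm_id = r.type_id
        ORDER BY 1, 2""").fetchall()


def is_strictly_hierarchical():
    """True when no non-hierarchical links have been recorded."""
    return conn.execute(
        "SELECT COUNT(*) FROM phylonode_relationship").fetchone()[0] == 0


for edge in edges():
    print(edge)
print(is_strictly_hierarchical())  # False
```

Node 5 ends up with two outgoing links, one from its parent column and one from this table, which is exactly the situation a strict tree cannot express.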