Chado Phylogeny Module

From GMOD
Revision as of 20:22, 19 February 2007 by Bosborne (Talk | contribs)

Introduction

This module represents phylogenetic trees. A tree may describe the phylogeny of some kind of sequence feature (mainly proteins), or it may be an actual organism taxonomy tree.

This module relies heavily on the sequence module. In particular, all the leaf nodes in a tree correspond to features; these will usually be features of type SO:protein or SO:polypeptide (but other trees are possible, e.g. intron trees). If it is desirable to store multiple alignments for each non-leaf node, then each node can be mapped to a feature of type SO:match. Please refer to the sequence module docs for details on storing multiple alignments.

Annotating nodes

Each node can have a feature attached; for non-leaf nodes, this 'feature' is the multiple alignment. It is these features that are annotated, rather than the nodes themselves. This has several advantages: we can piggyback on the sequence module and reuse the tables there, and the leaf nodes may already have annotations attached, for example GO associations. In fact, it is even possible to annotate ranges along an alignment; this would entail creating a new feature which has a featureloc on the alignment feature.

Trees are stored using the nested set implementation, by way of Joe Celko; see the excellent introduction by Aaron Mackey at http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html.
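As a rough illustration of the nested set idea, the sketch below numbers the nodes of a toy tree with a depth-first walk and then finds the descendants of a node by comparing index ranges. The tree, the function names, and the right-hand index are assumptions made for this example (the phylonode listing on this page shows only a left_idx column); this is not the module's own loading code.

```python
from typing import Dict, List, Tuple


def assign_nested_set(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Give every node a (left, right) pair using a depth-first walk."""
    index: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        counter += 1
        left = counter  # assigned on the way down
        for child in children.get(node, []):
            visit(child)
        counter += 1  # right index is assigned on the way back up
        index[node] = (left, counter)

    visit(root)
    return index


def descendants(index: Dict[str, Tuple[int, int]], node: str) -> List[str]:
    """A node is a descendant when its interval lies strictly inside the ancestor's."""
    left, right = index[node]
    return sorted(n for n, (l, r) in index.items() if left < l and r < right)


# Hypothetical toy tree: root -> (A, inner), inner -> (B, C)
toy_children = {"root": ["A", "inner"], "inner": ["B", "C"]}
toy_index = assign_nested_set(toy_children, "root")
```

With this numbering, a subtree query becomes a simple range comparison and needs no recursive walk over parent pointers.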

Tables

phylotree

Global anchor for phylogenetic tree

| Field Name | Data Type | Size | Default Value | Other | Foreign Key |
|---|---|---|---|---|---|
| phylotree_id | integer | 11 | | PRIMARY KEY, UNIQUE, NOT NULL | |
| dbxref_id | integer | 10 | | NOT NULL | dbxref.dbxref_id |
| name | varchar | 255 | NULL | | |
| type_id | integer | 10 | | type: protein, nucleotide, taxonomy, ??? | cvterm.cvterm_id |
| comment | text | 64000 | NULL | | |

REMOVED BY cjm; this is implicit from indexing - see phylonode


phylotree_pub

Tracks citations global to the tree, e.g. a multiple sequence alignment supporting tree construction.

| Field Name | Data Type | Size | Default Value | Other | Foreign Key |
|---|---|---|---|---|---|
| phylotree_pub_id | integer | 11 | | PRIMARY KEY, NOT NULL | |
| phylotree_id | integer | 10 | | UNIQUE, NOT NULL | phylotree.phylotree_id |
| pub_id | integer | 10 | | UNIQUE, NOT NULL | pub.pub_id |


phylonode

This is the most pervasive element in the phylogeny module, cataloging the 'phylonodes' of tree graphs. Edges are implied by the parent_phylonode_id reflexive closure: each node points to its parent in the same table.

| Field Name | Data Type | Size | Default Value | Other | Foreign Key |
|---|---|---|---|---|---|
| phylonode_id | integer | 11 | | PRIMARY KEY, NOT NULL | |
| phylotree_id | integer | 10 | | UNIQUE, NOT NULL | phylotree.phylotree_id |
| parent_phylonode_id | integer | 10 | NULL | root phylonode can have null parent_phylonode_id value | phylonode.phylonode_id |
| left_idx | integer | 10 | | UNIQUE, NOT NULL, nested set implementation | |
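To show how edges fall out of the parent_phylonode_id column, here is a minimal sketch that treats a handful of made-up rows as a mapping from phylonode_id to parent_phylonode_id. The row values and helper names are hypothetical; only the two column names come from the table above.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rows: phylonode_id -> parent_phylonode_id (None marks the root).
parent_of: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 1, 4: 3, 5: 3}


def edges(parents: Dict[int, Optional[int]]) -> List[Tuple[int, int]]:
    """List the implied (parent, child) edges; the root contributes none."""
    return sorted((p, c) for c, p in parents.items() if p is not None)


def path_to_root(node_id: int, parents: Dict[int, Optional[int]]) -> List[int]:
    """Follow parent pointers upward until the node with a null parent is reached."""
    path = [node_id]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path
```

For example, `path_to_root(4, parent_of)` walks from node 4 through node 3 to the root, node 1.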

phylonode_dbxref

For example, for orthology or paralogy group identifiers; could also be used for NCBI taxonomy. For sequences, refer to 'phylonode_feature' and the dbxrefs associated with the feature.

| Field Name | Data Type | Size | Default Value | Other | Foreign Key |
|---|---|---|---|---|---|
| phylonode_dbxref_id | integer | 11 | | PRIMARY KEY, NOT NULL | |
| phylonode_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id |
| dbxref_id | integer | 10 | | UNIQUE, NOT NULL | dbxref.dbxref_id |


phylonode_pub

| Field Name | Data Type | Size | Default Value | Other | Foreign Key |
|---|---|---|---|---|---|
| phylonode_pub_id | integer | 11 | | PRIMARY KEY, NOT NULL | |
| phylonode_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id |
| pub_id | integer | 10 | | UNIQUE, NOT NULL | pub.pub_id |


phylonode_organism

This linking table should only be used for nodes in taxonomy trees; it provides a mapping between the node and an organism. One node can have zero or one organisms. One organism can have zero or more nodes (although typically it should only have one, in the standard NCBI taxonomy tree). Should we enforce one only, or allow competing taxonomy trees?

One phylonode cannot refer to more than one organism.

| Field Name | Data Type | Size | Default Value | Other | Foreign Key |
|---|---|---|---|---|---|
| phylonode_organism_id | integer | 11 | | PRIMARY KEY, NOT NULL | |
| phylonode_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id |
| organism_id | integer | 10 | | NOT NULL | organism.organism_id |
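The cardinality rule described for this table (a node maps to at most one organism, while an organism may appear on several nodes) can be sketched with a UNIQUE constraint on phylonode_id alone. The example below uses an in-memory SQLite database purely so that it is self-contained; the foreign keys are left out, and the `link` helper and the id values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE phylonode_organism (
        phylonode_organism_id INTEGER PRIMARY KEY,
        phylonode_id INTEGER NOT NULL UNIQUE,  -- at most one organism per node
        organism_id INTEGER NOT NULL           -- may repeat across nodes
    )
    """
)


def link(phylonode_id: int, organism_id: int) -> bool:
    """Insert a node-organism link; return False if the node is already linked."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO phylonode_organism (phylonode_id, organism_id) VALUES (?, ?)",
                (phylonode_id, organism_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because organism_id carries no uniqueness constraint in this sketch, competing taxonomy trees could each place the same organism on a different node.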


phylonodeprop

For example, "type_id" could designate phylonode hierarchy relationships such as species taxonomy (kingdom, order, family, genus, species), "ortholog/paralog", "fold/superfold", etc.

| Field Name | Data Type | Size | Default Value | Other | Foreign Key |
|---|---|---|---|---|---|
| phylonodeprop_id | integer | 11 | | PRIMARY KEY, NOT NULL | |
| phylonode_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id |
| type_id | integer | 10 | | UNIQUE, NOT NULL | cvterm.cvterm_id |
| value | text | 64000 | | UNIQUE, NOT NULL | |
| rank | integer | 10 | 0 | UNIQUE, NOT NULL, not sure how useful the rank concept is here, but I'll leave it in for now | |
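One possible reading of the type, value, and rank columns is sketched below: properties of a node are grouped by type, and rank orders multiple values of the same type. The rows, the type names (standing in for type_id references to cvterm), and the `props_for` helper are all made up for illustration, and the interpretation of rank is an assumption rather than something this page specifies.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows: (phylonode_id, type name, value, rank)
prop_rows: List[Tuple[int, str, str, int]] = [
    (7, "taxonomic_rank", "genus", 0),
    (7, "note", "second note", 1),
    (7, "note", "first note", 0),
    (8, "taxonomic_rank", "species", 0),
]


def props_for(node_id: int, rows: List[Tuple[int, str, str, int]]) -> Dict[str, List[str]]:
    """Group one node's property values by type, ordered by rank within each type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for pid, type_name, value, _rank in sorted(rows, key=lambda r: r[3]):
        if pid == node_id:
            grouped[type_name].append(value)
    return dict(grouped)
```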

phylonode_relationship

This is for exotic relationships that are not strictly hierarchical, for example horizontal gene transfer. Use of this table would be highly unusual; most phylogenetic trees are strictly hierarchical. Nevertheless, it is here for completeness.

| Field Name | Data Type | Size | Default Value | Other | Foreign Key |
|---|---|---|---|---|---|
| phylonode_relationship_id | integer | 11 | | PRIMARY KEY, NOT NULL | |
| subject_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id |
| object_id | integer | 10 | | UNIQUE, NOT NULL | phylonode.phylonode_id |
| type_id | integer | 10 | | UNIQUE, NOT NULL | cvterm.cvterm_id |
| rank | integer | 10 | | | |
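A non-hierarchical link can be pictured as a typed subject/object pair layered on top of the tree. The sketch below is only illustrative: the relationship type name, the node ids, and the direction convention (the subject is the node related to the object) are assumptions, not definitions from this page.

```python
from typing import List, NamedTuple


class PhylonodeRelationship(NamedTuple):
    subject_id: int
    object_id: int
    type_name: str  # stands in for type_id -> cvterm; the term name is hypothetical


# Hypothetical row: node 4 received genetic material from node 2.
relationships: List[PhylonodeRelationship] = [
    PhylonodeRelationship(subject_id=4, object_id=2, type_name="horizontal_gene_transfer"),
]


def subjects_related_to(node_id: int, rels: List[PhylonodeRelationship], type_name: str) -> List[int]:
    """Return the subject nodes linked to node_id by a relationship of the given type."""
    return [r.subject_id for r in rels if r.object_id == node_id and r.type_name == type_name]
```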