Chado Tables

From GMOD

Table: db

A database authority. Typical databases in bioinformatics are FlyBase, GO, UniProt, NCBI, MGI, etc. The authority is generally known by this shortened form, which is unique within the bioinformatics and biomedical realm. To Do - add support for URIs, URNs (e.g. LSIDs). We can do this by treating the URL as a URI - however, some applications may expect this to be resolvable - to be decided.

db Structure
FK Name Type Description
db_id serial PRIMARY KEY
name character varying(255) UNIQUE NOT NULL
description character varying(255)
urlprefix character varying(255)
url character varying(255)

Table: dbxref

A unique, global, public, stable identifier. Not necessarily an external reference - can reference data items inside the particular chado instance being used. Typically a row in a table can be uniquely identified with a primary identifier (called dbxref_id); a table may also have secondary identifiers (in a linking table <T>_dbxref). A dbxref is generally written as <DB>:<ACCESSION> or as <DB>:<ACCESSION>:<VERSION>.

dbxref Structure
FK Name Type Description
dbxref_id serial PRIMARY KEY

db

db_id integer UNIQUE#1 NOT NULL
accession character varying(255) UNIQUE#1 NOT NULL

The local part of the identifier. Guaranteed by the db authority to be unique for that db.
version character varying(255) UNIQUE#1 NOT NULL DEFAULT ''::character varying
description text

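As a rough illustration of the <DB>:<ACCESSION> and <DB>:<ACCESSION>:<VERSION> notation described for dbxref, the Python sketch below splits such a string into its three parts. The names Dbxref and parse_dbxref, and the rule that neither the db name nor the accession contains a colon, are assumptions made for this example; they are not part of Chado.

```python
from typing import NamedTuple


class Dbxref(NamedTuple):
    db: str         # corresponds to db.name
    accession: str  # corresponds to dbxref.accession
    version: str    # corresponds to dbxref.version; '' mirrors the column default


def parse_dbxref(text: str) -> Dbxref:
    """Split '<DB>:<ACCESSION>' or '<DB>:<ACCESSION>:<VERSION>'.

    Assumption: the db name and the accession contain no colon.
    """
    parts = text.split(":")
    if len(parts) not in (2, 3) or not parts[0] or not parts[1]:
        raise ValueError(f"not a dbxref: {text!r}")
    version = parts[2] if len(parts) == 3 else ""
    return Dbxref(parts[0], parts[1], version)
```

Returning an empty string for a missing version keeps the parsed value aligned with the (db_id, accession, version) unique key, in which version defaults to ''.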


Table: project

project Structure
FK Name Type Description
project_id serial PRIMARY KEY
name character varying(255) UNIQUE NOT NULL
description character varying(255) NOT NULL

Table: tableinfo

tableinfo Structure
FK Name Type Description
tableinfo_id serial PRIMARY KEY
name character varying(30) UNIQUE NOT NULL
primary_key_column character varying(30)
is_view integer NOT NULL
view_on_table_id integer
superclass_table_id integer
is_updateable integer NOT NULL DEFAULT 1
modification_date date NOT NULL DEFAULT now()

Table: cv

A controlled vocabulary or ontology. A cv is composed of cvterms (AKA terms, classes, types, universals - relations and properties are also stored in cvterm) and the relationships between them.

cv Structure
FK Name Type Description
cv_id serial PRIMARY KEY
name character varying(255) UNIQUE NOT NULL

The name of the ontology; cv names uniquely identify the cv. In OBO file format, cv.name corresponds to the namespace.
definition text

A text description of the criteria for membership of this ontology.

Table: cvterm

A term, class, universal or type within an ontology or controlled vocabulary. This table is also used for relations and properties. cvterms constitute nodes in the graph defined by the collection of cvterms and cvterm_relationships.

cvterm Structure
FK Name Type Description
cvterm_id serial PRIMARY KEY

cv

cv_id integer UNIQUE#1 NOT NULL

The cv or ontology or namespace to which this cvterm belongs.
name character varying(1024) UNIQUE#1 NOT NULL

A concise human-readable name or label for the cvterm. Uniquely identifies a cvterm within a cv.
definition text

A human-readable text definition.

dbxref

dbxref_id integer UNIQUE NOT NULL

Primary identifier dbxref - The unique global OBO identifier for this cvterm. Note that a cvterm may have multiple secondary dbxrefs - see also table: cvterm_dbxref.
is_obsolete integer UNIQUE#1 NOT NULL

Boolean 0=false,1=true; see GO documentation for details of obsoletion. Note that two terms with different primary dbxrefs may exist if one is obsolete.
is_relationshiptype integer NOT NULL

Boolean 0=false, 1=true. Relations or relationship types (also known as Typedefs in OBO format, or as properties or slots) form a cv/ontology in themselves. We use this flag to indicate whether this cvterm is an actual term/class/universal or a relation. Relations may be drawn from the OBO Relations ontology, but are not exclusively drawn from there.

Table: cvterm_dbxref

In addition to the primary identifier (cvterm.dbxref_id) a cvterm can have zero or more secondary identifiers/dbxrefs, which may refer to records in external databases. The exact semantics of cvterm_dbxref are not fixed. For example: the dbxref could be a PubMed ID that is pertinent to the cvterm, or it could be an equivalent or similar term in another ontology. For example, GO cvterms are typically linked to InterPro IDs, even though the nature of the relationship between them is largely one of statistical association. The dbxref may have data records attached in the same database instance, or it could be a "hanging" dbxref pointing to some external database. NOTE: If the desired objective is to link two cvterms together, and the nature of the relation is known and holds for all instances of the subject cvterm, then consider instead using cvterm_relationship together with a well-defined relation.

cvterm_dbxref Structure
FK Name Type Description
cvterm_dbxref_id serial PRIMARY KEY

cvterm

cvterm_id integer UNIQUE#1 NOT NULL

dbxref

dbxref_id integer UNIQUE#1 NOT NULL
is_for_definition integer NOT NULL

A cvterm.definition should be supported by one or more references. If this column is true, the dbxref is not for a term in an external database - it is a dbxref for provenance information for the definition.


Table: cvterm_relationship

A relationship linking two cvterms. Each cvterm_relationship constitutes an edge in the graph defined by the collection of cvterms and cvterm_relationships. The meaning of the cvterm_relationship depends on the definition of the cvterm R referred to by type_id. However, in general the definitions are such that the statement "all SUBJs REL some OBJ" is true. The cvterm_relationship statement is about the subject, not the object. For example "insect wing part_of thorax".

cvterm_relationship Structure
FK Name Type Description
cvterm_relationship_id serial PRIMARY KEY

cvterm

type_id integer UNIQUE#1 NOT NULL

The nature of the relationship between subject and object. Note that relations are also housed in the cvterm table, typically from the OBO relationship ontology, although other relationship types are allowed.

cvterm

subject_id integer UNIQUE#1 NOT NULL

The subject of the subj-predicate-obj sentence. The cvterm_relationship is about the subject. In a graph, this typically corresponds to the child node.

cvterm

object_id integer UNIQUE#1 NOT NULL

The object of the subj-predicate-obj sentence. The cvterm_relationship refers to the object. In a graph, this typically corresponds to the parent node.


Table: cvtermpath

The reflexive transitive closure of the cvterm_relationship relation.

cvtermpath Structure
FK Name Type Description
cvtermpath_id serial PRIMARY KEY

cvterm

type_id integer UNIQUE#1

The relationship type that this is a closure over. If null, then this is a closure over ALL relationship types. If non-null, then this references a relationship cvterm - note that the closure will apply to both this relationship AND the OBO_REL:is_a (subclass) relationship.

cvterm

subject_id integer UNIQUE#1 NOT NULL

cvterm

object_id integer UNIQUE#1 NOT NULL

cv

cv_id integer NOT NULL

Closures will mostly be within one cv. If the closure of a relationship traverses a cv, then this refers to the cv of the object_id cvterm.
pathdistance integer UNIQUE#1

The number of steps required to get from the subject cvterm to the object cvterm, counting from zero (reflexive relationship).

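To make the idea of a reflexive transitive closure with a zero-based pathdistance more concrete, here is a minimal Python sketch that derives such rows from a list of (subject, object) edges of a single relationship type. The function name closure and the in-memory edge list are assumptions for illustration, and the sketch keeps only the shortest distance for each pair; it is not the procedure Chado itself uses to populate cvtermpath.

```python
from collections import deque


def closure(edges):
    """Return {(subject, object): pathdistance} for the reflexive
    transitive closure of the given (subject, object) edges.

    Only the shortest distance per pair is kept (a simplification).
    """
    parents = {}
    nodes = set()
    for subj, obj in edges:
        parents.setdefault(subj, []).append(obj)
        nodes.update((subj, obj))

    paths = {}
    for start in nodes:
        paths[(start, start)] = 0  # reflexive row, distance zero
        queue = deque([(start, 0)])
        seen = {start}
        while queue:  # breadth-first walk from subject towards its ancestors
            node, dist = queue.popleft()
            for parent in parents.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    paths[(start, parent)] = dist + 1
                    queue.append((parent, dist + 1))
    return paths
```

Given the edge ("insect wing", "thorax") plus a second, made-up edge from "thorax" to some larger structure, the result contains a distance-0 row for every term, distance-1 rows for the two direct edges, and one distance-2 row from "insect wing" to the larger structure.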

Table: cvtermprop

Additional extensible properties can be attached to a cvterm using this table. Corresponds to "AnnotationProperty" in W3C OWL format.

cvtermprop Structure
FK Name Type Description
cvtermprop_id serial PRIMARY KEY

cvterm

cvterm_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL

The name of the property or slot is a cvterm. The meaning of the property is defined in that cvterm.
value text UNIQUE#1 NOT NULL DEFAULT ''::text

The value of the property, represented as text. Numeric values are converted to their text representation.
rank integer UNIQUE#1 NOT NULL

Property-Value ordering. Any cvterm can have multiple values for any particular property type - these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default 0 value should be used.


Table: cvtermsynonym

A cvterm actually represents a distinct class or concept. A concept can be referred to by different phrases or names. In addition to the primary name (cvterm.name) there can be a number of alternative aliases or synonyms. For example, "T cell" as a synonym for "T lymphocyte".

cvtermsynonym Structure
FK Name Type Description
cvtermsynonym_id serial PRIMARY KEY

cvterm

cvterm_id integer UNIQUE#1 NOT NULL
synonym character varying(1024) UNIQUE#1 NOT NULL

cvterm

type_id integer

A synonym can be exact, narrower, or broader than the cvterm it is attached to.


Table: dbxrefprop

Metadata about a dbxref. Note that this is not defined in the dbxref module, as it depends on the cvterm table. This table has a structure analogous to cvtermprop.

dbxrefprop Structure
FK Name Type Description
dbxrefprop_id serial PRIMARY KEY

dbxref

dbxref_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL
value text NOT NULL DEFAULT ''::text
rank integer UNIQUE#1 NOT NULL


Table: wwwuser

Keep track of WWW users. This may also be useful in an audit module at some point.

wwwuser Structure
FK Name Type Description
wwwuser_id serial PRIMARY KEY
username character varying(32) UNIQUE NOT NULL
password character varying(32) NOT NULL
email character varying(128) NOT NULL
profile text

Table: wwwuser_cvterm

Track wwwuser interest in cvterms.

wwwuser_cvterm Structure
FK Name Type Description
wwwuser_cvterm_id serial PRIMARY KEY

wwwuser

wwwuser_id integer UNIQUE#1 NOT NULL

cvterm

cvterm_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1


Table: wwwuser_expression

Track wwwuser interest in expressions.

wwwuser_expression Structure
FK Name Type Description
wwwuser_expression_id serial PRIMARY KEY

wwwuser

wwwuser_id integer UNIQUE#1 NOT NULL

expression

expression_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1


Table: wwwuser_feature

Track wwwuser interest in features.

wwwuser_feature Structure
FK Name Type Description
wwwuser_feature_id serial PRIMARY KEY

wwwuser

wwwuser_id integer UNIQUE#1 NOT NULL

feature

feature_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1


Table: wwwuser_genotype

Track wwwuser interest in genotypes.

wwwuser_genotype Structure
FK Name Type Description
wwwuser_genotype_id serial PRIMARY KEY

wwwuser

wwwuser_id integer UNIQUE#1 NOT NULL

genotype

genotype_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1


Table: wwwuser_organism

Track wwwuser interest in organisms.

wwwuser_organism Structure
FK Name Type Description
wwwuser_organism_id serial PRIMARY KEY

wwwuser

wwwuser_id integer UNIQUE#1 NOT NULL

organism

organism_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1


Table: wwwuser_phenotype

Track wwwuser interest in phenotypes.

wwwuser_phenotype Structure
FK Name Type Description
wwwuser_phenotype_id serial PRIMARY KEY

wwwuser

wwwuser_id integer UNIQUE#1 NOT NULL

phenotype

phenotype_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1


Table: wwwuser_project

Link wwwuser accounts to projects.

wwwuser_project Structure
FK Name Type Description
wwwuser_project_id serial PRIMARY KEY

wwwuser

wwwuser_id integer UNIQUE#1 NOT NULL

project

project_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1


Table: wwwuser_pub

Track wwwuser interest in publications.

wwwuser_pub Structure
FK Name Type Description
wwwuser_pub_id serial PRIMARY KEY

wwwuser

wwwuser_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1


Table: wwwuserrelationship

Track wwwuser interest in other wwwusers.

wwwuserrelationship Structure
FK Name Type Description
wwwuserrelationship_id serial PRIMARY KEY

wwwuser

objwwwuser_id integer UNIQUE#1 NOT NULL

wwwuser

subjwwwuser_id integer UNIQUE#1 NOT NULL
world_read smallint NOT NULL DEFAULT 1

Generated by PostgreSQL Autodoc

Table: feature

A feature is a biological sequence or a section of a biological sequence, or a collection of such sections. Examples include genes, exons, transcripts, regulatory regions, polypeptides, protein domains, chromosome sequences, sequence variations, cross-genome match regions such as hits and HSPs and so on; see the Sequence Ontology for more.

feature Structure
FK Name Type Description
feature_id serial PRIMARY KEY

dbxref

dbxref_id integer

An optional primary public stable identifier for this feature. Secondary identifiers and external dbxrefs go in the table feature_dbxref.

organism

organism_id integer UNIQUE#1 NOT NULL

The organism to which this feature belongs. This column is mandatory.
name character varying(255)

The optional human-readable common name for a feature, for display purposes.
uniquename text UNIQUE#1 NOT NULL

The unique name for a feature; may not necessarily be human-readable, although this is preferred. This name must be unique for this type of feature within this organism.
residues text

A sequence of alphabetic characters representing biological residues (nucleic acids, amino acids). This column does not need to be manifested for all features; it is optional for features such as exons where the residues can be derived from the featureloc. It is recommended that the value for this column be manifested for features which may have non-contiguous sublocations (e.g. transcripts), since derivation at query time is non-trivial. For expressed sequence, the DNA sequence should be used rather than the RNA sequence.
seqlen integer

The length of the residue feature. See column:residues. This column is partially redundant with the residues column, and also with featureloc. This column is required because the location may be unknown and the residue sequence may not be manifested, yet it may be desirable to store and query the length of the feature. The seqlen should always be manifested where the length of the sequence is known.
md5checksum character(32)

The 32-character checksum of the sequence, calculated using the MD5 algorithm. This is practically guaranteed to be unique for any feature. This column thus acts as a unique identifier on the mathematical sequence.

cvterm

type_id integer UNIQUE#1 NOT NULL

A required reference to a table:cvterm giving the feature type. This will typically be a Sequence Ontology identifier. This column is thus used to subclass the feature table.
is_analysis boolean NOT NULL DEFAULT false

Boolean indicating whether this feature is annotated or the result of an automated analysis. Analysis results also use the companalysis module. Note that the dividing line between analysis and annotation may be fuzzy; this should be determined on a per-project basis in a consistent manner. One requirement is that there should only be one non-analysis version of each wild-type gene feature in a genome, whereas the same gene feature can be predicted multiple times in different analyses.
is_obsolete boolean NOT NULL DEFAULT false

Boolean indicating whether this feature has been obsoleted. Some chado instances may choose to simply remove the feature altogether, others may choose to keep an obsolete row in the table.
timeaccessioned timestamp without time zone NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone

For handling object accession or modification timestamps (as opposed to database auditing data, handled elsewhere). The expectation is that these fields would be available to software interacting with chado.
timelastmodified timestamp without time zone NOT NULL DEFAULT ('now'::text)::timestamp(6) with time zone

For handling object accession or modification timestamps (as opposed to database auditing data, handled elsewhere). The expectation is that these fields would be available to software interacting with chado.

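The seqlen and md5checksum columns of the feature table are both derivable from residues when the sequence is manifested. The Python sketch below shows one way this could be done; the function name sequence_columns is hypothetical, and hashing the residue string exactly as stored (ASCII, no case normalization) is an assumption, since the text above does not say how the string is prepared before hashing.

```python
import hashlib


def sequence_columns(residues: str) -> dict:
    """Derive seqlen and md5checksum from a residue string.

    Assumption: the checksum is taken over the residues exactly as
    stored, encoded as ASCII, with no case normalization.
    """
    digest = hashlib.md5(residues.encode("ascii")).hexdigest()
    return {"seqlen": len(residues), "md5checksum": digest}
```

The hexadecimal digest is always 32 characters long, which matches the character(32) type of the md5checksum column.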


Table: feature_cvterm

Associate a term from a cv with a feature, for example, GO annotation.

feature_cvterm Structure
FK Name Type Description
feature_cvterm_id serial PRIMARY KEY

feature

feature_id integer UNIQUE#1 NOT NULL

cvterm

cvterm_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL

Provenance for the annotation. Each annotation should have a single primary publication (which may be of the appropriate type for computational analyses) where more details can be found. Additional provenance dbxrefs can be attached using feature_cvterm_dbxref.
is_not boolean NOT NULL DEFAULT false

If this is set to true, then this annotation is interpreted as a NEGATIVE annotation - i.e. the feature does NOT have the specified function, process, component, part, etc. See GO docs for more details.

Table: feature_cvterm_dbxref

Additional dbxrefs for an association. Rows in the feature_cvterm table may be backed up by dbxrefs. For example, a feature_cvterm association that was inferred via a protein-protein interaction may be backed up by referring to the dbxref for the alternate protein. Corresponds to the WITH column in a GO gene association file (but can also be used for other analogous associations). See http://www.geneontology.org/doc/GO.annotation.shtml#file for more details.

feature_cvterm_dbxref Structure
FK Name Type Description
feature_cvterm_dbxref_id serial PRIMARY KEY

feature_cvterm

feature_cvterm_id integer UNIQUE#1 NOT NULL

dbxref

dbxref_id integer UNIQUE#1 NOT NULL


Table: feature_cvterm_pub

Secondary pubs for an association. Each feature_cvterm association is supported by a single primary publication. Additional secondary pubs can be added using this linking table (in a GO gene association file, these correspond to any IDs after the pipe symbol in the publications column).

feature_cvterm_pub Structure
FK Name Type Description
feature_cvterm_pub_id serial PRIMARY KEY

feature_cvterm

feature_cvterm_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


Table: feature_cvtermprop

Extensible properties for feature to cvterm associations. Examples: GO evidence codes; qualifiers; metadata such as the date on which the entry was curated and the source of the association. See the featureprop table for meanings of type_id, value and rank.

feature_cvtermprop Structure
FK Name Type Description
feature_cvtermprop_id serial PRIMARY KEY

feature_cvterm

feature_cvterm_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL

The name of the property/slot is a cvterm. The meaning of the property is defined in that cvterm. cvterms may come from the OBO evidence code cv.
value text

The value of the property, represented as text. Numeric values are converted to their text representation. This is less efficient than using native database types, but is easier to query.
rank integer UNIQUE#1 NOT NULL

Property-Value ordering. Any feature_cvterm can have multiple values for any particular property type - these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default 0 value should be used.


Table: feature_dbxref

Links a feature to dbxrefs. This is for secondary identifiers; primary identifiers should use feature.dbxref_id.

feature_dbxref Structure
FK Name Type Description
feature_dbxref_id serial PRIMARY KEY

feature

feature_id integer UNIQUE#1 NOT NULL

dbxref

dbxref_id integer UNIQUE#1 NOT NULL
is_current boolean NOT NULL DEFAULT true

The is_current boolean indicates whether the linked dbxref is the current "official" dbxref for the linked feature.


Table: feature_pub

Provenance. Linking table between features and publications that mention them.

feature_pub Structure
FK Name Type Description
feature_pub_id serial PRIMARY KEY

feature

feature_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL

Table: feature_pubprop

Property or attribute of a feature_pub link.

feature_pubprop Structure
FK Name Type Description
feature_pubprop_id serial PRIMARY KEY

feature_pub

feature_pub_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL
value text
rank integer UNIQUE#1 NOT NULL


Table: feature_relationship

Features can be arranged in graphs, e.g. "exon part_of transcript part_of gene". If type is thought of as a verb, then each arc or edge makes a statement [Subject Verb Object]. The object can also be thought of as parent (containing feature), and subject as child (contained feature or subfeature). We include the relationship rank/order, because even though most of the time we can order things implicitly by sequence coordinates, we cannot always do this - e.g. trans-spliced genes. It is also useful for quickly getting implicit introns.

feature_relationship Structure
FK Name Type Description
feature_relationship_id serial PRIMARY KEY

feature

subject_id integer UNIQUE#1 NOT NULL

The subject of the subj-predicate-obj sentence. This is typically the subfeature.

feature

object_id integer UNIQUE#1 NOT NULL

The object of the subj-predicate-obj sentence. This is typically the container feature.

cvterm

type_id integer UNIQUE#1 NOT NULL

Relationship type between subject and object. This is a cvterm, typically from the OBO relationship ontology, although other relationship types are allowed. The most common relationship type is OBO_REL:part_of. Valid relationship types are constrained by the Sequence Ontology.
value text

Additional notes or comments.
rank integer UNIQUE#1 NOT NULL

The ordering of subject features with respect to the object feature may be important (for example, exon ordering on a transcript - not always derivable if you take trans-spliced genes into consideration). Rank is used to order these; starts from zero.

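As a small illustration of how rank orders subject features under an object feature, the Python sketch below lists the subjects of one object in rank order. It assumes the feature_relationship rows have already been fetched as dictionaries and that the relationship type has been resolved to its name under a hypothetical "type" key (the table itself stores type_id); the function name ordered_subjects is also invented for this example.

```python
def ordered_subjects(rows, object_id, type_name="part_of"):
    """Return the subject_ids related to object_id, sorted by rank.

    Each row is a dict with subject_id, object_id, rank and a
    resolved relationship type name under the key "type".
    """
    matches = [
        row for row in rows
        if row["object_id"] == object_id and row["type"] == type_name
    ]
    return [row["subject_id"] for row in sorted(matches, key=lambda r: r["rank"])]
```

For a transcript with three exons stored with ranks 0, 1 and 2, the function returns the exon ids in that order, whatever order the rows arrived in and without consulting any sequence coordinates.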


Table: feature_relationship_pub

Provenance. Attach optional evidence to a feature_relationship in the form of a publication.

feature_relationship_pub Structure
FK Name Type Description
feature_relationship_pub_id serial PRIMARY KEY

feature_relationship

feature_relationship_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


Table: feature_relationshipprop

Extensible properties for feature_relationships. Analogous structure to featureprop. This table is largely optional and not used with a high frequency. Typical scenarios may be if one wishes to attach additional data to a feature_relationship - for example to say that the feature_relationship is only true in certain contexts.

feature_relationshipprop Structure
FK Name Type Description
feature_relationshipprop_id serial PRIMARY KEY

feature_relationship

feature_relationship_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL

The name of the property/slot is a cvterm. The meaning of the property is defined in that cvterm. Currently there is no standard ontology for feature_relationship property types.
value text

The value of the property, represented as text. Numeric values are converted to their text representation. This is less efficient than using native database types, but is easier to query.
rank integer UNIQUE#1 NOT NULL

Property-Value ordering. Any feature_relationship can have multiple values for any particular property type - these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default 0 value should be used.

Table: feature_relationshipprop_pub

Provenance for feature_relationshipprop.

feature_relationshipprop_pub Structure
FK Name Type Description
feature_relationshipprop_pub_id serial PRIMARY KEY

feature_relationshipprop

feature_relationshipprop_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


Table: feature_synonym

Linking table between feature and synonym.

feature_synonym Structure
FK Name Type Description
feature_synonym_id serial PRIMARY KEY

synonym

synonym_id integer UNIQUE#1 NOT NULL

feature

feature_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL

The pub_id link is for relating the usage of a given synonym to the publication in which it was used.
is_current boolean NOT NULL DEFAULT true

The is_current boolean indicates whether the linked synonym is the current "official" symbol for the linked feature.
is_internal boolean NOT NULL DEFAULT false

Typically a synonym exists so that somebody querying the db with an obsolete name can find the object they are looking for (under its current name). If the synonym has been used publicly and deliberately (e.g. in a paper), it may also be listed in reports as a synonym. If the synonym was not used deliberately (e.g. there was a typo which went public), then the is_internal boolean may be set to "true" so that it is known that the synonym is "internal" and should be queryable but should not be listed in reports as a valid synonym.

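The interplay of is_current and is_internal described for feature_synonym can be sketched in a few lines of Python. The rows are assumed to be dictionaries in which the synonym name has already been joined in under a hypothetical "synonym" key (the table itself stores synonym_id); the function names are likewise invented for illustration.

```python
def synonyms_for_search(rows):
    """Every synonym stays queryable, including internal ones."""
    return [row["synonym"] for row in rows]


def synonyms_for_report(rows):
    """Only non-internal synonyms are listed in reports."""
    return [row["synonym"] for row in rows if not row["is_internal"]]


def official_symbols(rows):
    """Synonyms flagged as the current official symbol."""
    return [row["synonym"] for row in rows if row["is_current"]]
```

A typo that went public would therefore still match a search, but would be left out of any report built with synonyms_for_report.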

Table: featureloc

The location of a feature relative to another feature. Important: interbase coordinates are used. This is vital as it allows us to represent zero-length features (e.g. splice sites, insertion points) without an awkward fuzzy system. Features typically have exactly ONE location, but this need not be the case. Some features may not be localized (e.g. a gene that has been characterized genetically but for which no sequence or molecular information is available). Note on multiple locations: Each feature can have 0 or more locations. Multiple locations do NOT indicate non-contiguous locations (if a feature such as a transcript has a non-contiguous location, then the subfeatures such as exons should always be manifested). Instead, multiple featurelocs for a feature designate alternate locations or grouped locations; for instance, a feature designating a blast hit or HSP will have two locations, one on the query feature, one on the subject feature. Features representing sequence variation could have alternate locations instantiated on a feature on the mutant strain. The column:rank is used to differentiate these different locations. Reflexive locations should never be stored - this is for "proper" (i.e. non-self) locations only; nothing should be located relative to itself.

featureloc Structure
FK Name Type Description
featureloc_id serial PRIMARY KEY

feature

feature_id integer UNIQUE#1 NOT NULL

The feature that is being located. Any feature can have zero or more featurelocs.

feature

srcfeature_id integer

The source feature which this location is relative to. Every location is relative to another feature (however, this column is nullable, because the srcfeature may not be known). All locations are "proper", that is, nothing should be located relative to itself. No cycles are allowed in the featureloc graph.
fmin integer

The leftmost/minimal boundary in the linear range represented by the featureloc. Sometimes (e.g. in Bioperl) this is called "start", although this is confusing because it does not necessarily represent the 5-prime coordinate. Important: This is space-based (interbase) coordinates, counting from zero. To convert this to the leftmost position in a base-oriented system (e.g. GFF, Bioperl), add 1 to fmin.
is_fmin_partial boolean NOT NULL DEFAULT false

This is typically false, but may be true if the value for column:fmin is inaccurate or the leftmost part of the range is unknown/unbounded.
fmax integer

The rightmost/maximal boundary in the linear range represented by the featureloc. Sometimes (e.g. in Bioperl) this is called "end", although this is confusing because it does not necessarily represent the 3-prime coordinate. Important: This is space-based (interbase) coordinates, counting from zero. No conversion is required to go from fmax to the rightmost coordinate in a base-oriented system that counts from 1 (e.g. GFF, Bioperl).
is_fmax_partial boolean NOT NULL DEFAULT false

This is typically false, but may be true if the value for column:fmax is inaccurate or the rightmost part of the range is unknown/unbounded.
strand smallint

The orientation/directionality of the location. Should be 0, -1 or +1.
phase integer

Phase of translation with respect to srcfeature_id. Values are 0, 1, 2. It may not be possible to manifest this column for some features such as exons, because the phase is dependent on the spliceform (the same exon can appear in multiple spliceforms). This column is mostly useful for predicted exons and CDSs.
residue_info text

Alternative residues, when these differ from feature.residues. For instance, a SNP feature located on a wild and mutant protein would have different alternative residues. For alignment/similarity features, the alternative residues are used to represent the alignment string (CIGAR format). Note on variation features: even if we do not want to instantiate a mutant chromosome/contig feature, we can still represent a SNP etc. with 2 locations, one (rank 0) on the genome, the other (rank 1) would have most fields null, except for alternative residues.
locgroup integer UNIQUE#1 NOT NULL

This is used to manifest redundant, derivable extra locations for a feature. The default locgroup=0 is used for the DIRECT location of a feature. Important: most Chado users may never use featurelocs with locgroup > 0. Transitively derived locations are indicated with locgroup > 0. For example, the position of an exon on a BAC and in global chromosome coordinates. This column is used to differentiate these groupings of locations. The default locgroup 0 is used for the main or primary location, from which the others can be derived via coordinate transformations. Another example of redundant locations is storing ORF coordinates relative to both transcript and genome. Redundant locations open the possibility of the database getting into inconsistent states; this schema gives us the flexibility of both warehouse instantiations with redundant locations (easier for querying) and management instantiations with no redundant locations. An example of using both locgroup and rank: imagine a feature indicating a conserved region between the chromosomes of two different species. We may want to keep redundant locations on both contigs and chromosomes. We would thus have 4 locations for the single conserved region feature - two distinct locgroups (contig level and chromosome level) and two distinct ranks (for the two species).
rank integer UNIQUE#1 NOT NULL

Used when a feature has >1 location, otherwise the default rank 0 is used. Some features (e.g. blast hits and HSPs) have two locations - one on the query and one on the subject. Rank is used to differentiate these. Rank=0 is always used for the query, Rank=1 for the subject. For multiple alignments, assignment of rank is arbitrary. Rank is also used for sequence_variant features, such as SNPs. Rank=0 indicates the wildtype (or baseline) feature, Rank=1 indicates the mutant (or compared) feature.
featureloc Constraints
Name Constraint
featureloc_c2 CHECK ((fmin <= fmax))

Tables referencing this one via Foreign Key Constraints:



[edit] Table: featureloc_pub

Provenance of featureloc. Linking table between featurelocs and publications that mention them.

featureloc_pub Structure
FK Name Type Description
featureloc_pub_id serial PRIMARY KEY

featureloc

featureloc_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


[edit] Table: featureprop

A feature can have any number of slot-value property tags attached to it. This is an alternative to hardcoding a list of columns in the relational schema, and is completely extensible.

featureprop Structure
FK Name Type Description
featureprop_id serial PRIMARY KEY

feature

feature_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL

The name of the property/slot is a cvterm. The meaning of the property is defined in that cvterm. Certain property types will only apply to certain feature types (e.g. the anticodon property will only apply to tRNA features); the types here come from the sequence feature property ontology.
value text

The value of the property, represented as text. Numeric values are converted to their text representation. This is less efficient than using native database types, but is easier to query.
rank integer UNIQUE#1 NOT NULL

Property-value ordering. Any feature can have multiple values for any particular property type; these are ordered in a list using rank, counting from zero. For properties that are single-valued rather than multi-valued, the default value 0 should be used.
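
As a rough illustration of rank-ordered, text-valued properties, the following sketch uses a reduced featureprop table in SQLite. The type_id values 10 and 11 are hypothetical cvterm ids (no cvterm table is created), and the property values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced featureprop table; foreign keys to feature and cvterm are omitted.
conn.execute("""
    CREATE TABLE featureprop (
        featureprop_id INTEGER PRIMARY KEY,
        feature_id     INTEGER NOT NULL,
        type_id        INTEGER NOT NULL,
        value          TEXT,
        "rank"         INTEGER NOT NULL DEFAULT 0,
        UNIQUE (feature_id, type_id, "rank")
    )
""")

NOTE, SCORE = 10, 11  # hypothetical cvterm ids for two property types

conn.executemany(
    'INSERT INTO featureprop (feature_id, type_id, value, "rank") '
    'VALUES (?, ?, ?, ?)',
    [
        (1, NOTE, "first note", 0),    # multi-valued property, rank 0
        (1, NOTE, "second note", 1),   # same property type, rank 1
        (1, SCORE, str(0.95), 0),      # numeric value stored as text
    ],
)

SELECT = ('SELECT value FROM featureprop '
          'WHERE feature_id = ? AND type_id = ? ORDER BY "rank"')

# Multi-valued property: the list order comes from rank.
notes = [value for (value,) in conn.execute(SELECT, (1, NOTE))]

# Single-valued numeric property: convert the text back on the way out.
score_text = conn.execute(SELECT, (1, SCORE)).fetchone()[0]
score = float(score_text)
print(notes, score_text, score)
```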

Tables referencing this one via Foreign Key Constraints:



[edit] Table: featureprop_pub

Provenance. Any featureprop assignment can optionally be supported by a publication.

featureprop_pub Structure
FK Name Type Description
featureprop_pub_id serial PRIMARY KEY

featureprop

featureprop_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


[edit] Table: synonym

A synonym for a feature. One feature can have multiple synonyms, and the same synonym can apply to multiple features.

synonym Structure
FK Name Type Description
synonym_id serial PRIMARY KEY
name character varying(255) UNIQUE#1 NOT NULL

The synonym itself. Should be human-readable, machine-searchable ASCII text.

cvterm

type_id integer UNIQUE#1 NOT NULL

Types would be symbol and fullname for now.
synonym_sgml character varying(255) NOT NULL

The fully specified synonym, with any non-ASCII characters encoded in SGML.

Tables referencing this one via Foreign Key Constraints:



[edit] Table: phylonode

This is the most pervasive element in the phylogeny module, cataloging the "phylonodes" of tree graphs. Edges are implied by the parent_phylonode_id reflexive closure. For all nodes in a nested set implementation, the left and right indexes will be *between* the parent's left and right indexes.

phylonode Structure
FK Name Type Description
phylonode_id serial PRIMARY KEY

phylotree

phylotree_id integer UNIQUE#1 UNIQUE#2 NOT NULL

phylonode

parent_phylonode_id integer

Root phylonode can have null parent_phylonode_id value.
left_idx integer UNIQUE#1 NOT NULL
right_idx integer UNIQUE#2 NOT NULL

cvterm

type_id integer

Type: e.g. root, interior, leaf.

feature

feature_id integer

Phylonodes can have optional features attached to them, e.g. a protein or nucleotide sequence, usually attached to a leaf of the phylotree. For non-leaf nodes, the feature may be an instance of SO:match; this feature is the alignment of all leaf features beneath it.
label character varying(255)
distance double precision
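
The nested set property described for left_idx and right_idx can be sketched as follows. This is a minimal example, not the loader any Chado tool actually uses: the five-node tree, its labels and the number_nodes helper are all hypothetical, and the phylonode table is reduced to the columns involved.

```python
import sqlite3

# Hypothetical tree: root -> (interior -> (leafA, leafB), leafC)
children = {
    "root": ["interior", "leafC"],
    "interior": ["leafA", "leafB"],
    "leafA": [],
    "leafB": [],
    "leafC": [],
}


def number_nodes(node, next_idx, out):
    """Depth-first walk assigning (left_idx, right_idx) to every node."""
    left = next_idx
    next_idx += 1
    for child in children[node]:
        next_idx = number_nodes(child, next_idx, out)
    out[node] = (left, next_idx)
    return next_idx + 1


indexes = {}
number_nodes("root", 1, indexes)

ids = {label: i for i, label in enumerate(children, start=1)}
parent_of = {kid: parent for parent, kids in children.items() for kid in kids}

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE phylonode (
        phylonode_id        INTEGER PRIMARY KEY,
        phylotree_id        INTEGER NOT NULL,
        parent_phylonode_id INTEGER REFERENCES phylonode (phylonode_id),
        left_idx            INTEGER NOT NULL,
        right_idx           INTEGER NOT NULL,
        label               TEXT,
        UNIQUE (phylotree_id, left_idx),
        UNIQUE (phylotree_id, right_idx)
    )
""")
conn.executemany(
    "INSERT INTO phylonode VALUES (?, 1, ?, ?, ?, ?)",
    [
        (ids[label], ids.get(parent_of.get(label)), left, right, label)
        for label, (left, right) in indexes.items()
    ],
)

# All descendants of a node lie strictly between its left and right index,
# so no recursive walk over parent_phylonode_id is needed.
descendants = [
    label
    for (label,) in conn.execute("""
        SELECT c.label
        FROM phylonode p
        JOIN phylonode c
          ON c.phylotree_id = p.phylotree_id
         AND c.left_idx  > p.left_idx
         AND c.right_idx < p.right_idx
        WHERE p.label = 'interior'
        ORDER BY c.left_idx
    """)
]
print(indexes["root"], descendants)
```

The parent_phylonode_id column still records the direct edges; the two indexes are a redundant encoding that trades more work on insert for simple range queries over subtrees.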

Tables referencing this one via Foreign Key Constraints:



[edit] Table: phylonode_dbxref

For example, orthology or paralogy group identifiers; could also be used for NCBI taxonomy. For sequences, refer to phylonode_feature and feature-associated dbxrefs.

phylonode_dbxref Structure
FK Name Type Description
phylonode_dbxref_id serial PRIMARY KEY

phylonode

phylonode_id integer UNIQUE#1 NOT NULL

dbxref

dbxref_id integer UNIQUE#1 NOT NULL


[edit] Table: phylonode_organism

This linking table should only be used for nodes in taxonomy trees; it provides a mapping between the node and an organism. One node can have zero or one organisms, one organism can have zero or more nodes (although typically it should only have one in the standard NCBI taxonomy tree).

phylonode_organism Structure
FK Name Type Description
phylonode_organism_id serial PRIMARY KEY

phylonode

phylonode_id integer UNIQUE NOT NULL

One phylonode cannot refer to >1 organism.

organism

organism_id integer NOT NULL


[edit] Table: phylonode_pub

phylonode_pub Structure
FK Name Type Description
phylonode_pub_id serial PRIMARY KEY

phylonode

phylonode_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


[edit] Table: phylonode_relationship

This is for relationships that are not strictly hierarchical; for example, horizontal gene transfer. Most phylogenetic trees are strictly hierarchical; nevertheless, this table is here for completeness.

phylonode_relationship Structure
FK Name Type Description
phylonode_relationship_id serial PRIMARY KEY

phylonode

subject_id integer UNIQUE#1 NOT NULL

phylonode

object_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL
rank integer

phylotree

phylotree_id integer NOT NULL


[edit] Table: phylonodeprop

phylonodeprop Structure
FK Name Type Description
phylonodeprop_id serial PRIMARY KEY

phylonode

phylonode_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL

type_id could designate phylonode hierarchy relationships, for example: species taxonomy (kingdom, order, family, genus, species), "ortholog/paralog", "fold/superfold", etc.
value text UNIQUE#1 NOT NULL DEFAULT ''::text
rank integer UNIQUE#1 NOT NULL


[edit] Table: phylotree

Global anchor for phylogenetic tree.

phylotree Structure
FK Name Type Description
phylotree_id serial PRIMARY KEY

dbxref

dbxref_id integer NOT NULL
name character varying(255)

cvterm

type_id integer

Type: protein, nucleotide, taxonomy, for example. The type should be any SO type, or "taxonomy".

analysis

analysis_id integer
comment text

Tables referencing this one via Foreign Key Constraints:



[edit] Table: phylotree_pub

Tracks citations global to the tree, e.g. a multiple sequence alignment supporting tree construction.

phylotree_pub Structure
FK Name Type Description
phylotree_pub_id serial PRIMARY KEY

phylotree

phylotree_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


[edit] Table: library

library Structure
FK Name Type Description
library_id serial PRIMARY KEY

organism

organism_id integer UNIQUE#1 NOT NULL
name character varying(255)
uniquename text UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL

The type_id foreign key links to a controlled vocabulary of library types. Examples of this would be "cDNA_library" or "genomic_library".

Tables referencing this one via Foreign Key Constraints:



[edit] Table: library_cvterm

The table library_cvterm links a library to controlled vocabularies which describe the library. For instance, there might be a link to the anatomy cv for "head" or "testes" for a head or testes library.

library_cvterm Structure
FK Name Type Description
library_cvterm_id serial PRIMARY KEY

library

library_id integer UNIQUE#1 NOT NULL

cvterm

cvterm_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


[edit] Table: library_feature

library_feature links a library to the clones which are contained in the library. Examples of such linked features might be "cDNA_clone" or "genomic_clone".

library_feature Structure
FK Name Type Description
library_feature_id serial PRIMARY KEY

library

library_id integer UNIQUE#1 NOT NULL

feature

feature_id integer UNIQUE#1 NOT NULL


[edit] Table: library_pub

library_pub Structure
FK Name Type Description
library_pub_id serial PRIMARY KEY

library

library_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


[edit] Table: library_synonym

library_synonym Structure
FK Name Type Description
library_synonym_id serial PRIMARY KEY

synonym

synonym_id integer UNIQUE#1 NOT NULL

library

library_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL

The pub_id link is for relating the usage of a given synonym to the publication in which it was used.
is_current boolean NOT NULL DEFAULT true

The is_current bit indicates whether the linked synonym is the current -official- symbol for the linked library.
is_internal boolean NOT NULL DEFAULT false

Typically a synonym exists so that somebody querying the database with an obsolete name can find the object they are looking for under its current name. If the synonym has been used publicly and deliberately (e.g. in a paper), it may also be listed in reports as a synonym. If the synonym was not used deliberately (e.g., there was a typo which went public), then the is_internal bit may be set to "true" so that it is known that the synonym is "internal" and should be queryable but should not be listed in reports as a valid synonym.
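
One way to read the is_current and is_internal flags is as filters applied at query time. The sketch below assumes reduced synonym and library_synonym tables in SQLite (booleans stored as 0/1, the type_id and synonym_sgml columns of synonym omitted), and the three library names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE synonym (
        synonym_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE library_synonym (
        library_synonym_id INTEGER PRIMARY KEY,
        synonym_id  INTEGER NOT NULL REFERENCES synonym (synonym_id),
        library_id  INTEGER NOT NULL,
        pub_id      INTEGER NOT NULL,
        is_current  INTEGER NOT NULL DEFAULT 1,  -- SQLite stores booleans as 0/1
        is_internal INTEGER NOT NULL DEFAULT 0,
        UNIQUE (synonym_id, library_id, pub_id)
    );
    INSERT INTO synonym (synonym_id, name) VALUES
        (1, 'head cDNA v2'),   -- current official name
        (2, 'head cDNA'),      -- obsolete name, used deliberately in a paper
        (3, 'haed cDNA');      -- typo that went public
    INSERT INTO library_synonym
        (synonym_id, library_id, pub_id, is_current, is_internal) VALUES
        (1, 1, 1, 1, 0),
        (2, 1, 1, 0, 0),
        (3, 1, 1, 0, 1);
""")


def names(condition):
    """Synonym names for library 1 that satisfy the given SQL condition."""
    sql = (
        "SELECT s.name FROM library_synonym ls "
        "JOIN synonym s ON s.synonym_id = ls.synonym_id "
        "WHERE ls.library_id = 1 AND " + condition + " "
        "ORDER BY s.synonym_id"
    )
    return [name for (name,) in conn.execute(sql)]


searchable = names("1 = 1")              # every synonym can be queried
reportable = names("ls.is_internal = 0")  # internal synonyms are hidden in reports
current = names("ls.is_current = 1")      # the official symbol
print(searchable, reportable, current)
```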


[edit] Table: libraryprop

libraryprop Structure
FK Name Type Description
libraryprop_id serial PRIMARY KEY

library

library_id integer UNIQUE#1 NOT NULL

cvterm

type_id integer UNIQUE#1 NOT NULL
value text
rank integer UNIQUE#1 NOT NULL


[edit] Table: contact

Model persons, institutes, groups, organizations, etc.

contact Structure
FK Name Type Description
contact_id serial PRIMARY KEY

cvterm

type_id integer

What type of contact is this? E.g. "person", "lab".
name character varying(255) UNIQUE NOT NULL
description character varying(255)

Tables referencing this one via Foreign Key Constraints:



[edit] Table: contact_relationship

Model relationships between contacts.

contact_relationship Structure
FK Name Type Description
contact_relationship_id serial PRIMARY KEY

cvterm

type_id integer UNIQUE#1 NOT NULL

Relationship type between subject and object. This is a cvterm, typically from the OBO relationship ontology, although other relationship types are allowed.

contact

subject_id integer UNIQUE#1 NOT NULL

The subject of the subj-predicate-obj sentence. In a DAG, this corresponds to the child node.

contact

object_id integer UNIQUE#1 NOT NULL

The object of the subj-predicate-obj sentence. In a DAG, this corresponds to the parent node.
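
The subject-predicate-object reading of a contact_relationship row can be sketched like this. The contacts and the one-row cvterm table are simplified placeholders (the real cvterm table has more columns), and "part_of" is used only as a plausible relationship type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        name       TEXT UNIQUE NOT NULL
    );
    -- Simplified placeholder for cvterm: just an id and a name.
    CREATE TABLE cvterm (
        cvterm_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE contact_relationship (
        contact_relationship_id INTEGER PRIMARY KEY,
        type_id    INTEGER NOT NULL REFERENCES cvterm (cvterm_id),
        subject_id INTEGER NOT NULL REFERENCES contact (contact_id),
        object_id  INTEGER NOT NULL REFERENCES contact (contact_id),
        UNIQUE (type_id, subject_id, object_id)
    );
    INSERT INTO contact (contact_id, name) VALUES
        (1, 'Example Lab'), (2, 'A. Researcher');
    INSERT INTO cvterm (cvterm_id, name) VALUES (1, 'part_of');
    -- subject (child) = A. Researcher, object (parent) = Example Lab
    INSERT INTO contact_relationship (type_id, subject_id, object_id)
        VALUES (1, 2, 1);
""")

# Join each row back to its names to read it as a sentence.
sentences = [
    " ".join(row)
    for row in conn.execute("""
        SELECT subj.name, t.name, obj.name
        FROM contact_relationship r
        JOIN contact subj ON subj.contact_id = r.subject_id
        JOIN cvterm  t    ON t.cvterm_id     = r.type_id
        JOIN contact obj  ON obj.contact_id  = r.object_id
    """)
]
print(sentences)
```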


[edit] Table: stock

Any stock can be globally identified by the combination of organism, uniquename and stock type. A stock is a physical entity, either living or preserved, held by a collection. Stocks belong to a collection; they have IDs, a type, an organism and a description, and may have a genotype.

stock Structure
FK Name Type Description
stock_id serial PRIMARY KEY

dbxref

dbxref_id integer

The dbxref_id is an optional primary stable identifier for this stock. Secondary identifiers and external dbxrefs go in table: stock_dbxref.

organism

organism_id integer UNIQUE#1 NOT NULL

The organism_id is the organism to which the stock belongs. This column is mandatory.
name character varying(255)

The name is a human-readable local name for a stock.
uniquename text UNIQUE#1 NOT NULL
description text

The description is the genetic description provided in the stock list.

cvterm

type_id integer UNIQUE#1 NOT NULL

The type_id foreign key links to a controlled vocabulary of stock types. These would include living stock, genomic DNA and preserved specimen. Secondary cvterms for stocks would go in stock_cvterm.
is_obsolete boolean NOT NULL DEFAULT false

Tables referencing this one via Foreign Key Constraints:



[edit] Table: stock_cvterm

stock_cvterm links a stock to cvterms. This is for secondary cvterms; primary cvterms should use stock.type_id.

stock_cvterm Structure
FK Name Type Description
stock_cvterm_id serial PRIMARY KEY

stock

stock_id integer UNIQUE#1 NOT NULL

cvterm

cvterm_id integer UNIQUE#1 NOT NULL

pub

pub_id integer UNIQUE#1 NOT NULL


[edit] Table: stock_dbxref

stock_dbxref links a stock to dbxrefs. This is for secondary identifiers; primary identifiers should use stock.dbxref_id.

stock_dbxref Structure
FK Name Type Description
stock_dbxref_id serial PRIMARY KEY

stock

stock_id integer UNIQUE#1 NOT NULL

dbxref

dbxref_id integer UNIQUE#1 NOT NULL
is_current boolean NOT NULL DEFAULT true

The is_current boolean indicates whether the linked dbxref is the current -official- dbxref for the linked stock.


[edit] Table: stock_genotype

Simple table linking a stock to a genotype. Features with genotypes can be linked to stocks through feature_genotype -> genotype -> stock_genotype -> stock.

stock_genotype Structure
FK Name Type Description