Chaos-XML is based on the Chado relational model and is a subset of Chado’s content. For a full explanation of the meaning of the elements in Chaos-XML, please refer to the Chado manual and in particular the sequence module documentation.
Chaos XML was created at around the same time that the main Chado software development team at FlyBase devised the official Chado XML format. Chado XML and Chaos XML are semantically very similar, but they differ in how the mapping between XML and the relational database is performed. Chado XML is also considerably more verbose than Chaos XML, because Chaos uses some denormalisations of the Chado model, explained below. In our view these two formats are complementary. Conversions between the formats should be trivial.
Elements in Chaos XML will generally have an equivalent table or column in the Chado relational schema. Thus the Chado documentation should also serve as documentation for the Chaos XML format.
The central concept in Chaos/Chado is a “feature”. A feature can represent any genomic or sequence entity that is typed by the Sequence Ontology (SO).
Features are interconnected in a feature graph using the feature_relationship element, which indicates which exons and proteins belong to which transcript, and which transcripts belong to which gene.
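The feature graph described above can be sketched in a few lines of Python. This is only an illustration: the root element name and the child element names used here (feature_id, subject_id, object_id, type) are assumptions modelled on the Chado column names, so check them against the DTD before relying on them. The sample data is invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample; element names are assumptions, not taken from the DTD.
SAMPLE = """\
<chaos>
  <feature><feature_id>gene1</feature_id><type>gene</type></feature>
  <feature><feature_id>mrna1</feature_id><type>mRNA</type></feature>
  <feature><feature_id>exon1</feature_id><type>exon</type></feature>
  <feature_relationship>
    <subject_id>mrna1</subject_id><object_id>gene1</object_id><type>part_of</type>
  </feature_relationship>
  <feature_relationship>
    <subject_id>exon1</subject_id><object_id>mrna1</object_id><type>part_of</type>
  </feature_relationship>
</chaos>
"""


def children_by_parent(xml_text):
    """Map each parent (object) feature id to the ids of its child (subject) features."""
    root = ET.fromstring(xml_text)
    children = defaultdict(list)
    for rel in root.iter("feature_relationship"):
        parent = rel.findtext("object_id")
        child = rel.findtext("subject_id")
        children[parent].append(child)
    return dict(children)


graph = children_by_parent(SAMPLE)
print(graph)
```

Here the subject of each relationship is treated as the part and the object as the whole, following the Chado convention for feature_relationship.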
The location of a feature, relative to another feature, is described by the featureloc element. All locations are interbase: coordinates count from 0, not 1, and it is the gaps between bases that are counted, not the bases themselves. In contrast to Chado, which uses fmin/fmax to indicate the left and right coordinates, Chaos uses nbeg/nend to indicate the five-prime (natural start) and three-prime (natural end) coordinates.
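The relationship between the two coordinate conventions can be made concrete with a small Python sketch. The function names are hypothetical and are not part of the Chaos library. One assumption is made loudly: a zero-length location (nbeg equal to nend) is treated as forward strand, since direction cannot be inferred from the coordinates alone.

```python
from typing import NamedTuple, Tuple


class ChadoLoc(NamedTuple):
    fmin: int    # left interbase coordinate
    fmax: int    # right interbase coordinate
    strand: int  # 1 for forward, -1 for reverse


def chaos_to_chado(nbeg: int, nend: int) -> ChadoLoc:
    """Convert directional (nbeg, nend) to (fmin, fmax, strand).

    Assumption: nbeg == nend is reported as forward strand.
    """
    if nbeg <= nend:
        return ChadoLoc(nbeg, nend, 1)
    return ChadoLoc(nend, nbeg, -1)


def chado_to_chaos(fmin: int, fmax: int, strand: int) -> Tuple[int, int]:
    """Convert (fmin, fmax, strand) back to directional (nbeg, nend)."""
    return (fmax, fmin) if strand < 0 else (fmin, fmax)


def interbase_to_one_based(fmin: int, fmax: int) -> Tuple[int, int]:
    """Convert an interbase interval to inclusive 1-based base positions."""
    return fmin + 1, fmax


# A reverse-strand feature whose natural start is at 200 and natural end at 100.
loc = chaos_to_chado(200, 100)
print(loc, interbase_to_one_based(loc.fmin, loc.fmax))
```

Note that the length of a feature in interbase coordinates is simply fmax - fmin, with no off-by-one adjustment.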
The Chaos-XML Library consists of specifications and software for dealing with Chaos-XML files.
The DTD specification can be found in chaos-xml/dtd/chaos.dtd.
Soon there will also be specifications as XML Schema and/or Relax-NG.
XSL transformations can be found in the chaos-xml/xsl/ directory.
Example Chaos-XML can be found in the chaos-xml/sample-data/ directory.
The scripts are in the chaos-xml/bin/ directory. You need to install the Perl chaos library before running these scripts.
You can browse the Perl modules in the chaos-xml/lib/ directory. To install, download the chaos-xml library and follow the instructions in the chaos-xml/INSTALL file.
There are XSLT stylesheets defined for mapping between the two similar formats, Chado XML and Chaos XML; see the chaos-xml/xsl directory.
If you are not familiar with XSLT, you can use the scripts included in this distribution (see the chaos-xml/bin/ directory).
As new modules are added to Chado (for example, the genetics module and the phylogeny module), corresponding Chaos-XML DTDs will be generated.
Chaos uses nbeg and nend as opposed to the fmin and fmax found in Chado's featureloc table.
Chaos collapses the normalised Chado table dbxref into a single “dbxrefstr” PCDATA element.
Chaos uses a PCDATA “type” element in both feature and feature_relationship. In Chado, types are represented as a foreign key into the cvterm table. In Chaos the type string is implicitly mapped to the cvterm with the same name as the type, from the Sequence Ontology (SO) CV.
Chaos uses an “organismstr” PCDATA element to represent the normalised Chado table organism.
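To illustrate how the collapsed string elements relate to the normalised Chado tables, here is a minimal Python sketch that expands them again. The string layouts are assumptions, not something stated above: dbxrefstr is taken to be "DB:accession" and organismstr to be "Genus species". The function names are hypothetical, and the returned keys follow the Chado column names (db name and accession for dbxref, genus and species for organism).

```python
def split_dbxrefstr(dbxrefstr):
    """Split an assumed 'DB:accession' string into db name and accession.

    Only the first colon is treated as the separator, so accessions that
    themselves contain colons are preserved.
    """
    db, sep, accession = dbxrefstr.partition(":")
    if not sep or not db or not accession:
        raise ValueError("expected 'DB:accession', got %r" % dbxrefstr)
    return {"db": db, "accession": accession}


def split_organismstr(organismstr):
    """Split an assumed 'Genus species' string into genus and species."""
    genus, _, species = organismstr.strip().partition(" ")
    return {"genus": genus, "species": species.strip()}


print(split_dbxrefstr("FlyBase:FBgn0000001"))
print(split_organismstr("Drosophila melanogaster"))
```

The same idea applies to the type element: a loader would look up the cvterm whose name equals the type string instead of storing the string itself.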