ParameciumDB (http://paramecium.cgm.cnrs-gif.fr) is a model organism database for the unicellular eukaryote Paramecium tetraurelia. ParameciumDB contains genome sequence and annotations, alleles and RNAi knockdowns, mutant phenotypes, and stocks all in a tightly integrated package. ParameciumDB is a good example of an online biological resource built mainly with GMOD Components.
This article provides an overview of Paramecium followed by a description of how ParameciumDB was implemented using GMOD components.
This page is intended to give a sense of how ParameciumDB uses GMOD and what challenges its developers faced.
Paramecium is a unicellular eukaryote that belongs to the ciliate phylum. Ciliates are the only unicellular organisms that separate germinal and somatic functions. Diploid but silent micronuclei undergo meiosis and transmit the genetic information to the next sexual generation. Highly polyploid macronuclei express the genetic information but develop anew at each sexual generation, through extensive programmed rearrangements of the genome.
Paramecium is a model for studying
The somatic genome has been sequenced by Genoscope using a whole genome shotgun approach. That assembly and subsequent analysis have resulted in:
ParameciumDB is maintained by two people, Linda Sperling and Olivier Arnaiz at the Centre de Genetique Moleculaire, a part of the Centre National de la Recherche Scientifique. ParameciumDB is mainly implemented with GMOD Components.
ParameciumDB first came online in August 2005.
This section covers some details of how ParameciumDB was implemented and how it is maintained. It focuses on how GMOD Components are used, but also touches on other technologies.
ParameciumDB is built on the Chado schema and implemented in the PostgreSQL database management system.
ParameciumDB uses core Chado modules, plus the Genetic and Stock modules.
ParameciumDB uses the Chado General Module to handle database IDs and cross-references.
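As a rough illustration of what handling database IDs and cross-references involves, the sketch below builds a much-simplified stand-in for the General Module's `db` and `dbxref` tables in an in-memory SQLite database. The real Chado tables have more columns and run on PostgreSQL; the helper `add_dbxref` and the names `ExampleDB`, `OtherDB`, and `ACC0001` are hypothetical and are not taken from ParameciumDB.

```python
import sqlite3

# Simplified stand-in for Chado's db and dbxref tables (assumption: the real
# tables carry extra columns such as version and description).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db (
    db_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id     INTEGER NOT NULL REFERENCES db(db_id),
    accession TEXT NOT NULL,
    UNIQUE (db_id, accession)
);
""")


def add_dbxref(conn, db_name, accession):
    """Return the dbxref_id for (db_name, accession), creating rows if needed."""
    conn.execute("INSERT OR IGNORE INTO db (name) VALUES (?)", (db_name,))
    db_id = conn.execute(
        "SELECT db_id FROM db WHERE name = ?", (db_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO dbxref (db_id, accession) VALUES (?, ?)",
        (db_id, accession),
    )
    return conn.execute(
        "SELECT dbxref_id FROM dbxref WHERE db_id = ? AND accession = ?",
        (db_id, accession),
    ).fetchone()[0]


# The same accession in the same source database maps to one cross-reference;
# the same accession in a different source database is a separate record.
first = add_dbxref(conn, "ExampleDB", "ACC0001")
again = add_dbxref(conn, "ExampleDB", "ACC0001")
other = add_dbxref(conn, "OtherDB", "ACC0001")
```

Keeping the source database and the accession in separate tables means that an identifier is only unique within its source, which is the point of a cross-reference table.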
The Chado Publication Module is another core Chado module. ParameciumDB does not manually curate publications, but it does mine PubMed entries for Paramecium allele references.
The Chado Sequence Module, another core module, is used to represent sequence features and synteny.
Because of the recent whole genome duplication, a great deal of thought has been given to how to represent paralogy and synteny. These are represented in the sequence module using the feature, featureloc, and feature_relationship tables.
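To make the roles of the three tables concrete, here is a minimal sketch using an in-memory SQLite database. It is not ParameciumDB's actual encoding: the tables are cut down to a few columns, types are stored as plain text rather than as references to the CV module, and the feature names, coordinates, and the `paralogous_to` relationship type are all illustrative assumptions.

```python
import sqlite3

# Cut-down versions of three Chado sequence-module tables (assumption: the
# real schema uses type_id foreign keys to cvterm and has more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type       TEXT NOT NULL
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL,
    strand        INTEGER
);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    type       TEXT NOT NULL
);
""")

# Hypothetical data: two scaffolds, one gene on each, linked as paralogs.
conn.executemany(
    "INSERT INTO feature (feature_id, uniquename, type) VALUES (?, ?, ?)",
    [
        (1, "scaffold_A", "supercontig"),
        (2, "scaffold_B", "supercontig"),
        (3, "gene_a1", "gene"),
        (4, "gene_b1", "gene"),
    ],
)
conn.executemany(
    "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, strand) "
    "VALUES (?, ?, ?, ?, ?)",
    [(3, 1, 100, 900, 1), (4, 2, 250, 1050, -1)],
)
conn.execute(
    "INSERT INTO feature_relationship (subject_id, object_id, type) "
    "VALUES (3, 4, 'paralogous_to')"
)


def paralogs(conn, uniquename):
    """List (paralog, scaffold, fmin, fmax) for relationships where the
    named feature is the subject; the reverse direction is not followed."""
    return conn.execute(
        """
        SELECT p.uniquename, s.uniquename, l.fmin, l.fmax
        FROM feature g
        JOIN feature_relationship r ON r.subject_id = g.feature_id
        JOIN feature p ON p.feature_id = r.object_id
        JOIN featureloc l ON l.feature_id = p.feature_id
        JOIN feature s ON s.feature_id = l.srcfeature_id
        WHERE g.uniquename = ? AND r.type = 'paralogous_to'
        ORDER BY p.uniquename
        """,
        (uniquename,),
    ).fetchall()


hits = paralogs(conn, "gene_a1")
```

In this sketch `feature` holds both the scaffolds and the genes, `featureloc` places each gene on its scaffold, and `feature_relationship` links the two gene copies. Comparing the locations of linked genes across scaffolds is one way that conserved gene order could be derived from such tables.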
The core Chado CV Module is used to store these ontologies:
The last two were developed at ParameciumDB to enable phenotypes to be modeled using the Entity-Quality model. The quality terms are provided by PATO. The anatomy ontology was developed for the Entity terms, since more granular ‘cellular component’ terms than are available in GO were needed to describe some species- or phylum-specific traits and cytological features, such as nuclear dimorphism and the ciliate cortex.
Ultimately, the ciliate- and Paramecium-specific terms in the Paramecium Anatomy Ontology will be proposed for integration into the GO Cell Component Ontology. This still requires more work on the definitions and on identifying the appropriate place for the new terms in the GO Cell Component hierarchy.
The assay ontology will hopefully also be incorporated into a broader assay ontology in the future.
To create new phenotypes, we use the Phenote tool.
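The Entity-Quality model mentioned above pairs a term for the thing affected (the entity) with a term for how it is affected (the quality). The sketch below shows that pairing as a small data structure. The class names, the placeholder accessions, and the example terms are assumptions made for illustration; they are not actual entries from PATO or from the Paramecium Anatomy Ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One ontology term: source ontology, accession, human-readable name."""
    ontology: str
    accession: str
    name: str


@dataclass(frozen=True)
class Phenotype:
    """An Entity-Quality statement: which entity, and which quality it shows."""
    entity: Term
    quality: Term

    def describe(self):
        return f"{self.entity.name}: {self.quality.name}"


# Placeholder terms (hypothetical accessions, not real ontology IDs).
entity = Term("Paramecium Anatomy Ontology", "EXAMPLE:0001", "macronucleus")
quality = Term("PATO", "EXAMPLE:0002", "abnormal shape")

phenotype = Phenotype(entity=entity, quality=quality)
summary = phenotype.describe()
```

Because entity and quality come from separate ontologies, a new anatomy term can be combined with existing quality terms without defining a new phenotype vocabulary for each combination.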
The Chado Genetic Module is used to model information about Paramecium alleles, genetic interactions and phenotypes.
The genetic module is tightly linked to the Stock Module.
The Chado Stock Module, which is now a standard Chado extension module, originated at ParameciumDB.
This module was necessary to allow integration of Paramecium Stock Collections into ParameciumDB.
ParameciumDB uses Turnkey, a generic Web framework built on Apache, mod_perl, and SQLFairy, that takes a relational schema of a given database as input and transforms it into a fully-functional and customizable web site within minutes. We use templates and cascading style sheets to customize the ParameciumDB web interface.
GBrowse, the Generic Genome Browser, is used to display and query sequence annotation with a Bio::DB::SeqFeature::Store database.
ParameciumDB does not have paid curators. It currently relies on the community for annotation of the gene models. They use Apollo as their genome annotation editor.
The Bio::Chado API is a Perl middleware module for working with Chado databases. It was developed specifically for the BioPipe project (so that BioPipe users can choose to store pipeline results in a Chado database as opposed to an EnsEMBL database) and for ParameciumDB.