ParameciumDB

ParameciumDB (http://paramecium.cgm.cnrs-gif.fr) is a model organism database for the unicellular eukaryote Paramecium tetraurelia. ParameciumDB contains genome sequence and annotations, alleles and RNAi knockdowns, mutant phenotypes, and stocks, all in a tightly integrated package. ParameciumDB is a good example of an online biological resource built mainly with GMOD Components.

This article provides an overview of Paramecium followed by a description of how ParameciumDB was implemented using GMOD components.

The intent of this page is to give you a feeling for how ParameciumDB uses GMOD, and for the challenges its developers faced.

Paramecium Biology

Paramecium is a unicellular eukaryote that belongs to the ciliate phylum. Ciliates are the only unicellular organisms that separate germinal and somatic functions. Diploid but silent micronuclei undergo meiosis and transmit the genetic information to the next sexual generation. Highly polyploid macronuclei express the genetic information but develop anew at each sexual generation, through extensive programmed rearrangements of the genome.

Paramecium is a model for studying

Paramecium tetraurelia Genome

The somatic genome has been sequenced by Genoscope using a whole genome shotgun approach. That assembly and subsequent analysis have resulted in:

ParameciumDB

ParameciumDB is maintained by two people, Linda Sperling and Olivier Arnaiz at the Centre de Genetique Moleculaire, a part of the Centre National de la Recherche Scientifique. ParameciumDB is mainly implemented with GMOD Components.

ParameciumDB first came online in August 2005.

Implementation

This section covers some details of how ParameciumDB was implemented and how it is maintained. It focuses on how GMOD Components are used, but also touches on other technologies.

Database

ParameciumDB is built on the Chado schema and implemented in the PostgreSQL database management system.

Design Overview

Figure: ParameciumDB data model (ParameciumDataModel.jpg)

Chado Modules

ParameciumDB uses core Chado modules, plus the Genetic and Stock modules.

General Module

ParameciumDB uses the Chado General Module to handle database IDs and cross-references.
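As a rough illustration of how database IDs and cross-references fit together, the sketch below mimics the General Module's db and dbxref tables in an in-memory SQLite database. It is a simplified stand-in, not the ParameciumDB setup: the real Chado tables carry more columns and live in PostgreSQL, and the helper names (add_xref, list_xrefs), database names, and accessions here are hypothetical.

```python
import sqlite3

# Simplified stand-ins for Chado's db and dbxref tables.
# Assumption: only the columns needed for the illustration are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db (
    db_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id     INTEGER NOT NULL REFERENCES db(db_id),
    accession TEXT NOT NULL,
    UNIQUE (db_id, accession)
);
""")


def add_xref(conn, db_name, accession):
    """Store one cross-reference, creating the db row on first use.

    Returns the dbxref_id, which other tables would point at.
    """
    conn.execute("INSERT OR IGNORE INTO db (name) VALUES (?)", (db_name,))
    db_id = conn.execute(
        "SELECT db_id FROM db WHERE name = ?", (db_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO dbxref (db_id, accession) VALUES (?, ?)",
        (db_id, accession),
    )
    return conn.execute(
        "SELECT dbxref_id FROM dbxref WHERE db_id = ? AND accession = ?",
        (db_id, accession),
    ).fetchone()[0]


def list_xrefs(conn):
    """Return every cross-reference as a 'database:accession' string."""
    rows = conn.execute(
        "SELECT db.name, dbxref.accession "
        "FROM dbxref JOIN db ON db.db_id = dbxref.db_id "
        "ORDER BY db.name, dbxref.accession"
    )
    return [f"{name}:{accession}" for name, accession in rows]


# Hypothetical database names and accessions, for illustration only.
first = add_xref(conn, "PubMed", "0000001")
add_xref(conn, "ExampleDB", "EX0001")
again = add_xref(conn, "PubMed", "0000001")  # duplicate: same row is reused

print(list_xrefs(conn))
```

The unique constraint on (db_id, accession) is what lets the same external identifier be referenced from many places without being stored twice.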

Pub Module

The Chado Publication Module is another core Chado module. ParameciumDB does not manually curate publications, but it does mine PubMed entries for Paramecium allele references.

Sequence Module

The Chado Sequence Module, another core module, is used to represent sequence features and synteny. Because of the recent whole genome duplication, a great deal of thought has been given to how to represent paralogy and synteny. These are represented in the sequence module using the feature, featureloc, and feature_relationship tables.
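The following is a minimal sketch of how a paralog pair could be expressed with these three tables, using an in-memory SQLite database. It makes several assumptions: the tables are cut down to a few columns, feature and relationship types are stored as plain text rather than as references to controlled vocabulary terms, the relationship type name paralogous_to is chosen for illustration, and all feature names and coordinates are invented. It is not a description of how ParameciumDB actually stores paralogy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the Chado sequence tables (assumption: most columns omitted).
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL UNIQUE,
    type       TEXT NOT NULL
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL,
    strand        INTEGER
);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    type       TEXT NOT NULL
);
""")

# Hypothetical scaffolds and genes.
conn.executemany(
    "INSERT INTO feature (feature_id, uniquename, type) VALUES (?, ?, ?)",
    [
        (1, "scaffold_A", "supercontig"),
        (2, "scaffold_B", "supercontig"),
        (3, "geneA", "gene"),
        (4, "geneB", "gene"),
    ],
)
# Each gene is located on its scaffold via featureloc.
conn.executemany(
    "INSERT INTO featureloc (feature_id, srcfeature_id, fmin, fmax, strand) "
    "VALUES (?, ?, ?, ?, ?)",
    [(3, 1, 100, 900, 1), (4, 2, 2000, 2800, -1)],
)
# One row records that the two genes are paralogs.
conn.execute(
    "INSERT INTO feature_relationship (subject_id, object_id, type) "
    "VALUES (3, 4, 'paralogous_to')"
)


def paralogs_of(conn, uniquename):
    """Return (paralog, scaffold, fmin, fmax) for each paralog of a gene.

    The relationship is treated as symmetric, so the gene may appear
    as either subject or object.
    """
    rows = conn.execute(
        """
        SELECT other.uniquename, src.uniquename, fl.fmin, fl.fmax
        FROM feature AS gene
        JOIN feature_relationship AS fr
          ON fr.type = 'paralogous_to'
         AND (fr.subject_id = gene.feature_id OR fr.object_id = gene.feature_id)
        JOIN feature AS other
          ON other.feature_id IN (fr.subject_id, fr.object_id)
         AND other.feature_id != gene.feature_id
        JOIN featureloc AS fl ON fl.feature_id = other.feature_id
        JOIN feature AS src ON src.feature_id = fl.srcfeature_id
        WHERE gene.uniquename = ?
        ORDER BY other.uniquename
        """,
        (uniquename,),
    )
    return rows.fetchall()


print(paralogs_of(conn, "geneA"))
```

Because scaffolds are themselves rows in feature, the same join answers both "which genes are paralogs" and "where do they sit", which is the information a synteny view needs.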

Controlled Vocabulary Module

The core Chado CV Module is used to store these ontologies:

The last two were developed at ParameciumDB to enable phenotypes to be modeled using the Entity-Quality model. The quality terms are provided by PATO. The anatomy ontology was developed for the Entity terms, since more granular ‘cellular component’ terms than are available in GO were needed to describe some species- or phylum-specific traits and cytological features, such as nuclear dimorphism and the ciliate cortex.

Ultimately, the ciliate- and Paramecium-specific terms in the Paramecium Anatomy Ontology will be proposed for integration into the GO Cell Component Ontology. This still requires more work on the definitions and on identifying the appropriate place for the new terms in the GO Cell Component hierarchy.

The assay ontology will hopefully also be incorporated into a broader assay ontology in the future.

To create new phenotypes, we use the Phenote tool.
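The Entity-Quality model described above pairs a term naming what is affected with a term naming how it is affected. The sketch below shows that pairing as a small Python data structure. The class names, ontology labels, accessions, and term names are all placeholders invented for the illustration; they are not taken from the Paramecium Anatomy Ontology or from PATO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One ontology term, identified by ontology name plus accession."""
    ontology: str
    accession: str
    name: str

    @property
    def curie(self):
        return f"{self.ontology}:{self.accession}"


@dataclass(frozen=True)
class Phenotype:
    """An Entity-Quality statement: the entity affected and the quality observed."""
    entity: Term
    quality: Term

    def describe(self):
        return f"{self.entity.name}: {self.quality.name}"


# Placeholder terms (hypothetical accessions and names).
cilium = Term("ANATOMY", "0000000", "cilium")
short = Term("QUALITY", "0000000", "decreased length")

phenotype = Phenotype(entity=cilium, quality=short)
print(phenotype.describe())
print(phenotype.entity.curie, phenotype.quality.curie)
```

Keeping entity and quality as separate terms means a new phenotype can be composed from existing vocabulary instead of requiring a new pre-combined term for every observation.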

Genetic Module

The Chado Genetic Module is used to model information about Paramecium alleles, genetic interactions, and phenotypes.

The genetic module is tightly linked to the Stock Module.

Stock Module

The Chado Stock Module, which is now a standard Chado extension module, originated at ParameciumDB.

This module was necessary to allow integration of Paramecium Stock Collections into ParameciumDB.
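To give a feeling for the link between the Genetic and Stock modules, the sketch below connects stocks to alleles through genotypes in an in-memory SQLite database. The table names follow Chado (stock, genotype, stock_genotype, feature, feature_genotype), but the columns are heavily reduced and every stock, genotype, and allele name is hypothetical; treat it as an illustration of the idea rather than the actual ParameciumDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced versions of the Chado tables (assumption: most columns omitted).
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY, uniquename TEXT UNIQUE, type TEXT);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE stock    (stock_id    INTEGER PRIMARY KEY, uniquename TEXT UNIQUE);
CREATE TABLE feature_genotype (
    feature_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    genotype_id INTEGER NOT NULL REFERENCES genotype(genotype_id)
);
CREATE TABLE stock_genotype (
    stock_id    INTEGER NOT NULL REFERENCES stock(stock_id),
    genotype_id INTEGER NOT NULL REFERENCES genotype(genotype_id)
);
""")

# Hypothetical alleles, genotypes, and stocks.
conn.executemany(
    "INSERT INTO feature VALUES (?, ?, 'allele')",
    [(1, "allele-1"), (2, "allele-2")],
)
conn.executemany(
    "INSERT INTO genotype VALUES (?, ?)",
    [(1, "genotype-1"), (2, "genotype-2")],
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [(1, "stock-001"), (2, "stock-002"), (3, "stock-003")],
)
# genotype-1 carries allele-1; genotype-2 carries both alleles.
conn.executemany(
    "INSERT INTO feature_genotype VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2)],
)
# stock-003 has no recorded genotype.
conn.executemany(
    "INSERT INTO stock_genotype VALUES (?, ?)",
    [(1, 1), (2, 2)],
)


def stocks_carrying(conn, allele_name):
    """Return the names of stocks whose genotype includes the given allele."""
    rows = conn.execute(
        """
        SELECT DISTINCT stock.uniquename
        FROM feature
        JOIN feature_genotype ON feature_genotype.feature_id = feature.feature_id
        JOIN stock_genotype   ON stock_genotype.genotype_id = feature_genotype.genotype_id
        JOIN stock            ON stock.stock_id = stock_genotype.stock_id
        WHERE feature.uniquename = ?
        ORDER BY stock.uniquename
        """,
        (allele_name,),
    )
    return [name for (name,) in rows]


print(stocks_carrying(conn, "allele-1"))
print(stocks_carrying(conn, "allele-2"))
```

The genotype acts as the shared key: alleles attach to it on the genetic side and stocks attach to it on the collection side, so a question about an allele can be answered with a list of stocks.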

Web Site

Figure: ParameciumDB home page (ParameciumDBHomePage.png)

Turnkey / GMODWeb

ParameciumDB uses Turnkey, a generic Web framework built on Apache, mod_perl, and SQLFairy, that takes a relational schema of a given database as input and transforms it into a fully-functional and customizable web site within minutes. We use templates and cascading style sheets to customize the ParameciumDB web interface.

GBrowse

GBrowse, the Generic Genome Browser, is used to display and query sequence annotation, backed by a Bio::DB::SeqFeature::Store database.

Annotation

ParameciumDB does not have paid curators. It currently relies on the community for annotation of the gene models. Community annotators use Apollo as their genome annotation editor.

Middleware

Bio::Chado API

The Bio::Chado API is a Perl middleware module for working with Chado databases. It was developed specifically for the BioPipe project (so that BioPipe users can choose to store pipeline results in a Chado database as opposed to an EnsEMBL database) and for ParameciumDB.

Comments

Issues

Feedback

Related Reading
