Chado Comparative Schema

From GMOD
Revision as of 18:00, 8 November 2007 by Sperling (Talk | contribs)

Overview

This article describes strategies for storing comparative data in Chado. To kickstart the discussion, the ex-TIGR crew will detail the strategy used in the Sybil system. We will discuss and critique it in the discussion section.

Goal

The goal of this article and the ensuing discussion is to define a usage convention for the storage of comparative data in Chado. These data include paralogous and orthologous gene clusters and syntenic blocks (and potentially more?).

Strategies

Sybil/IGS

Basic overview (more to come)

Sybil defines orthologous/paralogous clusters as features of SO type 'match'.

The 'match' features are joined to a particular clustering analysis via analysisfeature.

The 'match' features are joined to the cluster members ('polypeptide' features in this case) via the featureloc table. The 'match' feature_id is the feature_id of the featureloc entry while the 'polypeptide' feature_id is the srcfeature_id.
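The joins described above can be sketched with a small in-memory database. This is only an illustration under stated assumptions: the tables are heavily simplified stand-ins for the Chado feature, featureloc, analysis and analysisfeature tables (the real feature table references cvterm through type_id rather than holding a type string), and all feature names, the analysis name, and the helper functions are hypothetical. Each cluster member gets its own featureloc row with a distinct rank, since one 'match' feature carries several locations.

```python
import sqlite3

# Simplified stand-ins for the Chado tables named in the text.
# Real Chado tables carry more columns (organism_id, type_id, etc.).
SCHEMA = """
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type TEXT NOT NULL
);
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE analysisfeature (
    analysisfeature_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    analysis_id INTEGER NOT NULL REFERENCES analysis(analysis_id)
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER REFERENCES feature(feature_id),
    rank INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, rank)
);
"""


def build_demo_db():
    """Create an in-memory database holding one hypothetical cluster."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
        (1, "cluster_0001", "match"),        # the cluster itself
        (2, "orgA_pep_17", "polypeptide"),   # cluster member
        (3, "orgB_pep_42", "polypeptide"),   # cluster member
    ])
    conn.execute("INSERT INTO analysis VALUES (1, 'demo_clustering_run')")
    # Tie the 'match' feature to the clustering analysis.
    conn.execute(
        "INSERT INTO analysisfeature (feature_id, analysis_id) VALUES (1, 1)")
    # One featureloc row per member: feature_id is the 'match',
    # srcfeature_id is the 'polypeptide'.
    conn.executemany(
        "INSERT INTO featureloc (feature_id, srcfeature_id, rank) "
        "VALUES (?, ?, ?)",
        [(1, 2, 0), (1, 3, 1)])
    return conn


def cluster_members(conn, analysis_name):
    """Return (cluster, member) name pairs for one clustering analysis."""
    sql = """
        SELECT c.uniquename, m.uniquename
        FROM analysis a
        JOIN analysisfeature af ON af.analysis_id = a.analysis_id
        JOIN feature c ON c.feature_id = af.feature_id AND c.type = 'match'
        JOIN featureloc fl ON fl.feature_id = c.feature_id
        JOIN feature m ON m.feature_id = fl.srcfeature_id
                      AND m.type = 'polypeptide'
        WHERE a.name = ?
        ORDER BY c.uniquename, fl.rank
    """
    return conn.execute(sql, (analysis_name,)).fetchall()
```

Calling `cluster_members(build_demo_db(), "demo_clustering_run")` lists both polypeptides under `cluster_0001`. Filtering on the analysis name is what would let several clustering runs coexist in the same database.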

Cluster assignments are based on bi-directional best BLASTP hits. Top BLASTP hits are stored in the database as 'match' and 'match_part' features, independent of the clustering analysis. In this way, multiple clustering analyses that use the same set of BLASTP hits may be loaded. Loading multiple clustering analyses is useful when one wants to try out or compare new clustering parameters, or to cluster only a subset of the loaded genomes.
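As a rough illustration of the bi-directional best hit idea, the sketch below pairs two proteins only when each is the other's top-scoring hit. It is not Sybil's actual clustering code, whose details are not given here; the input format (query, subject, score tuples), the function names, and the protein identifiers are all assumptions made for the example.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    hits: iterable of (query, subject, score) tuples, e.g. parsed
    from tabular BLASTP output. Self-hits are ignored.
    """
    best = {}
    for query, subject, score in hits:
        if query == subject:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_pairs(hits):
    """Return sorted pairs (a, b) where a's best hit is b and b's is a."""
    top = best_hits(hits)
    pairs = set()
    for query, subject in top.items():
        if top.get(subject) == query:
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Hypothetical hits between proteins of two genomes, A and B.
demo_hits = [
    ("a1", "b1", 200.0),
    ("a1", "b2", 150.0),
    ("b1", "a1", 190.0),
    ("b2", "a1", 140.0),
    ("a2", "b2", 90.0),
    ("b2", "a2", 80.0),
]
```

With `demo_hits`, only `a1` and `b1` choose each other, so `reciprocal_best_pairs(demo_hits)` returns `[("a1", "b1")]`; `a2` picks `b2`, but `b2` prefers `a1`. Because the function works on whatever hits it is given, rerunning it on a subset of genomes reuses the same stored hits.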

Because featureloc joins the matching regions, the method also extends to storing locatable syntenic regions based on whole-genome alignment or other genomic DNA-based comparison methods. Doing this would simply require replacing the 'polypeptide' feature with an 'assembly' feature in the featureloc table.
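A syntenic block stored this way might look like the following sketch, in which a 'match' feature is located on two 'assembly' features with coordinates. As before, the tables are simplified stand-ins for Chado's feature and featureloc tables, and the block name, assembly names, coordinates, and helper functions are invented for illustration.

```python
import sqlite3


def build_synteny_demo():
    """Create an in-memory database holding one hypothetical syntenic block."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL,
            type TEXT NOT NULL
        );
        CREATE TABLE featureloc (
            featureloc_id INTEGER PRIMARY KEY,
            feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
            srcfeature_id INTEGER REFERENCES feature(feature_id),
            fmin INTEGER,
            fmax INTEGER,
            strand INTEGER,
            rank INTEGER NOT NULL DEFAULT 0,
            UNIQUE (feature_id, rank)
        );
    """)
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
        (10, "synteny_block_01", "match"),
        (11, "orgA_assembly_1", "assembly"),
        (12, "orgB_assembly_3", "assembly"),
    ])
    # Same pattern as for clusters, but srcfeature_id points at an
    # 'assembly' and fmin/fmax/strand record where the block lies on it.
    conn.executemany(
        "INSERT INTO featureloc "
        "(feature_id, srcfeature_id, fmin, fmax, strand, rank) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(10, 11, 1000, 5000, 1, 0), (10, 12, 20000, 24500, -1, 1)])
    return conn


def block_regions(conn, block_name):
    """Return (assembly, fmin, fmax, strand) for each side of a block."""
    sql = """
        SELECT s.uniquename, fl.fmin, fl.fmax, fl.strand
        FROM feature b
        JOIN featureloc fl ON fl.feature_id = b.feature_id
        JOIN feature s ON s.feature_id = fl.srcfeature_id
                      AND s.type = 'assembly'
        WHERE b.uniquename = ? AND b.type = 'match'
        ORDER BY fl.rank
    """
    return conn.execute(sql, (block_name,)).fetchall()
```

Here `block_regions(build_synteny_demo(), "synteny_block_01")` returns one located region per assembly, showing that only the type of the source feature and the presence of coordinates differ from the protein cluster case.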

FlyBase

This document describes how FlyBase has implemented comparative data for Dmel<->Dpse and also how it plans to implement these analyses for 10 more Drosophila species.

ParameciumDB

The ParameciumDB working document is heavily inspired by the FlyBase comparative_implementation_standard. It describes how ParameciumDB models paralogy and synteny within a genome that has undergone whole-genome duplications.