Chado Update via GFF

There has frequently been interest in updating a Chado database from a GFF file, and I've finally gotten around to trying to implement it. My initial efforts centered on converting GFF to Chado XML using Bio::SeqIO::chadoxml, but I was never completely satisfied with the result, and I was unable to load it with XORT or DBIx::DBStag. So I've decided to extend the GFF3 bulk loader, gmod_bulk_load_gff3.pl, to handle updates and deletes as well. To that end, I've identified these cases that should be addressed:

Updating properties

Perhaps the simplest case is updating feature properties (for purposes of this discussion, 'feature properties' encompasses items in the featureprop, feature_cvterm and feature_dbxref tables). Nevertheless, it poses some possible hang-ups. For instance:

  • What should happen to the properties already there? Should they be uniformly deleted (bad), marked 'not current' (only partially possible) or just left there? Currently, the feature_dbxref table has an is_current column, but featureprop and feature_cvterm do not.
  • A question that applies to all updates and deletes: how do we decide that a feature in the GFF is the same as one already in the database? Is the Name enough? What about Name and type? Name, type and srcfeature/seq_id?
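
To make the identity question concrete, here is a minimal sketch in Python (not part of gmod_bulk_load_gff3.pl) that compares the three candidate keys mentioned above. The `Feature` tuple, `KEY_FIELDS`, `identity_key` and `find_matches` are hypothetical names invented for illustration, and the in-memory list stands in for a database query.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """Hypothetical, simplified view of a feature; not the Chado schema."""
    name: str
    type: str
    seq_id: str


# The three candidate definitions of "the same feature" raised above.
KEY_FIELDS = {
    "name": ("name",),
    "name_type": ("name", "type"),
    "name_type_seq": ("name", "type", "seq_id"),
}


def identity_key(feature, level):
    """Build the comparison key for the chosen strictness level."""
    return tuple(getattr(feature, field) for field in KEY_FIELDS[level])


def find_matches(incoming, existing, level):
    """Return every existing feature whose key equals the incoming one."""
    wanted = identity_key(incoming, level)
    return [f for f in existing if identity_key(f, level) == wanted]


# Made-up data: the same name used on two chromosomes and for two types.
existing = [
    Feature("abc1", "gene", "ChrX"),
    Feature("abc1", "gene", "Chr2"),
    Feature("abc1", "mRNA", "ChrX"),
]
incoming = Feature("abc1", "gene", "ChrX")

for level in KEY_FIELDS:
    print(level, len(find_matches(incoming, existing, level)))
```

With this made-up data, Name alone matches three features, Name plus type matches two, and only the full Name/type/seq_id key is unambiguous. An update or delete should probably refuse to proceed whenever the chosen key matches more than one feature.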

Updating feature locations

If name, type and srcfeature are the same, allow featureloc updates?
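
If such an update were allowed, the new featureloc values would have to be derived from the GFF columns. The Python sketch below shows one way to do that conversion, relying on the fact that GFF3 coordinates are 1-based and inclusive while Chado's featureloc uses interbase fmin/fmax. The function name, the returned dictionary and the mapping of '.' and '?' strands to 0 are assumptions made for illustration, not the loader's actual behavior.

```python
# Assumed mapping from GFF3 strand symbols to Chado's strand values.
# Treating '.' and '?' as 0 (unstranded/unknown) is a choice made for this sketch.
STRAND_TO_CHADO = {"+": 1, "-": -1, ".": 0, "?": 0}


def gff_to_featureloc(start, end, strand=".", phase="."):
    """Convert GFF3 start/end/strand/phase to featureloc-style values.

    GFF3 start and end are 1-based and inclusive; featureloc fmin/fmax are
    interbase, so fmin = start - 1 and fmax = end.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid GFF3 coordinates: {start}..{end}")
    return {
        "fmin": start - 1,
        "fmax": end,
        "strand": STRAND_TO_CHADO[strand],
        "phase": None if phase == "." else int(phase),
    }


# A feature at GFF positions 1..2 on the minus strand, with no phase.
print(gff_to_featureloc(1, 2, "-"))
```

An updater could compare these values with the existing featureloc row for the matched feature and rewrite the row only when something differs.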

Updating complete gene models

If updating child features, what happens to the old features? Remove their featureloc entries and create completely new children? Only allow this for features of type 'gene'?

Deleting features

Again, if name, type and srcfeature are the same, allow the delete?

Comments

  • I'd say the most useful cases for many folks would be (a) adding annotations/properties to main gene features, and (b) deleting then reloading existing gene features (with new primary data: locations, sequence, etc). These two abilities would handle many uses for annotating new genomes: adding more dbxrefs, properties, etc. to existing gene features, and updating selected features by drop/replace. For the second case, if one can Delete via a GFF entry, it should be easy to also Update the complete gene model.
  • For GFF input to handle these, I'd say lines like this should be able to trigger updates to an existing feature, where CRUDop is your database Create/Replace/Update/Drop operation.
 RefChr  Source  Type  Start  End  Score  Strand  Phase  Attributes
 ChrX    MyDB    gene  .      .    .      .       .      ID=MyGene1;CRUDop=DROP
 ChrX    MyDB    gene  .      .    .      .       .      ID=MyGene2;CRUDop=UPDATE;Dbxref=SW:U1234
 ChrX    MyDB    gene  1      2    9      -       .      ID=MyGene3;CRUDop=REPLACE;Dbxref=SW:U1234;..more..

Dongilbert 16:48, 30 March 2007 (EDT)
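
As a rough illustration of how a loader might read the proposed CRUDop attribute, here is a small Python sketch. It is not code from gmod_bulk_load_gff3.pl: the function names, the returned dictionary and the choice to default to CREATE when CRUDop is absent are all assumptions. Note that real GFF3 lines are tab-delimited, even though the example lines above are shown with spaces for readability.

```python
from urllib.parse import unquote

# Operations named in the CRUDop proposal above.
VALID_OPS = {"CREATE", "REPLACE", "UPDATE", "DROP"}


def parse_attributes(column9):
    """Parse GFF3 column 9 into a dict mapping each tag to a list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters; commas separate values.
        attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attrs


def parse_crud_line(line):
    """Split one tab-delimited GFF3 line and extract the requested operation."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attrs = parse_attributes(cols[8])
    # Assumption: a line without CRUDop is an ordinary insert (CREATE).
    op = attrs.pop("CRUDop", ["CREATE"])[0].upper()
    if op not in VALID_OPS:
        raise ValueError(f"unknown CRUDop value: {op}")
    return {
        "seq_id": cols[0],
        "source": cols[1],
        "type": cols[2],
        "id": attrs.get("ID", [None])[0],
        "op": op,
        "attributes": attrs,
    }


line = "ChrX\tMyDB\tgene\t.\t.\t.\t.\t.\tID=MyGene2;CRUDop=UPDATE;Dbxref=SW:U1234"
record = parse_crud_line(line)
print(record["op"], record["id"], record["attributes"]["Dbxref"])
```

Removing CRUDop from the attribute dictionary before loading keeps the directive itself from being stored as a feature property.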