Difference between revisions of "Load GFF Into Chado"
Revision as of 18:03, 16 March 2007
Abstract
This HOWTO describes a method for loading sequence annotation data in GFF format into the Chado database.
Authors
Copyright
This document is copyright Scott Cain, 2007. For reproduction other than personal use, please contact <cain@cshl.edu>.
Revision History
Revision 1.0 | 2007-03-16 | BIO | First version
Download the GFF Files
An easy way to load data into the database is to use a GFF3 file and the script load/bin/gmod_bulk_load_gff3.pl. A nice set of sample data is the GFF3 file prepared by the folks at the Saccharomyces Genome Database:
ftp://ftp.yeastgenome.org/pub/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff
This file contains Gene Ontology (GO) annotations, so if you didn't load GO when you executed `make ontologies`, you will get many warning messages about being unable to find entries in the dbxref table. If you want to load GO, you should be able to execute `make ontologies` again and select 'Gene Ontology' for installation.
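Before loading, it can help to see whether a GFF3 file carries ontology annotations at all. The following is a minimal sketch, not part of the GMOD tools: it assumes the annotations appear in the standard GFF3 `Ontology_term` attribute of column 9, and the function name `count_ontology_terms` is hypothetical.

```python
from collections import Counter
from urllib.parse import unquote


def count_ontology_terms(lines):
    """Count Ontology_term values by prefix (e.g. 'GO') in GFF3 lines.

    Assumes terms are stored in the GFF3 'Ontology_term' attribute.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        for attribute in columns[8].split(";"):
            key, _, value = attribute.partition("=")
            if key.strip() != "Ontology_term":
                continue
            for term in value.split(","):
                # Values may be URL-escaped; the prefix precedes the colon.
                prefix = unquote(term).split(":", 1)[0]
                counts[prefix] += 1
    return counts
```

Passing an open file handle for saccharomyces_cerevisiae.gff to this function returns a count per ontology prefix. A non-zero count for `GO` suggests that GO should be loaded first to avoid the dbxref warnings described above.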
Then execute gmod_bulk_load_gff3.pl:
>gmod_bulk_load_gff3.pl --organism yeast --gfffile saccharomyces_cerevisiae.gff
This loads the GFF3 file. The loading script requires GFF3 because the format has tighter control of syntax and requires the use of a controlled vocabulary (from Sequence Ontology Feature Annotation (SOFA)), which allows mapping to the relational schema. The --gfffile flag supplies the location of the file, and the --organism flag takes the common name (the common_name field) from the organism table. Run perldoc gmod_bulk_load_gff3.pl for more information on adding other organisms and databases, as well as other available command-line flags.
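To illustrate what the tighter syntax of GFF3 means in practice, here is a rough pre-flight check that can be run on a file before handing it to the loader. It is only a sketch under stated assumptions: it checks the version header, the nine tab-separated columns, and the start/end coordinates. It does not verify that the type column is a SOFA term, and it is not how gmod_bulk_load_gff3.pl itself validates input. The name `check_gff3` is hypothetical.

```python
def check_gff3(lines):
    """Return a list of (line_number, message) problems found in GFF3 lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if number == 1:
            parts = line.split()
            has_header = (
                len(parts) >= 2
                and parts[0] == "##gff-version"
                and parts[1].startswith("3")
            )
            if not has_header:
                problems.append((number, "missing '##gff-version 3' header"))
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence data
        if not line or line.startswith("#"):
            continue  # blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) != 9:
            problems.append(
                (number, f"expected 9 columns, found {len(columns)}")
            )
            continue
        start, end = columns[3], columns[4]
        if not (start.isdigit() and end.isdigit()):
            problems.append((number, "start and end must be integers"))
        elif int(start) > int(end):
            problems.append((number, "start is greater than end"))
    return problems
```

An empty list means none of these basic problems were found. Each tuple otherwise gives the line number and a short description to fix before running the loader.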
GFF3 can also be generated with a script provided with BioPerl, bp_genbank2gff.pl:
>bp_genbank2gff.pl --stdout --file <genbank file> > <gff file>
Note the redirection of standard output to the GFF file. This method of generating GFF3 files is not completely satisfactory, and development is ongoing to provide better translation.
Note that gmod_load_gff3.pl is also available, but it has had limited support and is currently less flexible. It is a good example of how to write code using the Class::DBI classes that are created at install time. For more information on using these classes, see http://sourceforge.net/projects/gmod-ware for a Class::DBI-based middleware/API.
More Information
See the related HOWTO Load RefSeq Into Chado.
Please send questions to the GMOD developers list:
gmod-devel@lists.sourceforge.net
Or contact the GMOD Help Desk