Difference between revisions of "GMODTools TestCase"

From GMOD
Jump to: navigation, search
(New page: This is an example test case for GMOD, loading a GenBank format genome (chromosome) to Chado database, then dumping it out, including GenBank submission format with GenBank output to test ...)
 
m (add output url)
 
Line 9: Line 9:
 
   set dbname=anogam_x
 
   set dbname=anogam_x
 
   $pg/bin/createdb -T chado_01_template $dbname
 
   $pg/bin/createdb -T chado_01_template $dbname
 
+
 
 
   # fix Genbank FT to SO type map if needed
 
   # fix Genbank FT to SO type map if needed
 
   vi  lib/Bio/SeqFeature/Tools/TypeMapper.pm : add pseudogenic tRNA
 
   vi  lib/Bio/SeqFeature/Tools/TypeMapper.pm : add pseudogenic tRNA
+
 
 
   # load chromosome X to chado
 
   # load chromosome X to chado
 
   gunzip -c NC_004818.gbk.gz |\
 
   gunzip -c NC_004818.gbk.gz |\
 
   perl bin/bp_genbank2gff3.pl -noCDS -in stdin -out stdout |\
 
   perl bin/bp_genbank2gff3.pl -noCDS -in stdin -out stdout |\
   perl bin/gmod_bulk_load_gff3.pl -dbname $dbname -organism fromdata  
+
   perl bin/gmod_bulk_load_gff3.pl -dbname $dbname -organism fromdata
 
+
 
 
   # create GMOD Bulkfiles conf/bulkfiles/anogam.xml from template.xml : dbname, etc. edits
 
   # create GMOD Bulkfiles conf/bulkfiles/anogam.xml from template.xml : dbname, etc. edits
 
   # see samples to create a new organism/project configuration
 
   # see samples to create a new organism/project configuration
Line 23: Line 23:
 
   # create standard Bulkfiles outputs for anogam_x
 
   # create standard Bulkfiles outputs for anogam_x
 
   perl -Ilib bin/bulkfiles.pl -config=anogam  -make >& log.anogam1 &
 
   perl -Ilib bin/bulkfiles.pl -config=anogam  -make >& log.anogam1 &
 
+
 
 
   # new in progress output, includes regurgitation of GenBank record to compare with original
 
   # new in progress output, includes regurgitation of GenBank record to compare with original
 
   # you should check and edit conf/bulkfiles/genbanksubmit.xml before running this part
 
   # you should check and edit conf/bulkfiles/genbanksubmit.xml before running this part
   perl -Ilib bin/bulkfiles.pl -config=anogam -format=genbanktbl -make
+
   perl -Ilib bin/bulkfiles.pl -config=anogam -format=genbanktbl -make
  
Outputs should include genbanksubmit/ folder with outputs of GMOD Bulkfiles:  
+
Outputs should include genbanksubmit/ folder with outputs of GMOD Bulkfiles:
 
  anogam-all-anogam1.tbl    : feature table  ** new part
 
  anogam-all-anogam1.tbl    : feature table  ** new part
 
  anogam-all-anogam1.fsa    : genome dna    == fasta/chromosome
 
  anogam-all-anogam1.fsa    : genome dna    == fasta/chromosome
Line 37: Line 37:
 
  anogam-all-anogam1.val : errors & warnings
 
  anogam-all-anogam1.val : errors & warnings
 
  anogam-all-anogam1.gbf : Genbank format for review
 
  anogam-all-anogam1.gbf : Genbank format for review
 +
 +
You will find these sample outputs here http://insects.eugenes.org/genome/Anopheles_gambiae_str._PEST/anogam_20080511/
  
 
There remain caveats on reproducing the original GenBank record this way.  At this writing, the GenbankSubmit function is still in development, and the Genbank to GFF to Chado loading has some losses.
 
There remain caveats on reproducing the original GenBank record this way.  At this writing, the GenbankSubmit function is still in development, and the Genbank to GFF to Chado loading has some losses.

Latest revision as of 17:03, 20 June 2008

This is an example test case for GMOD, loading a GenBank format genome (chromosome) to Chado database, then dumping it out, including GenBank submission format with GenBank output to test round-trip agreement.

This is an abbreviated example, current as of May 2008. It uses a Chado-db template in Postgres, and assumes you have installed/tested GMOD components available.

 # get AnoGam chrX and load to chado db
 curl -OR ftp://bio-mirror.net/biomirror/ncbigenomes/Anopheles_gambiae/CHR_X/NC_004818.gbk.gz
 or curl -OR ftp://ftp.ncbi.nih.gov/genomes/Anopheles_gambiae/CHR_X/NC_004818.gbk.gz
 set dbname=anogam_x
 $pg/bin/createdb -T chado_01_template $dbname
 # fix Genbank FT to SO type map if needed
 vi  lib/Bio/SeqFeature/Tools/TypeMapper.pm : add pseudogenic tRNA
 # load chromosome X to chado
 gunzip -c NC_004818.gbk.gz |\
  perl bin/bp_genbank2gff3.pl -noCDS -in stdin -out stdout |\
  perl bin/gmod_bulk_load_gff3.pl -dbname $dbname -organism fromdata
 # create GMOD Bulkfiles conf/bulkfiles/anogam.xml from template.xml : dbname, etc. edits
 # see samples to create a new organism/project configuration
 # create standard Bulkfiles outputs for anogam_x
 perl -Ilib bin/bulkfiles.pl -config=anogam  -make >& log.anogam1 &
 # new in progress output, includes regurgitation of GenBank record to compare with original
 # you should check and edit conf/bulkfiles/genbanksubmit.xml before running this part
 perl -Ilib bin/bulkfiles.pl -config=anogam -format=genbanktbl -make

Outputs should include genbanksubmit/ folder with outputs of GMOD Bulkfiles:

anogam-all-anogam1.tbl    : feature table  ** new part
anogam-all-anogam1.fsa    : genome dna     == fasta/chromosome
anogam-all-anogam1.pep    : protein aa     == fasta/translation

and outputs of NCBI tbl2asn:

anogam-all-anogam1.sqn : ASN.1 record to submit to NCBI
anogam-all-anogam1.val : errors & warnings
anogam-all-anogam1.gbf : Genbank format for review

You will find these sample outputs here http://insects.eugenes.org/genome/Anopheles_gambiae_str._PEST/anogam_20080511/

There remain caveats on reproducing the original GenBank record this way. At this writing, the GenbankSubmit function is still in development, and the Genbank to GFF to Chado loading has some losses.