load_ncbi_taxonomy.pl is a Perl script for loading NCBI taxonomy trees into the phylotree and phylonode tables.
This script was contributed by Naama Menda at the Sol Genomics Network (SGN), led by Lukas Mueller.
In the 1.1 release, load_ncbi_taxonomy.pl is installed along with the other scripts in the distribution and will typically go in /usr/bin or /usr/local/bin.
The script takes the following options:
-H hostname for database [required if -g isn’t used]
-D database name [required if -g isn’t used]
-g GMOD database profile name (can provide host and DB name). Default: default
-p phylotree name (optional; defaults to ‘NCBI taxonomy tree’. Set this if you plan to load more than one tree)
-i input file: list of taxonomy ids to be stored (optional; without it, the entire NCBI taxonomy is loaded)
-v verbose output
-t trial mode: don’t perform any store operations at all (trial mode cannot test inserting associated data for new terms)

To store the phylonodes, a new phylotree is first stored under the name ‘NCBI taxonomy tree’ (or the name given with -p). Each organism is then assigned a phylonode id and stored in a temporary table. This step is needed because every phylonode except the root has a parent_phylonode_id, which is an internal foreign key to another phylonode. Next, each phylonode is given a left and a right index, calculated by walking the entire tree structure (see the article by Aaron Mackey: http://www.oreillynet.com/pub/a/network/2002/11/27/bioconf.html?page=2). Only after every phylonode has its indexes is the phylonode table populated from the temporary table.
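The left/right indexing step can be sketched as follows. This is a minimal Python illustration, not the script's own Perl code. The function names assign_nested_set and is_descendant are hypothetical, and the input is assumed to be (phylonode id, parent id) pairs like the rows held in the temporary table, with None marking the root. A depth-first walk numbers each node once on the way down (left index) and once on the way back up (right index).

```python
from collections import defaultdict

_DONE = object()  # sentinel: a node has no more children to visit


def assign_nested_set(nodes):
    """Assign nested-set (left, right) indexes to a tree (hypothetical helper).

    nodes: iterable of (node_id, parent_id) pairs; parent_id is None for the root.
    Returns a dict mapping node_id -> (left_index, right_index).
    Nodes not reachable from the root are not indexed.
    """
    children = defaultdict(list)
    root = None
    for node_id, parent_id in nodes:
        if parent_id is None:
            if root is not None:
                raise ValueError("tree has more than one root")
            root = node_id
        else:
            children[parent_id].append(node_id)
    if root is None:
        raise ValueError("tree has no root")

    left, right = {}, {}
    counter = 1
    left[root] = counter  # the root is entered first
    counter += 1
    # Each stack entry holds a node and an iterator over its unvisited children.
    stack = [(root, iter(sorted(children[root])))]
    while stack:
        node, pending = stack[-1]
        child = next(pending, _DONE)
        if child is _DONE:
            # All children numbered: leaving the node assigns its right index.
            right[node] = counter
            counter += 1
            stack.pop()
        else:
            # Entering a child assigns its left index, then we descend into it.
            left[child] = counter
            counter += 1
            stack.append((child, iter(sorted(children[child]))))
    return {n: (left[n], right[n]) for n in left}


def is_descendant(indexes, ancestor, node):
    """True if node lies strictly inside ancestor's (left, right) interval."""
    a_left, a_right = indexes[ancestor]
    n_left, n_right = indexes[node]
    return a_left < n_left and n_right < a_right


# Toy tree with hypothetical ids: 1 is the root; 2 and 3 are its children;
# 4 and 5 are children of 3.
indexes = assign_nested_set([(1, None), (2, 1), (3, 1), (4, 3), (5, 3)])
print(indexes)
# → {1: (1, 10), 2: (2, 3), 3: (4, 9), 4: (5, 6), 5: (7, 8)}
```

With these indexes, every descendant of a node has a left index between that node's left and right indexes, so a whole subtree can be selected with a single range comparison instead of repeated parent lookups. The walk uses an explicit stack rather than recursion so that tree depth is not bounded by the interpreter's recursion limit.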