GBrowse_syn Scripts

From GMOD
Revision as of 15:14, 30 December 2009 by Mckays (Talk | contribs)


This page describes helper scripts for processing alignment data for loading into GBrowse_syn.

load_alignments_msa.pl

Purpose
Use this script to load the GBrowse_syn alignment database from a multiple sequence alignment file. A variety of formats are supported, including FASTA, CLUSTAL, STOCKHOLM, etc.
Note
Supported file formats are decoupled from the original application. For example, FASTA and CLUSTALW are not generally used for whole-genome alignments, but a number of other applications can emit or read these formats.
Example
perl load_alignments_msa.pl -f clustalw -u me -p mypsswd -d mydb -c -v
Options
argument  default   description
f         clustalw  Format of the multiple sequence alignment files
u                   Username for the MySQL database
p                   Password for the MySQL database
d                   Database name
m         100       Resolution of the base-pair map used to guide the alignment grid-lines in GBrowse_syn
n                   Flag to skip grid-line mapping (faster, but you will lose all of the insertion/deletion data)
v                   Flag for verbose progress reporting
c                   Flag to create a new database and load the schema as well as the data. Note: using this flag erases all existing data before the new data are loaded. Omitting this option for a new database causes a fatal error.
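
Because the c flag both creates the schema and erases existing data, it can help to make that choice explicit when scripting a load. The following is a minimal sketch, not part of the GBrowse_syn distribution: the function name msa_load_cmd and its "create" keyword are hypothetical, and the function only prints the command line instead of running it. The flags themselves are the ones listed in the table above. Values containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with GBrowse_syn): prints, but does not
# run, a load_alignments_msa.pl command built from the documented options.
# Usage: msa_load_cmd FORMAT USER PASSWORD DB [create]
msa_load_cmd() {
    format=$1 user=$2 pass=$3 db=$4 mode=${5:-existing}
    cmd="perl load_alignments_msa.pl -f $format -u $user -p $pass -d $db"
    # -c creates the database and schema but erases any existing data,
    # so it is added only when the caller asks for it explicitly.
    if [ "$mode" = "create" ]; then
        cmd="$cmd -c"
    fi
    printf '%s\n' "$cmd -v"
}

# First load into a new database (schema is created, old data erased).
msa_load_cmd clustalw me mypsswd mydb create
# Later load into a database that already has the schema.
msa_load_cmd fasta me mypsswd mydb
```

Review the printed command before running it, particularly when it contains -c.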

load_alignment_database.pl

Purpose
This script loads the alignment database from one or more tab-delimited alignment data files (format described here).
Example
perl load_alignment_database.pl -u user -p password -d dbname -c -v alignments.aln alignments2.aln
Options
argument  default   description
u                   Username for the MySQL database
p                   Password for the MySQL database
d                   Database name
v                   Flag for verbose progress reporting
c                   Flag to create a new database and load the schema as well as the data. Note: using this flag erases all existing data before the new data are loaded. Omitting this option for a new database causes a fatal error.

mercatoraln_to_synhits.pl

mercatoraln_to_synhits.pl is a data parser for multiple sequence alignments generated by Mercator.

Purpose
This script processes alignments generated by the Mercator pipeline.
Example
Usage example here
Options
argument  default     description
a         output.mfa  Specifies the name of the alignment file
v                     Print progress reports while running
f         fasta       Specifies the format of the input alignment files
d                     Specifies the directory containing the genome and map files


aln2hit.pl

aln2hit.pl is a generic alignment data parser that reads alignment data into the GBrowse_syn database loading format.

Purpose
Use this script in cases where you have a single alignment file and want to convert it to the tab-delimited format that is used to load the GBrowse_syn alignment database.

Note
This script is deprecated. You can use load_alignments_msa.pl to load the database directly.

Example
perl aln2hit.pl -f clustalw -i my_alignments.aln >my_alignments.txt
Options
argument  default   description
f         clustalw  Specifies the alignment file format. Most common formats recognized by BioPerl's AlignIO parsers are supported. Use clustalw or fasta for best results.
i                   Specifies the name of the input alignment file
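
For reference, the deprecated two-step route (convert with aln2hit.pl, then load the tab-delimited output with load_alignment_database.pl) can be sketched as below. This assumes the output of aln2hit.pl is accepted as-is by load_alignment_database.pl, which the descriptions on this page suggest but do not state outright. The function name two_step_cmds is hypothetical, and it only prints the two command lines instead of running them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the two commands of the deprecated
# convert-then-load route, using only flags documented on this page.
# Usage: two_step_cmds ALN_FILE HITS_FILE USER PASSWORD DB
two_step_cmds() {
    aln=$1 hits=$2 user=$3 pass=$4 db=$5
    # Step 1: convert the alignment to the tab-delimited loading format.
    printf '%s\n' "perl aln2hit.pl -f clustalw -i $aln >$hits"
    # Step 2: load the converted file. -c creates the schema and erases
    # existing data; leave it out when the database already exists.
    printf '%s\n' "perl load_alignment_database.pl -u $user -p $pass -d $db -c -v $hits"
}

two_step_cmds my_alignments.aln my_alignments.txt user password dbname
```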

clustal2hit.pl

clustal2hit.pl is a CLUSTALW format alignment data parser.

Purpose
Use this script in cases where you have one or more CLUSTAL alignment files and want to convert them to the tab-delimited format that is used to load the GBrowse_syn alignment database.

Note
This script is deprecated. You can use load_alignments_msa.pl to load the database directly.

Example
perl clustal2hit.pl *.aln >my_alignments.txt
Options

None.