GBrowse_syn Scripts

GBrowse_syn is a GBrowse-based synteny viewer. This page describes helper scripts that process alignment data for loading into GBrowse_syn.

load_alignments_msa.pl

Purpose
Use this script to load the GBrowse_syn alignment database from a multiple sequence alignment file. A variety of formats are supported, including FASTA, CLUSTAL, and STOCKHOLM.
Note
Supported file formats are decoupled from the original application. For example, FASTA and CLUSTALW are not generally used for whole-genome alignments, but a number of other applications can emit or read these formats.
Example
perl load_alignments_msa.pl -f clustalw -u me -p mypsswd -d mydb -c -v
Options
argument  default   description
f         clustalw  Format of the multiple sequence alignment files
u                   Username for the MySQL database
p                   Password for the MySQL database
d                   Database name
m         100       Resolution of the base-pair map used to guide the alignment grid-lines in GBrowse_syn
n                   Flag to skip grid-line mapping (faster, but all insertion/deletion data is lost)
v                   Flag for verbose progress reporting
c                   Flag to create a new database and load the schema as well as the data. Note that this flag erases all existing data before the new data are loaded. Omitting it for a new database causes a fatal error.
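
To make the m and n options more concrete, the following Python sketch shows one way a base-pair map could be derived from a gapped pairwise alignment: it walks the alignment columns and records a pair of coordinates each time the reference position reaches a multiple of the resolution. This is an illustration only, not the code in load_alignments_msa.pl. The function name gridline_map, its parameters, and the assumption that the resolution is counted in reference bases are all hypothetical.

```python
def gridline_map(ref_aln, tgt_aln, ref_start=1, tgt_start=1, resolution=100):
    """Sample matching coordinates from a gapped pairwise alignment.

    ref_aln and tgt_aln are equal-length aligned strings in which '-'
    marks a gap. Returns a list of (ref_pos, tgt_pos) pairs, one for
    every `resolution`-th reference base that is aligned to a target base.
    Hypothetical helper; not taken from the GBrowse_syn scripts.
    """
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("aligned sequences must have equal length")

    pairs = []
    ref_pos = ref_start - 1  # coordinate of the last reference base consumed
    tgt_pos = tgt_start - 1  # coordinate of the last target base consumed
    for r, t in zip(ref_aln, tgt_aln):
        if r != "-":
            ref_pos += 1
        if t != "-":
            tgt_pos += 1
        # Record a grid point only where both sequences have a base.
        if r != "-" and t != "-" and ref_pos % resolution == 0:
            pairs.append((ref_pos, tgt_pos))
    return pairs


if __name__ == "__main__":
    # Toy alignment with an insertion in the target and a deletion later on.
    print(gridline_map("ACGT--ACGTAC", "ACGTTTAC--AC", resolution=2))
```

In the toy alignment, reference position 6 maps to target position 8 because of the two-base insertion in the target. If only the start and end coordinates of the aligned block were stored, which is roughly what skipping the map amounts to, that internal offset would not be recoverable.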

load_alignment_database.pl

Purpose
This script loads the alignment database from one or more tab-delimited alignment data files (format described here). This format can serve as an intermediate for parsed alignment data, or it can be used for data that do not come from multiple sequence alignments, such as gene orthology data or defined regions of co-linearity. The tab-delimited format requires start and end coordinates for each reference sequence. Any features that have start and end coordinates and strand information can be used.
Example
perl load_alignment_database.pl -u user -p password -d dbname -c -v alignments.aln.txt alignments2.aln.txt
Options
argument  default   description
u                   Username for the MySQL database
p                   Password for the MySQL database
d                   Database name
v                   Flag for verbose progress reporting
c                   Flag to create a new database and load the schema as well as the data. Note that this flag erases all existing data before the new data are loaded. Omitting it for a new database causes a fatal error.
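
Because the c flag erases existing data, it can help to assemble the command in one place and make the destructive option explicit. The Python sketch below builds the same command line as the example above using only the options listed in the table. The function names build_load_command and load_alignments are hypothetical, and the sketch assumes that perl and the script are reachable from the working directory.

```python
import subprocess


def build_load_command(user, password, database, files,
                       create=False, verbose=False,
                       script="load_alignment_database.pl"):
    """Return the argument list for load_alignment_database.pl.

    create=True adds -c, which erases any existing data in the database,
    so it defaults to False and must be requested explicitly.
    """
    if not files:
        raise ValueError("at least one alignment file is required")
    cmd = ["perl", script, "-u", user, "-p", password, "-d", database]
    if create:
        cmd.append("-c")  # destructive: recreates the schema
    if verbose:
        cmd.append("-v")
    return cmd + list(files)


def load_alignments(*args, **kwargs):
    """Run the loader and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_load_command(*args, **kwargs), check=True)
```

A first load into a new database would pass create=True, and later loads that add files to the same database would leave it at the default.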

mercatoraln_to_synhits.pl

mercatoraln_to_synhits.pl is a data parser for multiple sequence alignments generated by Mercator.

Purpose
This script processes alignments generated by the Mercator pipeline and writes them in the tab-delimited format that load_alignment_database.pl reads.
Example
perl mercatoraln_to_synhits.pl -d alignments > mercator_alignments.gbrowse_syn.txt
perl load_alignment_database.pl -u user -p password -d dbname -c -v mercator_alignments.gbrowse_syn.txt
Options
argument  default     description
a         output.mfa  Name of the alignment file produced by the multiple sequence alignment step of the Mercator pipeline (MAVID, PECAN, or another genome alignment tool)
v                     Print progress reports while running. Note that the script will stop after reading the first line of the file (see line 117 of the script) if this option is set.
f         fasta       Format of the input alignment files (multi-FASTA is the default)
d                     Directory that contains the genome and map files (typically called alignments in the Mercator pipeline)

aln2hit.pl

aln2hit.pl is a generic alignment data parser that converts alignment data into the GBrowse_syn database loading format.

Purpose
Use this script in cases where you have a single alignment file and want to convert it to the tab-delimited format that is used to load the GBrowse_syn alignment database.

Note
This script is deprecated. Use load_alignments_msa.pl to load the database directly instead.

Example
perl aln2hit.pl -f clustalw -i my_alignments.aln >my_alignments.txt
Options
argument  default   description
f         clustalw  Alignment file format. Most common formats recognized by BioPerl's AlignIO parsers are supported. Use clustalw or fasta for best results.
i                   Name of the input alignment file

clustal2hit.pl

clustal2hit.pl is a CLUSTALW format alignment data parser.

Purpose
Use this script in cases where you have one or more CLUSTAL alignment files and want to convert them to the tab-delimited format that is used to load the GBrowse_syn alignment database.

Note
This script is deprecated. Use load_alignments_msa.pl to load the database directly instead.

Example
perl clustal2hit.pl *.aln >my_alignments.txt
Options

None.