Difference between revisions of "GBrowse syn Scripts"

From GMOD

Revision as of 11:40, 30 December 2009

This page describes helper scripts for processing alignment data for loading into GBrowse_syn.


Direct Database Loading Scripts

The scripts in this section load alignment data directly into the GBrowse_syn alignment database.

load_alignment_database.pl

Purpose
This script loads the alignment database from tab-delimited alignment data files (format described here).
Example
perl load_alignment_database.pl
argument default description



load_alignments_msa.pl

Purpose
Use this script to load the GBrowse_syn alignment database from a multiple sequence alignment file. A variety of formats are supported, including FASTA, CLUSTAL, STOCKHOLM, etc.
Note
Supported file formats are decoupled from the original application. For example, FASTA and CLUSTALW are not generally used for whole-genome alignments, but a number of other applications can emit or read these formats.
Example
perl load_alignments_msa.pl -f clustalw -i my_alignments.aln -u me -p mypsswd -d mydb -c -v
Options
argument default description
f clustalw Format of the multiple sequence alignment files
u Username for the MySQL database
p Password for the MySQL database
d Database name
m 100 Resolution of the base-pair map used to guide the alignment grid-lines in GBrowse_syn
n Flag to skip grid-line mapping (faster, but you will lose all of the insertion/deletion data)
v Flag for verbose progress reporting
c Flag to create a new database and load the schema as well as the data. Note that using this flag will erase all existing data before loading new data. Failing to use this option for a new database will cause a fatal error.
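
When alignments are split across several files, the flags above can be combined in a small shell loop. The following is a minimal sketch, not part of the GBrowse_syn distribution. It assumes that running the script without -c adds to an existing database (this is implied by the description of -c but not stated outright) and that load_alignments_msa.pl is in the current directory. The function name load_all and the DRY_RUN variable are hypothetical, and the credentials are the placeholders from the example above.

```shell
#!/bin/sh
# Sketch: load several CLUSTALW files into one GBrowse_syn database.
# Assumption: without -c the script appends to the existing database.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to run them.

DB_USER=me         # placeholder credentials, as in the example above
DB_PASS=mypsswd
DB_NAME=mydb
DRY_RUN=${DRY_RUN:-1}

# load_all FILE...  (hypothetical helper)
load_all() {
    # Only the first file gets -c, because -c erases all existing data.
    create_flag="-c"
    for aln in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo perl load_alignments_msa.pl -f clustalw -i "$aln" \
                -u "$DB_USER" -p "$DB_PASS" -d "$DB_NAME" $create_flag -v
        else
            perl load_alignments_msa.pl -f clustalw -i "$aln" \
                -u "$DB_USER" -p "$DB_PASS" -d "$DB_NAME" $create_flag -v || return 1
        fi
        create_flag=""
    done
}

# Usage (file names are placeholders):
#   load_all chr1.aln chr2.aln
```

Printing the commands first makes it easy to confirm that -c appears exactly once before any data is erased.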

Deprecated Alignment Parsers

The scripts in this section process multiple sequence alignment data in various formats and convert them to the tab-delimited format used to load the GBrowse_syn database.

aln2hit.pl

aln2hit.pl is a generic alignment data parser that converts alignment data to the GBrowse_syn database loading format.

Purpose
Use this script in cases where you have a single alignment file and want to convert it to the tab-delimited format that is used to load the GBrowse_syn alignment database.

Note
This script is deprecated. You can use load_alignments_msa.pl to load the database directly instead.

Example
perl aln2hit.pl -f clustalw -i my_alignments.aln >my_alignments.txt
Options
argument default description
f clustalw Specifies the alignment file format. Most common formats recognized by BioPerl's AlignIO parsers are supported. Use clustalw or fasta for best results.
i Specifies the name of the input alignment file
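
Because -f must match the input file, a wrapper can choose the format name from the file extension. This is only a sketch under stated assumptions: the extension-to-format mapping is a common convention rather than something aln2hit.pl checks, and the helper names guess_format and convert_cmd are hypothetical. The wrapper prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: pick the -f value for aln2hit.pl from a file extension.
# The extension conventions below are an assumption, not script behavior.

# guess_format FILE  (hypothetical helper)
guess_format() {
    case "$1" in
        *.aln)              echo clustalw ;;
        *.fa|*.fasta|*.mfa) echo fasta ;;
        *)                  return 1 ;;
    esac
}

# convert_cmd FILE  (hypothetical helper)
# Prints the aln2hit.pl command line that would convert FILE to FILE-stem.txt.
convert_cmd() {
    fmt=$(guess_format "$1") || { echo "unknown format: $1" >&2; return 1; }
    echo "perl aln2hit.pl -f $fmt -i $1 >${1%.*}.txt"
}

# Usage:
#   convert_cmd my_alignments.aln
```

Files with an unrecognized extension are rejected rather than guessed, so the format has to be chosen by hand in that case.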

clustal2hit.pl

clustal2hit.pl is a CLUSTALW format alignment data parser.

Purpose
Use this script in cases where you have one or more CLUSTALW alignment files and want to convert them to the tab-delimited format that is used to load the GBrowse_syn alignment database.

Note
This script is deprecated. You can use load_alignments_msa.pl to load the database directly instead.

Example
perl clustal2hit.pl *.aln >my_alignments.txt
Options

None.


</div>

Other Scripts

mercatoraln_to_synhits.pl

mercatoraln_to_synhits.pl is a data parser for multiple sequence alignments generated by MERCATOR.

Purpose
This script processes alignments generated by the MERCATOR pipeline.
Example
Usage example here
Options
argument default description
a output.mfa Specifies the name of the alignment file
v Print progress reports while running
f fasta Specifies format of the input alignment files
d Specifies the containing directory for the genome and map files