Difference between revisions of "GBrowse syn Scripts"

From GMOD
m (mercatoraln_to_synhits.pl: Adding a tip from the GBrowse mailing list)
 
(37 intermediate revisions by 3 users not shown)
Latest revision as of 22:39, 21 February 2013

GBrowse_syn is a GBrowse-based synteny viewer. This page describes helper scripts that process alignment data for loading into GBrowse_syn.

load_alignments_msa.pl

Purpose
Use this script to load the GBrowse_syn alignment database from a multiple sequence alignment file. Supported formats include FASTA, CLUSTAL, and STOCKHOLM, among others.
Note
Supported file formats are decoupled from the original application. For example, FASTA and CLUSTALW are not generally used for whole-genome alignments, but a number of other applications can emit or read these formats.
Example
perl load_alignments_msa.pl -f clustalw -u me -p mypsswd -d mydb -c -v
Options
argument | default | description
f | clustalw | Format of the multiple sequence alignment files
u | | Username for the MySQL database
p | | Password for the MySQL database
d | | Database name
m | 100 | Resolution of the base-pair map used to guide the alignment grid-lines in GBrowse_syn
n | | Flag to skip grid-line mapping (faster, but you will lose all of the insertion/deletion data)
v | | Flag for verbose progress reporting
c | | Flag to create a new database and load the schema as well as the data. Note that using this flag erases all existing data before the new data is loaded. Omitting this option for a new database causes a fatal error.
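
Because the c flag both creates the schema and erases existing data, it helps to make the choice between a first load and a later load explicit. The following is a minimal sketch, not part of the GBrowse_syn distribution: msa_load_cmd is a hypothetical helper that only prints a command line mirroring the example above, with or without -c. The assumption that a later load into an existing database simply omits -c follows from the description of the c flag; how the script handles repeated loads is not documented here.

```shell
# Hypothetical helper (not shipped with GBrowse_syn): print the
# load_alignments_msa.pl command line for a given mode.
#   create -> include -c (new database; erases any existing data)
#   append -> omit -c (assumes the database and schema already exist)
msa_load_cmd() {
    mlc_mode=$1
    mlc_format=$2
    mlc_user=$3
    mlc_pass=$4
    mlc_db=$5
    case "$mlc_mode" in
        create) mlc_create=" -c" ;;
        append) mlc_create="" ;;
        *)
            echo "mode must be 'create' or 'append'" >&2
            return 1
            ;;
    esac
    # Same flag order as the documented example; -c is the only difference.
    printf 'perl load_alignments_msa.pl -f %s -u %s -p %s -d %s%s -v\n' \
        "$mlc_format" "$mlc_user" "$mlc_pass" "$mlc_db" "$mlc_create"
}
```

Calling msa_load_cmd create clustalw me mypsswd mydb prints the same command as the example above; using append instead prints it without -c, so the destructive option has to be requested by name.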

load_alignment_database.pl

Purpose
This script loads the alignment database from one or more tab-delimited alignment data files (the format is described in the "alignment data loading format" section of the GBrowse_syn_Database page). This format can serve as an intermediate for parsed alignment data, or it can be used for data that does not come from multiple sequence alignments, such as gene orthology data or defined regions of co-linearity. The tab-delimited format requires start and end coordinates for each reference sequence. Any features that have start and end coordinates and strand information can be used.
Example
perl load_alignment_database.pl -u user -p password -d dbname -c -v alignments.aln.txt alignments2.aln.txt
Options
argument | default | description
u | | Username for the MySQL database
p | | Password for the MySQL database
d | | Database name
v | | Flag for verbose progress reporting
c | | Flag to create a new database and load the schema as well as the data. Note that using this flag erases all existing data before the new data is loaded. Omitting this option for a new database causes a fatal error.

mercatoraln_to_synhits.pl

mercatoraln_to_synhits.pl is a data parser for multiple sequence alignments generated by mercator.

Purpose
This script processes alignments generated by the Mercator pipeline.
Example
perl mercatoraln_to_synhits.pl -d alignments > mercator_alignments.gbrowse_syn.txt
perl load_alignment_database.pl -u user -p password -d dbname -c -v mercator_alignments.gbrowse_syn.txt
Options
argument | default | description
a | output.mfa | Name of the alignment file produced when Mercator runs the multiple sequence alignment step (with MAVID, PECAN, or another genome alignment tool)
v | | Print progress reports while running. Note that if this option is set, the script stops after reading the first line of the file (see line 117 of the script).
f | fasta | Format of the input alignment files (multi-FASTA is the default)
d | | Directory containing the genome and map files (typically called alignments in the Mercator pipeline)
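
The two-step example above can be wrapped so that the database load, which uses -c and therefore erases existing data, only runs when the conversion step succeeded and actually wrote something. This is a minimal sketch under stated assumptions: mercator_to_db is a hypothetical function name, the PERL variable is an illustrative override for the interpreter, and both scripts are assumed to be in the current directory as in the example.

```shell
# Hypothetical wrapper (not shipped with GBrowse_syn): convert Mercator
# alignments, then load them only if the conversion produced output.
# PERL is an illustrative override; it defaults to "perl".
mercator_to_db() {
    mtd_dir=$1     # Mercator directory with genome and map files
    mtd_out=$2     # tab-delimited file to write
    mtd_user=$3
    mtd_pass=$4
    mtd_db=$5

    # -v is deliberately not passed to the converter: the options table
    # notes that it stops after the first line when -v is set.
    if ! "${PERL:-perl}" mercatoraln_to_synhits.pl -d "$mtd_dir" > "$mtd_out"; then
        echo "conversion failed; database left untouched" >&2
        return 1
    fi

    # An empty file would still wipe the database because of -c, so stop here.
    if [ ! -s "$mtd_out" ]; then
        echo "no alignment data written to $mtd_out; database left untouched" >&2
        return 1
    fi

    "${PERL:-perl}" load_alignment_database.pl \
        -u "$mtd_user" -p "$mtd_pass" -d "$mtd_db" -c -v "$mtd_out"
}
```

A call such as mercator_to_db alignments mercator_alignments.gbrowse_syn.txt user password dbname reproduces the documented example, with the two checks added between the steps.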

aln2hit.pl

aln2hit.pl is a generic alignment data parser that reads alignment data into the GBrowse_syn database loading format.

Purpose
Use this script in cases where you have a single alignment file and want to convert it to the tab-delimited format that is used to load the GBrowse_syn alignment database.

Note
This script is deprecated. Use load_alignments_msa.pl to load the database directly instead.

Example
perl aln2hit.pl -f clustalw -i my_alignments.aln >my_alignments.txt
Options
argument | default | description
f | clustalw | Specifies the alignment file format. Most common formats recognized by BioPerl's AlignIO parsers are supported. Use clustalw or fasta for best results.
i | | Specifies the name of the input alignment file

clustal2hit.pl

clustal2hit.pl is a CLUSTALW format alignment data parser.

Purpose
Use this script in cases where you have one or more CLUSTAL alignment files and want to convert them to the tab-delimited format that is used to load the GBrowse_syn alignment database.

Note
This script is deprecated. Use load_alignments_msa.pl to load the database directly instead.

Example
perl clustal2hit.pl *.aln >my_alignments.txt
Options

None.
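
The example above relies on the shell expanding *.aln. In most shells, a pattern that matches nothing is passed through literally, so the script would be handed a file named "*.aln". The following minimal sketch guards against that; clustal_dir_to_hits is a hypothetical function name and the PERL variable is an illustrative override, neither of which is part of GBrowse_syn.

```shell
# Hypothetical wrapper (not shipped with GBrowse_syn): run clustal2hit.pl
# on every .aln file in a directory, refusing to run when there are none.
# PERL is an illustrative override; it defaults to "perl".
clustal_dir_to_hits() {
    cdh_dir=$1     # directory holding the .aln files
    cdh_out=$2     # tab-delimited file to write

    # Replace the function's arguments with the glob matches.
    set -- "$cdh_dir"/*.aln

    # With no matches the pattern stays literal, so the first word
    # will not name an existing file.
    if [ ! -e "$1" ]; then
        echo "no .aln files found in $cdh_dir" >&2
        return 1
    fi

    "${PERL:-perl}" clustal2hit.pl "$@" > "$cdh_out"
}
```

For instance, clustal_dir_to_hits . my_alignments.txt corresponds to the documented example when the alignment files are in the current directory.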