Difference between revisions of "GBrowse syn Configuration"

From GMOD
(Temporary workflow for sample data)
m (Configuration settings)
 
(17 intermediate revisions by 3 users not shown)
 
[[GBrowse_syn]] is a [[synteny]] viewer based on [[GBrowse]].  This page describes how to configure GBrowse_syn.
 
[[GBrowse_syn]] is a [[synteny]] viewer based on [[GBrowse]].  This page describes how to configure GBrowse_syn.
  
=Main Configuration File=
+
= Main Configuration File =
==Purpose==
+
The main configuration file specifies the alignment database, the species to be included and their corresponding configuration files and display options.
+
  
*This file ends with the extension ".synconf".
+
== Purpose ==
  
*An example of the config file for the [http://dev.wormbase.org/db/seq/gbrowse_syn WormBase synteny browser] can be seen [[pecan.synconf|here]].
+
The main configuration file specifies the alignment database, the species to be included and their corresponding configuration files and display options.
 +
* The file ends with the extension ".synconf".
  
==Configurable Options==
+
=== Example config file ===
===join===
+
This example contains information about the alignment (joining) database and the individual databases for each of the species in the browser. For details of each setting, see below.
* Required setting
+
* The database source name (DSN) for the alignment database
+
#example
+
join        = dbi:mysql:database=pecan;host=localhost;user=nobody
+
  
===source map===
+
<code><pre>
* Required setting
+
[GENERAL]
* This option maps the relationship between the species data sources, names and descriptions
+
description =  PECAN alignments for Caenorhabditis
  
# example:
+
# The synteny database
#                name        conf. file          description
+
join        = dbi:mysql:database=pecan;host=localhost;user=nobody
source_map =     elegans      elegans_synteny    "C. elegans"
+
                  remanei      remanei_synteny    "C. remanei"
+
                  briggsae    briggsae_synteny    "C. briggsae"
+
  
===tmpimages===
+
#    symbolic src  config file (without the ".conf")  Description
* The URL for cached image and session data
+
source_map =     c_elegans      c_elegans    "C. elegans"
# example
+
                  c_remanei      c_remanei    "C. remanei"
tmpimages  = /gbrowse/tmp
+
                  c_briggsae    c_briggsae    "C. briggsae"
 +
                  c_brenneri    c_brenneri    "C. brenneri"
 +
                  c_japonica    c_japonica    "C. japonica"
  
===buttons===
+
tmpimages    = /gbrowse/tmp
* The URL for stock [[GBrowse]] images, etc
+
imagewidth    = 800
# example
+
stylesheet    = /gbrowse/gbrowse.css
buttons      = /gbrowse/images/buttons
+
cache time    = 1
  
===stylesheet===
+
# example searches to display
* default: /gbrowse/gbrowse.css
+
examples = c_elegans X:1050000..1150000
* The URL for the stylesheet
+
          c_briggsae chrX:620000..670000
 +
          c_elegans R193.2
  
===examples===
 
* Example searches to show at the top of the page
 
#example
 
examples = elegans X:1050000..1150000
 
            elegans I:10762799..10789727
 
            briggsae chrX:620000..670000
 
  
===zoom levels===
+
zoom levels = 5000 10000 25000 50000 100000 200000 400000
* which zoom levels will be available in the navigation menu
+
zoom levels = 5000 10000 25000 50000 100000 200000 400000
+
  
===config_extension===
+
# species-specific databases
* default: 'syn';
+
[c_elegans]
* This specifies the extension of species-specific configuration files.
+
tracks    = CG
* If GBrowse_syn is used with stand-alone [[GBrowse]] data sources, change this option to 'conf'.
+
color    = green
* To avoid confusing the configuration files parser, take care to select names for species-specific configuration files that are not similar to other file names.  For example, do not use both elegans.conf (for GBrowse) and elegans.syn (for GBrowse_syn).
+
  
===description===
+
[c_remanei]
* default: none
+
tracks    = CG
* The description of the GBrowse_syn data source for public display
+
color    = red
  
===max_segment===
+
[c_briggsae]
* default: 400_000
+
tracks    = CG
* The maximum allowed segment size (sequence length) for the central reference panel
+
color    = black
* Take care not to set this value too high.  Very large segments may cause significant network latency or even time out the web server
+
  
===max_span===
+
[c_brenneri]
* default: 0.3 (''i.e.'', 30%)
+
tracks    = CG
* This is an advanced option.
+
color    = purple
* The maximum portion of the reference sequence size that will trigger merging of adjacent inset (aligned sequence) panels.
+
  
===min_alignment_size===
+
[c_japonica]
* default: 0.01
+
tracks    = CG
* The minimum alignment size, expressed as a fraction of the total reference sequence length, that will be used to create an inset panel.
+
color    = blue
 +
</pre></code>
  
===imagewidth===
+
* Another example can be found in the <span class=pops>[[GBrowse_syn_Tutorial#The_GBrowse_syn_Config_File|GBrowse_syn_Tutorial]]</span>
* default: 800
+
* The width of the displayed sequence panels (pixels)
+
  
===interimage_pad===
+
== Configuration settings ==
* default: 5
+
See above for examples.
* The space between inset panels (pixels)
+
  
===vertical_pad===
+
{| class="x" border="1"
* default: 5
+
* The vertical space between panels (pixels)
+
  
===align_height===
+
! Option
* default: 6
+
! Required option?
* The height of the alignment syntenic block features (pixels)
+
! Default Value
 +
! Description
 +
|-
  
===max_gap===
+
| join
* default: 200_000
+
| Yes
* This is an advanced option
+
|
* The maximum gap allowed between chained alignment features
+
| The database source name (DSN) for the alignment database
 +
|-
  
===overview_ratio===
+
| source_map
* default: 0.9
+
| Yes
* The relative width of the overview panel in relation to the width of the detailed display panel
+
|
 +
| This option maps the relationship between the species data sources, names and descriptions. See the example above.
 +
* The value for "name" (the first column) is the symbolic name that gbrowse_syn uses to identify each species.
 +
* This value is also used in two other places in the gbrowse_syn configuration:
 +
*# it is used as the species name in the "examples" directive
 +
*# it is used as the species name in the .aln file
 +
* The value for "conf. file" is the basename of the corresponding gbrowse .conf file.  This value is also used to identify the species configuration stanzas at the bottom of the configuration file.
 +
|-
  
===overview bgcolor===
+
| tmpimages
* default: gainsboro
+
|
* The background color of the overview panel
+
|
* Allowed values are named web colors or RGB hex codes (eg: '#FFFFFF')
+
| The URL for cached image and session data
 +
|-
  
=Species Configuration File=
+
| buttons
 +
|
 +
|
 +
| The URL for stock [[GBrowse]] images, etc
 +
|-
  
*By default these files have the extension ".syn"
+
| stylesheet
*Regular GBrowse configuration files (extension .conf) can also be used by changing the configuration option ''config_extension'' in the main config file (above)
+
|
 +
| /gbrowse/gbrowse.css
 +
| The URL for the stylesheet
 +
|-
  
*This file is identical in structure to a normal [[GBrowse Configuration HOWTO#Adding and Configuring Databases|GBrowse configuration file]].
+
| examples
 +
|
 +
|
 +
| Example searches to show at the top of the page. The species names used must match those used in the first column of the source_map directive.
 +
|-
  
*To avoid confusing GBrowse, select names for your GBrowse_syn configuration files that are not similar to the names of regular GBrowse configuration files.
+
| zoom levels
 +
|
 +
|
 +
| which zoom levels will be available in the navigation menu
 +
|-
  
*An example of a species config file can be seen [[briggsae.synconf|here]]
+
| config_extension
 +
|
 +
| syn
 +
| This specifies the extension of species-specific configuration files.
 +
* If GBrowse_syn is used with stand-alone [[GBrowse]] data sources, change this option to 'conf'.
 +
* To avoid confusing the configuration files parser, take care to select names for species-specific configuration files that are not similar to other file names.  For example, do not use both elegans.conf (for GBrowse) and elegans.syn (for GBrowse_syn).
 +
* NOTE: If you are using multiple data sources for gbrowse_syn, all must use the same config extension; you cannot mix and match ".syn" and ".conf".
 +
|-
  
=Temporary workflow for sample data=
+
| description
 +
|
 +
| none
 +
| The description of the GBrowse_syn data source for public display
 +
|-
  
The first thing we need to do is create a mysql alignment database using the command-line incantation below:
+
| max_segment
 +
|
 +
| 400_000
 +
| The maximum allowed segment size (sequence length) for the central reference panel. Take care not to set this value too high. Very large segments take a long time to render and may even time out the web server!
 +
|-
  
  $ mysql -u root -e 'create database rice_synteny'
+
  | max_span
 +
|
 +
| 0.3
 +
| The maximum fraction of the reference sequence size that will trigger merging of adjacent inset (aligned sequence) panels.
 +
|-
  
Then we will have a look at the input data:
+
| min_alignment_size
 +
|
 +
| 0.01
 +
| The minimum alignment size, expressed as a fraction of the total reference sequence length, that will be used to create an inset panel.
 +
|-
  
<pre>
+
| imagewidth
  $ cd ~/data/gbrowse_syn/rice
+
  |
  $ more data/rice.aln
+
  | 800
 +
| The width of the displayed sequence in pixels.
 +
|-
  
CLUSTAL W(1.81) multiple sequence alignment W(1.81)
+
| interimage_pad
 +
|
 +
| 5
 +
| The space between inset (aligned sequence) panels in pixels.
 +
|-
  
 +
| vertical_pad
 +
|
 +
| 5
 +
| The vertical space between panels in pixels.
 +
|-
  
rice-3(+)/16598648-16600199      ggaggccggccgtctgccatgcgtgagccagacggggcgggccggagacaggccacgtgg
+
| align_height
wild_rice-3(+)/14467855-14469373 gggggccgg------------------------------------agacaggccacgtgg
+
|
                                ** ******                                    ***************
+
| 6
 +
| The height of the alignment syntenic block features in pixels.
 +
|-
  
 +
| max_gap
 +
|
 +
| 200_000
 +
| The maximum gap allowed between chained alignment features.
 +
|-
  
rice-3(+)/16598648-16600199      ccctgccccgggctgttgacccactggcacccctgtcccgggttgtcgccctcctttccc
+
| overview_ratio
wild_rice-3(+)/14467855-14469373 ccctgccccgggctgttgacccactggcacccctgtcccgggttgtcgccctcctttccc
+
|
                                ************************************************************
+
| 0.9
 +
| The relative width of the overview panel in relation to the width of the detailed display panel.
 +
|-
  
 +
| overview bgcolor
 +
|
 +
| gainsboro
 +
| The background color of the overview panel. Allowed values are named web colors or RGB hex codes (eg: '#FFFFFF').
  
rice-3(+)/16598648-16600199      cgccatgctctaagtttgctcctcttctcgaacttctctctttgattcttcacgtcctct
+
|}
wild_rice-3(+)/14467855-14469373 cgccatgctctaagtttgctcctcttctcgaacttctctctttgattcttcacgtcctct
+
                                ************************************************************
+
  
 +
==The species' configuration stanzas==
 +
* Each individual species (or equivalent) that has a configuration file specified in the source map should also have a config stanza specifying which tracks to display and the theme color for the species.
 +
* Note that the label of each stanza must match one of those in the second column of the source_map.
 +
<pre>
 +
[elegans_synteny]
 +
tracks    = CG
 +
color    = blue
  
 +
[briggsae_synteny]
 +
tracks    = CG
 +
color    = purple
  
rice-3(+)/16598648-16600199      tggagcctccccttctagctcgatcacgctctgctcttccgcttggaggctggcaaaact
+
[remanei_synteny]
wild_rice-3(+)/14467855-14469373 tggagcctccccttctagctcgatcgcgctctgctcttccgcttggaggctggcaaaact
+
tracks    = CG
                                ************************* **********************************
+
color    = black
 
</pre>
 
</pre>
 
'''<font color=red>NOTE1:</font>''' These data are in clustalw format.  The scripts used to process these data will recognize clustalw and other commonly used formats recognized by BioPerl's AlignIO parser.  ''This does not mean that clustal is the format used to generate the alignment data''.  These particular alignments were generated by blastZ and formatted with compara pipeline components.  See [[WGA_data]] for more information on whole genome alignment pipelines.
 
 
 
'''<font color=red>NOTE2:</font>''' The sequence ID in this clustal file is overloaded to contain information about the species, strand and coordinates.  This information is essential:
 
 
  rice-3(+)/16598648-16600199
 
  species-refseq(strand)/start-end
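
As a rough illustration of how this overloaded ID breaks down, the following Python sketch splits it into its parts. It is not the parser used by load_alignments_msa.pl; the function name <code>parse_seq_id</code> and the regular expression are assumptions made for this example, and the pattern assumes that the species name itself contains no hyphen.

```python
import re

# Pattern for IDs of the form species-refseq(strand)/start-end.
# Assumption for this sketch: the species name contains no hyphen.
SEQ_ID = re.compile(
    r"^(?P<species>[^-]+)-(?P<refseq>.+)"
    r"\((?P<strand>[+-])\)/(?P<start>\d+)-(?P<end>\d+)$"
)


def parse_seq_id(seq_id):
    """Split an overloaded sequence ID into its five components."""
    match = SEQ_ID.match(seq_id)
    if match is None:
        raise ValueError(f"not in species-refseq(strand)/start-end form: {seq_id!r}")
    fields = match.groupdict()
    # Coordinates are returned as integers so they can be compared directly.
    fields["start"] = int(fields["start"])
    fields["end"] = int(fields["end"])
    return fields
```

For the ID shown above, the sketch yields the species <code>rice</code>, the reference sequence <code>3</code>, the strand <code>+</code> and the coordinates 16598648 and 16600199. An ID that lacks any of these parts raises an error, which mirrors the point that all of this information is essential.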
 
 
 
 
 
 
Then, we will load the database.  This is time-consuming, so we will use a screen session to run it in the background while we turn our attention to downstream tasks.
 
 
$ screen
 
 
*When entering screen mode, hit 'space' to clear the first screen.
 
*If your backspace key does not work in screen mode, use ^H (ctrl key + H key).
 
 
<pre>
 
$ bin/load_alignments_msa.pl -u root -d rice_synteny --verbose data/rice.aln
 
Processing alignment file data/rice.aln...
 
Processing alignment 1
 
Mapping coordinates for alignment 1... Done!
 
Processed pair-wise alignment 1
 
Processing alignment 2
 
Mapping coordinates for alignment 2... Done!
 
Processed pair-wise alignment 2
 
Processing alignment 3
 
Mapping coordinates for alignment 3... Done!
 
Processed pair-wise alignment 3
 
Processing alignment 4
 
Mapping coordinates for alignment 4... Done!
 
Processed pair-wise alignment 4
 
Processing alignment 5
 
Mapping coordinates for alignment 5... Done!
 
Processed pair-wise alignment 5
 
Processing alignment 6
 
Mapping coordinates for alignment 6... Done!
 
Processed pair-wise alignment 6
 
Processing alignment 7
 
Mapping coordinates for alignment 7... Done!
 
Processed pair-wise alignment 7
 
Processing alignment 8
 
Mapping coordinates for alignment 8... Done!
 
Processed pair-wise alignment 8
 
Processing alignment 9
 
Mapping coordinates for alignment 9... Done!
 
Processed pair-wise alignment 9
 
Processing alignment 10
 
Mapping coordinates for alignment 10... Done!
 
Processed pair-wise alignment 10
 
</pre>
 
 
* This will go on for some time (there are 1800 alignments), so we will let the screen run in the background and work on our other tasks.
 
  
 
[[Category:GBrowse syn]]
 
[[Category:GBrowse syn]]

Latest revision as of 19:51, 18 October 2011

GBrowse_syn is a synteny viewer based on GBrowse. This page describes how to configure GBrowse_syn.

Main Configuration File

Purpose

The main configuration file specifies the alignment database, the species to be included and their corresponding configuration files and display options.

  • The file ends with the extension ".synconf".

Example config file

This example contains information about the alignment (joining) database and the individual databases for each of the species in the browser. For details of each setting, see below.

[GENERAL]
description =  PECAN alignments for Caenorhabditis

# The synteny database
join        = dbi:mysql:database=pecan;host=localhost;user=nobody

#     symbolic src   config file (without the ".conf")  Description
source_map =      c_elegans      c_elegans     "C. elegans"
                  c_remanei      c_remanei     "C. remanei"
                  c_briggsae     c_briggsae    "C. briggsae"
                  c_brenneri     c_brenneri    "C. brenneri"
                  c_japonica     c_japonica    "C. japonica"

tmpimages     = /gbrowse/tmp
imagewidth    = 800
stylesheet    = /gbrowse/gbrowse.css
cache time    = 1

# example searches to display
examples = c_elegans X:1050000..1150000
           c_briggsae chrX:620000..670000
           c_elegans R193.2


zoom levels = 5000 10000 25000 50000 100000 200000 400000

# species-specific databases
[c_elegans]
tracks    = CG
color     = green

[c_remanei]
tracks    = CG
color     = red

[c_briggsae]
tracks    = CG
color     = black

[c_brenneri]
tracks    = CG
color     = purple

[c_japonica]
tracks    = CG
color     = blue
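
The join value in the example is a Perl DBI data source name. As a minimal sketch of how the pieces of such a string fit together, the Python below splits the example DSN into its driver and connection options. The helper name parse_dsn is hypothetical, it only handles the key=value form shown in the example, and it is not part of GBrowse_syn.

```python
def parse_dsn(dsn):
    """Split a DSN like dbi:mysql:database=pecan;host=localhost;user=nobody.

    Returns (driver, options). Only the key=value form used in the
    example config is handled; other DBI DSN styles are not covered.
    """
    scheme, driver, params = dsn.split(":", 2)
    if scheme != "dbi":
        raise ValueError(f"expected a DSN starting with 'dbi:', got {dsn!r}")
    # Each parameter is a key=value pair separated by semicolons.
    options = dict(item.split("=", 1) for item in params.split(";") if item)
    return driver, options
```

Applied to the join line above, this gives the driver mysql with the database pecan, the host localhost and the user nobody, which is a quick way to see which database the browser will try to reach.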

Configuration settings

See above for examples.

Option | Required option? | Default Value | Description
join | Yes | | The database source name (DSN) for the alignment database.
source_map | Yes | | This option maps the relationship between the species data sources, names and descriptions. See the example above.
  • The value for "name" (the first column) is the symbolic name that gbrowse_syn uses to identify each species.
  • This value is also used in two other places in the gbrowse_syn configuration:
    1. it is used as the species name in the "examples" directive
    2. it is used as the species name in the .aln file
  • The value for "conf. file" (the second column) is the basename of the corresponding gbrowse .conf file. This value is also used to identify the species configuration stanzas at the bottom of the configuration file.
tmpimages | | | The URL for cached image and session data.
buttons | | | The URL for stock GBrowse images, etc.
stylesheet | | /gbrowse/gbrowse.css | The URL for the stylesheet.
examples | | | Example searches to show at the top of the page. The species names used must match those used in the first column of the source_map directive.
zoom levels | | | The zoom levels that will be available in the navigation menu.
config_extension | | syn | This specifies the extension of species-specific configuration files.
  • If GBrowse_syn is used with stand-alone GBrowse data sources, change this option to 'conf'.
  • To avoid confusing the configuration files parser, take care to select names for species-specific configuration files that are not similar to other file names. For example, do not use both elegans.conf (for GBrowse) and elegans.syn (for GBrowse_syn).
  • NOTE: If you are using multiple data sources for gbrowse_syn, all must use the same config extension; you cannot mix and match ".syn" and ".conf".
description | | none | The description of the GBrowse_syn data source for public display.
max_segment | | 400_000 | The maximum allowed segment size (sequence length) for the central reference panel. Take care not to set this value too high. Very large segments take a long time to render and may even time out the web server!
max_span | | 0.3 | The maximum fraction of the reference sequence size that will trigger merging of adjacent inset (aligned sequence) panels.
min_alignment_size | | 0.01 | The minimum alignment size, expressed as a fraction of the total reference sequence length, that will be used to create an inset panel.
imagewidth | | 800 | The width of the displayed sequence in pixels.
interimage_pad | | 5 | The space between inset (aligned sequence) panels in pixels.
vertical_pad | | 5 | The vertical space between panels in pixels.
align_height | | 6 | The height of the alignment syntenic block features in pixels.
max_gap | | 200_000 | The maximum gap allowed between chained alignment features.
overview_ratio | | 0.9 | The relative width of the overview panel in relation to the width of the detailed display panel.
overview bgcolor | | gainsboro | The background color of the overview panel. Allowed values are named web colors or RGB hex codes (e.g. '#FFFFFF').
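
Because max_span and min_alignment_size are fractions rather than base-pair counts, it can help to work out what they mean for a given view. The Python sketch below does that arithmetic with the default values from the settings above. It assumes that the "reference sequence length" is the length of the reference segment currently displayed; the function name fraction_thresholds is hypothetical, and the code does not reproduce GBrowse_syn's own panel logic.

```python
def fraction_thresholds(segment_length, min_alignment_size=0.01, max_span=0.3):
    """Convert the fractional settings into base-pair values.

    segment_length is the length (bp) of the displayed reference segment,
    which is an assumption about how the fractions are applied.
    """
    return {
        "min_alignment_bp": round(segment_length * min_alignment_size),
        "max_span_bp": round(segment_length * max_span),
    }
```

For a 100,000 bp reference segment and the default settings, min_alignment_size works out to 1,000 bp and max_span to 30,000 bp. At the default max_segment of 400_000 the same fractions correspond to 4,000 bp and 120,000 bp.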

The species' configuration stanzas

  • Each individual species (or equivalent) that has a configuration file specified in the source map should also have a config stanza specifying which tracks to display and the theme color for the species.
  • Note that the label of each stanza must match one of those in the second column of the source_map.
[elegans_synteny]
tracks    = CG
color     = blue

[briggsae_synteny]
tracks    = CG
color     = purple

[remanei_synteny]
tracks    = CG
color     = black
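
The naming rules described on this page (species names in "examples" must match the first column of source_map, and stanza labels must match the second column) can be checked mechanically before a configuration is deployed. The Python sketch below is one way to do so. The function names parse_source_map and check_config are hypothetical, the values are passed in as plain strings rather than read from a .synconf file, and this is not how GBrowse_syn itself validates its configuration.

```python
import shlex


def parse_source_map(value):
    """Split a source_map value into (name, conf_file, description) tuples."""
    entries = []
    for line in value.strip().splitlines():
        if not line.strip():
            continue
        # shlex keeps a quoted description such as "C. elegans" as one field.
        name, conf_file, description = shlex.split(line)
        entries.append((name, conf_file, description))
    return entries


def check_config(source_map, examples, stanza_labels):
    """Return a list of naming problems; an empty list means none were found."""
    entries = parse_source_map(source_map)
    names = {name for name, _, _ in entries}
    conf_files = {conf for _, conf, _ in entries}
    problems = []
    # The first word of each example search is the species name.
    for line in examples.strip().splitlines():
        fields = line.split()
        if fields and fields[0] not in names:
            problems.append(f"examples: unknown species {fields[0]!r}")
    # Each stanza label must be a config file name from the second column.
    for label in stanza_labels:
        if label not in conf_files:
            problems.append(f"stanza [{label}] is not a config file in source_map")
    return problems
```

With a source_map whose second column lists elegans_synteny, briggsae_synteny and remanei_synteny, the three stanzas above pass the check. With a source_map that lists c_elegans and so on, the same stanzas would be reported as mismatches.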