GBrowse syn Configuration

From GMOD
Revision as of 05:45, 3 August 2009 by Mckays (Talk | contribs)

GBrowse_syn is a synteny viewer based on GBrowse. This page describes how to configure GBrowse_syn.

Main Configuration File

Purpose

The main configuration file specifies the alignment database, the species to be included and their corresponding configuration files and display options.

  • This file ends with the extension ".synconf".

Configurable Options

join

  • Required setting
  • The database source name (DSN) for the alignment database
#example
join        = dbi:mysql:database=pecan;host=localhost;user=nobody

source_map

  • Required setting
  • This option maps the relationship between the species data sources, names and descriptions
# example:
#                 name         conf. file          description
source_map =     elegans      elegans_synteny     "C. elegans"
                 remanei      remanei_synteny     "C. remanei"
                 briggsae     briggsae_synteny    "C. briggsae"
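
The three-column layout of source_map can be illustrated with a short Python sketch. This is not how GBrowse_syn itself reads the file; the function name parse_source_map is hypothetical, and the sketch assumes that each entry has exactly three whitespace-separated fields with the description in double quotes, as in the example above.

```python
import shlex

def parse_source_map(value):
    """Split a source_map value into (name, conf_file, description) triples.

    Hypothetical helper for illustration only; assumes three fields per entry.
    """
    # shlex keeps quoted descriptions such as "C. elegans" together as one token
    tokens = shlex.split(value)
    if len(tokens) % 3 != 0:
        raise ValueError("source_map entries must come in groups of three")
    return [tuple(tokens[i:i + 3]) for i in range(0, len(tokens), 3)]

SOURCE_MAP = '''
    elegans      elegans_synteny     "C. elegans"
    remanei      remanei_synteny     "C. remanei"
    briggsae     briggsae_synteny    "C. briggsae"
'''

for name, conf_file, description in parse_source_map(SOURCE_MAP):
    print(name, conf_file, description)
```

A value with a missing column raises ValueError, which is a convenient way to catch an unquoted or forgotten description before starting the browser.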

tmpimages

  • The URL for cached image and session data
# example
tmpimages   = /gbrowse/tmp

buttons

  • The URL for stock GBrowse images, etc.
# example
buttons       = /gbrowse/images/buttons

stylesheet

  • default: /gbrowse/gbrowse.css
  • The URL for the stylesheet

examples

  • Example searches to show at the top of the page
#example
examples = elegans X:1050000..1150000
           elegans I:10762799..10789727
           briggsae chrX:620000..670000

zoom levels

  • The zoom levels that will be available in the navigation menu
zoom levels = 5000 10000 25000 50000 100000 200000 400000

config_extension

  • default: 'syn'
  • This specifies the extension of species-specific configuration files.
  • If GBrowse_syn is used with stand-alone GBrowse data sources, change this option to 'conf'.
  • To avoid confusing the configuration file parser, take care to select names for species-specific configuration files that are not similar to other file names. For example, do not use both elegans.conf (for GBrowse) and elegans.syn (for GBrowse_syn).
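
The naming advice above can be checked mechanically. The following Python sketch is a hypothetical helper, not part of GBrowse_syn. It reports base names that appear with both a .conf and a .syn extension in a list of file names, and the example file names are made up for illustration.

```python
import os
from collections import defaultdict

def find_name_clashes(filenames, extensions=("conf", "syn")):
    """Return base names that occur with more than one of the given extensions."""
    seen = defaultdict(set)
    for filename in filenames:
        base, ext = os.path.splitext(filename)
        ext = ext.lstrip(".")
        if ext in extensions:
            seen[base].add(ext)
    # a base name is a clash when it was seen with two or more extensions
    return sorted(base for base, exts in seen.items() if len(exts) > 1)

# illustrative listing of a configuration directory
files = ["elegans.conf", "elegans.syn", "remanei_synteny.syn", "main.synconf"]
print(find_name_clashes(files))
```

Here only elegans is reported, because it is the only base name used for both a GBrowse and a GBrowse_syn configuration file.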

description

  • default: none
  • The description of the GBrowse_syn data source for public display

max_segment

  • default: 400_000
  • The maximum allowed segment size (sequence length) for the central reference panel
  • Take care not to set this value too high. Very large segments may cause significant network latency or even cause the web server to time out.
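
As a rough way to judge whether a region fits within max_segment, the sketch below computes the length of a search string written in the same form as the examples option (species, then seqid:start..end). It assumes inclusive coordinates, so the length is end - start + 1. The function names are hypothetical, and how GBrowse_syn itself enforces the limit is not shown here.

```python
import re

MAX_SEGMENT = 400_000  # documented default for max_segment

# species, whitespace, then seqid:start..end
REGION = re.compile(
    r"^(?P<species>\S+)\s+(?P<seqid>[^:\s]+):(?P<start>\d+)\.\.(?P<end>\d+)$"
)

def segment_length(search):
    """Length in bp of a 'species seqid:start..end' search string."""
    match = REGION.match(search.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {search!r}")
    start, end = int(match["start"]), int(match["end"])
    return end - start + 1  # assumes inclusive coordinates

def within_max_segment(search, max_segment=MAX_SEGMENT):
    """True if the requested segment is no longer than max_segment."""
    return segment_length(search) <= max_segment

for region in ("elegans X:1050000..1150000", "briggsae chrX:620000..670000"):
    print(region, segment_length(region), within_max_segment(region))
```

Both example regions are well under the default limit of 400,000 bp.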

max_span

  • default: 0.3 (i.e., 30%)
  • This is an advanced option.
  • The maximum portion of the reference sequence size that will trigger merging of adjacent inset (aligned sequence) panels.

min_alignment_size

  • default: 0.01
  • The minimum alignment size, expressed as a fraction of the total reference sequence length, that will be used to create an inset panel.
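
Because max_span and min_alignment_size are given as fractions of the reference sequence length, it can help to convert them to base pairs for a particular segment. The Python sketch below does that arithmetic. The function names are hypothetical, and treating the min_alignment_size cutoff as inclusive is an assumption, not documented behavior.

```python
def fraction_to_bp(reference_length, fraction):
    """Convert a fractional setting to base pairs for a given reference length."""
    return round(reference_length * fraction)

def is_large_enough(alignment_length, reference_length, min_alignment_size=0.01):
    """True if an alignment meets the min_alignment_size cutoff (assumed inclusive)."""
    return alignment_length >= fraction_to_bp(reference_length, min_alignment_size)

reference_length = 100_000  # length of the displayed reference segment in bp

print(fraction_to_bp(reference_length, 0.01))  # min_alignment_size default in bp
print(fraction_to_bp(reference_length, 0.3))   # max_span default in bp
print(is_large_enough(1552, reference_length))
```

For a 100,000 bp reference segment, the defaults correspond to 1,000 bp for min_alignment_size and 30,000 bp for max_span.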

imagewidth

  • default: 800
  • The width of the displayed sequence panels (pixels)

interimage_pad

  • default: 5
  • The space between inset panels (pixels)

vertical_pad

  • default: 5
  • The vertical space between panels (pixels)

align_height

  • default: 6
  • The height of the alignment syntenic block features (pixels)

max_gap

  • default: 200_000
  • This is an advanced option
  • The maximum gap allowed between chained alignment features

overview_ratio

  • default: 0.9
  • The relative width of the overview panel in relation to the width of the detailed display panel

overview bgcolor

  • default: gainsboro
  • The background color of the overview panel
  • Allowed values are named web colors or RGB hex codes (e.g., '#FFFFFF')

Species Configuration File

  • By default, these files have the extension ".syn".
  • Regular GBrowse configuration files (extension .conf) can also be used by changing the config_extension option in the main configuration file (above).
  • To avoid confusing GBrowse, select names for your GBrowse_syn configuration files that are not similar to the names of regular GBrowse configuration files.
  • An example of a species config file can be seen here

Temporary workflow for sample data

The first thing we need to do is create a MySQL alignment database using the command-line incantation below:

$ mysql -u root -e 'create database rice_synteny'

Then we will have a look at the input data:

 $ cd ~/data/gbrowse_syn/rice
 $ more data/rice.aln

CLUSTAL W(1.81) multiple sequence alignment W(1.81)


rice-3(+)/16598648-16600199      ggaggccggccgtctgccatgcgtgagccagacggggcgggccggagacaggccacgtgg
wild_rice-3(+)/14467855-14469373 gggggccgg------------------------------------agacaggccacgtgg
                                 ** ******                                    ***************


rice-3(+)/16598648-16600199      ccctgccccgggctgttgacccactggcacccctgtcccgggttgtcgccctcctttccc
wild_rice-3(+)/14467855-14469373 ccctgccccgggctgttgacccactggcacccctgtcccgggttgtcgccctcctttccc
                                 ************************************************************


rice-3(+)/16598648-16600199      cgccatgctctaagtttgctcctcttctcgaacttctctctttgattcttcacgtcctct
wild_rice-3(+)/14467855-14469373 cgccatgctctaagtttgctcctcttctcgaacttctctctttgattcttcacgtcctct
                                 ************************************************************



rice-3(+)/16598648-16600199      tggagcctccccttctagctcgatcacgctctgctcttccgcttggaggctggcaaaact
wild_rice-3(+)/14467855-14469373 tggagcctccccttctagctcgatcgcgctctgctcttccgcttggaggctggcaaaact
                                 ************************* **********************************

NOTE1: These data are in clustalw format. The scripts used to process these data will recognize clustalw and the other commonly used formats supported by BioPerl's AlignIO parser. This does not mean that clustalw was used to generate the alignment data. These particular alignments were generated by blastZ and formatted with compara pipeline components. See WGA_data for more information on whole genome alignment pipelines.


NOTE2: The sequence ID in this clustal file is overloaded to contain information about the species, strand and coordinates. This information is essential:

 rice-3(+)/16598648-16600199
 species-refseq(strand)/start-end
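
The overloaded sequence ID can be taken apart with a regular expression. The Python sketch below is only an illustration of the species-refseq(strand)/start-end layout and is not the parser used by the loading script. The names AlignedSegment and parse_seq_id are hypothetical, and the sketch assumes that the reference sequence name contains no hyphen, so the species is everything before the last hyphen that precedes the strand.

```python
import re
from typing import NamedTuple

class AlignedSegment(NamedTuple):
    species: str
    refseq: str
    strand: str
    start: int
    end: int

# species-refseq(strand)/start-end, e.g. rice-3(+)/16598648-16600199
SEQ_ID = re.compile(
    r"^(?P<species>.+)-(?P<refseq>[^-()]+)"
    r"\((?P<strand>[+-])\)/(?P<start>\d+)-(?P<end>\d+)$"
)

def parse_seq_id(seq_id):
    """Split an overloaded sequence ID into its five components."""
    match = SEQ_ID.match(seq_id)
    if match is None:
        raise ValueError(f"sequence ID lacks species, strand or coordinates: {seq_id!r}")
    return AlignedSegment(
        species=match["species"],
        refseq=match["refseq"],
        strand=match["strand"],
        start=int(match["start"]),
        end=int(match["end"]),
    )

for seq_id in ("rice-3(+)/16598648-16600199", "wild_rice-3(+)/14467855-14469373"):
    print(parse_seq_id(seq_id))
```

An ID that is missing any of the components raises ValueError, which mirrors the point that all of this information is essential.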



Then, we will load the database. This is time-consuming, so we will use a screen session to run it in the background while we turn our attention to downstream tasks.

$ screen
  • When entering screen mode, hit 'space' to clear the first screen.
  • If your backspace key does not work in screen mode, use ^H (ctrl key + H key).
$ bin/load_alignments_msa.pl -u root -d rice_synteny --verbose data/rice.aln
Processing alignment file data/rice.aln...
Processing alignment 1
Mapping coordinates for alignment 1... Done!
Processed pair-wise alignment 1
Processing alignment 2
Mapping coordinates for alignment 2... Done!
Processed pair-wise alignment 2
Processing alignment 3
Mapping coordinates for alignment 3... Done!
Processed pair-wise alignment 3
Processing alignment 4
Mapping coordinates for alignment 4... Done!
Processed pair-wise alignment 4
Processing alignment 5
Mapping coordinates for alignment 5... Done!
Processed pair-wise alignment 5
Processing alignment 6
Mapping coordinates for alignment 6... Done!
Processed pair-wise alignment 6
Processing alignment 7
Mapping coordinates for alignment 7... Done!
Processed pair-wise alignment 7
Processing alignment 8
Mapping coordinates for alignment 8... Done!
Processed pair-wise alignment 8
Processing alignment 9
Mapping coordinates for alignment 9... Done!
Processed pair-wise alignment 9
Processing alignment 10
Mapping coordinates for alignment 10... Done!
Processed pair-wise alignment 10
  • This will go on for some time (there are 1800 alignments), so we will let the screen session run in the background and work on our other tasks.