GBrowse 2.0 HOWTO

From GMOD

Revision as of 16:02, 22 October 2008

This document is a work in progress. It describes how to install and configure GBrowse 2.0.

Introduction

GBrowse 2.0 is a complete rewrite of the original GBrowse version. In addition to making the code base more maintainable, GBrowse 2.0 adds the following major features:

  • User Interface: The user interface uses AJAX to provide a smoother user experience. Tracks turn on and off immediately, and updates affect only the tracks that have changed.
  • More rational configuration: Most configuration options have been moved into a single shared configuration file. This allows data source-specific files to be shorter and more concise. This also increases the performance for sites that use hundreds of configuration files to display annotations on multiple species because only the global configuration file and the source-specific configuration file need to be read.
  • Multiple database support: You can now declare multiple databases for each data source and attach them to different tracks. This allows you to add and remove genome annotation data sets far more easily than in earlier versions.
  • Slave renderer support: If you have a multi-core or multi-CPU machine, or access to several machines, you can distribute the tasks of reading the databases and rendering tracks across multiple processes and machines via a series of "slave" renderers. This can greatly increase rendering performance.

This document describes how to install and configure GBrowse 2.0 on your system. Readers familiar with GBrowse 1.70 or earlier should start with the next section, which is a quick summary of what is different. Readers who have not installed or configured GBrowse before should skip to GBrowse Installation.

For Users of GBrowse 1.X

GBrowse 2.0 is largely backward compatible with GBrowse 1.X, but you will need to do some modest work in order to port existing sources to the new system. This section tells you what you need to know.

Apache Environment Variables

GBrowse 1.X found the location of its configuration files by consulting a hard-coded variable located in the CGI script itself. This made it hard to move the configuration files around. In contrast, GBrowse 2.0 finds its configuration directory by consulting an environment variable named GBROWSE_CONF that is set by Apache. You must add a SetEnv directive to the Apache configuration file in order to create this variable and pass it through. Usually this directive will be located in the "cgi-bin" <Directory> section as follows:

 <Directory /usr/lib/cgi-bin>
   SetEnv GBROWSE_CONF /etc/GBrowse2
   ... # other stuff # ...
 </Directory>

Other environment variables that can be set in the Apache configuration file include:

GBROWSE_DOCS
Location of GBrowse's static HTML files and images in the file system (e.g. "/var/www/gbrowse2")
GBROWSE_ROOT
Location of GBrowse's static HTML files and images in URL space (e.g. "/gbrowse2")
GBROWSE_MASTER
Name of the GBrowse master configuration file located in the configuration directory, "GBrowse.conf" by default.
PERL5LIB
Colon-delimited list of directories to search for Perl modules. Useful if some modules, such as BioPerl, are installed in non-standard locations.

The Build script will guide you through selecting most of these options when you run "./Build config". Afterwards, running "./Build apache_config" prints a suitable fragment that you can cut and paste into the Apache configuration file.
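
Before restarting Apache, you may want to confirm that the directory you are about to name in SetEnv really contains the master configuration file. The following is a minimal shell sketch and not part of GBrowse: the function name check_gbrowse_conf is hypothetical, and it assumes only what is described above, namely a configuration directory holding a master file called GBrowse.conf unless GBROWSE_MASTER names a different one.

```shell
# check_gbrowse_conf DIR [MASTER]
# Hypothetical helper (not shipped with GBrowse). Succeeds if DIR is a
# directory that contains a readable master configuration file.
# MASTER defaults to "GBrowse.conf", the default named in this document.
check_gbrowse_conf() {
    dir=$1
    master=${2:-GBrowse.conf}
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ ! -r "$dir/$master" ]; then
        echo "cannot read $dir/$master" >&2
        return 1
    fi
    echo "ok: $dir/$master"
}
```

For example, "check_gbrowse_conf /etc/GBrowse2" checks the location used in the <Directory> example above. Note that the check runs with your own permissions, which may differ from those of the web server user.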

GBrowse.conf and Data Source Config Files

In GBrowse 1.X, each data source had its own configuration file. However, many or most of the options in each file, such as file paths, stylesheets, and header/footer options, were the same, causing config file bloat. In GBrowse 2.0, all common configuration options have been moved into a master configuration file, usually located at /etc/GBrowse2/GBrowse.conf.

GBrowse.conf contains a [GENERAL] stanza that sets such options as the location of the data-specific configuration files, static HTML, JavaScript and CSS files, timeouts, session settings and global appearance settings. It also contains one or more data-source stanzas, one for each species (or genome annotation release) you want to make available for browsing. Each data-source specific stanza looks like this:

 [datasource]
 description = This is a description
 path        = datasource.conf

The description appears in the pop-up menu that allows users to select the genome to browse. The path specifies the path to the configuration file for that data source. The Build process installs an example GBrowse.conf for you, so you can see how this is done.

Each data-source specific configuration file also has a [GENERAL] stanza. Options in this stanza supplement or override settings in GBrowse.conf. Usually there will be only a very few options in this stanza. Following this there is a [TRACK DEFAULTS] stanza that sets default options for tracks, followed by a series of [TRACK_NAME] stanzas for configuring individual tracks.

To migrate your GBrowse 1.X configuration files to 2.0, simply customize the [GENERAL] section of the new GBrowse.conf file to meet your needs, and then create a [datasource] section that points to each of your existing GBrowse 1.X config files. In most cases, these config files will work as is. Later, you may wish to consolidate redundant options that are shared among your config files in order to simplify maintenance.
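
If you have many existing config files, writing the [datasource] stanzas by hand gets tedious. The sketch below shows one way to generate them from the shell. Both function names are hypothetical, and it assumes that your 1.X files end in .conf and sit in the configuration directory, so that a bare file name is enough for the path option, as in the example stanza above. The file's base name is used as a placeholder description, which you will want to edit.

```shell
# datasource_stanza NAME DESCRIPTION PATH
# Hypothetical helper: print one data-source stanza in the layout shown above.
datasource_stanza() {
    printf '[%s]\n' "$1"
    printf 'description = %s\n' "$2"
    printf 'path        = %s\n' "$3"
}

# stanzas_for_dir DIR
# Print a stanza for every *.conf file in DIR except the master file.
# The stanza name and the placeholder description are both the base name.
stanzas_for_dir() {
    for f in "$1"/*.conf; do
        [ -e "$f" ] || continue              # glob matched nothing
        name=$(basename "$f" .conf)
        [ "$name" = "GBrowse" ] && continue  # skip GBrowse.conf itself
        datasource_stanza "$name" "$name" "$(basename "$f")"
        echo
    done
}
```

Review the output of "stanzas_for_dir /etc/GBrowse2" before pasting it into GBrowse.conf, rather than appending it blindly.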

Specifying Databases

In GBrowse 1.X each data source could be attached to one and only one database. In GBrowse 2.0, you can declare as many databases as you like, and attach them to one or more tracks. The syntax is simple. Somewhere in the data source configuration file (suggested: between [GENERAL] and the track stanzas) declare one or more [name:database] stanzas. For example:

 [volvox_genbank:database]
 db_adaptor    = Bio::DB::SeqFeature::Store
 db_args       = -adaptor memory
                 -dir    /usr/share/databases/volvox_gb_mirror

 [volvox_ncRNA:database]
 db_adaptor   = Bio::DB::SeqFeature::Store
 db_args      = -adaptor DBI::mysql
                -dsn     volvox_ncRNA

This declares two databases, one named "volvox_genbank" and the other "volvox_ncRNA". You then assign these to the tracks as follows:

 [GENES]
 database = volvox_genbank
 feature  = gene:genbank
 ... etc...

 [miRNAs]
 database = volvox_ncRNA
 feature  = miRNA
 ... etc...

The default database is specified in the [GENERAL] or [TRACK DEFAULTS] section, with the latter taking precedence over the former:

 [GENERAL]
 database = volvox_genbank   # this will be the default
 ... etc...

For backward compatibility, you can forgo the [name:database] sections entirely and just place db_adaptor and db_args options directly in the [GENERAL] and/or track stanzas. The system will do its best to minimize redundancy by treating identical settings as a single database.
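
A mistyped name in a "database" option is easy to miss when a file declares several databases. As a rough aid, the following shell sketch lists every name that a data source file refers to but never declares. The three function names are hypothetical, and the sketch assumes the layout used in the examples above: one option per line, and "#" starting a trailing comment. It does not understand the backward-compatible style, in which db_adaptor and db_args appear directly in a stanza.

```shell
# declared_databases FILE
# Print the names found in [name:database] stanza headers.
declared_databases() {
    sed -n 's/^[[:space:]]*\[\([^]:]*\):database][[:space:]]*$/\1/p' "$1"
}

# referenced_databases FILE
# Print the distinct values of "database = name" options,
# ignoring anything after the name (such as a trailing # comment).
referenced_databases() {
    sed -n 's/^[[:space:]]*database[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | sort -u
}

# undeclared_databases FILE
# Print each referenced name that has no matching [name:database] stanza.
undeclared_databases() {
    for db in $(referenced_databases "$1"); do
        declared_databases "$1" | grep -qxF -- "$db" || echo "$db"
    done
}
```

An empty result from undeclared_databases means that every name used in a "database" option has a matching declaration in the same file.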

Specifying Rendering Slaves

GBrowse 2.0 supports rendering slaves, which are small network-based servers that receive track rendering requests from the GBrowse server and generate the text and graphics needed for a track. By judiciously spreading out the work among multiple slaves, you can speed up rendering considerably. On multiprocessor systems, there is also an advantage to having one or more rendering slaves running on the local host.

To attach a rendering slave to a track, add the "remote renderer" option to the track stanza, giving the host and port of the slave in URL format:

 [GENES]
 feature  = gene:genbank
 remote renderer = http://node22.serverfarm.org:1800
 ... etc...

 [miRNAs]
 database = volvox_ncRNA
 feature  = miRNA
 remote renderer = http://node23.serverfarm.org:1800

The database and remote renderer options are independent of each other, and can be mixed and matched according to your needs. See Running GBrowse 2.0 Rendering Slaves for more information on setting up renderers.
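
When several tracks point at different slaves, it can help to see at a glance which hosts and ports a data source file depends on. The shell sketch below extracts that list. The function names are hypothetical, and it assumes a single http://host:port URL per "remote renderer" option, as in the examples above.

```shell
# remote_renderers FILE
# Print each distinct URL given in a "remote renderer" option.
remote_renderers() {
    sed -n 's/^[[:space:]]*remote renderer[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | sort -u
}

# renderer_host_port URL
# Print "host port" for a URL of the form http://host:port.
# Prints nothing if the URL does not have that form.
renderer_host_port() {
    printf '%s\n' "$1" | sed -n 's|^http://\([^:/]*\):\([0-9][0-9]*\)/*$|\1 \2|p'
}
```

Piping the output of remote_renderers through renderer_host_port, one URL at a time, gives a host and port list that you can compare against the slaves you have actually started.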

GBrowse Installation

First, save the distribution archive (GBrowse-2.00.tar.gz) to a convenient place, and then unpack it using the tar and/or gunzip programs. In this and all subsequent examples, the commands you type are the ones that follow the % prompt.

% tar zxvf ~/projects/Generic-Genome-Browser-Trunk/GBrowse-2.00.tar.gz
GBrowse-2.00/
GBrowse-2.00/t/
GBrowse-2.00/t/05.deferredrendering.t
GBrowse-2.00/t/06.featuresearch.t
GBrowse-2.00/t/01yeast.t
GBrowse-2.00/t/07.karotype.t

Next, enter the newly-created directory and run the Build.PL script:

% perl Build.PL
Checking whether your kit is complete...
Looks good

Checking prerequisites...
Looks good

cc -I/usr/lib/perl/5.8/CORE -fPIC -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -o /tmp/compilet.o /tmp/compilet.c
cc -shared -L/usr/local/lib -o /tmp/compilet.so /tmp/compilet.o

Creating new 'Build' script for 'GBrowse' version '2.00'
Now run:
 ./Build test
 ./Build config
 ./Build demo          (optional)
 ./Build install       (as superuser/administrator)
       -or-
 ./Build install_slave (optional, for slave installations)

Among other things, this script will check for missing Perl libraries that you need to run GBrowse, and will tell you about them. Please be sure to install all prerequisites before going any further. All but one of the prerequisites can be downloaded from CPAN or installed from the command line using the CPAN shell. There are also Debian and RPM installer packages for these libraries. See GBrowse 2.0 Prerequisites for full details.

BioPerl Issues

One prerequisite that is not available from CPAN is BioPerl. GBrowse requires version 1.5.2 of BioPerl, but so far only version 1.4 has been released to CPAN. You will need to go to the BioPerl download page, and unpack and install BioPerl manually. Because 1.5.2 is officially a developer release, you may not wish to install it into your system files. In this case, you can simply unpack it into some convenient location (your home directory or elsewhere), and configure GBrowse to find it by invoking Build.PL this way:

% perl -I /home/fred/packages/bioperl-1.5.2 Build.PL

Replace /home/fred/packages/bioperl-1.5.2 with the path to the unpacked BioPerl distribution. Henceforth, GBrowse and all its support scripts will know to look in this directory to find BioPerl. Once you choose a location for BioPerl, do not move or rename it.
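
Because a wrong path here only shows up later as missing-module errors, it may be worth checking the directory before you run Build.PL. The sketch below is one way to do that from the shell. The function names are hypothetical, and the test for Bio/Perl.pm rests on the assumption that the unpacked BioPerl distribution keeps its Bio/ module tree directly under the top-level directory, which is what the -I option shown above requires.

```shell
# looks_like_bioperl DIR
# Hypothetical helper: succeeds if DIR contains Bio/Perl.pm, that is,
# if "perl -I DIR" could find the Bio::Perl module there.
looks_like_bioperl() {
    [ -f "$1/Bio/Perl.pm" ]
}

# build_command DIR
# Print the Build.PL invocation for a BioPerl unpacked in DIR,
# or report the problem and fail if DIR does not look like BioPerl.
build_command() {
    if looks_like_bioperl "$1"; then
        printf 'perl -I %s Build.PL\n' "$1"
    else
        echo "no Bio/Perl.pm under $1" >&2
        return 1
    fi
}
```

For example, "build_command /home/fred/packages/bioperl-1.5.2" prints the command shown above only if that directory passes the check.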