Difference between revisions of "GBrowse 2.0 HOWTO"

From GMOD

Revision as of 16:06, 23 October 2008

This document is a work in progress. It describes how to install and configure GBrowse 2.0

Introduction

GBrowse 2.0 is a complete rewrite of the original GBrowse version. In addition to making the code base more maintainable, GBrowse 2.0 adds the following major features:

  • User Interface: The user interface uses AJAX to provide a smoother user experience. Tracks turn on and off immediately, and updates affect only the tracks that have changed.
  • More rational configuration: Most configuration options have been moved into a single shared configuration file. This allows data source-specific files to be shorter and more concise. This also increases the performance for sites that use hundreds of configuration files to display annotations on multiple species because only the global configuration file and the source-specific configuration file need to be read.
  • Multiple database support: You can now declare multiple databases for each data source and attach them to different tracks. This allows you to add and remove genome annotation data sets far more easily than in earlier versions.
  • Slave renderer support: If you have a multi-core or multi-CPU machine, or access to several machines, you can distribute the tasks of reading the databases and rendering tracks across multiple processes and machines via a series of "slave" renderers. This can substantially increase performance.

This document describes how to install and configure GBrowse 2.0 on your system. Readers familiar with GBrowse 1.70 or earlier should start with the next section, which is a quick summary of what is different. Readers who have not installed or configured GBrowse before should skip to GBrowse Installation.

For Users of GBrowse 1.X

GBrowse 2.0 is largely backward compatible with GBrowse 1.X, but you will need to do some modest work in order to port existing sources to the new system. This section tells you what you need to know.

Apache Environment Variables

GBrowse 1.X found the location of its configuration files by consulting a hard-coded variable located in the CGI script itself. This made it hard to move the configuration files around. In contrast, GBrowse 2.0 finds its configuration directory by consulting an environment variable named GBROWSE_CONF that is set by Apache. You must add a SetEnv directive to the Apache configuration file in order to create this variable and pass it through. Usually this directive will be located in the "cgi-bin" <Directory> section as follows:

 <Directory /usr/lib/cgi-bin>
   SetEnv GBROWSE_CONF /etc/GBrowse2
   ... # other stuff # ...
 </Directory>

Other environment variables that can be set in the Apache configuration file include:

GBROWSE_DOCS
Location of GBrowse's static HTML files and images in the file system (e.g. "/var/www/gbrowse2")
GBROWSE_ROOT
Location of GBrowse's static HTML files and images in URL space (e.g. "/gbrowse2")
GBROWSE_MASTER
Name of the GBrowse master configuration file located in the configuration directory, "GBrowse.conf" by default.
PERL5LIB
Colon-delimited list of directories to search for Perl modules. Useful if some modules, such as bioperl, are installed in non-standard locations.

The Build script will guide you through selecting most of these options when you run "./Build config". You can then generate a fragment of Apache configuration to cut and paste into Apache's configuration file by running ./Build apache_conf.
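As a rough illustration of how GBROWSE_CONF and GBROWSE_MASTER fit together, the following shell sketch prints the master configuration path implied by the current environment. The function name gbrowse_master_path is hypothetical, and the fallback to /etc/GBrowse2 and GBrowse.conf when a variable is unset is an assumption made for this example; it does not describe GBrowse's own lookup code.

```shell
# Hypothetical helper, not part of GBrowse: print the path of the master
# configuration file implied by the current environment.
gbrowse_master_path() {
  # Assumption for this sketch: fall back to the default locations
  # mentioned in this document when a variable is unset.
  conf_dir="${GBROWSE_CONF:-/etc/GBrowse2}"
  master="${GBROWSE_MASTER:-GBrowse.conf}"
  printf '%s/%s\n' "$conf_dir" "$master"
}

gbrowse_master_path
```

Running it from a shell that has the same variables exported as the Apache <Directory> section is a quick way to confirm that the path points at a file that actually exists.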

GBrowse.conf and Data Source Config Files

In GBrowse 1.X, each data source had its own configuration file. However, many or most of the options in each file, such as file paths, stylesheets, and header/footer options, were the same, causing config file bloat. In GBrowse 2.0, all common configuration options have been moved into a master configuration file, usually located at /etc/GBrowse2/GBrowse.conf.

GBrowse.conf contains a [GENERAL] stanza that sets such options as the location of the data-specific configuration files, static HTML, Javascript and CSS files, timeouts, session settings and global appearance settings. It also contains one or more data-source stanzas, one for each species (or genome annotation release) you want to make available for browsing. Each data-source specific stanza looks like this:

 [datasource]
 description = This is a description
 path        = datasource.conf

The description appears in the pop-up menu that allows users to select the genome to browse. The path specifies the path to the configuration file for that data source. The Build process installs an example GBrowse.conf for you, so you can see how this is done.

Each data-source specific configuration file also has a [GENERAL] stanza. Options in this stanza supplement or override settings in GBrowse.conf. Usually there will be only a very few options in this stanza. Following this there is a [TRACK DEFAULTS] stanza that sets default options for tracks, followed by a series of [TRACK_NAME] stanzas for configuring individual tracks.

To migrate your GBrowse 1.X configuration files to 2.0, simply customize the [GENERAL] section of the new GBrowse.conf file to meet your needs, and then create a [datasource] section that points to each of your existing GBrowse 1.X config files. In most cases, these config files will work as is. Later, you may wish to consolidate redundant options that are shared among your config files in order to simplify maintenance.
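When many 1.X files need to be migrated, the [datasource] stanzas can be generated rather than typed. The sketch below is one possible approach, not a tool shipped with GBrowse: the function name emit_datasource_stanzas is hypothetical, it uses the file name as a placeholder description, and it assumes the old config files have been copied into the GBrowse 2.0 configuration directory so that a bare file name works as the path.

```shell
# Hypothetical helper: print one [datasource] stanza for every *.conf file
# found in the directory given as the first argument.
emit_datasource_stanzas() {
  conf_dir=$1
  for file in "$conf_dir"/*.conf; do
    [ -f "$file" ] || continue            # the directory has no *.conf files
    name=$(basename "$file" .conf)
    [ "$name" = "GBrowse" ] && continue   # skip the master file itself
    printf '[%s]\n' "$name"
    printf 'description = %s\n' "$name"   # placeholder; edit by hand
    printf 'path        = %s\n\n' "$(basename "$file")"
  done
}
```

Review the output, replace each placeholder description with readable text, and paste the stanzas into GBrowse.conf after the [GENERAL] section.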

Specifying Databases

In GBrowse 1.X each data source could be attached to one and only one database. In GBrowse 2.0, you can declare as many databases as you like, and attach them to one or more tracks. The syntax is simple. Somewhere in the data source configuration file (suggested: between [GENERAL] and the track stanzas) declare one or more [name:database] stanzas. For example:

 [volvox_genbank:database]
 db_adaptor    = Bio::DB::SeqFeature::Store
 db_args       = -adaptor memory
                 -dir    /usr/share/databases/volvox_gb_mirror

 [volvox_ncRNA:database]
 db_adaptor   = Bio::DB::SeqFeature::Store
 db_args      = -adaptor DBI::mysql
                -dsn     volvox_ncRNA

This declares two databases, one named "volvox_genbank" and the other "volvox_ncRNA". You then assign these to the tracks as follows:

 [GENES]
 database = volvox_genbank
 feature  = gene:genbank
 ... etc...

 [miRNAs]
 database = volvox_ncRNA
 feature  = miRNA
 ... etc...

The default database is specified in the [GENERAL] or [TRACK DEFAULTS] section, with the latter taking precedence over the former:

 [GENERAL]
 database = volvox_genbank   # this will be the default
 ... etc...

For backward compatibility, you can forgo the [name:database] sections entirely and just place db_adaptor and db_args options directly in the [GENERAL] and/or [TRACK] stanzas. The system will do its best to recognize identical database definitions and open each distinct database only once.
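Because tracks refer to databases by name, a mistyped name is an easy error to make. The following sketch is a simple lint for that case and is not part of GBrowse: the function name check_databases is hypothetical, and it assumes each option sits on its own line in the "database = name" form shown above.

```shell
# Hypothetical lint, not part of GBrowse: report "database = name" options
# that have no matching [name:database] stanza in the same file.
check_databases() {
  conf_file=$1
  status=0
  # Names declared by [name:database] stanza headers.
  declared=$(sed -n 's/^[[:space:]]*\[\(.*\):database].*$/\1/p' "$conf_file")
  # Names referenced by "database = name" options, ignoring trailing comments.
  referenced=$(sed -n 's/^[[:space:]]*database[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf_file" | sort -u)
  for name in $referenced; do
    if ! printf '%s\n' "$declared" | grep -qxF "$name"; then
      echo "undeclared database: $name"
      status=1
    fi
  done
  return "$status"
}
```

Call it with the path to a data source configuration file; it prints one line per undeclared name and returns a non-zero status if any were found.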

Specifying Rendering Slaves

GBrowse 2.0 supports rendering slaves, which are small network-based servers that receive track rendering requests from the GBrowse server and generate the text and graphics needed for a track. By judiciously spreading out the work among multiple slaves, you can speed up rendering considerably. On multiprocessor systems, there is also an advantage to having one or more rendering slaves running on the local host.

To attach a rendering slave to a track, add the remote renderer option, giving the host and port of the slave in URL format:

 [GENES]
 feature  = gene:genbank
 remote renderer = http://node22.serverfarm.org:1800
 ... etc...

 [miRNAs]
 database = volvox_ncRNA
 feature  = miRNA
 remote renderer = http://node23.serverfarm.org:1800

The database and remote renderer options are independent of each other, and can be mixed and matched according to your needs. See Running GBrowse 2.0 Rendering Slaves for more information on setting up renderers.
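When balancing work across slaves, it can help to see at a glance which track is sent to which renderer. The sketch below prints that mapping from a data source file. It is an illustration only: the function name list_remote_renderers is hypothetical, and it assumes the option is written on a single line as "remote renderer = URL", as in the examples above.

```shell
# Hypothetical helper: print "track<TAB>url" for every stanza in the given
# data source file that sets the "remote renderer" option.
list_remote_renderers() {
  awk '
    /^[ \t]*\[/ {
      track = $0
      sub(/^[ \t]*\[/, "", track)    # drop the opening bracket
      sub(/\].*$/, "", track)        # drop the closing bracket and the rest
      next
    }
    /^[ \t]*remote renderer[ \t]*=/ {
      url = $0
      sub(/^[^=]*=[ \t]*/, "", url)  # keep only the value
      sub(/[ \t]+$/, "", url)        # trim trailing whitespace
      print track "\t" url
    }
  ' "$1"
}
```

Piping the output through sort -k2 groups the tracks by slave.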

GBrowse Installation

First, save the distribution archive (GBrowse-2.00.tar.gz) to a convenient place, and then unpack it using the tar and/or gunzip programs. In this and all subsequent examples, the commands you type are the lines that begin with the % prompt.

% tar zxvf ~/projects/Generic-Genome-Browser-Trunk/GBrowse-2.00.tar.gz
GBrowse-2.00/
GBrowse-2.00/t/
GBrowse-2.00/t/05.deferredrendering.t
GBrowse-2.00/t/06.featuresearch.t
GBrowse-2.00/t/01yeast.t
GBrowse-2.00/t/07.karotype.t

Next, enter the newly-created directory and run the Build.PL script:

 % perl Build.PL
 Checking whether your kit is complete...
 Looks good

 Checking prerequisites...
 Looks good

 cc -I/usr/lib/perl/5.8/CORE -fPIC -c -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -o /tmp/compilet.o /tmp/compilet.c
 cc -shared -L/usr/local/lib -o /tmp/compilet.so /tmp/compilet.o

 Creating new 'Build' script for 'GBrowse' version '2.00'
 Now run:
  ./Build test
  ./Build config
  ./Build demo          (optional)
  ./Build install       (as superuser/administrator)
        -or-
  ./Build install_slave (optional, for slave installations)

Note: BioPerl issues

One prerequisite that is not available from CPAN is BioPerl. GBrowse requires version 1.5.2 of BioPerl, but so far only version 1.4 has been released to CPAN. You will need to go to the BioPerl download page, and unpack and install BioPerl manually. Because 1.5.2 is officially a developer release, you may not wish to install it into your system files. In this case, you can simply unpack it into some convenient location (your home directory or elsewhere), and configure GBrowse to find it by invoking Build.PL this way:

% perl -I /home/fred/packages/bioperl-1.5.2 Build.PL

Replace /home/fred/packages/bioperl-1.5.2 with the path to the unpacked BioPerl distribution. Henceforth, GBrowse and all its support scripts will know to look in this directory to find BioPerl. Once you choose a location for BioPerl, do not move or rename it.
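Since the path is recorded at configuration time, it is worth checking it before running Build.PL. The sketch below is a minimal pre-flight check under one assumption: that an unpacked BioPerl distribution keeps its modules under a top-level Bio/ directory, so that Bio/Perl.pm exists directly below it. The function name looks_like_bioperl is hypothetical.

```shell
# Hypothetical pre-flight check: does the given directory look like an
# unpacked BioPerl distribution?
# Assumption: the modules live under Bio/, so Bio/Perl.pm exists directly
# below the top-level directory.
looks_like_bioperl() {
  [ -f "$1/Bio/Perl.pm" ]
}

bioperl_dir=/home/fred/packages/bioperl-1.5.2   # example path from this document
if looks_like_bioperl "$bioperl_dir"; then
  echo "BioPerl found; run: perl -I $bioperl_dir Build.PL"
else
  echo "no Bio/Perl.pm under $bioperl_dir; check the path before running Build.PL"
fi
```

This only checks the directory layout; Build.PL's own prerequisite check remains the authority on whether the installed version is acceptable.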

Among other things, the Build.PL script will check for missing Perl libraries that you need to run GBrowse, and will tell you about them. Please be sure to install all prerequisites before going any further. All but one of the prerequisites can be downloaded from CPAN or installed from the command line using the CPAN shell. There are also Debian and RPM installer packages for these libraries. See GBrowse 2.0 Prerequisites for full details.

Running Build.PL creates a script named Build in the current directory. You will now use Build to test, configure and install GBrowse. First, confirm that GBrowse's libraries are completely functional by running ./Build test (the "./" is there to ensure that you are running the Build script in the current directory, and not some other Build script somewhere in your path):

 % ./Build test
 t/01yeast...................ok
 t/02.rearchitecture.........ok
 t/03.render.................ok
 ...
 All tests successful.
 Files=8, Tests=323, 12 wallclock secs
 Result: PASS

All tests should pass (you may safely ignore warnings and occasional timeout errors). If not, please file a bug report, and/or send an inquiry to the GBrowse mailing list.

After GBrowse passes its tests, configure it by running ./Build config:

 % ./Build config

 **Beginning interactive configuration**
 Apache loadable module directory (for demo)? [/usr/lib/apache2/modules]
 Apache CGI scripts directory? [/usr/lib/cgi-bin]
 Directory for GBrowse's config and support files? [/etc/GBrowse2]
 Directory for GBrowse's static images & HTML files? [/var/www/gbrowse2]
 Internet port to run demo web site on (for demo)? [8000]
 User account under which Apache daemon runs? [www-data]

 **Interactive configuration done. Run './Build reconfig' to reconfigure**

The configuration process will ask you to confirm six site-specific configuration options, and will do its best to guess suitable values for you. In most cases you can just hit return to accept the default. If you specify a directory that does not exist, the system will ask you to confirm that this is what you mean. If you answer yes, the directory (and any missing parent directories) will be created at install time.

The configuration options are:

apachemodules
The directory in which Apache's loadable modules are located. This is only needed to run a demo GBrowse site before formal installation. If you do not know the location of this directory and you do not want to run the demo, you can safely ignore this option.
cgibin
The directory in which Apache's executable CGI scripts are located. This directory is set up for you when Apache is installed, and you must have the path correct in order for Build to install GBrowse's CGI scripts in the right place.
conf
The location of GBrowse's configuration files. The default is to place them in the "GBrowse2" subdirectory of /etc. This is where you will go to customize GBrowse and add new data sources.
htdocs
The directory in which to install GBrowse's Javascript libraries, static HTML pages and stylesheets. You can choose any location for this directory and it will be added to Apache's document tree. The default is to place the directory under the default document tree.
portdemo
The internet port on which the demo will run. The demo launches a new instance of Apache running under your user privileges. Because port 80 will usually already be taken by the system Apache, Build will choose an unused port like 8000 or 8080. You may manually select the port if you wish.
wwwuser
The account under which the system Apache runs, often "nobody", "www-data" or "httpd." If you do not know, you can find out by running ps aux | grep -i apache on a system that has Apache already running. The first column of the output contains the name of the user account.
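The wwwuser lookup can be narrowed to just the account name. The sketch below filters ps aux output for Apache processes and prints the distinct owners. The function name apache_users is hypothetical, and skipping root is an assumption based on the common arrangement in which the parent Apache process runs as root while its children run under the unprivileged account.

```shell
# Hypothetical helper: read "ps aux" output on standard input and print the
# distinct non-root accounts that own Apache processes.
# The bracketed first letters keep this awk command from matching its own
# entry in the process list.
apache_users() {
  awk 'tolower($0) ~ /[a]pache|[h]ttpd/ && $1 != "root" { print $1 }' | sort -u
}

# Usage: ps aux | apache_users
```

If more than one name is printed, inspect the full ps aux output to see which account runs the worker processes.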

Once you have configured GBrowse, the values you chose will stick until you run ./Build reconfig. You can also bypass interactive configuration by directly passing Build.PL one or more of the configuration options on the command line:

perl Build.PL --apachemodules=/usr/local/share/apache/libexec \
              --cgibin=/var/lib/cgi \
              --conf=/etc/gbrowse \
              --htdocs=/usr/local/docs \
              --portdemo=9000 \
              --wwwuser=fred

The options passed on the command line will be used during installation, and will also become the defaults if you later run ./Build config or ./Build reconfig.

Before you install GBrowse, you may wish to run its demo. This will attempt to launch a correctly configured instance of Apache running under your own account. To launch the demo, run ./Build demo:

% ./Build demo
Demo is now running on http://localhost:8000
Run "./Build demostop" to stop it.

You can now point your web browser at http://localhost:8000 (or whatever the build message specifies), and interact with the GBrowse web site, browse sample genomes and otherwise test the waters. When you are done, run ./Build demostop to stop Apache and clean up.

To install GBrowse and its support files permanently, run ./Build install. This will copy GBrowse's library and support files into the locations chosen during configuration.

The final step is to reconfigure Apache itself to run GBrowse. There is a small set of configuration options that need to be cut and pasted into Apache's configuration file. To generate these options, run ./Build apache_conf. This will print a set of Apache configuration options to the screen:

% ./Build apache_conf

INSTRUCTIONS: Cut this where indicated and paste it into your Apache
configuration file. You may wish to save it separately and include it
using the Apache "Include /path/to/file" directive.

===>>> cut here <<<===
Alias "/gbrowse2/" "/var/www/gbrowse2/"

<Location "/gbrowse2/">
  Options Indexes FollowSymLinks MultiViews
</Location>


<Directory "/usr/lib/cgi-bin">
  SetEnv GBROWSE_MASTER GBrowse.conf
  SetEnv GBROWSE_CONF   "/etc/GBrowse2"
  SetEnv GBROWSE_DOCS   "/var/www/gbrowse2"
  SetEnv GBROWSE_ROOT   "/gbrowse2"
  SetEnv PERL5LIB /home/fred/packages/bioperl-1.5.2
  AllowOverride None
  Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
  Order allow,deny
  Allow from all
</Directory>
===>>> cut here <<<===

You should now cut out the indicated section and paste it into the appropriate Apache configuration file. Typically this is a file named httpd.conf or apache.conf located in /etc/apache, /etc/httpd or /usr/local/apache/conf (Linux and Mac OS X), or C:/Program Files/Apache Software Foundation/Apache2*/conf under Windows. Because httpd.conf gets updated by vendors on a regular basis, you may wish to put the GBrowse-specific directives in a separate configuration file and then point Apache at them by placing something like the following at the bottom of httpd.conf:

 # at the bottom of httpd.conf
 Include /path/to/gbrowse_directives.conf

You can now restart Apache using the appropriate administration script.
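The cut-and-paste step can also be scripted. The sketch below is one way to do it and is not part of the GBrowse distribution: both function names are hypothetical, the marker match relies on the "cut here" lines shown in the sample output above, and the paths in the usage comment are placeholders.

```shell
# Hypothetical helpers, not part of GBrowse.

# Read "./Build apache_conf" output on standard input and print only the
# lines between the two "cut here" markers.
extract_apache_directives() {
  awk '/cut here/ { inside = !inside; next } inside'
}

# Append an Include line for the directives file to the given httpd.conf,
# unless an identical line is already present.
add_include() {
  httpd_conf=$1
  directives=$2
  line="Include $directives"
  if ! grep -qxF "$line" "$httpd_conf"; then
    printf '\n# GBrowse directives\n%s\n' "$line" >> "$httpd_conf"
  fi
}

# Example usage (placeholder paths; run as a user allowed to edit them):
#   ./Build apache_conf | extract_apache_directives > /path/to/gbrowse_directives.conf
#   add_include /path/to/httpd.conf /path/to/gbrowse_directives.conf
```

Because add_include checks for an existing line first, it can be run again after a reinstall without duplicating the directive. Where apachectl is available, running apachectl configtest before the restart checks the edited configuration for syntax errors.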

Customizing GBrowse

GBrowse's options are controlled by two types of configuration file. GBrowse.conf contains site-specific options that apply to all data sources. One or more data source-specific configuration files define the options needed to create a specific genome browser.