WebApollo Installation

From GMOD
Revision as of 18:52, 5 March 2013 by Nathan (Talk | contribs)

Introduction

This guide will walk you through the server-side installation for WebApollo. WebApollo is a web-based application, so the only client-side requirement is a web browser. Note that WebApollo has only been tested on Chrome, Firefox, and Safari. It has not been tested with Internet Explorer.

Installation

You can download the latest WebApollo release here. All installation steps will be done through a shell. We'll be using Tomcat 7 as our servlet container and PostgreSQL as our relational database management system. We'll use sample data from the Pythium ultimum genome, provided as a separate download.

Server operating system

Any Unix-like system (e.g., Unix, Linux, Mac OS X)

Prerequisites

Conventions

This guide will use the following conventions to make it more concise (you might want to keep these convention definitions handy so that you can easily reference them as you go through this guide):

  • $WEB_APOLLO_DIR
    • Location where the tarball was uncompressed; the path will include WebApollo-RELEASE_DATE (e.g., ~/webapollo/WebApollo-2012-10-08)
  • $WEB_APOLLO_SAMPLE_DIR
    • Location where the sample tarball was uncompressed (e.g., ~/webapollo/webapollo_sample)
  • $WEB_APOLLO_DATA_DIR
    • Location for WebApollo annotations (e.g., /data/webapollo/annotations)
  • $JBROWSE_DATA_DIR
    • Location for JBrowse data (e.g., /data/webapollo/jbrowse/data)
  • $TOMCAT_LIB_DIR
    • Location where Tomcat libs are installed (e.g., /usr/share/tomcat7/lib)
  • $TOMCAT_CONF_DIR
    • Location where Tomcat configuration is installed (e.g., /etc/tomcat7/conf)
  • $TOMCAT_WEBAPPS_DIR
    • Location where deployed servlets for Tomcat go (e.g., /var/lib/tomcat7/webapps)
  • $BLAT_DIR
    • Location where the Blat binaries are installed (e.g., /usr/local/bin)
  • $BLAT_TMP_DIR
    • Location for temporary Blat files (e.g., /data/webapollo/blat/tmp)
  • $BLAT_DATABASE
    • Location for the Blat database (e.g., /data/webapollo/blat/db/pyu.2bit)

The Tomcat-related paths are the defaults used by Ubuntu 12.04 and Ubuntu's Tomcat 7 package. Paths will likely be different on your system depending on how Tomcat was installed.
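
If you'd like to paste the commands in this guide directly into a shell, one option is to define the conventions as shell variables first. The values below are just the example locations from the list above (including the 2012-10-08 release date from the example), so adjust them to match your system.

```shell
# Example values only, copied from the conventions list above.
# Adjust every path to match your own system.
export WEB_APOLLO_DIR="$HOME/webapollo/WebApollo-2012-10-08"
export WEB_APOLLO_SAMPLE_DIR="$HOME/webapollo/webapollo_sample"
export WEB_APOLLO_DATA_DIR=/data/webapollo/annotations
export JBROWSE_DATA_DIR=/data/webapollo/jbrowse/data
export TOMCAT_LIB_DIR=/usr/share/tomcat7/lib
export TOMCAT_CONF_DIR=/etc/tomcat7/conf
export TOMCAT_WEBAPPS_DIR=/var/lib/tomcat7/webapps
export BLAT_DIR=/usr/local/bin
export BLAT_TMP_DIR=/data/webapollo/blat/tmp
export BLAT_DATABASE=/data/webapollo/blat/db/pyu.2bit
```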

Installation

Uncompress the WebApollo-RELEASE_DATE.tgz tarball.

$ tar -xvzf WebApollo-RELEASE_DATE.tgz

Authentication

Edit pg_hba.conf and add the following line:

local    web_apollo_users    web_apollo_users_admin     md5

User database

WebApollo uses a database to determine who can access and edit annotations for a given sequence.

First we’ll need to create a database. You can call it whatever you want (remember the name as you’ll need to point the configuration to it). For the purposes of this guide, we’ll call it web_apollo_users. You might want to create a separate account to manage the database. We’ll create the user web_apollo_users_admin with password web_apollo_users_admin, who has database creation privilege. Depending on how your database server is set up, you might not need to set a password for the user. See the PostgreSQL documentation for more information. We'll assume that the database is on the same server where WebApollo is being installed ("localhost").

$ sudo su postgres
$ createuser -P web_apollo_users_admin
Enter password for new role: 
Enter it again: 
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) y
Shall the new role be allowed to create more new roles? (y/n) n

Next we'll create the user database.

$ createdb -U web_apollo_users_admin web_apollo_users

Now that the database is created, we need to load the schema to it.

$ cd $WEB_APOLLO_DIR/tools/user
$ psql -U web_apollo_users_admin web_apollo_users < user_database_postgresql.sql

Now the user database has been set up.

Let's populate the database.

First we’ll create a user with access to WebApollo. We’ll use the add_user.pl script in $WEB_APOLLO_DIR/tools/user. Let’s create a user named web_apollo_admin with the password web_apollo_admin.

$ ./add_user.pl -D web_apollo_users -U web_apollo_users_admin -P web_apollo_users_admin \
-u web_apollo_admin -p web_apollo_admin

Next we’ll add the annotation track ids for the genomic sequences of our organism. We’ll use the add_tracks.pl script in the same directory. The script needs a file of genomic sequence ids. For convenience, there’s a script called extract_seqids_from_fasta.pl in the same directory which will go through a FASTA file and extract all the ids from the deflines. Let’s first create the list of genomic sequence ids. We'll store it in ~/scratch/seqids.txt. We’ll want to add the prefix “Annotations-” to each identifier.

$ mkdir ~/scratch
$ ./extract_seqids_from_fasta.pl -p Annotations- -i $WEB_APOLLO_SAMPLE_DIR/scf1117875582023.fa \
-o ~/scratch/seqids.txt
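
To give a rough idea of what this step produces, here's a minimal awk sketch that does something similar. It is not the implementation of extract_seqids_from_fasta.pl; it simply assumes the id is the first whitespace-delimited token right after the ">" on each defline. The function name list_seqids is made up for this illustration.

```shell
# Hypothetical helper, not part of WebApollo: print PREFIX followed by the
# sequence id for every FASTA defline, assuming the id is the first token
# immediately after ">".
list_seqids() {
    prefix=$1
    fasta=$2
    awk -v prefix="$prefix" '/^>/ { id = substr($1, 2); print prefix id }' "$fasta"
}

# Usage (paths as in this guide):
#   list_seqids Annotations- "$WEB_APOLLO_SAMPLE_DIR/scf1117875582023.fa"
```

Each output line has the form Annotations-SEQUENCE_ID, which is the format the following steps expect in ~/scratch/seqids.txt.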

Now we’ll add those ids to the user database.

$ ./add_tracks.pl -D web_apollo_users -U web_apollo_users_admin -P web_apollo_users_admin \
-t ~/scratch/seqids.txt

Now that we have a user created and the annotation track ids loaded, we’ll need to give the user permissions to access the sequence. We’ll give the user all permissions (read, write, publish, user manager). We’ll use the set_track_permissions.pl script in the same directory. We’ll need to provide the script a list of genomic sequence ids, like in the previous step.

$ ./set_track_permissions.pl -D web_apollo_users -U web_apollo_users_admin \
-P web_apollo_users_admin -u web_apollo_admin -t ~/scratch/seqids.txt -a

We’re all done setting up the user database.

Note that we’re only using a subset of the options for all the scripts mentioned above. You can get more detailed information on any given script (and other available options) using the “-h” or “--help” flag when running the script.

Deploying the servlet

Depending on how Tomcat was set up on your server, you might need to run the following commands as root.

Note that the WebApollo server sends errors to the client through JSON messages. Your servlet container must be configured to allow raw JSON to be sent when errors occur. In the case of Tomcat, you'll need to configure it to use the custom valve that is provided with the WebApollo package.

$ cp $WEB_APOLLO_DIR/tomcat/custom-valves.jar $TOMCAT_LIB_DIR

You'll then need to add errorReportValveClass="org.bbop.apollo.web.ErrorReportValve" as an attribute to the <Host> element in $TOMCAT_CONF_DIR/server.xml

<Host appBase="webapps" autoDeploy="true" name="localhost" unpackWARs="true" errorReportValveClass="org.bbop.apollo.web.ErrorReportValve">
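
A quick way to confirm the attribute made it into the file is to search for it. The sketch below is only a sanity check on the text of server.xml (it doesn't validate the XML or confirm that Tomcat loaded the valve), and the function name is just for illustration.

```shell
# Illustrative check: succeed if the given server.xml mentions the
# WebApollo error report valve attribute exactly as shown above.
has_error_valve() {
    grep -qF 'errorReportValveClass="org.bbop.apollo.web.ErrorReportValve"' "$1"
}

# Usage:
#   has_error_valve "$TOMCAT_CONF_DIR/server.xml" && echo "valve configured"
```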

We need to deploy the WAR file in the war directory from the unpacked tarball.

$ cd $TOMCAT_WEBAPPS_DIR

Next we need to create the directory that will contain the application.

$ mkdir WebApollo

Now we'll go into the newly created directory and unpack the WAR file into it.

$ cd WebApollo
$ jar -xvf $WEB_APOLLO_DIR/war/WebApollo.war

That’s it! We’re done installing WebApollo. Now we need to move on to configuring the application.

Configuration

Most configuration files will reside in $TOMCAT_WEBAPPS_DIR/WebApollo/config. We’ll need to configure a number of things before we can get WebApollo up and running.

Main configuration

The main configuration is stored in $TOMCAT_WEBAPPS_DIR/WebApollo/config/config.xml. Let’s take a look at the file.

<?xml version="1.0" encoding="UTF-8"?>
<server_configuration>
 
	<!-- mapping configuration for GBOL data structures -->
	<gbol_mapping>/config/mapping.xml</gbol_mapping>
 
	<!-- directory where JE database will be created -->
	<datastore_directory>ENTER_DATASTORE_DIRECTORY_HERE</datastore_directory>
 
	<!-- minimum size for introns created -->
	<default_minimum_intron_size>1</default_minimum_intron_size>
 
	<!-- size of history for each feature - setting to 0 means unlimited history -->
	<history_size>0</history_size>
 
	<!-- overlapping strategy for adding transcripts to genes -->
	<overlapper_class>org.bbop.apollo.web.overlap.OrfOverlapper</overlapper_class>
 
	<!-- class for comparing track names (used for sorting in lists) -->
	<track_name_comparator_class>org.bbop.apollo.web.track.DefaultTrackNameComparator</track_name_comparator_class>
 
	<!-- user authentication/permission configuration -->
	<user>
 
		<!-- database configuration -->
		<database>
 
			<!-- driver for user database -->
			<driver>org.postgresql.Driver</driver>
 
			<!-- JDBC URL for user database -->
			<url>ENTER_USER_DATABASE_JDBC_URL</url>
 
			<!-- username for user database -->
			<username>ENTER_USER_DATABASE_USERNAME</username>
 
			<!-- password for user database -->
			<password>ENTER_USER_DATABASE_PASSWORD</password>
 
		</database>
 
		<!-- class for generating user authentication page
			(login page) -->
		<authentication_class>org.bbop.apollo.web.user.localdb.LocalDbUserAuthentication</authentication_class>
 
	</user>
 
	<tracks>
 
		<!-- path to JBrowse refSeqs.json file -->
		<refseqs>ENTER_PATH_TO_REFSEQS_JSON_FILE</refseqs>
 
		<!-- annotation track name the current convention is to append
			the genomic region id to the the name of the annotation track
			e.g., if the annotation track is called "Annotations" and the
			genomic region is chr2L, the track name will be
			"Annotations-chr2L".-->
		<annotation_track_name>Annotations</annotation_track_name>
 
	 	<!-- organism being annotated -->
		<organism>ENTER_ORGANISM</organism>
 
		<!-- CV term for the genomic sequences - should be in the form
			of "CV:term".  This applies to all sequences -->
		<sequence_type>ENTER_CVTERM_FOR_SEQUENCE</sequence_type>
 
	</tracks>
 
	<!-- path to file containing canned comments XML -->
	<canned_comments>/config/canned_comments.xml</canned_comments>
 
	<!-- tools to be used for sequence searching.  This is optional.
		If this is not setup, WebApollo will not have sequence search support -->
	<sequence_search_tools>
 
		<!-- one <sequence_search_tool> element per tool -->
		<sequence_search_tool>
 
			<!-- display name for the search tool -->
			<key>BLAT nucleotide</key>
 
			<!-- class for handling search -->
			<class>org.bbop.apollo.tools.seq.search.blat.BlatCommandLineNucleotideToNucleotide</class>
 
			<!-- configuration for search tool -->
			<config>/config/blat_config.xml</config>
 
		</sequence_search_tool>
 
		<sequence_search_tool>
 
			<!-- display name for the search tool -->
			<key>BLAT protein</key>
 
			<!-- class for handling search -->
			<class>org.bbop.apollo.tools.seq.search.blat.BlatCommandLineProteinToNucleotide</class>
 
			<!-- configuration for search tool -->
			<config>/config/blat_config.xml</config>
 
		</sequence_search_tool>
 
	</sequence_search_tools>
 
	<!-- data adapters for writing annotation data to different formats.
		These will be used to dynamically generate data adapters within
		WebApollo.  This is optional.  -->
	<data_adapters>
 
		<!-- one <data_adapter> element per data adapter -->
		<data_adapter>
 
			<!-- display name for data adapter -->
			<key>GFF3</key>
 
			<!-- class for data adapter plugin -->
			<class>org.bbop.apollo.web.dataadapter.gff3.Gff3DataAdapter</class>
 
			<!-- required permission for using data adapter
			available options are: read, write, publish -->
			<permission>read</permission>
 
			<!-- configuration file for data adapter -->
 			<config>/config/gff3_config.xml</config>
 
			<!-- options to be passed to data adapter -->
			<options>output=file&amp;format=gzip</options>
 
		</data_adapter>
 
		<data_adapter>
 
			<!-- display name for data adapter -->
			<key>Chado</key>
 
			<!-- class for data adapter plugin -->
			<class>org.bbop.apollo.web.dataadapter.chado.ChadoDataAdapter</class>
 
			<!-- required permission for using data adapter
			available options are: read, write, publish -->
			<permission>publish</permission>
 
			<!-- configuration file for data adapter -->
			<config>/config/chado_config.xml</config>
 
			<!-- options to be passed to data adapter -->
			<options>display_features=false</options>
 
		</data_adapter>
 
	</data_adapters>
 
</server_configuration>

Let’s look through each element in more detail with values filled in.

<!-- mapping configuration for GBOL data structures -->
<gbol_mapping>/config/mapping.xml</gbol_mapping>

File that contains type mappings used by the underlying data model. It’s best not to change the default option.

<!-- directory where JE database will be created -->
<datastore_directory>$WEB_APOLLO_DATA_DIR</datastore_directory>

Directory where user generated annotations will be stored. The data is stored using Berkeley DB.

<!-- minimum size for introns created -->
<default_minimum_intron_size>1</default_minimum_intron_size>

Minimum length of intron to be created when using the “Make intron” operation. The operation will try to make the shortest intron that’s at least as long as this parameter. So if you set it to a value of “40”, then all calculated introns will be at least of length 40.

<!-- size of history for each feature - setting to 0 means unlimited history -->
<history_size>0</history_size>

The size of your history stack, meaning how many “Undo/Redo” steps you can do. The larger the number, the larger the storage space needed. Setting it to “0” makes it so that there’s no limit.

<!-- overlapping strategy for adding transcripts to genes -->
<overlapper_class>org.bbop.apollo.web.overlap.OrfOverlapper</overlapper_class>

Defines the strategy to be used for deciding whether overlapping transcripts should be considered splice variants of the same gene. This points to a Java class implementing the org.bbop.apollo.overlap.Overlapper interface. This allows you to create your own custom overlapping strategy should the need arise. Currently available options are:

  • org.bbop.apollo.web.overlap.NoOverlapper
    • No transcripts should be considered splice variants, regardless of overlap.
  • org.bbop.apollo.web.overlap.SimpleOverlapper
    • Any overlapping of transcripts will cause them to be part of the same gene
  • org.bbop.apollo.web.overlap.OrfOverlapper
    • Only transcripts that overlap within the coding region and within frame are considered part of the same gene

<!-- class for comparing track names (used for sorting in lists) -->
<track_name_comparator_class>org.bbop.apollo.web.track.DefaultTrackNameComparator</track_name_comparator_class>

Defines how to compare genomic sequence names for sorting purposes in the genomic region selection list. Points to a class implementing the org.bbop.apollo.web.track.TrackNameComparator interface. You can implement your own class to allow whatever sorting you’d like for your own organism. This doesn't make much of a difference in our case since we're only dealing with one genomic region. The only available implementation is:

  • org.bbop.apollo.web.track.DefaultTrackNameComparator
    • Sorts genomic sequence names lexicographically
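
Lexicographic order compares names character by character, which can be surprising when names end in numbers. The snippet below illustrates this using sort in the C locale. The sequence names are made up, and it's only an analogy for the ordering, not the comparator's actual code.

```shell
# Made-up sequence names; in plain lexicographic order "scf10" sorts
# before "scf2" because "1" comes before "2".
sorted=$(printf '%s\n' scf2 scf10 scf1 | LC_ALL=C sort)
echo "$sorted"
```

This prints scf1, scf10, scf2 (one per line). If you'd prefer numeric-aware ordering in the selection list, that's the kind of case where implementing your own comparator class would help.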

Let’s take a look at the user element, which handles configuration for user authentication and permission handling.

<!-- user authentication/permission configuration -->
<user>
 
	<!-- database configuration -->
	<database>
 
		<!-- driver for user database -->
		<driver>org.postgresql.Driver</driver>
 
		<!-- JDBC URL for user database -->
		<url>ENTER_USER_DATABASE_JDBC_URL</url>
 
		<!-- username for user database -->
		<username>ENTER_USER_DATABASE_USERNAME</username>
 
		<!-- password for user database -->
		<password>ENTER_USER_DATABASE_PASSWORD</password>
 
	</database>
 
	<!-- class for generating user authentication page (login page) -->
	<authentication_class>org.bbop.apollo.web.user.localdb.LocalDbUserAuthentication</authentication_class>
 
</user>

Let’s first look at the database element that defines the database that will handle user permissions (which we created previously).

<!-- driver for user database -->
<driver>org.postgresql.Driver</driver>

This should point to the JDBC driver for communicating with the database. We’re using a PostgreSQL driver since that’s the database we’re using for user permission management.

<!-- JDBC URL for user database -->
<url>jdbc:postgresql://localhost/web_apollo_users</url>

JDBC URL to the user permission database. We'll use jdbc:postgresql://localhost/web_apollo_users since the database is running on the same server as the annotation editing engine and we named the database web_apollo_users.

<!-- username for user database -->
<username>web_apollo_users_admin</username>

User name that has read/write access to the user database. The user with access to the user database has the user name web_apollo_users_admin.

<!-- password for user database -->
<password>web_apollo_users_admin</password>

Password to access the user database. The user with access to the user database has the password web_apollo_users_admin.

Now let’s look at the other elements in the user element.

<!-- class for generating user authentication page (login page) -->
<authentication_class>org.bbop.apollo.web.user.localdb.LocalDbUserAuthentication</authentication_class>

Defines how user authentication is handled. This points to a class implementing the org.bbop.apollo.web.user.UserAuthentication interface. This allows you to implement any type of authentication you’d like (e.g., LDAP). Currently available options are:

  • org.bbop.apollo.web.user.localdb.LocalDbUserAuthentication
    • Uses the user permission database to also store authentication information, meaning it stores user passwords in the database
  • org.bbop.apollo.web.user.browserid.BrowserIdUserAuthentication
    • Uses Mozilla’s BrowserID service for authentication. This has the benefit of offloading all authentication security to Mozilla and allows one account to have access to multiple resources (as long as they have BrowserID support). Because the service is provided through Mozilla, users will need to create a BrowserID account

Now let’s look at the configuration for accessing the annotation tracks for the genomic sequences.

<tracks>
 
	<!-- path to JBrowse refSeqs.json file -->
	<refseqs>ENTER_PATH_TO_REFSEQS_JSON_FILE</refseqs>
 
	<!-- annotation track name the current convention is to append
		the genomic region id to the the name of the annotation track
		e.g., if the annotation track is called "Annotations" and the
		genomic region is chr2L, the track name will be
		"Annotations-chr2L".-->
	<annotation_track_name>Annotations</annotation_track_name>
 
	<!-- organism being annotated -->
	<organism>ENTER_ORGANISM</organism>
 
	<!-- CV term for the genomic sequences - should be in the form
		of "CV:term".  This applies to all sequences -->
	<sequence_type>ENTER_CVTERM_FOR_SEQUENCE</sequence_type>
 
</tracks>

Let’s look at each element individually.

<!-- path to JBrowse refSeqs.json file -->
<refseqs>$TOMCAT_WEBAPPS_DIR/WebApollo/jbrowse/data/refSeqs.json</refseqs>

Location where the refSeqs.json file resides, which is created from the data generation pipeline (see the data generation section). By default, the JBrowse data needs to reside in $TOMCAT_WEBAPPS_DIR/WebApollo/jbrowse/data. If you want the data to reside elsewhere, you’ll need to either configure your servlet container to handle the appropriate alias to jbrowse/data or symlink the data directory to somewhere else. WebApollo is pre-configured to allow symlinks.
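
For the symlink approach, something like the following would work. It's a sketch that assumes the JBrowse data lives in $JBROWSE_DATA_DIR (as in the conventions) and that nothing exists yet at jbrowse/data inside the deployed application. The function name is only for illustration.

```shell
# Illustrative helper: make jbrowse/data inside the deployed webapp a
# symlink to the real JBrowse data directory. Refuses to overwrite
# anything that is already there.
link_jbrowse_data() {
    data_dir=$1      # e.g. $JBROWSE_DATA_DIR
    webapp_dir=$2    # e.g. $TOMCAT_WEBAPPS_DIR/WebApollo
    if [ -e "$webapp_dir/jbrowse/data" ] || [ -L "$webapp_dir/jbrowse/data" ]; then
        echo "$webapp_dir/jbrowse/data already exists; not touching it" >&2
        return 1
    fi
    ln -s "$data_dir" "$webapp_dir/jbrowse/data"
}

# Usage:
#   link_jbrowse_data "$JBROWSE_DATA_DIR" "$TOMCAT_WEBAPPS_DIR/WebApollo"
```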

<annotation_track_name>Annotations</annotation_track_name>

Name of the annotation track. Leave it as the default value of Annotations.

<!-- organism being annotated -->
<organism>Pythium ultimum</organism>

Scientific name of the organism being annotated (genus and species). We're annotating Pythium ultimum.

<!-- CV term for the genomic sequences - should be in the form
	of "CV:term".  This applies to all sequences -->
<sequence_type>sequence:contig</sequence_type>

The type for the genomic sequences. Should be in the form of CV:term. Our genomic sequences are of the type sequence:contig.
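
If you'd rather script these edits than make them by hand, the ENTER_... placeholders can be substituted with sed. The sketch below is one way to do it, under a few assumptions: the helper name is made up, each placeholder appears literally in the file, and the value contains no "|", "&", or backslash characters (they are special to sed as used here).

```shell
# Illustrative helper: replace every occurrence of PLACEHOLDER in FILE
# with VALUE. Writes to a temporary file first so it behaves the same
# with GNU and BSD sed.
fill_placeholder() {
    file=$1
    placeholder=$2
    value=$3
    sed "s|$placeholder|$value|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage, with the values used in this guide:
#   cfg=$TOMCAT_WEBAPPS_DIR/WebApollo/config/config.xml
#   fill_placeholder "$cfg" ENTER_DATASTORE_DIRECTORY_HERE "$WEB_APOLLO_DATA_DIR"
#   fill_placeholder "$cfg" ENTER_USER_DATABASE_JDBC_URL jdbc:postgresql://localhost/web_apollo_users
#   fill_placeholder "$cfg" ENTER_ORGANISM "Pythium ultimum"
#   fill_placeholder "$cfg" ENTER_CVTERM_FOR_SEQUENCE sequence:contig
```

It's worth keeping a backup copy of config.xml before running anything like this against it.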

<!-- path to file containing canned comments XML -->
<canned_comments>/config/canned_comments.xml</canned_comments>

File that contains canned comments (predefined comments that will be available from a pull-down menu when creating comments). It’s best not to change the default option. See the canned comments section for details on configuring canned comments.

<!-- tools to be used for sequence searching.  This is optional.
	If this is not setup, WebApollo will not have sequence search support -->
<sequence_search_tools>
 
	<!-- one <sequence_search_tool> element per tool -->
	<sequence_search_tool>
 
		<!-- display name for the search tool -->
		<key>BLAT nucleotide</key>
 
		<!-- class for handling search -->
		<class>org.bbop.apollo.tools.seq.search.blat.BlatCommandLineNucleotideToNucleotide</class>
 
		<!-- configuration for search tool -->
		<config>/config/blat_config.xml</config>
 
	</sequence_search_tool>
 
	<sequence_search_tool>
 
		<!-- display name for the search tool -->
		<key>BLAT protein</key>
 
		<!-- class for handling search -->
		<class>org.bbop.apollo.tools.seq.search.blat.BlatCommandLineProteinToNucleotide</class>
 
		<!-- configuration for search tool -->
		<config>/config/blat_config.xml</config>
 
	</sequence_search_tool>
 
</sequence_search_tools>

Here’s the configuration for sequence search tools (allows searching your genomic sequences). WebApollo does not implement any search algorithms, but instead relies on different tools and resources to handle searching (this provides much more flexible search options). This is optional. If it’s not configured, WebApollo will not have sequence search support. You'll need one sequence_search_tool element per search tool. Let's look at the element in more detail.

<!-- display name for the search tool -->
<key>BLAT nucleotide</key>

This is a string that will be used for the display name for the search tool, in the pull down menu that provides search selection for the user.

<!-- class for handling search -->
<class>org.bbop.apollo.tools.seq.search.blat.BlatCommandLineNucleotideToNucleotide</class>

Should point to the class that will handle the search request. Searching is handled by classes that implement the org.bbop.apollo.tools.seq.search.SequenceSearchTool interface. This allows you to add support for your own favorite search tools (or resources). We currently only have support for command line Blat, in the following flavors:

  • org.bbop.apollo.tools.seq.search.blat.BlatCommandLineNucleotideToNucleotide
    • Blat search for a nucleotide query against a nucleotide database
  • org.bbop.apollo.tools.seq.search.blat.BlatCommandLineProteinToNucleotide
    • Blat search for a protein query against a nucleotide database

<!-- configuration for search tool -->
<config>/config/blat_config.xml</config>

File that contains the configuration for the searching plugin chosen. If you’re using Blat, you should not change this. If you’re using your own plugin, you’ll want to point this to the right configuration file (which will be dependent on your plugin). See the Blat section for details on configuring WebApollo to use Blat.

<!-- data adapters for writing annotation data to different formats.
	These will be used to dynamically generate data adapters within
	WebApollo.  This is optional.  -->
<data_adapters>
 
	<!-- one <data_adapter> element per data adapter -->
	<data_adapter>
 
		<!-- display name for data adapter -->
		<key>GFF3</key>
 
		<!-- class for data adapter plugin -->
		<class>org.bbop.apollo.web.dataadapter.gff3.Gff3DataAdapter</class>
 
		<!-- required permission for using data adapter
		available options are: read, write, publish -->
		<permission>read</permission>
 
		<!-- configuration file for data adapter -->
 		<config>/config/gff3_config.xml</config>
 
		<!-- options to be passed to data adapter -->
		<options>output=file&amp;format=gzip</options>
 
	</data_adapter>
 
	<data_adapter>
 
		<!-- display name for data adapter -->
		<key>Chado</key>
 
		<!-- class for data adapter plugin -->
		<class>org.bbop.apollo.web.dataadapter.chado.ChadoDataAdapter</class>
 
		<!-- required permission for using data adapter
		available options are: read, write, publish -->
		<permission>publish</permission>
 
		<!-- configuration file for data adapter -->
		<config>/config/chado_config.xml</config>
 
		<!-- options to be passed to data adapter -->
		<options>display_features=false</options>
 
	</data_adapter>
 
</data_adapters>

Here’s the configuration for data adapters (allows writing annotations to different formats). This is optional. If it’s not configured, WebApollo will not have data writing support. You'll need one data_adapter element per data adapter. Let's look at the element in more detail.

<!-- display name for data adapter -->
<key>GFF3</key>

This is a string that will be used for the data adapter name, in the dynamically generated data adapters list for the user.

<!-- class for data adapter plugin -->
<class>org.bbop.apollo.web.dataadapter.gff3.Gff3DataAdapter</class>

Should point to the class that will handle the write request. Writing is handled by classes that implement the org.bbop.apollo.web.dataadapter.DataAdapter interface. This allows you to add support for writing to different formats. We currently only have support for:

  • org.bbop.apollo.web.dataadapter.gff3.Gff3DataAdapter
    • GFF3 (see the GFF3 section for details on this adapter)
  • org.bbop.apollo.web.dataadapter.chado.ChadoDataAdapter
    • Chado (see the Chado section for details on this adapter)

<!-- required permission for using data adapter
	available options are: read, write, publish -->
<permission>read</permission>

Required user permission for accessing this data adapter. If the user does not have the required permission, it will not be available in the list of data adapters. Available permissions are read, write, and publish.

<!-- configuration for data adapter -->
<config>/config/gff3_config.xml</config>

File that contains the configuration for the data adapter plugin chosen.

<!-- options to be passed to data adapter -->
<options>output=file&amp;format=gzip</options>

Options to be passed to the data adapter. These are dependent on the data adapter.

Canned comments

You can configure a set of predefined comments that will be available to users through a dropdown menu when adding comments. The configuration is stored in $TOMCAT_WEBAPPS_DIR/WebApollo/config/canned_comments.xml. Let’s take a look at the configuration file.

<?xml version="1.0" encoding="UTF-8"?>
 
<canned_comments>
<!-- one <comment> element per comment.
	it must contain the attribute "feature_type" that defines
	the type of feature this comment will apply to.
	must be be in the form of "CV:term" (e.g., "sequence:gene")
 
	<comment feature_type="sequence:gene">This is a comment for sequence:gene</comment>
-->
</canned_comments>

You’ll need one <comment> element for each predefined comment. The element needs to have a feature_type attribute, in the form of CV:term, naming the feature type that the comment applies to. Let’s make a few comments for features of type sequence:gene and sequence:transcript:

<comment feature_type="sequence:gene">This is a comment for a gene</comment>
<comment feature_type="sequence:gene">This is another comment for a gene</comment>
<comment feature_type="sequence:transcript">This is a comment for a transcript</comment>
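
Put together, the complete file with these three comments could be written out as follows. This is a sketch: the function name is made up, and it overwrites whatever file you point it at, so back up the original first.

```shell
# Illustrative helper: write a canned_comments.xml containing the three
# example comments from this guide to the given path (overwrites the file).
write_canned_comments() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>

<canned_comments>
    <comment feature_type="sequence:gene">This is a comment for a gene</comment>
    <comment feature_type="sequence:gene">This is another comment for a gene</comment>
    <comment feature_type="sequence:transcript">This is a comment for a transcript</comment>
</canned_comments>
EOF
}

# Usage:
#   write_canned_comments "$TOMCAT_WEBAPPS_DIR/WebApollo/config/canned_comments.xml"
```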

Search tools

As mentioned previously, WebApollo makes use of tools for sequence searching rather than employing its own search algorithm. The only currently supported tool is command line Blat.

Blat

You’ll need to have Blat installed and a search database with your genomic sequences available to make use of this feature. You can get documentation on the Blat command line suite of tools at BLAT Suite Program Specifications and User Guide and get information on setting up the tool in the official BLAT FAQ. The configuration is stored in $TOMCAT_WEBAPPS_DIR/WebApollo/config/blat_config.xml. Let’s take a look at the configuration file:

<?xml version="1.0" encoding="UTF-8"?>
 
<!-- configuration file for setting up command line Blat support -->
 
<blat_config>
 
	<!-- path to Blat binary -->
	<blat_bin>ENTER_PATH_TO_BLAT_BINARY</blat_bin>
 
	<!-- path to where to put temporary data -->
	<tmp_dir>ENTER_PATH_FOR_TEMPORARY_DATA</tmp_dir>
 
	<!-- path to Blat database -->
	<database>ENTER_PATH_TO_BLAT_DATABASE</database>
 
	<!-- any Blat options (directly passed to Blat) e.g., -minMatch -->
	<blat_options>ENTER_ANY_BLAT_OPTIONS</blat_options>
 
</blat_config>

Let’s look at each element with values filled in.

<!-- path to Blat binary -->
<blat_bin>$BLAT_DIR/blat</blat_bin>

We need to point to the location where the Blat binary resides. For this guide, we'll assume Blat is installed in /usr/local/bin.

<!-- path to where to put temporary data -->
<tmp_dir>$BLAT_TMP_DIR</tmp_dir>

We need to point to the location where to store temporary files to be used in the Blat search. It can be set to whatever location you’d like.

<!-- path to Blat database -->
<database>$BLAT_DATABASE</database>

We need to point to the location of the search database to be used by Blat. See the Blat documentation for more information on generating search databases.

<!-- any Blat options (directly passed to Blat) e.g., -minMatch -->
<blat_options>-minScore=100 -minIdentity=60</blat_options>

Here we can configure any extra options to be used by Blat. These options are passed verbatim to the program. In this example, we’re passing the -minScore parameter with a minimum score of 100 and the -minIdentity parameter with a value of 60 (60% identity). See the Blat documentation for information on all available options.
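
Before trying a search from WebApollo, it can help to check that the three paths in blat_config.xml are usable. The function below is a hypothetical pre-flight check, not something shipped with WebApollo. Since it tests permissions for whoever runs it, it's most meaningful when run as the same user that Tomcat runs as.

```shell
# Illustrative pre-flight check for the three paths in blat_config.xml.
# Prints a message for each problem found and returns non-zero if any exist.
check_blat_setup() {
    blat_bin=$1     # e.g. $BLAT_DIR/blat
    tmp_dir=$2      # e.g. $BLAT_TMP_DIR
    database=$3     # e.g. $BLAT_DATABASE
    status=0
    [ -x "$blat_bin" ] || { echo "not executable: $blat_bin" >&2; status=1; }
    [ -d "$tmp_dir" ] && [ -w "$tmp_dir" ] || { echo "not a writable directory: $tmp_dir" >&2; status=1; }
    [ -r "$database" ] || { echo "not readable: $database" >&2; status=1; }
    return $status
}

# Usage:
#   check_blat_setup "$BLAT_DIR/blat" "$BLAT_TMP_DIR" "$BLAT_DATABASE" && echo "Blat paths look fine"
```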

Data adapters

GFF3

The GFF3 data adapter will allow exporting the current annotations as a GFF3 file. You can get more information about the GFF3 format at The Sequence Ontology GFF3 page. The configuration is stored in $TOMCAT_WEBAPPS_DIR/WebApollo/config/gff3_config.xml. Let’s take a look at the configuration file:

<?xml version="1.0" encoding="UTF-8"?>
 
<!-- configuration file for GFF3 data adapter -->
 
<gff3_config>
 
	<!-- path to where to put generated GFF3 file.  This path is 
	relative path that will be where you deployed your 
	instance (so that it's accessible from HTTP download requests) -->
	<tmp_dir>tmp</tmp_dir>
 
</gff3_config>

There's only one element to be configured:

<tmp_dir>tmp</tmp_dir>

This is the root directory where the GFF3 files will be generated. The actual GFF3 files will be in subdirectories that are generated to prevent collisions from concurrent requests. This directory is relative to $TOMCAT_WEBAPPS_DIR/WebApollo. This is done to allow the generated GFF3 to be accessible from HTTP requests.

Note that the generated files will reside in that directory indefinitely to allow users to download them. You'll need to eventually remove those files to prevent the file system from cluttering up. There's a script that will traverse the directory, remove any files that are older than a provided time, and clean up directories as they become empty. It's recommended to set up this script as a cron job that runs hourly to remove any files older than an hour (which should provide plenty of time for users to download those files). The script is in $WEB_APOLLO_DIR/tools/cleanup/remove_temporary_files.sh.

$ $WEB_APOLLO_DIR/tools/cleanup/remove_temporary_files.sh -d $TOMCAT_WEBAPPS_DIR/WebApollo/tmp -m 60
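
For the hourly cron job, the crontab entry needs full paths, since cron won't know about the variables used in this guide. The helper below (a made-up name, just for illustration) prints an entry that runs the same command as above at the top of every hour. Review the line and add it yourself with crontab -e.

```shell
# Illustrative helper: print a crontab entry that runs the cleanup script
# at minute 0 of every hour, with the guide's variables already expanded.
cleanup_cron_line() {
    apollo_dir=$1    # e.g. $WEB_APOLLO_DIR
    webapp_dir=$2    # e.g. $TOMCAT_WEBAPPS_DIR/WebApollo
    printf '0 * * * * %s/tools/cleanup/remove_temporary_files.sh -d %s/tmp -m 60\n' \
        "$apollo_dir" "$webapp_dir"
}

# Usage:
#   cleanup_cron_line "$WEB_APOLLO_DIR" "$TOMCAT_WEBAPPS_DIR/WebApollo"
```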

Chado

The Chado data adapter will allow writing the current annotations to a Chado database. You can get more information about Chado at the GMOD Chado page. The configuration is stored in $TOMCAT_WEBAPPS_DIR/WebApollo/config/chado_config.xml. Let’s take a look at the configuration file:

<?xml version="1.0" encoding="UTF-8"?>
 
<!-- configuration file for Chado data adapter -->
 
<chado_config>
 
	<!-- Hibernate configuration file for accessing Chado database -->
	<hibernate_config>/config/hibernate.xml</hibernate_config>
 
</chado_config>

There's only one element to be configured:

<hibernate_config>/config/hibernate.xml</hibernate_config>

This points to the Hibernate configuration for accessing the Chado database. Hibernate provides an ORM (Object Relational Mapping) for relational databases. This is used to access the Chado database. The Hibernate configuration is stored in $TOMCAT_WEBAPPS_DIR/WebApollo/config/hibernate.xml. It is quite large (as it contains a lot of mapping resources), so let's take a look at the parts of the configuration file that are of interest (near the top of the file):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-configuration PUBLIC
		"-//Hibernate/Hibernate Configuration DTD 3.0//EN"
		"http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
	<session-factory name="SessionFactory">
		<property name="hibernate.connection.driver_class">org.postgresql.Driver</property>
		<property name="hibernate.connection.url">ENTER_DATABASE_CONNECTION_URL</property>
		<property name="hibernate.connection.username">ENTER_USERNAME</property>
		<property name="hibernate.connection.password">ENTER_PASSWORD</property>
 
		...
 
	</session-factory>
</hibernate-configuration>

Let's look at each element:

<property name="hibernate.connection.driver_class">org.postgresql.Driver</property>

The database driver for the RDBMS where the Chado database exists. It will most likely be PostgreSQL (as it's the officially recommended RDBMS for Chado), in which case you should leave this at its default value.

<property name="hibernate.connection.url">ENTER_DATABASE_CONNECTION_URL</property>

JDBC URL to connect to the Chado database. It will be in the form of jdbc:$RDBMS://$SERVERNAME:$PORT/$DATABASE_NAME where $RDBMS is the RDBMS used for the Chado database, $SERVERNAME is the server's name, $PORT is the database port, and $DATABASE_NAME is the database's name. For example, to connect to a Chado database running on PostgreSQL on server my_server, port 5432 (PostgreSQL's default), with a database named my_organism, the connection URL is jdbc:postgresql://my_server:5432/my_organism.

<property name="hibernate.connection.username">ENTER_USERNAME</property>

User name used to connect to the database. This user should have write privileges to the database.

<property name="hibernate.connection.password">ENTER_PASSWORD</property>

Password for the provided user name.
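
As an illustration of how the connection URL is assembled and where the three values end up, the sketch below fills in the placeholders on a scratch copy of the relevant lines. The user name and password are hypothetical, and the real hibernate.xml is not touched; you can just as well edit the file by hand.

```shell
# Values from the example above; the user name and password are hypothetical.
SERVERNAME="my_server"
PORT="5432"
DATABASE_NAME="my_organism"
DB_USER="chado_user"
DB_PASSWORD="change_me"

# jdbc:$RDBMS://$SERVERNAME:$PORT/$DATABASE_NAME with PostgreSQL as the RDBMS.
JDBC_URL="jdbc:postgresql://$SERVERNAME:$PORT/$DATABASE_NAME"

# Scratch copy of the three placeholder lines from hibernate.xml.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<property name="hibernate.connection.url">ENTER_DATABASE_CONNECTION_URL</property>
<property name="hibernate.connection.username">ENTER_USERNAME</property>
<property name="hibernate.connection.password">ENTER_PASSWORD</property>
EOF

# Substitute each placeholder. "|" is used as the sed delimiter because the
# URL contains "/"; values containing "|", "&" or "\" would need escaping.
filled=$(sed -e "s|ENTER_DATABASE_CONNECTION_URL|$JDBC_URL|" \
             -e "s|ENTER_USERNAME|$DB_USER|" \
             -e "s|ENTER_PASSWORD|$DB_PASSWORD|" "$tmp")
echo "$filled"
rm -f "$tmp"
```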

Data generation

The steps for generating data (in particular static data) are mostly similar to JBrowse data generation steps, with some extra steps required. The scripts for data generation reside in $TOMCAT_WEBAPPS_DIR/WebApollo/jbrowse/bin. Let's go into WebApollo's JBrowse directory.

$ cd $TOMCAT_WEBAPPS_DIR/WebApollo/jbrowse

It will make things easier if we make sure that the scripts in the bin directory are executable.

$ chmod 755 bin/*

As mentioned previously, the data resides in the data directory by default. We can make data a symlink to $JBROWSE_DATA_DIR, which gives you a lot of flexibility: your WebApollo instance can easily be pointed to a new data directory.

$ ln -sf $JBROWSE_DATA_DIR data
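
Pointing the instance at a different data directory later amounts to replacing the symlink. The sketch below tries this in a scratch directory with hypothetical directory names. It adds the -n option so that an existing link is replaced rather than followed; both GNU and BSD ln accept this option, but check your system's manual page.

```shell
# Scratch area with two hypothetical data directories.
work=$(mktemp -d)
mkdir "$work/data_v1" "$work/data_v2"

# Initial link, equivalent to the step in this guide.
ln -sfn "$work/data_v1" "$work/data"

# Later: point the same link at the new data directory.
# Without -n, ln would follow the existing link and create
# the new link inside data_v1 instead of replacing it.
ln -sfn "$work/data_v2" "$work/data"

# Show where the link now points.
link_target=$(readlink "$work/data")
echo "$link_target"
```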

Now that we have our data directory in JBrowse, we need to copy some files into it that are specific to WebApollo's JBrowse. We need to copy all of the contents of $WEB_APOLLO_DIR/WebApollo/json into our data directory.

$ cp $WEB_APOLLO_DIR/WebApollo/json/* data

DNA track setup

The first thing we need to do before processing our evidence is to generate the reference sequence data to be used by JBrowse. We'll use the prepare-refseqs.pl script.

$ bin/prepare-refseqs.pl --fasta $WEB_APOLLO_SAMPLE_DIR/scf1117875582023.fa

We now have the DNA track set up.
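
Before running prepare-refseqs.pl on your own data, it can help to confirm which sequence names the FASTA file contains, since the first column of your GFF3 files needs to refer to the same names. A minimal sketch on a tiny made-up FASTA file (the file and its contents are hypothetical):

```shell
# Tiny made-up FASTA file standing in for your own reference sequence.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>contig_a example description
ACGTACGT
>contig_b
GGCCTTAA
EOF

# Header lines start with ">"; the name is the text up to the first space.
seq_names=$(grep '^>' "$fasta" | sed -e 's/^>//' -e 's/ .*//')
seq_count=$(grep -c '^>' "$fasta")

echo "$seq_count sequence(s):"
echo "$seq_names"
rm -f "$fasta"
```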

Static data generation

Generating data from GFF3 works best by having a separate GFF3 per source type. If your GFF3 has all source types in the same file, we need to split up the GFF3. We can use the split_gff_by_source.pl script in $WEB_APOLLO_DIR/tools/data to do so. We'll output the split GFF3 to some temporary directory (we'll use $WEB_APOLLO_SAMPLE_DIR/split_gff).

$ mkdir -p $WEB_APOLLO_SAMPLE_DIR/split_gff
$ $WEB_APOLLO_DIR/tools/data/split_gff_by_source.pl \
-i $WEB_APOLLO_SAMPLE_DIR/scf1117875582023.gff -d $WEB_APOLLO_SAMPLE_DIR/split_gff

If we look at the contents of $WEB_APOLLO_SAMPLE_DIR/split_gff, we can see we have the following files:

$ ls $WEB_APOLLO_SAMPLE_DIR/split_gff
blastn.gff  est2genome.gff  protein2genome.gff  repeatrunner.gff
blastx.gff  maker.gff       repeatmasker.gff    snap_masked.gff
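
To make the idea of splitting by source concrete: the source is the second tab-separated column of each GFF3 feature line. The following awk sketch illustrates the concept on a made-up two-feature file. It is not the implementation of split_gff_by_source.pl, which should still be used for real data.

```shell
# Made-up GFF3 input with two different sources (column 2).
work=$(mktemp -d)
printf '##gff-version 3\n' > "$work/in.gff"
printf 'ctg1\tmaker\tgene\t1\t100\t.\t+\t.\tID=g1\n' >> "$work/in.gff"
printf 'ctg1\tblastn\tmatch\t5\t50\t.\t+\t.\tID=m1\n' >> "$work/in.gff"

mkdir -p "$work/split_gff"

# Skip comment/directive lines; write each feature line to <source>.gff.
awk -F '\t' -v dir="$work/split_gff" \
    '!/^#/ && NF >= 9 { print > (dir "/" $2 ".gff") }' "$work/in.gff"

# List the per-source files that were produced.
split_files=$(ls "$work/split_gff")
echo "$split_files"
```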

We need to process each file and create the appropriate tracks.

(If you've previously used JBrowse, you may know that JBrowse also has an alternative approach to generating multiple static data tracks from a GFF3 file, which uses the biodb-to-json script and a configuration file. However, WebApollo is not yet compatible with that approach.)

GFF3 with gene/transcript/exon/CDS/polypeptide features

We'll start off with maker.gff. We need to handle that file a bit differently than the rest of the files since the GFF represents the features as gene, transcript, exons, and CDSs.

$ bin/flatfile-to-json.pl --gff $WEB_APOLLO_SAMPLE_DIR/split_gff/maker.gff \
--arrowheadClass trellis-arrowhead --getSubfeatures \
--subfeatureClasses '{"wholeCDS": null, "CDS":"brightgreen-80pct", "UTR": "darkgreen-60pct", "exon":"container-100pct"}' \
--cssClass container-16px --type mRNA --trackLabel maker \
--webApollo --renderClassName gray-center-20pct

Note that brightgreen-80pct, darkgreen-60pct, container-100pct, container-16px, and gray-center-20pct are all CSS classes defined in WebApollo stylesheets that describe how to display their respective features and subfeatures. WebApollo also tries to use reasonable default CSS styles, so it is possible to omit most of these CSS class arguments (the exception being --renderClassName). For example, to accept default styles for maker.gff, the above could instead be shortened to:

$ bin/flatfile-to-json.pl --gff $WEB_APOLLO_SAMPLE_DIR/split_gff/maker.gff \
--getSubfeatures --type mRNA --trackLabel maker --webApollo --renderClassName gray-center-20pct

See the Customizing features section for more information on CSS styles. There are also many other configuration options for flatfile-to-json.pl; see JBrowse data formatting for more information.

GFF3 with match/match_part features

Now we need to process the other remaining GFF3 files. The entries in those are stored as "match/match_part", so they can all be handled in a similar fashion.

We'll start off with blastn as an example.

$ bin/flatfile-to-json.pl --gff $WEB_APOLLO_SAMPLE_DIR/split_gff/blastn.gff \
--arrowheadClass webapollo-arrowhead --getSubfeatures \
--subfeatureClasses '{"match_part": "darkblue-80pct"}' \
--cssClass container-10px --trackLabel blastn \
--webApollo --renderClassName gray-center-20pct

Again, container-10px and darkblue-80pct are CSS class names that define how to display those elements. See the Customizing features section for more information.

We need to follow the same steps for the remaining GFF3 files. It can be a bit tedious to do this for the remaining six files, so we can use a simple inline Bash shell script to help us out. Don't worry if the script doesn't make sense; you can still process each file by hand.

$ for i in "$WEB_APOLLO_SAMPLE_DIR"/split_gff/*.gff; do
j=$(basename "$i" .gff); [ "$j" = "maker" ] && continue;
echo "Processing $j" && bin/flatfile-to-json.pl --gff "$i" --arrowheadClass webapollo-arrowhead \
--getSubfeatures --subfeatureClasses '{"match_part": "darkblue-80pct"}' \
--cssClass container-10px --trackLabel "$j" \
--webApollo --renderClassName gray-center-20pct; done

Generate searchable name index

Once data tracks have been created, you will need to generate a searchable index of names using the generate-names.pl script:

$ bin/generate-names.pl

This script creates an index of sequence names and feature names in order to enable auto-completion in the navigation text box. This index is required, so if you do not wish any of the feature tracks to be indexed for auto-completion, you can instead run generate-names.pl immediately after running prepare-refseqs.pl, but before generating other tracks.

The script can also be rerun after any additional tracks are generated if you wish feature names from those tracks to be added to the index.

BAM data

Now let's look at how to configure BAM support. WebApollo has native support for BAM, so no extra processing of the data is required.

First we'll copy the BAM data into the WebApollo data directory. We'll put it in the data/bam directory. Keep in mind that this BAM data was randomly generated, so there's really no biological meaning to it. We only created it to show BAM support.

$ mkdir data/bam
$ cp $WEB_APOLLO_SAMPLE_DIR/*.bam* data/bam

Now we need to add the BAM track.

$ bin/add_bam_track.pl --bam_url bam/simulated-sorted.bam \
   --label simulated_bam --key "simulated BAM"

You should now have a simulated BAM track available.
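
The wildcard in the copy step matters because each BAM file is accompanied by an index file. A quick check that every BAM file in a directory has an index next to it might look like the sketch below. It assumes the index is named by appending .bai to the BAM file name, and the helper name check_bam_indexes is made up for illustration.

```shell
# check_bam_indexes DIR: report any .bam file in DIR lacking a .bam.bai index.
# Returns non-zero if at least one index is missing.
check_bam_indexes() {
    missing=0
    for bam in "$1"/*.bam; do
        [ -e "$bam" ] || continue          # directory holds no BAM files
        if [ ! -e "$bam.bai" ]; then
            echo "missing index for $bam"
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration on empty placeholder files in a scratch directory:
# a.bam has an index, b.bam does not.
demo=$(mktemp -d)
touch "$demo/a.bam" "$demo/a.bam.bai" "$demo/b.bam"
check_bam_indexes "$demo" || echo "some BAM files are not indexed"
rm -rf "$demo"
```

Against the directory used in this guide, the call would be check_bam_indexes data/bam.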

BigWig data

WebApollo has native support for BigWig files (.bw), so no extra processing of the data is required.

Configuring a BigWig track is very similar to configuring a BAM track. First we'll copy the BigWig data into the WebApollo data directory. We'll put it in the data/bigwig directory. Keep in mind that this BigWig data was generated as a coverage map derived from the randomly generated BAM data, so like the BAM data there's really no biological meaning to it. We only created it to show BigWig support.

$ mkdir data/bigwig
$ cp $WEB_APOLLO_SAMPLE_DIR/*.bw data/bigwig

Now we need to add the BigWig track.

$ bin/add_bw_track.pl --bw_url bigwig/simulated-sorted.coverage.bw \
  --label simulated_bw --key "simulated BigWig"

You should now have a simulated BigWig track available.


Customizing features

The visual appearance of biological features in WebApollo (and JBrowse) is handled by CSS stylesheets. Every feature and subfeature is given a default CSS "class" that matches a default CSS style in a CSS stylesheet. These styles are defined in $TOMCAT_WEBAPPS_DIR/WebApollo/jbrowse/track_styles.css and $TOMCAT_WEBAPPS_DIR/WebApollo/jbrowse/plugins/WebApollo/css/webapollo_track_styles.css. Additional styles are also defined in these files, and can be used by explicitly specifying them in the --cssClass, --subfeatureClasses, --renderClassName, or --arrowheadClass parameters to flatfile-to-json.pl. See the examples above.

WebApollo differs from JBrowse in some of its styling, largely in order to help with feature selection, edge-matching, and dragging. WebApollo by default uses invisible container elements (with style class names like "container-16px") for features that have children, so that the children are fully contained within the parent feature. This is paired with another styled element that gets rendered within the feature but underneath the subfeatures, and is specified by the --renderClassName argument to flatfile-to-json.pl. Exons are also by default treated as special invisible containers, which hold styled elements for UTRs and CDS.

It is relatively easy to add other stylesheets that have custom style classes that can be used as parameters to flatfile-to-json.pl. An example is $TOMCAT_WEBAPPS_DIR/WebApollo/jbrowse/sample_data/custom_track_styles.css which contains two new styles:

.gold-90pct, 
.plus-gold-90pct, 
.minus-gold-90pct  {
    background-color: gold;
    height: 90%;
    top: 5%;
    border: 1px solid gray;
}

.dimgold-60pct, 
.plus-dimgold-60pct, 
.minus-dimgold-60pct  {
    background-color: #B39700;
    height: 60%;
    top: 20%;
}

In this example, two subfeature styles are defined, and the top property is set to (100% - height)/2 to ensure that the subfeatures are centered vertically within their parent feature. When defining new styles for features, it is important to specify rules that apply to plus-stylename and minus-stylename in addition to stylename, as WebApollo adds the "plus-" or "minus-" prefix to the class of the feature if the feature has a strand orientation.
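
The centering rule and the plus-/minus- naming can be combined into a small helper that writes out a rule for a new style. This is only a convenience sketch: the function name make_track_style and the example style name are made up, and heights are assumed to be whole percentages (an odd difference is rounded down by the integer division).

```shell
# make_track_style NAME COLOR HEIGHT_PCT
# Prints a CSS rule covering NAME, plus-NAME and minus-NAME,
# with top = (100 - height) / 2 so the element is vertically centered.
make_track_style() {
    name=$1 color=$2 height=$3
    top=$(( (100 - height) / 2 ))
    printf '.%s,\n.plus-%s,\n.minus-%s {\n' "$name" "$name" "$name"
    printf '    background-color: %s;\n' "$color"
    printf '    height: %s%%;\n' "$height"
    printf '    top: %s%%;\n' "$top"
    printf '}\n'
}

# Hypothetical style: 60% tall, so top works out to 20%.
make_track_style teal-60pct teal 60
```

The output can be appended to a custom stylesheet with a shell redirect (>>).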

You need to tell WebApollo where to find these styles. This can be done via standard CSS loading in the index.html file by adding a <link> element:

<link rel="stylesheet" type="text/css" href="sample_data/custom_track_styles.css">

Or alternatively, to avoid modifying the web application, additional CSS can be specified in the trackList.json file that is created in the data directory during static data generation, by adding a "css" property to the JSON data:

   "css" : "sample_data/custom_track_styles.css" 

Then these new styles can be used as arguments to flatfile-to-json.pl, for example:

$ bin/flatfile-to-json.pl --gff $WEB_APOLLO_SAMPLE_DIR/split_gff/maker.gff \
--getSubfeatures --type mRNA --trackLabel maker --webApollo \
--subfeatureClasses '{"CDS":"gold-90pct", "UTR": "dimgold-60pct"}'

Depending on how your Tomcat server is set up, you might need to restart the server to pick up all the changes (or at least restart the WebApollo web application). You'll also need to do this any time you change the configuration files (this is not needed when changing the data files).

Congratulations, you're done configuring WebApollo!

Accessing your WebApollo installation

Let's test out our installation. Point your browser to http://SERVER_ADDRESS:8080/WebApollo .

WebApollo login page

The user name and password are both web_apollo_admin as we configured earlier.

WebApollo main options

Click on the Edit annotations button.

WebApollo reference sequence selection

We only see one reference sequence to annotate since we're only working with one contig. Click on the Edit button.

Now have fun annotating!!!