JBrowseDev/Main

From GMOD
Revision as of 21:34, 22 June 2011 by RSCummings (Talk | contribs)

JBrowse is an interactive web application that can be used to visualize data about a single genomic sequence or a set of chromosomes. It is capable of displaying large genomes and numerous feature tracks describing those genomes (see human genome hg19).

Apart from enhanced graphics, JBrowse's major improvement over GBrowse is its use of Asynchronous JavaScript and XML (AJAX), hence the name "JBrowse" for "JavaScript Genome Browser". With AJAX, data can be obtained from the server without a page reload. This design lets JBrowse offload a significant portion of the computational effort from the server to the client machines, so the server can process requests faster, and thus serve more clients, than a genome browser that is not implemented using AJAX.

Installation

Prerequisites

1. JBrowse requires Perl v5.10.0 or later. An installer can be downloaded from www.perl.org. If this does not work, you can download the source code and install perl through the command line.

If you had to download a new version of Perl, you will want this version to be used by default. First, find the path to the directory that contains the new perl executable. If you didn't specify this directory when you were installing perl, you should be able to find the default installation directory by looking at the documentation. Next, create (or open) the shell configuration file in your home directory:

pico $HOME/.bash_profile

Note: This is specific to Mac OS X. If you are using bash on a platform that is not OS X, add the 'export' line below to both .bash_profile and .bashrc. If you do not use bash, you will need to edit the configuration file(s) appropriate for your shell.

Insert this line into that file, substituting '<path to the newest version of perl>' with the path to the new perl executable:

export PATH=<path to the newest version of perl>:$PATH

Note: This is the command that is used in bash. Depending on your shell, you may have to use setenv instead of export.

Save this file, then close and reopen your terminal. To make sure that the newest version of perl is now being used by default, use this command:

perl --version
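
The PATH change and the version check above can also be scripted. The following is a minimal sketch, not part of JBrowse: the function names and the directory /opt/perl-5.10/bin are assumptions made for illustration. It adds a directory to the front of PATH only once, and tests whether the first perl on PATH is 5.10.0 or later.

```shell
# Prepend a directory to PATH only if it is not already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                         # already present, leave PATH alone
        *) PATH="$1:$PATH"; export PATH ;;
    esac
}

# Succeeds (exit status 0) if the first perl found on PATH is 5.10.0 or later.
# $] is perl's own numeric version, e.g. 5.010000 for 5.10.0.
perl_is_new_enough() {
    perl -e 'exit($] >= 5.010 ? 0 : 1)'
}

# Hypothetical install location; replace with the directory holding your new perl.
prepend_path "/opt/perl-5.10/bin"
```

After sourcing this from your shell configuration file, 'perl_is_new_enough && echo ok' gives a quick yes/no answer in place of reading the output of 'perl --version'.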

2. JBrowse must be unpacked into a directory that is served by Apache. Essentially, this means that the directory in which the JBrowse package is placed must be accessible via your web browser. Directories that might be served by Apache include "/Library/WebServer/Documents" on OS X and "/var/www" on some Linux distributions. More information is available in the Apache HTTP Server documentation.

3. JBrowse requires installation of a number of perl modules. This should be done through the Comprehensive Perl Archive Network (CPAN), a large repository for shared perl modules. CPAN has an associated shell program that will allow you to download and install JBrowse's dependencies.

To access this shell program, open up a terminal and type 'perl -MCPAN -e shell'. If you have never used cpan before, you will be prompted to configure CPAN. During the configuration process, you will be asked a series of questions about your preferences. If you do not know what an option means, you can usually use the provided default argument.

If you have sufficient privileges, reopen the CPAN shell as the root user by typing 'sudo perl -MCPAN -e shell' in a terminal. You will be prompted for your password (sudo asks for your own account's password, not root's). Once you have opened the CPAN shell as root, install BioPerl by typing 'install CJFields/BioPerl-1.6.1.tar.gz'. CPAN will automatically install BioPerl on your computer. After this is complete, install the other dependencies in the same way. Here is a complete list of what you will need to download and install from CPAN:

  • CJFields/BioPerl-1.6.1.tar.gz
  • JSON
  • JSON::XS (optional, for speed)
  • Heap::Simple
  • Heap::Simple::Perl
  • Heap::Simple::XS
  • PerlIO::gzip
  • Devel::Size

Common Issues

Restricted Permissions

If you do not have administrator privileges, you will need to install the JBrowse dependencies locally, that is, for your account only. This should be straightforward for the perl installation (simply change the install location to one that you have read and write access to), but is a bit more involved for the CPAN-installed perl modules.

To reconfigure cpan, open the CPAN shell and type 'o conf init'. You will need to provide an answer to the "Parameters for the './Build' command?" prompt. One possible answer would be:

--extra-linker-flags -L<path to home directory>/site_perl

This will cause all modules to be installed in a directory called 'site_perl' that is in your home directory. Of critical importance is that your user account has read and write access to this directory.

There is one more change to be made. Perl needs to be aware of the new installation directory for your perl modules. This involves editing the PERL5LIB variable. Open your shell configuration file with a text editor, e.g. with the command:

pico $HOME/.bash_profile

Note: This is specific to Mac OS X. If you are using bash on a platform that is not OS X, add the 'export' line below to both .bash_profile and .bashrc. If you do not use bash, you will need to edit the configuration file(s) appropriate for your shell.

and insert the line:

export PERL5LIB=$PERL5LIB:<path to home directory>/site_perl

Note: This is the command that is used in bash. Depending on your shell, you may have to use setenv instead of export.

Changes to the PERL5LIB variable will take effect in any new shell that you open. To check whether the PERL5LIB variable has been successfully changed, use the command:

echo $PERL5LIB
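
A slightly more defensive version of the 'export' line is sketched below. It assumes the module directory is $HOME/site_perl, as in the example above; the variable name module_dir and the helper show_inc are illustrative only. The sketch avoids a leading ':' when PERL5LIB was previously empty, and the helper lists the directories perl will actually search.

```shell
# Assumed module directory; use the directory chosen during 'o conf init'.
module_dir="$HOME/site_perl"

# ${PERL5LIB:+...} expands to "$PERL5LIB:" only when PERL5LIB is non-empty,
# so an unset variable does not leave a leading ':' in the result.
PERL5LIB="${PERL5LIB:+$PERL5LIB:}$module_dir"
export PERL5LIB

# Print one @INC entry per line; the new directory should be among them.
show_inc() {
    perl -e 'print "$_\n" for @INC'
}
```

Running 'show_inc' in a new shell is a more direct check than 'echo $PERL5LIB', because it shows the search path as perl itself sees it.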

Missing Module or Loadable Object

You might encounter an error of the type:

Can't locate some/perl/module.pm in @INC (@INC contains: <list of paths>)

If you see this error, it normally means one of two things. Either (1) you have not installed the perl module, or (2) you have installed the perl module, but it is not in a directory where perl expects it to be. If the absolute path to the module's directory (try 'locate your/perl/module.pm') is not a member of the list of paths in @INC, you can either move the module and its associated files and directories to another directory that is a member of @INC, or you can add the module's current directory to @INC by appending its absolute path to the PERL5LIB variable in ~/.bash_profile (See the previous topic for instructions).

You might also encounter an error of this type:

Can't locate loadable object for module Some::Perl::Module in @INC (@INC contains: <list of paths>)

If you encounter this error, it means that perl found the module it was looking for, but it didn't find all of the files that were associated with that module. This error often occurs when there is XS code that does not get compiled, or whose compiled object file is not in a directory that is visible to perl (i.e. in a directory that isn't listed in @INC). If you moved or copied the perl module and its associated files to a different directory, be sure that there aren't any object files that you missed. If you don't find any object files in the directory that you copied the perl module from, try reinstalling the module through the CPAN shell. Look carefully through the CPAN log, first for errors in the compilation of the XS code, and then for possible reasons for any errors (e.g. a missing library that cannot be installed through CPAN).

Installing Modules for an Older Version of Perl

It might be possible to type 'cpan' from your shell to open CPAN. However, it is recommended that you open CPAN using the perl interpreter ('perl -MCPAN -e shell'), because the shell associated with the 'cpan' command might be intended for a version of perl other than the version that you are using with JBrowse (e.g., it might install version 5.8.8 modules when the version of perl that you are actually using is 5.10.0). Modules for a different version of perl will be installed in the wrong library directory with respect to your current version of perl, and even if you move them to the correct directory, they might not be compatible with your current version of perl. If 'perl --version' and 'cpan --version' indicate the same perl version, it is fine to use 'cpan' to access the CPAN shell. Otherwise, use 'perl -MCPAN -e shell'.

Usage

JBrowse consists of a set of scripts that use external data sources (e.g. files on your computer) to produce additional files. If these JBrowse-generated files are present when you open JBrowse with your web browser, you will automatically see the sequence and feature tracks produced from the data they contain.

There is a particular order that should be followed when adding data to JBrowse. Reference sequences should be added first, followed by feature tracks. Once all of the tracks have been added, it is possible to make the names of each feature searchable. While there is some flexibility in this order of events (it is possible to add additional reference sequences after feature tracks have been added, for example), the first step will always be to specify a sequence or set of sequences, and the last step will always be to make the named features searchable (assuming it is desired that all feature names are searchable).
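
The order described above can be captured in a small wrapper. This is only a sketch: the function names and the input file names are hypothetical, and by default the commands are printed rather than executed (a "dry run"), so the sequence can be reviewed first. The scripts and options used are the ones described in the sections that follow.

```shell
# When DRY_RUN=1 (the default here), print each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_jbrowse_data() {
    fasta="$1"   # reference sequences
    gff="$2"     # feature data in gff3 format
    label="$3"   # internal track name

    # 1. Reference sequences always come first.
    run bin/prepare-refseqs.pl --fasta "$fasta"
    # 2. Feature tracks; repeat this step once per track.
    run bin/flatfile-to-json.pl --gff "$gff" --tracklabel "$label" --autocomplete all
    # 3. Name indexing always comes last.
    run bin/generate-names.pl
}

# Hypothetical input files.
build_jbrowse_data refseq.fasta genes.gff3 genes
```

To execute the commands for real, run the script from the JBrowse directory with DRY_RUN=0.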

User Interface

1. Location Marker: Click and drag to move to a different genomic position.
2. Scroll Buttons: Click to scroll by a fixed amount at a given zoom level.
3. Viewing Field: Drag a track to this area to make it visible. Depending on the track, some zooming may be necessary.
4. Zoom Buttons: Click to zoom. Per click, the larger buttons zoom more than the smaller buttons.
5. Search Bar: Browse to a certain region by searching for a location or feature name.
6. Chromosome Selector: Choose which chromosome to view.
7. Hidden Tracks: Drag a track to this area to hide it.
8. Window Slider: Resize the viewing field.

Reference Sequences

The reference sequence is a sequence that is representative of the feature data. It might be a consensus sequence from an alignment, or simply a sequence of interest. Before any feature tracks can be input to JBrowse, the reference sequence must be taken into consideration. This is handled by the prepare-refseqs.pl script.

prepare-refseqs.pl

This script must be run prior to the addition of feature tracks. The simplest way to use it would be to use the --fasta option, which uses a single sequence or set of reference sequences from a fasta file:

bin/prepare-refseqs.pl --fasta <fasta file> [options]

If the file has multiple sequences, each sequence will become a reference sequence by default. You may switch between these sequences by selecting the sequence of interest via the pull-down menu to the right of the large "zoom in" button.

You may use any alphabet you wish for your sequences (i.e., you are not restricted to the nucleotides A, T, C, and G; any alphanumeric character, as well as several other characters, may be used). Hence, it is possible to browse RNA and protein in addition to DNA. However, two characters should be avoided, because they will cause the sequence to "split": part of the sequence will be cut off and continue on the next line. These characters are the hyphen and the question mark. Unfortunately, this prevents the use of hyphens to represent gaps in a reference sequence.
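
Before running prepare-refseqs.pl, a fasta file can be scanned for the two problem characters. The function below is a hedged sketch (its name is made up for this example, and it is not part of JBrowse). It assumes that only the sequence lines matter, so header lines beginning with '>' are skipped.

```shell
# Print "<line number>: <line>" for every sequence line containing '-' or '?'.
# Header lines (starting with '>') are skipped.
# Exit status is 1 if any such line was found, 0 otherwise.
find_splitting_chars() {
    awk '!/^>/ && /[-?]/ { printf "%d: %s\n", NR, $0; bad = 1 }
         END { exit bad }' "$1"
}

# Small demonstration on a temporary file.
tmp="$(mktemp)"
printf '>seq1\nACGT-ACGT\n>seq2\nACGTACGT\n' > "$tmp"
find_splitting_chars "$tmp" || echo "edit these lines before running prepare-refseqs.pl"
rm -f "$tmp"
```

In the demonstration, line 2 is reported because of the hyphen, and the non-zero exit status triggers the reminder message.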

In addition to reading from a fasta file, prepare-refseqs.pl can read sequences from a gff3 file or a database (e.g. PostgreSQL, MySQL). In order to read fasta sequences from a database, a config file must be used.

Syntax used to import sequences from gff files:

bin/prepare-refseqs.pl --gff <gff3 file with sequence information> [options]

Syntax used to import sequences with a config file:

bin/prepare-refseqs.pl --conf <config file that references a database with sequence information> --[refs|refid] <reference sequences> [options]

Summary of prepare-refseqs.pl options.
Option Value
fasta, gff, or conf Path to the file that JBrowse will use to import sequences. With the fasta and gff options, the sequence information is imported directly from the specified file. With the conf option, the specified config file includes the details necessary to access a database that contains the sequence information. Exactly one of these three options must be used.
out A path to the output directory (default is 'data' in the current directory)
seqdir The directory where the reference sequences are stored (default: <output directory>/seq)
noseq Causes no reference sequence track to be created. This is useful for reducing disk usage.
refs A comma-delimited list of the names of sequences to be imported as reference sequences. This option (or refid) is required when using the conf option. It is not required when the fasta or gff options are used, but it can be useful with these options, since it can be used to select which sequences JBrowse will import.
refid A comma-delimited list of the genbank identifiers? of sequences to be imported as reference sequences. This is an alternative to the '--refs' option.

Feature Tracks

The feature tracks are the most important components of JBrowse. They can be used to visualize information about a sequence, such as sequence conservation, RNA base pairing, and the locations of transposons. There are a number of scripts that can be used to input various types of feature tracks into JBrowse:

  • flatfile-to-json.pl
  • bam-to-json.pl
  • biodb-to-json.pl
  • ucsc-to-json.pl
  • draw-basepair-track.pl
  • wig-to-json.pl

flatfile-to-json.pl

This script inputs a single track into JBrowse. To put multiple tracks into JBrowse, it must be executed repeatedly.

Terminology: A flat file is a database that exists entirely in a single file. In this case, the flat file must be a gff3, gff2, or bed file.

Basic syntax:

bin/flatfile-to-json.pl --[gff|gff2|bed] <flat file> --tracklabel <track name> [options]

Hint: flatfile-to-json.pl simplifies the process of inputting a small number of tracks into JBrowse, since it does not use a config file. If you have many tracks, you will probably want to use a config file, because its structure will make the task of editing tracks easier. In that case, the appropriate script will be biodb-to-json.pl.

Summary of flatfile-to-json.pl options.
Option Value
gff, gff2, or bed The name of the file that contains the feature data. The names of these options correspond to the file types, with the exception of gff, which uses a gff3 file instead of a gff file. Exactly one of these three options must be used.
tracklabel The internal name that JBrowse will give to this feature track. This option requires a value.
key The external, human-readable label seen on the feature track when it is viewed in JBrowse. The value of key defaults to the value of tracklabel.
autocomplete Make the features of the track searchable. This option can be used with the arguments "label", "alias", or "all".

   label: Make the features searchable by the viewable name that they are associated with in JBrowse. In a gff3 file, this will be the "Name" in the attributes column.
   alias: Make the features searchable by an alternate name defined in the input file. In a gff3 file, this will be the "Alias" in the attributes column.
   all: Make the features searchable by both their label and their alias.

out A path to the output directory (default is 'data' in the current directory).
cssClass The css class that will be used to create the feature track. This option makes it possible to choose how the feature track will look by selecting a template from a list of track types defined in genome.css. JBrowse comes with several feature track types; the default is "feature".
getType Causes the 'type' to be included in the output JSON file. The type is the feature that has been predicted (e.g. promoter site, gene). If a gff file is being used, the type will be in column 3.
getPhase Causes the 'phase' to be included in the output JSON file. The phase describes the reading frame of a DNA (or messenger RNA) sequence. If the phase is relevant, it can have the values 0, 1, or 2; otherwise, the value associated with the phase is '.'. If a gff file is being used, the phase will be in column 8.
getSubs If subfeatures have been specified for any features in the track, setting this option will cause them to appear. Otherwise, subfeatures will not appear.
getLabel Causes the Name attribute associated with each feature to be included in the track. If a gff3 file is being used, the Name will be in column 9 when it is defined.
urlTemplate A url that your browser will visit when you click on a feature in this track. This is especially useful if you want to link a feature to a page with more information about that feature.
arrowheadClass When this option is used, directional features will be given an arrowhead. The presence and orientation of the arrowhead for each individual feature will depend on data in the input file. Arrowhead classes are defined in genome.css. There is only one that comes with JBrowse (transcript-arrowhead).
subfeatureClasses The css class(es) that will be used for the subfeatures of a feature track. This option makes it possible to choose how the subfeatures will appear. Any of the classes in genome.css can be used for the subfeatures. The argument must be specified as a JSON association list (e.g. { "subfeature1": "cssclass1", "subfeature2": "cssclass2" }). This option must be used with getSubs in order for subfeatures to appear.
clientConfig Any additions or edits to the CSS class being used for the main features of the track (not for subfeatures). These edits must be specified in JSON syntax, and any changes to the CSS style are associated with "featureCss", e.g. '{ "featureCss": "cssoption1: value1; cssoption2: value2", "histscale": 2}'.
type The type of feature that will appear in the feature track. This option is useful when the input file contains features of several different types, and you are interested in only having one type of feature (e.g. only having features that are genes) in the feature track. In gff3 files, the type is in the third column.
extraData Use additional information from the input file to create variations in the appearance or behavior of individual features. This option is meant to be used in conjunction with other options. For each feature in the track, a perl subroutine is used to extract additional information, which is then associated with a variable. The value of this variable can be different for each feature. When the name of this variable is surrounded by curly braces and used in the argument for a different option, such as urlTemplate, the feature-specific data is used.
nclChunk The NCList chunk size. This option should not be used unless an error such as "json or perl structure exceeds maximum nesting level" is encountered. If this error does occur, lower the chunk size (the default is 1000).

bam-to-json.pl

This script is very similar to flatfile-to-json.pl, but it specifically uses bam files as input.

Basic syntax:

bin/bam-to-json.pl --bam <bam file> --tracklabel <track name> [options]

Summary of bam-to-json.pl options.
Option Value
bam The name of the bam file that contains the feature data. This option requires a value.
tracklabel The internal name that JBrowse will give to this feature track. This option requires a value.
key The external, human-readable label seen on the feature track when it is viewed in JBrowse. The value of key defaults to the value of tracklabel.
out A path to the output directory (default is 'data' in the current directory).
cssClass The css class that will be used to create the feature track. This option makes it possible to choose how the feature track will look by selecting a template from a list of track types defined in genome.css. JBrowse comes with several feature track types; the default is "feature".
clientConfig Any additions or edits to the CSS class being used for the main features of the track (not for subfeatures). These edits must be specified in JSON syntax, and any changes to the CSS style are associated with "featureCss", e.g. '{ "featureCss": "cssoption1: value1; cssoption2: value2", "histscale": 2}'.
nclChunk The NCList chunk size in bytes. This option should not be used unless an error such as "json or perl structure exceeds maximum nesting level" is encountered. If this error does occur, lower the chunk size (the default is 1000 bytes).
compress This option causes the output to be compressed (What is being compressed? What configuration is necessary?)

biodb-to-json.pl

This script uses a config file to produce a set of feature tracks in JBrowse. It can be used to obtain information from any database with appropriate schema, or from flat files. Because it can produce several feature tracks in a single execution, it is useful for large-scale feature data entry into JBrowse.

Basic syntax:

bin/biodb-to-json.pl --conf <config file> [options]

For more details about the structure of a config file, see Using Config Files.

Option Value
conf The name of the JSON configuration file that will be used. This option must be specified.
out A path to the output directory (default is 'data' in the current directory).
track The identifier of a single track that will be updated or added to JBrowse. In the list of key-value pairs comprising an individual track definition in the config file, the identifier will be the value associated with "track".
ref A comma-delimited list of reference sequence names, used to limit database queries to a certain set of sequences. By default, all reference sequences in the database are queried for the types of feature data specified in the config file.
refid A comma-delimited list of reference sequence IDs, used to limit database queries to a certain set of sequences. By default, all reference sequences in the database are queried for the types of feature data specified in the config file. (How is refid different from refs?)
compress This option causes the output to be compressed (What is being compressed? What configuration is necessary?)

ucsc-to-json.pl

This script uses data from UCSC genome annotation database. To reach this data, go to hgdownload.cse.ucsc.edu and click the link for the genome of interest. Next, click the "Annotation Database" link. The data relevant to ucsc-to-json.pl (*.txt.gz and *.sql files) can be downloaded from either this page or the FTP server described on this page.

Special dependencies: SAMtools

Basic syntax:

bin/ucsc-to-json.pl --in <directory with files from UCSC> [options]

Hint: If you're using this approach, it might be convenient to also download the sequence(s) from UCSC. These are usually available from the "Data set by chromosome" link for the particular genome or from the FTP server.

Option Value
in A directory containing the *.txt.gz and *.sql data from UCSC.
out A path to the output directory (default is 'data' in the current directory).
track The name of the database table. What does this mean?
cssClass The css class that will be used to create the feature track. This option makes it possible to choose how the feature track will look by selecting a template from a list of track types defined in genome.css. JBrowse comes with several feature track types; the default is "feature".
arrowheadClass When this option is used, directional features will be given an arrowhead. The presence and orientation of the arrowhead for each individual feature will depend on data in the input file. Arrowhead classes are defined in genome.css. There is only one that comes with JBrowse (transcript-arrowhead).
subfeatureClasses The css class(es) that will be used for the subfeatures of a feature track. This option makes it possible to choose how the subfeatures will appear. Any of the classes in genome.css can be used for the subfeatures. The argument must be specified as a JSON association list (e.g. { "subfeature1": "cssclass1", "subfeature2": "cssclass2" }). This option must be used with getSubs in order for subfeatures to appear.
clientConfig Any additions or edits to the CSS class being used for the main features of the track (not for subfeatures). These edits must be specified in JSON syntax, and any changes to the CSS style are associated with "featureCss", e.g. '{ "featureCss": "cssoption1: value1; cssoption2: value2", "histscale": 2}'.
nclChunk The NCList chunk size in bytes. This option should not be used unless an error such as "json or perl structure exceeds maximum nesting level" is encountered. If this error does occur, lower the chunk size (the default is 1000 bytes).
compress This option causes the output to be compressed (What is being compressed? What configuration is necessary?)
sortMem The amount of memory in bytes to use for sorting. What needs sorting, and why? What is the default amount of memory that is allocated for sorting?

draw-basepair-track.pl

This script inputs a single base pairing track into JBrowse. A base pairing track is a distinctive track type that represents base pairing between nucleotides as arcs.

Terminology: In JBrowse jargon, a tile is a png image that is used as an entire track. When draw-basepair-track.pl is executed, a tile is created for each zoom level, and the set of generated tiles is used to display the track at all possible zoom levels. This is also the case for wig-to-json.pl.

Basic syntax:

bin/draw-basepair-track.pl --gff <gff file> --tracklabel <track name> [options]

Summary of draw-basepair-track.pl options.
Option Value
gff The name of the gff file that will be used. This option must be specified.
tracklabel The internal name that JBrowse will give to this feature track. This option requires a value.
key The external, human-readable label seen on the feature track when it is viewed in JBrowse. The value of key defaults to the value of tracklabel.
out A path to the output directory (default is 'data' in the current directory).
tile The directory where the tiles, or images corresponding to each zoom level of the track, are stored. Defaults to data/tiles.
bgcolor The color of the track background. Specified as "RED,GREEN,BLUE" in base ten numbers between 0 and 255. Defaults to "255,255,255".
fgcolor The color of the track foreground (i.e. the base pairing arcs). Specified as "RED,GREEN,BLUE" in base ten numbers between 0 and 255. Defaults to "0,255,0".
width The width in pixels of each tile. The default value is 2000.
height The height in pixels of each tile. Changing this parameter will cause a corresponding change in the top-to-bottom height of the track in JBrowse. The default value is 100.
thickness The thickness of the base pairing arcs in the track. The default value is 2.
nolinks Disables use of file system links to compress duplicate image files.
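
The bgcolor and fgcolor options take decimal "RED,GREEN,BLUE" triples, whereas colors are often written as hexadecimal strings such as #00ff00. The helper below is a small sketch (its name is made up for this example and it is not part of JBrowse) that converts a six-digit hex color into the decimal form.

```shell
# Convert a six-digit hex color (with or without a leading '#')
# into the decimal "RED,GREEN,BLUE" form.
hex_to_rgb() {
    hex=${1#\#}                                  # drop a leading '#', if any
    r="$(printf '%s' "$hex" | cut -c1-2)"
    g="$(printf '%s' "$hex" | cut -c3-4)"
    b="$(printf '%s' "$hex" | cut -c5-6)"
    printf '%d,%d,%d\n' "0x$r" "0x$g" "0x$b"     # printf reads 0x.. as hexadecimal
}

hex_to_rgb "#00ff00"    # prints 0,255,0
```

The result can be passed on directly, e.g. --fgcolor "$(hex_to_rgb '#00ff00')".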

wig-to-json.pl

This script inputs a single wiggle track into JBrowse. In a wiggle track, a numeric value is associated with each nucleotide position in the reference sequence. This is represented in JBrowse as a track that looks like a bar graph, where the horizontal axis is for each nucleotide position, and the vertical axis is for the number associated with that position. The vertical axis currently does not have a scale; rather, the heights for each position are relative to each other.

Special Dependencies: libpng

In order to use wig-to-json.pl, the code for wig2png must be compiled. This can be done with the following command:

make

Note: If you are using Mac OS X, it might be necessary to execute make in the following way:

make GCC_LIB_ARGS=-L/usr/X11/lib GCC_INC_ARGS=-I/usr/X11/include

Terminology: In JBrowse jargon, a tile is a png image that is used as an entire track. When wig-to-json.pl is executed, a tile is created for each zoom level, and the set of generated tiles is used to display the track at all possible zoom levels. This is also the case for draw-basepair-track.pl.

Basic syntax:

bin/wig-to-json.pl --wig <wig file> --tracklabel <track name> [options]

Hint: If you are using this type of track to plot a measure of a prediction's quality, where the range of possible quality scores runs from some lower bound to some upper bound (for instance, between 0 and 1), you can specify these bounds with the min and max options.

Summary of wig-to-json.pl options.
Option Value
wig The name of the wig file that will be used. This option must be specified.
tracklabel The internal name that JBrowse will give to this feature track. This option requires a value.
key The external, human-readable label seen on the feature track when it is viewed in JBrowse. The value of key defaults to the value of tracklabel.
out A path to the output directory (default is 'data' in the current directory).
tile The directory where the tiles, or images corresponding to each zoom level of the track, are stored. Defaults to data/tiles.
bgcolor The color of the track background. Specified as "RED,GREEN,BLUE" in base ten numbers between 0 and 255. Defaults to "255,255,255".
fgcolor The color of the track foreground (i.e. the vertical bars of the wiggle track). Specified as "RED,GREEN,BLUE" in base ten numbers between 0 and 255. Defaults to "105,155,111".
width The width in pixels of each tile. The default value is 2000.
height The height in pixels of each tile. Changing this parameter will cause a corresponding change in the top-to-bottom height of the track in JBrowse. The default value is 100.
min The lower bound to use for the track. By default, this is the lowest value in the wiggle file.
max The upper bound to use for the track. By default, this is the highest value in the wiggle file.
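
To decide whether explicit min and max values are needed, it can help to see the range of values that the wiggle file actually contains. The function below is a rough sketch (its name is made up for this example and it is not part of JBrowse). It assumes the standard wiggle layout, in which fixedStep sections list one value per line and variableStep sections list "position value" pairs.

```shell
# Print "min max" over all data values in a wiggle file.
wig_range() {
    awk '
        /^(track|browser|#)/ { next }             # skip header and comment lines
        /^fixedStep/    { col = 1; next }         # data lines: value
        /^variableStep/ { col = 2; next }         # data lines: position value
        col > 0 && NF >= col {
            v = $col + 0
            if (!seen || v < min) min = v
            if (!seen || v > max) max = v
            seen = 1
        }
        END { if (seen) print min, max }
    ' "$1"
}

# Small demonstration on a temporary file.
tmp="$(mktemp)"
printf 'fixedStep chrom=chr1 start=1 step=1\n0.25\n0.9\n0.1\n' > "$tmp"
wig_range "$tmp"    # prints: 0.1 0.9
rm -f "$tmp"

# If the scores are defined on 0..1, the bounds could then be pinned with:
#   bin/wig-to-json.pl --wig <wig file> --tracklabel <track name> --min 0 --max 1
```

Without explicit bounds, the track in this demonstration would be scaled to the range 0.1 to 0.9 rather than to the full 0 to 1 range of the score.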

Naming

generate-names.pl

This script is only important if your feature tracks are annotated with names (e.g. the name of a gene in a track containing genes). If the autocomplete option was used, running this script will make the locations of annotated features searchable via the small search text box (next to the button "Go"). Clicking on Go after entering a search term will take you to the annotation element that you searched for.

Basic syntax:

bin/generate-names.pl

Note that generate-names.pl does not require any arguments. However, some options are available:

Option Value
dir A path to the output directory (default is 'data/names' in the current directory).
thresh ???
verbose ???

Removing Tracks

While JBrowse does not provide a script that removes individual tracks, there are several ways to change or remove a track:

1. Overwrite the unwanted track with a new track. This is useful when a mistake was made in preparing a track, and you are interested in removing the track only so that you can replace it with a correct track that has the same tracklabel (the 'tracklabel' is a track's internal name). This is done by writing the new information with the same value associated with the tracklabel option.

2. Remove the entire data directory. This is useful when you want to completely remove a track or set of tracks, rather than replacing them with different tracks. This is perhaps the fastest way to remove a track, but it has the obvious pitfall that you might also be removing tracks that you wanted to keep. If you don't have very many feature tracks, or if biodb-to-json.pl is being used to generate most of the feature tracks (in which case most of the tracks can be recovered with a single execution of biodb-to-json.pl), this option will be fine.

3. Remove the information about the specific tracks from the data directory. This allows you to remove a track without removing every track, combining the advantages of the previous two methods for removing a set of tracks. The disadvantage is that you must manually remove an entry from a file that is interpreted by JBrowse. The important part to remove will be in trackInfo.js if you want to remove a feature track or refSeqs.js if you want to remove a sequence track.

Additional Information

Using Config Files

In the context of JBrowse, a config file is a set of instructions in JSON syntax that first indicates the location of the feature data, and then specifies a list of JBrowse tracks that can use the referenced feature data. The options for the feature tracks are virtually the same in the config file as they are in flatfile-to-json.pl. The difference is that, instead of inputting the feature tracks one at a time with flatfile-to-json.pl, the tracks are specified all at once in a file. This greatly reduces the amount of typing needed to change a track, especially for tracks that use several options. It also makes it easier to manage a large number of tracks, since the options used for those tracks are all recorded in a human-readable way.


Here is a sample config file with each line explained. Note that, in order for this config file to work, it would be necessary to remove the explanatory comments, since JSON does not support comments.

{
  // This is the header. It contains information about the database.

  // description: a brief textual description of the data source.
  "description": "D. melanogaster (release 5.37)",

  // db_adaptor: a perl module with methods for opening databases and extracting
  // information. This will normally be either Bio::DB::SeqFeature::Store,
  // Bio::DB::Das::Chado, or Bio::DB::GFF.
  "db_adaptor": "Bio::DB::SeqFeature::Store",

  // db_args: arguments required to produce an instance of the db_adaptor. The
  // required arguments can be found by searching for the db_adaptor on the CPAN
  // website.
  "db_args": {
    // adaptor: With Bio::DB::SeqFeature::Store, a value of "memory"
    // for the adaptor indicates that the data is stored somewhere in
    // the file system. Alternatively, it might have been stored in a
    // database such as MySQL or BerkeleyDB.
    "-adaptor": "memory",

    // dir: given the "memory" argument for the adaptor, this is the
    // file system path to the location in memory where the data is
    // stored. Data will automatically be extracted from any *.gff
    // or *.gff3 files in this directory.
    "-dir": "/Users/stephen/Downloads/dmel_r5.37"
  },

  // This is the body. It contains information about the feature tracks.

  // TRACK DEFAULTS: The default options for every track.
  "TRACK DEFAULTS": {
    // class: same as cssClass in flatfile-to-json.pl.
    "class": "feature"
  },

  // tracks: information about each individual track.
  "tracks": [
    // Information about the first track.
    {
      // track: same as 'tracklabel' in flatfile-to-json.pl.
      "track": "gene",

      // key: same meaning as in flatfile-to-json.pl.
      "key": "Gene Span",

      // feature: an array of the feature types that will be used for the track.
      // Similar to 'type' in flatfile-to-json.pl.
      "feature": ["gene"],

      // autocomplete: same meaning as in flatfile-to-json.pl.
      "autocomplete": "all",
      "class": "feature2",

      // category: what is this?
      "category": "Gene Model Features",

      // urlTemplate: same meaning as in flatfile-to-json.pl. Note how
      // urlTemplate is being used with a variable called "feature_id" defined
      // in extraData. In this way, different features in the same track can
      // be linked to different pages on FlyBase.
      "urlTemplate": "http://flybase.org/cgi-bin/fbidq.html?{feature_id}",

      // extraData: same as in flatfile-to-json.pl.
      "extraData": {"feature_id": "sub {shift->attributes(\"load_id\");}"}
    },

    // Information about the second track.
    {
      "track": "mRNA",
      "feature": ["mRNA"],
      "autocomplete": "alias",

      // subfeatures: similar to 'getSubs' in flatfile-to-json.pl.
      "subfeatures": true,
      "key": "mRNA",
      "class": "transcript",

      // subfeature_classes: same as 'subfeatureClasses' in flatfile-to-json.pl.
      "subfeature_classes": {
        "CDS": "transcript-CDS",
        "five_prime_UTR": "transcript-five_prime_UTR",
        "three_prime_UTR": "transcript-three_prime_UTR"
      },

      // arrowheadClass: same meaning as in flatfile-to-json.pl.
      "arrowheadClass": "transcript-arrowhead",

      // clientConfig: same meaning as in flatfile-to-json.pl.
      "clientConfig": { "histScale":5 },
      "urlTemplate": "http://flybase.org/cgi-bin/fbidq.html?{feature_id}",
      "extraData": {"feature_id": "sub {shift->attributes(\"load_id\");}"}
    }
  ]
}


Note how the config file is divided into two parts, a header section that contains information about the database, and a body section that contains information about the feature tracks.
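If you prefer to keep an annotated copy of a config file for your own reference, the comments have to be stripped before the file is handed to a JBrowse script. The sketch below is one possible way to do that. It assumes each comment occupies its own line and begins with "//" (a convention chosen for this illustration, not something JBrowse defines), and the function name load_annotated_config is hypothetical. Parsing the result with a JSON parser also catches syntax mistakes such as a missing comma before the config is used.

```python
import json


def load_annotated_config(text):
    """Drop whole-line '//' comments from text and parse the rest as JSON.

    Only lines whose first non-blank characters are '//' are removed, so
    values that merely contain '//' (such as http:// URLs) are untouched.
    Raises json.JSONDecodeError if the remaining text is not valid JSON.
    """
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


# Example use (file names are placeholders):
#   with open("annotated.conf") as fh:
#       config = load_annotated_config(fh.read())
#   with open("clean.conf", "w") as fh:
#       json.dump(config, fh, indent=2)
```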

Using a Database Backend

Giving JBrowse Access to a Database

JBrowse is capable of extracting sequence and feature information from databases such as PostgreSQL, MySQL, BerkeleyDB, and Oracle. This is done by using prepare-refseqs.pl or biodb-to-json.pl with a config file whose header section contains information about the database.

For a PostgreSQL database with the Chado schema, the config file header would look something like this:

{
  "description": "D. melanogaster (release 5.37)",
  "db_adaptor": "Bio::DB::Das::Chado",
  "db_args": { "-dsn": "dbi:Pg:dbname=fruitfly;host=localhost;port=5432",
               "-user": "yourusername",
               "-pass": "yourpassword"
             },
  ...
}

In the database source name (dsn) argument, 'dbi:Pg' indicates that you are using PostgreSQL, and the dbname, host, and port were specified when the database was created with createdb. The user and pass arguments were specified when the PostgreSQL user account was created with createuser. Collectively, these arguments identify the database and give the Bio::DB::Das::Chado object access to it.

Assuming that you already have access to an existing database with the Chado schema and the feature data you're interested in, this is all you need in order to use JBrowse with the database.
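As an illustration of how the pieces of the header fit together, the sketch below assembles the same header from its individual connection settings and prints it as JSON. It is not a JBrowse utility: the function name chado_header and its parameters are hypothetical, and the only values taken from the example above are the key names, the adaptor name, and the layout of the dsn string.

```python
import json


def chado_header(description, dbname, user, password,
                 host="localhost", port=5432):
    """Build a config header for a PostgreSQL database with the Chado schema."""
    # 'dbi:Pg' selects PostgreSQL; the remaining fields identify the database.
    dsn = f"dbi:Pg:dbname={dbname};host={host};port={port}"
    return {
        "description": description,
        "db_adaptor": "Bio::DB::Das::Chado",
        "db_args": {"-dsn": dsn, "-user": user, "-pass": password},
    }


header = chado_header("D. melanogaster (release 5.37)",
                      "fruitfly", "yourusername", "yourpassword")
print(json.dumps(header, indent=2))
```

The track definitions (the body of the config file) would still need to be added to the resulting structure before it is useful to prepare-refseqs.pl or biodb-to-json.pl.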

Preparing a Database From Scratch

The way to set up the database is as follows:
1. Install the database management system (DBMS).
2. Import the appropriate schema into the database.
3. Import the sequence and feature data into the database.

As an example, consider preparing a PostgreSQL database with the Chado schema. Chado-1.11 can be downloaded here and most of the information you need about Chado installation can be found in INSTALL.Chado. If you choose to install the latest stable version of PostgreSQL (9.0.4 as of June 2011), you might encounter a few quirks:

When you create new users, you might have to explicitly request a password prompt with the --pwprompt option. If you were able to create a user account without specifying a password for that user, no password will work for that account.

When you are running make load_schema, you might get an error message about a failure to drop a database that does not exist. If this error is encountered, open bin/test_load.sh and comment out this line:

dropdb -h $DBHOST -p $DBPORT -U $DBUSER $DBNAME;

The resulting line should look like this:

# dropdb -h $DBHOST -p $DBPORT -U $DBUSER $DBNAME;

When you are running make ontologies, you will be given a list of ontologies that you can install. At the very least be sure to install the Relationship Ontology, Sequence Ontology, Gene Ontology, and Chado Feature Properties.

There are two GMOD scripts that are used to insert data from a FASTA or GFF file into the database:

1. gmod_gff3_preprocessor.pl standardizes the GFF file, sorting the feature data and moving any FASTA sequences to a separate file.

Basic syntax:

gmod_gff3_preprocessor.pl --gfffile <gff file>

2. gmod_bulk_load_gff3.pl uses the output of gmod_gff3_preprocessor.pl to input data into the database.

Fasta syntax:

gmod_bulk_load_gff3.pl --organism <common name> --fastafile <fasta-formatted sequence file>

GFF syntax:

gmod_bulk_load_gff3.pl --organism <common name> --gfffile <processed gff file>

After loading this data into the database, JBrowse should be able to access it using a config file with a header like the one shown under "Giving JBrowse Access to a Database".
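If you load data regularly, the three commands above can be generated from a small helper so that the organism name and file names are typed only once. The sketch below only builds the argument lists, using exactly the options shown above. The function name chado_load_commands is hypothetical, and the names of the files written by gmod_gff3_preprocessor.pl are passed in by the caller because they are not specified here. The order in which the two load steps should run is also not stated in this section, so the sketch leaves running the commands to the caller.

```python
def chado_load_commands(organism, raw_gff, processed_gff, fasta_file):
    """Return the argument lists for the GMOD preprocessing and load steps."""
    loader = "gmod_bulk_load_gff3.pl"
    return {
        "preprocess": ["gmod_gff3_preprocessor.pl", "--gfffile", raw_gff],
        "load_fasta": [loader, "--organism", organism,
                       "--fastafile", fasta_file],
        "load_gff": [loader, "--organism", organism,
                     "--gfffile", processed_gff],
    }


# Example use (file names are placeholders):
#   import subprocess
#   cmds = chado_load_commands("fruitfly", "dmel.gff",
#                              "dmel.processed.gff", "dmel.fasta")
#   subprocess.run(cmds["preprocess"], check=True)
```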

See also

External Links