JBrowse2 Tutorial PAG 2022


Revision as of 20:31, 28 November 2022

Prerequisites

  • NodeJS

Installed using the instructions on Nodejs.org:

 curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash - && sudo apt-get install -y nodejs

  • A web server (Apache2 in this instance, but any will do). I enabled the "userdir" mod so we could all use the same machine for the tutorial:

 sudo a2enmod userdir
 sudo /etc/init.d/apache2 restart

Things done just for this tutorial

  • A script to create several users with public_html directories (link for when it exists)
  • Already installed the JBrowse command line interface (CLI) via the directions (i.e., sudo npm install -g @jbrowse/cli)
  • Installed bgzip, tabix and samtools via apt: sudo apt-get install samtools tabix.
  • Created a bgzipped and samtools faidx'ed FASTA file for C. elegans
  • Created a "Genes only" C. elegans GFF file: gzip -dc c_elegans.PRJNA13758.WS286.annotations.gff3.gz | grep -P "\tWormBase\t" > c_elegans.genes.gff3 (the -P flag makes GNU grep treat \t as a tab character)
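
The "Genes only" filter keeps only the lines whose source column (column 2 of a GFF3 file) is WormBase. A minimal sketch of the same idea using awk on a tiny made-up fragment; the file name sample.gff3 and its feature lines are hypothetical and only there for illustration:

```shell
# Build a tiny, made-up GFF3 fragment (hypothetical data, not from WormBase).
workdir=$(mktemp -d)
printf 'I\tWormBase\tgene\t100\t200\t.\t+\t.\tID=gene1\n'    > "$workdir/sample.gff3"
printf 'I\tRNASeq\tintron\t120\t150\t.\t+\t.\tID=intron1\n' >> "$workdir/sample.gff3"
printf 'II\tWormBase\tgene\t50\t90\t.\t-\t.\tID=gene2\n'    >> "$workdir/sample.gff3"

# Keep only rows whose second (source) column is exactly "WormBase".
awk -F '\t' '$2 == "WormBase"' "$workdir/sample.gff3" > "$workdir/genes.gff3"

# Count the surviving lines.
kept=$(wc -l < "$workdir/genes.gff3" | tr -d ' ')
echo "kept $kept of 3 lines"
```

Matching on the column rather than on a text pattern avoids accidentally keeping lines that merely mention WormBase in their attributes.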

Initializing JBrowse

First, use ssh to connect to the instance we have set up for this tutorial, tutorial.jbrowse.org. Do this with the user name and password you got from one of us (we have 50 users configured--hopefully that will be enough!):

 ssh username@tutorial.jbrowse.org

and supply the password. When you log in, you'll be in your user's home directory, where there is nothing but a public_html directory. That directory is also currently empty, so we'll use the JBrowse CLI to initialize a new JBrowse instance:

 jbrowse create public_html

Now change to that directory (cd public_html) and list the files to make sure it looks right:

put a picture here File:Public html listing.png

This is all of the software required to run JBrowse. If you now navigate to the tutorial machine's website, using the port supplied on the username/password slip, you should see a page indicating that JBrowse was installed but not configured: http://tutorial.jbrowse.org:XXXX/.

put a picture here New jbrowse page.png

To make sure it really works, we can click on the Volvox (not really Volvox) data set.

To get started creating our JBrowse instance, we'll run the JBrowse admin-server, which looks just like JBrowse proper, but has an extra admin menu. Important note: The admin server is NOT meant to be left running; it is not particularly secure, so if you leave it up, somebody might start messing with your site. To start the admin server, we change to the directory where JBrowse will be served from (public_html) and run the jbrowse command to start it:

 jbrowse admin-server -p YYYY

When we execute that command, we get a message in the terminal that it started up and gives us some URLs to use to access the server. It will look something like this:

Admin-server.png

The part we need is the adminKey. In a browser window, enter a URL that looks like this: http://tutorial.jbrowse.org:YYYY?adminKey=yourkey

Adding a reference sequence

The first thing we need to do is add a reference sequence. There is already one prepared and on the web server for C. elegans and it is at

 http://tutorial.jbrowse.org/c_elegans.PRJNA13758.WS286.genomic.fa.gz
 http://tutorial.jbrowse.org/c_elegans.PRJNA13758.WS286.genomic.fa.gz.fai
 http://tutorial.jbrowse.org/c_elegans.PRJNA13758.WS286.genomic.fa.gz.gzi

To create this indexed reference sequence, the FASTA file was downloaded from the WormBase FTP site, uncompressed, bgzipped, and then indexed with samtools:

 bgzip c_elegans.PRJNA13758.WS286.genomic.fa
 samtools faidx c_elegans.PRJNA13758.WS286.genomic.fa.gz
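
For orientation, the .fai index written by samtools faidx is a plain tab-separated text file with one line per sequence: name, length, byte offset of the first base, bases per line, and bytes per line. A small sketch that lists names and lengths from such an index; the file example.fa.fai and its values are made up for illustration and are not the C. elegans index:

```shell
workdir=$(mktemp -d)
# Columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH (values are hypothetical).
printf 'chrA\t1000\t6\t60\t61\n'     > "$workdir/example.fa.fai"
printf 'chrB\t2500\t1029\t60\t61\n' >> "$workdir/example.fa.fai"

# Print each sequence name with its length, then the total length.
summary=$(awk -F '\t' '
  { print $1 ": " $2 " bp"; total += $2 }
  END { print "total: " total " bp" }
' "$workdir/example.fa.fai")
echo "$summary"
```

Looking at the real .fai this way is a quick check that the index lists the chromosomes you expect before pointing JBrowse at it.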

To add this as a reference sequence to JBrowse, click "Start a new session" and then, on the resulting page, select "Open assembly manager" from the Admin menu. In the dialog that opens, click the "Add new assembly" button. Finally, in the add assembly dialog, put something useful in the "Assembly Name" field and then select "BgzipFastaAdapter" from the "Type" menu. At that point, the dialog will change slightly to give you places to put in the three URLs above:

Add assembly dialog.png

Copy and paste those URLs into the appropriate fields and then click "Save new assembly."

  Note: this is one place where the web version of JBrowse with the admin server is slightly 
  different from the Desktop version: if we were using the desktop version, the above dialog
  would have also given the option for finding the files on a local hard drive rather than 
  only allowing URLs.

  Another note: In order for the above URLs to work with a web instance of JBrowse that 
  isn't on the "same" server (where different ports == a different server), CORS (cross 
  origin resource sharing) had to be enabled for the web server (in this case apache). 
  If you want to do the same thing for a server you control, google "enable CORS <your 
  server software name>" to find directions.

Adding a gene track from tabix-indexed GFF

Magic incantation for sorting a GFF3 file and then bgzipping it:

 sort -t"`printf '\t'`" -k1,1 -k4,4n c_elegans.genes.gff3 |bgzip > c_elegans.genes.sorted.gff3.gz

and then tabix indexing it:

 tabix c_elegans.genes.sorted.gff3.gz

The resulting files are on the web server at:

 http://tutorial.jbrowse.org/c_elegans.genes.sorted.gff3.gz
 http://tutorial.jbrowse.org/c_elegans.genes.sorted.gff3.gz.tbi
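
To see what the sort step does before bgzip and tabix get involved, here is a minimal sketch that applies the same sort keys to three made-up features. The file names and feature lines are hypothetical; the start coordinates are chosen so that a plain text sort and a numeric sort would give different orders:

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')
# Three made-up features, deliberately out of order (hypothetical data).
printf 'II\tWormBase\tgene\t500\t800\t.\t+\t.\tID=g3\n'   > "$workdir/unsorted.gff3"
printf 'I\tWormBase\tgene\t2000\t2500\t.\t-\t.\tID=g2\n' >> "$workdir/unsorted.gff3"
printf 'I\tWormBase\tgene\t900\t1200\t.\t+\t.\tID=g1\n'  >> "$workdir/unsorted.gff3"

# Sort by sequence name (column 1), then numerically by start (column 4).
sort -t "$tab" -k1,1 -k4,4n "$workdir/unsorted.gff3" > "$workdir/sorted.gff3"

# Show the resulting order of the attribute column (column 9).
order=$(cut -f9 "$workdir/sorted.gff3" | tr '\n' ' ')
echo "$order"
```

The n modifier on the second key matters: without it, a start of 2000 would sort before 900 because the comparison would be character by character.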

Add track dialog.png


Genes track.png

Adding a search index

Adding a gene track from a JBrowse (NCList) track

Protein coding genes from WormBase's JBrowse 1 instance

 https://s3.amazonaws.com/agrjbrowse/MOD-jbrowses/WormBase/WS286/c_elegans_PRJNA13758/tracks/Curated Genes (protein coding)/{refseq}/trackData.jsonz

Protein coding genes.png

Adding a JBrowse 1 name index

Adding variant data from a tabix-indexed VCF

 https://storage.googleapis.com/elegansvariation.org/releases/current/WI.current.soft-filtered.vcf.gz

Cendr vcf track.png

Adding quantitative data from a BigWig

 https://data.broadinstitute.org/compbio1/PhyloCSFtracks/ce11/latest/PhyloCSF+1.bw

Using JEXL to modify the display

Dynamically changing the color

Dynamically changing the mouseover text

Synteny

Getting the data

Easiest for a workshop: minimap2

Configuring with jbrowse admin

Using dotplot and synteny views

Below is the most recent JBrowse 1 tutorial as a guide



This JBrowse tutorial was presented by Scott Cain at the Plant and Animal Genomes meeting using JBrowse 1.16.6.

This tutorial assumes a VirtualBox Ubuntu 18.04 (LTS) instance with the tutorial bundle zip file, also available on Amazon S3: JBrowse PAG 2020.ova (about 4GB) or PAG_2020_JBrowse.zip (about 36MB) for just the JBrowse source and data files for this tutorial.

Prerequisites

Prerequisites are installed by JBrowse automatically. A few things may fail to install (like legacy support for wiggle files), but that doesn't matter.

Make sure you can copy/paste from the wiki.

It's also very useful to know how to tab-complete in the shell.

It's probably a good idea to use a browser like Chrome or Firefox that has the ability to turn off caching while working on this tutorial. To do this in Chrome, with the browser open to the JBrowse page you're working on, select Developer->Javascript Console from the View menu. In the console, click the "gear" icon (settings) and check the box labeled "Disable Cache".

A few basic packages were installed before JBrowse via apt-get:

 sudo apt-get install build-essential zlib1g-dev apache2 curl

Also, a few items were installed that aren't needed for this tutorial but would be necessary if you wanted to add plugins (git and NodeJS):

 curl -sL https://deb.nodesource.com/setup_13.x | sudo -E bash -
 sudo apt-get install -y nodejs
 sudo apt-get install git

JBrowse Introduction

How and why JBrowse is different from most other web-based genome browsers, including GBrowse.

More detail: paper

JBrowse presentation

Setting up JBrowse

Getting JBrowse

  • download the demo bundle from Amazon and unzip it. Don't do the things in yellow; they were done ahead in the interests of time.

 cd html
 ##curl -O https://s3.amazonaws.com/jbrowse-tutorials/PAG_2020_JBrowse.zip #that's a capital "O", not a zero/zed.
 cp ~/PAG_2020_JBrowse.zip . ## if we don't need to update the zip file
 unzip PAG_2020_JBrowse.zip
 cd PAG_2020_JBrowse
 unzip jbrowse-1.16.6-release.zip
 mv jbrowse-1.16.6-release ../jbrowse

  • run setup.sh to configure this copy of JBrowse

 cd ../jbrowse
 ./setup.sh

Typically, this setup step doesn't take very long, but on these virtual machines on an already slow laptop, it can take a while.

Starting Point

Visit in a web browser (i.e., Firefox inside the virtual machine):

http://localhost/jbrowse/index.html

You should see a "Congratulations" page.

Basic Steps

There are four basic steps to setting up an instance of JBrowse:

  1. Load and format reference sequences
  2. Format data for tracks
  3. Configure direct-access tracks
  4. Index feature names

A Short Detour for GFF

GFF (Generic Feature Format) is a very commonly used text format for describing features that exist on sequences. We'll head off to that page to talk about it a bit.

Features from a directory of files

Here, we'll use the Bio::DB::SeqFeature::Store adaptor in "memory" mode to read a directory of files. There are adaptors available for use with many other databases, such as Chado and Bio::DB::GFF.

Config file: pythium-1.conf

{
  "description": "PAG 2017 P. ultima Example",
  "db_adaptor": "Bio::DB::SeqFeature::Store",
  "db_args" : {
      "-adaptor" : "memory",
      "-dir" : ".."
   },
...

Specify reference sequences

The first script to run is bin/prepare-refseqs.pl; that script is the way you tell JBrowse about what your reference sequences are. Running bin/prepare-refseqs.pl also sets up the "DNA" track.

Run this from within the jbrowse directory (you could run it elsewhere, but you'd have to explicitly specify the location of the data directory on the command line).

cd ~/html/jbrowse
bin/prepare-refseqs.pl --gff ../PAG_2020_JBrowse/scf1117875582023.gff

Refresh it in your web browser; you should now see the JBrowse UI and a sequence track, which will show you the DNA base pairs if you zoom in far enough.

Load Feature Data

Next, we'll use biodb-to-json.pl to get feature data out of the database and turn it into JSON data that the web browser can use.

In this case, we have specified all of our track configurations in pythium-1.conf.

...
 
  "TRACK DEFAULTS": {
    "class": "feature"
  },
 
 "tracks": [
    {
      "track": "Genes",
      "key": "Genes",
      "feature": ["mRNA"],
      "autocomplete": "all",
      "class": "transcript",
      "subfeature_classes" : {
            "CDS" : "transcript-CDS",
            "UTR" : "transcript-UTR"
      }
    },
   ...
]

track specifies the track identifier (a unique name for the track, for the software to use). This should be just letters and numbers and - and _ characters; using other characters makes things less convenient.

key specifies a human-friendly name for the track, which can use any characters you want.

feature gives a list of feature types to include in the track.

autocomplete makes the features in the track searchable when this setting is included.

urltemplate specifies a URL pattern that you can use to link genomic features to specific web pages.

class specifies the CSS class that describes how the feature should look.

For this particular track, I've specified the transcript feature class.

Run the bin/biodb-to-json.pl script with this config file to format this track, and the others in the file:

bin/biodb-to-json.pl --conf ../PAG_2020_JBrowse/pythium-1.conf

Refresh JBrowse in your web browser. You should now see a bunch of annotation tracks.

Index feature names

When you generate JSON for a track, if you specify "autocomplete" then a listing of all of the feature names from that track (along with feature locations) will also be generated and used to provide feature searching and autocompletion.

The bin/generate-names.pl script collects those lists of names from all the tracks and combines them into one big tree that the client uses to search.

bin/generate-names.pl -v

Visit JBrowse in your web browser and try typing a feature name, such as maker-scf1117875582023-snap-gene-0.26-mRNA-1. Notice that JBrowse tries to auto-complete what you type.

Features from GFF3 or BED files

We're going to add a couple more tracks that come from a flat file, repeats.gff. To get feature data from flat files into JBrowse, we use flatfile-to-json.pl.

  • We'll add a RepeatMasker track:
bin/flatfile-to-json.pl --trackLabel "repeat masker" \
    --trackType CanvasFeatures \
    --type match:repeatmasker --key RepeatMasker \
    --className generic_parent \
    --subfeatureClasses '{"match_part" : "feature"}' --gff ../PAG_2020_JBrowse/repeats.gff
  • And then a RepeatRunner track:
bin/flatfile-to-json.pl --trackLabel "repeat runner" \
    --trackType CanvasFeatures \
    --type protein_match:repeatrunner \
    --key RepeatRunner --className generic_parent \
    --subfeatureClasses '{"match_part" : "feature"}' --gff ../PAG_2020_JBrowse/repeats.gff

Visit in web browser; you should see the two new RepeatMasker and RepeatRunner tracks.

BAM alignments

JBrowse can display alignments directly from a BAM file on your web server. Simply place the BAM file in a directory accessible to your web server, and add a snippet of configuration to JBrowse to add the track, similar to:

     {
        "label" : "bam_alignments",
        "key" : "BAM alignments",
        "storeClass" : "JBrowse/Store/SeqFeature/BAM",
        "urlTemplate" : "../../PAG_2020_JBrowse/simulated-sorted.bam",
        "type" : "Alignments2"
      }

This can be added by either editing the data/trackList.json file with a text editor, or by running something like this at the command line to inject the track configuration:

echo '{
       "label" : "bam_alignments",
       "key" : "BAM alignments",
       "storeClass" : "JBrowse/Store/SeqFeature/BAM",
       "urlTemplate" : "../../PAG_2020_JBrowse/simulated-sorted.bam",
       "type" : "Alignments2"
     }' | bin/add-track-json.pl data/trackList.json

BAM coverage

This time we'll use a text editor and will edit the track configuration file directly. Type

  gedit data/trackList.json

and insert the text below in the "tracks" array (the easiest thing to do is find the "[" after "tracks", paste there and then add a comma after the "}").

     {
        "label" : "bam_coverage",
        "key" : "BAM Coverage",
        "storeClass" : "JBrowse/Store/SeqFeature/BAM",
        "urlTemplate" : "../../PAG_2020_JBrowse/simulated-sorted.bam",
        "type" : "SNPCoverage"
      }

and then press the Save button.

Quantitative data

BigWig

JBrowse can display quantitative data directly from a BigWig file on your web server. Simply place the BigWig file in a directory accessible to your web server, and add a snippet of configuration to JBrowse to add the track, similar to:

     {
        "label" : "bigwig_bam_coverage",
        "key" : "BigWig - BAM coverage",
        "storeClass" : "BigWig",
        "urlTemplate" : "../../PAG_2020_JBrowse/simulated-sorted.bam.coverage.bw",
        "type" : "JBrowse/View/Track/Wiggle/XYPlot",
        "variance_band" : true
      }

This can be added by either editing the data/trackList.json file with a text editor, or by running something like this at the command line to inject the track configuration:

echo ' {
       "label" : "bigwig_bam_coverage",
       "key" : "BigWig - BAM coverage",
       "storeClass" : "BigWig",
       "urlTemplate" : "../../PAG_2020_JBrowse/simulated-sorted.bam.coverage.bw",
       "type" : "JBrowse/View/Track/Wiggle/XYPlot",
       "variance_band" : true
     } ' | bin/add-track-json.pl data/trackList.json

Variation Data

VCF tracks

JBrowse can also display VCF variation data directly from a VCF file on your web server that has been compressed with Heng Li's bgzip and tabix. Simply place the .vcf.gz and .vcf.gz.tbi files in a directory accessible to your web server, and add a snippet of configuration to JBrowse to add the track, similar to:

      {
        "label" : "bam_variation",
        "key" : "VCF simulated variation",
        "storeClass" : "JBrowse/Store/SeqFeature/VCFTabix",
        "urlTemplate" : "../../PAG_2020_JBrowse/simulated-sorted.vcf.gz",
        "type" : "HTMLVariants"
      }

This can be added by either editing the data/trackList.json file with a text editor, or by running something like this at the command line to inject the track configuration:

echo ' {
       "label" : "bam_variation",
       "key" : "VCF simulated variation",
       "storeClass" : "JBrowse/Store/SeqFeature/VCFTabix",
       "urlTemplate" : "../../PAG_2020_JBrowse/simulated-sorted.vcf.gz",
       "type" : "HTMLVariants"
     } ' | bin/add-track-json.pl data/trackList.json

Paired read CRAM data

We can use a text editor again to edit the track configuration file directly. So, if you don't already have gedit open, once again, type

  gedit data/trackList.json

and insert the text below in the "tracks" array (the easiest thing to do is find the "[" after "tracks", paste there and then add a comma after the "}").

     {
        "label" : "paired_cram",
        "key" : "Paired CRAM",
        "glyph": "JBrowse/View/FeatureGlyph/PairedAlignment",
        "storeClass" : "JBrowse/Store/SeqFeature/CRAM",
        "urlTemplate" : "../../PAG_2020_JBrowse/simulated.cram",
        "type" : "Alignments2"
      }

and then press the Save button.

Paired Read Glyph Options

Now is also a good time to look at the plethora of options available for working with the paired read glyph. To access them, hover the mouse over the track label (where it says "Paired CRAM") and click the down triangle to display an options menu. Feel free to play, but because this track's data are simulated, they are kind of boring. Be sure to check out the track visualization types.

Faceted Track Selection

JBrowse has a very powerful faceted track selector that can be used to search for tracks using metadata associated with them.

The track metadata is kept in a CSV-format file, with any number of columns, and with a "label" column whose contents must correspond to the track labels in the JBrowse configuration.

The demo bundle contains an example trackMetadata.csv file, which can be copied into the data directory for use with this configuration.

cp ../PAG_2020_JBrowse/trackMetadata.csv data/
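
As a rough picture of what such a metadata file might contain, here is a minimal sketch that writes a small CSV and checks that it has the required "label" column. The label values reuse track labels from earlier in this tutorial; the "category" and "source" columns and their values are hypothetical and are not taken from the real trackMetadata.csv in the bundle:

```shell
workdir=$(mktemp -d)
# A made-up metadata file; only the "label" column is required.
# The other columns and all values are hypothetical.
cat > "$workdir/trackMetadata.csv" <<'EOF'
label,category,source
bam_alignments,Alignments,simulated
bigwig_bam_coverage,Quantitative,simulated
bam_variation,Variation,simulated
EOF

# Check that the header row contains a column named exactly "label".
if head -n 1 "$workdir/trackMetadata.csv" | tr ',' '\n' | grep -qx 'label'; then
  has_label=yes
else
  has_label=no
fi

# List the track labels the file describes (first column, header skipped).
labels=$(tail -n +2 "$workdir/trackMetadata.csv" | cut -d, -f1 | tr '\n' ' ')
echo "label column present: $has_label; labels: $labels"
```

Each label listed this way should match a "label" in data/trackList.json; otherwise the metadata row has no track to attach to.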

Then a simple faceted track selection configuration might look like:

   "trackSelector": {
       "type": "Faceted"
   },
   "trackMetadata": {
       "sources": [
          { "type": "csv", "url": "data/trackMetadata.csv" }
       ]
   }

Copy the section above and put it in the empty curly braces in the jbrowse_conf.json file in the jbrowse directory, save it, refresh your browser, and you should now see the faceted track selector activated.

Changing the way tracks look

Tracks can be modified by changing several aspects of how the images are created. While this can be done with both HTML and Canvas tracks, this tutorial will focus on Canvas tracks only (the repeat tracks created above).

The configuration for the RepeatMasker track looks like this:

      {
         "key" : "RepeatMasker",
         "trackType" : "CanvasFeatures",
         "storeClass" : "JBrowse/Store/SeqFeature/NCList",
         "urlTemplate" : "tracks/repeat masker/{refseq}/trackData.json",
         "style" : {
            "subfeatureClasses" : {
               "match_part" : "feature"
            },
            "className" : "generic_parent"
         },
         "type" : "CanvasFeatures",
         "compress" : 0,
         "label" : "repeat masker"
      }

Open the data/trackList.json file in your favorite editor and use Control-F to open a "find" window; search for "repeatmasker". A simple change we can make is to the color; inside the "style" section, add:

  "color" : "black",

Save the changes and select the RepeatMasker track or reload the browser to see the change. Many attributes of the display can be modified in this way; see the JBrowse Configuration Guide for a list of options.

Making changes based on the data

Much like GBrowse's perl callbacks that can change the track display, in the JBrowse configuration file you can include JavaScript functions to change the way tracks look. For example, in the RepeatMasker track, we can change the color of the glyph depending on what kind of repeat it is (where we happen to know that the type of repeat is encoded in the name). In this example, we leave the glyph black, unless it is a low complexity repeat, where we'll color it red. A function to do that would look like this:

  "color" : "function(feature) { var name = feature.get('Name'); if (name.match('Low_complexity') ) { return 'red'; } return 'black';  }",

When editing the trackList.json file directly in this way, the function has to go all on one line, but if we create an "include file" (not covered here) the function can have carriage returns in it. Replace the simple "color : black" section we just created in the configuration file with the function above, save the file and reload the browser page to see the changes (you might have to mouse around to find a low complexity repeat).

Making links open something else

The default action when you click on a glyph is to open a "floating" window that displays everything JBrowse knows about a feature. If you'd like something else to happen, you have several options (outlined here), including having a different floating window open or executing any JavaScript function you define. For this example, we'll create a link that will Google the repeat's name and open the result in a new window. In the RepeatMasker section of the JBrowse configuration, we'll add a section that looks like this after the style section:

        "onClick" : {
           "iconClass" : "dijitIconDatabase",
           "action" : "newWindow",
           "url" : "http://www.google.com/search?q={name}",
           "label" : "Search for {name} at Google",
           "title" : "function(track,feature,div) { return 'Searching for '+feature.get('name')+' at Google'; }"
        },

If you're having difficulties, the RepeatMasker section of the configuration file should now look something like this:

      {
         "key" : "RepeatMasker",
         "trackType" : "CanvasFeatures",
         "storeClass" : "JBrowse/Store/SeqFeature/NCList",
         "style" : {
            "color" : "function(feature) { var name = feature.get('Name'); if (name.match('Low_complexity') ) { return 'red'; } return 'black';  }",
            "subfeatureClasses" : {
               "match_part" : "feature"
            },
            "className" : "generic_parent"
         },
         "onClick" : {
           "iconClass" : "dijitIconDatabase",
           "action" : "newWindow",
           "url" : "http://www.google.com/search?q={name}",
           "label" : "Search for {name} at Google",
           "title" : "function(track,feature,div) { return 'Searching for '+feature.get('name')+' at Google'; }"
        },
         "label" : "repeat masker",
         "urlTemplate" : "tracks/repeat masker/{refseq}/trackData.json",
         "compress" : 0,
         "type" : "CanvasFeatures"
      },

Using Plugins

JBrowse is built with a very flexible and powerful plugin system. The JBrowse developers are working on a plugin registry website, but for the time being, you can look at the source code for what will drive the website in the jbrowse-registry github repo, and in particular, the file that contains the info about the available plugins, plugins.yaml.

For this tutorial, we'll look at a plugin that is shipped with JBrowse but isn't turned on by default. JBrowse plugins are typically stored in the plugins directory, and in 1.12.1's plugin directory there are 6 plugins:

 CategoryUrl
 DebugEvents
 HideTrackLabels
 NeatCanvasFeatures
 NeatHTMLFeatures
 RegexSequenceSearch

and the RegexSequenceSearch plugin is already activated (look under the "Track" menu for it). We will turn on the HideTrackLabels plugin. Open jbrowse.conf:

 gedit jbrowse.conf

and find (Control-F) "plugins", which will look like this:

## uncomment and edit the example below to enable one or more
## JBrowse plugins
# [ plugins.MyPlugin ]
# location = plugins/MyPlugin
# [ plugins.AnotherPlugin ]
# location = ../plugin/dir/someplace/else

Below this, add a few lines to activate the HideTrackLabels plugin:

[ plugins.HideTrackLabels ]
location = plugins/HideTrackLabels

Note that not all plugins are activated this way: typically, if a plugin modifies the way JBrowse *works*, it will go here. If the plugin modifies the way tracks look, it will go in the trackList.json file.

JBrowse Features

Highlighting interesting things

To highlight a region, you can either right-click on a feature and select 'highlight this', or you can set the highlight explicitly to a certain genomic region by clicking "View -> Set highlight" in the menu bar.

Beginning in JBrowse 1.10.0 you can also highlight a region with the mouse by clicking the highlighter tool (next to the Go button) and clicking and dragging to highlight a region.

Opening local files

JBrowse can display GFF3, BAM, BigWig, and VCF+Tabix files directly from your local machine without the need to transfer any data to the server. Just use the "File -> Open" tool from the menu bar to add tracks using local files.

Combination tracks

Starting in version 1.10.0, users can define tracks that are combinations of the data in other tracks. The operations used to combine these tracks can be set operations (union, intersection, subtraction), arithmetic operations for quantitative tracks (addition, subtraction, multiplication, division), and/or masking operations to just highlight or mask some regions based on data in another track.

To add a combination track, select "File->Add combination track" from the menu bar, and drag existing tracks into the new combination track to start combining them.

Upgrading an Existing JBrowse

If the old JBrowse is 1.3.0 or later, simply move the data directory from the old JBrowse directory into the new JBrowse directory after running the setup.sh script.

Common Problems

  • JSON syntax errors in configuration files (2.x series will stop this madness!)
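
One way to catch such errors before reloading the browser is to run the configuration file through a JSON parser. A minimal sketch, assuming python3 is installed (its standard library includes the json.tool module); the helper name check_json and the two sample files are made up for illustration. This checks strict JSON, which may be pickier than what JBrowse itself accepts:

```shell
# check_json FILE: exit status 0 if FILE parses as strict JSON, non-zero otherwise.
# Assumes python3 is installed; check_json is a hypothetical helper name.
check_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

workdir=$(mktemp -d)
# One well-formed file and one with a trailing comma (a common mistake).
printf '{ "tracks": [ { "label": "ok" } ] }\n'   > "$workdir/good.json"
printf '{ "tracks": [ { "label": "bad" }, ] }\n' > "$workdir/bad.json"

for f in "$workdir/good.json" "$workdir/bad.json"; do
  if check_json "$f"; then
    echo "valid:   $f"
  else
    echo "INVALID: $f"
  fi
done
```

In the tutorial setting, running check_json data/trackList.json after each edit would flag a missing or extra comma right away.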

Other links