Difference between revisions of "JBrowse FAQ"

From GMOD

Revision as of 01:09, 10 March 2016

Some frequently asked questions, and some hypothetical ones for fun


How do I get started with installing JBrowse?

To set up JBrowse on your server, all you need to do is drop the files into your webapps folder on a web server and run the setup script. You can do this by downloading a .zip file release from http://jbrowse.org (e.g. JBrowse-1.12.1.zip) and unzipping it into your webapps dir. If you ended up having to use sudo to put it there, change the ownership of the folder (i.e. "chown" it to yourself). Then run setup.sh.

Running setup.sh may produce errors, but if it at least gets the Perl prerequisites installed, you can use scripts like flatfile-to-json.pl and prepare-refseqs.pl to get started.


Note: also see the JBrowse Quick-start Tutorial http://jbrowse.org/code/JBrowse-1.12.1/docs/tutorial/ and the JBrowse desktop starting guide (the JBrowse desktop does not require a webserver)

I see a message that says "Congratulations, JBrowse is on the web" but I don't see my genome

The only reason you see this message is because jbrowse didn't find a folder named "data". Normally, when you load your genome, it creates the "data" folder and makes this message go away.

Note: if you have run setup.sh, this page will also show you a link to the "Volvox test data". If the volvox test data link is there, you can continue to load your genome, otherwise, try running setup.sh again. If you can't figure out the setup.sh, then contact the jbrowse team for details. http://jbrowse.org/contact/

How do I load my genome after I get JBrowse installed to a web folder?

You can download a FASTA file for your genome, and run

bin/prepare-refseqs.pl --fasta yourfile.fasta

This will set up a "data" subfolder inside your jbrowse directory with your genome prepared for viewing

Then just open up "http://yourserver.com/jbrowse" and it should be visible with the reference sequence as a track


What does generate-names.pl do?

Generate-names.pl will create a "search index" on, by default, the "names, IDs, and Alias" fields for tracks loaded with flatfile-to-json.pl or biodb-to-json.pl. It will not try to index ids from BAM files or bigwigs, that would be silly, but it does also index names from VCF files.

You can select specific tracks that you want to index with --tracks arguments to generate-names.pl. You can disable "autocomplete" by setting --completionLimit 0 on generate-names.pl. You can "update" your search index by using --incremental

Bonus feature: you can also index additional fields of a GFF file if, when loading with flatfile-to-json.pl, you specify --nameAttributes to contain the additional fields.

Why do I get a popup saying "Error reading from name store"?

The name store issue in JBrowse is finicky. You can try the following things to fix the error

  1. Refresh your browser (especially in Apollo, where session can expire)
  2. Re-run generate-names.pl
  3. Re-run generate-names.pl --hashBits 16 (manually specifying the hashBits can fix error sometimes)
  4. Make sure that your Name and ID fields don't contain full text descriptions (they should be symbols or identifiers). Use something like jbrowse_elasticsearch for full text descriptions. You can also use jbrowse_elasticsearch as a drop-in replacement, but it is still experimental.

What is the "label" in trackList.json and what is the key?

The track "label" is more like the track "identifier", it should be unique! The key is actually more like the name that is displayed for the track. It might sound counter intuitive to have label and key this way. Key is not a required attribute, but label is. The label can be specified by --trackLabel on command line tools. The key can be specified by --key.


How do I search for a feature in JBrowse

Some people don't know this, but the box that shows your current location, e.g. "chr1:0...1000 (1.0 Kb)" is also a search box! You can search for things that generate-names.pl indexed here.

Also, the search index can be used to "link" to features, for example, if you construct a link such as http://localhost/jbrowse/?loc=GENE1234

Then the search index will resolve the location of that gene and jump to it automatically.


How do I get full text descriptions to be searched?

Try out jbrowse_elasticsearch, it is still experimental but it allows this. Or, implement your own JBrowse REST names API. The default generate-names.pl is not built for searching full text descriptions.


How to get default tracks to display every time a user opens the browser?

There are several config variables for this, which you can define in any of your config files as a comma separated list of track LABELS

  • alwaysOnTracks: these tracks always come up
  • forceTracks: overridden by the URL bar
  • defaultTracks: overridden by the URL bar and cookies

What is this error message loading GFF3?

If you get an error similar to this:

_Argument "-" isn't numeric in addition (+) at /Library/WebServer/Documents/scbrowse/JBrowse-1.12.0/bin/../src/perl5/Bio/JBrowse/FeatureStream/GFF3_LowLevel.pm line 32, <$f> line 44611._

Make sure your GFF3 is tab delimited
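
To find the lines that are not tab delimited, you can count the columns on each feature line; GFF3 feature lines have nine tab-separated columns. The following is only a minimal sketch: find_bad_gff3_lines is a hypothetical helper name, not part of JBrowse, and it simply skips comments, blank lines, and anything after a ##FASTA directive.

```python
def find_bad_gff3_lines(lines):
    """Return (line_number, column_count) for feature lines without 9 tab-separated columns."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not text.strip() or text.startswith("#"):
            continue  # blank lines, comments and directives are not feature lines
        columns = text.split("\t")
        if len(columns) != 9:
            bad.append((number, len(columns)))
    return bad


example = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t100\t200\t.\t+\t.\tID=gene1\n",
    "chr1 src mRNA 100 200 . + . ID=mrna1;Parent=gene1\n",  # spaces instead of tabs
]
print(find_bad_gff3_lines(example))
```

For a real file, pass an open file handle instead of the example list, e.g. find_bad_gff3_lines(open("mygff.gff")).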

How do I set up multiple genomes in a single jbrowse instance?

By default, the scripts will output to a subdirectory called "data" in the jbrowse folder

You can control that output with most scripts using the --out parameter. This enables you to have "multiple data directories". Once the data directories are ready (inside the jbrowse folder), then use the URL bar to select which data directory to use with ?data=my_data_dir e.g.

http://mysite.org/jbrowse/?data=data1
http://mysite.org/jbrowse/?data=data2

What is the dataset selector

The dataset selector is a dropdown that can list all the genomes that are in your jbrowse instance

To configure the dataset selector, set a dataset_id inside your trackList.json or tracks.conf on your data directory, and then in jbrowse.conf, add a list of all your datasets with the dataset_ids that you listed in the genome's data directory.

See http://gmod.org/wiki/JBrowse_Configuration_Guide#Dataset_Selector

I set the style->height but it doesn't work

Don't add quotes to the numbers in your JSON trackList.json! Numbers can remain unquoted.

Note: I only chose style->height as an example here to mention that booleans and numbers in your JSON config files should be unquoted.
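
To see why the quotes matter, here is a small sketch using Python's json module. JBrowse itself parses the file in the browser, so Python is used here only for illustration, but any JSON parser keeps "10" (a string) distinct from 10 (a number). The find_quoted_numbers helper is hypothetical; treat its results as hints, since a value such as a track label may legitimately be a numeric string.

```python
import json

quoted = json.loads('{"style": {"height": "10"}}')
unquoted = json.loads('{"style": {"height": 10}}')

# The quoted value arrives as a string, the unquoted one as a number.
print(type(quoted["style"]["height"]).__name__)    # str
print(type(unquoted["style"]["height"]).__name__)  # int


def find_quoted_numbers(config, path=""):
    """List the paths of string values that look like numbers or booleans."""
    found = []
    if isinstance(config, dict):
        for key, value in config.items():
            found += find_quoted_numbers(value, f"{path}/{key}")
    elif isinstance(config, list):
        for index, value in enumerate(config):
            found += find_quoted_numbers(value, f"{path}/{index}")
    elif isinstance(config, str):
        if config in ("true", "false"):
            found.append(path)
        else:
            try:
                float(config)
                found.append(path)
            except ValueError:
                pass  # an ordinary string, nothing to report
    return found


print(find_quoted_numbers(quoted))
```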

I set defaultTracks in jbrowse.conf but it doesn't work

Don't add quotes around the strings in your jbrowse.conf or tracks.conf files! The .conf format is a JBrowse custom format that does not require quotes, and the quotes will actually mess it up!

i.e.

use

defaultTracks=mytrack1,mytrack2

not

defaultTracks="mytrack1,mytrack2"

Note: I only chose this "defaultTracks" as an example to tell you not to add quotes to things in jbrowse.conf and tracks.conf files

My features don't have gene names (but I expected there to be names?)

If you have a very dense track with many features, JBrowse might decide to hide the labels to save space, but you can force them to display again by adding this to your trackList.json

"style":{"labelScale": 0.01}

This says that the label will be displayed when the zoom level is greater than 0.01 pixels per base pair (for comparison, at the default max zoom level there are 25 pixels per base pair).

How do I change the name that is displayed on my features

If you don't like the names in the "Name" or "ID" column of your GFF, and you instead want to use some other field as the name to be displayed, then you can add this to your trackList.json

"style": {"label": "my_custom_field"}

Note that you should probably index "my_custom_field" with generate-names.pl too, so you can load your GFF from flatfile-to-json.pl with --nameAttributes "my_custom_field", and then when you run generate-names.pl, your attribute will be found. That is optional but helps for searching.

How do I zoom in even more?

Set the config variable

view.maxPxPerBp=50

In your config file, e.g. jbrowse.conf

Note sometimes the "translations" will appear wrong at high zoom levels, so don't depend on this for the protein translations

By default the max zoom level is 25, so setting it to 50 makes you able to zoom in twice as much.

It keeps showing "too much data" on my track. How do I fix it and make my track display?

Increase maxFeatureScreenDensity to a higher value. The default is 0.5; to allow a higher "density" of features, set it to 6, for example, and the message should disappear.


I get the error "Too much data...chunk size xxxxx exceeds chunkSizeLimit"

Two things can happen

  1. You have actually exceeded the chunkSizeLimit during regular loading of data. You might see that one specific block/region out of your whole track is giving this error. In this case, simply increase chunkSizeLimit.
  2. Your data is actually fairly sparse so when it first starts up, the "stats estimation routine", which "doubles" the region it searches in until it gets enough data, is failing. If it doubles too many times, then the chunk will become large and then hit the limit. You can increase chunkSizeLimit, but it may be unreasonably large and this is technically a jbrowse bug.

How do I customize the dialog boxes?

There are many ways

  1. Set onClick->action to contentDialog and then set onClick->content to a function that returns a deferred, e.g. return dojo.xhrGet(…).then(function(res) { return 'my content'; })
  2. Set onClick->action to a newWindow and set

How do I customize feature colors?

In CanvasFeatures, this is done with the style->color parameter. The style->color parameter can be a function, so for example, in trackList.json

    "style": {
       "color": "function(feature) { return 'red'; }"
    }

Will make your features red

    "style": {
       "color": "red"
    }

would do the same thing

It can be dynamic too though

    "style": {
       "color": "function(feature) { return feature.get('score')>50 ?'blue':'red'; }"
    }

If you get a very complex function, consider putting it in a separate functions.conf file and include it, see config guide "Including external functions in trackList.json"

What is the difference between CanvasFeatures and HTMLFeatures?

There are a lot of differences!

  • CanvasFeatures are newer.
  • CanvasFeatures can support Gene glyphs, i.e., a gene's multiple transcripts are grouped together on the screen. In HTMLFeatures, you have to load at the "transcript" level, which loses the gene level info (if you do try to load --type gene, it will just load "gene spans" and won't display the transcript subfeatures. Not terrible, but not as cool as CanvasFeatures)

Try CanvasFeatures out by specifying --trackType CanvasFeatures

What is a glyph?

Glyphs are a "unit" of drawing in a CanvasFeatures track. The glyph is just code that is responsible for drawing a feature on the screen.

My CanvasFeatures don't show up with subfeatures, why not?

If your GFF does not follow this structure

gene->mRNA->exon+CDS

Then you need to add extra configuration

Specifically, if it is "transcript" instead of "mRNA", you must set

"transcriptType": "transcript"

Also, if you only have "exon" and no "CDS", then you need to set

"subParts": "exon"

Because the default settings assume both exons AND CDSs.

How do I handle GFF with match and match_part in CanvasFeatures

If your GFF file has features with this structure

match -> match_part

This only has two levels, so you might consider just setting the "Segments" glyph

"glyph": "JBrowse/View/FeatureGlyph/Segments"

The segments glyph accepts all subfeatures

Can I get started with JBrowse without all the fuss of setup.sh and what-not

Yes! Try the jbrowse desktop versions, built with electron!

The Windows and OSX versions are easy to use, and all you need is to open your fasta file (ideally: indexed fasta).

You can also open BAM tracks, BigWigs, VCF.gz, and soon, Tabix indexed GFF.

How do I create a Tabix indexed GFF

The tabix guide will suggest simply sorting with GNU sort

sort -k1,1 -k4,4n myfile.gff > myfile.sorted.gff
bgzip myfile.sorted.gff
tabix -p gff myfile.sorted.gff.gz

I would recommend something similar, but using genometools for its -tidy and -sortlines options. It has a similar overall workflow, but it catches errors in your GFF

gt gff3 -sortlines -tidy myfile.gff > myfile.sorted.gff
bgzip myfile.sorted.gff
tabix -p gff myfile.sorted.gff.gz

Special note: sorting can leave subfeatures of a gene before the gene itself in the file. This can happen even when, for example, a UTR shares a start coordinate with the gene start.

In that case, the UTR should be repositioned after the gene

The ordering of two features with the same start is largely arbitrary for sorting purposes, but it affects GFF parsing
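
One way to spot this ordering problem before indexing is to scan the sorted file and report every feature whose Parent has not been seen yet. This is only a sketch under simple assumptions (children_before_parents is a hypothetical helper, attributes are plain key=value pairs separated by semicolons, and escaped characters are not handled).

```python
def children_before_parents(lines):
    """Return (child_id, parent_id) pairs where the parent appears later in the file."""
    seen_ids = set()
    problems = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        attributes = dict(
            pair.strip().split("=", 1)
            for pair in columns[8].split(";")
            if "=" in pair
        )
        for parent in attributes.get("Parent", "").split(","):
            if parent and parent not in seen_ids:
                problems.append((attributes.get("ID", "?"), parent))
        if "ID" in attributes:
            seen_ids.add(attributes["ID"])
    return problems


sorted_gff = [
    "chr1\tsrc\tfive_prime_UTR\t100\t120\t.\t+\t.\tID=utr1;Parent=gene1\n",
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=gene1\n",
]
print(children_before_parents(sorted_gff))
```

An empty result means every Parent was defined before the features that refer to it.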

How do I change the color of bigwig dynamically

The pos_color and neg_color accept callback functions. The phytozome browser has good examples of this with the VISTA plot tool

How do I set up a BAM file?

When you set up a BAM file in jbrowse, the best way to do it is as follows

  1. Put the BAM file and the BAM index (.bai) in your data directory (e.g. you downloaded jbrowse and ran prepare-refseqs.pl, which created a data subfolder. Put your BAM file in there)
  2. Then use add-bam-track.pl like this: add-bam-track.pl --label mybam --bam_url mybam.bam --in data/trackList.json
  3. Important! The bam_url is the path of the file relative to the data folder, so here, since it is simply in "data" already, I just put the file name. This part can be confusing to new users, so it bears repeating: the --bam_url is literally a URL that jbrowse uses on the client side to access your BAM file. The URL that it uses is relative to your data directory
  4. Other notes: don't use bam-to-json.pl, it is old and converting your probably humongous next-gen-sequencing BAM into text json is unwieldy
  5. Other notes 2: Your BAM index should just be named the same as your BAM with .bai on the end
  6. Other notes 3: add-bam-track.pl does NOT copy the BAM to your data directory for you, you put it there yourself, and specify the --bam_url appropriately

How do I set up a BigWig file?

It is the same as a BAM file, except you use add-bw-track.pl instead of add-bam-track.pl

How do I set up a VCF file?

This is where the training wheels come off

There is no add-vcf-track.pl, so you have to edit trackList.json yourself

First bgzip and tabix your vcf file

    bgzip myfile.vcf
    tabix -p vcf myfile.vcf.gz

If your VCF isn't sorted for any reason, just use the GNU sort utility to sort it by chromosome and coordinate or get vcf-sort from vcftools

Now that your VCF is indexed, follow these steps

  1. Put the myfile.vcf.gz and myfile.vcf.gz.tbi in your data directory
  2. Edit data/trackList.json
  3. Put the following in there:
    {
       "label": "mytrack",
       "urlTemplate": "myfile.vcf.gz",
       "storeClass": "JBrowse/Store/SeqFeature/VCFTabix", 
       "type": "CanvasVariants"
    }

That isn't too bad, right? All that add-bam-track.pl does is automate the same thing for BAMs, so now that you have edited the config file by hand you are ready to take on the world!
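
If you would rather script the edit than do it by hand, the same entry can be appended with a few lines of Python. This is only a sketch: add_vcf_track is a hypothetical helper, not a JBrowse script, and it assumes the usual trackList.json layout with a top-level "tracks" array.

```python
import json


def add_vcf_track(track_list, label, url_template):
    """Add a VCF track entry to a parsed trackList.json, replacing any entry with the same label."""
    entry = {
        "label": label,
        "urlTemplate": url_template,
        "storeClass": "JBrowse/Store/SeqFeature/VCFTabix",
        "type": "CanvasVariants",
    }
    # Labels must be unique, so drop an older entry with the same label first.
    tracks = [t for t in track_list.get("tracks", []) if t.get("label") != label]
    tracks.append(entry)
    track_list["tracks"] = tracks
    return track_list


config = {"formatVersion": 1, "tracks": [{"label": "DNA"}]}
add_vcf_track(config, "mytrack", "myfile.vcf.gz")
print(json.dumps(config, indent=2))
```

To apply it to a real file, load data/trackList.json with json.load, call add_vcf_track, and write the result back with json.dump.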


Can I speed up JBrowse load time with VCF and BAM files

If the BAM and VCF files you have are large, the BAM index or TABIX index files can become large as well. Since the indexes must be fully downloaded before any of the data can be displayed, you can break your files up by chromosomes, and use {refseq} in a urlTemplate to break it up into manageable chunks.

E.g.

"urlTemplate": "myfile_{refseq}.bam"

That would search for myfile_chr1.bam and myfile_chr1.bam.bai when you open that track while browsing chr1
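
The substitution itself is simple, and a small sketch can help you check that the per-chromosome files you split out have the names JBrowse will ask for. The expand_url_template helper is hypothetical; it assumes the index sits next to the data file with .bai appended for BAM and .tbi appended otherwise, as in the examples on this page.

```python
def expand_url_template(template, refseq):
    """Fill the {refseq} placeholder and return (data_url, index_url)."""
    data_url = template.replace("{refseq}", refseq)
    index_suffix = ".bai" if data_url.endswith(".bam") else ".tbi"
    return data_url, data_url + index_suffix


for name in ["chr1", "chr2"]:
    print(expand_url_template("myfile_{refseq}.bam", name))
```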

Can I add a loading bar while JBrowse is starting up?

Yes, you can configure one. See this section http://gmod.org/wiki/JBrowse_Configuration_Guide#Configure_a_Loading_Page


Can I speed up generate-names.pl?

Try using --completionLimit 0 with the command. It will disable autocompletion but still allow you to search exact matches.

What is the deal with all the different config file formats?

JBrowse uses both json and ".conf" files for configuration, and both file types can contain the same types of information

The trackList.json and jbrowse_conf.json are examples of the json based format

The tracks.conf and jbrowse.conf are examples of the conf based format

There is a flexible system of "includes" so that all of these are mashed together at runtime into a usable config

The "order" behind it all is that

  1. index.html initializes a "JBrowse" object

  2. by default the JBrowse object "includes" both jbrowse_conf.json and jbrowse.conf

  3. jbrowse_conf.json is empty by default, and jbrowse.conf by default includes {dataRoot}/trackList.json and {dataRoot}/tracks.conf

  4. dataRoot is resolved dynamically to whatever is currently specified as /jbrowse/?data=… on the URL, or it is just "data" by default

  5. your own trackList.json or tracks.conf files can themselves include other files, such as a "functions.conf" file
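
The dataRoot step can be pictured with a short sketch. This is not JBrowse code (JBrowse does this in JavaScript in the browser); resolve_data_root and default_includes are hypothetical names that only mirror the order described above.

```python
from urllib.parse import parse_qs, urlparse


def resolve_data_root(url, default="data"):
    """Return the value of ?data=... from the URL, or the default data directory."""
    query = parse_qs(urlparse(url).query)
    return query.get("data", [default])[0]


def default_includes(url):
    """Config files that jbrowse.conf includes by default for this URL."""
    root = resolve_data_root(url)
    return [f"{root}/trackList.json", f"{root}/tracks.conf"]


print(default_includes("http://mysite.org/jbrowse/?data=data2"))
print(default_includes("http://mysite.org/jbrowse/"))
```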

How do I fix the "Not a BAM file" issue?

This is normally due to misconfigured headers on your Apache server. Disable the mime_magic module and see the JBrowse configuration guide for tips

What is the error "invalid BGZF header" on my VCF files?

Your server is misconfigured for VCF.GZ files, and this can be due to it thinking that it should set "Content-Encoding: gzip" on the VCF.GZ files

This literally causes me madness because it is hard to figure out why the hell it is doing that, but nginx for example is configured better by default, so you might consider that

How do I add categories to the Hierarchical data selector?

The Hierarchical data selector can support multiple levels of drop-down categories

On your track, you could have

    {
     "metadata": {"category": "ParentCategory / DiseaseBAM"},
     "label": "myTrack",
     "storeClass": "JBrowse/Store/SeqFeature/BAM",
     "type": "JBrowse/View/Track/Alignments2"
    },
    {
     "metadata": {"category": "ParentCategory / NonDiseaseBAM"},
     "label": "myTrack2",
     "storeClass": "JBrowse/Store/SeqFeature/BAM",
     "type": "JBrowse/View/Track/Alignments2"
    }

How do I collapse categories in the Hierarchical data selector by default?

You can set collapseCategories="ParentCategory1/ChildCategory,ParentCategory2/ChildCategory", etc., as a comma separated list (don't include spaces around the slashes, though).

Can I make an ultra-compact setting on my features?

Yes you can!

The styles on "CanvasFeatures" include normal, compact, and collapse

By default, compact divides the height of glyphs by 4, so if you make the height of your features smaller with style->height then when you set compact it will be ultra compact.

Can I visualize junctions from RNA-seq data

Yes, try out the SashimiPlot plugin

Can I view GCContent on my sequence data?

Yes, the GCContent plugin will calculate the GCContent from your sequence data automatically.

It works fairly well on mid size genomes. If you have very large megabase scale assemblies, then you might consider pre-calculating the GCContent.

Can I view GWAS results in JBrowse?

Yes, the VariantViewer plugin does this. It provides the following features:

  1. Plotting variants on the y-axis according to the "score" of the feature (typically -Math.log(pvalue))
  2. Has some fancy tricks to allow mouseovers and click options even though the features are moved all over the screen
  3. Integration with myvariant.info

Why does my trackList.json contain "className" (even on CanvasFeatures?)

className refers to a CSS class. If you are using CanvasFeatures, this will be totally unused. If you are using HTMLFeatures, then you can add custom CSS to make your feature have a custom class. Note, that the "subfeatureClasses" is a related variable: it is a CSS class for subfeatures.

By default, it would just use the "exon" class for exons or whatnot, but subfeatureClasses allows you to create a map e.g.

"subfeatureClasses": {"exon": "myCustomExonCSSClass"}

Why are my subfeatures being displayed as separate features?

Your GFF should use proper ID and Parent relations. Your subfeatures do not need to themselves have IDs if they have no further subfeatures, but they must have a Parent pointing to the Parent's ID

Note that it should be spelled Parent, not PARENT

How can I only load a specific type of feature from my GFF file?

You can use the --type argument for flatfile-to-json.pl

E.g.

flatfile-to-json.pl --type mRNA --gff mygff.gff

This will only load mRNAs from the GFF. Additionally, if you want to filter on the source column of the GFF, you can augment the --type argument with an extra formatted parameter for source: --type mRNA:augustus

The --type argument can also be a comma separated list
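
To make the type and source matching concrete, here is a small sketch of which lines such an argument would select. It only mirrors the behavior described above and is not how flatfile-to-json.pl is implemented; matches_type is a hypothetical helper.

```python
def matches_type(columns, type_spec):
    """Check GFF columns against a spec such as 'mRNA', 'mRNA:augustus' or 'gene,mRNA'."""
    source, feature_type = columns[1], columns[2]
    for wanted in type_spec.split(","):
        wanted_type, _, wanted_source = wanted.partition(":")
        source_ok = not wanted_source or source == wanted_source
        if feature_type == wanted_type and source_ok:
            return True
    return False


mrna = "chr1\taugustus\tmRNA\t1\t9\t.\t+\t.\tID=m1".split("\t")
print(matches_type(mrna, "mRNA"))
print(matches_type(mrna, "mRNA:augustus"))
print(matches_type(mrna, "mRNA:maker"))
```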

What if I don't want to load the sequence data for the genome, but I want to display the features?

prepare-refseqs.pl accepts a --sizes parameter, which takes a file that is two columns, col1 is refseq names, col2 is refseq lengths
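
If you have a FASTA file but no sizes file yet, the two columns can be produced with a short script. This is only a sketch: fasta_sizes is a hypothetical helper, the sequence name is taken as the first word after ">", and tab-separated output is an assumption here.

```python
def fasta_sizes(lines):
    """Return a list of (name, length) for each sequence in FASTA-formatted lines."""
    sizes = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sizes.append((name, length))
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)  # sequence may be wrapped over several lines
    if name is not None:
        sizes.append((name, length))
    return sizes


fasta = [">chr1 example description\n", "ACGT\n", "AC\n", ">chr2\n", "GG\n"]
for seq_name, seq_length in fasta_sizes(fasta):
    print(f"{seq_name}\t{seq_length}")
```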

Can I have subtracks in JBrowse?

You can make a custom plugin to do this. The "multibigwig" plugin is an example of this

Why does my track keep saying "Loading"?

This normally means some javascript code for handling the track has crashed. Check your javascript console for clues on how to fix it. Note: you should use the JBrowse-1.11.6-dev.zip package for debugging, as opposed to JBrowse-1.11.6.zip, because the -dev package contains "un-minified" source code

Can I run JBrowse without making a webserver?


Commonly, people will download jbrowse and double click the html file and open up file:///c/myfolder/jbrowse/index.html and think it is working, but running JBrowse like this is not recommended. Note: it may appear to work for some limited cases opening up file:/// paths in the browser, but it fails on BAMs and many other operations.

You should use a well established HTTP server such as Apache or nginx; even something like Tomcat can work.

Note: You can also use JBrowse Desktop to run JBrowse without a webserver.

Also note: servers like "SimpleHTTPServer" from Python or "http-server" from NPM are generally not full featured enough to run all jbrowse features correctly (specifically with regards to not handling range-requests in case of Python SimpleHTTPServer and not handling content-encoding correctly on tabix files with NPM http-server).

How do I get coverage for a BAM file?

  1. Use the SNPCoverage track
  2. Use the FeatureCoverage track type
  3. Make a bigwig for your BAM file (recommend: use "bedtools genomecov" to convert the BAM to bedgraph, and then convert bedgraph to bigwig with UCSC bedGraphToBigWig)

What do the colors mean on the BAM files for JBrowse

  • Light red is a forward read that is paired
  • Super light red is a forward read that is badly paired
  • Dark red is a forward read that is missing a pair
  • Light blue is a reverse read that is paired
  • Super light blue is a reverse read that is badly paired
  • Dark blue is a reverse read that is missing a pair
  • Grey/black is a read whose pair is on another chromosome

Can I use RNA-seq with JBrowse

Yep! The regular alignments track supports RNA-seq and will show spliced alignments. Note that there are two extra options that are special for RNA-seq

  • The "Use XS" option is an RNA-seq specific flag for detecting strand according to canonical splice sites
  • The "Use reversed template" option is a flag normally used for stranded paired-end data to make both reads in a pair look like they are in the same direction


Error during setup.sh like "No such file or directory at /loader/0x13517b30/App/cpanminus/script.pm line 224."

This can normally be fixed by deleting ~/.cpanm

It may be due to conflict between jbrowse's own cpanm and your system cpanm, not sure though.

Generally deleting ~/.cpanm is harmless, it is a "build" directory

What is "Integer overflow error"?

From what we have seen, the "Integer overflow error" sometimes appears on BigWig tracks when your webserver is not configured correctly. It seems to be due to errors with a "reverse proxy" or something not forwarding the data properly.

Therefore, it is most likely not due to corrupted bigwig files or whatnot, but more probably, due to your server's configuration.

Can I use JBrowse with phantomJS?

Yes! See http://gmod.org/wiki/JBrowse_Configuration_Guide#Rendering_high_resolution_screenshots_using_PhantomJS for an example

Can I create an adaptor for an existing web service?

The REST API track is sort of limited in some respects, but you can create your own "store class" as a plugin. This basically just requires one thing:

bin/new-plugin.pl MyPlugin 

Then, simply make a dojo class (using "dojo declare") in your plugin that implements a "getFeatures" function. The getFeatures function receives a query object with query.start, query.end, query.ref (e.g. chr1) along with 3 callbacks: featureCallback, finishCallback, and errorCallback. If there is an error, then call the error callback. Otherwise, for each feature that you want to display, call featureCallback with that feature (use JBrowse/Model/SimpleFeature to represent it). When you are out of features for the query region, call finishCallback.
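
The order of the callbacks is the part that matters. A real store class is written in JavaScript with dojo declare; the Python below is only a language-neutral sketch of the contract described above, with a hypothetical ListStore serving features from memory and plain dicts standing in for SimpleFeature objects.

```python
class ListStore:
    """Toy store that serves features from an in-memory list."""

    def __init__(self, features):
        self.features = features  # dicts with "ref", "start" and "end" keys

    def get_features(self, query, feature_callback, finish_callback, error_callback):
        try:
            for feature in self.features:
                overlaps = (
                    feature["ref"] == query["ref"]
                    and feature["start"] < query["end"]
                    and feature["end"] > query["start"]
                )
                if overlaps:
                    feature_callback(feature)  # once per feature in the region
        except Exception as error:
            error_callback(error)  # report the problem instead of finishing
            return
        finish_callback()  # no more features for this query


store = ListStore([
    {"ref": "chr1", "start": 100, "end": 200, "name": "a"},
    {"ref": "chr1", "start": 500, "end": 600, "name": "b"},
    {"ref": "chr2", "start": 100, "end": 200, "name": "c"},
])
received, finished, errors = [], [], []
store.get_features(
    {"ref": "chr1", "start": 0, "end": 300},
    received.append,
    lambda: finished.append(True),
    errors.append,
)
print([feature["name"] for feature in received], finished, errors)
```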