GBrowse FAQ

From GMOD
Latest revision as of 14:48, 18 May 2013

About this FAQ

What is this FAQ?

It is the list of Frequently Asked Questions about GBrowse.

How is it maintained?

It is now maintained as a Wiki on this site. You can help maintain it by adding questions and answers.


General Questions

What is GBrowse good for?

GBrowse was designed to view genomes. It displays a graphical representation of a section of a genome, and shows the positions of genes and other functional elements. It can be configured to show both qualitative data such as the splicing structure of a gene, and quantitative data such as microarray expression levels.

Another good way to get an overview of the features GBrowse offers is to read the documentation at the GBrowse Wiki page.


What platforms does GBrowse run on?

GBrowse is a web-server application that is implemented in the Perl programming language. It will run on any machine that runs Perl, including Windows, Macintosh OS X, and most versions of Linux and UNIX.


How is GBrowse distributed?

GBrowse is distributed as source code for Macintosh OS X, UNIX and Linux platforms, and as pre-packaged binaries for Windows machines.


What are the terms of use for GBrowse?

GBrowse is distributed under the Perl Artistic License, which allows for unrestricted use and distribution, including commercial use and resale. You may modify and distribute modified versions of GBrowse provided that you credit the original authors for their contribution.


I have a problem. What do I do?

First consult the GBrowse mailing list archive at Nabble. Your problem may already have been reported and discussed. If you find no help there, then send email to gmod-gbrowse@lists.sourceforge.net. If you are pretty certain you have found a bug, please report it to the bug report tracking system (set the category to "GBrowse").

Problem-solving

Where do I download GBrowse?

From SourceForge. If you want to live on the bleeding edge, you may try the development version of GBrowse; instructions for accessing it are on the Subversion page.

How do I install GBrowse?

After you unpack GBrowse, you will find detailed installation instructions in the file INSTALL in the top-level directory.

Where do I find a list of all available glyphs?

A list of glyphs appears at the end of the documentation for Bio::Graphics::Glyph, which you can also view by executing

 perldoc Bio::Graphics::Glyph

from the command line.

When I search, why doesn't GBrowse find my 3-letter gene name?

If you are using the MySQL GFF adaptor and are storing gene names inside Note attributes, you may run into MySQL's default minimum word length of four characters for full-text searches. To fix this, either:

  • Put the gene name in an Alias attribute, e.g. "Alias LEP"

or

  • Change MySQL to allow searches on 3-character words.

The latter solution is a multi-step process:

  • Open /etc/my.cnf and add the following configuration line to the [mysqld] section:
    ft_min_word_len=3
  • Restart the MySQL server.
  • Connect to your database using the mysql command-line client and run the command:
   mysql> repair table fattribute_to_feature quick;
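The effect of the word-length cutoff can be modelled roughly in Perl. This is a simplification, not MySQL's tokenizer: it assumes words are split on non-word characters and ignores stopword lists. The helper indexed_words is hypothetical.

```perl
use strict;
use warnings;

# Rough model of the full-text word-length cutoff (hypothetical helper).
# Splits on runs of non-word characters and keeps words whose length is
# at least $min_len. Stopwords and MySQL's real tokenizer are ignored.
sub indexed_words {
    my ($text, $min_len) = @_;
    return grep { length($_) >= $min_len } split /\W+/, $text;
}

my $note = 'LEP leptin precursor';
print join(',', indexed_words($note, 4)), "\n";   # prints "leptin,precursor"
print join(',', indexed_words($note, 3)), "\n";   # prints "LEP,leptin,precursor"
```

With the default cutoff of 4, "LEP" never reaches the index, which is why a search for it finds nothing until ft_min_word_len is lowered and the index rebuilt.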

How do I use semantic zooming to hide a track completely?

If you wish to turn off a track entirely, you can use the "hide" flag to hide the track when the display exceeds a certain size:

            [6_frame_translation:50000]
            hide = 1


How do I add an outgoing link to text on the feature detail page in GBrowse?

Add a link option to the appropriate track stanza in your *.conf file. GBrowse replaces $name in the URL with the feature's name. For example:

 link = http://www.ncbi.nih.gov/SNP/snp_ref.cgi?rs=$name
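If the name needs to be transformed or escaped before it goes into the URL, GBrowse also accepts a Perl callback for link in many versions (check the configuration documentation for yours). The sketch below shows such a callback body, run against a hypothetical MockFeature stand-in so it works without BioPerl; url_escape is a hypothetical helper, not a GBrowse function.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BioPerl feature so the sketch runs without
# BioPerl; a real callback receives a feature object with display_name().
package MockFeature;
sub new          { my ($class, %args) = @_; return bless { %args }, $class }
sub display_name { $_[0]{name} }

package main;

# Percent-encode every character outside the unreserved URL set.
sub url_escape {
    my $s = shift;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Body you might place after "link = " in a track stanza
# (assumption: your GBrowse version accepts a code reference here).
my $link_cb = sub {
    my $feature = shift;
    return 'http://www.ncbi.nih.gov/SNP/snp_ref.cgi?rs='
         . url_escape($feature->display_name);
};

print $link_cb->(MockFeature->new(name => 'rs12345')), "\n";
```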


I have a multi-segmented feature (such as a multi-exon transcript). It looks fine at low power, but when I zoom in the connecting lines between segments disappear. Help!

Structure the feature so that it has a single parent part that spans the whole feature from end to end. When using GFF3 and a Bio::DB::SeqFeature::Store database (see GBrowse Adaptors), that is all you have to do. For example, GFF3 like this works when "match" is the feature in the track configuration of the GBrowse configuration file:

            Chr1 . match         1  1000 . . . ID=Hit27
            Chr1 . match_part    1   200 . . . Parent=Hit27
            Chr1 . match_part  500   600 . . . Parent=Hit27
            Chr1 . match_part  900  1000 . . . Parent=Hit27

In GFF2 format, you need an aggregator, in this case the "match" aggregator. Example GFF2 looks like this:

            Chr1 . match    1  1000 . . . Hit Hit27
            Chr1 . HSP      1   200 . . . Hit Hit27
            Chr1 . HSP    500   600 . . . Hit Hit27
            Chr1 . HSP    900  1000 . . . Hit Hit27

And you will use "match" (the name of the aggregator, not the name of the parent feature) as the feature in the track configuration in the GBrowse config file.

For transcripts, use the "processed_transcript" aggregator and create features with a main part of "mRNA" and subparts of "CDS", "exon", and/or various types of UTRs.
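Before loading, it can help to confirm that each parent really spans its parts. The following is a minimal sketch, not a BioPerl loader: check_parent_spans is a hypothetical helper, and it assumes whitespace-separated columns, one Parent value per line, and no URL-escaped characters in column 9.

```perl
use strict;
use warnings;

# Check that every child feature (Parent=...) lies within its parent's span.
# Assumptions: whitespace-separated columns, a single Parent value per line,
# no URL-escaping in column 9. A real loader splits on tabs and handles the
# full GFF3 attribute syntax.
sub check_parent_spans {
    my (%span, @children, @problems);
    for my $line (@_) {
        next if $line =~ /^\s*#/ || $line !~ /\S/;   # skip comments and blanks
        my @col = split ' ', $line, 9;
        next unless @col == 9;
        my ($start, $end) = @col[3, 4];
        my %attr = map { split /=/, $_, 2 } split /;/, $col[8];
        $span{ $attr{ID} } = [ $start, $end ] if defined $attr{ID};
        push @children, [ $attr{Parent}, $start, $end ] if defined $attr{Parent};
    }
    for my $child (@children) {
        my ($parent, $start, $end) = @$child;
        my $p = $span{$parent};
        if (!$p) {
            push @problems, "no line with ID=$parent";
        }
        elsif ($start < $p->[0] || $end > $p->[1]) {
            push @problems, "child $start-$end lies outside $parent ($p->[0]-$p->[1])";
        }
    }
    return @problems;
}

# The GFF3 example from this answer.
my @gff3 = (
    'Chr1 . match         1  1000 . . . ID=Hit27',
    'Chr1 . match_part    1   200 . . . Parent=Hit27',
    'Chr1 . match_part  500   600 . . . Parent=Hit27',
    'Chr1 . match_part  900  1000 . . . Parent=Hit27',
);
my @problems = check_parent_spans(@gff3);
print @problems ? join("\n", @problems) . "\n" : "parent spans all parts\n";
```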

I'm using the GFF database adaptor. Is it better to load it using GFF2 or GFF3?

GFF3.

GFF2, described in the Using GBrowse with Bio::DB::GFF tutorial, is the older version of the GFF feature annotation format. Its main limitation is that it cannot represent features that have more than one level of nested subparts. For example, you cannot represent the relationship between a gene, two alternatively spliced transcripts, and the exons inside the transcripts. GFF3 corrects this problem as well as a number of other deficiencies.
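To make the nesting difference concrete, here is a small sketch that reads a three-level GFF3 (gene, two alternatively spliced transcripts, exons) and prints the hierarchy. The IDs and coordinates are made up for illustration, and build_tree and print_tree are hypothetical helpers rather than BioPerl code. Shared exons use a comma-separated Parent list, which GFF3 allows.

```perl
use strict;
use warnings;

# Made-up example data: gene1 -> tx1, tx2; exon2 and exon3 are shared.
my @gff3 = (
    "Chr1\t.\tgene\t1000\t5000\t.\t+\t.\tID=gene1",
    "Chr1\t.\tmRNA\t1000\t5000\t.\t+\t.\tID=tx1;Parent=gene1",
    "Chr1\t.\tmRNA\t2500\t5000\t.\t+\t.\tID=tx2;Parent=gene1",
    "Chr1\t.\texon\t1000\t1500\t.\t+\t.\tID=exon1;Parent=tx1",
    "Chr1\t.\texon\t2500\t3000\t.\t+\t.\tID=exon2;Parent=tx1,tx2",
    "Chr1\t.\texon\t4500\t5000\t.\t+\t.\tID=exon3;Parent=tx1,tx2",
);

# Map each ID to its children (in file order) and collect parentless roots.
sub build_tree {
    my (%children, @roots, %type);
    for my $line (@_) {
        my @col  = split /\t/, $line;
        my %attr = map { split /=/, $_, 2 } split /;/, $col[8];
        $type{ $attr{ID} } = $col[2];
        if (defined $attr{Parent}) {
            push @{ $children{$_} }, $attr{ID} for split /,/, $attr{Parent};
        }
        else {
            push @roots, $attr{ID};
        }
    }
    return (\%children, \@roots, \%type);
}

# Print one feature and, recursively, its children, indented by depth.
sub print_tree {
    my ($children, $type, $id, $depth) = @_;
    print '  ' x $depth, "$type->{$id} $id\n";
    print_tree($children, $type, $_, $depth + 1) for @{ $children->{$id} || [] };
}

my ($children, $roots, $type) = build_tree(@gff3);
print_tree($children, $type, $_, 0) for @$roots;
```

GFF2's single group column has no way to express the middle level here, which is the limitation described above.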

How do I pass parameters into functions of init_code?

Suppose you want to define a function in the init_code section and then call it from within callbacks. You can do something like this:

           [GENERAL]
           init_code = sub round {
                           my $n = shift;
                           return int($n + 0.5);
                       }
           [TRACKS]
           label    = sub {
                     my $feature = shift;
                     my $score   = $feature->score;
                     return "score = " . round($score);
                     }

Note that you'll need GBrowse version 1.63 or higher for this to work.
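Because init_code functions and callbacks are plain Perl, you can try them outside GBrowse first. This sketch defines the same round() helper and label callback and calls the callback on a hypothetical MockFeature object (a stand-in, not a GBrowse or BioPerl class). Note that the label must be joined with "." rather than a comma; a comma would return a list instead of one string.

```perl
use strict;
use warnings;

# Same helper as in init_code. int($n + 0.5) rounds half up for
# non-negative numbers; for negatives it truncates toward zero
# after the shift (e.g. -2.6 gives -2).
sub round {
    my $n = shift;
    return int($n + 0.5);
}

# Hypothetical stand-in for a feature object exposing score().
package MockFeature;
sub new   { my ($class, $score) = @_; return bless { score => $score }, $class }
sub score { $_[0]{score} }

package main;

my $label = sub {
    my $feature = shift;
    return 'score = ' . round($feature->score);
};

print $label->(MockFeature->new(7.6)), "\n";   # prints "score = 8"
```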


How do I show circular genomes?

A patch is currently being developed to display circular genomes. You can git clone the branch code at https://github.com/GMOD/GBrowse/tree/nliles_gbrowse_circular. The patch displays basic glyphs but has problems showing sequences. To use it, add the tag region=circular; to column 9 of the first line of your GFF file.

There's a problem with my overview and detail images - they're the same scale

That is usually a problem with your reference sequences. Either you haven't set your reference class correctly in your config file (if you are using GFF3, it should probably be set to 'Sequence'), or your GFF file/database lacks reference sequences, that is, the lines that describe the chromosomes themselves.
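To check for the second cause, you can list the seqids that appear in column 1 of your GFF3 but never get a line of their own. This is a sketch under a stated assumption: a reference line is taken to be a feature on that seqid whose ID or Name equals the seqid. The helper missing_references and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: list seqids used in column 1 that never appear as a
# feature of their own. Assumption: a reference line is a feature on that
# seqid whose ID or Name equals the seqid, e.g.
#   Chr1  .  chromosome  1  50000  .  .  .  ID=Chr1;Name=Chr1
sub missing_references {
    my (%used, %described);
    for my $line (@_) {
        next if $line =~ /^\s*#/ || $line !~ /\S/;   # skip comments and blanks
        my @col = split /\t/, $line;
        next unless @col == 9;
        $used{ $col[0] } = 1;
        my %attr = map { split /=/, $_, 2 } split /;/, $col[8];
        for my $key (qw(ID Name)) {
            $described{ $col[0] } = 1
                if defined $attr{$key} && $attr{$key} eq $col[0];
        }
    }
    my @missing = grep { !$described{$_} } keys %used;
    @missing = sort @missing;
    return @missing;
}

# Made-up sample: Chr1 has a reference line, Chr2 does not.
my @gff3 = (
    "Chr1\t.\tchromosome\t1\t50000\t.\t.\t.\tID=Chr1;Name=Chr1",
    "Chr1\t.\tgene\t100\t900\t.\t+\t.\tID=g1",
    "Chr2\t.\tgene\t200\t800\t.\t-\t.\tID=g2",
);
print "no reference line for: $_\n" for missing_references(@gff3);
```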

Can I add Popup Balloon Tips to GBrowse?

Yes! As of version 1.69, you can populate popup balloons similar to the ones used by Google Maps. The balloons can contain arbitrary HTML (including images) and can be set to appear when the user hovers over a feature or left clicks on it. See GBrowse Configuration/Balloons for details.
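As a rough illustration, assuming your version accepts a Perl callback for the balloon hover option (confirm the option names in GBrowse Configuration/Balloons), a callback might build its HTML as below. MockFeature and html_escape are hypothetical helpers so the sketch runs standalone; escaping matters because the balloon renders arbitrary HTML.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BioPerl feature.
package MockFeature;
sub new          { my ($class, %args) = @_; return bless { %args }, $class }
sub display_name { $_[0]{name} }
sub start        { $_[0]{start} }
sub end          { $_[0]{end} }

package main;

# Escape the characters that are special in HTML ("&" first).
sub html_escape {
    my $s = shift;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Body you might place after "balloon hover = " (assumption).
my $balloon = sub {
    my $f = shift;
    return sprintf '<b>%s</b><br>%d..%d',
        html_escape($f->display_name), $f->start, $f->end;
};

print $balloon->(MockFeature->new(name => 'abc-1', start => 100, end => 900)), "\n";
```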

Why is GBrowse on Chado so slow?

While it is convenient to run GBrowse on top of a Chado database that is being actively edited (so that users can see "live" edits), the Chado adaptor for GBrowse is fairly slow. This is because Chado is not designed as a database to drive an interactive display like GBrowse. Other adaptors like Bio::DB::GFF and Bio::DB::SeqFeature::Store run on highly denormalized but fast-to-query databases. Chado, on the other hand, is highly normalized, which is good for preventing data handling errors but generally bad for speed. Some things can help, like materializing views to speed up the slower queries, but in practice most users find it easier to dump the contents of their Chado database to GFF3 and load it into a Bio::DB::SeqFeature::Store database. GMODTools is a good tool for setting up periodic dumps of your Chado database to GFF3.

The "Bio::Graphics::BrowserConfig=HASH ... Bio/Graphics/Browser.pm line 587" Error

Users of GBrowse 1.69 will see this message in their Apache error logs:

Bio::Graphics::BrowserConfig=HASH(0xnnnnnnn) at some_path/Bio/Graphics/Browser.pm line 587

This is a debugging statement that was accidentally left in the GBrowse 1.69 release. You can either ignore it (it does not affect anything), or install the latest version of GBrowse from Subversion.

Can I show more than one glyph in the same track?

Yes. To show multiple glyphs in the same track, set the glyph with a Perl callback. For example:

[newtrack]
feature = sRNA  sRNA_HL
glyph    = sub {
                my $f = shift;
                my $type = $f->type->method;
                if ($type eq 'sRNA') {
                     return 'xyplot';
                }
                else {
                     return 'processed_transcript';
                }
             }

Another example:

glyph        = sub { my $strand = shift->strand; return $strand >=0 ? 'gene' : 'box' }

Where an attribute would not make sense for a particular feature, the callback can return undef.

You can set the glyph based on any information that is available in callbacks.
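Because glyph callbacks are ordinary Perl subroutines, you can exercise them outside GBrowse before putting them in a config file. This sketch runs the two callbacks above against hypothetical mock objects: MockType stands in for the object returned by $feature->type, and MockFeature for the feature; neither is a BioPerl class.

```perl
use strict;
use warnings;

# Hypothetical mock for the type object, which has a method() accessor.
package MockType;
sub new    { my ($class, $m) = @_; return bless { method => $m }, $class }
sub method { $_[0]{method} }

# Hypothetical mock feature with type() and strand().
package MockFeature;
sub new    { my ($class, %args) = @_; return bless { %args }, $class }
sub type   { MockType->new($_[0]{type}) }
sub strand { $_[0]{strand} }

package main;

# The same two callbacks as in the FAQ entry, written as code refs.
my $by_type = sub {
    my $f = shift;
    return $f->type->method eq 'sRNA' ? 'xyplot' : 'processed_transcript';
};
my $by_strand = sub { my $strand = shift->strand; return $strand >= 0 ? 'gene' : 'box' };

for my $f (MockFeature->new(type => 'sRNA',    strand => 1),
           MockFeature->new(type => 'sRNA_HL', strand => -1)) {
    printf "%-8s %-20s %s\n", $f->type->method, $by_type->($f), $by_strand->($f);
}
```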

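From email threads:

  • More than one glyph in the same track, Alaguraj Veluchamy, July 2010.
  • Callbacks on Glyphs, Andreas Redl, July 2010.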

How many tracks can be displayed in GBrowse?

There is no set maximum number of tracks. As of about GBrowse 2.13, GBrowse works with more than 1,000 tracks. In an email thread (Kai Xia, July 2010), Lincoln Stein said:

I think you'll find that 20,000 track definitions are going to slow GBrowse down to the point of unusability. I have made some fixes to GBrowse2 that allow you to display > 1000 tracks with reasonable performance, but I've never tested in the 10,000 track range.

If you combine wiggle data into subtracks, then performance in GBrowse2 should be quite good. There is also an interface that lets you define metadata for each subtrack and search, filter and sort on the basis of this metadata (see Creating and Managing Subtracks with GBrowse2). I've also just now added an option that lets you hide subtracks that have no data currently showing.