Creating and Managing Subtracks with GBrowse2
Latest revision as of 19:42, 17 April 2012

For the main GBrowse 2.0 HOWTO article, see: GBrowse 2.0 HOWTO.

As of GBrowse version 2.09, you can create an unlimited number of subtracks within a single major track in order to group a series of datasets that are logically linked, such as a timecourse. You can choose which subtracks to show by default and the order in which they will appear. When the user clicks a designated area in the titlebar, a dialog box appears that allows the user to select which subtracks to make visible. The user can also drag subtrack labels up and down to adjust the order in which they are displayed.

There are two mechanisms for defining subtracks. The "Metadata" mechanism (new in version 2.48) is used when you have one feature per subtrack, such as a whole-genome quantitative ("wiggle") feature, and each feature can be enumerated by its display name. You provide a file that lists each feature subtrack explicitly.

The second mechanism is more flexible and is used when there are too many features to list explicitly and/or there is more than one feature per subtrack. In this scheme, each subtrack is defined by a set of feature filters. The filters are applied to each feature in turn, sorting the features into the appropriate subtracks.

Using Metadata

If you have just a few features and there is a one-to-one correspondence between feature and subtrack, then the easiest way to define subtracks is by use of an external metadata file. A typical file looks like this:

[feature_name_1]
:dbid        = f101
:selected    = 1
display_name = My First Feature
type         = some_type1
method       = my_method1
source       = my_source1
some_attribute    = value1
another_attribute = value2

[feature_name_2]
:dbid        = f102
:selected    = 1
display_name = My Second Feature
type         = some_type2
method       = my_method2
source       = my_source2
some_attribute    = value3
another_attribute = value4

[feature_name_3]
:dbid        = f103
type         = some_type2
method       = my_method2
source       = my_source2
some_attribute    = value5
another_attribute = value6

Each [stanza] begins with the name of a feature as it is represented in the underlying database. Below each [stanza] heading is a series of tag=value pairs. The following tag names have special meaning:

:dbid          Optional unique identifier for the subtrack. If provided, it can be used in the GBrowse
               URL to select the subtrack.
:selected      If true, this subtrack is selected by default when the containing track is turned on.
display_name   Display name for the feature. If not present, will
               default to the feature's native display name (i.e. the one in the [stanza]).
type           What is returned by calling the feature's type() method.
method         What is returned by calling the feature's method() method.
source         What is returned by calling the feature's source() method.
score          What is returned by calling the feature's score() method.

Any other tags become sortable attributes which are displayed by the GBrowse subtrack selection dialog box. For this to work properly, each tag must be present in each stanza. Tags that are present in some stanzas and not others are ignored.
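
The rule that only tags present in every stanza become sortable columns can be illustrated with a small sketch. This is not GBrowse's own parser (GBrowse is written in Perl); it is a minimal Python model of the format described above, and the function names parse_metadata and sortable_attributes are hypothetical.

```python
# Tags with special meaning, as listed above; they are not treated as
# free-form sortable attributes in this sketch.
SPECIAL_TAGS = {":dbid", ":selected", "display_name",
                "type", "method", "source", "score"}


def parse_metadata(text):
    """Parse stanza-style metadata text into {feature_name: {tag: value}}."""
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new [stanza] heading names the feature.
            current = stanzas.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            tag, value = line.split("=", 1)
            current[tag.strip()] = value.strip()
    return stanzas


def sortable_attributes(stanzas):
    """Return the non-special tags that occur in every stanza."""
    tag_sets = [set(tags) - SPECIAL_TAGS for tags in stanzas.values()]
    if not tag_sets:
        return []
    return sorted(set.intersection(*tag_sets))


EXAMPLE = """
[feature_name_1]
:dbid = f101
:selected = 1
some_attribute = value1
another_attribute = value2

[feature_name_2]
:dbid = f102
some_attribute = value3
"""

# another_attribute is missing from the second stanza, so it is dropped.
print(sortable_attributes(parse_metadata(EXAMPLE)))  # ['some_attribute']
```

Running a check like this over a metadata file before deploying it is one way to spot attributes that would silently disappear from the selection dialog.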

Save this file anywhere convenient and then associate it with the desired track using the metadata option. This option takes the full path name to the metadata file. For example:

[ChIP-Seq]
database = Peaks
feature  = signal
glyph    = vista_plot
metadata = /var/www/gbrowse2/databases/chip-seq/metadata.txt

Full Example

Here is a full working example of a metadata-based subtrack definition that shows a number of ChIP-seq experiments with the attributes "factor", "stage" and "algorithm". The subtrack selection dialog that this generates is shown on the right.

Figure: ChIP-seq subtracks using metadata file

The database stanza:

 [Chip:database]
 db_adaptor    = Bio::DB::SeqFeature::Store
 db_args       = -adaptor memory
		-dsn /var/www/gbrowse2/databases/elegans_peakcallcomparison
 search options = none

The features in this database are named "BLMP1-L1 Berkeley", "BLMP1-L1 IDR", etc. An excerpt from one of the GFF3 files that comprise this database can be found at Using the vista_plot Glyph.

The track stanza:

[ChIP-Seq]
database = Chip
feature  = signal
glyph    = vista_plot
metadata = /var/www/gbrowse2/databases/chip-seq/metadata.txt


The metadata file, /var/www/gbrowse2/databases/chip-seq/metadata.txt:

 [BLMP1-L1 Berkeley]
 :dbid=101
 :selected=1
 factor = BLMP1
 stage  = L1
 algorithm = Berkeley

 [BLMP1-L1 IDR]
 :dbid=102
 :selected=1
 factor = BLMP1
 stage  = L1
 algorithm = IDR

 [BLMP1-L1 Published]
 :dbid=103
 factor = BLMP1
 stage  = L1
 algorithm = Published

 [UNC130-L1 Berkeley]
 factor = UNC130
 stage  = L1
 :dbid=104
 algorithm = Berkeley

 [UNC130-L1 IDR]
 factor = UNC130
 stage  = L1
 :dbid=105
 algorithm = IDR

 [UNC130-L1 Published]
 factor = UNC130
 stage  = L1
 :dbid=106
 algorithm = Published

Save this file as /var/www/

Using Subtrack Select

This section describes how to use the subtrack select and subtrack table options to create subtracks based on filters.

A Basic Example

Here is a simple example to show how the system works. We start out with a gene track that has no subtracks:

[Genes]
feature      = gene
glyph        = gene
database     = sqlite-genes
category     = Genes:Coding
label        = 1
key          = Wormbase Genes

Figure: Genes track with no subtracks

The behavior of this track is to show both forward and reverse strand genes packed together for maximum efficiency, as shown in the figure on the right. Let's say we would prefer for forward and reverse stranded genes to be sorted into separate subtracks so that they do not intermingle. This can be done by adding subtrack select and subtrack table options to the configuration:

[Genes]
feature      = gene
glyph        = gene
database     = sqlite-genes
category     = Genes:Coding
label        = 1
subtrack select = Strand strand
subtrack table  =  +1 ;
                   -1
key          = Wormbase Genes

The "subtrack select" option defines a partitioning scheme for the data in the track. It consists of one or more lines defining the dimensions on which to partition the data. In this case, we are partitioning on only one dimension, the strandedness of the feature. The definition of this dimension is <Dimension Label> <method> (whitespace delimited, as usual), where the dimension label is a human-readable column label on the selection dialog and the method is any of the methods recognized by Bio::SeqFeatureI objects ("display_name", "primary_tag", "source_tag", "score", "has_tag", "tag_value", "start", "length", etc). In the example above, we are going to partition on the strand() method, and to label this dimension "Strand" on the popup menu presented to the user.

Figure: First iteration of strand-specific subtracks
Figure: Subtrack selection dialog box

The "subtrack table" option defines the values on which to partition the data. In this case, we are going to partition the data into two subtracks: one for positive strand features (strand() returns +1) and one for negative strand features (strand() returns -1). This option has two or more lines, each one separated by a semicolon. Each line corresponds to a subtrack, which will be filtered by matching against the value(s) specified on the line. We simply list +1 and -1 as our two subtrack filter values.
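
The partitioning described here can be pictured with a short sketch. It is a Python model, not GBrowse code: the Feature stand-in, the gene names, and the partition function are all hypothetical, and the strand is stored as the string "+1" or "-1" purely to keep the comparison simple.

```python
from collections import namedtuple

# Hypothetical stand-in for a feature object; a real feature would expose
# strand() as a method rather than a stored string.
Feature = namedtuple("Feature", ["name", "strand"])


def partition(features, method, table_values):
    """Sort features into subtracks by matching one dimension's value.

    `method` plays the role of the method named in "subtrack select";
    `table_values` plays the role of the rows of "subtrack table".
    """
    subtracks = {value: [] for value in table_values}
    for feature in features:
        value = getattr(feature, method)
        if value in subtracks:
            # The feature lands in the subtrack whose row value it matches.
            subtracks[value].append(feature.name)
    return subtracks


genes = [Feature("geneA", "+1"), Feature("geneB", "-1"), Feature("geneC", "+1")]
print(partition(genes, "strand", ["+1", "-1"]))
# {'+1': ['geneA', 'geneC'], '-1': ['geneB']}
```

Features whose value matches no row end up in no subtrack in this sketch; how GBrowse treats such features is not covered here.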

Reloading the browser now gives the track shown on the upper right. When the user clicks on the titlebar where it says "Showing 2 of 2 tracks", they can control the display of the subtracks using the dialog on the lower right.

Figure: Second iteration of strand-specific subtracks
Figure: Second iteration of the subtrack selection dialog box

This is good, but has two aesthetic issues. First, the labels on the subtracks appear as "+1" and "-1" which is not intuitive. Similarly, the strand values in the selection dialog also appear as +1 and -1. We can considerably improve this by attaching human-readable labels to the dimension values and subtracks. Here is an improved configuration file:

[Genes]
feature      = gene
glyph        = gene
database     = sqlite-genes
category     = Genes:Coding
label        = 1
subtrack select = Strand strand
subtrack table  = :Forward +1 ;
                  :Reverse -1
subtrack select labels = +1 "Forward Strand" ;
	                  -1 "Reverse Strand"
key          = Wormbase Genes


We've done two things here. First, we've modified the "subtrack table" option so that each line is preceded with :Name where Name is what we want to appear to the left of the subtrack in the display. The ":" symbol is required in front of the name but will not appear in the display. It can appear anywhere relative to the match items in the option.

Second, we added a "subtrack select labels" option to the stanza. This relabels the selectable dimension values with the desired human-readable labels within the dialog box itself. Notice that subtracks can have different names than their selection labels. In this example, we choose "Forward Strand" for +1 features in the dialog box, but "Forward" for the subtrack name.
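
To make the distinction between subtrack names and selection labels concrete, here is a minimal Python sketch that reads the two option values from the configuration above. It is an illustration under assumptions, not GBrowse's parser: parse_table_rows and parse_select_labels are hypothetical names, and the sketch assumes subtrack names contain no whitespace.

```python
import shlex


def parse_table_rows(option):
    """Split a "subtrack table" value into (subtrack_name, match_values) rows."""
    rows = []
    for line in option.split(";"):
        tokens = line.split()
        if not tokens:
            continue
        # A token starting with ":" is the subtrack name; it may appear
        # anywhere on the line relative to the match values.
        names = [t[1:] for t in tokens if t.startswith(":")]
        values = [t for t in tokens if not t.startswith(":")]
        rows.append((names[0] if names else None, values))
    return rows


def parse_select_labels(option):
    """Map each raw dimension value to its human-readable dialog label."""
    labels = {}
    for line in option.split(";"):
        tokens = shlex.split(line)  # keeps "Forward Strand" as one token
        if len(tokens) >= 2:
            labels[tokens[0]] = tokens[1]
    return labels


table = """:Forward +1 ;
           :Reverse -1"""
select_labels = """+1 "Forward Strand" ;
                   -1 "Reverse Strand" """

print(parse_table_rows(table))
# [('Forward', ['+1']), ('Reverse', ['-1'])]
print(parse_select_labels(select_labels))
# {'+1': 'Forward Strand', '-1': 'Reverse Strand'}
```

The two results are independent lookups: the first names the subtrack in the track display, the second relabels the value in the dialog box.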

The effect of these modifications is shown in the two figures to the right.


Figure: ChIP-seq subtracks
Figure: modENCODE ChIP-seq subtrack selector

Multidimensional Subtracks

The previous example partitioned subtracks on a single dimension. This example will show how to create individually selectable subtracks based on multiple selection dimensions. As an example, we use a track based on the modENCODE ChIP-seq tracks. These have four different dimensions, corresponding to the antibody used to pull down chromatin-bound transcription factors, the organism's developmental stage, the temperature at which the organism was grown, and whether the data set has been validated.

The images below and to the right show a development version of the histone modification ChIP-seq track from the modENCODE project and the dialog box used to select among the subtracks. This track was created with the following configuration:

[ChIP-seq]
subtrack select = Antibody    tag_value antibody    ;
                  Stage       tag_value stage       ;
                  Temperature tag_value temp        ;
                  Confirmed   has_tag   confirmed   ;
subtrack table = H3K4Me3  E0-4h 23  1 * ;
                 H3K4Me3  E4-8h 23  0   ;
                 H3K4Me3  pupae 23  0   ;
                 H3K4Me3  pupae 26  0   ;
                 H3K9Me2  E0-4h 23  0 * ;
                 H3K9Me2  E4-8h 23  1   ;
                 H3K9Me2  pupae 23  0   ;
                 H3K9Me2  pupae 26  1   ;
                 H3K27Me3 E0-4h 23  1 * ;
subtrack select labels = E0-4h "Early embryo" ;
                         E4-8h "Late embryo"  ;
                         pupae "Pupating larvae" ;
brief comment = This track shows modENCODE ChIP-seq characterization of
                histone marks across various stages and growth conditions.

In this case, the "subtrack select" option has four lines, separated by semicolons (the semicolon on the final line is optional). Each line defines a subtrack dimension and has the format described above, consisting of a column label and a Bio::SeqFeature method call. Here, however, all the dimensions are stored in the features' tags (also known as feature "attributes"), which are accessed using the tag_value() and has_tag() methods. To specify which tag we are interested in, a third space-delimited argument gives the tag name. So the Antibody dimension is determined by calling $feature->tag_value('antibody') and the Confirmed dimension is determined by calling $feature->has_tag('confirmed').
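
As a rough illustration of how each dimension is evaluated, the Python sketch below calls the configured method on a feature once per dimension. The Feature class is a toy stand-in for a Bio::SeqFeature object, the function name is hypothetical, and the 1/0 result of has_tag() is an assumption based on the last column of the table above.

```python
class Feature:
    """Toy stand-in for a Bio::SeqFeature object (hypothetical, illustration only)."""

    def __init__(self, tags):
        self.tags = dict(tags)

    def tag_value(self, name):
        # Value of the tag, or None when the tag is absent.
        return self.tags.get(name)

    def has_tag(self, name):
        # Assumed to yield 1 or 0, mirroring the Confirmed column of the table.
        return 1 if name in self.tags else 0


# (column label, method name, tag name), copied from "subtrack select".
SUBTRACK_SELECT = [
    ("Antibody", "tag_value", "antibody"),
    ("Stage", "tag_value", "stage"),
    ("Temperature", "tag_value", "temp"),
    ("Confirmed", "has_tag", "confirmed"),
]


def dimension_values(feature):
    """Call each configured method on the feature; return the results as strings."""
    return tuple(str(getattr(feature, method)(tag))
                 for _, method, tag in SUBTRACK_SELECT)


confirmed = Feature({"antibody": "H3K4Me3", "stage": "E0-4h",
                     "temp": "23", "confirmed": "yes"})
unconfirmed = Feature({"antibody": "H3K4Me3", "stage": "E4-8h", "temp": "23"})
print(dimension_values(confirmed))    # ('H3K4Me3', 'E0-4h', '23', '1')
print(dimension_values(unconfirmed))  # ('H3K4Me3', 'E4-8h', '23', '0')
```

A feature whose tuple equals one row of the subtrack table would be sorted into that subtrack.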

The "subtrack table" option lists each of the possible subtracks. The first four columns correspond to the four dimensions specified by "subtrack select", in the same order: antibody, developmental stage, temperature and confirmation status. In this case we do not specify subtrack labels, for reasons discussed later. Because there are many subtracks, we do not want them all to be displayed by default. The optional asterisk symbol (*), placed anywhere inside a subtrack table line, indicates that this subtrack is to be turned on by default. If no asterisks are present, all subtracks are turned on.
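
The default-visibility rule can be sketched in a few lines of Python. This is a simplified model of the rule as stated above, not the GBrowse implementation; the function names are hypothetical and rows are split naively on semicolons and whitespace.

```python
TABLE = """
H3K4Me3  E0-4h 23  1 * ;
H3K4Me3  E4-8h 23  0   ;
H3K4Me3  pupae 23  0   ;
H3K4Me3  pupae 26  0   ;
H3K9Me2  E0-4h 23  0 * ;
H3K9Me2  E4-8h 23  1   ;
H3K9Me2  pupae 23  0   ;
H3K9Me2  pupae 26  1   ;
H3K27Me3 E0-4h 23  1 * ;
"""


def split_rows(table):
    """Split the option value into rows of whitespace-delimited tokens."""
    return [row.split() for row in table.split(";") if row.strip()]


def default_rows(rows):
    """Rows carrying '*' are on by default; with no '*' anywhere, every row is on."""
    starred = [row for row in rows if "*" in row]
    return starred if starred else list(rows)


rows = split_rows(TABLE)
print(len(rows))                # 9
print(len(default_rows(rows)))  # 3

# With every asterisk removed, all nine subtracks are on by default.
unstarred = [[tok for tok in row if tok != "*"] for row in rows]
print(len(default_rows(unstarred)))  # 9
```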

The "brief comment" option defines a short description of the subtracks that is printed in the subtrack selection dialog box.

Adding subtrack IDs to the Table

If you wish to embed GBrowse in another application, you will probably want to assign IDs to the subtracks so that they can be turned on and off via the GBrowse URL in the same way that whole tracks are turned on and off. To do this, append "=identifier" to each line of the subtrack table like this:

subtrack table = H3K4Me3  E0-4h 23  1 * =100 ;
                 H3K4Me3  E4-8h 23  0   =101 ;
                 H3K4Me3  pupae 23  0   =102 ;
                 H3K4Me3  pupae 26  0   =103 ;
                 H3K9Me2  E0-4h 23  0 * =104 ;
                 H3K9Me2  E4-8h 23  1   =105 ;
                 H3K9Me2  pupae 23  0   =106 ;
                 H3K9Me2  pupae 26  1   =107 ;
                 H3K27Me3 E0-4h 23  1 * =108 ;

The identifier can be any combination of letters and numbers. Its exact position on the line doesn't matter.
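
Because the ":Name", "*" and "=identifier" tokens may appear anywhere on a line, a parser only needs to classify each token by its first character. The sketch below shows one way to do this in Python. It is a hypothetical helper rather than GBrowse's own parser, and it assumes that match values never begin with ":" or "=" and that names contain no spaces.

```python
def parse_subtrack_line(line):
    """Classify the whitespace-delimited tokens of one subtrack table line."""
    row = {"values": [], "name": None, "id": None, "default": False}
    for token in line.split():
        if token == "*":
            row["default"] = True          # subtrack is on by default
        elif token.startswith(":"):
            row["name"] = token[1:]        # label shown beside the subtrack
        elif token.startswith("="):
            row["id"] = token[1:]          # identifier used in URLs
        else:
            row["values"].append(token)    # one match value per dimension
    return row


a = parse_subtrack_line("H3K4Me3  E0-4h 23  1 * =100")
b = parse_subtrack_line("=100 * H3K4Me3 E0-4h 23 1")
print(a == b)     # True: token position does not matter
print(a["id"])    # 100
print(parse_subtrack_line(":Forward +1")["name"])  # Forward
```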

The identifiers can then be used to select subtracks in the GBrowse URL:

  http://your.site/cgi-bin/gb2/gbrowse/elegans/?q=I:1000..2000;l=ChIP-seq/100+102+103

This will select the region between positions 1000 and 2000 on chromosome I, and turn on subtracks 100, 102 and 103 of the ChIP-seq track.
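
An embedding application might assemble such a URL as in the following Python sketch. The function name and its parameters are hypothetical. The string layout simply reproduces the example above (subtrack identifiers joined with "+" after the track name), and no URL escaping is applied.

```python
def subtrack_url(base, region, track, subtrack_ids):
    """Build a GBrowse URL in the form shown in the example above."""
    ids = "+".join(str(i) for i in subtrack_ids)
    return f"{base}?q={region};l={track}/{ids}"


url = subtrack_url(
    "http://your.site/cgi-bin/gb2/gbrowse/elegans/",
    "I:1000..2000",
    "ChIP-seq",
    [100, 102, 103],
)
print(url)
# http://your.site/cgi-bin/gb2/gbrowse/elegans/?q=I:1000..2000;l=ChIP-seq/100+102+103
```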

Labeling Subtracks

Subtracks are labeled on the left side of the panel. If you are using "label_position=left" in your stanza, or the gene glyph with "label_transcripts" set to true, the labels of individual features can collide with the subtrack labels. There are two ways around this problem. One is to turn off subtrack labeling entirely and let each individual feature's label identify its subtrack. This works well with chromosome-wide features such as wiggle tracks when there is only one feature per subtrack (this is illustrated in the modENCODE data set above). The other is to move the subtrack label to the top of the subtrack, where it won't clash with the feature labels. GBrowse will attempt to detect potential clashes and configure the labels for you automatically, but there may be cases where you have to intervene manually.

To implement the first scheme, set the option "group_label" to a false value:

 group_label = 0

Be sure that the label assigned to each feature is sufficiently informative that it can substitute for the subtrack label.

To implement the second scheme, set "group_label_position" to "top":

 group_label_position = top

For compatibility with earlier behavior, GBrowse will automatically set group_label to 0 if the track consists of quantitative data (uses one of the wiggle, xyplot, density or whisker glyphs). To activate subtrack labeling for such a track, set group_label to a true value explicitly.
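
The defaulting rule can be summarized in a short Python sketch. This is a reading of the paragraph above rather than GBrowse's actual logic: the function name is hypothetical, glyphs are treated as quantitative when their name contains one of the four listed words (an assumption), and "" and "0" are taken as false values.

```python
QUANTITATIVE_WORDS = ("wiggle", "xyplot", "density", "whisker")


def effective_group_label(stanza):
    """Return True if subtrack labels would be drawn for this stanza (sketch)."""
    if "group_label" in stanza:
        # An explicit setting always wins; "" and "0" count as false.
        return stanza["group_label"] not in ("", "0")
    glyph = stanza.get("glyph", "")
    # Assumption: substring match identifies quantitative glyphs.
    return not any(word in glyph for word in QUANTITATIVE_WORDS)


print(effective_group_label({"glyph": "gene"}))                              # True
print(effective_group_label({"glyph": "wiggle_xyplot"}))                     # False
print(effective_group_label({"glyph": "wiggle_xyplot", "group_label": "1"})) # True
```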

Hiding Subtracks with no Data

By default, if a subtrack has no data in the currently displayed region, its label will still be printed to show that the subtrack exists. If you prefer, you can set the "hide empty subtracks" option to a true value, in which case the display of empty subtracks will be suppressed.
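
The effect of this option can be pictured with a minimal Python sketch. The function and the sample data are hypothetical and model only the behavior described above, not how GBrowse renders the panel.

```python
def subtracks_to_draw(features_by_subtrack, hide_empty=False):
    """List the subtracks drawn in the current region.

    Empty subtracks are kept (label only) unless hide_empty is true.
    """
    return [name for name, features in features_by_subtrack.items()
            if features or not hide_empty]


# Hypothetical region in which only the forward strand carries a gene.
region = {"Forward": ["geneA"], "Reverse": []}
print(subtracks_to_draw(region))                   # ['Forward', 'Reverse']
print(subtracks_to_draw(region, hide_empty=True))  # ['Forward']
```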