Revision as of 19:04, 22 July 2011 by RSCummings (Talk | contribs)

In JBrowse, the extraData option causes additional data from a data source to be incorporated into the output file. In particular, it is useful when used with the urlTemplate option, because the data it extracts can be used to query annotation databases.

The argument for the extraData option is a JSON object in which the keys are names (strings) and the values are Perl subroutine definitions (also strings). Each subroutine is evaluated for every feature that will be in the track.

To convince yourself of this, switch to your jbrowse directory, and try the following:

$ bin/prepare-refseqs.pl --fasta volvox.fa
$ bin/flatfile-to-json.pl --gff docs/tutorial/data_files/volvox.gff3 --tracklabel ExtraData_NoTrackChanges --type mRNA --extraData '{ "empty_column" : "sub { print(\"$0 is invoking the subroutine you defined.\\n\") }"}' 

The message is printed four times, because there are four features whose type is 'mRNA'. In this simple example, the subroutine does not return anything useful (its return value is just the result of print). Normally, it is used to return data specific to each feature.
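The per-feature evaluation can be pictured with a small standalone Perl sketch. This is not JBrowse's actual code: the %extra_data hash, the @features list, and the compile_extra_data helper are hypothetical stand-ins. The sketch only assumes that each value is a string of Perl source that is compiled once and then called for every feature.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the decoded --extraData argument:
# names mapped to Perl subroutine source (strings).
my %extra_data = (
    the_type => 'sub { return $_[0]->{"type"}; }',
);

# Compile each source string into a code reference (assumed behaviour).
sub compile_extra_data {
    my (%source) = @_;
    my %compiled;
    for my $name (keys %source) {
        my $code = eval $source{$name};
        die "could not compile '$name': $@" unless ref($code) eq 'CODE';
        $compiled{$name} = $code;
    }
    return %compiled;
}

# Mock features; real features carry more fields.
my @features = (
    { type => 'mRNA', name => 'f1' },
    { type => 'mRNA', name => 'f2' },
);

my %subs = compile_extra_data(%extra_data);

# Call every subroutine once per feature and collect the results.
my @rows;
for my $feature (@features) {
    my %row;
    for my $name (keys %subs) {
        $row{$name} = $subs{$name}->($feature);
    }
    push @rows, \%row;
}

print scalar(@rows), " features processed\n";
```

Each entry in @rows then holds one value per name, which mirrors the idea that the extracted data is stored alongside every feature.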

Extracting feature data from a data structure in the underlying code requires understanding how the data is stored in that structure. After a few minor simplifications, each feature object looks like this:

  "attributes" => {
    # attributes are optional; the ones listed here may or may not be defined for a given feature.
    # also, there could be any number of additional attributes.
    "load_id" => [<list of strings>],
    "parent_id" => [<list of strings>],
    "Alias" => [<list of strings>],
    "Note" => [<list of strings>],
  },
  "ref" => <string>,
  "type" => <string>,
  "name" => <string>,
  "phase" => <number>,
  "score" => <number>,
  "start" => <number>,
  "stop" => <number>,
  "strand" => <number>

The extraData subroutine is invoked with a feature object (which has the data structure shown above) as its only argument.

As an example, to get the type for each feature, one could do:

--extraData '{ "the_type" : "sub { return $_[0]->{\"type\"}; }" }'  

or, equivalently,

--extraData '{ "the_type" : "sub { shift->{\"type\"}; }" }'  

I will describe the first syntax, since I think it is more intuitive. $_[0] is the first argument to the subroutine (a reference to the feature object), and the arrow followed by the escaped string in curly braces ("type") looks up the data stored under that key in the feature object. That data is then returned.
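To check that the two forms behave the same way, they can be tried outside JBrowse on a mock feature. The hash below is a hypothetical, reduced stand-in for a real feature object, and the field values are made up for illustration.

```perl
use strict;
use warnings;

# Mock feature object, reduced to a few fields (assumed shape).
my $feature = { type => 'mRNA', name => 'example', start => 100, stop => 200 };

# First form: index into @_ explicitly and return the value.
my $explicit = sub { return $_[0]->{"type"}; };

# Second form: shift removes and returns the first argument; the value of
# the last expression evaluated becomes the subroutine's return value.
my $implicit = sub { shift->{"type"}; };

my $explicit_type = $explicit->($feature);
my $implicit_type = $implicit->($feature);

print "explicit: $explicit_type\n";   # prints "explicit: mRNA"
print "implicit: $implicit_type\n";   # prints "implicit: mRNA"
```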

This is doing almost exactly the same thing as the --getType option in flatfile-to-json.pl. The only difference is that I have chosen to refer to the extracted data as "the_type", and when it is done through --getType, the data is referred to simply as "type". I have only used this type extraction example for demonstration purposes; when you actually want to get the type, you should use --getType because it is more succinct and more easily understood.

Now, let's try to do something useful with --extraData that cannot be done with any other option. Let's extract an attribute.

Here's the command to extract the load_id attribute:

... --extraData '{ "load_id" : "sub { return $_[0]->{\"attributes\"}->{\"load_id\"}[0]; }" }' ...

It turns out that there is a somewhat cleaner way of doing this:

... --extraData '{ "load_id" : "sub { return $_[0]->attributes(\"load_id\"); }" }' ...
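The difference between the two access styles can be sketched with a mock class. MockFeature and its attributes method are hypothetical and written only for this illustration; the real feature class is not shown on this page, and its accessor may behave differently (for example, in scalar context). The load_id value is made up.

```perl
use strict;
use warnings;

# Hypothetical mock of a feature class, for illustration only.
package MockFeature;

sub new {
    my ($class, %fields) = @_;
    my $self = { %fields };
    return bless $self, $class;
}

# Assumed behaviour: return all values stored under the given attribute.
sub attributes {
    my ($self, $tag) = @_;
    my $values = $self->{attributes}{$tag} || [];
    return @$values;
}

package main;

my $feature = MockFeature->new(
    type       => 'mRNA',
    attributes => { load_id => ['example_id_1'], Note => ['example note'] },
);

# Direct hash access: first element of the load_id list.
my $by_hash = $feature->{"attributes"}->{"load_id"}[0];

# Accessor call: the list assignment keeps the first returned value.
my ($by_method) = $feature->attributes("load_id");

print "$by_hash $by_method\n";
```

The accessor hides the internal layout of the feature, which is why it reads as the cleaner of the two.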

To use this data in another option, wrap the name (key) associated with the data in curly braces, e.g.:

... --urlTemplate http://www.google.com/?q={the_type} ...
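The effect of the curly-brace placeholder can be illustrated with a short Perl sketch. The fill_template helper is hypothetical; how JBrowse itself performs the substitution is not shown on this page. The sketch simply assumes that each {name} is replaced by the value extracted under that name.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each {key} in the template with the
# matching extracted value; unknown keys are left untouched.
sub fill_template {
    my ($template, $data) = @_;
    (my $url = $template) =~
        s/\{(\w+)\}/exists($data->{$1}) ? $data->{$1} : "{$1}"/ge;
    return $url;
}

# Extracted data for one feature (made-up value).
my %extracted = ( the_type => 'mRNA' );

my $url = fill_template('http://www.google.com/?q={the_type}', \%extracted);
print "$url\n";   # prints "http://www.google.com/?q=mRNA"
```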

This is the most basic use case of extraData, where data is taken from each feature object and immediately returned as is. With some knowledge of Perl, it would be straightforward to extend this case to map a subroutine over the data, or to combine different types of data.
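As a sketch of those two extensions, the subroutines below combine several fields into one string and map a function over an attribute list. The mock feature and its values are made up, and the names $location and $aliases are hypothetical; only the field names follow the structure described earlier.

```perl
use strict;
use warnings;

# Mock feature (assumed shape, made-up values).
my $feature = {
    ref        => 'ctgA',
    type       => 'mRNA',
    start      => 1050,
    stop       => 9000,
    attributes => { Alias => ['alpha', 'beta'] },
};

# Combine several fields into a single location string.
my $location = sub {
    my ($f) = @_;
    return "$f->{ref}:$f->{start}..$f->{stop}";
};

# Map a function (uc) over every alias, then join the results.
my $aliases = sub {
    my ($f) = @_;
    my $list = $f->{attributes}{Alias} || [];
    return join ',', map { uc } @$list;
};

my $location_text = $location->($feature);
my $alias_text    = $aliases->($feature);

print "$location_text\n";   # prints "ctgA:1050..9000"
print "$alias_text\n";      # prints "ALPHA,BETA"
```

To pass either subroutine to --extraData, its body would be written as a JSON string with the inner double quotes escaped, as in the examples above.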

One final word of caution: when you use the extraData option, the track's data files grow to accommodate the extra data. The larger the files are, the longer they take to load from the server to the client. For this reason, it is wise to use extraData sparingly and to minimize the size of the data extracted from each feature.
