Difference between revisions of "JBrowse Advanced Topics"

From GMOD
#REDIRECT [[JBrowse_Configuration_Guide#Advanced_Topics]]

= Data Format Specification: Lazy Nested Containment List (<code>LazyNCList</code>) Feature Store =
JBrowse uses lazily-loaded nested containment lists (LazyNCLists) as an efficient format for storing feature data in pre-generated static files. A nested containment list is a tree data structure in which the nodes of the tree are intervals (which are themselves features), and the edges connect each feature to the features that lie within its bounds but are not its subfeatures. It has some similarities to an R-tree. For more on NCLists, see [http://bioinformatics.oxfordjournals.org/content/23/11/1386.abstract the Alekseyenko paper].
 
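To illustrate the tree structure, here is a minimal Python sketch (not JBrowse code) that builds a nested containment list from plain <code>(start, end)</code> intervals. It sorts by start ascending and end descending, then uses a stack so that each interval lands in the sublist of the closest preceding interval that contains it. The function name <code>build_nclist</code> and the <code>(start, end, sublist)</code> node layout are hypothetical; JBrowse stores the same kind of tree in the array form described below.

```python
from typing import List, Tuple

# Hypothetical node layout: (start, end, sublist of contained nodes).
Node = Tuple[int, int, list]


def build_nclist(intervals: List[Tuple[int, int]]) -> List[Node]:
    """Build a nested containment list from closed (start, end) intervals."""
    # Start ascending, end descending: a container always sorts before
    # everything it contains.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    top_level: List[Node] = []
    stack: List[Node] = []  # chain of open nodes, each containing the next
    for start, end in ordered:
        node: Node = (start, end, [])
        # Drop open nodes that end before this interval does; they cannot contain it.
        while stack and stack[-1][1] < end:
            stack.pop()
        parent_list = stack[-1][2] if stack else top_level
        parent_list.append(node)
        stack.append(node)
    return top_level


nclist = build_nclist([(1, 100), (10, 20), (15, 18), (50, 150), (60, 70)])
print(nclist)
```

Here <code>(10, 20)</code> nests inside <code>(1, 100)</code> and <code>(15, 18)</code> inside <code>(10, 20)</code>, while <code>(50, 150)</code> overlaps <code>(1, 100)</code> without being contained in it, so it starts a new top-level entry.
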
This data format is currently used in JBrowse 1.3 for tracks of type <code>FeatureTrack</code>; the code that reads it is in <code>SeqFeatureStore/NCList.js</code> and <code>ArrayRepr.js</code>.
 
The LazyNCList format can be broken down into two distinct subformats: the LazyNCList itself, and the array-based JSON representation of the features themselves.
 
== Array Representation (<code>ArrayRepr</code>) ==
 
For speed and memory efficiency, JBrowse feature JSON represents features as arrays instead of objects. The array form is much more compact, because attribute names are not repeated in every feature (saving a lot of disk space), and many browsers optimize JavaScript <code>Array</code> objects significantly better than more general objects.
 
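A quick way to see the space saving: the sketch below serializes one made-up feature both ways with Python's standard <code>json</code> module. In the array form the attribute names are not written out at all; they are stored once in the class definition, so the saving recurs for every feature in the store. The feature values are hypothetical.

```python
import json

# A hypothetical feature; attribute names follow the example "classes" entry
# in trackData.json, values are made up.
names = ["Start", "End", "Strand", "Type", "Name"]
values = [12962, 13500, 1, "gene", "example_gene"]

as_object = dict(zip(names, values))
as_array = [0] + values  # class index 0, then values in the class's attribute order

obj_json = json.dumps(as_object, separators=(",", ":"))
arr_json = json.dumps(as_array, separators=(",", ":"))
print(len(obj_json), "bytes as an object vs", len(arr_json), "bytes as an array")
```
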
Each feature is represented as an array of the form <code>[ class, data, data, ... ]</code>, where <code>class</code> is an integer index into the store's <code>classes</code> array (more on that in the next section). Each element of the <code>classes</code> array is an ''array representation'' that defines the meaning of each element in the feature array.
 
An '''array representation''' specification is encoded in JSON as follows (comments added):
 
{
  "attributes": [                  // array of attribute names for this representation
      "AttributeNameForIndex1",
      "AttributeNameForIndex2",
      ...
  ],
  "isArrayAttr": {                  // list of which attributes are themselves arrays
      "AttributeNameForIndexN": 1,
      ...
  }
}
 
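The following Python sketch shows how a reader might turn an array-form feature back into named attributes, given a store's <code>classes</code> list. It assumes, as the attribute names above suggest, that <code>feature[0]</code> is the class index and <code>feature[i]</code> holds the attribute at position <code>i</code> of the class's <code>attributes</code> list (counting from 1), and that array attributes such as <code>Subfeatures</code> contain nested array-form features. <code>decode_feature</code> is a hypothetical helper, not the API of <code>ArrayRepr.js</code>.

```python
from typing import Any, Dict, List


def decode_feature(feature: List[Any], classes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert an array-form feature into a dict of attribute name -> value."""
    cls = classes[feature[0]]  # feature[0] is the class index
    array_attrs = cls.get("isArrayAttr", {})
    decoded: Dict[str, Any] = {}
    # attributes[0] pairs with feature[1], attributes[1] with feature[2], ...
    for name, value in zip(cls["attributes"], feature[1:]):
        if array_attrs.get(name) and isinstance(value, list):
            # Assumption: array attributes hold nested array-form features.
            value = [decode_feature(sub, classes) for sub in value]
        decoded[name] = value
    return decoded


# Two hypothetical classes: a parent with subfeatures, and a plain child.
classes = [
    {"attributes": ["Start", "End", "Strand", "Type", "Subfeatures"],
     "isArrayAttr": {"Subfeatures": 1}},
    {"attributes": ["Start", "End", "Strand", "Type"]},
]
mrna = [0, 100, 900, 1, "mRNA", [[1, 100, 300, 1, "exon"], [1, 600, 900, 1, "exon"]]]
print(decode_feature(mrna, classes))
```
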
== Lazy Nested-Containment Lists (<code>LazyNCList</code>) ==
 
A JBrowse LazyNCList is a nested containment list tree structure stored as one JSON file that contains the root node of the tree, plus zero or more "lazy" JSON files that contain subtrees of the main tree.  These subtree files are lazily fetched: that is, they are only fetched by JBrowse when they are needed to display a certain genomic region.
 
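The lazy-fetching behavior can be sketched in Python as follows. In this simplified, hypothetical model each top-level entry is a <code>(start, end, chunk)</code> stub, and each chunk is a flat list of features standing in for one lazily-fetched <code>lf-*.json</code> subtree file (real subtree files contain nested lists, not flat ones). A chunk is loaded only when a query overlaps its stub, and is cached afterwards. The class name, the <code>load</code> callback and the in-memory <code>CHUNK_FILES</code> dict are assumptions made for illustration; the stub coordinates are borrowed from the example <code>trackData.json</code> below.

```python
from typing import Callable, Dict, List, Tuple

Feature = Tuple[int, int, str]  # (start, end, name)

# Hypothetical stand-ins for two lazily-fetched subtree files.
CHUNK_FILES: Dict[int, List[Feature]] = {
    1: [(12962, 50000, "a"), (100000, 221730, "b")],
    2: [(220579, 300000, "c"), (400000, 454457, "d")],
}


class LazyStore:
    """Top-level (start, end, chunk) stubs whose chunks load on first use."""

    def __init__(self, stubs: List[Tuple[int, int, int]],
                 load: Callable[[int], List[Feature]]):
        self.stubs = stubs
        self.load = load
        self.cache: Dict[int, List[Feature]] = {}
        self.fetched: List[int] = []  # chunks actually loaded, in order

    def _chunk(self, chunk: int) -> List[Feature]:
        if chunk not in self.cache:
            self.fetched.append(chunk)
            self.cache[chunk] = self.load(chunk)
        return self.cache[chunk]

    def features_in(self, qstart: int, qend: int) -> List[str]:
        """Names of features overlapping the closed range [qstart, qend]."""
        hits: List[str] = []
        for start, end, chunk in self.stubs:
            if end < qstart or start > qend:
                continue  # subtree lies outside the query, so it is never fetched
            for fstart, fend, name in self._chunk(chunk):
                if fend >= qstart and fstart <= qend:
                    hits.append(name)
        return hits


store = LazyStore([(12962, 221730, 1), (220579, 454457, 2)],
                  lambda chunk: CHUNK_FILES[chunk])
print(store.features_in(20000, 60000), store.fetched)
```

Querying 20000..60000 touches only chunk 1; chunk 2 is loaded the first time a query overlaps 220579..454457.
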
On disk, the files in a LazyNCList feature store look like this:
 
  # stats, metadata, and nclist root node
  data/tracks/<track_label>/<refseq_name>/trackData.json
  # lazily-loaded nclist subtrees
  data/tracks/<track_label>/<refseq_name>/lf-<chunk_number>.json
  # precalculated feature densities
  data/tracks/<track_label>/<refseq_name>/hist-<bin_size>-<chunk_number>.json
  ...
 
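Paths to the subtree files can be derived from the <code>urlTemplate</code> value stored in <code>trackData.json</code> (<code>"lf-{Chunk}.json"</code> in the example below). This Python sketch assumes the template is resolved relative to the track's per-reference-sequence directory and that <code>{Chunk}</code> is the only placeholder; the helper name <code>expand_template</code> and the <code>ExampleFeatures</code> track label are hypothetical.

```python
import posixpath


def expand_template(track_label: str, refseq: str, url_template: str, chunk: int) -> str:
    """Fill {Chunk} in a urlTemplate and place it under the track's refseq directory.

    Assumption: templates are relative to data/tracks/<track_label>/<refseq_name>/.
    """
    directory = posixpath.join("data", "tracks", track_label, refseq)
    return posixpath.join(directory, url_template.replace("{Chunk}", str(chunk)))


print(expand_template("ExampleFeatures", "ctgA", "lf-{Chunk}.json", 2))
```

The same helper also covers the chunked histogram files, for example the <code>hist-100000-{Chunk}.json</code> template.
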
The <code>trackData.json</code> file is formatted as follows (comments added):
 
{
    "featureCount" : 4293,          // total number of features in this store
    "histograms" : {                // information about precalculated feature-frequency histograms
      "meta" : [
          {                        // description of each available bin-size for precalculated feature frequencies
            "basesPerBin" : "100000",
            "arrayParams" : {
                "length" : 904,
                "chunkSize" : 10000,
                "urlTemplate" : "hist-100000-{Chunk}.json"
            }
          },
          ...                      // and so on for each bin size
      ],
      "stats" : [
          {                          // stats about each precalculated set of binned feature frequencies
            "basesPerBin" : "100000", // bin size in bp 
            "max" : 51,              // max features per bin
            "mean" : 4.93030973451327 // mean features per bin
          },
          ...
      ]
    },
    "intervals" : {
      "classes" : [                // classes: array representations used in this feature data (see ArrayRepr section above)
          {
            "isArrayAttr" : {
                "Subfeatures" : 1
            },
            "attributes" : [
                "Start",
                "End",
                "Strand",
                "Source",
                "Phase",
                "Type",
                "Id",
                "Name",
                "Subfeatures"
            ]
          },
          ...
          {                        // the last arrayrepr class is the "lazyClass": fake features that point to other files
            "isArrayAttr" : {
                "Sublist" : 1
            },
            "attributes" : [
                "Start",
                "End",
                "Chunk"
            ]
          }
      ],
      "nclist" : [
          [
            2,                    // arrayrepr class 2
            12962,                // "Start" minimum coord of features in this subtree
            221730,              // "End"  maximum coord of features in this subtree
            1                    // "Chunk" (indicates this subtree is in lf-1.json)
          ],
          [
            2,                    // arrayrepr class 2
            220579,              // "Start" minimum coord of features in this subtree
            454457,              // "End"  maximum coord of features in this subtree
            2                    // "Chunk" (indicates this subtree is in lf-2.json)
          ],
          ...
      ],
      "lazyClass" : 2,            // index of arrayrepr class that points to a subtree
      "maxEnd" : 90303842,              // maximum coordinate of features in this store
      "urlTemplate" : "lf-{Chunk}.json", // format for lazily-fetched subtree files
      "minStart" : 12962                // minimum coordinate of features in this store
    },
    "formatVersion" : 1
}
 
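The <code>arrayParams</code> of a histogram entry describe how its bins are split across files. Below is a minimal Python sketch of locating the bin for a given coordinate, assuming bin ''i'' covers bases <code>[i * basesPerBin, (i + 1) * basesPerBin)</code> counted from 0 and chunk ''c'' holds bins <code>c * chunkSize</code> through <code>(c + 1) * chunkSize - 1</code>. <code>hist_location</code> is a hypothetical helper, not a JBrowse function.

```python
# The first "histograms.meta" entry from the example trackData.json above.
meta = {
    "basesPerBin": "100000",  # stored as a string in the example
    "arrayParams": {
        "length": 904,
        "chunkSize": 10000,
        "urlTemplate": "hist-100000-{Chunk}.json",
    },
}


def hist_location(meta: dict, position: int) -> tuple:
    """Return (chunk file name, index within chunk) for the bin covering `position`.

    Assumes bin i covers [i * basesPerBin, (i + 1) * basesPerBin) and chunk c
    holds bins c * chunkSize .. (c + 1) * chunkSize - 1.
    """
    bases_per_bin = int(meta["basesPerBin"])
    params = meta["arrayParams"]
    bin_index = position // bases_per_bin
    if not 0 <= bin_index < params["length"]:
        raise ValueError(f"position {position} is outside the histogram")
    chunk, offset = divmod(bin_index, params["chunkSize"])
    return params["urlTemplate"].replace("{Chunk}", str(chunk)), offset


print(hist_location(meta, 90303842))
```

With these example values, the 904 bins of 100,000 bp are enough to reach <code>maxEnd</code> (90,303,842), and they all fit in chunk 0, so every lookup resolves to <code>hist-100000-0.json</code>.
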
= Data Format Specification: Fixed-Resolution Tiled Image Store =
 
JBrowse can display tracks composed of precalculated image tiles, stretching the tile images horizontally when necessary. The JBrowse Volvox example data includes a wiggle data track that is converted to image tiles with the included <code>wig2png</code> program, but any image tiles can be displayed as long as they are laid out in this format.
 
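How much a tile has to be stretched follows from simple arithmetic: a tile covers <code>basesPerTile</code> bases, which at the current zoom occupy <code>basesPerTile * pixelsPerBase</code> screen pixels, while the image itself is <code>tileWidth</code> pixels wide. The Python helper below (a hypothetical name; example values from the <code>trackData.json</code> structure further down) computes that ratio; a value above 1 means the tile is stretched.

```python
def tile_scale(tile_width_px: int, bases_per_tile: int, px_per_base: float) -> float:
    """Horizontal scale factor for drawing one precalculated image tile (hypothetical helper)."""
    displayed_px = bases_per_tile * px_per_base  # screen width the tile's bases need
    return displayed_px / tile_width_px


# Example values: tileWidth 2000 px, basesPerTile 2000 (one base per image pixel).
print(tile_scale(2000, 2000, 2.5))
```
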
The files for a tiled image track are structured by default like this:
 
  data/tracks/<track_label>/<refseq_name>/trackData.json
  data/tracks/<track_label>/<refseq_name>/<zoom_level_urlPrefix>/<index>.png
  ... (and so on, for many more PNG image files)
 
The PNG files are the image tiles themselves, and <code>trackData.json</code> contains metadata about the track in JSON format, including the available zoom levels, the width and height of the image tiles, their base resolution (number of reference sequence base pairs per image tile), and statistics about the data (such as the global minimum and maximum of wiggle data).
 
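A tile's file name can be computed from a base coordinate using the <code>urlPrefix</code> and <code>basesPerTile</code> of a zoom level. This Python sketch assumes tiles are numbered from 0 and that tile ''i'' covers bases <code>[i * basesPerTile, (i + 1) * basesPerTile)</code>; neither rule is stated in this specification, and <code>tile_path</code> is a hypothetical helper.

```python
import posixpath


def tile_path(track_label: str, refseq: str, url_prefix: str,
              bases_per_tile: int, position: int) -> str:
    """Path of the PNG tile covering a 0-based base position.

    Assumption: tiles are numbered from 0, each spanning bases_per_tile bases.
    """
    index = position // bases_per_tile
    # url_prefix ends in "/" in the example ("1/"), so it is concatenated directly.
    return posixpath.join("data", "tracks", track_label, refseq, f"{url_prefix}{index}.png")


print(tile_path("volvox_microarray.wig", "ctgA", "1/", 2000, 4500))
```
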
The structure of the <code>trackData.json</code> file is:
 
{
  "tileWidth": 2000,            // width of all image tiles, in pixels
  "stats" : {                  // any statistics about the data being represented
      "global_min": 100,
      "global_max": 899
  },
  "zoomLevels" : [              // array describing what resolution levels are available
      {                          // in the precalculated image tiles
        "urlPrefix" : "1/",
        "height" : 100,
        "basesPerTile" : 2000
      },
      ... (and so on, for zoom levels in order of decreasing resolution / increasing bases per tile)
  ]
}
 
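Given the <code>zoomLevels</code> list, a viewer has to choose which set of tiles to draw. One plausible rule, sketched here in Python and not necessarily the one used by <code>TiledImageStore/Fixed.js</code>, is to compute how many bases a <code>tileWidth</code>-pixel tile should span at the current scale and pick the level whose <code>basesPerTile</code> is closest, which keeps the stretching factor near 1. Only the first level below comes from the example; the <code>"2/"</code> and <code>"3/"</code> levels and the name <code>pick_zoom_level</code> are hypothetical.

```python
from typing import Dict, List


def pick_zoom_level(levels: List[Dict], tile_width: int, px_per_base: float) -> Dict:
    """Hypothetical rule: pick the level whose basesPerTile is closest to the ideal span."""
    desired_bases = tile_width / px_per_base  # bases that would exactly fill tile_width pixels
    return min(levels, key=lambda level: abs(level["basesPerTile"] - desired_bases))


zoom_levels = [
    {"urlPrefix": "1/", "height": 100, "basesPerTile": 2000},  # from the example
    {"urlPrefix": "2/", "height": 100, "basesPerTile": 4000},  # hypothetical
    {"urlPrefix": "3/", "height": 100, "basesPerTile": 8000},  # hypothetical
]
print(pick_zoom_level(zoom_levels, 2000, 0.3)["urlPrefix"])
```
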
For a working example, see the contents of <code>sample_data/json/volvox/tracks/volvox_microarray.wig/ctgA</code> after the Volvox wiggle sample data has been formatted.
 
The code for working with this tiled image format in JBrowse 1.3 is in <code>TiledImageStore/Fixed.js</code>.

Latest revision as of 20:30, 8 August 2012