Difference between revisions of "GBrowse 1 Configuration HOWTO"

(The [GENERAL] Section: Copied updates to stylesheet and merge searches general section options from the GBrowse/Configure HOWTO page before deleting that page.)
*'''stylesheet'''

Location of the stylesheet used to create the GBrowse look and feel. You can give a relative address (e.g. "gbrowse.css"), in which case GBrowse will look relative to the URL specified by "gbrowse root." Alternatively, you can specify an absolute URL (e.g. "/stylesheets/mysite.css").

''New in version 1.70:'' You can specify multiple stylesheets by separating them by spaces. You can also specify a media type by following this format:

<pre><nowiki>
stylesheet = http://www.example.com/stylesheets/lowres.css(screen)
              http://www.example.com/stylesheets/audio.css(audio)
              http://www.example.com/stylesheets/hires.css(paper)
</nowiki></pre>

*'''buttons'''

*'''disable wildcards'''

Ordinarily a user can type in "YAL*" to find all features with names beginning with "YAL". This option, if set to a true value, disables wildcard searching.

*'''merge searches'''

If this is set to true (the default), then features with the same name, chromosome and type will be merged into one feature during searches. If this is set to false (zero), then no merging will occur. Set this to true (1) if searches are returning many results, and to false (0) if searches are returning too few. (This option was added in version 1.70).

*'''truecolor'''

Revision as of 19:55, 29 December 2008


This document provides information on configuring the Generic Genome Browser (GBrowse), part of the GMOD Project.

CREATING NEW DATABASES FROM SCRATCH

This section describes how to create new annotation databases from scratch. There are three main database types:

  1. GFF version 2 databases (the oldest and best tested, but unable to represent genes with alternative splicing patterns).
  2. GFF version 3 databases (slightly faster than version 2 and able to represent multilevel genes).
  3. Chado databases (slower than the GFF databases, but very feature-rich).

GFF3 Databases

The name GFF3 stands for "Generic Feature Format, version 3"; the format evolved from the GFF and GFF2 formats. Its full specification can be found at The Sequence Ontology Web Site. It was designed to be a light-and-easy way of describing most genomic annotations, ranging from simple one-element features to complex multipart features, such as operons and their regulatory and structural elements. To run GBrowse off GFF3-based annotations, you will create a set of GFF3 files in a directory and then point GBrowse at that directory. Alternatively, for better performance and scalability, you can load the GFF3 files into an indexed database and configure the GBrowse server to run off the database.

The GFF3 Format

GFF3 format is a flat tab-delimited file. The first line of the file is a comment that identifies the file format and version. This is followed by a series of data lines, each one of which corresponds to an annotation. DNA and protein sequences can be interspersed with the annotation lines using FASTA format. Here is a miniature GFF3 file:

##gff-version 3
ctg123 . exon            1300  1500  .  +  .  ID=exon00001
ctg123 . exon            1050  1500  .  +  .  ID=exon00002
ctg123 . exon            3000  3902  .  +  .  ID=exon00003
ctg123 . exon            5000  5500  .  +  .  ID=exon00004
ctg123 . exon            7000  9000  .  +  .  ID=exon00005
##FASTA
>ctg123
cttctgggcgtacccgattctcggagaacttgccgcaccattccgccttg
tgttcattgctgcctgcatgttcattgtctacctcggctacgtgtggcta
tctttcctcggtgccctcgtgcacggagtcgagaaaccaaagaacaaaaa
aagaaattaaaatatttattttgctgtggtttttgatgtgtgttttttat
aatgatttttgatgtgaccaattgtacttttcctttaaatgaaatgtaat
cttaaatgtatttccgacgaattcgaggcctgaaaagtgtgacgccattc
gtatttgatttgggtttactatcgaataatgagaattttcaggcttaggc
ttaggcttaggcttaggcttaggcttaggcttaggcttaggcttaggctt
aggcttaggcttaggcttaggcttaggcttaggcttaggcttaggcttag
aatctagctagctatccgaaattcgaggcctgaaaagtgtgacgccattc

The ##gff-version and ##FASTA lines are headers that describe the type of data that follows. The annotation section is introduced by "##gff-version 3", and the sequence section is introduced by "##FASTA". The sequence section is optional.

The 9 columns of the annotation section are as follows:

Column 1: "seqid"

The ID of the landmark used to establish the coordinate system for the current feature. IDs may contain any characters, but must escape any characters not in the set [a-zA-Z0-9.:^*$@!+_?-|]. In particular, IDs may not contain unescaped whitespace and must not begin with an unescaped ">".
To escape a character in this, or any of the other GFF3 fields, replace it with the percent sign followed by its hexadecimal representation. For example, ">" becomes "%3E". See URL Encoding (or: 'What are those "%20" codes in URLs?') for details.
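
The escaping rule can be sketched in a few lines of Python. This is only an illustration, not part of GBrowse: the function name escape_seqid is hypothetical, and the safe set is the character list given above read literally (with "-" and "|" as ordinary characters, not as a range).

```python
import string

# Characters listed above as safe in a seqid, read literally.
# Assumption: "?-|" means the three characters "?", "-" and "|".
SAFE = set(string.ascii_letters + string.digits + ".:^*$@!+_?-|")

def escape_seqid(text):
    """Percent-encode every character that is not in the safe set."""
    out = []
    for ch in text:
        if ch in SAFE:
            out.append(ch)
        else:
            # Encode as UTF-8 so a non-ASCII character becomes several %XX escapes.
            out.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
    return "".join(out)

print(escape_seqid(">ctg 123"))
```

Because "%" itself is not in the safe set, it is escaped as "%25", so an already-escaped string should not be passed through the function a second time.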

Column 2: "source"

The source is a free text qualifier intended to describe the algorithm or operating procedure that generated this feature. Typically this is the name of a piece of software, such as "Genescan" or a database name, such as "Genbank." In effect, the source is used to extend the feature ontology by adding a qualifier to the type creating a new composite type that is a subclass of the type in the type column. It is not necessary to specify a source. If there is no source, put a "." (a period) in this field.

Column 3: "type"

The type of the feature (previously called the "method"). This is constrained to be either: (a) a term from the "lite" sequence ontology, SOFA; or (b) a SOFA accession number. The latter alternative is distinguished using the syntax SO:000000. This field is required.

Columns 4 & 5: "start" and "end"

The start and end of the feature, in 1-based integer coordinates, relative to the landmark given in column 1. Start is always less than or equal to end.
For zero-length features, such as insertion sites, start equals end and the implied site is to the right of the indicated base in the direction of the landmark. These fields are required.

Column 6: "score"

The score of the feature, a floating point number. As in earlier versions of the format, the semantics of the score are ill-defined. It is strongly recommended that E-values be used for sequence similarity features, and that P-values be used for ab initio gene prediction features. If there is no score, put a "." (a period) in this field.

Column 7: "strand"

The strand of the feature. + for positive strand (relative to the landmark), - for minus strand, and . for features that are not stranded. In addition, ? can be used for features whose strandedness is relevant, but unknown.

Column 8: "phase"

For features of type "CDS", the phase indicates where the feature begins with reference to the reading frame. The phase is one of the integers 0, 1, or 2, indicating the number of bases that should be removed from the beginning of this feature to reach the first base of the next codon. In other words, a phase of "0" indicates that the next codon begins at the first base of the region described by the current line, a phase of "1" indicates that the next codon begins at the second base of this region, and a phase of "2" indicates that the codon begins at the third base of this region. This is NOT to be confused with the frame, which is simply start modulo 3. If there is no phase, put a "." (a period) in this field.
For forward strand features, phase is counted from the start field. For reverse strand features, phase is counted from the end field.
The phase is required for all CDS features.
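
As a rough illustration of the phase rule, the following Python sketch derives the phase of each CDS segment of one transcript from the segment lengths. It assumes that the first segment in translation order begins on a codon boundary (phase 0), which is not true of partial gene models; the function name cds_phases is hypothetical. The coordinates are those of the EDEN.3 CDS segments in the protein-coding gene example in this document.

```python
def cds_phases(segments, strand="+"):
    """Return {(start, end): phase} for the CDS segments of one transcript.

    segments: (start, end) pairs in 1-based inclusive coordinates.
    Assumption: the first segment in translation order has phase 0.
    """
    # Translation order: ascending start on "+", descending end on "-".
    if strand == "+":
        ordered = sorted(segments, key=lambda seg: seg[0])
    else:
        ordered = sorted(segments, key=lambda seg: seg[1], reverse=True)

    phases = {}
    phase = 0
    for start, end in ordered:
        phases[(start, end)] = phase
        length = end - start + 1
        # After skipping `phase` bases and reading whole codons, the bases
        # left over decide how many bases the next segment must skip.
        phase = (3 - (length - phase) % 3) % 3
    return phases

eden3 = [(3301, 3902), (5000, 5500), (7000, 7600)]
for segment, phase in cds_phases(eden3).items():
    print(segment, phase)
```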

Column 9: "attributes"

A list of feature attributes in the format tag=value. Multiple tag=value pairs are separated by semicolons. URL escaping rules are used for tags or values containing the following characters: ",=;". Spaces are allowed in this field, but tabs must be replaced with the %09 URL escape. This field is not required.

Column 9 Tags

Column 9 tags have predefined meanings:

ID
Indicates the unique identifier of the feature. IDs must be unique within the scope of the GFF file.
Name
Display name for the feature. This is the name to be displayed to the user. Unlike IDs, there is no requirement that the Name be unique within the file.
Alias
A secondary name for the feature. It is suggested that this tag be used whenever a secondary identifier for the feature is needed, such as locus names and accession numbers. Unlike ID, there is no requirement that Alias be unique within the file.
Parent
Indicates the parent of the feature. A parent ID can be used to group exons into transcripts, transcripts into genes, and so forth. A feature may have multiple parents. Parent can *only* be used to indicate a "part of" relationship.
Target
Indicates the target of a nucleotide-to-nucleotide or protein-to-nucleotide alignment. The format of the value is "target_id start end [strand]", where strand is optional and may be "+" or "-". If the target_id contains spaces, they must be escaped as hex escape %20.
Gap
The alignment of the feature to the target if the two are not collinear (e.g. contain gaps). The alignment format is taken from the CIGAR format described in the Exonerate documentation. See the GFF3 specification for more information.
Derives_from
Used to disambiguate the relationship between one feature and another when the relationship is a temporal one rather than a purely structural "part of" one. This is needed for polycistronic genes. See the GFF3 specification for more information.
Note
A free text note.
Dbxref
A database cross reference. See the GFF3 specification for more information.
Ontology_term
A cross reference to an ontology term. See the GFF3 specification for more information.

Multiple attributes of the same type are indicated by separating the values with the comma "," character, as in:

Parent=AF2312,AB2812,abc-3

Note that attribute names are case sensitive. "Parent" is not the same as "parent".

All attributes that begin with an uppercase letter are reserved for later use. Attributes that begin with a lowercase letter can be used freely by applications. You can stash any semi-structured data into the database by using one or more unreserved (lowercase) tags.
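
The attribute rules above (semicolon-separated tag=value pairs, comma-separated multiple values, URL escapes) can be sketched as a small parser. This is an illustrative sketch, not the parser used by GBrowse or BioPerl; parse_attributes is a hypothetical name, and the Note value in the usage line is invented for the example.

```python
from urllib.parse import unquote

def parse_attributes(column9):
    """Parse a GFF3 attributes column into {tag: [value, ...]}."""
    attributes = {}
    if column9 in ("", "."):
        return attributes
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        # Split on commas before unescaping, so that an escaped comma (%2C)
        # stays inside a single value.
        values = [unquote(v) for v in value.split(",")]
        attributes.setdefault(unquote(tag), []).extend(values)
    return attributes

attrs = parse_attributes("ID=mrna0001;Parent=AF2312,AB2812,abc-3;Note=kinase%2C putative")
print(attrs["Parent"])
print(attrs["Note"])
```

Tags are stored exactly as written, so a lookup for "parent" will not find a "Parent" attribute, in keeping with the case-sensitivity rule.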

Nesting Features

Many genomic features are discontinuous and have multiple subparts. GFF3 represents such features by linking the parts together with the Parent tag. For example, to represent an mRNA transcript that has five exons, we could write this:

##gff-version 3
ctg123 . mRNA            1300  9000  .  +  .  ID=mrna0001;Name=sonichedgehog
ctg123 . exon            1300  1500  .  +  .  ID=exon00001;Parent=mrna0001
ctg123 . exon            1050  1500  .  +  .  ID=exon00002;Parent=mrna0001
ctg123 . exon            3000  3902  .  +  .  ID=exon00003;Parent=mrna0001
ctg123 . exon            5000  5500  .  +  .  ID=exon00004;Parent=mrna0001
ctg123 . exon            7000  9000  .  +  .  ID=exon00005;Parent=mrna0001

The first feature is an mRNA that extends from position 1300 to 9000 in genomic coordinates. It has an ID of "mrna0001" and a human-readable name of "sonichedgehog" (note that the ID and the Name are not the same thing). This is followed by five exon features, each of which is linked to the mRNA using a Parent tag. When GBrowse displays this transcript, it will display each of the exons linked together by a solid line. The entire set can be found by searching for the name "sonichedgehog."

The ID is really only important for linking features together. If a feature does not have any subparts, then it does not formally need an ID. Thus, we could simplify this by removing all the exon IDs:

##gff-version 3
ctg123 . mRNA            1300  9000  .  +  .  ID=mrna0001;Name=sonichedgehog
ctg123 . exon            1300  1500  .  +  .  Parent=mrna0001
ctg123 . exon            1050  1500  .  +  .  Parent=mrna0001
ctg123 . exon            3000  3902  .  +  .  Parent=mrna0001
ctg123 . exon            5000  5500  .  +  .  Parent=mrna0001
ctg123 . exon            7000  9000  .  +  .  Parent=mrna0001

Multiple levels of nesting are allowed. If this transcript is part of an operon, then we can add another level of nesting:

##gff-version 3
ctg123 . operon          1300 15000  .  +  .  ID=operon001;Name=superOperon
ctg123 . mRNA            1300  9000  .  +  .  ID=mrna0001;Parent=operon001;Name=sonichedgehog
ctg123 . exon            1300  1500  .  +  .  Parent=mrna0001
ctg123 . exon            1050  1500  .  +  .  Parent=mrna0001
ctg123 . exon            3000  3902  .  +  .  Parent=mrna0001
ctg123 . exon            5000  5500  .  +  .  Parent=mrna0001
ctg123 . exon            7000  9000  .  +  .  Parent=mrna0001
ctg123 . mRNA           10000 15000  .  +  .  ID=mrna0002;Parent=operon001;Name=subsonicsquirrel
ctg123 . exon           10000 12000  .  +  .  Parent=mrna0002
ctg123 . exon           14000 15000  .  +  .  Parent=mrna0002
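
To make the nesting concrete, the following Python sketch reads a shortened, tab-delimited version of the operon example and collects, for every ID, the features that name it as their Parent. It is a simplified illustration (no URL unescaping, no validation), and build_tree is a hypothetical name.

```python
from collections import defaultdict

def build_tree(gff3_text):
    """Map each parent ID to the (type, start, end) rows that list it as Parent."""
    children = defaultdict(list)
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and ## directives
        cols = line.split("\t")
        attrs = dict(pair.split("=", 1) for pair in cols[8].split(";") if pair)
        # A feature may list several parents, separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                children[parent].append((cols[2], int(cols[3]), int(cols[4])))
    return children

SAMPLE = "\n".join("\t".join(row) for row in [
    ("ctg123", ".", "operon", "1300", "15000", ".", "+", ".", "ID=operon001;Name=superOperon"),
    ("ctg123", ".", "mRNA", "1300", "9000", ".", "+", ".", "ID=mrna0001;Parent=operon001;Name=sonichedgehog"),
    ("ctg123", ".", "exon", "1300", "1500", ".", "+", ".", "Parent=mrna0001"),
    ("ctg123", ".", "mRNA", "10000", "15000", ".", "+", ".", "ID=mrna0002;Parent=operon001;Name=subsonicsquirrel"),
    ("ctg123", ".", "exon", "10000", "12000", ".", "+", ".", "Parent=mrna0002"),
])

tree = build_tree(SAMPLE)
for parent, parts in tree.items():
    print(parent, parts)
```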

Discontinuous Features

In addition to nested features, another common type of genomic annotation is the discontinuous feature in which a single feature spans multiple discontinuous portions of the genome. The primary example is an alignment, such as a cDNA sequence that has been aligned to genomic sequence. GFF3 deals with these features by representing each continuous segment as a distinct row, and then giving each segment the same ID to tie them together. For example:

ctg123 example match 26122 26126 . + . ID=match001
ctg123 example match 26497 26869 . + . ID=match001
ctg123 example match 27201 27325 . + . ID=match001
ctg123 example match 27372 27433 . + . ID=match001
ctg123 example match 27565 27565 . + . ID=match001

Note that this is distinct from the nested features we looked at in the previous section. In the former case, there is a single parent feature and multiple child features that are linked to the parent via a Parent tag. The IDs of the children are distinct from each other (or absent altogether). In the latter case, each segment of the discontinuous feature has the same ID. There is no parent.
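
By contrast with the Parent-based nesting, a discontinuous feature is reassembled simply by collecting the rows that share an ID. A minimal Python sketch, using the match001 segments above as already-parsed tuples (the function name group_segments and the tuple layout are assumptions made for the example):

```python
from collections import defaultdict

def group_segments(rows):
    """Group (seqid, start, end, feature_id) rows that share an ID into one feature."""
    features = defaultdict(list)
    for seqid, start, end, feature_id in rows:
        features[(seqid, feature_id)].append((start, end))
    # Report each feature as its overall span plus its sorted segments.
    return {
        key: {
            "span": (min(s for s, _ in segs), max(e for _, e in segs)),
            "segments": sorted(segs),
        }
        for key, segs in features.items()
    }

rows = [
    ("ctg123", 26122, 26126, "match001"),
    ("ctg123", 26497, 26869, "match001"),
    ("ctg123", 27201, 27325, "match001"),
    ("ctg123", 27372, 27433, "match001"),
    ("ctg123", 27565, 27565, "match001"),
]
grouped = group_segments(rows)
print(grouped[("ctg123", "match001")]["span"])
```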

Protein-Coding Genes

We'll now look at how to represent several common cases, starting with protein-coding genes.

The most general way of representing a protein-coding gene is the so-called "three-level gene." The top level is a feature of type "gene" which bundles up the gene's transcripts and regulatory elements. Beneath this level are one or more transcripts of type "mRNA". This level can also accommodate promoters and other cis-regulatory elements. At the third level are the components of the mRNA transcripts, most commonly CDS coding segments and UTRs. This example shows how to represent a gene named "EDEN" which has three alternatively-spliced mRNA transcripts:

ctg123 example gene            1050 9000 . + . ID=EDEN;Name=EDEN;Note=protein kinase
ctg123 example mRNA            1050 9000 . + . ID=EDEN.1;Parent=EDEN;Name=EDEN.1;Index=1
ctg123 example five_prime_UTR  1050 1200 . + . Parent=EDEN.1
ctg123 example CDS             1201 1500 . + 0 Parent=EDEN.1
ctg123 example CDS             3000 3902 . + 0 Parent=EDEN.1
ctg123 example CDS             5000 5500 . + 0 Parent=EDEN.1
ctg123 example CDS             7000 7608 . + 0 Parent=EDEN.1
ctg123 example three_prime_UTR 7609 9000 . + . Parent=EDEN.1
ctg123 example mRNA            1050 9000 . + . ID=EDEN.2;Parent=EDEN;Name=EDEN.2;Index=1
ctg123 example five_prime_UTR  1050 1200 . + . Parent=EDEN.2
ctg123 example CDS             1201 1500 . + 0 Parent=EDEN.2
ctg123 example CDS             5000 5500 . + 0 Parent=EDEN.2
ctg123 example CDS             7000 7608 . + 0 Parent=EDEN.2
ctg123 example three_prime_UTR 7609 9000 . + . Parent=EDEN.2
ctg123 example mRNA            1300 9000 . + . ID=EDEN.3;Parent=EDEN;Name=EDEN.3;Index=1
ctg123 example five_prime_UTR  1300 1500 . + . Parent=EDEN.3
ctg123 example five_prime_UTR  3000 3300 . + . Parent=EDEN.3
ctg123 example CDS             3301 3902 . + 0 Parent=EDEN.3
ctg123 example CDS             5000 5500 . + 1 Parent=EDEN.3
ctg123 example CDS             7000 7600 . + 1 Parent=EDEN.3
ctg123 example three_prime_UTR 7601 9000 . + . Parent=EDEN.3

We start with a feature of type "gene" with the ID "EDEN". This has three alternative splice forms named EDEN.1, EDEN.2 and EDEN.3. To tell GBrowse that each of these splice forms is part of the same gene, we give each one a Parent attribute of "EDEN" corresponding to the ID of the parent gene. Now consider mRNA EDEN.1. It has a five_prime_UTR feature, a three_prime_UTR feature, and four CDS features. To indicate that the CDS and UTR features belong to the mRNA, we give the mRNA a unique ID of "EDEN.1" and give each of the subfeatures a corresponding Parent. This pattern repeats for each of the other two splice forms. Note how the five_prime_UTR of EDEN.3 is split into two parts.

We use "Name" to give the gene and its alternative splice forms a human-readable name, and use Note to provide a description for the gene as a whole (you can add notes to the individual mRNAs but they won't display by default). The Index=1 attribute is a hint to some indexed databases to make the mRNAs searchable by name. This lets users find the gene by searching for the mRNA names ("EDEN.1") as well as by the gene name ("EDEN"). However, it is usually unnecessary to do this. Also notice that we are using the phase column (column 8) for the CDS features to describe how the CDS is translated into protein. See the description of phase at the beginning of this section.

There are other ways of representing genes. Please see The GFF3 Specification and The GBrowse Administration Tutorial for more information.

Alignments

Nucleotide-to-genome and protein-to-genome alignments are a little tricky because they involve two coordinate systems: the coordinates of the alignment on the genome (known as the "source" coordinates), and the coordinates of the cDNA, EST or protein (known as the "target" coordinates). In GFF3, the target coordinates are specified using the Target tag.

ctg123 est EST_match 1050 1500 . + . ID=Match1;Name=agt830.5;Target=agt830.5 1 451
ctg123 est EST_match 3000 3202 . + . ID=Match1;Name=agt830.5;Target=agt830.5 452 654
ctg123 est EST_match 5410 5500 . - . ID=Match2;Name=agt830.3;Target=agt830.3 505 595
ctg123 est EST_match 7000 7503 . - . ID=Match2;Name=agt830.3;Target=agt830.3 1 504
ctg123 est EST_match 1050 1500 . + . ID=Match3;Name=agt221.5;Target=agt221.5 1 451
ctg123 est EST_match 5000 5500 . + . ID=Match3;Name=agt221.5;Target=agt221.5 452 952
ctg123 est EST_match 7000 7300 . + . ID=Match3;Name=agt221.5;Target=agt221.5 953 1253

This example shows three different alignment features of type "EST_match". Each alignment has a distinct ID, and all the discontinuous parts of the alignment have the same ID, as described earlier. In addition to the ID and Name tags, each segment also has a Target tag whose value has the format "<target seqid> <target start> <target end>". For example, the very first line indicates that the EST named agt830.5 aligns to genomic contig ctg123 such that positions 1 through 451 of agt830.5 align with bases 1050-1500 of ctg123.
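
The Target value is easy to pull apart in code. The sketch below is illustrative only (parse_target and same_length are hypothetical names). It splits the value into its parts and checks that an ungapped nucleotide segment covers the same number of bases on the genome and on the target. That check would not hold for gapped alignments or for protein targets, where three genomic bases correspond to one residue.

```python
def parse_target(value):
    """Split a Target value into (target_id, start, end, strand)."""
    parts = value.split(" ")
    if len(parts) not in (3, 4):
        raise ValueError("expected 'target_id start end [strand]': %r" % value)
    strand = parts[3] if len(parts) == 4 else None
    return parts[0], int(parts[1]), int(parts[2]), strand

def same_length(start, end, target_value):
    """True if the genomic and target segments cover the same number of bases."""
    _, t_start, t_end, _ = parse_target(target_value)
    return end - start == t_end - t_start

# Genomic coordinates and Target values taken from the EST_match example.
print(parse_target("agt830.5 1 451"))
print(same_length(1050, 1500, "agt830.5 1 451"))
print(same_length(7000, 7503, "agt830.3 1 504"))
```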

Using the ##FASTA section of the GFF3 file, you can specify the sequence of the ESTs as well as of the contig, and GBrowse will display the DNA and/or protein sequences in the appropriate contexts.

See the GFF3 specification for instructions on how to represent gapped alignments.

Quantitative Data

GBrowse can plot quantitative data such as alignment scores, confidence scores from gene prediction programs, and microarray intensity data. There is a simple format that can be placed directly inside of a GFF3 file but does not scale to very large data sets, and a "WIG" format designed for very high-density quantitative data such as tiling arrays.

We first look at the simple format:

ctg123 affy microarray_oligo   1 100 281 . . Name=Expt1
ctg123 affy microarray_oligo 101 200 183 . . Name=Expt1
ctg123 affy microarray_oligo 201 300 213 . . Name=Expt1
ctg123 affy microarray_oligo 301 400 191 . . Name=Expt1
ctg123 affy microarray_oligo 401 500 288 . . Name=Expt1
ctg123 affy microarray_oligo 501 600 184 . . Name=Expt1

In this format, which can be embedded directly in the GFF3 file, each data point is a distinct feature with a start and end point. The features are grouped together by giving them a common experimental name so that they can be retrieved together. We use the score field (column 6) to represent the quantitative information (e.g. hybridization intensity).
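
If the scores start out as a plain list of numbers, lines in this simple format can be generated with a short script. The following Python sketch assumes fixed-width, adjacent bins; the function name scores_to_gff3 and its parameters are hypothetical, and the columns are joined with tabs as a real GFF3 file requires.

```python
def scores_to_gff3(seqid, source, ftype, name, scores, bin_size=100, first_start=1):
    """Turn a list of per-bin scores into GFF3 lines, one feature per bin."""
    lines = []
    for i, score in enumerate(scores):
        start = first_start + i * bin_size
        end = start + bin_size - 1
        cols = [seqid, source, ftype, str(start), str(end), str(score),
                ".", ".", "Name=" + name]
        lines.append("\t".join(cols))
    return lines

lines = scores_to_gff3("ctg123", "affy", "microarray_oligo", "Expt1",
                       [281, 183, 213, 191, 288, 184])
print("\n".join(lines))
```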

In contrast, when using WIG format, the quantitative data is kept outside of the main database in a special-purpose binary file that is kept somewhere on the file system. In this case the GFF3 file contains a single line per experiment like this one:

ctg123 . microarray_oligo 1 50000 . . . Name=example;wigfile=/usr/data/ctg123.Expt1.wig

The .wig file is created and managed using a script called wiggle2gff3.pl that comes with GBrowse. Instructions on how to use this script are given in the GBrowse Administration Tutorial.

GFF2 Databases

The GFF2 File Format

The GFF file format stands for "Gene Finding Format" and was invented at the Sanger Centre. It is easy to use, but it suffers from two main limitations (see the box).

Why GFF2 is harmful to your health
One of GFF2's problems is that it is only able to represent one level of nesting of
features. This is mainly a problem when dealing with genes that have multiple
alternatively-spliced transcripts. GFF2 is unable to deal with the three-level
hierarchy of gene->transcript->exon. Most people get around this by declaring a
series of transcripts and giving them similar names to indicate that they come from
the same gene. The second limitation is that while GFF2 allows you to create
two-level hierarchies, such as transcript->exon, it doesn't have any concept of the
direction of the hierarchy. So it doesn't know whether the exon is a subfeature of
the transcript, or vice-versa. This means you have to use "aggregators" to sort out
the relationships. This is a major pain in the neck. For this reason, GFF2 format
has been deprecated in favor of GFF3 format databases.

The GFF format is a flat tab-delimited file, each line of which corresponds to an annotation, or feature. Each line has nine columns and looks like this:

Chr1  curated  CDS 365647  365963  .  +  1  Transcript "R119.7"

The 9 columns are as follows:

reference sequence
This is the ID of the sequence that is used to establish the coordinate system of the annotation. In the example above, the reference sequence is "Chr1".
source
The source of the annotation. This field describes how the annotation was derived. In the example above, the source is "curated" to indicate that the feature is the result of human curation. The names and versions of software programs are often used for the source field, as in "tRNAScan-SE/1.2".
method
The annotation method, also known as type. This field describes the type of the annotation, such as "CDS". Together the method and source describe the annotation type.
start position
The start of the annotation relative to the reference sequence.
stop position
The stop of the annotation relative to the reference sequence. Start is always less than or equal to stop.
score
For annotations that are associated with a numeric score (for example, a sequence similarity), this field describes the score. The score units are completely unspecified, but for sequence similarities, it is typically percent identity. Annotations that do not have a score can use "."
strand
For those annotations which are strand-specific, this field is the strand on which the annotation resides. It is "+" for the forward strand, "-" for the reverse strand, or "." for annotations that are not stranded.
phase
For annotations that are linked to proteins, this field describes the phase of the annotation on the codons. It is a number from 0 to 2, or "." for features that have no phase.
group
GFF provides a simple way of generating annotation hierarchies ("is composed of" relationships) by providing a group field. The group field contains the class and ID of an annotation which is the logical parent of the current one. In the example given above, the group is the Transcript named "R119.7".

The group field is also used to store information about the target of sequence similarity hits, and miscellaneous notes. See the next section for a description of how to describe similarity targets.

The sequences used to establish the coordinate system for annotations can correspond to sequenced clones, clone fragments, contigs or super-contigs.

In addition to a group ID, the GFF format allows annotations to have a group class. This makes sure that all groups are unique even if they happen to share the same name. For example, you can have a GenBank accession named AP001234 and a clone named AP001234 and distinguish between them by giving the first one a class of Accession and the second a class of Clone.

You should use double-quotes around the group name or class if it contains white space.

Creating a GFF2 table

The first 8 fields of the GFF format are easy to understand. The group field is a challenge. It is used in three distinct ways:

  1. to group together a single sequence feature that spans a discontinuous range, such as a gapped alignment.
  2. to name a feature, allowing it to be retrieved by name.
  3. to add one or more notes to the annotation.

1. Using the Group field for simple features

For a simple feature that spans a single continuous range, choose a name and class for the object and give it a line in the GFF file that refers to its start and stop positions.

Chr3   giemsa heterochromatin  4500000 6000000 . . .   Band 3q12.1

2. Using the Group field to group features that belong together

For a group of features that belong together, such as the exons in a transcript, choose a name and class for the object. Give each segment a separate line in the GFF file but use the same name for each line. For example:

IV     curated exon    5506900 5506996 . + .   Transcript B0273.1
IV     curated exon    5506026 5506382 . + .   Transcript B0273.1
IV     curated exon    5506558 5506660 . + .   Transcript B0273.1
IV     curated exon    5506738 5506852 . + .   Transcript B0273.1

These four lines refer to a biological object of class "Transcript" and name B0273.1. Each of its parts uses the method "exon", source "curated". Once loaded, the user will be able to search the genome for this object by asking the browser to retrieve "Transcript:B0273.1". The browser can also be configured to allow the Transcript: prefix to be omitted.

You can extend the idiom for objects that have heterogeneous parts, such as a transcript that has 5' and 3' UTRs:

IV     curated  mRNA   5506800 5508917 . + .   Transcript B0273.1; Note "Zn-Finger"
IV     curated  5'UTR  5506800 5508999 . + .   Transcript B0273.1
IV     curated  exon   5506900 5506996 . + .   Transcript B0273.1
IV     curated  exon   5506026 5506382 . + .   Transcript B0273.1
IV     curated  exon   5506558 5506660 . + .   Transcript B0273.1
IV     curated  exon   5506738 5506852 . + .   Transcript B0273.1
IV     curated  3'UTR  5506852 5508917 . + .   Transcript B0273.1

In this example, there is a single feature with method "mRNA" that spans the entire range. It is grouped with subparts of type 5'UTR, 3'UTR and exon. They are all grouped together into a Transcript named B0273.1. Furthermore the mRNA feature has a note attached to it.

NOTE: The subparts of a feature are in absolute (chromosomal or contig) coordinates. It is not currently possible to define a feature in absolute coordinates and then to load its subparts using coordinates that are relative to the start of the feature.

Some annotations do not need to be individually named. For example, it is probably not useful to assign a unique name to each ALU repeat in a vertebrate genome. For these, just leave the Group field empty.

3. Using the Group field to add a note

The group field can be used to add one or more notes to an annotation. To do this, place a semicolon after the group name and add a Note field:

Chr3 giemsa heterochromatin 4500000 6000000 . . . Band 3q12.1 ; Note "Marfan's syndrome"

You can add multiple Notes. Just separate them by semicolons:

 Band 3q12.1 ; Note "Marfan's syndrome" ; Note "dystrophic dysplasia"

The Note should come AFTER the group type and name.

4. Using the Group field to add an alternative name

If you want the feature to be quickly searchable by an alternative name, you can add one or more Alias tags. A feature can have multiple aliases, and multiple features can share the same alias:

Chr3 giemsa heterochromatin 4500000 6000000 . . . Band 3q12.1 ; Alias MFX

Searches for aliases will be both faster and more reliable than searches for keywords in notes, since the latter relies on whole-text search methods that vary somewhat from DBMS to DBMS.
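
The group-field conventions described in this section (a class and name first, then optional semicolon-separated tags such as Note and Alias) can be sketched as a small Python parser. This is an illustration under simplifying assumptions, not the Bio::DB::GFF parser: parse_group is a hypothetical name, and a Target entry is returned as ordinary text, without special handling of its coordinates.

```python
import re

# Split on semicolons that are not inside double quotes.
_FIELD_SEP = re.compile(r';(?=(?:[^"]*"[^"]*")*[^"]*$)')

def parse_group(group):
    """Parse a GFF2 group field into (class, name, {tag: [values]})."""
    pairs = []
    for part in _FIELD_SEP.split(group):
        part = part.strip()
        if not part:
            continue
        tag, _, value = part.partition(" ")
        pairs.append((tag, value.strip().strip('"')))
    if not pairs:
        return None, None, {}  # an empty group field
    group_class, group_name = pairs[0]
    extras = {}
    for tag, value in pairs[1:]:
        extras.setdefault(tag, []).append(value)
    return group_class, group_name, extras

print(parse_group('Band 3q12.1 ; Note "Marfan\'s syndrome" ; Note "dystrophic dysplasia"'))
print(parse_group('Band 3q12.1 ; Alias MFX'))
```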


Identifying the reference sequence

Each reference sequence in the GFF table must itself have an entry. This is necessary so that the length of the reference sequence is known.

For example, if "Chr1" is used as a reference sequence, then the GFF file should have an entry for it similar to this one:

Chr1 assembly chromosome 1 14972282 . + . Sequence Chr1

This indicates that the reference sequence named "Chr1" has length 14972282 bp, method "chromosome" and source "assembly". In addition, as indicated by the group field, Chr1 has class "Sequence" and name "Chr1".

It is suggested that you use "Sequence" as the class name for all reference sequences, since this is the default class used by the Bio::DB::GFF module when no more specific class is requested. If you use a different class name, then be sure to indicate that fact with the "reference class" option (see below).


Sequence alignments

There are several cases in which an annotation indicates the relationship between two sequences. One common one is a similarity hit, where the annotation indicates an alignment. A second common case is a map assembly, in which the annotation indicates that a portion of a larger sequence is built up from one or more smaller ones.

Both cases are indicated by using the Target tag in the group field. For example, a typical similarity hit will look like this:

Chr1 BLASTX similarity 76953 77108 132 + 0 Target Protein:SW:ABL_DROME 493 544

Here, the group field contains the Target tag, followed by an identifier for the biological object. The GFF format uses the notation Class:Name for the biological object, and even though this is stylistically inconsistent, that's the way it's done. The object identifier is followed by two integers indicating the start and stop of the alignment on the target sequence.

Unlike the main start and stop columns, it is possible for the target start to be greater than the target end. The previous example indicates that the section of Chr1 from 76,953 to 77,108 aligns to the protein SW:ABL_DROME starting at position 493 and extending to position 544.

A similar notation is used for sequence assembly information as shown in this example:

Chr1        assembly Link   10922906 11177731 . . . Target Sequence:LINK_H06O01 1 254826
LINK_H06O01 assembly Cosmid 32386    64122    . . . Target Sequence:F49B2       6 31742

This indicates that the region between bases 10922906 and 11177731 of Chr1 is composed of LINK_H06O01 from bp 1 to bp 254826. The region of LINK_H06O01 between 32386 and 64122 is, in turn, composed of bases 6 to 31742 of cosmid F49B2.


Dense quantitative data

If you have dense quantitative data, such as tiling array data, microarray expression data, ChIP-chip or ChIP-seq chromatin immunoprecipitation data, then you will probably want to create "Wiggle" format binary files, which represent the quantitative data in a compact format in external files. Use the wiggle2gff3.pl script, included in this distribution, to format and load this data. Run wiggle2gff3.pl -h for instructions.

Loading the GFF file into the database

Use the BioPerl script utilities bp_bulk_load_gff.pl, bp_load_gff.pl or (if you are brave) bp_fast_load_gff.pl to load the GFF file into the database. For example, if your database is a MySQL database on the local host named "dicty", you can load it into an empty database using bp_bulk_load_gff.pl like this:

 bp_bulk_load_gff.pl -c -d dicty my_data.gff

To update existing databases, use either bp_load_gff.pl or bp_fast_load_gff.pl. The latter is somewhat experimental, so use with care.

Aggregators

The Bio::DB::GFF database (and only Bio::DB::GFF!) has a feature known as "aggregators". These are small software packages that recognize certain common feature types and convert them into complex biological objects. These aggregators make it possible to develop intelligent graphical representations of annotations, such as a gene that draws confirmed exons differently from predicted ones.

An aggregator typically creates a new composite feature with a different method than any of its components. For example, the standard "alignment" aggregator takes multiple alignments of method "similarity", groups them by their name, and returns a single feature of method "alignment".

The various aggregators are described in detail in the Bio::DB::GFF manual page. It is easy to write new aggregators, and also possible to define aggregators on the fly in the gbrowse configuration file. It is suggested that you use the sample GFF files from the yeast, drosophila and C. elegans projects to see what methods to use to achieve the desired results.

In addition to the standard aggregators that are distributed with BioPerl, GBrowse distributes several experimental and/or special-purpose aggregators:

match_gap: This aggregator is used for GFF3 style gapped alignments, in which there is a single feature of method 'match' with a 'Gap' attribute. This aggregator was contributed by Dmitri Bichko.

orf: This aggregator aggregates raw "ORF" features into "coding" features. It is basically identical to the "coding" aggregator, except that it looks for features of type "ORF" rather than "cds".

reftranscript: This aggregator was written to make the compound feature, "reftranscript", for use with GBrowse editing software developed outside of the GMOD development group. It can be used to aggregate "reftranscripts" from "refexons", loaded as second copy features. These features, in contrast to "transcripts", are usually implemented as features which cannot be edited and serve as starting point references for annotations added using GBrowse for feature visualization. Adding features to the compound feature, "reftranscript", can be done by adding to the "part_names" call (i.e. "refCDS").

waba_alignment: This aggregator handles the type of alignments produced by Jim Kent's WABA program, and was written to be compatible with the C. elegans GFF files. It aggregates the following feature types into an aggregate type of "waba_alignment":

  nucleotide_match:waba_weak
  nucleotide_match:waba_strong
  nucleotide_match:waba_coding

wormbase_gene: This aggregator was written to be compatible with the C. elegans GFF2 files distributed by the Sanger Institute. It aggregates raw "CDS", "5'UTR", "3'UTR", "polyA" and "TSS" features into "transcript" features. For compatibility with the idiosyncrasies of the Sanger GFF format, it expects that the full range of the transcript is contained in a main feature of type "Sequence". It is strongly recommended that for mirroring C. elegans annotations, you use the "processed_transcript" aggregator in conjunction with the GFF3 files found at:

ftp://ftp.wormbase.org/pub/wormbase/genomes/elegans/genome_feature_tables/GFF3

IT IS NOT NECESSARY TO USE AGGREGATORS WITH THE CHADO, BIOSQL OR BIO::DB::SEQFEATURE::STORE (GFF3) DATABASES.


Chado Databases

Please see Chado.

Adding and Configuring Databases

Each data source has a corresponding configuration file in the directory gbrowse.conf. Once you've created and loaded a new database, you should make a copy of one of the existing configuration files and modify it to meet your needs. The name of the new configuration file must follow the form:

 sourcename.conf

where "sourcename" is a short word that describes the data source. You can use this name to select the data source when linking to the browser. Just construct a URL that uses "sourcename" as a virtual directory under cgi-bin/gbrowse:

 http://your.site.org/cgi-bin/gbrowse/sourcename/

(Note: If you don't add the slash at the end, gbrowse will automatically do it for you, since the terminal slash is needed to work around an apparent bug in MSIE's cookie handling.)

It is suggested that you use the same name as the database, although this isn't a requirement. (If no "source=" argument is given, gbrowse picks the first configuration file that occurs alphabetically; you can control this by placing numbers in front of the configuration file, as in "01.yeast.conf".)

The configuration file is divided into a number of sections, each one introduced by a [SECTION TITLE]. The [GENERAL] section contains settings that are applicable to the entire application. Other sections define tracks to display.

I suggest that you begin with one of the example configuration files provided with the distribution and modify it to suit your needs.


The [GENERAL] Section

The [GENERAL] section consists of a series of name=value options. For example, the beginning of the yeast.conf sample configuration file looks like this:

[GENERAL]
description = S. cerevisiae (via SGD Nov 2001)
db_adaptor  = Bio::DB::GFF
db_args     = -adaptor dbi::mysql
              -dsn     dbi:mysql:database=yeast;host=localhost
aggregators = transcript alignment
user        =
passwd      =

Each option is a single word or phrase, usually in lower case. This is followed by an equals sign and the value of the option. You can add whitespace around the equals sign in order to increase readability. If a value is very long, you can continue it on additional lines provided that you put a tab or other whitespace on the continuation lines. For example:

description = S. cerevisiae annotations via SGD Nov 2001, and
            converted using the process_sgd.pl script

Any lines that begin with a pound sign (#) are considered comments and ignored.
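The syntax rules above (section titles in brackets, name = value pairs, whitespace-led continuation lines, and # comments) can be sketched as a small Python reader. This is a simplified reading of the rules as described here, not GBrowse's actual parser; the function name parse_conf and the choice to join continuation lines with a single space are assumptions.

```python
def parse_conf(text):
    """Parse [SECTION] / name = value text, honoring continuation lines and # comments."""
    sections = {}
    current = None   # name of the section being filled
    last_key = None  # most recent option, the target of any continuation line
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # blank line or comment
        if raw[0].isspace():
            # Continuation line: append it to the previous option's value.
            if current is None or last_key is None:
                raise ValueError("continuation line with nothing to continue")
            joined = sections[current][last_key] + " " + raw.strip()
            sections[current][last_key] = joined.strip()
            continue
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current, last_key = line[1:-1], None
            sections[current] = {}
            continue
        if current is None:
            raise ValueError("option found before any [SECTION] header")
        # Split on the first equals sign only; values may contain further ones.
        name, _, value = line.partition("=")
        last_key = name.strip()
        sections[current][last_key] = value.strip()
    return sections


sample = "\n".join([
    "[GENERAL]",
    "# where the data came from",
    "description = S. cerevisiae annotations via SGD Nov 2001, and",
    "            converted using the process_sgd.pl script",
    "user        =",
])
conf = parse_conf(sample)
print(conf["GENERAL"]["description"])
```

Because continuation lines are recognized before the equals sign is examined, a continued value such as the -dsn line of db_args can safely contain "=" characters.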

During this discussion, you might want to follow along with one of the example configuration files.

The following [GENERAL] options are recognized:

  • description

The description of the database. This will appear in the popup menu that allows users to select the data source and in the header of the page. Don't make it as long as the previous example! (You will want to change this.)

  • db_adaptor

Tells GBrowse what database adaptor to use. By using different adaptors you can attach gbrowse to a variety of different databases. Currently the only stable adaptor you can use is Bio::DB::GFF, which is a standard set of adaptors contained in BioPerl.

  • db_args

Arguments to pass to the adaptor for it to use when making a database connection. The exact format will depend on the adaptor you're using. For Bio::DB::GFF running on top of a MySQL database use a db_args like the following:

   db_args = -adaptor dbi::mysql
             -dsn     dbi:mysql:database=<db_name>;host=<db_host>

replacing <db_name> and <db_host> with the database and database host of your choice. For MySQL databases running on the localhost, you can shorten this to just "db_name".

If the database requires you to log in with a user name and password, use the following db_args:

   db_args = -adaptor dbi::mysql
             -dsn     dbi:mysql:database=<db_name>;host=<db_host>
             -user    <username>
             -pass    <password>

replacing <username> and <password> with the appropriate values. In the example configuration files, we use a username of "nobody" and an empty password. This is appropriate if the database is configured to allow "nobody" to log in from the local machine without using a password.

To use the Oracle version of Bio::DB::GFF, use these arguments:

   db_args = -adaptor dbi::oracle
             -dsn dbi:oracle:database=db_service

Where db_service should be replaced with the name of the desired database service definition. See the documentation for the Perl DBD::Oracle database driver for more information about the -dsn format.

To use the in-memory version of Bio::DB::GFF, use these arguments:

 db_args = -adaptor memory
           -dir   /path/to/directory

The indicated directory should contain one or more GFF and FASTA files, distinguished by the filename extensions .gff and .fa respectively.

  • allow remote callbacks

This option, if true, allows users to place callbacks ("sub ....") in the configuration sections of uploaded files. The callbacks will be executed in a Safe::World compartment, which forbids access to the file system, dangerous operations such as "exec" and "eval", and access to anything but the Bio::Graphics::SeqFeature and Bio::Graphics::Glyph classes. The option also affects remote annotation tracks. For this option to work, the Safe::World module must be installed from CPAN.

  • aggregators

This option is only valid when used with Bio::DB::GFF adaptors, and lists one or more aggregators to use for complex features. It is possible to declare your own aggregator here using a special syntax described in "B7. Declaring New Aggregators."

To disable the default aggregators, leave this setting blank, as in:

    aggregators=

To activate the default aggregators of "transcript," "clone," and "alignment," comment this setting out entirely:

   # aggregators =

Do not use aggregators with Bio::DB::SeqFeature::Store, BioSQL, or Chado.

  • user

The user name for the gbrowse script to log in under if you are not using "nobody". This is exactly the same as providing the -user option to db_args, and is deprecated.

  • pass

The password to use if the database is password protected. This is the same as providing the -pass option to db_args, and is deprecated.

  • gbrowse root

This specifies the URL of GBrowse's static files on your server, such as stylesheets, images and JavaScript files. The default is /gbrowse.

  • stylesheet

Location of the stylesheet used to create the GBrowse look and feel. You can give a relative address (e.g. "gbrowse.css"), in which case GBrowse will look relative to the URL specified by "gbrowse root." Alternatively, you can specify an absolute URL (e.g. "/stylesheets/mysite.css").

New in version 1.70: You can specify multiple stylesheets by separating them by spaces. You can also specify a media type by following this format:

 stylesheet = http://www.example.com/stylesheets/lowres.css(screen)
              http://www.example.com/stylesheets/audio.css(audio)
              http://www.example.com/stylesheets/hires.css(paper)
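To make the multi-stylesheet format concrete, here is a Python sketch that splits such a value into (URL, media type) pairs. It only illustrates the syntax described above; the function name parse_stylesheets is hypothetical, and treating a missing media type as None is an assumption rather than documented GBrowse behavior.

```python
import re


def parse_stylesheets(value):
    """Split a stylesheet option value into (url, media) pairs; media is None when omitted."""
    pairs = []
    for token in value.split():
        # A trailing "(media)" suffix names the media type for that stylesheet.
        match = re.fullmatch(r"(.+?)\((\w+)\)", token)
        if match:
            pairs.append((match.group(1), match.group(2)))
        else:
            pairs.append((token, None))
    return pairs


value = """http://www.example.com/stylesheets/lowres.css(screen)
           http://www.example.com/stylesheets/audio.css(audio)
           gbrowse.css"""
for url, media in parse_stylesheets(value):
    print(url, media)
```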
  • buttons

URL in which the various graphical buttons used by GBrowse are located. The relative and absolute addressing rules described for "stylesheet" apply here as well. (You will probably not need to change this.)

  • js

URL in which the gbrowse javascript helper function files are located. The relative and absolute addressing rules described for "stylesheet" apply here as well. (You will probably not need to change this).

  • tmpimages

URL of a writable directory in which GBrowse can write its temporary images. The format is:

 tmpimages = <tmpimages_url> <tmpimages_path>

Where <tmpimages_url> is the directory as it appears as a URL and <tmpimages_path> is the physical path to the directory as it appears to the filesystem. Usually the physical path is just the URL with the DocumentRoot configuration variable prepended to it, in which case only the URL is needed. However, if the URL is defined using an Alias directive, then the path argument is mandatory.

The tmpimages option is mandatory.

The relative and absolute addressing rules described for "stylesheet" apply here as well.

NOTE: The path argument is ignored if gbrowse is running under mod_perl, because mod_perl allows the URL to be translated into a physical directory programmatically.
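The relationship between the two tmpimages arguments can be sketched in Python as follows. This is only an illustration of the rule described above (an explicit path wins, otherwise the path is the DocumentRoot with the URL appended). The function name resolve_tmpimages and the example directories are hypothetical, and the sketch assumes an absolute URL beginning with "/".

```python
import posixpath


def resolve_tmpimages(value, document_root):
    """Return (url, path) for a tmpimages value of the form '<url> [<path>]'."""
    parts = value.split()
    if not parts or len(parts) > 2:
        raise ValueError("expected '<tmpimages_url> [<tmpimages_path>]'")
    url = parts[0]
    if len(parts) == 2:
        # An explicit physical path was given (required when the URL is an Alias).
        return url, parts[1]
    # Only the URL was given: assume the path is DocumentRoot plus the URL.
    return url, posixpath.join(document_root, url.lstrip("/"))


print(resolve_tmpimages("/gbrowse/tmp", "/var/www/html"))
print(resolve_tmpimages("/tmpimages /srv/gbrowse/tmp", "/var/www/html"))
```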

  • plugins

This is a list of plugins that you want to be available from gbrowse. Plugins are a way for third-party developers to add functionality to gbrowse without changing its core source code. Plugins are stored in the gbrowse configuration directory under a subdirectory named "plugins."

A good standard list of plugins is:

   plugins = SequenceDumper FastaDumper RestrictionAnnotator

See the contents of conf/plugins and contrib/plugins for more plugins that you can install.

  • quicklink plugins

This is a list of plugins that you want to appear as links in the link bar (which includes the [Bookmark this] and [Link to Image] links). Selecting one of these links is equivalent to choosing the plugin from the popup menu and pressing the "Go" button. The plugin will continue to appear in the popup menu.

  • plugin_path

By default gbrowse searches for plugins in its standard location of conf/plugins. You can store plugins in a non-standard location by providing this option with a space-delimited list of additional directories to search in.

  • galaxy outgoing

If you would like GBrowse to be able to send data to the Galaxy bioinformatics analysis tool, then set this option to the URL for the Galaxy server you would like to use. A good default is:

  galaxy outgoing = http://main.g2.bx.psu.edu/

Without this option, GBrowse will be able to receive and process queries from Galaxy servers, but will not be able to initiate a connection. (Note: this option used to be named "galaxy", which still works for backward compatibility.)

  • galaxy incoming

Use this option to change the URL that Galaxy will use when it tries to fetch GFF3-formatted data from GBrowse. The default is:

  http://yourhostname/cgi-bin/gbgff

However, the default will break if the GBrowse web server is behind a web proxy that uses a different hostname. In this case, you will need to set the location of the gbgff script explicitly.

  • galaxy build name

To be most effective, Galaxy needs to know the genome build name corresponding to the annotations contained in the current database so that it can integrate GBrowse-generated data with other data sets. Each species has its own build name conventions, for example "hg18" for UCSC build number 18. Set this to the build name of your choice. If not present, the value defaults to the database name.

  • hilite fill, hilite outline

These options control the color of the selection rectangles that appear in the overview and regionview when you are zoomed into a region. The hilite fill controls the color of the rectangle interior, and the hilite outline controls the color of the rectangle outline. Colors can be specified by name (e.g. "pink"), or in HTML #RRGGBB format.

  • image widths

The image widths option controls the set of image sizes to offer the user. Its value is a space-delimited list of pixel widths. The default is probably fine. Note that the height of the image depends on the number of tracks and features, and cannot be controlled.

  • default width

The default width is the image width to start off with when the user invokes the browser for the first time. The default is 800.

  • default features

The default features option is a space-delimited list of tracks to turn on by default. You will probably need to change this. For example:

    default features = Genes ORFs tRNAs Centromeres:overview

The syntax for annotation plugins is slightly different. To activate an annotation plugin track by default, preface the plugin's name with "plugin:"

    default features = Genes ORFs Centromeres:overview
                       plugin:RestrictionAnnotator
  • reference class

gbrowse needs to know the class of the reference sequences that other features are placed on. The default is Sequence. If you want to use another class, such as Contig, please indicate the class here (if you don't, certain features such as the keyword search will fail):

     reference class = contig
  • initial landmark

This option controls what feature to show when the user first visits a gbrowse database and has not yet performed a search. If not present, gbrowse displays a page with the search area and options, but no overview or panel.

Example:

      initial landmark = Chr1
  • drag and drop

If this is set to true, then code will be activated that lets the user pick up and drag individual tracks in order to change their vertical stacking order. For this to work, the user must have a relatively recent browser (IE 5 or higher, Firefox 1.5 or higher) and must have JavaScript activated.

It is off by default for compatibility with older browsers.

  • disable wildcards

Ordinarily a user can type in "YAL*" to find all features with names beginning with "YAL". This option, if set to a true value, disables wildcard searching.

  • merge searches

If this is set to true (the default), then features with the same name, chromosome and type will be merged into one feature during searches. If this is set to false (zero), then no merging will occur. Set this to true (1) if searches are returning many results, and to false (0) if searches are returning too few. (This option was added in version 1.70).

  • truecolor

If this option is present and true, then GBrowse will create 24-bit (truecolor) images. This is mainly useful when using the "image" glyph, which allows you to paste arbitrary images onto the genome map. Do not use this option unless you need it, because it slows down drawing and makes the images much larger.

  • units, unit_divider

The units option allows GBrowse to display units on an alternate scale (for example, (centi)Morgans), and the unit_divider provides the conversion factor between base pair units (which is what must be specified in the GFF file) and the specified units. For example, if it is known that 5010 base pairs is equal to one Morgan, 5010 would be specified for the unit_divider. Note that if unit_divider is specified, max segment, default segment and zoom levels will all be interpreted in terms of the specified units.
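The arithmetic implied by unit_divider can be written out as a short Python sketch. It assumes the conversion is a plain division (base pairs to units) or multiplication (units to base pairs); the function names are hypothetical and the snippet is not taken from the GBrowse source.

```python
def bp_to_units(bp, unit_divider):
    """Convert a base pair coordinate into the alternate units."""
    return bp / unit_divider


def units_to_bp(value, unit_divider):
    """Convert a setting given in alternate units (e.g. max segment) back to base pairs."""
    return value * unit_divider


UNIT_DIVIDER = 5010  # the example above: 5010 base pairs per Morgan

print(bp_to_units(10020, UNIT_DIVIDER))  # a feature at 10020 bp, expressed in Morgans
print(units_to_bp(200, UNIT_DIVIDER))    # a 200-unit segment limit, expressed in bp
```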

  • max segment, min segment

These options control the size of segments that will be shown in the detailed view.

The max segment option sets an upper bound on the maximum size segment that will be displayed on the detailed view. Its value is in the selected units. Above this limit, the user will be prompted to select a smaller region on the birds-eye view. The default is 1,000,000 base pairs.

If the user tries to view a segment smaller than the min segment option, then the segment will be resized to be this size. The default is 20 bp.

  • default segment

The default segment option sets the width of the segment (bp) that will be displayed when the user clicks on the birds-eye view without previously having set a desired magnification. You may want to adjust this value.

  • zoom levels

GBrowse allows unlimited zoom levels. This option selects the width of each level, in bp. For example:

     zoom levels = 1000 2000 5000 10000 20000 40000 100000 200000
  • region segment

If this configuration option is set, a new "region panel" will appear that is intermediate in size between the overview and the detail panel. The value of this option becomes the size of the region panel in base pairs.

  • region sizes

This contains a space-delimited list of region panel sizes to present to the user in a popup menu:

    region sizes   = 5000 10000 20000
  • show sources

A 0 (false) or 1 (true) value which controls whether or not to show the popup menu displaying the defined data sources. Set this to 0 if you wish for the names of the data sources to be hidden. If not present, this option defaults to 1 (true).

Note that all data sources will need to have this option defined in order for it to take effect across all databases.

  • default varying

The track selection table will be sorted alphabetically, by default; setting this variable to true will cause the tracks to appear in the same order as they appear in the configuration file.

  • keyword search max

By default, GBrowse will limit the number of keyword search results to 1,000. The order in which the 1,000 hits are returned depends on how the database was loaded, and so you may see odd patterns, such as only hits on a particular chromosome being displayed. To raise the limit on keyword search results, set "keyword search max" to the desired maximum value.

  • overview units

This option controls the units that will be used on the scale for the birds-eye view display. Possible values are "bp" (base pairs), "k" (kilobases), "M" (megabases), and "G" (gigabases). If this option is omitted, the browser will guess the most appropriate unit.

  • overview bgcolor

This is the color for the background of the birds-eye view.

  • cache time

The server will cache track images for a period of time in order to speed up performance. After the time has expired, the cached version of the image will not be used. This option specifies the time, in hours, that images will be cached. The default is 1 hour.

If you are debugging your config file and want to see uncached images, call GBrowse with the CGI option nocache=1.

  • version

An optional numeric version for this configuration file. Every time gbrowse runs a user's request, it checks the value of the config file version against a version number saved in the user's settings. If the current version is higher than the saved version, then gbrowse will reset the user's page session to its default values. Use this if you want to reset all users' sessions to a known working state, or to draw their attention to a new feature you've added.

Example:

  version = 1.1
  • detailed bgcolor

This is the color for the background of the detailed view.

  • request timeout

This is the timeout value for requests. If a user requests a large region and the request takes more than the indicated number of seconds, then the request will timeout and the user will be advised to choose a smaller region. The default is 60 seconds (one minute). You can make the timeout longer or shorter than this.

  • head

This is content to insert into the HTML <head></head> section. It is the appropriate place to stick JavaScript code, etc. It can be a code reference if you wish.

  • header

This is a header to print at the top of the browser page. It is any valid HTML, and can span multiple lines provided that the continuation lines begin with white space.

It is also possible to place an anonymous Perl subroutine here. The code will be invoked during preparation of the page and must return a string value to use as the header. See COMPUTED OPTIONS for details.

Example:

   header = <h1>Welcome to the Volvox Sequence Page</h1>
  • onload

This is the name of javascript function(s) to be called via the page body's onload event handler. Any text included here will be used to mark up the <body> element of the html printed by the gbrowse script. The onload event handler will fire after the page is finished loading, so this setting will be useful for running javascript functions that rely on all or part of the HTML having been loaded and interpreted by the browser. The onload text must use correct javascript syntax. For example:

 onload = alert('I am about to do something');doSomething('arg1','arg2')

will result in

 <body onload="alert('I am about to do something');doSomething('arg1','arg2')">
  • footer

This is a footer to print at the bottom of the browser page. It is any valid HTML, and can span multiple lines provided that the continuation lines begin with white space.

It is also possible to place an anonymous Perl subroutine here. The code will be invoked during preparation of the page and must return a string value to use as the footer. See COMPUTED OPTIONS for details.

Example:

    footer = <hr>
        <table width="100%">
        <TR>
        <TD align="LEFT" class="databody">
        For the source code for this browser, see the <a href="http://gmod.org">
        Generic Model Organism Database Project.</a>  For other questions, send
        mail to <a href="mailto:lstein@cshl.org">lstein@cshl.org</a>.
        </TD>
        </TR>
        </table>
  • examples

You can provide GBrowse with some canned examples of "interesting regions" for the user to click on. The examples option, if present, provides a space-delimited list of interesting regions. For example:

      examples = II  NPY1 NAB2 Orf:YGL123W
  • automatic classes

When the user types in a search string that is not qualified by a class (as in EST:yk1234.5), GBrowse will automatically search for a matching feature of class "Sequence". You can have it search for the name in other classes as well by defining the "automatic classes" option.

Example:

       automatic classes = Symbol Gene Clone

When the user types in "hb3", the browser will search first for a feature named hb3 of class Sequence, followed in turn by matching features in Symbol, Gene and Clone. The search stops when the first match is found. If no match is found in any class, the browser will proceed to a full text search of all the comment fields.
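The search order described for "automatic classes" can be sketched in Python. The lookup callable and the toy database are hypothetical stand-ins for the real database adaptor, used only to show the order in which classes are tried and where the search stops.

```python
def find_landmark(name, automatic_classes, lookup):
    """Try class Sequence first, then each automatic class in order; stop at the first match."""
    for cls in ["Sequence"] + list(automatic_classes):
        features = lookup(cls, name)  # hypothetical: returns a list of matching features
        if features:
            return cls, features
    return None  # no class matched; the caller would fall back to a full text search


# Toy database keyed by (class, name), standing in for a real adaptor.
toy_db = {
    ("Gene", "hb3"): ["hb3 gene feature"],
    ("Clone", "hb3"): ["hb3 clone feature"],
}


def toy_lookup(cls, name):
    return toy_db.get((cls, name), [])


print(find_landmark("hb3", ["Symbol", "Gene", "Clone"], toy_lookup))
```

In this toy data the Gene entry is returned and the Clone entry is never consulted, because Gene comes earlier in the list.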

  • search attributes (Bio::DB::SeqFeature::Store adaptor only)

When the browser has searched the name and alias of features without success, it will do a whole database keyword search by calling the database's search_notes() method. By default this will search the text of all attributes, including such things as protein sequence. The Bio::DB::SeqFeature::Store database is a bit smarter about searching, and will only, by default, search attributes named "Note". You can expand the search by giving a list of attribute names to the "search attributes" option.

  • remote sources

This option allows you to add remote annotation sources to the menu of such sources at the bottom of the main window. The format is:

     remote sources = "Menu Label 1" http://url1.host.com/etc/etc
                      "Menu Label 2" http://url2.host.com/etc/etc
  • instructions, search_instructions, navigation_instructions

You may override the default instructions (as defined in the language-specific configuration files in conf/lang) by setting these options. For example:

        instructions = "Type in the name of a contig or clone."
  • no search

If you don't want the "Landmark or Region" textbox to appear, set this to true. The user will still be able to search the database by appending q=<search term> to the URL.

         no search = 1
  • no autosearch

If this option is set to a true value, then the user's previous search will not be automatically re-executed the next time he visits gbrowse. Instead, the previous search will be pasted into the "Landmark or Region" box and the user will have to press "Search" to reexecute it.

  • category tables

This option allows you to group the on/off checkboxes for a set of tracks into a rectangular M x N table. It can be used to highlight the experimental design of a microarray or ChIP-on-Chip experiment.

The format is:

category tables = 'category name' 'columnlabel1 columnlabel2 columnlabel3' 'rowlabel1 rowlabel2 rowlabel3'

Where "category name" is the name of the track category (described in more detail below), "columnlabelN" is the label of the Nth column, and "rowlabelN" is the label of the Nth row. For example:

category tables = 'ArrayExpts' 'strain-A strain-B strain-C' 'temperature anaerobic aerobic'

This will set up all the tracks labeled with category "ArrayExpts" so that they are displayed in a 3x3 table like this:

                temperature     anaerobic      aerobic
  strain-A      track 1          track 4       track 7
  strain-B      track 2          track 5       track 8
  strain-C      track 3          track 6       track 9

"track N" will be replaced with the name you selected for the track.

Additional category tables can be specified using continuation lines:

category tables = 'ArrayExpts' 'strain-A strain-B strain-C' 'temperature anaerobic aerobic'
                  'CHiP-Chip'  'TFX1 ONE-CUT PHA4' '16-cell-stage 320-cell-stage adult'

See the GBrowse Administration Tutorial for more details.
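The quoting rules of the "category tables" value can be illustrated with a short Python sketch that splits it into groups of three quoted items. It uses the standard shlex module for the quoting; the function name parse_category_tables is hypothetical, and the sketch keeps the two label lists in the order they are written without deciding which becomes rows and which becomes columns.

```python
import shlex


def parse_category_tables(value):
    """Map each category name to its two label lists, in the order written."""
    tokens = shlex.split(value)  # honors the single quotes around each item
    if len(tokens) % 3 != 0:
        raise ValueError("expected groups of: 'category' 'labels' 'labels'")
    tables = {}
    for i in range(0, len(tokens), 3):
        category, first_labels, second_labels = tokens[i:i + 3]
        tables[category] = (first_labels.split(), second_labels.split())
    return tables


value = """'ArrayExpts' 'strain-A strain-B strain-C' 'temperature anaerobic aerobic'
           'CHiP-Chip'  'TFX1 ONE-CUT PHA4' '16-cell-stage 320-cell-stage adult'"""
for category, (first, second) in parse_category_tables(value).items():
    print(category, len(first), "x", len(second))
```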

  • instructions section, search section, overview section, region section, details section, tracks section, display_settings section, upload_tracks section

These options control which sections are displayed and whether they are initially open or collapsed. Their values are one of:

  open     Show the section initially open
  closed   Show the section initially collapsed
  off      Do not show the section at all

For example:

instructions section = closed

will initially show the instructions section in collapsed form when the user visits GBrowse for the first time. "upload_tracks section = off" will disable the uploads section entirely.

Note that turning off the details section will effectively disable GBrowse, but you might want to do this if you want to show the overview section only. Turning off the search section will also disable the navigation buttons. If you want to disable searching selectively, you should use the "no search" option instead.

  • html1, html2, html3, html4, html5, html6

These options allow you to insert HTML into the GBrowse page at strategic places. Eventually this will be replaced with an HTML template system, but for now, this is the best we have.

  Option   Where it goes
  header   between the top and the instructions
  html1    between the instructions and the navigation bar
  html2    between the navigation bar and the overview
  html3    between the overview and the detail view
  html4    between the detail view and the data source panel
  html5    between the data source panel and the track list
  html6    between the track list and the annotation upload
  footer   between the annotation upload and the bottom

These can be code references. One useful thing to do is to use the language translator to insert language-specific HTML. Here's an example provided by Marc Logghe:

    html2 = sub {
        my $go = $main::CONFIG->tr('Go');
        return
        qq(
        <table width="800" border="0">
        <tr class="searchbody">
        <td align="left" colspan="3" />
        <b>Dump:</b><input type="button" value="Assembly" onclick="window.open('gbrowse?plugin=AssemblyDumper;plugin_action=$go');">
        <input type="button" value="Reads" onclick="window.open('gbrowse?plugin=ReadDumper;plugin_action=$go');">
        </td>
        </tr>
        </table>
        );
       }

If you use a coderef for the html options, the subroutine is passed two arguments. The first argument is a Bio::Das::SegmentI object (see the manual page for Bio::DB::GFF::RelSegment for details). The second argument is a hashref containing the user's settings for the current page.

  • keystyle, empty_tracks

These two general options control the appearance of the keys printed on the detailed view. keystyle takes one of two values, "between" or "beneath":

      keystyle = between
      Print the track labels between the tracks themselves.
      keystyle = beneath
      Print the track labels at the bottom of the detailed view.

The "empty_tracks" option controls what to do when a track has no features in it. Possible values are:

      empty_tracks = key
      Print just the key (the track label).
      empty_tracks = suppress
      Suppress the track completely.
      empty_tracks = line
      Draw a solid line across the track.
      empty_tracks = dashed
      Draw a dashed line across the track.

The default value is "key."
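
For instance, a [GENERAL] section that prints track labels between the tracks and hides tracks that have no features in the current view might contain the following (the particular combination is only an illustration):

      keystyle     = between
      empty_tracks = suppress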

  • background, postgrid

These two options can be used to place custom background images in the details panel and are useful for advanced operations such as colorizing the panel to show gaps in the assembly. Each option accepts either the path to a graphics file to be tiled onto the background, or a callback subroutine. In the latter case, the callback will be passed a two-argument list consisting of the GD::Image object and the Bio::Graphics::Panel object. This gives the callback a chance to draw on top of the background using GD library calls.

The only difference between the two options is the time that they are applied relative to the grid that shows base pair coordinates. The background option is invoked before the grid is drawn so that the grid appears on top of it. The postgrid option is invoked after the grid is drawn, so that anything the option draws appears on top of the grid. See this email for an example of using this feature to show assembly gaps as vertical gray regions.

For a clever example of how to use postgrid calls, see the SynView synteny browser in the contrib directory of the GBrowse distribution. It uses a standard GBrowse configuration file with postgrid calls to draw trapezoids between glyphs to show synteny. For an example of how this looks, see PlasmoDB.
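
As a rough sketch of a callback, the following background option shades the left half of the details panel light gray using plain GD calls. It works only in pixel coordinates taken from the GD::Image object; mapping base pair positions to pixels would additionally involve the Bio::Graphics::Panel object and is not shown here. The color and the choice of region are arbitrary.

    background = sub {
        my ($gd, $panel) = @_;
        my ($width, $height) = $gd->getBounds;
        my $gray = $gd->colorAllocate(230, 230, 230);
        # Shade the left half of the image; the grid is drawn on top afterwards.
        $gd->filledRectangle(0, 0, int($width / 2), $height, $gray);
        }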

  • image_padding = 25, pad_left = 50, pad_right = 30

The image_padding option will add the indicated amount of whitespace (in pixels) to the right and left of the detail panel. The default is 25 pixels. You may need to adjust this if you are using the xyplot glyph and finding that the scale (which is printed outside the graph area) is being cut off.

You can individually adjust the left and right padding using pad_left and pad_right, which, if present, will supersede image_padding.

  • show track categories

If this option is set to a true value, then tracks that have been assigned to categories (using the "category" option described later) will have their categories included in their labels. For example, a track of key "Protein matches" and category "vertebrate" will be displayed in a track labeled "Protein matches (vertebrate)".

The default is false.

  • das mapmaster

This option, which should appear somewhere in the [GENERAL] section, indicates that the database should be made available as a DAS source. The value of the option corresponds to the URL of the DAS reference server for this data source, or "SELF" if this database is its own reference server. (See http://www.biodas.org/ for an explanation of what reference servers are.)

Please see DAS_HOWTO for more information on using DAS with GBrowse.

  • proxy, http proxy, ftp proxy

If your web server is behind a firewall and needs to use a proxy in order to access remote HTTP or FTP sites, then one or more of these options needs to be specified in order for the "add remote annotations" feature to work (both for file-based and DAS-based remote annotations). "http proxy" will set the proxy to use for outgoing HTTP connections, "ftp proxy" will set the proxy to use for outgoing FTP connections, and "proxy" will set both. The value is the URL of the proxy:

  proxy = http://myproxy.myorg.com:9000
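
If HTTP and FTP traffic must go through different proxies, the two specific options can be given instead. The host names and ports below are placeholders:

  http proxy = http://webproxy.example.org:3128
  ftp proxy  = http://ftpproxy.example.org:3128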

  • session driver, session args

These options fine-tune how GBrowse manages its state-maintaining sessions. GBrowse uses CGI::Session to store session data on the server. By default (if neither of these options is present), it uses CGI::Session's "file" driver and "default" serializer. The session files are stored in the "sessions" directory underneath the directory specified by the "tmpimages" option (e.g. /usr/local/apache/htdocs/gbrowse/tmpimages/sessions).

The "session driver" option will be passed to CGI::Session->new() as the first argument. It specifies the driver, serializer and ID generator according to the syntax described in the CGI::Session manual page. The "session args" option will be passed to CGI::Session->new() as the third argument. It specifies additional parameters to be passed to the selected driver.

For example, here is how to create session data that is stored in the MySQL "test" database under a table named "gbrowse_sessions." The session data will be stored in binary form by the Storable module:

session driver = driver:mysql;serializer:storable
session args   = DataSource test
                 TableName  gbrowse_sessions

See the CGI::Session documentation for information about setting up the MySQL table and appropriate permissions.

You might also want to read about CGI::Session::ID::salted_md5 for an ID generation algorithm that should be more secure (but slightly slower) than the default one.
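
Assuming the ID generator is selected through the "id" component of the CGI::Session driver string, a file-based setup using that generator might look like the following sketch. Check the CGI::Session manual page for the exact syntax supported by your installed version.

session driver = driver:file;serializer:default;id:salted_md5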

You will not ordinarily need to use these settings, as the defaults seem to work well. If you change these defaults, be sure to change them in all configuration files; otherwise weird stuff will happen when moving from one data source to another.

  • remember settings time

The length of time to remember page-specific settings in the format "+NNNu", where NNN is a number and "u" is a unit ("w" = weeks, "d" = days, "M" = months). For example:

 remember settings time = +3M   # remember settings for 3 months

A user's settings, which include uploaded files, track options and plugin configuration, will be reset to the defaults if he or she fails to visit the site within the time specified.

The default value is 1 month.

See the CGI manual page for more information on the time format.

  • remember cookie time

This is the length of time that the user's session cookie will stay on disk before it expires. It should be significantly longer than "remember settings time." The default is 12 months.
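
Putting the two options together, a configuration that keeps page settings for two months and the session cookie for a year could be written as follows. The values are illustrative, and the cookie time is assumed to use the same "+NNNu" format as "remember settings time":

 remember settings time = +2M
 remember cookie time   = +12M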

  • remember source time

Deprecated. Use "remember cookie time" instead.

  • msie hack

GBrowse uses HTTP POST to transfer the current page settings to the web server. Because of the way that Microsoft Internet Explorer caches pages, when users of this browser press the "Back" button, MSIE will display an annoying alert that prompts the user to reload the page.

When you set "msie hack" to a true value, GBrowse will use a GET request when it detects that MSIE is in use. This will fix the "Back" button issue, but will put very long URLs in the Location box. It is your choice which of these is more annoying to your users.

  • suppress_menu

This option will cause the browser to ignore your configuration file when building the source menu. Your source will still be accessible by URL using the gbrowse/yourSource or gbrowse?src=yourSource syntax. One possible application for this feature would be to hide your data source while you are testing a new configuration.

The [TRACK DEFAULTS] section

The track defaults section specifies default values for each track. The following common options are recognized:

            glyph
            height
            bgcolor
            fgcolor
            fontcolor
            font2color
            strand_arrow

These options control the default graphical settings for any annotation types that are not explicitly specified. See the section below on controlling the settings. Any of the options allowed in the [track] sections described below are allowed here.

  • label density

When there are too many annotations on the screen GBrowse automatically disables the printing of identifying labels next to the feature. "label density" controls where the cutoff occurs. The value in the example files is 25, meaning that labels will be turned off when there are more than 25 annotations of a particular type on display at once.

  • bump density

When there are too many annotations on the screen GBrowse automatically disables collision control. The "bump density" option controls where the cutoff occurs. The value in the example files is 100, meaning that when there are more than 100 annotations of the same type on the display, the browser will stop shifting them vertically to prevent them from colliding, but will instead allow them to overlap.
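
A [TRACK DEFAULTS] section combining the graphical defaults with the two density cutoffs might look like the sketch below. The density values are the ones quoted from the example files; the glyph, height and colors are arbitrary choices for illustration.

    [TRACK DEFAULTS]
    glyph         = generic
    height        = 8
    bgcolor       = cyan
    fgcolor       = black
    label density = 25
    bump density  = 100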

  • link

The link option creates a default rule for creating outgoing links from the GBrowse display. When the user clicks on a feature of interest, he will be taken to the corresponding URL.

The link option's value should be a URL containing one or more variables. Variables begin with a dollar sign ($), and are replaced at run time with the information relating to the selected annotation. Recognized variables include:

    $name        The feature's name (group name)
    $id          The feature's id (e.g., the primary key from a database)
    $class       The feature's class (group class)
    $method      The feature's method
    $source      The feature's source
    $ref         The name of the sequence segment (chromosome, contig)
                    on which this feature is located
    $description The feature's description (notes)
    $start       The start position of this feature, relative to $ref
    $end         The end position of this feature, relative to $ref
    $segstart    The left end of $ref displayed in the detailed view
    $segend      The right end of $ref displayed in the detailed view

For example, the wormbase.conf file uses this link rule:

    link = http://www.wormbase.org/db/get?name=$name;class=$class

At run time, if the user clicks on an EST named yk1234.5, this will generate the URL

    http://www.wormbase.org/db/get?name=yk1234.5;class=EST

It is possible to override the global link rule on a feature-by-feature basis. See the next section for details on this. It is also possible to declare a subroutine to compute the proper URL dynamically. See COMPUTED OPTIONS for details.

A special link type of AUTO will cause the feature to link to the gbrowse_details script, which summarizes information about the feature. The default is not to link at all.
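
To illustrate, the sketch below makes every track link to gbrowse_details by default, while one track overrides the rule using several of the variables listed above. The [ESTs] stanza, its feature type and the URL are placeholders made up for this example.

    [TRACK DEFAULTS]
    link = AUTO

    [ESTs]
    feature = EST_match
    link    = http://www.example.org/est?name=$name;ref=$ref;start=$start;end=$end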

  • link_target

By default links will replace the contents of the current window. If you wish, you can specify a new window to pop up when the user clicks on a feature, or designate a named window or frame to receive the contents of the link. To do this, add the "link_target" option to the [TRACK DEFAULTS] section or to a track stanza. The format is this:

      link_target = _blank

The value uses the HTML targeting rules to name/create the window to receive the value of the link. The first time the link is accessed, a window with the specified name is created. The next time the user clicks on a link with the same target, that window will receive the content of the link if it is still present, or it will be created again if it has been closed. A target named "_blank" is special and will always create a new window.

The "link_target" option can also be computed dynamically. See COMPUTED OPTIONS for details.
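
For example, to send all links to a single reusable window, give the target a name of your own choosing (the name "gbrowse_details_window" here is arbitrary):

      link_target = gbrowse_details_window

A computed version might open genes in a new window and everything else in the named window. This sketch assumes that the callback receives the feature as its first argument, as described for the "link" option under COMPUTED OPTIONS:

      link_target = sub {
          my $feature = shift;
          return '_blank' if $feature->primary_tag eq 'gene';
          return 'gbrowse_details_window';
          }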

  • title

The title option controls the "tooltips" text that pops up when the mouse hovers over a glyph in certain browsers. The rules for generating titles are the same as the "link" option discussed above. The "title" option can also be computed dynamically. See COMPUTED OPTIONS for details.

Note that HTML characters such as "<", ">" and "&" are not automatically escaped in the title. This lets you do neat stuff, such as create popup menus, but also means that you need to be careful. In particular, you must not use the quote character (") in the title, but either use the &quot; entity, or the single quote ('). The function CGI::escapeHTML() is available to properly escape HTML characters in dynamically-generated titles.

The special value "AUTO" causes a default description to appear describing the name, type and position of the feature. This is also assumed if the title option is missing or blank.

See CONFIGURE BALLOON TOOLTIPS for how to create rich tooltips including images and links.
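
As a sketch, a title built from the same variables as the "link" option could read:

      title = $name ($ref:$start..$end)

A dynamically generated title can pass its text through CGI::escapeHTML() so that characters such as "<" and "&" are shown literally. The text layout below is only an illustration:

      title = sub {
          my $feature = shift;
          my $text = $feature->display_name . ' <' . $feature->primary_tag . '>';
          return CGI::escapeHTML($text);
          }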

  • landmark_padding = 1000

The landmark_padding option will add the indicated number of base pairs to the right and left of all landmarks that are searched for by name.


Track Sections

Any other [Section] in the configuration file is treated as a declaration of a track. The order of track sections will become the default order of tracks on the display (the user can change this later). Here is a typical track declaration from yeast.conf:

[Genes]
feature      = gene:sgd
glyph        = generic
bgcolor      = yellow
forwardcolor = yellow
reversecolor = turquoise
strand_arrow = 1
height       = 6
description  = 1
key          = Named gene

This track is named "Genes". You may use a short mnemonic if you prefer; this will make the URL shorter when the user bookmarks a view he or she likes. Track names can contain almost any character, including whitespace, but cannot contain the "-" or "+" signs because these are used to separate track names in the URL when bookmarking. [My Genes] is OK, but [My-Genes] is not.

As in the general configuration section, the track declaration contains multiple name=value option pairs.

Valid options are as follows:

feature
This relates the track to one or more feature types as they appear in the database. Recall that each feature has a method and source. This is represented in the form method:source. So, for example, a feature of type "gene:sgd" has the method "gene" and the source "sgd".

It is possible to omit the source. A feature of type "gene" will include all features whose methods are "gene", regardless of the source field. It is not possible to omit the method. It is possible to have several feature types displayed on a single track. Simply provide the feature option with a space-delimited list of the features you want to include. For example:

   feature = gene:sgd stRNA:sgd

This will include features of type "gene:sgd" and "stRNA:sgd" in the same track and display them in a similar fashion.

remote feature
This relates the track to a remote feature track somewhere on the Internet. The value is an http: or ftp: URL, and may correspond to a static file of features in GFF format, gbrowse upload format, a CGI script, or a DAS source. When this option is active, the "feature" option and most of the glyph control options described below are ignored, but the "citation" and "key" options are honored.

Example:

   remote feature = http://www.wormbase.org/cgi-bin/das/wormbase?type=mRNA

glyph
This controls the glyph (graphical icon) that is used to represent the feature. The list of glyphs and glyph-specific options are listed in the section GLYPHS AND GLYPH OPTIONS. The "generic" glyph is the default.
bgcolor
This controls the background color of the glyph. The format of colors is explained in GLYPHS AND GLYPH OPTIONS.
fgcolor
This controls the foreground color (outline color) of the glyph. The format of colors is explained in GLYPHS AND GLYPH OPTIONS.
fontcolor
This controls the color of the primary font of text drawn in the glyph. This is the font used for the feature labels drawn at the top of the glyph.
font2color
This controls the color of the secondary font of text drawn in the glyph. This is the font used for the longish feature descriptions drawn at the bottom of the glyphs.
height
This option sets the height of the glyph. It is expressed in pixels.
strand_arrow
This is a true or false value, where true is 1 and false is 0. If this option is set to true, then the glyph will indicate the strandedness of the feature, usually by drawing an arrow of some sort. Some glyphs are inherently stranded, or inherently non-stranded and simply ignore this option.
label
This is a true or false value, where true is 1 and false is 0. If the option is set to true, then the name of the feature (i.e. its group name) is printed above the feature, space allowing.
description
This is a true or false value, where true is 1 and false is 0. If the option is set to true, then the description of the feature (any Note fields) is printed below the feature, space allowing.
key
This option controls the descriptive key that is drawn in the key area at the bottom of the image. It also appears in the checkboxes that the end user uses to switch tracks on and off. If not specified, it defaults to the track name.
citation
If present, this option creates a human-readable descriptive paragraph describing the feature and how it was derived. This is the text information that is displayed when the user clicks on the track name in the checkbox group. The value can either be a URL, in which case clicking on the track name invokes the corresponding URL, or a text paragraph, in which case clicking on the track name generates a page containing the text description. Long paragraphs can be continued across multiple lines, provided that continuation lines begin with whitespace.
link, title, link_target
These options are identical to the similarly-named options in the [TRACK DEFAULTS] section, but change the rules on a track-by-track basis. They can be used to override the global rules. To force a track not to contain any links, use a blank value.
box_subparts
If this option is greater than zero, then gbrowse will generate imagemap rectangles for each of the subparts of a feature (e.g. the exons within a transcript), allowing you to link each subpart separately. The numeric value will control the number of levels of subfeatures that the boxes will descend into. For example, if using the "gene" glyph, set box_subparts to 2 to create boxes for the whole gene (level 0), the mRNAs (level 1) and the exons (level 2).
feature_low
If this option is present, GBrowse will use the list of feature types listed here at low resolution views. (This is one of the ways that semantic zooming is implemented.) This allows you, for example, to switch off detailed exon, UTR, promoter and other within-the-gene features, and just show the start and stop of the transcription unit.
global feature
If this option is present and set to a true value (e.g. "1"), GBrowse will automatically generate a pseudo-feature that starts at the beginning of the currently displayed region and extends to the end. This is often used in conjunction with the "translation" and "dna" glyphs in order to display global characteristics of the sequence. If this option is set, then you do not need to specify a "feature" option.
group_pattern
This option lets you connect related features by dotted lines based on a pattern match in the features' names. A typical example is connecting the 5' and 3' read pairs from ESTs or plasmids. See GROUPING FEATURES for details.
group_on
For Bio::DB::SeqFeature::Store databases only, the group_on field allows you to group features together by display_name, target or any other method. This is mostly useful for XY-plot data, where you may want to dynamically group related data points together so that they share the same vertical scaling.

Example:

       group_on = display_name

(this feature is under refinement and may change in the future)

restrict
This option allows you to restrict who is allowed to view the current track by host name, IP address or username/password. See AUTHENTICATION AND AUTHORIZATION for details.
category
This option allows you to group tracks into different groups on the GBrowse display in addition to the default group called 'General'. For example, if you wanted several tracks to be in a separate group called "Genes", you would add this to each of the track definitions:
 category = Genes

As of GBrowse version 1.7, it is possible to create subcategories using this syntax:

 [label1]
 category = Genes:Coding
 ...
 [label2]
 category = Genes:Non-coding

This will create a section in the tracks panel called "Genes", which will have two subsections called "Coding" and "Non-coding". The track named "label1" will be placed in the first section, while the track named "label2" will be placed in the second.

Subcategories can be nested arbitrarily.

If all tracks are categorized, then the "General" category will not be displayed. If you have used a "category tables" option in the [GENERAL] section of the configuration file, then the names of the tracks labeled with this category will be placed into a table of the appropriate dimensions. Tracks will be placed into the table in column-major format: you should first list stanzas for all rows of column 1, then all rows of column 2, etc.

See the tutorial for more details.

das category, das landmark, das flatten, das subparts, das superparts, das glyph, das type
All these options pertain to exporting the GBrowse database as a DAS data source. Please see DAS_HOWTO for more information.

A large number of glyph-specific options are also recognized. These are described in the next section.
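
As an illustration of how several of the options above combine, here is a sketch of two track stanzas. The track names, keys and citation text are made up for the example, and the first stanza assumes that features of type "gene" with mRNA and exon subfeatures are present in the database.

    [Transcripts]
    feature      = gene
    glyph        = gene
    box_subparts = 2
    label        = 1
    description  = 1
    key          = Gene models
    citation     = Curated gene models. Each mRNA and exon can be
                   linked separately.

    [Translation]
    global feature = 1
    glyph          = translation
    key            = Translation

The second stanza needs no "feature" option because "global feature" generates a pseudo-feature spanning the displayed region.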

Glyphs and Glyph Options

A large variety of glyphs are available, and more are being added as the Bio::Graphics module grows.

A list of the common glyphs and their options is provided by GBrowse itself. Click on the "[Help]" link in the section labeled "Upload your own annotations". This page also lists the valid foreground and background colors. Most of the glyphs are found in the BioPerl distribution, but a few are distributed directly with GBrowse.

The most popular glyph types are:

 Glyph                 Description
 -----                 -----------
 generic               a rectangle
 allele_tower          allele found at a SNP position
 arrow                 an arrow
 anchored_arrow        a span with vertical bases |-----|.  If one
                       or the other end of the feature is off-screen, the
                       base will be replaced by an arrow.
 box                   another rectangle; doesn't show subparts of features
 cds                   shows the reading frame of spliced transcripts; used
                       in conjunction with the "coding" aggregator.
 diamond               a point-like feature represented as a diamond
 dna                   DNA and GC content
 heterogeneous_segments a multi-segmented feature in which each segment can
                       have a distinctive color.  For Jim Kent's WABA features,
                       this works with the waba_alignment aggregator.
 idiogram              this takes specially-formatted feature data and turns it
                       into an idiogram of a Giemsa-stained metaphase chromosome
 image                 this embeds photographic images and/or diagrams on features
 processed_transcript  multi-purpose representation of a spliced mRNA, including
                       positions of UTRs
 segments              a multi-segmented feature such as an alignment
 span                  like anchored_arrow, except that the ends are
                       truncated at the edge of the panel, not turned
                       into an arrow
 trace                 reads an SCF trace file and draws a graphic representation
 triangle              a point-like feature represented as a triangle
 transcript            a gene model
 transcript2           a slightly different representation of a gene model
 translation           1-, 3- and 6-frame translations
 wormbase_transcript   yet another gene model that can show UTR segments
                       (for features that conform to the WormBase gene
                       schema). Used in conjunction with the
                       "wormbase_gene" aggregator.
 xyplot                histograms and line plots

A more definitive list of glyph options can be found in the Bio::Graphics manual pages. Consult the manual pages for the following modules:

 Glyph                         Manual Page
 -----                         -----------
 (common options for all)      Bio::Graphics::Glyph
 allele_tower                  Bio::Graphics::Glyph::allele_tower
 arrow                         Bio::Graphics::Glyph::arrow
 anchored_arrow                Bio::Graphics::Glyph::anchored_arrow
 box                           Bio::Graphics::Glyph::box
 cds                           Bio::Graphics::Glyph::cds
 crossbox                      Bio::Graphics::Glyph::crossbox
 diamond                       Bio::Graphics::Glyph::diamond
 dna                           Bio::Graphics::Glyph::dna
 dot                           Bio::Graphics::Glyph::dot
 ellipse                       Bio::Graphics::Glyph::ellipse
 extending_arrow               Bio::Graphics::Glyph::extending_arrow
 generic                       Bio::Graphics::Glyph::generic
 graded_segments               Bio::Graphics::Glyph::graded_segments
 heterogeneous_segments        Bio::Graphics::Glyph::heterogeneous_segments
 idiogram                      Bio::Graphics::Glyph::idiogram
 image                         Bio::Graphics::Glyph::image
 line                          Bio::Graphics::Glyph::line
 primers                       Bio::Graphics::Glyph::primers
 processed_transcript          Bio::Graphics::Glyph::processed_transcript
 rndrect                       Bio::Graphics::Glyph::rndrect
 ruler_arrow                   Bio::Graphics::Glyph::ruler_arrow
 segments                      Bio::Graphics::Glyph::segments
 span                          Bio::Graphics::Glyph::span
 toomany                       Bio::Graphics::Glyph::toomany
 trace                         Bio::Graphics::Glyph::trace
 transcript                    Bio::Graphics::Glyph::transcript
 transcript2                   Bio::Graphics::Glyph::transcript2
 translation                   Bio::Graphics::Glyph::translation
 triangle                      Bio::Graphics::Glyph::triangle
 wormbase_transcript           Bio::Graphics::Glyph::wormbase_transcript
 xyplot                        Bio::Graphics::Glyph::xyplot

The "perldoc" command is handy for reading the documentation from the Unix command line. For example:

  perldoc Bio::Graphics::Glyph::primers

This will provide you with a summary of the options that apply to the "primers" glyph.

In the manual pages, the glyph options are presented the way they are called from Perl. For example, the documentation will tell you to use the -connect_color option to set the color to use when drawing the line that connects the two inward pointing arrows in the primer pair glyph. This translates to the configuration file as an option named "connect_color". For example:

[PCR Products]
glyph = primers
connect_color = blue

When referring to colors, you can use a variety of color names such as "blue" and "green". To get the full list, cut and paste the following magic incantation into the command line:

perl -MBio::Graphics::Panel -e 'print join "\n",Bio::Graphics::Panel->color_names'

or see this URL:

 http://www.wormbase.org/db/seq/gbrowse?help=annotation

Alternatively, you can use the #RRGGBB notation to specify the red, green and blue components of the color. Refer to any book on HTML for the details on using the notation.

Adding features to the overview

You can make any set of tracks appear in the overview by creating a stanza with a title of the format [<label>:overview], where <label> is any unique label of your choice. The format of the stanza is identical to the others, but the indicated track will appear in the overview rather than as an option in the detailed view. For example, this stanza adds to the overview a set of features of method "gene", source "framework":

[framework:overview]
feature       = gene:framework
label         = 1
glyph         = generic
bgcolor       = lavender
height        = 5
key           = Mapped Genes

Similarly, you can make a track appear in the region panel by appending ":region" to its name:

[genedensity:region]
feature       = gene_density
glyph         = xyplot
graph_type    = boxes
scale         = right
bgcolor       = red
fgcolor       = red
height        = 20
key           = Gene Density


Semantic Zooming

Sometimes you will want to change the appearance of a track when the user has zoomed out or zoomed in beyond a certain level. To indicate this, create a set of "length qualified" stanzas of format [<label>:<zoom level>], where all stanzas share the same <label>, and <zoom level> indicates the minimum size of the region that the stanza will apply to. For example:

 [gene]
 feature = transcript:curated
 glyph    = dna
 fgcolor  = blue
 key      = genes
 citation = example semantic zoom track
 [gene:500]
 feature = transcript:curated
 glyph   = transcript2
 [gene:100000]
 feature = transcript:curated
 glyph   = arrow
 [gene:500000]
 feature = transcript:curated
 glyph   = generic

This series of stanzas says to use the "transcript2" glyph when the segment being displayed is 500 bp or longer, to use the "arrow" glyph when the segment being displayed is 100,000 bp or longer, and the "generic" glyph when the region being displayed is 500,000 bp or longer. For all other segment lengths (1 to 499 bp), the ordinary [gene] stanza will be consulted, and the "dna" glyph will be displayed. The bare [gene] stanza is used to set all but the "feature" options for the other stanzas. This means that the fgcolor, key and citation options are shared amongst all the [gene:XXXX] stanzas, but the "feature" option must be repeated.

You can override any options in the length qualified stanzas. For example, if you want to change the color to red when displaying genes on segments between 500 and 99,999 bp, you can modify the [gene:500] stanza as follows:

 [gene:500]
 feature = transcript:curated
 glyph   = transcript2
 fgcolor = red

It is also possible to display different features at different zoom levels, although you should handle this potentially confusing feature with care.
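
A minimal sketch of that idea: show individual transcripts when zoomed in, and switch to a different feature type once the displayed region reaches 100,000 bp. The feature type "gene:curated" is an assumption made for this example.

 [gene]
 feature = transcript:curated
 glyph   = transcript2
 key     = genes
 [gene:100000]
 feature = gene:curated
 glyph   = generic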

If you wish to turn off a track entirely, you can use the "hide" flag to hide the track when the display exceeds a certain size:

 [6_frame_translation:50000]
 hide = 1


Computed Options

Some options can be computed at run time by using Perl subroutines as their values. These are known as "callbacks." Currently this works with the values of the "link", "title", "link_target", "header" and "footer" options, and any glyph-specific option that appears in a track section.

You need to know the Perl programming language to take advantage of this. The general format of this type of option is:

 option name = sub {
             some perl code;
             some more perl code;
             even more perl code;
             }

The value must begin with the sequence "sub {" in order to be recognized as a subroutine declaration. After this, you can have one or more lines of Perl code followed by a closing brace. Continuation lines must begin with whitespace.

When the browser first encounters an option like this one, it will attempt to compile it into Perl runtime code. If successful, the compiled code will be stored for later use and invoked whenever the value of the option is needed. (Otherwise, an error message will appear in your server error log).

For options of type "footer" and "header", the subroutine is passed no arguments. It is expected to produce some HTML and return it as a string value.
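
As a sketch, a footer callback that builds its HTML at run time could look like this. The markup and the "Example Lab" wording are placeholders:

 footer = sub {
          my $year = (localtime)[5] + 1900;
          return "<hr><p>Copyright $year Example Lab</p>";
          }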

For glyph-specific options, such as "bgcolor", the subroutine will be called at run time with five arguments consisting of the feature, the name of the option, the current part number of the feature, the total number of parts in this feature, and the glyph corresponding to the feature. Usually you will just look at the first argument. The return value is treated as the value of the corresponding option. For example, this bgcolor subroutine will call the feature's primary_tag() method, and return "blue" if it is an exon, "orange" otherwise:

 bgcolor = sub {
         my $feature = shift;
         return "blue" if $feature->primary_tag eq 'exon';
         return "orange";
         }

See the manual page for Bio::DB::GFF::Feature for information on how to interrogate the feature object.
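
The same pattern works for other feature attributes. As a sketch, the following colors features by strand. It assumes that the feature object provides a strand() method returning a negative value for the minus strand, as BioPerl sequence features generally do:

 fgcolor = sub {
         my $feature = shift;
         # Treat an undefined strand as unstranded (0).
         my $strand = $feature->strand || 0;
         return $strand < 0 ? "red" : "blue";
         }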

For special effects, such as coloring the first and last exons differently, you may need access to all five arguments. Here is an example that draws the first and last parts of a feature in blue and the rest in red:

  sub {
        my($feature,$option_name,$part_no,$total_parts,$glyph) = @_;
        return 'blue' if $part_no == 0;                # zero-based indexing!
        return 'blue' if $part_no == $total_parts-1;   # zero-based indexing!
        return 'red';
        }

If you need access to information in the parent of the feature (e.g. in a multipart feature), you can call the glyph's parent_feature() method:

 sub {
        my($feature,$option_name,$part_no,$total_parts,$glyph) = @_;
        my $parent = $glyph->parent_feature;
        return 'blue' if $parent->name =~ /Blue\d+/;
        return 'red';
        }

The parent_feature() method was added to Bioperl on 17 April 2008. If you are using an earlier version, parent_feature() will not be available.

See the Bio::Graphics::Panel manual page for more details.

Callbacks for the "link", "title", and "link_target" options have a slightly different call signature. They receive three arguments consisting of the feature, the Bio::Graphics::Panel object, and the Bio::Graphics::Glyph object corresponding to the current track within the panel:

 link = sub {
            my ($feature, $panel, $track) = @_;
            # ... do something with $feature and return a URL
            }

Ordinarily you will only need to use the feature object. The other arguments are useful to look up panel-specific settings such as the pixel width of the panel or the state of the "flip" setting:

 title = sub {
         my ($feature,$panel,$track) = @_;
         my $name = $feature->display_name;
         return $panel->flip ? "$name (flipped)" : $name;
      }
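
A link callback follows the same pattern and returns the URL as a string. In this sketch the target URL is a placeholder to be replaced with your own lookup script:

 link = sub {
            my ($feature, $panel, $track) = @_;
            my $name = $feature->display_name;
            return "http://www.example.com/lookup?name=$name";
            }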

Named Subroutine References

If you use a version of BioPerl after April 15, 2003, you can also use references to named subroutines as option arguments. To use named subroutines, add an init_code section to the [GENERAL] section of the configuration file. init_code should contain nothing but subroutine definitions and other initialization routines. For example:

 init_code = sub score_color {
               my $feature = shift;
               if ($feature->score > 50) {
                 return 'red';
               } else {
                 return 'green';
               }
             }
             sub score_height {
               my $feature = shift;
               if ($feature->score > 50) {
                 return 10;
               } else {
                 return 5;
               }
             }

Then simply refer to these subroutines using the \&name syntax:

   [EST_ALIGNMENTS]
   glyph = generic
   bgcolor = \&score_color
   height  = \&score_height

You can declare global variables in the init_code section if you use "no strict 'vars';" at the top of the section:

   init_code = no strict 'vars';
               $HEIGHT = 10;
               sub score_height {
                 my $feature = shift;
                 $HEIGHT++;
                 if ($feature->score > 50) {
                   return $HEIGHT*2;
                 } else {
                   return $HEIGHT;
                 }
               }

Due to the way the configuration file is parsed, there must be no empty lines in the init_code section. Either use comments to introduce white space, or "use" a .pm file to do anything fancy.
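
For example, a lone comment character can stand in for a blank line between two subroutines. This is a sketch that assumes comment lines are skipped by the configuration parser rather than ending the option; the subroutine names are illustrative only:

 init_code = sub is_high_score {
               my $feature = shift;
               return $feature->score > 50;
             }
             #
             sub score_color {
               my $feature = shift;
               return is_high_score($feature) ? 'red' : 'green';
             }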

Subroutines that you define in the init_code section, as well as anonymous subroutines, will go into a package that changes unpredictably each time you load the page. If you need a predictable package name, you can define it this way:

  init_code = package My; sub score_height { .... }
  [EST_ALIGNMENTS]
  height = \&My::score_height

Declaring New Aggregators

The Bio::DB::GFF data model recognizes a single level of "grouping" of features, but doesn't specify how to use the group information to correctly assemble the various individual components into a biological object. Aggregators are used to assemble this information. For example, let's say that you decide that your preferred "transcript" data model contains three subfeature types: a set of one or more features of method "exon", a single feature of method "TSS", and a single feature of method "polyA". Optionally, the data model could contain a single "main subfeature" that runs the length of the entire transcript. We might give this feature a method of "primary_transc" (for "primary transcript").

In a GFF file, a three-exon transcript might be represented as follows:

Chr1 confirmed primary_transc 100 500  .  +  .  Transcript "ABC.1"
Chr1 confirmed TSS            100 100  .  +  .  Transcript "ABC.1"
Chr1 confirmed exon           100 200  .  +  .  Transcript "ABC.1"
Chr1 confirmed exon           250 300  .  +  .  Transcript "ABC.1"
Chr1 confirmed exon           400 500  .  +  .  Transcript "ABC.1"
Chr1 confirmed polyA          500 500  .  +  .  Transcript "ABC.1"

To aggregate this, you would like to create an aggregator named "transcript", whose "main method" is "primary_transc", and whose "sub methods" are "TSS," "exon," and "polyA."

The way to indicate this in the configuration file is to add a "complex aggregator" to the list of aggregators:

 aggregators = transcript{TSS,exon,polyA/primary_transc}

The format of this value is "aggregator_name{submethod1,submethod2,.../mainmethod}".

You can now use the aggregator name as the argument of the "feature" option in a track section:

 [Transcripts]
 feature      = transcript
 glyph        = segments
 bgcolor      = wheat
 fgcolor      = black
 height       = 10
 key          = Transcripts

If you do not have a main subfeature, leave off the "/mainmethod". For example:

 aggregators = transcript{TSS,exon,polyA}

A few formatting notes. You are free to mix simple and complex aggregators in the "aggregators" option. For example, you can activate the standard "clone" and "alignment" aggregators as well as the new transcript aggregator with a line like this one:

aggregators = clone
              transcript{TSS,exon,polyA/primary_transc}
              alignment

If the complex aggregator contains whitespace or apostrophes, you must surround it with double-quotes, like this:

  "transcript{TSS,5'UTR,3'UTR,exon,polyA/primary_transc}"

Be aware that some glyphs look for particular method names when rendering aggregated features. For example, the standard "transcript" glyph is closely tied to the "transcript" aggregator, and looks for submethods named "intron", "exon" and "CDS", and a main method named "transcript."

Here is the list of available predefined aggregators:

    alignment
    clone
    coding
    transcript
    none
    orf
    waba_alignment
    wormbase_gene

To view the documentation for any of these aggregators, run the command "perldoc Bio::DB::GFF::Aggregator::aggregator_name", where "aggregator_name" is the name of the aggregator.

Grouping Features

gbrowse recognizes the concept of a "group" of related features that are connected by dotted lines. The canonical example is a pair of ESTs that are related by being from the two ends of the same cDNA clone. However many feature databases, including the GFF database recommended for gbrowse, do not allow for arbitrary hierarchical grouping. To work around this, you may specify a feature name-based regular expression that will be used to trigger grouping.

It works like this. Say you are working with EST feature pairs and they follow the nomenclature 501283.5 and 501283.3, where the suffix is "5" or "3" depending on whether the read was from the 5' or 3' ends of the insert. To group these pairs by a dotted line, specify the "group_pattern" option in the appropriate track section:

     group_pattern =  /\.[53]$/

At render time, gbrowse will strip off this pattern from the names of all features in the EST track and group those that have a common base name. Hence 501283.5 and 501283.3 will be grouped together by a dotted line, because after the pattern is removed, they will share the same common name "501283".

This works for any embedded pattern, provided that stripping out the pattern results in related features sharing the same name. For example, if the convention were "est.for.501283" and "est.rev.501283", then this grouping pattern would have the desired effect:

     group_pattern = /\.(for|rev)\./

Don't forget to escape regular expression meta-characters and to consider the various ways in which the regular expression might break. It is entirely possible to create an invalid regular expression, in which case gbrowse will crash until you comment out the offending option.
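
Putting this together, a complete track stanza might look like the sketch below. The section name and feature type are assumptions, so substitute the ones used in your own database:

 [EST]
 feature       = EST_match
 glyph         = segments
 group_pattern = /\.[53]$/
 key           = ESTs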


Controlling the gbrowse_details page

If a track definition's "link" option (see section B2) is set to AUTO, the gbrowse_details script will be invoked when the user clicks on a feature contained within the track. This will generate a simple table of all feature information available in the database. This includes the user-defined tag/value attributes set in Column 9 of the GFF for that feature.

You can control, to some extent, the formatting of the tag value table by providing a configuration stanza with the following format:

 [feature_type:details]
 tag1 = formatting rule
 tag2 = formatting rule
 tag3 = formatting rule

"feature_type" is the type of the feature you wish to control. For example, "gene:sgd" or simply "gene". You may also specify a feature_type of "default" to control the formatting for all features. "tag1", "tag2" and so forth are the tags that you wish to control the formatting of. The tags "Name," "Class", "Type", "Source", "Position", and "Length" are valid for all features, while "Target" and "Matches" are valid for all features that have a target alignment. In addition, you can use the names of any attributes that you have defined. Tags names are NOT case sensitive, and you may use a tag named "default" to define a formatting rule that is general to all tags (more specific formatting rules will override less specific ones).

A formatting rule can be a string with (possible) substitution values, or a callback. If a string, it can contain one or more of the substitution variables "$name", "$start", "$end", "$stop", "$strand", "$method", "$type", "$description" and "$class", which are replaced with the corresponding values from the current feature. In addition, the substitution variable "$value" is replaced with the current value of the attribute, and the variable "$tag" is replaced with the current tag (attribute) name. HTML characters are passed through.

For example, here is a simple way to boldface the Type field, italicize the Length field, and turn the Notes into a Google search:

 [gene:details]
 Type   = <b>$value</b>
 Length = <i>$value</i>
 Note  = <a href="http://www.google.com/search?q=$value">$value</a>
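
Because more specific rules override less specific ones, you can combine a site-wide default with per-type rules. In the following sketch, which has not been tested against a live installation, every tag of every feature is shown in italics, except that the Note tag of genes keeps the Google search link:

 [default:details]
 default = <i>$value</i>
 [gene:details]
 Note    = <a href="http://www.google.com/search?q=$value">$value</a>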

If you provide a callback, the callback subroutine will be invoked with three arguments. WARNING: the three arguments are different from the ones passed to other callbacks, and consist of the tag value, the tag name, and the current feature:

 Note = sub {
            my($value,$tag_name,$feature) = @_;
            # ... do something with $value and return the formatted HTML
            }
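
As a concrete sketch, the callback below uses all three arguments to label each value with the name of the feature it belongs to. The Alias tag is only an example of a user-defined attribute:

 [gene:details]
 Alias = sub {
            my($value,$tag_name,$feature) = @_;
            my $name = $feature->name;
            return "<b>$tag_name</b>: $value (gene $name)";
            }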

You can use this feature to format sequence attributes nicely. For example, if your features have a Translation attribute which contains their protein translations, then you are probably unsatisfied with the default formatting of this attribute. You can modify this with a callback that word-wraps the value into lines of at most 60 characters, and puts the whole thing in a <pre> section.


[gene:details]
Translation = sub {
               my $value = shift;
               $value =~ s/(\S{1,60})/$1\n/g;
               "<pre>$value</pre>";
            }

Linking out from gbrowse_details

The formatting rule mechanism described in the previous section is the recommended way of creating a link out from the gbrowse_details page. However, an older mechanism is available for backward compatibility.

To use this legacy mechanism, create a stanza header named [TagName:DETAILS], where TagName is the name of the tag (attribute name) whose values you wish to turn into URLs, and where DETAILS must be spelled with capital letters. Put the option "URL" inside this stanza, containing a string to be transformed into the URL.

For example, to link to a local cgi script from the following GFF line:

IV     curated exon    518     550     . + .   Transcript B0273.1; local_id 11723

one might add the following stanza to the configuration file:

   [local_id:DETAILS]
   URL   = http://localhost/cgi-bin/localLookup.cgi?tag=$tag;id=$value

The URL option's value should be a URL containing one or more variables. Variables begin with a dollar sign ($), and are replaced at run time with the information relating to the selected feature attribute. Recognized variables are:

    $tag        The "tag" of the tag/value pair
    $value      The "value" of the tag/value pair

The value of URL can also be an anonymous subroutine, in which case the subroutine will be invoked with a two-element argument list consisting of the name of the tag and its value. This example, provided by Cyril Pommier, will convert Dbxref tags into links to NCBI, provided that the value of the tag looks like an NCBI GI number:

[Dbxref:DETAILS]
URL = sub {
      my ($tag,$value)=@_;
      if ($value =~ /NCBI_gi:(.+)/){
       return "http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?term=$1";
       }
       return;
     }


Configuring Balloon Tooltips

GBrowse can display popup balloons when the user hovers over or clicks on a feature. The balloons can display arbitrary HTML, either provided in the config file, or fetched remotely via a URL. You can use this feature to create multiple choice menus when the user clicks on the feature, to pop up images on mouse hovers, or even to create little embedded query forms. See http://mckay.cshl.edu/balloons.html for examples.

In the config file for the database you wish to modify, set "balloon tips" to a true value:

     [GENERAL]
     ...
     balloon tips = 1

Then add "balloon hover" and/or "balloon click" options to the track stanzas that you wish to add balloons to. You can also place these options in [TRACK DEFAULTS] to create a default balloon.

"balloon hover" specifies HTML or a URL that will be displayed when the user hovers over a feature. "balloon click" specifies HTML or a URL that will appear when the user clicks on a feature. The HTML can contain images, formatted text, and even controls. Examples:

 balloon hover = <h2>Gene $name</h2>
 balloon click = <h2>Gene $name</h2>
                 <a href='http://www.google.com/search?q=$name'>Search Google</a><br>
                 <a href='http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&term=$name'>Search NCBI</a><br>

Alternatively, you can populate the balloon using data from an HTML page or dynamic CGI script running on the same server as GBrowse. This uses AJAX; it can often speed up page loading by reducing the amount of text that must be downloaded by the client. To dynamically load the balloon contents from the server, use a balloon hover or balloon click option like this:

 balloon click = /cgi-bin/get_gene_data?gene=$name

In this case, when the user clicks on the feature, it creates a balloon whose content contains the HTML returned by the CGI script "get_gene_data". GBrowse knows that this is a URL rather than the contents of the balloon by looking for the leading slash. However, to reduce ambiguity, we recommend that you prefix the URL with "url:" like so:

 balloon click = url:/cgi-bin/get_gene_data?gene=$name

This also allows you to refer to relative URLs:

 balloon click = url:../../get_gene_data?gene=$name

It is also possible to fill the balloon with content from a remote source. Simply specify a full URL beginning with "http:", "https:" or "ftp:":

balloon hover = http://www.wormbase.org/db/get?name=$name;class=gene

Note that the balloon library uses an internal <iframe> to populate the balloon with the content of external URLs. This means that vertical and horizontal scrollbars will appear if the content is too large to be contained within the balloon. If the formatting does not look right, you can design a custom balloon of the proper size as described in the next section.

The usual option value substitution rules ($name, $start, etc) work with balloons, as do callbacks. GBrowse will automatically escape single (') and double (") quotes in the values returned by the "balloon hover" and "balloon click" options so that you don't have to worry about them messing up the HTML.
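
For instance, a callback can assemble the balloon text from the feature itself. This sketch assumes that the feature is passed as the first argument, as it is for the other callbacks described in this document:

 balloon hover = sub {
                 my $feature = shift;
                 my $name    = $feature->display_name;
                 my $length  = $feature->end - $feature->start + 1;
                 return "<b>$name</b><br>Length: $length bp";
                 }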

You might also wish to specify "titles are balloons" in the [GENERAL] section:

 [GENERAL]
 titles are balloons = 1

This will generate balloons on all mouse hover events, using the content that would otherwise have been placed in the built-in browser tooltip.

There is a limited amount of balloon customization that you can perform within the [track] section. If you wish the balloon to be sticky (require the user to press the close button) even if it is a hover balloon, then place this option in the [track] section:

 balloon sticky = 1

Setting "balloon sticky" to 0 will have the effect of making balloons disappear as soon as the mouse leaves them, even if they were created by a mouse click event.

If you are displaying content from a remote web or FTP server and you do not like the height of the balloon, you can adjust the height with the "balloon height" option:

 balloon height = 400
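
Taken together, a track whose balloons are sticky and whose remote click content is shown with a height of 400 might be configured as in this sketch. The section name and feature type are placeholders:

 [Genes]
 feature        = gene
 glyph          = generic
 balloon hover  = <b>$name</b>
 balloon click  = http://www.wormbase.org/db/get?name=$name;class=gene
 balloon sticky = 1
 balloon height = 400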

Customizing Balloons

GBrowse supports multiple balloons with different shapes, sizes, background images and timing properties. There is one built-in balloon, named "balloon", which should meet most people's needs. However, you can configure any number of custom balloons.

To declare two new balloons, create a "custom balloons" option in the [GENERAL] section:

custom balloons = [blue_balloon]
                  images   =  /gbrowse/images/blue_balloons
                  maxWidth = 300
                  shadow   = 0
                  [wide_balloon]
                  maxWidth = 800

This creates two new balloons. The first, named "blue_balloon" will look for its images and icons at the local URL /gbrowse/images/blue_balloons. It will have a maximum width of 300 pixels, and will cast no shadow. The second, named "wide_balloon" takes all the defaults for the default balloon, including the location of its images in the directory /gbrowse/images/balloons, except that it has a maximum width of 800 pixels. The various balloon options are described well on the Popup Balloons page.

To use the blue balloon rather than the standard one, format the "balloon hover" and/or "balloon click" options like this:

 balloon click = [blue_balloon] /cgi-bin/get_gene_data?gene=$name

The [blue_balloon] keyword tells gbrowse to use the blue balloon for clicks on these features. The standard balloon is called "balloon", and so the following two options are equivalent:

 balloon click = /cgi-bin/get_gene_data?gene=$name
 balloon click = [balloon] /cgi-bin/get_gene_data?gene=$name

The images for custom balloons reside in the default location of /gbrowse/images/balloons, unless indicated otherwise using the "images" config option. To use custom balloon images, point "images" to a web-accessible directory in your document tree which contains the seven PNG images described on the Popup Balloons page. These images must be named as follows:

 balloon.png     down_right.png  up_right.png
 balloon_ie.png  down_left.png   up_left.png
 close.png

Tips for creating these images can be found on Popup Balloons.

Generating Static Images: PNGs, SVGs and PDFs

GBrowse can create three types of static image suitable for incorporation into posters, publications or other web pages:

  • PNG -- a bitmapped format suitable for low-resolution graphics, such as web pages.
  • SVG -- an editable vector-graphics format, suitable for posters, publications and other high-resolution applications.
  • PDF -- the familiar document exchange format, suitable for posters, publications and other high-resolution applications.

All the work is handled by the gbrowse_img script and needs little configuration. When the user selects a region and set of tracks to browse, the "link to image" and "high-res image" links at the top of the page will be automatically set to reproduce the user's view as closely as possible. GBrowse_img customization options, including instructions on how to embed an image in a web page so that the clickable imagemap links are maintained, can be found here.

The PNG generation will work in the default installation. In order to get SVG generation to work, you will need to install the perl SVG and GD::SVG modules, available from CPAN.

For PDF generation, you will need the perl GD and GD::SVG modules installed, as well as a helper application called Inkscape. Inkscape provides a command-line tool that will convert SVG files into PDF. To install, download and install it somewhere on the standard system path (e.g. /usr/bin). You will then need to create two subdirectories in the web user's home directory in order for Inkscape to work properly. Assuming that the web user is "www-data", run the following commands:

sudo mkdir ~www-data/{.inkscape,.gnome2}
sudo chown www-data ~www-data/{.inkscape,.gnome2}

This will create the two directories ".inkscape" and ".gnome2" in the www-data user's home directory, and make them writable by the www-data user.

Unfortunately, Inkscape will generate a one line warning into the server error log every time it executes:

(inkscape:28490): Gdk-CRITICAL **: gdk_display_list_devices: assertion `GDK_IS_DISPLAY (display)' failed

At the current time there is no known fix for this problem.

Note that Inkscape PDF generation should work properly on all platforms, including Linux, Mac OSX and Windows. However, it has only been tested on Linux platforms at the current time.

Generating Feature Frequency histograms

Note: this applies to GFF2 databases only and needs to be rewritten slightly for GFF3

With a little bit of additional effort, you can set one or more tracks up to display a density histogram of the features contained within the track. For example, the human data source in the GBrowse demo uses density histograms in the chromosomal overview. In addition, when the features in the SNP track become too dense to view, this track converts into a histogram. To see this in action, turn on the SNP track and then zoom out beyond 150K.

There are four steps for making histograms:

  1. generate the density data using the bp_generate_histogram.pl script.
  2. load the density data using bp_load_gff.pl or bp_fast_load_gff.pl.
  3. declare a density aggregator to the gbrowse configuration file.
  4. add the density aggregator to the appropriate track in the configuration file.

The first step is to generate the density data. Currently this is done by generating a GFF file containing a set of "bin" feature types. Use the bp_generate_histogram.pl script to do this. You will find it in BioPerl under the scripts/Bio-DB-GFF directory.

Assuming that your database is named "dicty", you have a feature named SNP, and you wish to generate a density distribution across 10,000 bp bins, here is the command you would use:

 bp_generate_histogram.pl -merge -d dicty -bin 10000 SNP >snp_density.gff

This is saying to use the "dicty" database (the -d option), to use 10,000 bp bins (the -bin option) and to count the occurrences of the SNP feature throughout the database. In addition, the -merge option says to merge all types of SNPs into a single bin. Otherwise they will be stratified by their source. The resulting GFF file contains a series of entries like these:

 Chr1  SNP bin 1     10000 49 + . bin Chr1:SNP
 Chr1  SNP bin 10001 20000 29 + . bin Chr1:SNP

What this is saying is that there are now a series of pseudo-features of type "bin:SNP" that occupy successive 10,000 bp regions of the genome. The score field contains the number of times a SNP was seen in that bin.

You'll now load this file using bp_load_gff.pl or bp_fast_load_gff.pl:

 bp_load_gff.pl -d dicty snp_density.gff

The next step is to tell GBrowse how to use this information. You do this by creating a new aggregator for the SNP density information. Open the GBrowse configuration file and find the aggregators option. Add a new aggregator that looks like this:

 aggregators = snp_density{bin:SNP}

This is declaring a new feature named "snp_density" that is composed of subparts of type bin:SNP.

The last step is to declare a track for the density information. You will use the "xyplot" glyph, which can draw a number of graphs, including histograms. To add the SNP density information as a static track in the overview, create a section like this one:

[SNP:overview]
feature       = snp_density
glyph         = xyplot
graph_type    = boxes
scale         = right
bgcolor       = red
fgcolor       = red
height        = 20
key           = SNP Density

This is declaring a new constant track in the overview named "SNP Density." The feature is "snp_density", corresponding to the aggregator declared earlier. The glyph is "xyplot" using the graph type of "boxes" to generate a column graph.

To set up a track so that the histogram appears when the user zooms out beyond 100,000 bp but shows the detailed information at higher magnifications, generate two track sections like these:

 [SNPs]
 feature       = snp
 glyph         = triangle
 point         = 1
 orient        = N
 height        = 6
 bgcolor       = blue
 fgcolor       = blue
 key           = SNPs
 [SNPs:100000]
 feature       = snp_density
 glyph         = xyplot
 graph_type    = boxes
 scale         = right

The first track section sets up the defaults for the SNP track. SNPs are represented as blue triangles pointing North. The second track declaration declares that when the user zooms out to over 100K base pairs, GBrowse should display the snp_density feature using the xyplot glyph.

INTERNATIONALIZATION

GBrowse is partially internationalized. End-users whose browsers are set to request a non-English language will see the GBrowse main and secondary screens in their preferred language, provided that GBrowse has the appropriate translation file.

Translation files are located in gbrowse.conf/languages/ and use the standard two-letter language abbreviations, such as "fr" for French, as well as the regional abbreviations, such as fr-CA for Canadian French. Currently there are translation files for French, Italian, and Japanese. If your favorite language isn't supported, you are encouraged to create a new translation file and contribute it to the GBrowse development effort. Please contact Lincoln Stein (lstein@cshl.org) for help in doing this.

If the end user does not specify a preferred language, GBrowse will default to "en" (English). You can change this by placing a "language" option in the configuration file somewhere inside the [GENERAL] section. For example, to make Japanese the default, create this entry:

 language = ja

GBrowse will still use the end-user's preferred language in preference to the default if the preferred language is available.

Although GBrowse automatically changes the text and button language, it can't automatically translate the track labels. If you would like the track labels to localize, you will have to provide your own translations in the "key", "citation" and "category" options. The syntax is similar to that used for semantic zooming:

 [gene]
 glyph   = transcript
 feature = transcript:curated
 height  = 10
 key     = Named Gene
 key:fr  = Gènes Nommés
 key:it  = I Geni dati un nome a
 key:es  = Los Genes denominados
 category = Genes
 category:fr = Gènes

To localize an option, follow its name with a colon and the two-letter language code; the value given is used when the page is displayed in that language. The option without the colon is the default. You may enter accented and umlauted characters directly, as shown, or use the HTML entities. Non-English character sets, such as Japanese, should also work correctly, provided that the translation file indicates the correct character set to use.

HELP FILES:

The GBrowse help files are in English. Although there is support for internationalizing the help files, no one has done this yet. If you are industrious and wish to translate the help files into your favorite language, find the two help files where they are located in htdocs/gbrowse/. One is named general_help.html, while the other is named annotation_help.html. Translate them, and create new files with the language code appended to the end as a suffix. For example, the French translation of annotation_help.html would be annotation_help.html.fr.

LIMITATIONS:

- There is no localization support. For example, GBrowse will print large numbers using commas (e.g. 1,234,567) instead of periods, even when talking to a European browser.

- Although the HTML frame around the GBrowse genome image will use the appropriate character set, the overview and detail images themselves are limited to Latin alphabets. This is because of limited native character support in the GD library used by GBrowse. When a non-Latin character set is called for, such as Japanese, GBrowse will use Japanese for the frame, but English for the image.

- The rate at which the GBrowse team adds new features to the browser often outstrips the ability of volunteers to update the translation files. This means that new buttons and fields may be displayed in English on an otherwise correctly internationalized page.


AUTHENTICATION AND AUTHORIZATION

You can restrict who has access to gbrowse by IP address, host name, domain or username and password. Restriction can apply to the database as a whole, or to particular annotation tracks.

To limit access to a whole database, you can use Apache's standard authentication and authorization. GBrowse uses a URL of this form to select the current database:

     http://your.host/cgi-bin/gbrowse/your_database

where "your_database" is the name of the currently selected database. For example, the yeast database is http://your.host/cgi-bin/gbrowse/yeast.

To control access to the entire database, create a <Location> section in httpd.conf. The <Location> section should look like this:

  <Location /cgi-bin/gbrowse/your_database>
       Order deny,allow
       deny from all
       allow from localhost .cshl.edu .ebi.ac.uk
  </Location>

This denies access to everybody except for "localhost" and browsers from the domains .cshl.edu and .ebi.ac.uk. You can also limit by IP address, by username and password or by combinations of these techniques. See http://httpd.apache.org/docs/howto/auth.html for the full details.
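
As a sketch of the username and password approach, the section below uses Apache's standard Basic authentication directives. The realm name and the password file path are assumptions; the password file would be created with Apache's htpasswd utility:

  <Location /cgi-bin/gbrowse/your_database>
       AuthType Basic
       AuthName "GBrowse"
       AuthUserFile /etc/apache2/gbrowse.passwd
       Require valid-user
  </Location>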

You can also limit individual tracks to certain individuals or organizations. Unless the stated requirements are met, the track will not appear on the main screen or any of the configuration screens. To set this up, add a "restrict" option to the track you wish to make off-limits:

       [PROPRIETARY]
       feature = etc
       glyph   = etc
       restrict = Order deny,allow
                  deny from all
                  allow from localhost .cshl.edu .ebi.ac.uk

The value of the restrict option is identical to the Apache authorization directives and can include any of the directives "Order," "Satisfy," "deny from," "allow from," "require valid-user" or "require user." The only difference is that the "require group" directive is not supported, since the location of Apache's group file is not passed to CGI scripts. Note that username/password authentication must be turned on in httpd.conf and the user must have successfully authenticated himself in order for the username to be available.

As with other gbrowse options, restrict can be a code subroutine. The subroutine will be called with three arguments consisting of the host, ip address and authenticated user. It should return a true value to allow access to the track, or a false value to forbid it. This can be used to implement group-based authorization or more complex schemes.

Here is an example that uses the Text::GenderFromName module to allow access if the user's name sounds female and forbid access if the name sounds male. (It might be useful for an X-chromosome annotation site.)

   restrict = sub {
              my ($host,$ip,$user) = @_;
              return unless defined $user;
              use Text::GenderFromName qw(gender);
              return gender($user) eq 'f';
            }
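The group-based authorization mentioned above can be sketched in the same way. This is only an illustration under stated assumptions: the %GROUP_MEMBERS table, the allow_curators name and the user names are hypothetical, and a real site would probably load group membership from a file or directory service.

```perl
use strict;
use warnings;

# Hypothetical group table (an assumption for this sketch);
# in practice this might be read from a file.
my %GROUP_MEMBERS = (
    curators => { alice => 1, bob => 1 },
);

# Same calling convention as a "restrict" callback: host, IP address, user.
sub allow_curators {
    my ($host, $ip, $user) = @_;
    return 0 unless defined $user;    # not authenticated: deny
    return $GROUP_MEMBERS{curators}{$user} ? 1 : 0;
}

print allow_curators('localhost', '127.0.0.1', 'alice') ? "allowed\n" : "denied\n";
```

To use something like this in a track stanza, the body of allow_curators would go inside "restrict = sub { ... }", with the group table defined inside the subroutine.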

You should be aware that the username will only be defined if username authentication is turned on and the user has successfully authenticated himself against Apache's user database using the correct password. In addition, the hostname will only be defined if HostnameLookups has been turned on in httpd.conf. If it has not, you can convert the IP address into a hostname using this piece of code:

   use Socket;
   $host = gethostbyaddr(inet_aton($ip),AF_INET);

Note that this may slow down the response time of gbrowse noticeably if you have a slow DNS name server.

Another thing to be aware of when restricting access to an entire database is that even though the database itself will not be accessible to unauthorized users, the name of the database will still be available from the popup "Data Source" menu. If you wish even the name to be suppressed from view by unauthorized users, add the following line to the [GENERAL] section of the configuration file of the database you wish to suppress:

   restrict = require valid-user

The syntax described earlier for restricting access to tracks by hostname, IP address or username holds true for restricting the visibility of the database on the Data Source popup menu.


DISPLAYING GENETIC AND RH MAPS

GBrowse can be tweaked to make it more suitable for displaying genetic and radiation hybrid maps.

The main issue is that the Bio::DB::GFF database expects coordinates to be positive integers, not fractions, but genetic and RH maps use floating point numbers. Working around this is a bit of an ugly hack. Before loading your data you must multiply all your coordinates by a constant power of 10 in order to convert them into integers. For example, if a genetic map uses Morgan units ranging from 0 to 1.80, you would multiply by 100 to create a map ranging from 0 to 180.

Create a GFF file containing the markers in modified coordinates and load it as usual. Now you must tell GBrowse to reverse these changes. Enter the following options into the [GENERAL] section of the configuration file:

units = M
unit_divider = 100

These two options tell GBrowse to use "M" (Morgan) units, and to divide all coordinates by 100. GBrowse will automatically display the scale using the most appropriate units, so the displayed map will typically be drawn using cM units.
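The coordinate conversion described above can be sketched as a small pre-processing step run before loading the GFF file. This is a minimal sketch, not part of GBrowse: the function names scale_position and scale_gff_line and the sample marker line are hypothetical, and the factor must match whatever you set unit_divider to.

```perl
use strict;
use warnings;

# Convert a fractional map position to an integer coordinate.
# $factor must match the unit_divider option (100 in the example above).
sub scale_position {
    my ($position, $factor) = @_;
    # Round rather than truncate: a floating point product can fall
    # just below the intended integer.
    return sprintf('%.0f', $position * $factor) + 0;
}

# Rescale the start (column 4) and end (column 5) of one tab-delimited GFF line.
sub scale_gff_line {
    my ($line, $factor) = @_;
    chomp $line;
    return $line if $line =~ /^#/;    # leave comment lines alone
    my @cols = split /\t/, $line, -1;
    return $line if @cols < 5;        # leave short lines alone
    $cols[$_] = scale_position($cols[$_], $factor) for 3, 4;
    return join "\t", @cols;
}

# Hypothetical marker at 0.25 M on linkage group LG1.
print scale_gff_line("LG1\tmap\tmarker\t0.25\t0.25\t.\t.\t.\tMarker m001", 100), "\n";
```

Rounding with sprintf is a deliberate choice here: simply truncating with int() could shift a marker by one unit when the product is not exactly representable.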


CHANGING THE LOCATION OF THE CONFIGURATION FILES

If you wish to change the location of the gbrowse.conf configuration file directory, you must manually edit the gbrowse CGI script. Open the script in a text editor, and find this section:

###################################################################
# Non-modperl users should change this variable if needed to point
# to the directory in which the configuration files are stored.
#
use constant CONF_DIR => '/usr/local/apache/conf/gbrowse.conf';
#
###################################################################

Change the definition of CONF_DIR to the desired location of the configuration files.

An alternative, for users of mod_perl only, is to add the GBrowseConf per-directory variable to the configuration for the directory in which the gbrowse script lives. This variable overrides the CONF_DIR value. For example:

<Directory /usr/local/apache/cgi-perl>
  SetHandler      perl-script
  PerlHandler     Apache::Registry
  PerlSendHeader  On
  Options         +ExecCGI
  PerlSetVar      GBrowseConf /etc/gbrowse.conf
</Directory>


USING DAS (DISTRIBUTED ANNOTATION SYSTEM) DATABASES

You may insert features from a DAS source into any named track. Create a stanza as usual but instead of specifying the feature type using the "feature" option, give the desired DAS URL using the "remote feature" option:

remote feature = http://dev.hapmap.org/cgi-perl/das/t2d_testing?type=ldblock

Because DAS sources specify the glyph and visualization options, most of the settings such as bgcolor will be ignored. However, the track key and citation options are honored.

You can use the same syntax to load a GFF file or a feature file in Gbrowse upload format into a track. Just provide a URL that returns the desired data.

You can also run GBrowse entirely off a single DAS source. To get this support, you must use Bio::Das version 0.90 or higher, available from http://www.biodas.org.

A sample [GENERAL] configuration section looks like this:

[GENERAL]
description   = Das Example Database (dicty)
db_adaptor    = Bio::Das
db_args       = -source http://www.biodas.org/cgi-bin/das
                -dsn    dicty

The db_adaptor option must be set to "Bio::Das". The db_args option must contain a -source pointing to the base of the remote DAS server, and a -dsn pointing to the name of the annotation database.

The remainder of the configuration file should be configured as described earlier. The following short script will return a list of the feature types known to the remote DAS server. You can use the output of this script as the basis for the tracks to configure.

 #!/usr/bin/perl

 use strict;

 use Bio::Das;
 my $db = Bio::Das->new('http://localhost/cgi-bin/das'=>'dicty');
 print join "\n",$db->types;

Limitations:

The DAS implementation does not descend into subcomponents. For example, if the user requests features on a chromosome, but the remote DAS server has annotated genes using contig coordinates, then the genes will not appear on the chromosome.

The gbrowse_details script does not provide useful information because the DAS/1 protocol does not provide a way to retrieve attribute information on a named feature.


THE Bio::MOBY BROWSER

The BioMOBY project aims to design and deploy platforms that enable and simplify biological database interoperability.

To date, the MOBY-Services (MOBY-S) branch of the BioMOBY project has published a fairly stable API that is now being used by data providers worldwide to publish their data in an interoperable manner. A simple MOBY browser has been written for Gbrowse that allows the end-user to "surf" out of their Gbrowse view and begin exploring data related to the genomic features displayed in Gbrowse.

Configuration of the gbrowse_moby script does, at this time, require some VERY simple code-editing, and small modifications to your XX.organism.conf configuration file. These are described in detail below:

SYNOPSIS
In 0X.organism.conf, for example:
[ORIGIN]
link         = http://yoursite.com/cgi-bin/gbrowse_moby?source=$source&name=$name&class=$class&method=$method&ref=$ref&description=$description
feature      = origin:Sequence
glyph        = anchored_arrow
fgcolor      = orange
font2color   = red
linewidth    = 2
height       = 10
description  = 1
key          = Definition line
link_target  = _MOBY

AND/OR

[db_xref:DETAILS]
URL = http://yoursite.com/cgi-bin/gbrowse_moby?namespace=$tag;id=$value

Note that all you are doing in each case is to associate a mouse click on a particular feature type with an invocation of the gbrowse_moby script, passing a few of the common Gbrowse variables in the GET string. The gbrowse_moby script will take information passed from a click on a Gbrowse feature, or a click on a configured DETAILS GFF attribute type, and initiate a MOBY browsing session with information from that link. Most information is discarded. The only useful information to MOBY is a "namespace" and an "id" within that namespace. Generally speaking, namespaces in Gbrowse will have to be mapped to a namespace in the MOBY namespace ontology (which is derived from the Gene Ontology Database Cross-Reference Abbreviations list). Currently, this requires editing of the gbrowse_moby code, where a Perl hash named %source2namespace maps the GFF source (column 2) to a MOBY namespace:

 $source2namespace{$source} = moby_namespace

REQUIRED LIBRARIES
This script requires libraries from the BioMOBY project. Currently these are only available from CVS. Anonymous checkout of the BioMOBY project can be accomplished as follows:

 cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/moby login

When prompted for a password, type "cvs".

 cvs -d :pserver:cvs@cvs.open-bio.org:/home/repository/moby co moby-live
 cvs update -dP

You will then need to enter the moby-live/Perl folder and run

perl Makefile.PL; make; make install

to install the MOBY libraries into your system.

USAGE
gbrowse_moby understands the following variables, some of which (*) may be passed from Gbrowse through a mouse-click into the GET string:

 * $source      - converted into a MOBY namespace by parsing
                  the 'source' GFF tag against the %source2namespace
                  hash (see the more detailed explanation in the
                  examples below)
   $namespace   - used verbatim as a valid MOBY namespace
 * $name        - used verbatim as a MOBY id interpreted in the namespace
 * $id          - used verbatim as a MOBY id interpreted in the namespace
 * $class       - this is the GFF column 9 class; used for the page title
   $objectclass - this should be a MOBY Class ontology term
                  (becomes Class 'Object' by default, and this
                  is usually correct)
   $object      - contains the raw XML of a valid MOBY object

Note that you MUST at least pass a namespace-type variable (source/namespace) and an id-type variable (name/id) in order to have a successful MOBY call.
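The rule above (one namespace-type variable plus one id-type variable) can be illustrated with a short sketch. This is not the actual gbrowse_moby code: the function name resolve_moby_target is hypothetical, and the precedence it applies (an explicit namespace wins over source, and id wins over name) is an assumption made for the example.

```perl
use strict;
use warnings;

# Maps a GFF source to a MOBY namespace, as in the hash edited in gbrowse_moby.
my %source2namespace = (
    'Genbank' => 'NCBI_Acc',
);

# Return (namespace, id), or an empty list if either half is missing.
# Assumed precedence: "namespace" over "source", "id" over "name".
sub resolve_moby_target {
    my (%param) = @_;
    my $namespace = $param{namespace};
    if (!defined $namespace && defined $param{source}) {
        $namespace = $source2namespace{ $param{source} };
    }
    my $id = defined $param{id} ? $param{id} : $param{name};
    return unless defined $namespace && defined $id;
    return ($namespace, $id);
}

my ($ns, $id) = resolve_moby_target(source => 'Genbank', name => 'A22344');
print "$ns $id\n";    # prints "NCBI_Acc A22344"
```

A source that has no entry in the hash yields no namespace, which is why every GFF source you link from needs a mapping.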

EXAMPLES
Simple GFF: If your GFF were:
     A22344  Genbank  origin  1000  2000  87  +  .

You would set your configuration file as follows:

    [ORIGIN]
    link         = http://yoursite.com/cgi-bin/gbrowse_moby?source=$source&name=$name&class=$class
    feature      = origin:Genbank

and you would edit the gbrowse_moby script as follows:

     my %source2namespace = (
        #   GFF-source           MOBY-namespace
           'Genbank'       =>      'NCBI_Acc',
     );

This maps the GFF source tag "Genbank" to the MOBY namespace "NCBI_Acc".

GFF With non-MOBY Attributes: If your GFF were:

     A22344  Genbank origin  1000  2000 87 + . Locus CDC23

You would set your configuration file as follows:

    [ORIGIN]
    link         = http://yoursite.com/cgi-bin/gbrowse_moby?source=$source&name=$name&class=$class
    feature      = origin:Genbank

and you might also set a DETAILS call to handle the Locus Xref: (notice that we use the 'source' tag to force a translation of the foreign namespace into a MOBY namespace)

    [db_xref:DETAILS]
    URL = http://brie4.cshl.org:9320/cgi-bin/gbrowse_moby?source=$tag;id=$value

then to handle the mapping of Locus to YDB_Locus as well as the Genbank GFF source tag you would edit the source2namespace hash in gbrowse_moby to read:

     my %source2namespace = (
        #   GFF-source           MOBY-namespace
           'Genbank'       =>      'NCBI_Acc',
           'Locus'         =>      'YDB_Locus',
     );

GFF With MOBY Attributes: If your GFF were (NCBI_gi is a valid MOBY namespace):

     A22344  Genbank origin  1000  2000 87 + . NCBI_gi 118746

You would set your configuration file as follows:

    [ORIGIN]
    link         = http://yoursite.com/cgi-bin/gbrowse_moby?source=$source&name=$name&class=$class
    feature      = origin:Genbank

and you might also set a DETAILS call to handle the NCBI_gi Xref: (notice that we now use the 'namespace' tag to indicate that the tag is already a valid MOBY namespace)

    [db_xref:DETAILS]
    URL = http://brie4.cshl.org:9320/cgi-bin/gbrowse_moby?namespace=$tag;id=$value

Since there is no need to map the namespace portion, we now only need to handle the Genbank GFF source as before:

     my %source2namespace = (
        #   GFF-source           MOBY-namespace
           'Genbank'       =>      'NCBI_Acc',
     );

HINTS
-The full listing of valid MOBY namespaces is available at:
   http://mobycentral.cbr.nrc.ca/cgi-bin/types/Namespaces

-A useful mapping to make is to put the organism name into the Global_Keyword namespace. This will trigger discovery of MedLine searches for papers about that organism.


BioMOBY Services

A selection of services are distributed with the Gbrowse package that will allow you to serve your underlying data using the BioMOBY Services architecture.

To enable these, simply do the following:

1. Set up and fill your database as per the normal GBrowse instructions.

2. Edit the moby.conf file in the /$CONFIG/gbrowse.conf/MobyServices folder. It should be set up as follows:

a. Reference: Your reference sequences will be based on some type of identifier, e.g. they will be from Genbank or from Embl or from Flybase, etc. Look up the BioMOBY namespace corresponding to the type of identifier you are using for your Reference sequences and put that identifier here. The full listing of valid MOBY namespaces is available at:

   http://mobycentral.cbr.nrc.ca/cgi-bin/types/Namespaces

b. authURI: You are required to identify yourself when registering MOBY Services. Your authURI is a URI uniquely identifying you. This is generally your domain (e.g. flybase.org).

c. contactEmail: You are required to provide a contact email address at which people can reach you regarding the services you are providing.

d. CGI_URL: This is simply the URL to the folder from which you are serving your gbrowse scripts (e.g. http://flybase.org/cgi-bin/gbrowse/). DO NOT include the script name in this parameter! It is the folder only!

e. [Namespace_Class_Mappings]: This section is just a list of tuples indicating the relationship between various entities in your database (e.g. Genes, Transcripts) and their equivalent BioMOBY namespaces. For example, if you are TAIR, and you have entities in your database called "Locus", you would add the line:

       Locus = TAIR_Locus

to this section of the config file. This will allow people who have TAIR_Locus identifiers in-hand to discover your service and request information about that locus from your database. You may add as many Namespace->Class mappings as you wish; one per line.

3. Register your services. To register your services with the MOBY Central web service registry simply run the register_moby_services.pl script, located in the Generic-Genome-Browser/bin folder. The script documentation can be retrieved with POD, or simple documentation can be printed by running the script with no command-line parameters. Generally speaking you need only run:

perl register_moby_services.pl -register

As services are registered they will be added to a file: registeredMOBYServices.dat. This file is used to de-register your services if you wish to do so. To deregister, simply run:

perl register_moby_services.pl -clean

If your .dat file is not available, cleaning your services will be unsuccessful.

4. Service script. Your services are served by the script moby_server in your cgi-bin folder. This is auto-configured by the registration step above, so generally speaking you do not need to edit this script.

FILTERING SEARCH RESULTS

GBrowse provides a method to filter the contents of individual tracks based on information that can be obtained from feature attributes. For example, suppose you have performed a BLAST search and added all hits as similarity features on an entry. In gbrowse, all those features can get a little crowded. The administrator can decide to show only the top 5 BLAST hits. This can easily be accomplished by adding the filter option to the conf file. It might look like this:

 [BLAST]
 feature       = blast
 glyph         = segments
 filter = sub {
                my $feat = shift;
                (my $rank) = $feat->get_tag_values('rank'); # persistent Bio::SeqFeature::Generic features
                #(my $rank) = $feat->attributes('rank'); # Bio::DB::GFF::Feature
                $rank < 6;
              }

Another useful example is to show features coming from a plain genbank file. When loaded into BioSQL the source becomes 'EMBL/GenBank/SwissProt'. Using the Bio::DB::Das::BioSQL adaptor you have to pass the source to the feature option. It can be rather difficult to distinguish the features when they all have the same source string. This problem can be solved using the filter option. In the following example the features are distinguished by their primary_tag:

 [REGION]
 feature      = EMBL/GenBank/SwissProt
 filter       = sub {
                 my $feat = shift;
                 $feat->primary_tag =~ /region/i;
                }
 key          = RefSeq Protein Domains

 [SIGPEPTIDE]
 feature      = EMBL/GenBank/SwissProt
 filter       = sub {
                 my $feat = shift;
                 $feat->primary_tag =~ /sig_peptide/i;
                }
 key          = RefSeq Signal Peptide
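A filter callback can be tried outside GBrowse before it goes into a config file. The sketch below runs the rank test from the [BLAST] example against stand-in objects. The MockFeature class is hypothetical and exists only for this illustration; real features come from the database adaptor and may behave differently, for instance when a tag is missing.

```perl
use strict;
use warnings;

# Minimal stand-in for a feature object (an assumption for this sketch).
package MockFeature;
sub new { my ($class, %tags) = @_; return bless {%tags}, $class }
sub get_tag_values {
    my ($self, $tag) = @_;
    return defined $self->{$tag} ? ($self->{$tag}) : ();
}

package main;

# Same test as the [BLAST] filter; the defined-check only matters for this
# mock, whose get_tag_values returns an empty list for a missing tag.
my $filter = sub {
    my $feat = shift;
    my ($rank) = $feat->get_tag_values('rank');
    return defined $rank && $rank < 6;
};

my @features = map { MockFeature->new(rank => $_) } 1 .. 8;
my @kept     = grep { $filter->($_) } @features;
print scalar(@kept), " of ", scalar(@features), " features kept\n";
```

With eight mock hits ranked 1 to 8, the filter keeps the five with rank below 6.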


INVOKING GBROWSE URLs (under construction)

This section describes the public CGI parameters recognized by GBrowse. By setting the parameters in the URL, you can get gbrowse to do various useful things:

The source argument: The last component of the gbrowse path is the symbolic name of the data source. For example:

  http://www.your.site/cgi-bin/gbrowse/volvox
  http://www.your.site/cgi-bin/gbrowse/yeast
  http://www.your.site/cgi-bin/gbrowse/my_testing_database

These will correspond to config files named volvox.conf, yeast.conf and my_testing_database.conf respectively. As noted earlier, you can place numbers in front of the configuration file names in order to adjust the order in which they appear in the data source menu.

NOTE: For obscure reasons involving Internet Explorer compatibility, gbrowse will add an extra slash to the end of the URL, resulting in URLs that look like:

 http://www.your.site/cgi-bin/gbrowse/yeast/?q=NAB2

Don't worry about this. The URL works the same with and without the terminal slash.

q: The argument "q" will set the landmark or search string:

   http://www.your.site/cgi-bin/gbrowse/yeast?q=NAB2

This will have the same effect as typing "NAB2" into the gbrowse search box. To go immediately to the multiple hits page (which shows hits on several overview panels), use multiple q arguments:

  http://www.your.site/cgi-bin/gbrowse/yeast?q=NAB2;q=NPY1

Alternatively, you can use a single q parameter and separate each landmark name with a dash:

  http://www.your.site/cgi-bin/gbrowse/yeast?q=NAB2-NPY1

The rules for specifying relative offsets and object classes are the same as in the main search field:

  http://www.your.site/cgi-bin/gbrowse/yeast?q=Gene:NAB2:1..5000

ref, start, stop, end: Together the "ref," "start" and "stop" arguments specify the reference sequence and the start and end coordinates of the region of interest. The "q" argument, if present, overrides these settings. The "end" argument is a synonym for "stop".

label: The tracks to display. This parameter must contain the track names (i.e. the names in [brackets] in the config file) separated by "+" or "-" characters. For example:

  http://www.your.site/cgi-bin/gbrowse/yeast?label=ORFs-tRNAs

To use the "+" character you may have to URL escape it:

  http://www.your.site/cgi-bin/gbrowse/yeast?label=ORFs%2BtRNAs

All tracks not explicitly given by the label parameter will be closed (disabled).

enable: Tracks to enable. The tracks indicated by this parameter will be opened in addition to any tracks that were previously opened by the user. The format is the same as label:

  http://www.your.site/cgi-bin/gbrowse/yeast?enable=ORFs-tRNAs

disable: Tracks to close. The tracks indicated by this parameter will be disabled. Tracks not mentioned by this parameter will keep their previous state. The format is the same as label:

  http://www.your.site/cgi-bin/gbrowse/yeast?disable=ORFs-tRNAs

When modifying track state, the "label" parameter is processed first, followed by the "enable" parameter and the "disable" parameter.

flip: Whether to flip the display. If set to a true value (flip=1), then the coordinates will be reversed so that forward strand features become reverse strand features. If set to a false value (flip=0) or absent, then the forward strand is displayed as per usual.

width: Set the width of the overview, region and details images, in pixels.

region_size: Set the length of the region covered by the "region" panel, in base pairs.

h_feat: The name of a feature to highlight in the format "<feature_name>@<color_name>". Example:

     h_feat=SKT5@blue

You may omit "@color", in which case the highlight will default to yellow. You can specify multiple h_feat arguments in order to highlight several features with distinct colors. Passing an argument of h_feat=_clear_ will clear all feature highlighting.

h_region: The name of a region to highlight in the format "<seq_id>:start..end@color". Example:

     h_region=Chr3:200000..250000@wheat

You may omit "@color", in which case the highlight will default to lightgrey. You can specify multiple h_region arguments in order to highlight multiple sequence ranges with different colors. Passing an argument of h_region=_clear_ will clear all region highlighting.

ks: The position of the key in the detail panel. Possible values are "between," "beneath," "left" and "right".

sk: The sort order of track names in the "Tracks" panel. Values are "sorted" (alphabetically sorted by name) and "unsorted" (sorted by the order of tracks as defined in the config file).

add: Upload a feature and add it in its own track. The format is "reference+type+name+start..end", where reference is the landmark for the coordinates (e.g. a named gene or chromosome), type is the type of the feature, name is the name of the feature, and start..end are the start and end coordinates. For a feature that has multiple segments, you may use multiple start..end ranges, separated by commas. Example:

 add=chr3+miRNA+mir144+2309229..2309300,2309501..2309589

Pass multiple "add" parameters to upload several features. "add" can be abbreviated to "a" for terseness.

style: Specify the style for features uploaded using "add". It is a flattened version of the style configuration sections described in this document. Lines are separated by "+" symbols rather than newlines. The first word in the argument is the feature type to configure, for example "miRNA." Subsequent option=value pairs control the glyph and glyph options. For example, if you have added a "miRNA" annotation, then you can tell the renderer to use a red arrow for this glyph in this way:

  style=miRNA+glyph=arrow+fgcolor=red

"style" can be abbreviated to "s" for terseness.

track_options: If true, open up the track configuration page.

help: Open up the specified help page. Possible values are:

    "general"    open the general help page
    "citations"  open up the track description & citation page
    "link_image" open the page that describes how to
                 generate an embedded image of the current view
    "svg_image"  open the page that describes how to generate SVGs

id: The id is a unique session ID that will store persistent configuration information. You do not typically need to use the id parameter except when you wish to upload an annotation file programmatically, in which case you should choose some large hard-to-guess number.

Upload, upload_annotations, id: These three arguments must be present in order to upload a file of external annotations to the server. "Upload" must be a true value, and "upload_annotations" will contain the content of the uploaded file. Note that you must POST the data using MIME type "multipart/form-data". The "id" argument is used to associate the upload with a session. Pick some long, hard-to-guess number. This will be associated stably with the uploaded file(s). To see the upload information, provide the same number in the "id" argument every time you access gbrowse.

eurl: Specify the URL of a remote annotation source to load into the database. You should also supply an "id" argument, as described earlier, in order to be able to view the annotations.

plugin, plugin_do: These arguments run plugins. The "plugin" argument gives the name of the plugin to activate. The name is the last component of the plugin package name, e.g. FastaDumper. The "plugin_do" argument selects what to do with the plugin. Possible values are "Configure", "Find" and "Go". "Configure" launches the plugin's configure page, "Go" runs dumper plugins' dump operation, and "Find" activates finder plugins' find function. For find operations, you should in most cases pass the find string in the "q" argument, but this depends on the particular plugin.

Each plugin may have its own set of URL arguments. A plugin's arguments are preceded by the plugin's name. For example, the FastaDumper plugin has a parameter named "format" which controls the output format. So to invoke this plugin and make the output plain text, one would provide the arguments:

http://www.your.site/cgi-bin/gbrowse/yeast?q=NUT21;plugin=FastaDumper;
            plugin_do=Go;FastaDumper.format=text

Plugins tend not to be well documented, so you may have to read through the source code to figure out their arguments.
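When such URLs are generated from a script, it helps to escape the values automatically so that characters like "+" survive. The following is a minimal sketch, not part of GBrowse: the function names url_escape and gbrowse_url are hypothetical, the base URL is the placeholder used throughout this section, and the escaping assumes plain ASCII values.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes ASCII input.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build a GBrowse URL from a base, a data source name and an ordered
# list of name => value pairs (names may repeat, e.g. several "q").
sub gbrowse_url {
    my ($base, $source, @pairs) = @_;
    my @args;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @args, url_escape($name) . '=' . url_escape($value);
    }
    my $url = "$base/$source";
    $url .= '?' . join(';', @args) if @args;
    return $url;
}

print gbrowse_url(
    'http://www.your.site/cgi-bin/gbrowse', 'yeast',
    'q'      => 'NAB2',
    'label'  => 'ORFs+tRNAs',
    'h_feat' => 'SKT5@blue',
), "\n";
```

The pairs are passed as a list rather than a hash so that repeated parameters and their order are preserved.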

IMPORTANT MAINTENANCE

GBrowse creates lots of cache files as it operates, and it does not garbage collect them automatically. To keep the cache files under control, you should run the following cron job at regular intervals:

        cd HTDOCS/gbrowse/tmp
        find . -type f -atime +20 -print -exec rm {} \;

Be sure to replace HTDOCS with the path to your web server HTML document root directory, and make sure that the user the cron job runs under has the proper permissions to delete the files in this directory.
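If you would rather do the cleanup from Perl, for example to log what was removed, the same idea can be sketched with the core File::Find module. This is an illustration only: the function name purge_old_files is hypothetical, and the cutoff arithmetic reflects the usual reading of "find -atime +20" (last access at least 21 full days ago).

```perl
use strict;
use warnings;
use File::Find;

# Delete plain files under $dir not accessed for more than $days whole days.
# Returns the list of files removed.
sub purge_old_files {
    my ($dir, $days) = @_;
    # "find -atime +N" matches files last accessed at least N+1 days ago.
    my $cutoff = time - ($days + 1) * 24 * 60 * 60;
    my @removed;
    find(sub {
        return unless -f $_;
        my $atime = (stat _)[8];    # reuse the stat data from the -f test
        return unless defined $atime && $atime <= $cutoff;
        push @removed, $File::Find::name if unlink $_;
    }, $dir);
    return @removed;
}

# Hypothetical invocation; replace HTDOCS as described above.
# print "$_\n" for purge_old_files('HTDOCS/gbrowse/tmp', 20);
```

The invocation is left commented out so that running the sketch as written deletes nothing.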


FURTHER INFORMATION

For further information, bug reports, etc, please consult the GMOD Mailing Lists. The main mailing list for gbrowse support is mailto:gmod-gbrowse@lists.sourceforge.net.

Have fun!

Lincoln Stein & the GMOD development team lstein@cshl.edu