GBrowse

From GMOD
Revision as of 18:43, 11 March 2011 by Elee@berkeleybop.org (Talk)



GBrowse Session

2011 GMOD Spring Training
8-12 March 2011
Scott Cain



Prerequisites

The prerequisites were installed beforehand using apt or cpan.

Install GBrowse

Easily installed via the cpan shell:

 sudo cpan
 cpan> install Bio::Graphics::Browser2

This also installs any prerequisites that are not already on the machine.

Tutorial

Go to http://localhost/gbrowse2

Basic Chado Configuration (if we have time)

Bio::DB::Das::Chado was installed when we created the image. Sample configuration files are available with GBrowse, and we'll get the sample Chado file:

 wget http://gmod.svn.sourceforge.net/viewvc/gmod/Generic-Genome-Browser/trunk/contrib/conf_files/07.chado.conf -O pythium.conf


Some simple tweaks and additions:

  • Change description
  • Get rid of database = main
  • Remove or change examples (yeast examples don't help anybody)
  • Add initial landmark (initial landmark = scf1117875582023)
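If you would rather script these edits than make them in an editor, the sketch below rewrites the [GENERAL] section of the downloaded file with awk. It is only an illustration: the function name tweak_general_section is hypothetical, and it assumes the sample file has a [GENERAL] header, a single-line description entry in that section, and a stand-alone "database = main" line. It leaves the examples line alone, since that is easier to edit by hand.

```shell
# Hypothetical helper (not part of GBrowse): apply the "simple tweaks" to a
# data source config. Assumes a single-line "description = ..." entry inside
# [GENERAL] and a stand-alone "database = main" line.
tweak_general_section() {
  local conf=$1 desc=$2 landmark=$3 tmp
  tmp=$(mktemp) || return 1
  if awk -v desc="$desc" -v lm="$landmark" '
    # Track the current section; only [GENERAL] gets a new description.
    /^\[/ { in_general = ($0 ~ /^\[GENERAL\]/) }
    # Replace the description and add an initial landmark right after it.
    in_general && /^description[ \t]*=/ {
      print "description = " desc
      print "initial landmark = " lm
      next
    }
    # Drop "database = main" wherever it appears.
    /^database[ \t]*=[ \t]*main[ \t]*$/ { next }
    { print }
  ' "$conf" > "$tmp"; then
    mv "$tmp" "$conf"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example: tweak_general_section pythium.conf "Pythium ultimum" scf1117875582023. Track stanzas are skipped on purpose, because tracks can carry their own description option (for example description = 0).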

DB connection info

[annotation:database]
db_adaptor    = Bio::DB::Das::Chado
db_args       = -dsn dbi:Pg:dbname=chado
                -user gmod
                -inferCDS 1
                -srcfeatureslice 1
search options = default

Add a BAM data source

[bam_sample:database]
db_adaptor     = Bio::DB::Sam
db_args        = -fasta /var/www/gbrowse2/databases/pythium/scf1117875582023.fasta
                 -bam   /var/www/gbrowse2/databases/pythium/simulated-sorted.bam
search options = default
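Before pointing GBrowse at these files, it can help to confirm they are in place. Reads are fetched by region, and random access into a BAM file relies on a .bai index stored next to it (samtools index simulated-sorted.bam writes simulated-sorted.bam.bai). The helper below, check_bam_source (a hypothetical name), only tests that the files exist; it does not validate their contents.

```shell
# Hypothetical pre-flight check for a Bio::DB::Sam data source:
# the FASTA and BAM must be readable, and the BAM should have a .bai index.
check_bam_source() {
  local fasta=$1 bam=$2 ok=0 f
  for f in "$fasta" "$bam"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      ok=1
    fi
  done
  if [ ! -e "${bam}.bai" ]; then
    echo "no index ${bam}.bai; create one with: samtools index $bam" >&2
    ok=1
  fi
  return "$ok"
}
```

For this tutorial, call it with /var/www/gbrowse2/databases/pythium/scf1117875582023.fasta and /var/www/gbrowse2/databases/pythium/simulated-sorted.bam; a non-zero exit status means something needs fixing first.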

Add track defaults

[TRACK DEFAULTS]
glyph       = generic
database    = annotation
height      = 8
bgcolor     = cyan
fgcolor     = black
label density = 25
bump density  = 100

Note particularly the "database" entry: most tracks will use the annotation database, but the bam_sample data source is available for any track that names it.

Add some tracks

[Genes]
feature      = gene
glyph        = gene
ignore_sub_part = polypeptide
#bgcolor      = yellow
forwardcolor = yellow
reversecolor = turquoise
label        = sub { my $f = shift;
                    my $name = $f->display_name;
                    my @aliases = sort $f->attributes('Alias');
                    $name .= " (@aliases)" if @aliases;
                    $name;
  }
height       = 6
description  = 0
key          = Named gene

[CDS]
feature      = mRNA
glyph        = cds
description  = 0
ignore_sub_part = polypeptide exon
height       = 26
sixframe     = 1
label        = sub {shift->name . " reading frame"}
key          = CDS
citation     = This track shows CDS reading frames.

[repeats]
feature      = match:repeatmasker
glyph        = generic
bgcolor      = black
key          = Repeats

[ests]
feature      = expressed_sequence_match
glyph        = segments
stranded     = 1
bgcolor      = green
key          = EST matches

[proteins]
feature      = protein_match
glyph        = segments
stranded     = 1
bgcolor      = pink
fgcolor      = red
key          = protein matches

[CoverageXyplot]
feature        = coverage
glyph          = wiggle_xyplot
database       = bam_sample
height         = 50
fgcolor        = black
bicolor_pivot  = 20
pos_color      = blue
neg_color      = red
key            = Coverage (xyplot)

[Reads]
feature        = match
glyph          = segments
draw_target    = 1
show_mismatch  = 1
mismatch_color = red
database       = bam_sample
bgcolor        = blue
fgcolor        = white
height         = 5
label density  = 50
bump           = fast
key            = Reads 

[Pair]
feature       = read_pair
glyph         = segments
database      = bam_sample
draw_target   = 1
show_mismatch = 1
bgcolor       = sub {
                my $f = shift;
                return $f->attributes('M_UNMAPPED') ? 'red' : 'green';
                }
fgcolor       = green
height        = 3
label         = sub {shift->display_name}
label density = 50
bump          = fast
connector     = dashed
balloon hover = sub {
                my $f     = shift;
                return  unless $f->type eq 'match';
                return 'Read: '.$f->display_name.' : '.$f->flag_str;
                }
key           = Read Pairs

Add our new database to the GBrowse.conf

To let GBrowse know that there is a new database available, we have to add a few lines to GBrowse.conf. Add this to the bottom:

[pythium]
description   = Pythium ultimum
path          = pythium.conf
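The stanza can also be appended from a script. The sketch below is idempotent, so running it twice does not create a duplicate [pythium] entry. The function name add_datasource is hypothetical, and the location of GBrowse.conf depends on the install (/etc/gbrowse2/GBrowse.conf is a common default, but check yours); writing there usually requires root.

```shell
# Hypothetical helper: register a data source in GBrowse.conf once.
# Usage: add_datasource CONF NAME DESCRIPTION PATH
add_datasource() {
  local conf=$1 name=$2 desc=$3 path=$4
  # Skip if a stanza with this exact name already exists.
  if grep -qxF "[$name]" "$conf"; then
    echo "[$name] already in $conf; leaving it unchanged" >&2
    return 0
  fi
  # Append a blank line, then the stanza, to the bottom of the file.
  printf '\n[%s]\ndescription   = %s\npath          = %s\n' \
    "$name" "$desc" "$path" >> "$conf"
}
```

For example: add_datasource /etc/gbrowse2/GBrowse.conf pythium "Pythium ultimum" pythium.conf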

Updating SAMtools

The installed version of SAMtools may need to be updated. Download and build the 0.1.13 release:

 cd ~/Documents/Software
 wget -O samtools-0.1.13.tar.bz2 http://sourceforge.net/projects/samtools/files/samtools/0.1.13/samtools-0.1.13.tar.bz2/download
 tar jxvf samtools-0.1.13.tar.bz2
 cd samtools-0.1.13
 make

Install Bio::DB::Sam:

 sudo cpan
 cpan> install Bio::DB::Sam

When asked "Please enter the location of the bam.h and compiled libbam.a files:", answer:

 /home/gmod/Documents/Software/samtools-0.1.13
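Before answering that prompt, it is worth confirming that the directory really contains both files the installer asks for, since a failed or skipped make leaves libbam.a missing. A minimal check, with a hypothetical function name:

```shell
# Hypothetical check: does DIR contain the files Bio::DB::Sam's installer asks for?
check_samtools_build() {
  local dir=$1 missing=0 f
  for f in bam.h libbam.a; do
    if [ ! -f "$dir/$f" ]; then
      echo "not found: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_samtools_build /home/gmod/Documents/Software/samtools-0.1.13 && echo "ready for cpan"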

Add semantic zooming for the BAM tracks

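Failing to do this for very dense data (like BAM) is probably the number one performance killer for GBrowse: asking GBrowse to draw a track with thousands of glyphs is time-consuming (and ultimately probably not very informative). The number after the colon is the width of the displayed region, in bp, at which the alternate settings take over; here, views of 5001 bp or more draw a coverage density plot instead of individual reads.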

[Reads:5001]
feature        = coverage
glyph          = wiggle_density
height         = 15

[Pair:5001]
feature        = coverage
glyph          = wiggle_density
height         = 15
bgcolor        = purple
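To make the selection rule concrete, here is a simplified model rather than GBrowse's actual code. It assumes GBrowse uses the base stanza when the displayed region is narrower than every threshold, and otherwise the stanza with the largest threshold that does not exceed the displayed width. Options not restated in a zoom stanza (such as database = bam_sample) are assumed to carry over from the base stanza. The function name pick_zoom_stanza is hypothetical.

```shell
# Simplified model (not GBrowse's code) of semantic zoom selection.
# Usage: pick_zoom_stanza TRACK WIDTH THRESHOLD...
pick_zoom_stanza() {
  local track=$1 width=$2 best=0 t
  shift 2
  for t in "$@"; do
    # A threshold applies once the view is at least that wide;
    # among the applicable ones, keep the largest.
    if [ "$width" -ge "$t" ] && [ "$t" -gt "$best" ]; then
      best=$t
    fi
  done
  if [ "$best" -eq 0 ]; then
    echo "[$track]"
  else
    echo "[$track:$best]"
  fi
}
```

With the stanzas above, pick_zoom_stanza Reads 20000 5001 prints [Reads:5001], while a 2000 bp view prints [Reads].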

Add "show summary" functionality

For other tracks, performance can similarly suffer when zoomed far out (100 kb or 1 Mb), while the information conveyed decreases. Newer versions of GBrowse can automatically generate density plots when zoomed out. This functionality is available from the Chado and Bio::DB::SeqFeature::Store data adaptors. To prepare our Chado database for this semantic zooming, we need to run a script that comes with Bio::DB::Das::Chado:

 cd ~/Documents/Software/gbrowse-adaptors/Chado
 svn update
 perl bin/gmod_create_summary_statistics.pl

Then add the following to pythium.conf near the top (i.e., not in the track definitions):

 show summary = 99999

Enabling full text searching

If we try searching for "gene 7.92", we'll get "Not Found" as a result, even though genemark-scf1117875582023-abinit-gene-7.92 does exist. To look for partial strings, we need to enable full text searching. To do so, we need to run another script that comes with Bio::DB::Das::Chado:

 perl /home/gmod/Documents/Software/gbrowse-adaptors/Chado/bin/gmod_chado_fts_prep.pl

This script does several things, including creating materialized views with a tool provided by the SOL Genomics Network (SGN); its estimate of how long it will take to finish is not very accurate. In practice, read the documentation for gmod_materialized_view_tool.pl to learn how to keep the view up to date.

We also have to tell GBrowse that this Chado database can now do full text searching, by adding this argument to the db_args of the Chado database stanza ([annotation:database]):

 -fulltext 1
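The same change can be made from the command line. This sketch inserts -fulltext 1 as an indented continuation line directly under db_args in the [annotation:database] stanza; because continuation lines are joined onto the db_args value, its position among the other arguments should not matter. The function name enable_fulltext and its default section name are assumptions for illustration.

```shell
# Hypothetical helper: add "-fulltext 1" to db_args of a database stanza.
# Usage: enable_fulltext CONF [SECTION]   (SECTION defaults to annotation:database)
enable_fulltext() {
  local conf=$1 section=${2:-annotation:database} tmp
  # Nothing to do if the argument is already present.
  if grep -q -- '-fulltext' "$conf"; then
    return 0
  fi
  tmp=$(mktemp) || return 1
  if awk -v sect="[$section]" '
    # Track whether we are inside the target stanza.
    /^\[/ { in_db = (index($0, sect) == 1) }
    { print }
    # Right after its db_args line, add an indented continuation line.
    in_db && /^db_args[ \t]*=/ { print "                -fulltext 1" }
  ' "$conf" > "$tmp"; then
    mv "$tmp" "$conf"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example: enable_fulltext pythium.conf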

Now a search for "gene 7.92" finds our gene (plus its mRNA and exons), and we can click on the gene to see it in GBrowse.

Evaluation

Please give us your comments on this session. We will ask for your feedback on each session and the course as a whole on the last day. Your comments will help guide the direction and content of future GMOD training and outreach efforts.


