The prerequisites were installed beforehand using apt or cpan.
GBrowse itself is easily installed via the cpan shell:
sudo cpan
cpan> install Bio::Graphics::Browser2
This also fetches any prerequisites that aren’t already installed on the machine.
When the install finishes, go to http://localhost/gbrowse2
Bio::DB::Das::Chado was installed when we created the image. Sample configuration files are available with GBrowse, and we’ll get the sample Chado file:
wget http://gmod.svn.sourceforge.net/viewvc/gmod/Generic-Genome-Browser/trunk/contrib/conf_files/07.chado.conf -O pythium.conf
Some simple tweaks and additions:
database = main
initial landmark = scf1117875582023
[annotation:database]
db_adaptor = Bio::DB::Das::Chado
db_args = -dsn dbi:Pg:dbname=chado
          -user gmod
          -inferCDS 1
          -srcfeatureslice 1
search options = default
[bam_sample:database]
db_adaptor = Bio::DB::Sam
db_args = -fasta /var/www/gbrowse2/databases/pythium/scf1117875582023.fasta
          -bam /var/www/gbrowse2/databases/pythium/simulated-sorted.bam
search options = default
[TRACK DEFAULTS]
glyph = generic
database = annotation
height = 8
bgcolor = cyan
fgcolor = black
label density = 25
bump density = 100
Note in particular the “database” entry: for most tracks we’ll be using the annotation database, but the bam_sample data source will be available when we want it.
[Genes]
feature = gene
glyph = gene
ignore_sub_part = polypeptide
#bgcolor = yellow
forwardcolor = yellow
reversecolor = turquoise
label = sub { my $f = shift;
              my $name = $f->display_name;
              my @aliases = sort $f->attributes('Alias');
              $name .= " (@aliases)" if @aliases;
              $name;
            }
height = 6
description = 0
key = Named gene
[CDS]
feature = mRNA
glyph = cds
description = 0
ignore_sub_part = polypeptide exon
height = 26
sixframe = 1
label = sub {shift->name . " reading frame"}
key = CDS
citation = This track shows CDS reading frames.
[repeats]
feature = match:repeatmasker
glyph = generic
bgcolor = black
key = Repeats
[ests]
feature = expressed_sequence_match
glyph = segments
stranded = 1
bgcolor = green
key = EST matches
[proteins]
feature = protein_match
glyph = segments
stranded = 1
bgcolor = pink
fgcolor = red
key = protein matches
[CoverageXyplot]
feature = coverage
glyph = wiggle_xyplot
database = bam_sample
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = blue
neg_color = red
key = Coverage (xyplot)
[Reads]
feature = match
glyph = segments
draw_target = 1
show_mismatch = 1
mismatch_color = red
database = bam_sample
bgcolor = blue
fgcolor = white
height = 5
label density = 50
bump = fast
key = Reads
[Pair]
feature = read_pair
glyph = segments
database = bam_sample
draw_target = 1
show_mismatch = 1
bgcolor = sub {
            my $f = shift;
            return $f->attributes('M_UNMAPPED') ? 'red' : 'green';
          }
fgcolor = green
height = 3
label = sub {shift->display_name}
label density = 50
bump = fast
connector = dashed
balloon hover = sub {
                  my $f = shift;
                  return unless $f->type eq 'match';
                  return 'Read: '.$f->display_name.' : '.$f->flag_str;
                }
key = Read Pairs
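Before loading the page, it can help to confirm that the files named in the database stanzas actually exist. The following is a minimal sketch and not part of GBrowse: `check_db_files` is a hypothetical helper that pulls the paths following `-fasta` and `-bam` out of a conf file and reports any that are missing. It assumes the paths contain no spaces.

```shell
# Hypothetical helper (not shipped with GBrowse): report whether the files
# named after -fasta and -bam in a GBrowse conf file exist.
# Assumes paths contain no whitespace.
check_db_files() {
    conf="$1"
    status=0
    # Print the word that follows each -fasta or -bam argument.
    for path in $(awk '{ for (i = 1; i < NF; i++)
                           if ($i == "-fasta" || $i == "-bam") print $(i + 1) }' "$conf"); do
        if [ -f "$path" ]; then
            echo "ok      $path"
        else
            echo "MISSING $path"
            status=1
        fi
    done
    return "$status"
}

# Usage (run from the directory that holds the conf file):
#   check_db_files pythium.conf
```

The function returns a non-zero status if any file is missing, so it can be used in a script before restarting the web server.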
To let GBrowse know that there is a new database available, we have to add a few lines to GBrowse.conf. Add this to the bottom:
[pythium]
description = Pythium ultimum
path = pythium.conf
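If this step is scripted, it is worth making it safe to run twice. The sketch below appends the stanza only when it is not already present; `add_pythium_source` is a hypothetical helper name, and the location of GBrowse.conf on your machine is an assumption you will need to fill in.

```shell
# Hypothetical helper: append the [pythium] data source stanza to the given
# GBrowse.conf unless a [pythium] stanza is already there.
add_pythium_source() {
    conf="$1"
    if grep -q '^\[pythium\]' "$conf"; then
        echo "[pythium] already present in $conf"
        return 0
    fi
    cat >> "$conf" <<'EOF'

[pythium]
description = Pythium ultimum
path        = pythium.conf
EOF
}

# Usage (the path is an assumption; adjust to your installation):
#   sudo sh -c '. ./add_source.sh; add_pythium_source /etc/gbrowse2/GBrowse.conf'
```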
The version of SAMtools may need to be updated. Get the samtools release:
cd ~/Documents/Software
wget -O samtools-0.1.13.tar.bz2 http://sourceforge.net/projects/samtools/files/samtools/0.1.13/samtools-0.1.13.tar.bz2/download
tar jxvf samtools-0.1.13.tar.bz2
cd samtools-0.1.13
make
Install Bio::DB::Sam:
sudo cpan
cpan> install Bio::DB::Sam
When asked “Please enter the location of the bam.h and compiled libbam.a files:”, answer:
/home/gmod/Documents/Software/samtools-0.1.13
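If the installer rejects the directory, the usual cause is that `make` did not finish and one of the two files is missing. A quick check, as a sketch (`has_bam_files` is a hypothetical helper name):

```shell
# Hypothetical helper: succeed only if the directory holds both files that
# the Bio::DB::Sam installer asks for.
has_bam_files() {
    dir="$1"
    [ -f "$dir/bam.h" ] && [ -f "$dir/libbam.a" ]
}

# Check the build directory used in this tutorial.
if has_bam_files /home/gmod/Documents/Software/samtools-0.1.13; then
    echo "bam.h and libbam.a found"
else
    echo "bam.h or libbam.a missing; rerun make in the samtools directory"
fi
```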
Not using semantic zooming for very dense data (like BAM) is probably the number one performance killer for GBrowse; asking GBrowse to draw a track that has thousands of glyphs is time-consuming (and ultimately, probably not very informative).
[Reads:5001]
feature = coverage
glyph = wiggle_density
height = 15
[Pair:5001]
feature = coverage
glyph = wiggle_density
height = 15
bgcolor = purple
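The `:5001` suffix in these stanza names tells GBrowse to apply the override once the displayed region is 5,001 bp or wider. As a conf file grows, it is easy to lose track of which tracks have such overrides. The sketch below lists them; `list_zoom_overrides` is a hypothetical helper, not a GBrowse tool.

```shell
# Hypothetical helper: print each semantic-zoom override stanza in a GBrowse
# conf file as "track<TAB>threshold". Stanzas such as [bam_sample:database]
# are skipped because the part after the colon is not a number.
list_zoom_overrides() {
    awk '/^\[[^]:]+:[0-9]+\]/ {
        line = $0
        sub(/^\[/, "", line)      # drop the opening bracket
        sub(/\].*$/, "", line)    # drop the closing bracket and anything after
        n = index(line, ":")
        print substr(line, 1, n - 1) "\t" substr(line, n + 1)
    }' "$1"
}

# Usage:
#   list_zoom_overrides pythium.conf
```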
For other tracks, performance can similarly suffer when zoomed way out (100 kb or 1 Mb), while the amount of “information” shown decreases. Newer versions of GBrowse can automatically generate density plots when zoomed out. This functionality is available with the Chado and Bio::DB::SeqFeature::Store data adaptors. To prepare our Chado database for this kind of semantic zooming, we need to run a script that comes with Bio::DB::Das::Chado:
cd ~/Documents/Software/gbrowse-adaptors/Chado
svn update
perl bin/gmod_create_summary_statistics.pl
Then add the following to the pythium.conf file, somewhere near the top (i.e., not in the track definitions):
show summary = 99999
If we try searching for “gene 7.92”, we’ll get “Not Found” as a result, even though genemark-scf1117875582023-abinit-gene-7.92 does exist. To look for partial strings, we need to enable full text searching. To do so, we need to run another script that comes with Bio::DB::Das::Chado:
perl /home/gmod/Documents/Software/gbrowse-adaptors/Chado/bin/gmod_chado_fts_prep.pl
This script does several things (one of which is poorly estimating how long it will take to finish), including creating materialized views using a tool provided by the SOL Genomics Network (SGN).
In practice, it would be a good idea to read the documentation of gmod_materialized_view_tool.pl for information on keeping the view up to date.
We also have to tell GBrowse that this Chado database can now do full text searching, by adding this to the db_args of the Chado database stanza:
-fulltext 1
Now we can search for “gene 7.92” and we’ll find our gene (plus its mRNA and exons), and we can click on the gene to see it in GBrowse.