GBrowse Session
2011 GMOD Spring Training
8-12 March 2011
Scott Cain
Revision as of 18:43, 11 March 2011
Prerequisites
The prerequisites were installed beforehand using apt or cpan.
Install GBrowse
GBrowse is easily installed via the cpan shell:
sudo cpan
cpan> install Bio::Graphics::Browser2
This also installs any prerequisites that aren't already on the machine.
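To confirm that the cpan install worked, and to check that the adaptors used later in this session are present, you can try loading each module with `perl -M`. The following is a minimal sketch, and the helper names are hypothetical rather than part of GBrowse.

```shell
# Hypothetical helpers (not part of GBrowse): report whether Perl modules load.
# `perl -MModule -e 1` exits 0 only if the module can be loaded.
have_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Prints "ok" or "MISSING" for each module name given; returns 1 if any is missing.
check_modules() {
    missing=0
    for mod in "$@"; do
        if have_perl_module "$mod"; then
            echo "ok      $mod"
        else
            echo "MISSING $mod"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_modules Bio::Graphics::Browser2 Bio::DB::Das::Chado Bio::DB::Sam
```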
Tutorial
Go to http://localhost/gbrowse2
Basic Chado Configuration (if we have time)
Bio::DB::Das::Chado was installed when we created the image. GBrowse comes with sample configuration files; we'll download the sample Chado file and save it as pythium.conf:
wget http://gmod.svn.sourceforge.net/viewvc/gmod/Generic-Genome-Browser/trunk/contrib/conf_files/07.chado.conf -O pythium.conf
Some simple tweaks and additions:
- Change description
- Get rid of database = main
- Remove or change examples (yeast examples don't help anybody)
- Add initial landmark (initial landmark = scf1117875582023)
DB connection info
[annotation:database]
db_adaptor = Bio::DB::Das::Chado
db_args = -dsn dbi:Pg:dbname=chado
        -user gmod
        -inferCDS 1
        -srcfeatureslice 1
search options = default
Add a BAM data source
[bam_sample:database]
db_adaptor = Bio::DB::Sam
db_args = -fasta /var/www/gbrowse2/databases/pythium/scf1117875582023.fasta
        -bam /var/www/gbrowse2/databases/pythium/simulated-sorted.bam
search options = default
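A typo in either path only shows up later as an empty track, so it can help to check the files before reloading GBrowse. This sketch assumes that Bio::DB::Sam works from a coordinate-sorted BAM with a `.bai` index beside it, and that it uses a `.fai` index for the FASTA. The adaptor may be able to build these indexes itself, but the samtools commands it suggests create them explicitly. The helper name is hypothetical.

```shell
# check_bam_source FASTA BAM  (hypothetical helper)
# Returns 1 if either file is missing or unreadable; only prints a hint
# (without failing) when an index file is absent.
check_bam_source() {
    fasta=$1
    bam=$2
    status=0
    for f in "$fasta" "$bam"; do
        if [ ! -r "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    [ -r "$bam.bai" ] || echo "no index: run 'samtools index $bam'"
    [ -r "$fasta.fai" ] || echo "no index: run 'samtools faidx $fasta'"
    return "$status"
}

# Usage:
# check_bam_source /var/www/gbrowse2/databases/pythium/scf1117875582023.fasta \
#                  /var/www/gbrowse2/databases/pythium/simulated-sorted.bam
```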
Add track defaults
[TRACK DEFAULTS]
glyph = generic
database = annotation
height = 8
bgcolor = cyan
fgcolor = black
label density = 25
bump density = 100
Note particularly the "database" entry: most tracks will use the annotation database, while any track that sets its own database = bam_sample will read from the BAM data source instead.
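The fallback rule works as follows: a track reads from the database named in its own stanza, and if it names none, from the one in [TRACK DEFAULTS]. A small awk script can print that mapping for every track. This is a minimal sketch under several simplifying assumptions, which are also noted in the code. [TRACK DEFAULTS] must come before the tracks. Stanzas whose names contain a colon are skipped. Continuation lines are ignored.

```shell
# track_databases CONF  (hypothetical helper)
# Prints "<track> <database>" for each track stanza, using the
# [TRACK DEFAULTS] database when the stanza does not set its own.
# Skips [GENERAL], [TRACK DEFAULTS], and names containing ":" such as
# [annotation:database] or [Reads:5001].
track_databases() {
    awk '
        function flush() {
            if (track != "") print track, (db != "" ? db : default_db)
        }
        /^\[.*][ \t]*$/ {
            flush()
            name = $0
            sub(/^\[/, "", name); sub(/][ \t]*$/, "", name)
            track = ""; db = ""
            in_defaults = (name == "TRACK DEFAULTS")
            if (name != "GENERAL" && !in_defaults && index(name, ":") == 0) track = name
            next
        }
        /^database[ \t]*=/ {
            value = $0
            sub(/^database[ \t]*=[ \t]*/, "", value); sub(/[ \t]+$/, "", value)
            if (in_defaults) default_db = value; else db = value
        }
        END { flush() }
    ' "$1"
}

# Usage: track_databases pythium.conf
```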
Add some tracks
[Genes]
feature = gene
glyph = gene
ignore_sub_part = polypeptide
#bgcolor = yellow
forwardcolor = yellow
reversecolor = turquoise
label = sub { my $f = shift;
        my $name = $f->display_name;
        my @aliases = sort $f->attributes('Alias');
        $name .= " (@aliases)" if @aliases;
        $name;
        }
height = 6
description = 0
key = Named gene

[CDS]
feature = mRNA
glyph = cds
description = 0
ignore_sub_part = polypeptide exon
height = 26
sixframe = 1
label = sub {shift->name . " reading frame"}
key = CDS
citation = This track shows CDS reading frames.

[repeats]
feature = match:repeatmasker
glyph = generic
bgcolor = black
key = Repeats

[ests]
feature = expressed_sequence_match
glyph = segments
stranded = 1
bgcolor = green
key = EST matches

[proteins]
feature = protein_match
glyph = segments
stranded = 1
bgcolor = pink
fgcolor = red
key = protein matches

[CoverageXyplot]
feature = coverage
glyph = wiggle_xyplot
database = bam_sample
height = 50
fgcolor = black
bicolor_pivot = 20
pos_color = blue
neg_color = red
key = Coverage (xyplot)

[Reads]
feature = match
glyph = segments
draw_target = 1
show_mismatch = 1
mismatch_color = red
database = bam_sample
bgcolor = blue
fgcolor = white
height = 5
label density = 50
bump = fast
key = Reads

[Pair]
feature = read_pair
glyph = segments
database = bam_sample
draw_target = 1
show_mismatch = 1
bgcolor = sub {
        my $f = shift;
        return $f->attributes('M_UNMAPPED') ? 'red' : 'green';
        }
fgcolor = green
height = 3
label = sub {shift->display_name}
label density = 50
bump = fast
connector = dashed
balloon hover = sub {
        my $f = shift;
        return '' unless $f->type eq 'match';
        return 'Read: '.$f->display_name.' : '.$f->flag_str;
        }
key = Read Pairs
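With several tracks pointing at different data sources, a misspelled `database =` value is an easy mistake to make. The lint sketch below flags every `database = X` that has no matching `[X:database]` stanza in the same file. The helper name is hypothetical.

```shell
# check_track_databases CONF  (hypothetical helper)
# Prints each database name that is referenced but never defined as an
# [X:database] stanza, and exits non-zero if there is at least one.
check_track_databases() {
    awk '
        /^\[[^:]*:database][ \t]*$/ {
            name = $0
            sub(/^\[/, "", name); sub(/:database].*$/, "", name)
            defined[name] = 1
        }
        /^database[ \t]*=/ {
            value = $0
            sub(/^database[ \t]*=[ \t]*/, "", value); sub(/[ \t]+$/, "", value)
            used[value] = 1
        }
        END {
            bad = 0
            for (v in used) if (!(v in defined)) { print "undefined database: " v; bad = 1 }
            exit bad
        }
    ' "$1"
}

# Usage: check_track_databases pythium.conf
```

A side benefit: the check also catches a leftover `database = main` if the sample's [main:database] stanza was deleted but the reference to it was not.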
Add our new database to the GBrowse.conf
To let GBrowse know that there is a new database available, we have to add a few lines to GBrowse.conf. Add this to the bottom:
[pythium]
description = Pythium ultimum
path = pythium.conf
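If this step is repeated, the same data source ends up in GBrowse.conf twice. The sketch below appends the stanza only when no `[pythium]` line already exists. The location of GBrowse.conf depends on the install (it is often somewhere like /etc/gbrowse2/GBrowse.conf), so the path is passed in as an argument. The helper name is hypothetical.

```shell
# add_datasource GBROWSE_CONF NAME DESCRIPTION CONF_FILE  (hypothetical helper)
# Appends a data-source stanza unless a line exactly equal to "[NAME]" exists.
add_datasource() {
    conf=$1 name=$2 desc=$3 src_path=$4
    if grep -qxF "[$name]" "$conf"; then
        echo "[$name] already present in $conf"
        return 0
    fi
    printf '\n[%s]\ndescription = %s\npath = %s\n' "$name" "$desc" "$src_path" >> "$conf"
}

# Usage (adjust the path to your GBrowse.conf):
# sudo sh -c '. ./add_datasource.sh; add_datasource /etc/gbrowse2/GBrowse.conf pythium "Pythium ultimum" pythium.conf'
```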
Updating SAMtools
The installed version of SAMtools may need to be updated. Download and build the samtools 0.1.13 release:
cd ~/Documents/Software
wget -O samtools-0.1.13.tar.bz2 http://sourceforge.net/projects/samtools/files/samtools/0.1.13/samtools-0.1.13.tar.bz2/download
tar jxvf samtools-0.1.13.tar.bz2
cd samtools-0.1.13
make
Install Bio::DB::Sam:
sudo cpan
cpan> install Bio::DB::Sam
When asked "Please enter the location of the bam.h and compiled libbam.a files:", answer:
/home/gmod/Documents/Software/samtools-0.1.13
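Before you answer that prompt, you can confirm that the directory really contains both files it asks for: the bam.h header and the libbam.a library produced by `make`. The sketch below does that. The helper name is hypothetical.

```shell
# check_samtools_dir DIR  (hypothetical helper)
# Returns 0 only if DIR contains both bam.h and libbam.a.
check_samtools_dir() {
    dir=$1
    ok=0
    for f in bam.h libbam.a; do
        if [ ! -f "$dir/$f" ]; then
            echo "not found: $dir/$f"
            ok=1
        fi
    done
    return "$ok"
}

# Usage: check_samtools_dir /home/gmod/Documents/Software/samtools-0.1.13
```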
Add semantic zooming for the BAM tracks
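Failing to do this for very dense data (like BAM) is probably the number one performance killer for GBrowse: asking GBrowse to draw a track with thousands of glyphs is time-consuming (and ultimately probably not very informative). The number after the colon in a stanza name such as [Reads:5001] is the region size, in bp, at which these alternate settings take over. Once you zoom out to about 5 kb or more, the Reads and Pair tracks therefore draw a coverage density plot instead of individual reads.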
[Reads:5001]
feature = coverage
glyph = wiggle_density
height = 15

[Pair:5001]
feature = coverage
glyph = wiggle_density
height = 15
bgcolor = purple
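The selection rule can be made concrete with a short script. For a given track and region length, it picks the base [Track] stanza or the [Track:N] stanza with the largest N that still fits. This is a sketch of the idea rather than GBrowse's own code. It assumes that a [Track:N] stanza applies to regions of N bp or more, so check the GBrowse documentation for the exact boundary. The helper name is hypothetical.

```shell
# zoom_stanza CONF TRACK LENGTH  (hypothetical helper)
# Prints the stanza that would apply to TRACK at LENGTH bp, assuming a
# [TRACK:N] stanza takes effect for regions of N bp or more.
zoom_stanza() {
    awk -v track="$2" -v len="$3" '
        /^\[.*][ \t]*$/ {
            name = $0
            sub(/^\[/, "", name); sub(/][ \t]*$/, "", name)
            if (name == track && best == "") best = track
            else if (index(name, track ":") == 1) {
                n = substr(name, length(track) + 2) + 0
                if (n <= len + 0 && n > best_n) { best_n = n; best = name }
            }
        }
        END { print best }
    ' "$1"
}

# Usage: zoom_stanza pythium.conf Reads 20000   # -> Reads:5001
```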
Add "show summary" functionality
For other tracks, when zoomed far out (100 kb or 1 Mb), performance can similarly suffer while the display becomes less and less informative. Newer versions of GBrowse can automatically generate density plots when zoomed out. This functionality is available from the Chado and Bio::DB::SeqFeature::Store data adaptors. To prepare our Chado database for this semantic zooming, we need to run a script that comes with Bio::DB::Das::Chado:
cd ~/Documents/Software/gbrowse-adaptors/Chado
svn update
perl bin/gmod_create_summary_statistics.pl
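Then add this line to pythium.conf somewhere near the top (i.e., not in the track definitions). The value is a region size in bp; beyond it, tracks show summary density plots instead of individual features:
show summary = 99999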
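In these conf files, every line after a [stanza] header belongs to that stanza. Appending `show summary` to the end of pythium.conf would therefore put it inside the last track definition, which is exactly what the instruction warns against. The sketch below reports which stanza each `show summary` line actually falls in. The helper name is hypothetical.

```shell
# summary_stanza CONF  (hypothetical helper)
# Prints the name of the stanza that contains each "show summary" line.
summary_stanza() {
    awk '
        /^\[.*][ \t]*$/ { name = $0; sub(/^\[/, "", name); sub(/][ \t]*$/, "", name); next }
        /^show summary[ \t]*=/ { print (name == "" ? "(before any stanza)" : name) }
    ' "$1"
}

# Usage: summary_stanza pythium.conf
```

If this prints a track name such as Read Pairs' stanza "Pair", move the line up into a non-track section near the top of the file.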
Enabling full text searching
If we try searching for "gene 7.92", we'll get "Not Found" as a result, even though genemark-scf1117875582023-abinit-gene-7.92 does exist. To look for partial strings, we need to enable full text searching. To do so, we need to run another script that comes with Bio::DB::Das::Chado:
perl /home/gmod/Documents/Software/gbrowse-adaptors/Chado/bin/gmod_chado_fts_prep.pl
This does several things, including creating materialized views using a tool provided by SOL Genomics Network (SGN); note that its estimate of how long it will take to finish is not very accurate. In practice, it would be a good idea to read the documentation of gmod_materialized_view_tool.pl for information on keeping the view up to date.
We also have to tell GBrowse that this Chado database can now do full text searching, by adding this line to the db_args of the Chado database stanza ([annotation:database]):
-fulltext 1
Now we can search for "gene 7.92" and we'll find our gene (plus its mRNA and exons); clicking on the gene shows it in GBrowse.
Evaluation
Please give us your comments on this session. We will ask for your feedback on each session and the course as a whole on the last day. Your comments will help guide the direction and content of future GMOD training and outreach efforts.
Available on platform | web
Has URL | http://sourceforge.net/projects/gmod/files/Generic%20Genome%20Browser/, https://github.com/GMOD/GBrowse, http://gbrowse.org, http://www.wormbase.org/tools/genome/gbrowse/c_elegans/, http://flybase.org/cgi-bin/gbrowse/dmel and http://hapmap.ncbi.nlm.nih.gov/cgi-perl/gbrowse/gbrowse
Has description | GBrowse is a combination of database and interactive web pages for manipulating and displaying annotations on genomes. Features include:
Has development status | inactive
Has full name | Generic Genome Browser
Has input format | GFF3 and GFF2
Has licence | GPL2 and Artistic License
Has logo | GBrowseLogo.png
Has software maturity status | mature
Has support status | inactive
Has title | WormBase, FlyBase and HapMap
Has topic | GBrowse
Is open source | Yes
Link type | download, source code, website and wild URL
Release date | 1 January 2001
Tool functionality or classification | Genome Visualization & Editing
Written in language | Perl