SOBA Tutorial 2012


This SOBA tutorial was taught as part of the 2012 GMOD Summer School and the 2013 GMOD Summer School.


About SOBA

SOBA is a command line tool and web application for analyzing GFF3 annotations. GFF3 is a standard file format for genomic annotation data. SOBA gathers statistics from GFF3 files and renders them as tables and graphs.
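To make "gathering statistics from GFF3 files" concrete, here is a minimal Python sketch. It is not SOBA's code (SOBA is written in Perl); the sample records, `parse_gff3`, and `count_by_type` are hypothetical names made up for illustration. It relies only on the fact that a GFF3 feature line has nine tab-separated columns and that coordinates are 1-based and inclusive.

```python
from collections import Counter

# Hypothetical GFF3 sample: nine tab-separated columns per feature line
# (seqid, source, type, start, end, score, strand, phase, attributes).
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "chr1\tdemo\tgene\t1000\t5000\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tmRNA\t1000\t5000\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tdemo\texon\t1000\t1200\t.\t+\t.\tID=exon1;Parent=mrna1",
    "chr1\tdemo\texon\t4800\t5000\t.\t+\t.\tID=exon2;Parent=mrna1",
])


def parse_gff3(text):
    """Yield one dict per feature line, skipping comments and blank lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        seqid, source, ftype, start, end, score, strand, phase, attrs = line.split("\t")
        yield {
            "seqid": seqid,
            "type": ftype,
            "start": int(start),
            "end": int(end),
            # GFF3 coordinates are 1-based and inclusive.
            "length": int(end) - int(start) + 1,
        }


def count_by_type(features):
    """Count how many features of each type are present."""
    return Counter(f["type"] for f in features)


counts = count_by_type(parse_gff3(SAMPLE_GFF3))
for ftype, n in sorted(counts.items()):
    print(f"{ftype}\t{n}")
```

The same per-feature records could feed other summaries, such as lengths grouped by type or by sequence.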

The web version of SOBA will produce the following:

In addition, the command-line tool (SOBAcl) flexibly produces a much wider variety of tables, figures, and graphs from the data in a GFF3 file, and it can produce complex, extensible custom reports through a robust template system.

SOBA is a tool for those dealing with genomic sequence annotation who want to view genome-wide summaries of their annotation files. For example, SOBA would be useful at an annotation jamboree for a newly sequenced organism and when preparing the resulting genome paper; it would help those developing annotation tools to quickly evaluate updates to their tool; and it assists comparative genomics analyses by providing a high-level overview of the genomes of multiple organisms. SOBA complements genome browsers by providing a summary of all the features annotated in the genome.

SOBA is built with Perl (and JavaScript for the web interface). The web interface uses CGI::Application as a Perl webapp framework and the jQuery JavaScript library for Web 2.0 effects and AJAX. Both versions of SOBA use the Template Toolkit (TT) to generate HTML and text reports, Graphviz for the ontology graphs, and GD for charts. Template Toolkit makes SOBA easy to extend: its template language is fairly simple, and you don't need to know Perl or any other programming language to use it.

SOBA Web Application

Documentation for the web interface to SOBA is available on the Sequence Ontology Wiki as well as via tool-tips on the site itself.

Navigate to the SOBA Web Application with any modern browser that has JavaScript enabled.

SOBA feature lengths by chromosome

We can constrain the features reported in other ways as well.

SOBAcl --columns seqid --rows type --data length --data_type mean \
  --layout table --format text \
  --select 'start => [">=", "1000"], end => ["<=", "1000000"]' \
  hsap_hg18_demo.gff3
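The command above can be read as "keep only the features that satisfy the constraints, then tabulate mean length by type (rows) and seqid (columns)". The Python sketch below mirrors that logic under the assumption that both constraints must hold at once (start >= 1000 and end <= 1000000); it is not how SOBAcl is implemented. The feature list and the `mean_length_table` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical features as (seqid, type, start, end);
# these are not taken from hsap_hg18_demo.gff3.
FEATURES = [
    ("chr1", "gene", 1000, 5000),
    ("chr1", "exon", 1000, 1200),
    ("chr1", "exon", 4800, 5000),
    ("chr1", "exon", 500, 900),        # dropped: start < 1000
    ("chr2", "gene", 2000, 1500000),   # dropped: end > 1000000
    ("chr2", "exon", 2000, 2300),
]


def mean_length_table(features, min_start=1000, max_end=1_000_000):
    """Return {(type, seqid): mean length} for features passing both constraints."""
    lengths = defaultdict(list)
    for seqid, ftype, start, end in features:
        # Assumption: the two constraints are combined with a logical AND.
        if start >= min_start and end <= max_end:
            lengths[(ftype, seqid)].append(end - start + 1)
    return {key: sum(vals) / len(vals) for key, vals in lengths.items()}


table = mean_length_table(FEATURES)

# Print a text table: one row per type, one column per seqid.
seqids = sorted({seqid for _, seqid in table})
types = sorted({ftype for ftype, _ in table})
print("\t".join(["type"] + seqids))
for ftype in types:
    cells = [f"{table[(ftype, s)]:.1f}" if (ftype, s) in table else "-" for s in seqids]
    print("\t".join([ftype] + cells))
```

Cells with no qualifying features are printed as "-" here; that placeholder is a choice made for this sketch only.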


SOBAcl has support for more complex reports.

SOBAcl --report attributes --format html_page hsap_hg18_demo.gff3

These reports can be generated by custom templates.

SOBAcl --columns file --rows type --data length --data_type mean \
  --layout table --format tab --template \

                 count  length (mean)
CDS              11944  165.853064300067
contig               3  229900897.333333
exon             12918  288.366697631212
five_prime_UTR    1381  597.052136133237
gene              1157  67319.117545376
mRNA              1085  70187.8202764977
ncRNA               72  24089.3611111111
three_prime_UTR   1385  569.969675090253
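The idea behind a template-driven report like the one above is that the statistics are computed once and the layout lives in a separate template. The sketch below illustrates that separation with Python's standard `string.Template`. This is only an analogy: SOBA uses the Perl Template Toolkit, whose syntax is different, and the statistics, `ROW_TEMPLATE`, and `render_report` are hypothetical.

```python
from string import Template

# Hypothetical per-type statistics (made-up numbers, not SOBA output).
STATS = [
    {"type": "exon", "count": 2, "mean_length": 201.0},
    {"type": "gene", "count": 1, "mean_length": 4001.0},
]

# The layout of one report row lives in the template, not in the code
# that computes the statistics.
ROW_TEMPLATE = Template("$type\t$count\t$mean_length")


def render_report(stats, row_template=ROW_TEMPLATE):
    """Render a header line followed by one templated row per feature type."""
    header = "type\tcount\tlength (mean)"
    rows = [row_template.substitute(row) for row in stats]
    return "\n".join([header] + rows)


report = render_report(STATS)
print(report)
```

Passing a different `row_template` changes the report layout without touching the statistics, which is the property that makes template-based reports easy to extend.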

