Galaxy Example

GBrowse and JBrowse are excellent for visualizing our assembly and annotation. But what if you want to do some further analysis and exploration? You can manually browse the assembly, but then you won't get tenure.

To further analyze your data you need tools like Galaxy, BioMart and InterMine. Since I work for Galaxy, we'll spend some time working on a simple example in that. We'll touch on BioMart or InterMine as time allows.

1. Get to Galaxy

We could run this analysis on the free public Galaxy server, or on the Galaxy that has been installed on our VMware image. Let's run it on our local install.

Note: Please don't run on the local install with me. The public server might be able to support 120 people doing this simultaneously. The local install won't.

2. What have we got?

First, load the GFF that MAKER produced into Galaxy.

Get Data → Upload File → Execute
This uploads the GFF file into Galaxy, which recognizes it as a GFF3 file.

Now, because of a bug in Galaxy (don't tell anyone), we need to convert it to BED to run a subsequent step.

Convert Formats → GFF-to-BED → Execute
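For readers curious about what a conversion like this involves, here is a minimal Python sketch. It is not Galaxy's converter; the function name `gff_line_to_bed` and the choice of six output columns are assumptions made for illustration. The point it shows is the coordinate shift (GFF is 1-based and inclusive, BED is 0-based and half-open) and that the feature type ends up in column 4.

```python
def gff_line_to_bed(line):
    """Convert one GFF3 feature line to a 6-column BED line.

    Returns None for blank lines and comment/directive lines.
    Hypothetical helper for illustration, not Galaxy's own code.
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    seqid, _source, ftype, start, end, score, strand = cols[:7]
    # GFF start is 1-based inclusive; BED start is 0-based.
    bed_start = int(start) - 1
    # BED scores are numeric, so replace the GFF placeholder "." with 0.
    bed_score = "0" if score == "." else score
    # The feature type goes in the BED name column (column 4).
    return "\t".join([seqid, str(bed_start), end, ftype, bed_score, strand])
```

The end coordinate is unchanged because a 1-based inclusive end and a 0-based exclusive end are the same number.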

Now let's see what is in the annotation. Let's count the number of different feature types in the file.

Join, Subtract and Group → Group → Group by Column: c4

This tells Galaxy to group the lines by the value in column 4, which is the SO type of the feature.

Add new operation → Type: Count → Execute

This counts the number of lines that have each type.
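The group-and-count step above can be sketched outside Galaxy in a few lines of Python. This is only an illustration of the idea, assuming tab-separated BED lines with the feature type in column 4; `count_feature_types` is a hypothetical name, not a Galaxy function.

```python
from collections import Counter


def count_feature_types(bed_lines):
    """Count how many BED lines carry each value in column 4."""
    counts = Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        counts[cols[3]] += 1  # column 4 is index 3
    return counts
```

Calling `count_feature_types(open(path))` on the converted BED file (where `path` is wherever you saved it) would give one count per feature type, which is the same table the Group tool produces.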

Anything interesting? Hmmm. We've got one more exon than CDS. I wonder where that is?

3. Get just the Exons and CDSs

Just get the exons:

Filter and Sort → Filter → Filter: GFF-to-Bed on data

The SO type is in column 4 in BED.

With following condition: c4=='exon' → Execute

Repeat with CDS.
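The filter condition `c4=='exon'` simply keeps the lines whose fourth column equals the given string. A rough Python equivalent, assuming tab-separated BED lines (the helper name `filter_by_type` is made up for this sketch):

```python
def filter_by_type(bed_lines, feature_type):
    """Keep only the BED lines whose column 4 equals feature_type."""
    kept = []
    for line in bed_lines:
        cols = line.rstrip("\n").split("\t")
        # Column 4 is index 3; lines with fewer columns are dropped.
        if len(cols) > 3 and cols[3] == feature_type:
            kept.append(line)
    return kept
```

Running it once with `"exon"` and once with `"CDS"` gives the two sets used in the next step.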

4. See what is in the exon set that is not in the CDS set

Operate on Genomic Intervals → Subtract
Subtract CDSs from Exons → Execute
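To make the subtraction concrete, here is a small Python sketch of one way such a step could work: keep every exon that overlaps no CDS on the same sequence. This is an assumption about the behavior we want, not a description of how Galaxy's Subtract tool is implemented, and `subtract_whole_intervals` is a hypothetical name. Intervals are `(chrom, start, end)` tuples in BED-style 0-based, half-open coordinates.

```python
def subtract_whole_intervals(first, second):
    """Return intervals from `first` that overlap nothing in `second`.

    Simple quadratic scan; fine for a small annotation set.
    """
    kept = []
    for chrom, start, end in first:
        overlaps = any(
            c == chrom and s < end and start < e  # half-open overlap test
            for c, s, e in second
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept


# Made-up example data: the third exon has no CDS overlapping it.
exons = [("chr1", 0, 100), ("chr1", 200, 300), ("chr1", 400, 500)]
cdss = [("chr1", 10, 90), ("chr1", 200, 300)]
leftover = subtract_whole_intervals(exons, cdss)
```

With this made-up data, `leftover` contains only the exon at 400 to 500, the kind of exon with no coding sequence that the walkthrough goes on to investigate.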

5. Investigate

We have one exon left. Go visualize it in GBrowse or JBrowse.