Difference between revisions of "Clements/AGS"

From GMOD
Latest revision as of 20:04, 9 June 2011

Galaxy Example

GBrowse and JBrowse are excellent for visualizing our assembly and annotation. But what if you want to do some further analysis and exploration? You can manually browse the assembly, but then you won't get tenure.

To further analyze your data you need tools like Galaxy, BioMart and InterMine. Since I work for Galaxy, we'll spend some time working on a simple example in that. We'll touch on BioMart or InterMine as time allows.

1. Get to Galaxy

We could run this analysis on the free public Galaxy server (http://usegalaxy.org), or on the Galaxy that has been installed on our VMware image. Let's run it on our local install.

Note: Please don't run on the local install with me. The public server might be able to support 120 people doing this simultaneously. The local install won't.

2. What have we got?

First, load the GFF that MAKER produced into Galaxy:

Get Data → Upload File → ftp://ftp.gmod.org/pub/gmod/Meetings/2011/AGS/3263.maker.output/3263.all.gff → Execute
This uploads the GFF file into Galaxy. Galaxy recognizes it as a GFF3 file.

Now, because of a bug in Galaxy (don't tell anyone), we need to convert it to BED to run a subsequent step.

Convert Formats → GFF-to-BED → Execute
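
For readers who want to see what this conversion involves, here is a minimal Python sketch. It is not Galaxy's converter; the function name and the sample line are made up for illustration. It assumes a 6-column BED output with the feature type placed in column 4, and it applies the one coordinate change that matters: GFF3 starts are 1-based, BED starts are 0-based.

```python
def gff_line_to_bed(line):
    """Convert one GFF3 feature line to a 6-column BED line, or None."""
    if not line.strip() or line.startswith("#"):
        return None  # skip blank lines, comments and directives
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return None  # not a feature line (for example, embedded FASTA)
    seqid, _source, so_type, start, end, score, strand = cols[:7]
    # GFF3 is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start coordinate shifts.
    bed_start = int(start) - 1
    score = "0" if score == "." else score
    return "\t".join([seqid, str(bed_start), end, so_type, score, strand])


# Hypothetical feature line, not taken from the MAKER output.
example = "contig1\tmaker\texon\t101\t200\t.\t+\t.\tID=exon1"
print(gff_line_to_bed(example))
```

The type ends up in the fourth field, which is why the later steps refer to c4.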

Now let's see what is in the annotation. Let's count the number of different feature types in the file.

Join, Subtract and Group → Group → Group by Column: c4

This tells Galaxy to group the lines by the value in column 4, which is the SO (Sequence Ontology) type of the feature.

Add new operation → Type: Count → Execute

Now count the number of lines that have each type.
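
The group-and-count step can be sketched outside Galaxy in a few lines of Python. This is only an illustration of the idea, assuming tab-separated BED lines with the type in column 4; the function name and the three sample lines are hypothetical.

```python
from collections import Counter


def count_by_column(lines, column=4):
    """Count tab-separated lines by the value in a 1-based column."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column:
            counts[fields[column - 1]] += 1
    return counts


# Made-up BED lines, not the real MAKER annotation.
bed_lines = [
    "contig1\t100\t200\texon\t0\t+",
    "contig1\t100\t200\tCDS\t0\t+",
    "contig1\t300\t400\texon\t0\t+",
]
for so_type, n in sorted(count_by_column(bed_lines).items()):
    print(so_type, n)
```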

Anything interesting? Hmmm. We've got one more exon than CDS. I wonder where that is?

3. Get just the Exons and CDSs

Just get the exons:

Filter and Sort → Filter → Filter: GFF-to-BED on data

The SO type is in column 4 in BED.

With following condition: c4=='exon' → Execute

Repeat with CDS.
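
As a rough equivalent of the c4=='exon' condition, the sketch below keeps only the lines whose fourth column matches a given type. The helper name and sample data are assumptions made for illustration, not part of Galaxy.

```python
def filter_by_type(lines, so_type, column=4):
    """Keep tab-separated lines whose 1-based column equals so_type."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= column and fields[column - 1] == so_type:
            kept.append(line)
    return kept


# Made-up BED lines, not the real MAKER annotation.
bed_lines = [
    "contig1\t100\t200\texon\t0\t+",
    "contig1\t120\t200\tCDS\t0\t+",
    "contig1\t300\t400\texon\t0\t+",
]
exons = filter_by_type(bed_lines, "exon")
cdss = filter_by_type(bed_lines, "CDS")
print(len(exons), len(cdss))
```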


4. See what is in the exon set that is not in the CDS set

Operate on Genomic Intervals → Subtract
Subtract CDSs from Exons → Execute
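
To make the subtraction concrete, here is a small Python sketch. It assumes the tool is set to return whole intervals from the first set that overlap nothing in the second set (the other possibility, returning the non-overlapping pieces, is not shown). Intervals are (chrom, start, end) tuples in BED-style 0-based, end-exclusive coordinates; the function name and the data are hypothetical.

```python
def subtract_intervals(first, second):
    """Return intervals in `first` that overlap no interval in `second`."""
    kept = []
    for chrom, start, end in first:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            c == chrom and s < end and start < e for c, s, e in second
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept


# Made-up intervals, not the real MAKER annotation.
exons = [("contig1", 100, 200), ("contig1", 300, 400), ("contig1", 500, 600)]
cdss = [("contig1", 120, 200), ("contig1", 300, 380)]
print(subtract_intervals(exons, cdss))
```

Under this assumption an exon that is only partly coding is still dropped, so what remains are exons with no coding sequence at all. The pairwise comparison is quadratic, which is fine for a small annotation but would need sorting or an interval index for a large one.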

5. Investigate

We have one exon left. Go visualize it in GBrowse or JBrowse.