
Arthropod Genomics 2011/Genome Project 101 Workshop

From GMOD
Revision as of 16:35, 7 June 2011 by Scott (Talk | contribs)


This page or section is under construction.

This page will be used for the Genome 101 Workshop at Arthropod Genomics 2011.

VMware Image

A VMware image will be made available to workshop participants. We will use this image during the workshop.

System Configuration

This section tracks what we did to create the VMware image.

Operating System   Ubuntu 11.04, 64-bit client. This is a popular Linux distribution.
Memory             2 GB. If you run this on a system that has 2 GB or less of memory, please decrease this number.
Disk               80 GB. This is allocated 2 GB at a time, as needed, by VMware.
Networking         NAT
Username           gmod
Password

Installed Prerequisite Software

GMOD components have a variety of prerequisite software that needs to be installed. Here is a list of what was installed so we could install and run GMOD software.

Software | How | Comments
Mercurial | sudo apt-get install mercurial | Revision control system used by Galaxy.
Microsoft TrueType core fonts | sudo apt-get install ttf-mscorefonts-installer | Used by Galaxy.
python-dev | sudo apt-get install python-dev | Used by Galaxy.
python-setuptools | sudo apt-get install python-setuptools | Used by Galaxy.
python-pip | sudo apt-get install python-pip | Used by Galaxy.
bx-python scripts | sudo pip install bx-python | Scripts used by Galaxy.
Python 2.6 | sudo apt-get install python2.6 | Ubuntu 11.04 comes with Python 2.7, which Galaxy does not like. This installs 2.6 in parallel.
Graphics libraries | sudo apt-get install libgd2-xpm-dev libgd-gd2-perl libgd-tools libgd-svg-perl | Used by GBrowse.
System utilities and web server | sudo apt-get install autoconf apache2 | Used by GBrowse and Chado.
Database server | sudo apt-get install postgresql postgresql-client | Used by Chado and GBrowse.
Variety of perl modules | sudo apt-get install libcgi-session-perl libdbd-pg-perl libdigest-md5-file-perl libclass-base-perl libmodule-build-perl libstatistics-descriptive-perl libtemplate-perl libxml-simple-perl liblog-log4perl-perl libparse-recdescent-perl libsql-translator-perl perl-doc | Used by Chado and GBrowse.
Perl graphics library | cpan> install GD | Used by JBrowse and GBrowse.
BioPerl libraries | cpan> install Bio::Perl Bio::Graphics JSON | Used by JBrowse, GBrowse and Chado.
GBrowse Chado adaptor | cpan> install Bio::DB::Das::Chado | Used by JBrowse, GBrowse and Chado.
More perl libraries | cpan> install GO::Parser Module::Load DBIx::DBSchema XML::Parser::PerlSAX | Used by Chado.
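
After working through the list above, it can help to confirm that the main command-line tools actually ended up on the PATH. The following is a minimal sketch rather than part of the original setup; the function name missing_commands and the choice of commands to check are assumptions made for illustration.

```shell
# Print the name of every command from the argument list that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return $rc
}
```

For example, "missing_commands hg python2.6 psql perl cpan" prints nothing when all five tools are installed and lists the absent ones otherwise.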

PostgreSQL Configuration

The postgresql server will be set up with fairly unrestricted access to make life easier during the tutorial. If used "in real life", the configuration should be tightened down quite a bit.

Edit config file

 sudo su -
 vi /etc/postgresql/8.4/main/pg_hba.conf

Change the bottom lines to look like this:

 # "local" is for Unix domain socket connections only
 local   all         all                               trust
 # IPv4 local connections:
 host    all         all         127.0.0.1/32          trust
 # IPv6 local connections:
 host    all         all         ::1/128               trust

by changing the text in the last column to "trust" as shown here (that's the insecure part!). Then restart the postgresql server:

 /etc/init.d/postgresql restart
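
The same change to the last column can be made without hand editing. The sketch below is one possible approach, not what was done for the image: the function name trust_all is hypothetical, and it assumes the authentication method is the final field on each "local" and "host" line (no trailing auth options). It writes the result to standard output so the change can be reviewed before the real file is replaced.

```shell
# Print a copy of a pg_hba.conf-style file in which the last column of every
# "local" and "host" line is replaced by "trust". Comment lines are untouched.
# Assumption: the auth method is the last field (no trailing auth options).
trust_all() {
    sed -E 's/^((local|host)[[:space:]].*[[:space:]])[^[:space:]]+[[:space:]]*$/\1trust/' "$1"
}
```

One way to use it is "trust_all /etc/postgresql/8.4/main/pg_hba.conf > /tmp/pg_hba.conf.new", followed by a diff against the original before copying the new file into place as root.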

Then, switch users to the "postgres" user and create a new user called "gmod":

 su - postgres
 createuser gmod
   Shall the new role be a superuser? (y/n) y
 exit  # to leave postgres user shell
 exit  # to leave root shell
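
If the interactive prompt is inconvenient, createuser also accepts a --superuser option, and the result can be checked afterwards by querying pg_roles. This is a sketch under assumptions: the helper name is_superuser is hypothetical, and it assumes the default "postgres" database exists and accepts connections under the trust settings configured above.

```shell
# Non-interactive equivalent of answering "y" at the prompt
# (run as the postgres user):
#   createuser --superuser gmod

# Succeeds only if the named role exists and is a superuser.
# Assumes the default "postgres" database is available to connect to.
is_superuser() {
    result=$(psql -At -d postgres -c "SELECT rolsuper FROM pg_roles WHERE rolname = '$1'")
    [ "$result" = "t" ]
}
```

For example, "is_superuser gmod && echo ok" prints "ok" only when the gmod role was created as a superuser.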

Install DBIx::DBStag

This is a perl module that can only be installed after PostgreSQL is configured, so it is installed now. First, create a database called "test":

 createdb test

Then install via the cpan shell:

 cpan
 cpan> install DBIx::DBStag

Note that installing via the cpan shell is difficult if you typically run cpan as root (for example, "sudo cpan"). If you instead run cpan as a regular user and have it configured to do "sudo make install" and "sudo ./Build install", the installation is easy and works correctly.

GMOD Components

MAKER Web Annotation Service

While we could install MAKER locally on this machine, it is convenient to use the web service provided by Mark Yandell's group at the University of Utah. To use it, go to

 http://derringer.genetics.utah.edu/cgi-bin/MWAS/maker.cgi

and create a free account (I created one for this tutorial with a user name of gmodags). Once the account is created, we can upload some sample data. I put the sample data that I used in ~/Downloads/MAKER_input, where there are three files:

  • pyu-contig.fasta - a FASTA file containing a 1.7 MB contig
  • pyu-est.fasta - a set of assembled 454 read ESTs from P. ultimum and a related organism
  • pyu-protein.fasta - a set of protein sequences from a related organism
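
Before uploading, a quick sanity check is to count the records in each FASTA file, since every record starts with a ">" header line. The helper below is a small sketch; the name count_fasta_records is an assumption, not something provided by MAKER.

```shell
# Count the records in a FASTA file by counting its header lines
# (lines that start with ">").
count_fasta_records() {
    grep -c '^>' "$1"
}
```

It can be run against each input in turn, for example "count_fasta_records ~/Downloads/MAKER_input/pyu-est.fasta".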

After clicking on the "New Job" tab, I uploaded all three files in the appropriate spot, ignoring the others:

MAKER contig.png

MAkER est.png

MAKER protien.png

After uploading these, I pressed "Add to Job Queue" to get it started running. The job waited under an hour before starting, and then finished in under three hours.

Upon finishing, I was presented with multiple ways of looking at the data:

MAKER download.png

After taking a quick look at both GBrowse and JBrowse, I downloaded the data to the machine (in ~/Downloads/3263.maker.output). The GFF file in this directory will be loaded into Chado.

Galaxy

The default python on Ubuntu 11.04 is 2.7. We need 2.6 to run Galaxy. Using the instructions from the GetGalaxy wiki page, Python 2.6 was downloaded and added at the front of the path.

mkdir ~/galaxy-python
ln -s /path/to/python2.6 ~/galaxy-python/python

~/.bashrc was edited and these lines were added to the end.

# Use Python 2.6 for Galaxy
export PATH=~/galaxy-python:$PATH
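
This works because the shell searches PATH from left to right, so the first "python" it finds wins. The sketch below illustrates that behaviour with a throwaway directory and a stand-in script; both are hypothetical and exist only for the demonstration.

```shell
# Demonstrate PATH precedence with a throwaway directory and a stand-in
# "python" script (both hypothetical; the real setup links to Python 2.6).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "stand-in python"\n' > "$demo_dir/python"
chmod +x "$demo_dir/python"

# Prepend the directory, as the ~/.bashrc line does for ~/galaxy-python.
PATH="$demo_dir:$PATH"
export PATH

# "command -v" reports which executable the shell will actually run.
resolved_python=$(command -v python)
echo "$resolved_python"
```

On the workshop image, the equivalent check after opening a new shell would be "command -v python", which should point into ~/galaxy-python.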

Galaxy was then downloaded:

cd ~/Documents
mkdir work
cd work
hg clone http://bitbucket.org/galaxy/galaxy-dist

And we then customized the landing image for this conference. (Details are not important.)

And now we can start it:

cd galaxy-dist
sh run.sh

Galaxy is now installed and running. Go to http://localhost:8080.
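
The first start can take a while, so the page may not respond immediately. A small polling loop is one way to wait for it. This is a sketch only: the function name wait_for_url and the default of 30 attempts are assumptions, and it relies on curl being installed.

```shell
# Poll a URL once per second until it answers successfully or the attempt
# limit is reached. Arguments: URL, optional number of attempts (default 30).
wait_for_url() {
    url=$1
    attempts=${2:-30}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: quiet, -f: treat HTTP errors as failure, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

For example, "wait_for_url http://localhost:8080 60 && echo 'Galaxy is up'" waits up to about a minute.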

Chado

Get Chado from SourceForge; point a browser at

 http://sourceforge.net/projects/gmod/files/gmod/chado-1.11/chado-1.11.tar.gz/download

and extract the files:

 cd ~/Downloads
 tar zxvf chado-1.11.tar.gz
 cd chado-1.11

Set up some environment variables:

 vi ~/.bashrc

and add these lines to the bottom:

 export GMOD_ROOT=/usr/local/gmod
 export CHADO_DB_NAME=chado
 export CHADO_DB_USERNAME=gmod

Save .bashrc and source it so that the values are available in the shell:

 source ~/.bashrc
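
Since the Chado installer reads these variables, it is worth confirming they are really set in the current shell before continuing. The following helper is a minimal sketch; the name require_env is hypothetical.

```shell
# Return non-zero and name the first variable from the argument list that is
# unset or empty. Uses eval for indirect lookup, so pass variable names only.
require_env() {
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            return 1
        fi
    done
    return 0
}
```

For example, "require_env GMOD_ROOT CHADO_DB_NAME CHADO_DB_USERNAME && echo 'environment looks ready'".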

Now to install Chado:

 perl Makefile.PL

Accept all of the defaults except for the "default organism" question. Put "pythium" here.

 make
 sudo make install
 make load_schema   #ignore the error about a chado database not existing
 make prepdb
 make ontologies
   answer with 1,2,4

Add our organism to the database:

 psql chado
 psql> INSERT INTO organism ( abbreviation, genus, species, common_name) VALUES ('P.ultimum','Pythium','ultimum','pythium');

Make a database dump that saves progress to this point:

 pg_dump chado | bzip2 -c > ontologies_only_dbdump.bz2
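
Returning to this checkpoint later is the reverse pipeline: decompress the dump and feed it to psql. The sketch below assumes the target database already exists and is empty; the function name restore_dump is hypothetical.

```shell
# Restore a bzip2-compressed SQL dump into a database.
# Arguments: dump file, target database name (which must already exist).
restore_dump() {
    bzip2 -dc "$1" | psql "$2"
}
```

A typical sequence would be "dropdb chado", then "createdb chado", then "restore_dump ontologies_only_dbdump.bz2 chado".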

GBrowse

GBrowse can be installed directly from the cpan shell, just as several of its prerequisites were:

 cpan
 cpan> install Bio::Graphics::Browser2

Accept all of the defaults when asked questions.

JBrowse

 wget http://jbrowse.org/releases/jbrowse-1.2.1.zip