Difference between revisions of "Tripal Tutorial v2.0"

(Using the Bulk Loader)
(Replaced content with "The Tripal v2.0 tutorial is now found at the Tripal website inside of the [http://tripal.info/tutorials/v2.0/full Tripal User's Guide].")
 
Welcome to the [[Tripal]] v2.0 Tutorial.  Here you will find instructions for installation, usage and administration of a Tripal-based genome website.  This tutorial guides the user through the process of installation, setup and data loading of genomic feature data and annotations.
+
The Tripal v2.0 tutorial is now found at the Tripal website inside of the [http://tripal.info/tutorials/v2.0/full Tripal User's Guide].
 
+
'''Note''': Tripal is provided free of charge, as-is with no warranty or guarantee of fitness.  The developers are committed to creating a platform usable by all and as bug free as possible.  However, bugs may be present, especially with the newest features. If you find problems or bugs, please feel free to report them either via the [https://lists.sourceforge.net/lists/listinfo/gmod-tripal Tripal mailing list] or by adding a bug report on the [https://drupal.org/project/issues/1337878 Tripal issues tracker].
+
 
+
 
+
<font color="red">'''Note''':  This tutorial has been updated from the v1.1 tutorial for Tripal v2.0a.  It is, however, only correct up to the section about importing publications.  As Tripal 2.0-alpha matures the tutorial will be completely updated.  Thank you for your patience.</font>
+
 
+
 
+
=== What is Tripal ===
+
 
+
Tripal is a suite of PHP5 modules that bridges the Drupal Content Management System (CMS) and GMOD Chado.  The goal is to simplify construction of a community genomics, genetics or biological website so that individual labs or research communities can build a high-quality, standards-based website for data sharing and collaboration.
+
 
+
[[Image:600px-WhatisTripal.png|300px]]
+
 
+
=== Content Management System ===
+
''Definition From Wikipedia:''
+
 
+
A '''content management system''' ('''CMS''') is the '''collection of procedures''' used to manage work flow in a collaborative environment. These procedures can be manual or computer-based. The procedures are designed to do the following:
+
* Allow for a large number of people to contribute to and share stored data
+
* Control access to data, based on user roles (defining which information users or user groups can view, edit, publish, etc.)
+
* Aid in easy storage and retrieval of data
+
* Reduce repetitive duplicate input
+
* Improve the ease of report writing
+
* Improve communication between users
+
 
+
In a CMS, data can be defined as nearly anything: documents, movies, pictures, phone numbers, scientific data, and so forth. CMSs are frequently used for storing, controlling, revising, semantically enriching, and publishing documentation. Serving as a central repository, the CMS increases the version level of new updates to an already existing file. Version control is one of the primary advantages of a CMS.
+
 
+
=== Drupal ===
+
Drupal is an open-source, freely available CMS with thousands of users and existing sites.  Features of Drupal include:
+
 
+
* A supportive, active community.
* Books, tutorials and online forums for help.
+
* Hundreds of user-contributed extension modules that are freely available.
+
* Hundreds of user-contributed themes to instantly change the look-and-feel of the site
+
* User management infrastructure.
+
* Allows for non-coding manipulation of the website contents.  Anyone can edit content.
+
* Easy to install and maintain
+
 
+
Drupal website:  http://www.drupal.org
+
Drupal modules:  http://www.drupal.org/project/modules
+
Drupal themes:  http://www.drupal.org/project/themes
+
 
+
Tripal v2.0 is compatible with Drupal v7. 
+
 
+
=== Chado ===
+
You can find more detailed information about Chado here: http://gmod.org/wiki/Chado_-_Getting_Started.  However, one thing to remember in regards to Tripal organization is that Chado has a modular structure:
+
 
+
<div class="indent">
+
* Audit - for database audits
+
* Companalysis - for data from computational analysis
+
* Contact - for people, groups, and organizations
+
* Controlled Vocabulary (cv) - for controlled vocabularies and ontologies
+
* Expression - for summaries of RNA and protein expression
+
* General - for identifiers
+
* Genetic - for genetic data and genotypes
+
* Library - for descriptions of molecular libraries
+
* Mage - for microarray data
+
* Map - for maps without sequence
+
* Organism - for taxonomic data
+
* Phenotype - for phenotypic data
+
* Phylogeny - for organisms and phylogenetic trees
+
* Publication (pub) - for publications and references
+
* Sequence - for sequences and sequence features
+
* Stock - for specimens and biological collections
+
* WWW -
+
</div>
+
 
+
Tripal is also modular following these same designations.
+
 
+
=== Goals of Tripal ===
+
*Simplify Construction of Biological Databases
+
**Reduce time of development
+
**Reduce costs
+
**Reduce technical resources (i.e. programmers, systems admins).
+
**A non-technical site administrator can add content without knowing PHP, HTML, JavaScript.
+
 
+
*Greater Flexibility of the Biological Website
+
**Social Networking
+
**Non-biological content
+
**Outreach, tutorials, documentation, protocols, publications
+
 
+
*Expandability
+
**Site can be programmatically expanded in any way
+
**Changes to the base code are not needed; new functionality is added through modules.
+
**Availability of an Application Programmer Interface (API)
+
 
+
*Reusability
+
**All code can be shared.  Expansion modules created by one group can be shared with all.
+
 
+
=== Structure of Tripal ===
+
Tripal is a collection of modules that integrate with Drupal.  These modules are divided into hierarchical categories:
+
 
+
[[Image:TripalLayers.png|300px]]
+
 
+
The Tripal Core level contains supportive functionality for all other modules.  It contains
+
* A jobs management utility
+
* A utility to manage materialized views
+
* An API for these features
+
* Functions for managing module specific CV terms
+
* Functions for interfacing with Chado.
+
 
+
 
+
The Chado-centric modules provide:
+
* Edit/Update/Delete for Chado modules.
+
* Bulk loaders for these data
+
* Basic visualizations for data in Chado specific for the module
+
* An API for easily accessing Chado.
+
 
+
 
+
Analysis modules provide
+
* Custom visualization for specific analyses (e.g. Blast, KEGG, InterProScan, Unigene construction)
+
* Uses the API from the Tripal Analysis (Chado-centric) module.
+
 
+
 
+
Applications:
+
* These are full-blown applications that use Tripal, Drupal and Chado and typically consist of several Chado-centric modules, Analysis modules and custom-built modules (e.g. the Breeders Toolbox, currently under construction).
+
 
+
=== Sites Running Tripal ===
+
{| class="wikitable"
|-
!Site Name
!URL
|-
|Banana Genome Hub
|http://banana-genome.cirad.fr/
|-
|Pulse Crops Genomics & Breeding
|http://knowpulse2.usask.ca/portal/
|-
|Genome Database for Vaccinium
|http://www.vaccinium.org
|-
|Genome Database for Rosaceae
|http://www.rosaceae.org
|-
|Cool Season Food Legume Database
|http://www.gabcsfl.org
|-
|Cacao Genome Database
|http://www.cacaogenomedb.org
|-
|Fagaceae Genome Web
|http://www.fagaceae.org
|-
|Citrus Genome Database
|http://www.citrusgenomedb.org
|-
|Marine Genomics Project
|http://www.marinegenomics.org
|}
+
 
+
=== Resources ===
+
 
+
The Tripal home site where you can find everything about Tripal:  http://tripal.info
+
 
+
GMOD Tripal mailing lists:  http://gmod.org/wiki/GMOD_Mailing_Lists
+
 
+
GMOD Tutorials from previous GMOD schools:  http://gmod.org/wiki/Tripal
+
 
+
=== Contributing Organizations ===
+
Individuals from these organizations have provided design and coding for Tripal v2.0
+
 
+
[[Image:150px-USLogo.png]]
+
[[Image:150px-WSULogo.png]]
+
 
+
Acknowledgments are extended to the Clemson University Genomics Institute where Tripal was first conceived and where development of earlier releases was performed.
+
 
+
[[Image:150px-CUGILogo.png|75px]]
+
 
+
Also, special thanks are extended to the GMOD project for logistical support and community interaction!!
+
 
+
===Funding===
+
 
+
Funding for Tripal v2.0 has been provided through various grants from various sources. 
+
 
+
 
+
=== Publications ===
+
# Lacey-Anne Sanderson, Stephen P. Ficklin, Chun-Huai Cheng, Sook Jung, Frank A. Feltus, Kirstin E. Bett, Dorrie Main.  [http://database.oxfordjournals.org/content/2013/bat075.full Tripal v1.1: a Standards-based Platform for Construction of Online Genetic and Genomic Databases].  ''Database'', Sept 2013.
# Stephen P. Ficklin, Lacey-Anne Sanderson, Chun-Huai Cheng, Margaret Staton, Taein Lee, Il-Hyung Cho, Sook Jung, Kirstin E Bett, Dorrie Main. [http://database.oxfordjournals.org/content/2011/bar044.full Tripal: a construction Toolkit for Online Genome Databases]. ''Database'', Sept 2011.
+
 
+
== Pre-planning ==
+
=== IT Infrastructure ===
+
Tripal requires a server with adequate resources to handle the expected load, and systems administration skills to get the machine up and running, the applications installed and everything properly secured.  Tripal requires a PostgreSQL database server, an Apache (or equivalent) web server, PHP5 and several configuration options to make it all work.  However, once these prerequisites are met, working with Drupal and Tripal is quite easy.
+
 
+
 
+
There are four ways to obtain a Tripal/Drupal/Chado database and web server for your site:
+
 
+
# '''Option #1  In-house dedicated servers:'''  You may have access to servers in your own department or group over which you have administrative control, and wish to install Tripal/Drupal/Chado on these.
+
# '''Option #2  Institutional IT support:'''  Your institution may provide IT servers and would support your efforts to install a website with database backend.
+
# '''Option #3  Commercial web-hosting:'''  If options #1 and #2 are not available to you, commercial web-hosting is an affordable option.  For large databases you may require a dedicated server. [http://www.bluehost.com/ Bluehost.com] is a web hosting service that provides hosting compatible with Drupal, Tripal and its dependencies.
+
# '''Option #4  In the Cloud:'''  Tripal is a part of the [http://gmod.org/wiki/Cloud GMOD in the cloud] Amazon AWS image created by GMOD.  It is also accompanied by other GMOD tools such as GBrowse2, JBrowse, Apollo and WebApollo.
+
 
+
 
+
After selection of one of the options above you can arrange your database/webserver in the following ways:
+
 
+
# '''Arrangement #1:'''  The database and web server are housed on a single server.  ''This is the approach taken by this course''.  It is necessary to gain access to a machine with enough memory (RAM), hard disk speed and space, and processor power to handle both services.
+
# '''Arrangement #2:'''  The database and web server are housed on different servers.  This provides dedicated resources to each service (i.e. web and database).
+
 
+
 
+
'''Selection of an appropriate machine'''
+
 
+
Databases are typically bottle-necked by RAM and disk speed.  Selection of the correct balance of RAM, disk speed, disk size and CPU speed is important and dependent on the size of the data.  The best advice is to consult an IT professional who can recommend a server installation tailored for the expected size of your data.
+
 
+
 
+
<font color="red">'''Note'''</font>:  Tripal does require command-line access to the web server with adequate local file storage for loading of large data files.  Be sure to check with your service provider to make sure command-line access is possible.
+
 
+
=== Technical Skills===
+
Depending on your needs, you may need additional technical support.
+
 
+
 
+
'''Tripal already supports my data, what personnel do I need to maintain it?'''
+
 
+
* Someone to install/setup the IT infrastructure
+
* Someone who understands the data to load it properly
+
 
+
 
+
'''Tripal does not yet support all of my data, but I want to use what's been done and expand on it.  What personnel do I need?'''
+
 
+
* Someone to install/setup the IT infrastructure
+
* Someone who understands the data to load it properly
+
* PHP/HTML/CSS/JavaScript programmer(s) to write your custom extensions
+
 
+
 
+
=== Why Use Tripal ===
+
Tripal v2.0 provides default pages for most Chado data types.  It also supports '''all''' of Chado in terms of data access.  '''''So why use Tripal?'''''
+
 
+
# You want to use a community-supported common database infrastructure (i.e. Chado).
+
# You need a web interface but do not want to build one from scratch.
+
# You need content-management capabilities (distributed content editing, user management, social networking... i.e. Drupal)
+
# You want to contribute to a community effort to help build a tool others can use.
+
# You want to participate in a community with other database developers using the same technology and confronting similar problems.
+
# You want to use open-source and free technology!
+
 
+
=== Development and Production Instances ===
+
It is recommended that you have separate development and production instances of Tripal.  The staging or development instance allows you to test new functionality, add customizations, or test modifications or additions to data without disturbing the production instance. The production instance serves content to the rest of the world.  Once you are certain that customizations and new functionality work well on the development instance you can re-implement or copy these over to the production site.  Sometimes it may take a few trials to load data in the way you want. A development site lets you take time to test data loading prior to making it public. The development site can be password-protected to allow access only to site administrators, developers or collaborators.
+
 
+
==Server Installation ==
+
The following instructions are for setup of Tripal on an [http://www.ubuntu.com/ Ubuntu version 12.04 server edition]. Where users of other Linux versions have provided feedback, alternative command-line statements have been added to this tutorial.  Unless specifically identified, all commands are for Ubuntu 12.04 Linux.
+
 
+
During installation of the Ubuntu 12.04 server please select the following software packages for installation:
+
 
+
* OpenSSH server
+
* LAMP (includes Apache and PHP)
+
* PostgreSQL database version 8.4 or higher
+
 
+
=== Apache Setup ===
+
Apache is the web server software and should already be installed.  On the Ubuntu server, navigate to your new website using this address:  [http://localhost/ http://localhost/].  You should see the text "It works!".
+
 
+
[[Image:ItWorks.png|800px]]
+
 
+
Drupal works best with the Apache rewrite module enabled. To do this execute the following on the command-line:
+
<pre class="enter">
  cd /etc/apache2/mods-enabled
  sudo ln -s ../mods-available/rewrite.load
</pre>
+
 
+
Next we need to edit the Apache configuration file to give Drupal full control of options within the document root.  Edit the /etc/apache2/sites-available/default file:
+
 
+
<pre class="enter">
  cd /etc/apache2/sites-available/
  sudo gedit default
</pre>
+
 
+
And change the '''AllowOverride''' option from '''None''' to '''All''':
+
 
+
<pre class="enter">
  <Directory /var/www/>
      Options Indexes FollowSymLinks MultiViews
      AllowOverride All
      Order allow,deny
      allow from all
  </Directory>
</pre>
+
 
+
Now restart Apache:
+
<pre class="enter">
sudo /etc/init.d/apache2 restart
</pre>
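
The two Apache changes above can be double-checked from the command line.  The following is a minimal sketch, not part of the original tutorial: the helper names <tt>has_allow_override_all</tt> and <tt>rewrite_enabled</tt> are hypothetical, and the check only confirms that ''some'' active line in the file sets <tt>AllowOverride All</tt>, not which <tt>Directory</tt> stanza it belongs to.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given Apache site file contains an
# active (not commented-out) "AllowOverride All" directive.
has_allow_override_all() {
  # $1 = path to the site configuration file
  # -i: Apache directive names are case-insensitive
  grep -Eqi '^[[:space:]]*AllowOverride[[:space:]]+All[[:space:]]*$' "$1"
}

# Hypothetical helper: succeed if rewrite.load is present (and, when it is a
# symbolic link, resolves) in the given mods-enabled directory.
rewrite_enabled() {
  # $1 = mods-enabled directory, e.g. /etc/apache2/mods-enabled
  [ -e "$1/rewrite.load" ]
}
```

On the tutorial server these might be called as <tt>has_allow_override_all /etc/apache2/sites-available/default && echo ok</tt> and <tt>rewrite_enabled /etc/apache2/mods-enabled && echo ok</tt>.  Debian-based systems also provide the <tt>a2enmod</tt> helper (<tt>sudo a2enmod rewrite</tt>) as an alternative to creating the symbolic link by hand.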
+
 
+
=== Setup PHP ===
+
PHP comes loaded onto the server, but we need a few other packages:
+
 
+
[[File:Ubuntu.jpg|50px]] On Ubuntu, install the PostgreSQL and GD extensions for php5 using the apt-get utility:
+
<pre class="enter">
  sudo apt-get install php5-pgsql
  sudo apt-get install php5-gd
</pre>
+
 
+
For newer versions of Ubuntu (e.g. 13.10) you will also want to install the php5-json package:
+
<pre class="enter">
  sudo apt-get install php5-json
</pre>
+
 
+
[[File:Suse.png|50px]]  For Suse Linux you may need to install the '''php-posix''' package:
+
<pre class="enter">
  yum install php-posix
</pre>
+
 
+
[[File:Red hat logo big.jpg|50px]] For RedHat Linux you may also need to install the '''php-process''' package:
+
<pre class="enter">
  yum install php-process
</pre>
+
 
+
Change some php settings (as root):
+
<pre class="enter">
  cd /etc/php5/apache2
  sudo gedit php.ini
</pre>
+
 
+
Set the <tt>memory_limit</tt> to something larger than <tt>128M</tt>.  The value should not exceed the server's physical memory:
+
 
+
  <span class="enter">memory_limit = 2048M;</span>
+
 
+
 
+
Now, restart the webserver:
+
 
+
  <span class="enter">sudo /etc/init.d/apache2 restart</span>
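
To confirm the edit was saved, the active value can be read back from the file.  This is only a sketch: <tt>php_ini_value</tt> is a hypothetical helper, and it assumes simple <tt>key = value</tt> lines as found in the stock <tt>php.ini</tt>.

```shell
#!/bin/sh
# Hypothetical helper: print the value assigned to a setting in a
# php.ini-style file.
php_ini_value() {
  # $1 = ini file, $2 = setting name
  # Lines starting with ';' are comments and never match the pattern.
  # The captured value stops at whitespace or at a trailing ';'.
  # If the setting appears more than once, the last assignment is printed.
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^;[:space:]]*\).*/\1/p" "$1" | tail -n 1
}
```

For example, <tt>php_ini_value /etc/php5/apache2/php.ini memory_limit</tt> should print the value that was just set.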
+
 
+
===Install phpPgAdmin===
+
phpPgAdmin is a nice web-based utility for easy administration of a [[gmod:PostgreSQL|PostgreSQL]] database. '''Note''': PhpPgAdmin is not required for successful operation of Tripal but is very useful.
+
 
+
<pre class="enter">
  sudo apt-get install phppgadmin
</pre>
+
 
+
Next, we need to make changes to the configuration settings so that we can remotely access phppgadmin.  To do this, edit the phppgadmin config file for apache:
+
 
+
<pre class="enter">
  cd /etc/apache2/conf.d
  sudo gedit phppgadmin
</pre>
+
 
+
Now, comment out the line that allows access to the local server only, and uncomment the line that allows access to anyone.
+
 
+
<pre class="enter">
#allow from 127.0.0.0/255.0.0.0 ::1/128
allow from all
</pre>
+
 
+
We also want to password protect PhpPgAdmin using Apache's access control mechanisms.  To do this, add the following lines within the Directory stanza just below the line we just uncommented:
+
 
+
<pre class="enter">
AuthType Basic
AuthName "Password Required"
AuthUserFile /usr/share/phppgadmin/.htpasswd
Require user tripaladmin
</pre>
+
 
+
The lines above instruct apache to use basic authentication, that the password file is located at /usr/share/phppgadmin/.htpasswd and the only user allowed to login is 'tripaladmin'.  Save the configuration file.  Next we need to create the password file:
+
 
+
<pre class="enter">
  cd /usr/share/phppgadmin
  sudo htpasswd -c .htpasswd tripaladmin
</pre>
+
 
+
The htpasswd command above will create the .htpasswd file and add the new user 'tripaladmin'.  You will need to set a password when requested. Finally, restart the webserver:
+
 
+
<pre class="enter">
sudo /etc/init.d/apache2 restart
</pre>
+
 
+
Now navigate to the URL [http://localhost/phppgadmin http://localhost/phppgadmin] and you should see the following:
+
 
+
[[Image:Phppgadmin.png|800px]]
+
 
+
The username 'tripaladmin' and the password you specified should be required when accessing the PhpPgAdmin page.
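
If the login prompt rejects the account, it can help to confirm that the password file actually contains the user.  The sketch below assumes the standard <tt>user:hash</tt> line format of htpasswd files; <tt>htpasswd_has_user</tt> is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the password file has an entry for the user.
htpasswd_has_user() {
  # $1 = password file, $2 = user name
  # Each line of an htpasswd file has the form user:hash
  grep -q "^$2:" "$1"
}
```

For example: <tt>htpasswd_has_user /usr/share/phppgadmin/.htpasswd tripaladmin && echo found</tt>.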
+
 
+
== Database Setup ==
+
 
+
Drupal can use a MySQL or PostgreSQL database, but Chado prefers PostgreSQL, so that is what we will use for both Drupal and Chado. We need to create the Drupal database. The following commands can be used to create a new database user and database.
+
 
+
 
+
First, become the 'postgres' user:
+
<pre class="enter">
sudo su - postgres
</pre>
+
 
+
 
+
Next, create the new 'drupal' user account.  This account will not be a 'superuser' nor allowed to create new roles, but should be allowed to create a database.
+
 
+
<pre class="enter">
createuser -P drupal
</pre>
+
 
+
When requested, enter an appropriate password:
+
 
+
  <span class="enter">
  Enter password for new role: *********
  Enter it again:  *********
  Shall the new role be a superuser? (y/n) n
  Shall the new role be allowed to create databases? (y/n) y
  Shall the new role be allowed to create more new roles? (y/n) n
  </span>
+
 
+
 
+
 
+
Finally, create the new database:
+
<pre class="enter">
createdb drupal -O drupal
</pre>
+
 
+
 
+
We no longer need to be the postgres user, so exit:
+
<pre class="enter">
exit
</pre>
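
Before moving on, it may be worth confirming that the new role can actually reach the new database.  The following is a sketch under stated assumptions: <tt>can_connect</tt> and the <tt>PSQL</tt> override variable are hypothetical names, the server is assumed to accept password authentication on localhost, and the password is assumed to be supplied through the <tt>PGPASSWORD</tt> environment variable or a <tt>~/.pgpass</tt> file, because <tt>-w</tt> tells psql never to prompt.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a trivial query can be run as the given
# role against the given database on localhost.
can_connect() {
  # $1 = role name, $2 = database name
  # PSQL may be set to point at a different psql binary (assumption of this
  # sketch); -w makes psql fail instead of waiting at a password prompt.
  "${PSQL:-psql}" -w -U "$1" -h localhost -d "$2" -c 'SELECT 1;' >/dev/null 2>&1
}
```

For example: <tt>PGPASSWORD='your password' can_connect drupal drupal && echo "connection ok"</tt>.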
+
 
+
==Install Drupal==
+
=== Software Installation ===
+
 
+
We want to install Drupal into our web document root (<tt>/var/www</tt>).  We want to be able to do this as our 'ubuntu' user.  So, first, set the directory permissions to allow this:
+
 
+
<pre class="enter">
  cd /var
  sudo chown -R ubuntu www
  sudo chgrp -R ubuntu www
</pre>
+
 
+
In the commands above we set the owner and group of the directory to be '''ubuntu''' (our user and its group).
+
 
+
Tripal 2.0 requires version 7.x of Drupal.  Drupal can be freely downloaded from the http://www.drupal.org website.  At the writing of this Tutorial the most recent version of Drupal 7 is version 7.28.  The software can be downloaded manually from the Drupal website through a web browser or we can use the 'wget' command to retrieve it:
+
 
+
<pre class="enter">
  cd /var/www
  wget http://ftp.drupal.org/files/projects/drupal-7.28.tar.gz
</pre>
+
 
+
 
+
Next, we want to install Drupal.  We will use the '''tar''' command to uncompress the software:
+
<pre class="enter">
  cd /var/www
  tar -zxvf drupal-7.28.tar.gz
</pre>
+
 
+
 
+
Notice that we now have a <tt>drupal-7.28</tt> directory with all of the Drupal files.  We want the Drupal files to be in our document root, not in a 'drupal-7.28' subdirectory.  So, we'll move the contents of the directory up one level:
+
 
+
<pre class="enter">
mv drupal-7.28/* ./
mv drupal-7.28/.htaccess ./
mv index.html index.html.orig
</pre>
+
 
+
 
+
'''<font color="red">Note:</font>''' It is extremely important that the hidden file <tt>.htaccess</tt> is also moved (note the second 'mv' command above).  Check to make sure this file is there:
+
 
+
<pre class="enter">
  ls -l .htaccess
</pre>
+
 
+
Notice that the last of the three <tt>mv</tt> commands renames the <tt>index.html</tt> file and calls it <tt>index.html.orig</tt>.  The <tt>index.html</tt> file was serving as the home page for the website.  Drupal uses an <tt>index.php</tt> page for its home page but the web server has preference for the <tt>index.html</tt> page.  So, we move it out of the way.
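
The <tt>.htaccess</tt> file needs its own <tt>mv</tt> because the shell pattern <tt>*</tt> does not match names that begin with a dot.  A quick way to see whether any other hidden entries were left behind in the unpacked directory is sketched below; <tt>list_hidden_entries</tt> is a hypothetical helper, not a Drupal or Tripal tool.

```shell
#!/bin/sh
# Hypothetical helper: list hidden files and directories directly inside a
# directory, one per line, in sorted order.
list_hidden_entries() {
  # $1 = directory to inspect
  # -mindepth 1 excludes the directory itself; -maxdepth 1 avoids recursion.
  find "$1" -mindepth 1 -maxdepth 1 -name '.*' | sort
}
```

Running <tt>list_hidden_entries /var/www/drupal-7.28</tt> after the <tt>mv</tt> commands shows any remaining hidden entries, which can then be moved or ignored as appropriate.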
+
 
+
=== Configuration File===
+
Next, we need to tell Drupal how to connect to our database.  To do this we have to setup a configuration file.  Drupal comes with an example configuration file which we can borrow.
+
 
+
 
+
First navigate to the location where the configuration file should go:
+
<pre class="enter">
  cd /var/www/sites/default/
</pre>
+
 
+
 
+
Next, copy the example configuration that already exists in the directory to be our actual configuration file by renaming it to <tt>settings.php</tt>.
+
<pre class="enter">
  cp default.settings.php settings.php
</pre>
+
 
+
 
+
Now, we need to edit the configuration file to tell Drupal how to connect to our database server.  To do this we'll use '''gedit''', an easy-to-use text editor:
+
<pre class="enter">
  gedit settings.php
</pre>
+
 
+
 
+
Find the following line
+
<pre class="enter">
$databases = array();
</pre>
+
 
+
Add the following just after the above line:
+
<pre class="enter">
$databases['default']['default'] = array(
  'driver' => 'pgsql',
  'database' => 'drupal',
  'username' => 'drupal',
  'password' => '********',
  'host' => 'localhost',
  'prefix' => '',
);
</pre>
+
 
+
Replace the text '********' with your database password for the user 'drupal' created previously.
+
 
+
=== Files directory creation ===
+
Finally, we need to create two new directories.  The first is the <tt>files</tt> directory, which Drupal uses for storing uploaded files.  The second is the <tt>libraries</tt> directory under <tt>sites/all</tt>.
+
 
+
<pre class="enter">
  cd /var/www/sites/default
  mkdir files
  sudo chown ubuntu:www-data files
  sudo chmod g+rw files

  cd /var/www/sites/all
  mkdir libraries
</pre>
+
 
+
 
+
The commands above create the <tt>files</tt> directory and set its group to that of the web server (i.e. <tt>www-data</tt>) with read/write permissions.  This way the web server can write to the directory but so can we.
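
Whether the group write permission took effect can be checked as follows.  This is a minimal sketch: <tt>is_group_writable</tt> is a hypothetical helper, and it tests only the permission bit, not which group owns the directory (use <tt>ls -ld files</tt> to see the owner and group).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the group write bit is set on a path.
is_group_writable() {
  # $1 = file or directory to test
  # -maxdepth 0 restricts find to the path itself;
  # -perm -020 matches when the group write bit is set.
  [ -n "$(find "$1" -maxdepth 0 -perm -020 2>/dev/null)" ]
}
```

For example: <tt>is_group_writable /var/www/sites/default/files && echo "group can write"</tt>.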
+
 
+
=== Web-based Steps ===
+
 
+
Navigate to the installation page of our new web site http://localhost/install.php
+
 
+
[[Image:Tripal2.0_install1.png]]
+
 
+
Ensure that '''Standard''' is selected and click '''Save and Continue'''.  You will next be asked to select the language you want to use.  Choose '''English''' (which is probably the only choice).
+
 
+
[[Image:Tripal2.0 install2.png|1000px]]
+
 
+
You will see a progress bar (as shown above) as Drupal is installed.  Once it completes, a configuration page with some final settings will be visible.
+
 
+
[[Image:Tripal2.0 install3.png|1000px]]
+
 
+
 
+
Set the following
+
* Site Information
+
** Site Name:  Tripal 2.0
+
** Site email:  Your email address
+
* Site Maintenance Account
+
** Username:  administrator (all lower-case)
+
** Email:  Your email address
+
** Password:  ********
+
* Server Settings
+
** Default country: (wherever the site is located)
+
** Default time zone: (your time zone)
+
* Update Notifications (both boxes checked)
+
 
+
Now, click the '''Save and Continue''' button.  You will see a message saying that Drupal was unable to send an email.  This is safe to ignore for the tutorial, but for a production site you will need to ensure that your server can send emails to a service provider.  Now, your site is enabled.  Click the link '''Your new site''':
+
 
+
[[Image:Tripal2.0_install4.png|1000px]]
+
 
+
=== Drupal Cron Entry ===
+
The last step for installing Drupal is setting up the automated Cron entry.  The Drupal cron is used to automatically execute necessary housekeeping tasks at a regular interval.  We want to integrate Drupal Cron with the UNIX Cron facility.  To do so, we must first get the appropriate URL for the cron by navigating to '''Configuration''' &rarr; '''Cron'''. On this page you will see a link that we will use for cron:
+
 
+
 
+
[[Image:Tripal2.0 cron1.png|1000px]]
+
 
+
In this example the URL is http://localhost/cron.php?cron_key=3979hhoUyiCQ2PhRpWEZc3lrnFbPvPeXSBDSC5kEk0U
+
 
+
To add an entry to the UNIX cron we must use the '''crontab''' tool:
+
 
+
<pre class="enter">
  sudo crontab -e
</pre>
+
 
+
{{TextEditorLink|nano}}
+
 
+
 
+
Add this line to the crontab:
+
 
+
<pre class="enter">
  0,30 * * * * /usr/bin/wget -O - -q http://localhost/cron.php?cron_key=3979hhoUyiCQ2PhRpWEZc3lrnFbPvPeXSBDSC5kEk0U
</pre>
+
 
+
 
+
Now save the changes.  We have now added a UNIX cron job that will occur every 30 minutes that will execute the <tt>cron.php</tt> script and cause Drupal to perform housekeeping tasks.
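
The cron key is unique to each site, so the line has to be rebuilt for every installation.  The sketch below shows how the entry is put together; <tt>cron_line</tt> is a hypothetical helper that simply reproduces the format used above for whatever URL the '''Cron''' page shows.

```shell
#!/bin/sh
# Hypothetical helper: build the crontab line used in this tutorial.
# The five schedule fields are, in order:
#   minute  hour  day-of-month  month  day-of-week
# so "0,30 * * * *" means minute 0 and minute 30 of every hour, every day.
cron_line() {
  # $1 = the cron URL shown on the Configuration -> Cron page
  printf '0,30 * * * * /usr/bin/wget -O - -q %s\n' "$1"
}
```

For example, <tt>cron_line 'http://localhost/cron.php?cron_key=YOUR_KEY'</tt> prints a line that can be pasted into the editor opened by <tt>sudo crontab -e</tt> (<tt>YOUR_KEY</tt> is a placeholder).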
+
 
+
== Drush ==
+
Drush is a command-line utility that allows for non-graphical access to the Drupal website.  You can use it to automatically download and install themes and modules, clear the Drupal cache, upgrade the site and more.  Tripal v2.0 supports Drush.  For this tutorial we will use Drush and therefore we want the most recent version installed.  Drush can be found on a GitHub page at https://github.com/drush-ops/drush.  For Ubuntu, Drush can be installed simply by issuing the following command:
+
 
+
<pre class="enter">
  sudo apt-get install drush
</pre>
+
 
+
However, for older versions of Ubuntu or other operating systems, Drush can be installed manually.  To install manually, we want Drush to reside in /usr/local, which is where third-party software is normally installed.  We'll download the package to /usr/local/drush:
+
<pre class="enter">
  cd /usr/local
  sudo git clone https://github.com/drush-ops/drush.git drush
</pre>
+
 
+
Next, we want the operating system to know about drush. We'll create a symbolic link to the Drush executable in the /usr/local/bin directory, where the operating system looks for executables:
+
<pre class="enter">
  sudo ln -s /usr/local/drush/drush /usr/local/bin/drush
</pre>
+
 
+
Finally, Drush needs to perform updates the first time it is run, so we'll run it with elevated privileges (using sudo) so that it can perform its updates.  After this first run we no longer need 'sudo' to run drush:
+
<pre class="enter">
  sudo drush
</pre>
+
 
+
You must always run drush commands within the Drupal installation.  It does not matter which subdirectory you are in, so long as you are within the Drupal directory structure.  To see a list of available commands type the following:
+
 
+
<pre class="enter">
  cd /var/www/
  drush
</pre>
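
The requirement that drush be run from somewhere inside the Drupal tree can be illustrated by walking up from the current directory until the Drupal root is found.  This is only a sketch of the idea: <tt>find_drupal_root</tt> is a hypothetical helper, and using <tt>index.php</tt> plus <tt>includes/bootstrap.inc</tt> as the marker for a Drupal 7 root is an assumption of this example, not a description of how Drush itself detects the site.

```shell
#!/bin/sh
# Hypothetical helper: print the Drupal root that contains the given
# directory, or fail if none of its parent directories looks like Drupal 7.
find_drupal_root() {
  # $1 = starting directory (defaults to the current directory)
  dir=$(cd "${1:-.}" && pwd) || return 1
  while :; do
    # Assumed marker files for a Drupal 7 root
    if [ -f "$dir/index.php" ] && [ -f "$dir/includes/bootstrap.inc" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Stop once the filesystem root has been checked
    [ "$dir" = "/" ] && return 1
    dir=$(dirname "$dir")
  done
}
```

On the tutorial server, <tt>find_drupal_root /var/www/sites/all/modules</tt> would be expected to print <tt>/var/www</tt>, which is why drush works from any subdirectory of the site.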
+
 
+
== Explore Drupal ==
+
=== User Account Page ===
+
All users have an account page.  Currently, we are logged in as the administrator.  The account page is simple for now.  Click the '''My account''' link on the left sidebar.  You'll see a brief history for the user and an '''Edit''' tab.  Users can edit their own information using the edit interface:
+
 
+
[[Image:Tripal2.0 drupal user edit page.png]]
+
 
+
=== Creating Content ===
+
Creation of content in Drupal is very easy.  Click the '''Add content''' link on the top administrative menu.
+
 
+
[[Image:Tripal2.0_Drupal_add_content.png]]
+
 
+
 
+
You'll see two content types that come default with Drupal:  Article and Basic Page.  Here is where a user can add simple new pages to the website without knowledge of HTML or CSS.  Click the '''Basic Page''' content type to see the interface for creating a new page:
+
 
+
[[Image:Tripal2.0 Drupal add page.png]]
+
 
+
 
+
You'll notice at the top a '''Title''' field and a '''Body''' text box.  All pages require a title and typically have some sort of content entered in the body.  Additionally, there are other options that allow someone to enter HTML if they would like, save revisions of a page to preserve a history and to set authoring and publishing information.
+
 
+
 
+
For practice, try to create two new pages.  A '''Home''' page and an '''About''' page for our site.  First create the home page and second create the about page.  Add whatever text you like for the body.
+
 
+
==== Finding Content ====
+
 
+
To find any content that has been created on the site, click the '''Find Content''' link on the administrative menu at the top.  The page shows all content available on the site. You will see the "About" and "Home" pages you created previously:
+
 
+
[[Image:Tripal2.0_Drupal_find_content.png]]
+
 
+
 
+
You'll also notice a set of drop down boxes for filtering the content.  For sites with many different content types and pages this helps to find content.  You can use this list to click to view each page or to edit.
+
 
+
=== Site Administration ===
+
==== Site Building ====
+
===== Modules =====
+
Click the '''Modules''' link on the administrative menu at the top of the page:
+
 
+
[[Image:Tripal2.0_Drupal_modules.png]]
+
 
+
Here is where you see the various modules that make up Drupal.  Take a minute to scroll through the list and read some of the descriptions.  The modules you see here are core modules that come with Drupal.  Those that are checked come pre-enabled.  Those that are not checked must be enabled before we can use them.  To enable or "turn on" a module, check the box next to the desired module, then scroll to the bottom and click 'Save configuration'.  Your site will now have the functionality provided by that module.  Alternatively, you can search for modules that may be useful to your intended site design at the Drupal module repository, https://drupal.org/project/project_module, and install them by clicking the '''Install New Module''' link.  Finally, a third method to install modules is the '''drush''' tool. We will use '''drush''' for this tutorial.
+
 
+
===== Themes =====
+
Next, click the '''Appearance''' link on the administrative menu at the top of the page:
+
 
+
[[Image:Tripal2.0_Drupal_appearance.png]]
+
 
+
 
+
Here you'll see a list of themes that come with Drupal by default.  The '''default theme''' is called Bartik.  This '''theme''' controls the appearance of all content on the site.  You can easily change the way the site looks by changing the '''default theme''' to another theme.  For this tutorial, we would like to use the '''Garland''' theme.  If you scroll down you'll see a theme named '''Garland'''.  Click the link in the Garland theme section titled '''Enable and set default'''.  The site now uses the Garland theme.
+
 
+
[[Image:Tripal2.0_Drupal_appearance_garland.png]]
+
 
+
 
+
Now, click the house icon in the top left.  Our home page now uses the Garland theme:
+
 
+
[[Image:Tripal2.0_Drupal_garland_home.png]]
+
 
+
===== Blocks =====
+
Blocks in Drupal are used to provide additional content to any page that already exists.  Examples of blocks might be a short overview of recent news items, Twitter feeds, links, recently added content, etc.  The blocks interface can be found by navigating to '''Structure''' &rarr; '''Blocks''' using the top administrative menu.
+
 
+
On this page you'll see a list of available blocks and where they are located within the site. 
+
 
+
[[Image:Tripal2.0_Drupal_blocks.png]]
+
 
+
Here you can see that the '''Search form''', '''Navigation''', and '''User Login''' blocks are all on the left sidebar of the Garland theme.  There is also a list of other regions that do not have any blocks, and there are many blocks which are disabled but could be added to a region on the page.  For this tutorial, we would like blocks to appear on the right sidebar rather than the left sidebar.  Therefore, change the '''Search form''', '''Navigation''', and '''User Login''' blocks to use the right sidebar by changing the drop-down box next to each one.  When done, click the '''Save Blocks''' button at the bottom.  Now when we view our home page the navigation links, search form and user login box (not shown while logged in) all appear on the right side:
+
 
+
[[Image:Tripal2.0_Drupal_garland_home2.png]]
+
 
+
===== Menus =====
+
For this tutorial, we want to add new links in the '''Main Menu''' to our new Home and About pages we created earlier.  In the Garland theme, the main menu appears in the top right corner and currently only has the link 'Home'.
+
We want to change this link to point to our new home page.  But first, we need to find the path for our home page. The path for a page can be found in the browser's address bar when viewing the page.  In Drupal, pages of content are generally referred to as '''nodes'''.  We can find the new home and about pages using the '''Find content''' link in the top administrative menu.  If we click the link for our home page you'll see the address is http://localhost/node/1.  Our about page is http://localhost/node/2 (i.e., the first and second pages we created).
+
 
+
Drupal provides an interface for working with menus, including adding new menu items to an existing menu and creating new menus.  You can find the interface for working with menus by navigating to '''Structure''' &rarr; '''Menus''' via the top administrative menu:
+
 
+
 
+
[[Image:Tripal2.0_Drupal_menu.png]]
+
 
+
 
+
Click the link '''list links''' in the operations section for the '''Main Menu'''.  Here we see that the '''Home''' link already exists:
+
 
+
 
+
[[Image:Tripal2.0_Drupal_main_menu.png]]
+
 
+
Click '''edit''' to change the location of the '''Home''' menu item.  In the form that appears, we need to set the path for our new home page.  The paths for our two nodes are simply <tt>node/1</tt> and <tt>node/2</tt>.  Fill out the form fields with these values:
+
 
+
* Menu Link Title:  Home
+
* Path:  node/1
+
* Description:  Tripal 2.0 Demo Home Page
+
* Enabled:  checked
+
* Show as Expanded:  no check
+
* Parent item:  <Main menu>
+
* Weight: 0
+
 
+
[[Image:Tripal2.0_Drupal_main_menu_home.png]]
+
 
+
The settings above will give the menu link a title of '''Home''' and put it on the '''Main menu''' menu.  If we then click the '''Save''' button at the bottom, our '''Home''' menu item will direct us to our new home page.  Now, we also want to add a new menu item for the '''About''' page.  Return to the '''Main menu''' configuration page and add a new link with the following values:
+
 
+
* Menu Link Title:  About
+
* Path:  node/2
+
* Description:  About this site
+
* Enabled:  checked
+
* Show as Expanded:  no check
+
* Parent item:  <Main menu>
+
* Weight: 0
+
 
+
Click '''Save''' and a new menu item should appear.  You can then change the order of the menu items by dragging and dropping the link using the cross-hairs next to each menu item.
+
 
+
===== URL Path =====
+
As mentioned previously, the URL paths for our pages have <tt>node/1</tt> and <tt>node/2</tt> in the address.  This is not very intuitive for site visitors. 
+
 
+
 
+
To set a path, click on our new '''About''' page in the new menu link at the top and click the '''Edit''' tab (you may have to close the overlay to see the menu item).  Scroll to the bottom of the edit page and you'll see a section titled '''URL path setting'''.  Click to open this section.  Since this is our about page, we simply want the URL to be http://localhost/about.  To do this, just add the word '''about''' in the text box and click the '''Save''' button.  You will now notice that the URL for this page is no longer http://localhost/node/2 but http://localhost/about.  Both links will still get you to our About page.
+
 
+
[[Image:Tripal2.0 drupal edit about page.png]]
+
 
+
Now, use the instructions described above to set a path of 'home' for our home page.
+
 
+
===== Site Configuration =====
+
There are many options under the '''Configuration''' link of the administrative menu at the top.  For now we will look at only one of these: the '''Site Information''' page. Here you will find the settings we made when installing the site.  You can change the site name and add a slogan, mission and footer text to the site.  The section titled '''Front Page''' is where we can tell Drupal to use the new '''Home''' page we created as the first page visitors see when they view the site. We want this to be the same home page we created and linked to in the '''Main menu'''. In this text box enter the text <tt>node/1</tt>.  Notice there is no preceding forward slash. Alternatively we could have used the URL path we added in the previous step.  Let's add a slogan: '''Resources for Community Genomics'''.
+
 
+
[[Image:Tripal2.0_Drupal_site_configuration.png]]
+
 
+
Now, click the '''Save configuration''' button at the bottom.  You'll see the slogan at the top of the page.  Also, if you click the site name or the home icon at the top left, you are now taken to the new home page.
+
 
+
=== User Accounts ===
+
For this tutorial, we will not discuss in depth the user management infrastructure except to point out:
+
 
+
* User accounts can be created
+
* Users are assigned to various roles
+
* Permissions for those roles can be set to allow groups of users certain administrative rights or access to specific data.
+
 
+
Explore the Drupal '''People''' menu to see how users can be created and added to roles with specific permissions.
+
 
+
== Prepare Drupal for Tripal ==
+
 
+
=== 3rd Party Modules ===
+
We can install new extension modules which we will need later.  For this tutorial we have several modules that we will need to install but which do not yet appear in the list of modules.  To do this, we must follow these steps:
+
 
+
# Locate the extension modules on the Drupal website (https://drupal.org/)
+
# Retrieve the module using a '''drush''' command.
+
# Check for a README.txt or INSTALL.txt for any further instructions for installation of the module
+
# Return to the Drupal '''Modules''' page and enable the module.
+
 
+
 
+
As an example, let's install the '''ctools''' module, which is a prerequisite for Tripal. The CTools module can be found here: [http://drupal.org/project/ctools http://drupal.org/project/ctools].  We will download the current version using the drush command.  On the command-line, execute the following:
+
 
+
<pre class="enter">
+
  cd /var/www/sites/all/modules
+
  drush pm-download ctools
+
</pre>
+
 
+
 
+
Check the README for additional installation instructions:
+
<pre class="enter">
+
  cd ctools
+
  ls
+
</pre>
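Looking for installation notes can also be scripted when several modules are downloaded at once.  The following is only a minimal sketch: the function name <tt>list_install_docs</tt> is hypothetical (it is not part of drush or Drupal), and the example path assumes the directory layout used in this tutorial.

```shell
# list_install_docs is a hypothetical helper (not part of drush or Drupal).
# It prints any README* or INSTALL* files found at the top level of a
# module directory, one per line, sorted by name.
list_install_docs() {
  module_dir=$1
  # -maxdepth 1 restricts the search to the module's top-level directory
  find "$module_dir" -maxdepth 1 -type f \( -name 'README*' -o -name 'INSTALL*' \) | sort
}

# Example (path assumed from this tutorial):
#   list_install_docs /var/www/sites/all/modules/ctools
```

If the function prints nothing, the module ships no top-level README or INSTALL file.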
+
 
+
There is no README.txt so we are done with installation.  Next, return to the '''Modules''' page and enable the '''Chaos Tools''' module by checking the box next to it:
+
 
+
 
+
[[Image:Tripal2.0_ctools_enable.png]]
+
 
+
 
+
Notice that the ctools package provided many modules and they all appear under a '''Chaos Tools Suite''' category.
+
 
+
Alternatively, you can enable the module using a simple drush command:
+
 
+
<pre class="enter">
+
  drush pm-enable ctools
+
</pre>
+
 
+
 
+
For this tutorial, the CCK, Views, and CKEditor modules should also be downloaded and installed following the same instructions as above:
+
 
+
<pre class="enter">
+
drush pm-download views cck ckeditor
+
</pre>
+
 
+
For CKEditor, the README file indicates we need to install the CKEditor library before we can enable this module.  We must first download this package.  The '''wget''' command can be used to download the file directly from the command-line:
+
<pre class="enter">
+
  cd /var/www/sites/all/modules/ckeditor
+
  wget http://download.cksource.com/CKEditor/CKEditor/CKEditor%204.3.2/ckeditor_4.3.2_standard.zip
+
</pre>
+
 
+
Now unzip the package and rename it according to the instructions:
+
<pre class="enter">
+
  unzip ckeditor_4.3.2_standard.zip
+
</pre>
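Before enabling the module it can be worth confirming that the library ended up where the module looks for it.  The sketch below assumes that the archive unpacks into a <tt>ckeditor</tt> directory containing <tt>ckeditor.js</tt> inside the module directory; that location is an assumption, so confirm it against the module's README.  The function name <tt>has_ckeditor_library</tt> is hypothetical.

```shell
# has_ckeditor_library is a hypothetical helper for this tutorial.
# Assumption: the unzipped library lives at <module_dir>/ckeditor/ckeditor.js.
has_ckeditor_library() {
  module_dir=$1
  if [ -f "$module_dir/ckeditor/ckeditor.js" ]; then
    echo "CKEditor library found"
  else
    echo "CKEditor library missing" >&2
    return 1
  fi
}

# Example (path from this tutorial):
#   has_ckeditor_library /var/www/sites/all/modules/ckeditor
```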
+
 
+
Once all installation steps have been completed, the Views, CCK and CKEditor modules can be enabled with the following:
+
<pre class="enter">
+
  drush pm-enable views views_ui
+
  drush pm-enable cck
+
  drush pm-enable ckeditor
+
</pre>
+
 
+
 
+
For reference, the modules installed above can be found here:
+
* Views: http://drupal.org/project/views
+
* CCK:  http://drupal.org/project/cck
+
* CKEditor: http://drupal.org/project/ckeditor
+
 
+
==== Configure CKEditor ====
+
 
+
Next, we need to configure the CKEditor which provides the MS Word-style interface for adding content.  Navigate to '''Configuration''' &rarr; '''CKEditor'''.  You will see a page similar to the following:
+
 
+
[[Image:Tripal2.0 ckeditor config.png]]
+
 
+
Click the 'Edit' link beside 'CKEditor Global Profile'.  On the page that appears, we want to expand the 'Visibility Settings' and switch the radio button from 'Exclude' to 'Include'.  Then clear all of the entries in the textbox named 'Fields to exclude/include':
+
 
+
 
+
[[Image:Tripal-v1.1 ckeditor2.png|800px]]
+
 
+
Add the following lines to the textbox you just cleared:
+
 
+
<pre class="enter">
+
page@node/add/page.edit-body
+
chado_organism@node/add/chado-organism.edit-description
+
chado_organism@node/*/edit.edit-description
+
chado_analysis@node/add/chado-analysis.edit-description
+
chado_analysis@node/*/edit.edit-description
+
</pre>
+
 
+
This will disable the CKEditor for all text boxes except for generic pages, organism descriptions and analysis descriptions. We can return later to add any other textareas to the list.  You can find the identifier, similar to those we added to the textbox above, underneath any compatible text box. CKEditor puts the identifier under each textbox for your reference. Simply copy and paste the identifier.  For example, the screenshot from the '''Create Page''' page is shown below. Notice the identifier for the textbox, <tt>sky:page@node/add/page.edit-body</tt>.  This is one of the identifiers we used in the textbox above, but with the theme name (e.g. <tt>sky</tt>) removed.
+
 
+
[[Image:Tripal v1.1-ckeditor3.png|800px]]
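Removing the theme name from a copied identifier is a simple string operation.  As a small illustration, the hypothetical helper below (not part of CKEditor or Drupal) strips everything up to and including the first colon:

```shell
# strip_theme_prefix is a hypothetical helper: it removes everything up to
# and including the first colon, i.e. the theme name that CKEditor shows
# in front of the identifier. Input without a colon is printed unchanged.
strip_theme_prefix() {
  printf '%s\n' "${1#*:}"
}

strip_theme_prefix 'sky:page@node/add/page.edit-body'
# prints: page@node/add/page.edit-body
```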
+
 
+
Click the '''Update global profile''' button.  Next, under the '''Profiles''' section, click the '''edit''' link next to the '''Default''' profile.  When the page appears, open the '''Editor Appearance''' section, and set the Toolbar by clicking the '''full''' link.  Finally, click the '''Save''' button.
+
 
+
== Tripal Installation ==
+
=== Get the Software ===
+
To download Tripal and the Extension modules, change to the directory where Drupal keeps its modules:
+
 
+
<pre class="enter">
+
cd /var/www/sites/all/modules
+
</pre>
+
 
+
To obtain Tripal, issue the following '''git''' commands:
+
 
+
<pre class="enter">
+
git clone http://git.drupal.org/sandbox/spficklin/1337878.git tripal
+
cd tripal
+
git checkout 7.x-2.0a
+
</pre>
+
 
+
We also want to obtain several Extension modules that will be used in this tutorial.  Those modules are available on the [http://tripal.info/extensions Extensions Page] of the Tripal website.  However, these extension modules are also available via git repositories, so we will use git commands to obtain them.
+
 
+
<pre class="enter">
+
cd /var/www/sites/all/modules
+
git clone http://git.drupal.org/sandbox/spficklin/1578226.git tripal_blast_analysis
+
cd tripal_blast_analysis
+
git checkout 7.x-2.0a-tripal_v2.0a
+
 
+
cd /var/www/sites/all/modules
+
git clone http://git.drupal.org/sandbox/spficklin/1578234.git tripal_kegg_analysis
+
cd tripal_kegg_analysis
+
git checkout 7.x-2.0a-tripal_v2.0a
+
 
+
cd /var/www/sites/all/modules
+
git clone http://git.drupal.org/sandbox/spficklin/1578232.git tripal_interpro_analysis
+
cd tripal_interpro_analysis
+
git checkout 7.x-2.0a-tripal_v2.0a
+
 
+
cd /var/www/sites/all/modules
+
git clone http://git.drupal.org/sandbox/spficklin/1578230.git tripal_go_analysis
+
cd tripal_go_analysis
+
git checkout 7.x-2.0a-tripal_v2.0a
+
 
+
cd /var/www/sites/all/modules
+
git clone http://git.drupal.org/sandbox/spficklin/1578246.git tripal_unigene_analysis
+
cd tripal_unigene_analysis
+
git checkout 7.x-2.0a-tripal_v2.0a
+
 
+
</pre>
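Because the five clone-and-checkout sequences differ only in the sandbox ID and directory name, they can also be generated from a short list.  The sketch below is one possible way to do that; the helper name <tt>print_clone_commands</tt> is hypothetical, while the sandbox IDs, directory names and branch are copied from the commands above.  It only prints the commands so they can be reviewed first.

```shell
# Branch name, sandbox IDs and directory names are copied from the commands above.
BRANCH='7.x-2.0a-tripal_v2.0a'
MODULES_DIR='/var/www/sites/all/modules'

# print_clone_commands is a hypothetical helper. It only prints commands;
# nothing is cloned until the output is reviewed and piped to sh.
print_clone_commands() {
  while read -r sandbox_id name; do
    # skip blank lines in the list
    [ -n "$sandbox_id" ] || continue
    printf 'git clone http://git.drupal.org/sandbox/spficklin/%s.git %s/%s\n' \
      "$sandbox_id" "$MODULES_DIR" "$name"
    printf '(cd %s/%s && git checkout %s)\n' "$MODULES_DIR" "$name" "$BRANCH"
  done <<'EOF'
1578226 tripal_blast_analysis
1578234 tripal_kegg_analysis
1578232 tripal_interpro_analysis
1578230 tripal_go_analysis
1578246 tripal_unigene_analysis
EOF
}

print_clone_commands
```

Once the printed commands look right, they can be executed with <tt>print_clone_commands | sh</tt>.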
+
 
+
=== Applying Patches ===
+
 
+
A bug exists in Drupal related to the bytea data type in PostgreSQL.  At the writing of this document, a fix is not yet incorporated into Drupal, but a patch has been provided.  Execute the following commands to patch Drupal:
+
 
+
<pre class="enter">
+
cd /var/www
+
wget --no-check-certificate https://drupal.org/files/drupal.pgsql-bytea.27.patch
+
patch -p1 < drupal.pgsql-bytea.27.patch
+
</pre>
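When patching core files it can be reassuring to test a patch before applying it.  The following sketch assumes GNU <tt>patch</tt>, whose <tt>--dry-run</tt> option reports what would happen without modifying any files; the helper name <tt>apply_patch_checked</tt> is hypothetical.

```shell
# apply_patch_checked is a hypothetical helper. Run it from the directory
# the patch applies to (for the Drupal patch above, /var/www).
apply_patch_checked() {
  patch_file=$1
  # --dry-run (GNU patch) tests the patch without changing any files
  if patch -p1 --dry-run < "$patch_file" > /dev/null; then
    patch -p1 < "$patch_file"
  else
    echo "Patch does not apply cleanly: $patch_file" >&2
    return 1
  fi
}

# Example:
#   cd /var/www && apply_patch_checked drupal.pgsql-bytea.27.patch
```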
+
 
+
There is also a bug in the Drupal Views 3.0 code that prevents Tripal's administrative and search data views from functioning. The patch is provided within the tripal_views module.  To apply the patch, execute the following:
+
 
+
<pre class="enter">
+
cd /var/www/sites/all/modules/views
+
patch -p1 < ../tripal/tripal_views/views-sql-compliant-three-tier-naming-1971160-22.patch
+
</pre>
+
 
+
=== Tripal Installation ===
+
Previously in this Tutorial we enabled the '''Path''' and '''Search''' modules.  The process for enabling the Tripal modules is the same.  The site administrator can navigate to the '''Modules''' page and enable each of the Tripal modules.  However, Drush makes it easier to enable modules from the command-line.  First, we must enable the tripal_core module.  Enter the following command:
+
 
+
<pre class="enter">
+
drush pm-enable tripal_core
+
</pre>
+
 
+
Now that the core module is enabled, a new 'Tripal' menu item appears at the top.
+
 
+
[[Image:Tripal2.0 install5.png|1000px]]
+
 
+
Next, we must install Chado.  In the web browser, navigate to '''Tripal''' &rarr; '''Setup Tripal''' &rarr; '''Install Chado Schema'''.  Because this is a fresh install, select the option to install Chado v1.2 and click the button '''Install/Upgrade Chado'''.
+
 
+
[[Image:Tripal2.0 install6.png|1000px]]
+
 
+
After the button is clicked a message will appear stating "Job 'Install Chado v1.2' submitted. Check the jobs page for status".  Click the '''jobs page''' link to see the job that was submitted:
+
 
+
[[Image:Tripal2.0 install7.png|1000px]]
+
 
+
The job is waiting in the queue until the Tripal jobs system wakes and tries to run the job. The jobs management subsystem allows modules to submit long-running jobs, on behalf of site administrators or site visitors.  Often, long running jobs can time out on the web server and fail to complete.  The jobs system runs separately in the background. In the example above we now see a job for installing Chado.  The job view page provides details such as the name of the job, dates that the job was submitted and job status.
+
 
+
Jobs in the queue can be executed in two ways:
+
* Manually through a command-line call
+
* Using the UNIX cron to automatically launch the command-line
+
 
+
When we installed Drupal we set up a cron job to allow the software to run housekeeping tasks on a regular basis.  Tripal needs a cron entry as well to allow for regular execution of jobs in the queue.  We will need to add a second cron entry:
+
 
+
<pre class="enter">
+
  sudo crontab -e
+
</pre>
+
{{TextEditorLink|nano}}
+
 
+
 
+
Add this line to the crontab:
+
<pre class="enter">
+
  0,15,30,45 * * * * (cd /var/www; drush trpjob-run administrator ) > /dev/null
+
</pre>
+
 
+
 
+
This entry will run the Tripal cron every 15 minutes as the administrator user.  For this tutorial we do not want to wait up to 15 minutes for our jobs to execute, so we will run the jobs manually.  Tripal supports Drush and therefore has its own commands.  We can use drush to manually launch the job:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
As the installation of Chado proceeds, we should see the following text in the terminal window. The final message indicates that the installation of Chado was successful:
+
 
+
<pre>
+
Tripal Job Launcher
+
Running as user 'administrator'
+
-------------------
+
Calling: tripal_core_install_chado(Install Chado v1.2, 1)
+
Creating 'chado' schema
+
Loading sites/all/modules/tripal/tripal_core/chado_schema/default_schema-1.2.sql...
+
Install of Chado v1.2 (Step 1 of 2) Successful!
+
Loading sites/all/modules/tripal/tripal_core/chado_schema/initialize-1.2.sql...
+
Install of Chado v1.2 (Step 2 of 2) Successful.
+
Installation Complete
+
</pre>
+
 
+
 
+
Also, we see that the job has completed when refreshing the jobs management page:
+
 
+
[[Image:Tripal2.0 install8.png|1000px]]
+
 
+
Now that Chado is installed, we can continue with installation of the remaining Tripal modules.  These modules should be installed in the following order one at a time.  If you install them all at once you may encounter errors later.  Install the modules in the following way (and order):
+
 
+
<pre class="enter">
+
drush pm-enable tripal_views
+
drush pm-enable tripal_db
+
drush pm-enable tripal_cv
+
drush pm-enable tripal_organism
+
drush pm-enable tripal_analysis
+
drush pm-enable tripal_feature
+
</pre>
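Since the order matters and each module should be enabled on its own, the sequence can be wrapped in a loop that stops at the first failure.  This is only a sketch: the helper name <tt>enable_in_order</tt> and the <tt>DRUSH</tt> variable are hypothetical conveniences, and <tt>-y</tt> is drush's option for answering "yes" to its confirmation prompts.

```shell
# DRUSH can be overridden (e.g. DRUSH=echo) to preview the commands
# without running them.
DRUSH=${DRUSH:-drush}

# enable_in_order is a hypothetical helper: it enables one module per drush
# call, in the order given, and stops at the first failure.
enable_in_order() {
  for module in "$@"; do
    # -y answers "yes" to drush's confirmation prompt
    if ! $DRUSH pm-enable -y "$module"; then
      echo "Stopped: could not enable $module" >&2
      return 1
    fi
  done
}

# Order taken from the list above; run from /var/www:
#   enable_in_order tripal_views tripal_db tripal_cv tripal_organism tripal_analysis tripal_feature
```

Setting <tt>DRUSH=echo</tt> before calling the function prints the drush arguments instead of executing them, which is a cheap way to check the order.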
+
 
+
In this tutorial we will also discuss adding publications.  The Tripal Pub module is dependent on the Tripal Contact module. We can enable both of them together:
+
 
+
<pre class="enter">
+
drush pm-enable tripal_contact tripal_pub
+
</pre>
+
 
+
 
+
Now, enable the remaining Tripal extension modules:
+
 
+
<pre class="enter">
+
drush pm-enable tripal_analysis_blast
+
drush pm-enable tripal_analysis_go
+
drush pm-enable tripal_analysis_interpro
+
drush pm-enable tripal_analysis_kegg
+
</pre>
+
 
+
There are more Tripal modules that can be enabled (e.g. tripal_project, tripal_stock, etc.).  But for this tutorial we will only be using the modules we enabled above.
+
 
+
The Tripal modules create directories in the /var/www/sites/default/files directory.  By default, Drupal expects the 'sites/default/files' directory to be writeable by the web server.  Because we installed the Tripal modules using Drush we need to reset the permissions for the web user.  Execute the following commands to give the web user's group permission to write to that directory:
+
 
+
<pre class="enter">
+
sudo chown -R ubuntu:www-data /var/www/sites/default/files
+
sudo chmod -R g+rw /var/www/sites/default/files
+
</pre>
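To confirm the permissions took effect, we can list anything under the directory that is still missing the group write bit.  The helper below is a hypothetical sketch that relies only on the standard <tt>find</tt> utility.

```shell
# list_not_group_writable is a hypothetical helper. It prints every file or
# directory under the given path that lacks the group write bit (octal 020).
list_not_group_writable() {
  find "$1" ! -perm -020
}

# Example (path from this tutorial); no output means everything is
# group-writable:
#   list_not_group_writable /var/www/sites/default/files
```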
+
 
+
Tripal is now installed with a fresh instance of Chado!
+
 
+
=== Controlled Vocabularies: Installing CVs ===
+
 
+
Before we can proceed with populating our Chado table with genomic data we must first load some controlled vocabularies (i.e. ontologies).  To do this, navigate to '''Tripal''' &rarr; '''Chado Data Loaders''' &rarr; '''OBO File Loader'''.  You'll see the following page:
+
 
+
[[Image:Tripal2.0_cv1.png]]
+
 
+
 
+
The Ontology loader will allow you to select a pre-defined ontology from the drop down list or allow you to provide your own to be loaded.  If you provide your own, you give the remote URL of the OBO file or provide the full path on the local web server where the OBO file is located.  In the case of a remote URL, Tripal first downloads and then parses the OBO file for loading. If you do provide your own OBO file it will appear in the saved drop down list for loading of future updates to the ontology.
+
 
+
For this tutorial, we need to install these ontologies:
+
# Chado feature properties
+
# Relationship ontology
+
# Sequence ontology
+
# Gene ontology.
+
 
+
Do so by selecting an ontology from the drop-down and clicking the '''Submit''' button.  Repeat this process for each of the four ontologies.  You'll notice each time that a job is added to the jobs subsystem.
+
 
+
 
+
Now manually launch these jobs:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
 
+
'''<font color="red">Note:</font>''' Loading the Gene Ontology will take several hours.
+
 
+
=== Setting Permissions ===
+
Because we are logged on to the site as the administrator user we are able to see all content.  However, Drupal provides user management and permissions tools that allow the site admin to set which types of users can view the content on the site.  By default there are two types of users: '''anonymous''' and '''authenticated''' users.  For this tutorial we want to set permissions so that anonymous visitors to the site can see the genomic content.  To do this, navigate to '''People''' &rarr; '''Permissions'''.  Here you will see permissions for all types of content.
+
 
+
[[File:Tripal2.0_Drupal_permissions.png]]
+
 
+
 
+
Scroll through the list of permissions until you come to those for Tripal.  Be sure to give anonymous and authenticated users the following permissions:
+
 
+
* View Analyses
+
* View Features
+
* View Organisms
+
 
+
Each time you install a new module you should always check the '''Permissions''' page and set any new permissions that may have been added by the new module.
+
 
+
== Using Tripal ==
+
=== Creating Organism Pages ===
+
There are two ways to create pages for organisms.  If your organism is already in Chado then you can sync the organism.  Syncing is the process of creating Drupal pages for content in Chado. If an organism is not in Chado you will need to manually add it using the Tripal web interface.  The following two sections describe both methods.
+
 
+
==== What if Our Organism is Already in Chado? ====
+
Now that we have Chado loaded and populated we would like to create pages for our organisms.  Chado comes pre-loaded with a few species already, so we will check to see if our organism is already present.  To do this navigate to '''Tripal''' &rarr; '''Chado Modules''' &rarr; '''Organisms''' &rarr; '''Sync Organisms''':
+
 
+
[[Image:Tripal2.0 organism1.png|1000px]]
+
 
+
 
+
This page has two different options.  The first is the top section labeled '''Sync Organisms'''.  In this section is a list of organisms.  These are the organisms that come with the default Chado installation.  If our organism is already in the list (e.g. ''Drosophila melanogaster'') then we need to inform Drupal that we have data in Chado for which we would like a web page.  This is what we call '''Syncing'''.  We need to sync Drupal and Chado so that Drupal knows about our organism.  To do this, click the check box next to '''Drosophila melanogaster''' and then click the '''Submit Sync Job''' button.
+
 
+
As usual we want to run this job manually:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
 
+
Now that our organism is synced we should have a new page for ''Drosophila melanogaster''.  To find the page, click the '''Find Content''' menu item at the top.  A list of all content pages currently available in Drupal should appear.  Our new organism page should appear at the top of the list.
+
 
+
Click the link titled '''Drosophila melanogaster''' and the following page should appear:
+
 
+
[[File:Tripal2.0_organism2.png]]
+
 
+
By default all Tripal pages have a table of contents on the left and data in the center.  As links in the table of contents are clicked, the content in the middle updates. Also, messages to the site administrator are contained in blue shaded regions.  These blue shaded regions will only appear to users with the '''Administer Tripal''' permission. This page, however, is a bit empty.  We need to add some details.  We want to add a description for this organism and an image.  Click the '''Edit''' tab next to the page title. In the form that appears add the following text (taken from Wikipedia: http://en.wikipedia.org/wiki/Drosophila_melanogaster) for the description:
+
 
+
"The genome of D. melanogaster (sequenced in 2000, and curated at the FlyBase database) contains four pairs of chromosomes: an X/Y pair, and three autosomes labeled 2, 3, and 4. The fourth chromosome is so tiny that it is often ignored, aside from its important eyeless gene. The D. melanogaster sequenced genome of 165 million base pairs has been annotated[17] and contains approximately 13,767 protein-coding genes, which comprise ~20% of the genome out of a total of an estimated 14,000 genes. More than 60% of the genome appears to be functional non-protein-coding DNA involved in gene expression control. Determination of sex in Drosophila occurs by the ratio of X chromosomes to autosomes, not because of the presence of a Y chromosome as in human sex determination. Although the Y chromosome is entirely heterochromatic, it contains at least 16 genes, many of which are thought to have male-related functions."
+
 
+
For the image, download the image below and upload it using the '''Organism image''' upload field on the page.
+
 
+
[[Image:Dmel.jpg|200px]]
+
 
+
Click the '''Save''' button.  Now we have a more informative page:
+
 
+
[[Image:Tripal2.0_organism3.png]]
+
 
+
==== Manually Adding an Organism ====
+
For this tutorial we will be loading data for ''Citrus sinensis'' (sweet orange), but this organism is not in Chado by default. We can easily add the organism using the '''Add Content''' link in the top administrative menu.  The '''Add Content''' page now has many more content types than when we first saw it.  Previously we only had '''Page''' and '''Story''' content types.  Now we have more content types such as '''Analysis''', '''Organism''', and '''Feature'''.
+
 
+
[[File:Tripal2.0_Drupal_add_content2.png]]
+
 
+
 
+
To add a new organism simply click the '''Organism''' link and fill in the fields with these values:
+
 
+
* Genus:  Citrus
+
* Species:  sinensis
+
* Abbreviation:  C. sinensis
+
* Common name:  Sweet orange
+
* Description:  Sweet orange is the No. 1 citrus crop in the world by production, accounting for about 70% of the total. Brazil, Florida (USA), and China are the three largest sweet orange producers. Sweet orange fruits have very tight peel and are classified into the hard-to-peel group. They are often used for juice processing, rather than fresh consumption. Valencia, Navel, Blood, Acidless, and other subtypes are bud mutants of common sweet orange varieties. Sweet orange is considered an introgression of a natural hybrid of mandarin and pummelo; some estimates show more mandarin genomic background than pummelo.  The genome size is estimated at 380Mb across 9 haploid chromosomes.
+
 
+
And, use the following image:
+
 
+
[[Image:Citrus sinensis.jpg|200px]]
+
 
+
Save the page and view the new Organism:
+
 
+
[[File:Tripal2.0_organism4.png]]
+
 
+
=== Creating an Analysis ===
+
For this tutorial, we will later import a set of genes, and their associated mRNA, CDS, UTRs, etc.  Tripal requires that an analysis be associated with all imported features.  This has several advantages, including:
+
 
+
* The source of features (sequences) can be traced. Even for features simply downloaded from a database, someone else can see where the features came from.
+
* Provides a mechanism for describing how the features were created (e.g. whole genome structural and functional annotation description)
+
* The analysis links all of the features together which can be useful for querying for specific features from an analysis.
+
 
+
To create an analysis for loading our genomic data, navigate to the '''Add content''' page and click on the '''Analysis''' link.
+
 
+
The analysis creation page will appear:
+
 
+
[[Image:Tripal2.0_analysis1.png]]
+
 
+
Here you can provide the necessary details to help others understand the source of your data.  For this tutorial, enter the following:
+
 
+
* Analysis Name:  Whole Genome Assembly and Annotation of Citrus Sinensis (JGI)
+
* Program, Pipeline Name or Method Name:  Performed by JGI
+
* Program, Pipeline or Method Version:  v1.0
+
* Source Name:  JGI Citrus sinensis assembly/annotation v1.0 (154)
+
* Source URI:  http://www.phytozome.net/citrus.php
+
* Time Executed: Feb 1, 2011
+
* Materials and Methods: (if using CKEditor, click the 'Source' button before pasting)
+
 
+
<pre class="enter">
+
<p>
+
<strong><em>Note: </em>The following text comes from phytozome.org:</strong></p>
+
<p>
+
<u>Genome Size / Loci</u><br />
+
This version (v.1) of the assembly is 319 Mb spread over 12,574 scaffolds. Half the genome is accounted for by 236 scaffolds 251 kb or longer.&nbsp;The current gene set (orange1.1) integrates 3.8 million ESTs with homology and ab initio-based gene predictions (see below). 25,376 protein-coding loci have been predicted, each with a primary transcript. An additional 20,771 alternative transcripts have been predicted, generating a total of 46,147 transcripts. 16,318 primary transcripts have EST support over at least 50% of their length. Two-fifths of the primary transcripts (10,813) have EST support over 100% of their length.</p>
+
<p>
+
<u>Sequencing Method</u><br />
+
Genomic sequence was generated using a whole genome shotgun approach with 2Gb sequence coming from GS FLX Titanium; 2.4 Gb from FLX Standard; 440 Mb from Sanger paired-end libraries; 2.0 Gb from 454 paired-end libraries</p>
+
<p>
+
<u>Assembly Method</u><br />
+
The 25.5 million 454 reads and 623k Sanger sequence reads were generated by a collaborative effort by 454 Life Sciences, University of Florida and JGI. The assembly was generated by Brian Desany at 454 Life Sciences using the Newbler assembler.</p>
+
<p>
+
<u>Identification of Repeats</u><br />
+
A de novo repeat library was made by running RepeatModeler (Arian Smit, Robert Hubley) on the genome to produce a library of repeat sequences. Sequences with Pfam domains associated with non-TE functions were removed from the library of repeat sequences and the library was then used to mask 31% of the genome with RepeatMasker.</p>
+
<p>
+
<u>EST Alignments</u><br />
+
We aligned the sweet orange EST sequences using Brian Haas&#39;s PASA pipeline which aligns ESTs to the best place in the genome via gmap, then filters hits to ensure proper splice boundaries.</p>
+
 
+
</pre>
+
 
+
 
+
'''<font color="red">Note:</font>'''  Above we entered HTML.  This is not the easiest way to enter text, but it keeps things simple for this tutorial.  When the '''ckeditor''' module is installed and properly set up, the user is provided with editor tools that make it much easier to add text to any page.
+
 
+
After saving, you should have the following analysis page:
+
 
+
[[File:Tripal2.0_analysis2.png]]
+
 
+
=== Creating a Database Cross Reference ===
+
For our site, we want to create gene pages with sequences and have those pages link back to JGI where we obtained the genes.  Therefore, we want to add a database reference for JGI.  To add a new external database, navigate to '''Tripal''' &rarr; '''Chado Modules''' &rarr; '''Databases''' &rarr; '''Add a Database'''.  The resulting page provides fields for adding a new database:
+
 
+
[[File:Tripal2.0_db1.png]]
+
 
+
Enter the following values for the fields:
+
 
+
* Name: Phytozome
+
* Description:  Phytozome is a joint project of the Department of Energy's Joint Genome Institute and the Center for Integrative Genomics to facilitate comparative genomic studies amongst green plants
+
* URL: http://www.phytozome.net/
+
* URL prefix:  http://www.phytozome.net/genePage.php?search=1&detail=1&er=1&crown&method=0&searchText=transcriptid%3A
+
 
+
The URL prefix is important as it will be used to create the links on our gene pages.  Our gene name will be appended to this URL to create the link that will take us to the corresponding gene page on Phytozome.
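
As a rough illustration of how such a link is formed, the sketch below appends a feature name to the URL prefix entered above. The helper name <tt>build_external_link</tt> is hypothetical and this is not Tripal's own code; it only mirrors the string concatenation described here.

```python
# Hypothetical helper for illustration only; Tripal performs this step
# internally and its real function names are not shown in this tutorial.
URL_PREFIX = (
    "http://www.phytozome.net/genePage.php"
    "?search=1&detail=1&er=1&crown&method=0&searchText=transcriptid%3A"
)


def build_external_link(url_prefix: str, name: str) -> str:
    """Return the external link formed by appending the name to the prefix."""
    return url_prefix + name


link = build_external_link(URL_PREFIX, "orange1.1g015632m")
print(link)
```

Because the value is appended verbatim, the prefix must end exactly where the identifier is expected to begin (here, right after <tt>transcriptid%3A</tt>).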
+
 
+
Click '''Add'''.
+
 
+
We now have added a new database!
+
 
+
Later we will also load BLAST data, so we need to create two more databases for those results.  Create the following entries for NCBI nr and ExPASy Swiss-Prot:
+
 
+
* Name: NCBI nr
+
* Description:  NCBI's non-redundant database.
+
* URL: http://www.ncbi.nlm.nih.gov/
+
* URL prefix:  http://www.ncbi.nlm.nih.gov/protein/
+
 
+
 
+
* Name: ExPASy Swiss-Prot
+
* Description:  A curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and high level of integration with other databases
+
* URL: http://expasy.org/sprot/
+
* URL prefix:  http://www.uniprot.org/uniprot/
+
 
+
=== Loading Feature Data ===
+
Now that we have our organism and whole genome analysis ready, we can begin loading genomic data.  For this tutorial only a single gene from sweet orange will be loaded into the database.  This is to ensure we can move through the tutorial rather quickly.  The following datasets will be used for this tutorial:
+
 
+
* [[Media:Citrus sinensis-orange1.1g015632m.g.gff3|Citrus sinensis-orange1.1g015632m.g.gff3]]
+
* [[Media:Citrus sinensis-scaffold00001.fasta|Citrus sinensis-scaffold00001.fasta]]
+
* [[Media:Citrus sinensis-orange1.1g015632m.g.fasta|Citrus sinensis-orange1.1g015632m.g.fasta]]
+
 
+
 
+
Download these files to the <tt>/var/www/sites/default/files</tt> directory. The quickest method is to copy the address of each link above and use '''wget''' to retrieve the file:
+
 
+
<pre class="enter">
+
  cd /var/www/sites/default/files
+
  wget http://www.gmod.org/mediawiki/images/d/dc/Citrus_sinensis-orange1.1g015632m.g.gff3
+
  wget http://www.gmod.org/mediawiki/images/8/87/Citrus_sinensis-scaffold00001.fasta
+
  wget http://www.gmod.org/mediawiki/images/9/90/Citrus_sinensis-orange1.1g015632m.g.fasta
+
</pre>
+
 
+
 
+
==== Loading a GFF3 File ====
+
The gene features (e.g. gene, mRNA, 5_prime_UTRs, CDS, 3_prime_UTRs) are stored in the GFF3 file downloaded in the previous step.  We will load this GFF3 file and consequently load our gene features into the database.  Navigate to '''Tripal''' &rarr; '''Chado Data Loaders''' &rarr; '''GFF3 file loader'''.
+
 
+
[[Image:Tripal2.0_gff3_import.png]]
+
 
+
Perform the following:
+
 
+
# Enter the path on the file system where our GFF file resides (<tt>/var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.gff3</tt>)
+
# Choose the organism to which the GFF3 file belongs (in this case ''Citrus sinensis (sweet orange)'')
+
# Select the analysis named "Whole Genome Assembly and Annotation of Citrus sinensis...".
+
# Leave all other options as default.
+
 
+
Finally, click the '''Import GFF3 file''' button.  You'll notice a job was submitted to the jobs subsystem.  Now, to complete the process we need the job to run.  We'll do this manually:
+
 
+
<pre class="enter">
+
cd /var/www;
+
drush trpjob-run administrator
+
</pre>
+
 
+
You should see output similar to the following:
+
 
+
<pre>
+
Tripal Job Launcher
+
Running as user 'administrator'
+
-------------------
+
Calling: tripal_feature_load_gff3(/var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.gff3, 13, 10, 0, 1, 0, 0, 1, , , 0, , , , 0, 8)
+
 
+
NOTE: Loading of this GFF file is performed using a database transaction.
+
If the load fails or is terminated prematurely then the entire set of
+
insertions/updates is rolled back and will not be found in the database
+
 
+
Opening /var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.gff3
+
Parsing Line 138 (100.00%). Memory: 25,873,800 bytes.
+
Setting ranks of children...
+
Setting 10 of 10 (100.00%). Memory: 25,901,976 bytes.
+
Done
+
 
+
</pre>
+
 
+
<font color="red">'''Note'''</font>:  For very large GFF files the loader can take quite a while to complete.
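
The transaction note in the loader output above describes an all-or-nothing load. The following is a minimal sketch of that idea using Python's built-in <tt>sqlite3</tt> module and a toy <tt>feature</tt> table. The table layout and the <tt>load_rows</tt> function are assumptions made for illustration; the Tripal loader itself is PHP code running against Chado in PostgreSQL.

```python
import sqlite3


def load_rows(conn, rows):
    """Insert all rows in one transaction; roll back everything on error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name, ftype in rows:
                if not ftype:
                    raise ValueError(f"missing type for {name}")
                conn.execute(
                    "INSERT INTO feature (name, type) VALUES (?, ?)",
                    (name, ftype),
                )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (name TEXT, type TEXT)")
conn.commit()

# The second row is deliberately invalid, so the first insert is undone too.
ok = load_rows(conn, [("orange1.1g015632m.g", "gene"), ("broken_row", "")])
count = conn.execute("SELECT COUNT(*) FROM feature").fetchone()[0]
print(ok, count)
```

In this sketch a failure part-way through leaves the table empty, which is the behaviour the loader's message describes for an interrupted GFF3 import.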
+
 
+
==== Loading FASTA files ====
+
Using the Tripal GFF loader we were able to populate the database with the genomic features for our organism.  However, those features now need nucleotide sequence data.  To do this, we will load the nucleotide sequences for the mRNA features and the scaffold sequence.  Navigate to the '''Tripal''' &rarr; '''Chado Data Loaders''' &rarr; '''FASTA file loader''' page.
+
 
+
 
+
[[Image:Tripal2.0_fasta_loader.png]]
+
 
+
 
+
Before loading the FASTA file we must first know the Sequence Ontology (SO) term that describes the sequences we are about to upload.  We can find the appropriate SO terms from our GFF file.  In the GFF file we see the SO terms that correspond to our FASTA files are  'scaffold' and 'mRNA'.
+
 
+
'''<font color="red">IMPORTANT:</font>''' Before importing, ensure that the FASTA loader will be able to match each sequence in the FASTA file with an existing feature in the database.  Take special care that the definition line of each FASTA record uniquely identifies the feature for the specific organism and sequence type. For example, in our GFF file an mRNA feature appears as follows:
+
 
+
<pre class="enter">
+
scaffold00001  phytozome6      mRNA    4058460 4062210 .      +      .      ID=PAC:18136217;Name=orange1.1g015632m;PACid=18136217;Parent=orange1.1g015632m.g
+
</pre>
+
 
+
Note that for this mRNA feature the ID is '''PAC:18136217''' and the name is '''orange1.1g015632m'''.  In Chado, features always have a human readable name which does not need to be unique, and also a unique name which must be unique for the organism and SO type. In the GFF file, the ID becomes the unique name and the Name becomes the human readable name.
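
To make the mapping concrete, here is a small sketch that splits the attribute column of the GFF3 line shown above and picks out the values that become the unique name and the human readable name. The function name <tt>parse_gff3_attributes</tt> is hypothetical, and the fallback to the ID when no Name is present is an assumption of this sketch, not documented Tripal behaviour.

```python
GFF3_LINE = (
    "scaffold00001\tphytozome6\tmRNA\t4058460\t4062210\t.\t+\t.\t"
    "ID=PAC:18136217;Name=orange1.1g015632m;PACid=18136217;"
    "Parent=orange1.1g015632m.g"
)


def parse_gff3_attributes(line: str) -> dict:
    """Split column 9 of a GFF3 line into a key/value dictionary."""
    columns = line.rstrip("\n").split("\t")
    attributes = {}
    for pair in columns[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return attributes


attrs = parse_gff3_attributes(GFF3_LINE)
uniquename = attrs["ID"]               # becomes the Chado unique name
name = attrs.get("Name", attrs["ID"])  # assumption: fall back to the ID
so_type = GFF3_LINE.split("\t")[2]     # column 3 holds the SO type
print(uniquename, name, so_type)
```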
+
 
+
In our FASTA file the definition line for this mRNA is:
+
 
+
<pre class="enter">
+
>orange1.1g015632m PAC:18136217 (mRNA) Citrus sinensis
+
</pre>
+
 
+
 
+
By default Tripal will match the sequence in a FASTA file with the feature that matches the first word in the definition line.  In this case the first word is '''orange1.1g015632m'''. As defined in the GFF file, the name and unique name are different for this mRNA.  However, we can see that the first word in the definition line of the FASTA file is the name and the second is the unique name.  Therefore, when we load the FASTA file we should specify that we are matching by the name because it appears first in the definition line.
+
 
+
If, however, we cannot guarantee that the feature name is unique, then we can use a regular expression in the '''Advanced Options''' to tell Tripal where to find the name or unique name in the definition line of the FASTA file.
+
 
+
'''<font color="red">IMPORTANT:</font>'''  When loading FASTA files to update ''existing'' features, always choose "Update only" as the import method.  Otherwise, Tripal may add the features in the FASTA file as new features if it cannot properly match them to existing features.
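
The matching rules described above can be sketched as follows. This is a toy model, not the loader's code: features are held in a dictionary keyed by name, and the functions <tt>first_word</tt> and <tt>update_only</tt> are hypothetical names chosen for this example.

```python
def first_word(defline: str) -> str:
    """Return the first whitespace-delimited word of a FASTA definition line."""
    return defline.lstrip(">").split()[0]


def update_only(features: dict, defline: str, residues: str) -> bool:
    """Attach residues to an existing feature matched by name; never insert."""
    key = first_word(defline)
    if key not in features:
        return False  # "Update only": unmatched sequences are skipped
    features[key]["residues"] = residues
    return True


# Toy stand-in for features already loaded from the GFF3 file, keyed by name.
features = {
    "orange1.1g015632m": {"uniquename": "PAC:18136217", "residues": None},
}

matched = update_only(
    features, ">orange1.1g015632m PAC:18136217 (mRNA) Citrus sinensis", "ATGC"
)
skipped = update_only(features, ">unknown_feature (mRNA) Citrus sinensis", "GGCC")
print(matched, skipped, len(features))
```

The second call returns without adding anything, which is the point of choosing "Update only": a definition line that fails to match never creates a stray feature.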
+
 
+
 
+
Now, enter the following values in the fields on the web form:
+
 
+
* FASTA file:  /var/www/sites/default/files/Citrus_sinensis-scaffold00001.fasta
+
* Organism:  Citrus sinensis (Sweet orange)
+
* Sequence type:  supercontig  (scaffold is an alias for supercontig in the sequence ontology)
+
* Method: Update only  (we do not want to insert these as they should already be there)
+
* Name Match Type: Name
+
* Analysis:  Whole Genome Assembly and Annotation of Citrus sinensis....
+
 
+
Click the '''Import Fasta File''', and a job will be added to the jobs system.  Run the job:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
Next, do the same for the mRNA FASTA file:
+
 
+
* FASTA file:  /var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.fasta
+
* Organism:  Citrus sinensis (Sweet orange)
+
* Sequence type:  mRNA
+
* Method: Update only
+
* Name Match Type: Name
+
* Analysis:  Whole Genome Assembly and Annotation of Citrus sinensis....
+
 
+
 
+
Now run this job:
+
 
+
<pre class="enter">
+
cd /var/www;
+
drush trpjob-run administrator
+
</pre>
+
 
+
Now the scaffold sequence and mRNA sequences are loaded.
+
 
+
'''<font color="red">Note:</font>''' It is not strictly necessary to load the mRNA sequences, as those can be derived from their alignments with the scaffold sequence. However, in Chado the feature table has a 'residues' column. Therefore, it is best practice to load the sequence when possible.
+
 
+
 
+
The FASTA loader has some advanced options which we will not cover in this tutorial.  But briefly, the advanced options allow you to create relationships between features and associate them with external databases.  For example, the definition line for an mRNA is:
+
 
+
<pre class="enter">
+
>orange1.1g015632m PAC:18136217 (mRNA) Citrus sinensis
+
</pre>
+
 
+
Here we have more information than just the feature name.  We have a unique Phytozome accession number (e.g. PAC ID) for the mRNA.  Using the '''External Database Reference''' section under  '''Advanced Options''' we can provide the name of the database and a regular expression to tell the loader how to find the accession number in the definition line. 
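
As a sketch of the kind of regular expression that could be supplied, the example below pulls the second word out of the definition line shown above. The pattern is an assumption written for this one definition line, and it is tested here with Python's <tt>re</tt> module rather than inside Tripal. Whether the stored accession should keep the <tt>PAC:</tt> prefix depends on how the external database expects it.

```python
import re

DEFLINE = ">orange1.1g015632m PAC:18136217 (mRNA) Citrus sinensis"

# Hypothetical pattern: skip the first word, capture the second one.
ACCESSION_RE = re.compile(r"^>?\S+\s+(\S+)")

match = ACCESSION_RE.match(DEFLINE)
accession = match.group(1) if match else None
print(accession)
```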
+
 
+
If the name of the gene to which this mRNA belonged was also on the definition line, we could use the '''Relationships''' section to link this mRNA with its gene parent.  Fortunately, this information is also in our GFF file and these relationships have already been made.
+
 
+
=== Creating Feature Pages ===
+
Now that we've loaded our feature data, we must "sync" them.  Loading of the GFF file in the previous step has populated the feature tables of Chado for us, but now Drupal must know about these features.  To sync features, navigate to '''Tripal''' &rarr; '''Chado Modules''' &rarr; '''Features''' &rarr; '''Sync'''.
+
 
+
[[File:Tripal2.0_feature_sync.png]]
+
 
+
Here we can specify the types of features to sync and the organism.  This allows us to create feature pages for different types of features for different organisms.  In our case, we want gene and mRNA pages (these types were present in our GFF file). To only create pages for genes and mRNA we want to enter the sequence ontology terms '''gene''' and '''mRNA''' in the '''Feature Types''' box.  Place each term on a separate line.
+
 
+
Next, select the organism "Citrus sinensis", and click the "Sync Features" button.  A job is then added to the jobs management system which we need to manually run rather than wait on the cron entry to run it.
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
 
+
Our features are now synced:
+
 
+
 
+
[[File:Tripal-Features-Synced.png]]
+
 
+
'''<font color="red">Note:</font>''' It is not necessary to sync all types of features in the GFF file.  For example, do not sync the '''scaffold'''.  The feature is large and would have many relationships to other features. Only sync features that you will want users to view.  For example, each mRNA is composed of several '''CDS''' features.  These CDS features do not need their own page and therefore do not need to be synced.
+
 
+
 
+
Now we can view our gene and mRNA pages.  Click the '''Find Content''' link to see the newly added features. Click the new page titled '''orange1.1g015615m, PAC:18136219 (mRNA) Citrus sinensis'''.  Here we can see the gene feature we added and its corresponding mRNAs.
+
 
+
[[File:Tripal2.0_feature2.png]]
+
 
+
=== Materialized Views ===
+
Chado is efficient as a data warehouse but queries can become slow depending on the number of table joins and amount of data.  To help simplify and speed these queries, materialized views can be employed. For a materialized view, a table is created and then populated with the results of a pre-defined SQL query.  Therefore, rather than executing the pre-defined query, which may take a long time, we query the materialized view, which is simpler and faster.  A side effect, however, is redundant data, and the materialized view becomes stale if not updated regularly.
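
The idea can be sketched with Python's built-in <tt>sqlite3</tt> module. The table names, columns and the <tt>populate_feature_count</tt> function below are invented for this example and do not reflect Chado's schema or Tripal's implementation. The sketch only shows a table being filled from a pre-defined query, going stale, and being repopulated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feature (name TEXT, type TEXT, organism TEXT);
    INSERT INTO feature VALUES ('orange1.1g015632m.g', 'gene', 'Citrus sinensis');
    INSERT INTO feature VALUES ('orange1.1g015632m', 'mRNA', 'Citrus sinensis');
""")


def populate_feature_count(conn):
    """Rebuild the toy materialized view from its defining query."""
    conn.executescript("""
        DROP TABLE IF EXISTS feature_count_mview;
        CREATE TABLE feature_count_mview AS
            SELECT organism, type, COUNT(*) AS num_features
            FROM feature GROUP BY organism, type;
    """)


def mrna_count(conn):
    """Read the mRNA count from the view table without touching 'feature'."""
    row = conn.execute(
        "SELECT num_features FROM feature_count_mview WHERE type = 'mRNA'"
    ).fetchone()
    return row[0]


populate_feature_count(conn)
before = mrna_count(conn)

conn.execute(
    "INSERT INTO feature VALUES ('orange1.1g015615m', 'mRNA', 'Citrus sinensis')"
)
stale = mrna_count(conn)  # the view is unchanged until it is repopulated

populate_feature_count(conn)
fresh = mrna_count(conn)
print(before, stale, fresh)
```

The middle value illustrates the staleness problem: new data in the source table is invisible to the view until it is populated again.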
+
 
+
Tripal provides a mechanism for populating and updating these materialized views.  These can be found on the '''Tripal''' &rarr; '''Chado Schema''' &rarr; '''Materialized Views''' page.
+
 
+
[[File:Tripal2.0_mviews.png]]
+
 
+
 
+
Here we see several materialized views.  These were installed automatically by the various Tripal modules.  To update these views, click the '''Populate''' button for each one.
+
 
+
This will submit jobs to populate the views with data.  Now, run the jobs:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
 
+
You can now see that all views are up-to-date on the '''Materialized Views''' Page.  The number of rows in the view table is shown:
+
 
+
[[File:Tripal-MViews-Populated.png|800px]]
+
 
+
 
+
Materialized views are most useful when creating custom pages where data is queried in novel ways.
+
 
+
=== Feature Page Configuration ===
+
The feature configuration page allows us to perform configuration changes for the entire site. Navigate to the '''Tripal''' &rarr; '''Chado Modules''' &rarr; '''Features''' &rarr; '''Settings''' page.
+
 
+
==== Feature URLs ====
+
First on the configuration page is the '''Feature URL Path''' settings. In this section, you can use tokens to create the URL path for each feature.  By default, the URL string is '''/feature/[genus]/[species]/[type]/[uniquename]'''.  The tokens in this URL string are '''[genus]''', '''[species]''', '''[type]''' and '''[uniquename]'''.  These tokens are substituted by the genus, species, type and uniquename of the feature respectively.  All other characters are left as is.  Thus, for feature '''orange1.1g015615m''' the URL becomes:  http://localhost/feature/citrus/sinensis/mRNA/PAC%3A18136219.  You can change the tokens as desired.  But, be certain to always create a URL that is guaranteed to be unique.  The URL string provided by default will always be unique.
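
A minimal sketch of this token substitution is shown below. It is not Tripal's code: the function <tt>build_feature_path</tt> is a hypothetical name, leaving unknown tokens untouched is an assumption, and the token values are written as they appear in the example URL above.

```python
import re
from urllib.parse import quote

DEFAULT_PATTERN = "/feature/[genus]/[species]/[type]/[uniquename]"


def build_feature_path(pattern: str, values: dict) -> str:
    """Replace each [token] with its URL-encoded value; keep other text as is."""

    def substitute(match):
        token = match.group(1)
        if token not in values:
            return match.group(0)  # assumption: unknown tokens stay untouched
        return quote(values[token], safe="")

    return re.sub(r"\[(\w+)\]", substitute, pattern)


path = build_feature_path(
    DEFAULT_PATTERN,
    {
        "genus": "citrus",
        "species": "sinensis",
        "type": "mRNA",
        "uniquename": "PAC:18136219",
    },
)
print(path)
```

The colon in the unique name is percent-encoded as <tt>%3A</tt>, which matches the example URL given above.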
+
 
+
[[Image:Tripal2.0_feature_configure.png]]
+
 
+
'''<font color="red">Note:</font>''' Additionally, for any feature Tripal supports the URLs '''/feature/[name]''' and '''/feature/[uniquename]'''.  These allow simpler linking from external sites to the features in a Tripal site.  For example, GBrowse can be configured to link to features in Tripal.  If two or more features have the same name then Tripal will present a table of matching features for the user to select from.
+
 
+
==== Feature Browser ====
+
Next on the configuration page are '''Feature Browser''' settings.  By default, Tripal will provide a browser on the organism page that allows a visitor to easily find a feature.  For large sites with many features this would be an inefficient way to find a specific feature, but it does allow visitors who simply want to explore the site to quickly find example pages.  This browser will only show synced features and will only show features of the type specified in the '''Feature Types''' box.  We want to show '''gene''' pages, so alter the contents of this box to contain only the word '''gene'''.
+
 
+
[[File:Tripal2.0_feature_browser.png]]
+
 
+
==== Feature Summary Report ====
+
Next on the configuration page is the '''Feature Summary Report''' setting.  By default, on the organism page, Tripal will provide a list of all features belonging to an organism and provide a pie-chart of this list.  For example, below is a screen shot of the '''Feature Summary''' on the '''<em>Citrus sinensis</em>''' page for the data we loaded.  This is only useful when data from a single analysis is associated with an organism.  For sites with only a single unigene (transcriptome analysis) or a single whole genome, this summary would be appropriate. For sites with multiple analyses it may confuse site visitors who see multiple counts.
+
 
+
 
+
[[Image:Tripal2.0_feature_summary.png]]
+
 
+
On the feature settings page, you can also specify which feature types should appear and rename them to be more meaningful.  We want to provide a list of the total number of scaffolds, genes and mRNA.  To do this, enter the following contents in the '''Map feature types''' box:
+
<pre>
+
supercontig = Scaffolds
+
gene = Genes
+
mRNA
+
 
+
</pre>
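
The contents of the '''Map feature types''' box can be read as one mapping per line. The sketch below shows one plausible way to interpret it. The function <tt>parse_type_map</tt> is hypothetical, and treating a bare type such as <tt>mRNA</tt> as "show this type under its own name" is an assumption of this example, not a description of Tripal's parser.

```python
def parse_type_map(text: str) -> dict:
    """Parse 'type = Label' lines; a bare type keeps its own name as label."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        so_type, sep, label = line.partition("=")
        so_type = so_type.strip()
        mapping[so_type] = label.strip() if sep else so_type
    return mapping


type_map = parse_type_map("supercontig = Scaffolds\ngene = Genes\nmRNA\n")
print(type_map)
```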
+
 
+
Click the '''Save configuration''' button at the bottom.  Now the Feature Summary on the organism page appears as:
+
 
+
[[Image:Tripal2.0_feature_summary2.png]]
+
 
+
 
+
<font color="red">'''Note:'''</font>  The feature summary is only available when the '''organism_feature_count''' materialized view is populated.  Each time new data is added, this materialized view should be re-populated to capture the changes and have those shown in the summary.
+
 
+
=== Loading Functional Data Using Extension Modules ===
+
For this tutorial we will be loading functional data for our gene.  To do this we will use the Blast, KEGG, and InterPro extension modules.  These modules were installed previously.  Blast, KEGG and InterPro analyses were completed prior to this tutorial and results files are available for downloading:
+
 
+
* [[Media:Citrus sinensis-orange1.1g015632m.g.iprscan.xml|Citrus sinensis-orange1.1g015632m.g.iprscan.xml]]
+
* [[Media:Citrus sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz|Citrus sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz]]
+
* [[Media:Blastx citrus sinensis-orange1.1g015632m.g.fasta.0 vs uniprot sprot.fasta.out|Blastx citrus sinensis-orange1.1g015632m.g.fasta.0 vs uniprot sprot.fasta.out]]
+
* [[Media:Blastx citrus sinensis-orange1.1g015632m.g.fasta.0 vs nr.out|Blastx citrus sinensis-orange1.1g015632m.g.fasta.0 vs nr.out]]
+
 
+
Download these files to the <tt>/var/www/sites/default/files</tt> directory.  To do so quickly run these commands:
+
 
+
<pre class="enter">
+
cd /var/www/sites/default/files
+
wget http://www.gmod.org/mediawiki/images/0/0c/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml
+
wget http://www.gmod.org/mediawiki/images/1/13/Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz
+
wget http://www.gmod.org/mediawiki/images/e/e8/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out
+
wget http://www.gmod.org/mediawiki/images/2/24/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out
+
</pre>
+
 
+
==== Loading BLAST Results ====
+
===== Configuring BLAST Databases =====
+
Now that we have our features loaded we want to add some functional data as well. We need to create a new analysis page for our BLAST results.  The '''Tripal Blast Analysis''' extension module will parse BLAST results and load them into Chado after a BLAST analysis page is created.  However, before we create the page we need to ensure that the BLAST module can properly parse the BLAST hits.  To do this, navigate to '''Tripal''' &rarr; '''Extension Modules''' &rarr; '''Tripal Blast Analyses'''.  On this page will be configuration settings for the Tripal BLAST Analysis extension module. 
+
 
+
This page allows you to specify a different, more meaningful name for the sequence library file (a.k.a. database) used for BLASTing. This name will be displayed with BLAST results. You can also provide regular expressions for parsing BLAST hits.  For example, the following is a line for a match from SwissProt:
+
 
+
<pre class="enter">
+
sp|P43288|KSG1_ARATH Shaggy-related protein kinase alpha OS=Arabidopsis thaliana GN=ASK1 PE=2 SV=3
+
</pre>
+
 
+
Here the hit name is "KSG1_ARATH", the accession is "P43288", the hit description is "Shaggy-related protein kinase alpha OS=Arabidopsis thaliana" and the organism is "Arabidopsis thaliana".  We need regular expressions to tell Tripal how to extract these unique parts from the match text.  Because Tripal is a PHP application, the regular expressions follow PHP's PCRE syntax.  Documentation for PHP regular expressions can be found [http://php.net/manual/en/reference.pcre.pattern.syntax.php here].  The following regular expressions can be used to extract the hit name, the accession, hit description and organism for the example SwissProt line above:
+
 
+
{| class="wikitable"
+
!Element
+
!RE
+
|-
+
|Hit Name
+
|<pre>^sp\|.*?\|(.*?)\s.*?$</pre>
+
|-
+
|Hit Description
+
|<pre>^sp\|.*?\|.*?\s(.*)$</pre>
+
|-
+
|Hit Accession
+
|<pre>^sp\|(.*?)\|.*?\s.*?$</pre>
+
|-
+
|Hit Organism
+
|<pre>^.*?OS=(.*?)\s\w\w=.*$</pre>
+
|}
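
The four expressions in the table can be checked against the example SwissProt line before entering them in the form. The sketch below does this with Python's <tt>re</tt> module, which handles these particular patterns the same way as PCRE. The dictionary keys are labels chosen for this example only. Note that the description pattern captures everything after the first whitespace, so its result also includes the trailing <tt>GN=</tt>, <tt>PE=</tt> and <tt>SV=</tt> fields.

```python
import re

HIT = (
    "sp|P43288|KSG1_ARATH Shaggy-related protein kinase alpha "
    "OS=Arabidopsis thaliana GN=ASK1 PE=2 SV=3"
)

# Patterns copied from the table above; the keys are labels for this sketch.
PATTERNS = {
    "name": r"^sp\|.*?\|(.*?)\s.*?$",
    "description": r"^sp\|.*?\|.*?\s(.*)$",
    "accession": r"^sp\|(.*?)\|.*?\s.*?$",
    "organism": r"^.*?OS=(.*?)\s\w\w=.*$",
}

parsed = {}
for field, pattern in PATTERNS.items():
    match = re.search(pattern, HIT)
    parsed[field] = match.group(1) if match else None

for field, value in parsed.items():
    print(f"{field}: {value}")
```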
+
 
+
 
+
In this tutorial, we will be adding BLAST results for the two databases we created earlier in the tutorial: ExPASy SwissProt and NCBI nr.  First, select ExPASy SwissProt from the drop-down menu.  A form will appear:
+
 
+
[[File:Tripal2.0 Blast settings.png]]
+
 
+
In the form fields, add the following values: 
+
 
+
* Title for the BLAST analysis: (leave blank)
+
* Regular expression for Hit Name: ^sp\|.*?\|(.*?)\s.*?$
+
* Regular expression for Hit Description:  ^sp\|.*?\|.*?\s(.*)$
+
* Regular expression for Hit Accession: ^sp\|(.*?)\|.*?\s.*?$
+
* Regular expression for Organism: ^.*?OS=(.*?)\s\w\w=.*$
+
* Organism Name: (leave blank)
+
 
+
Click '''Save Settings'''.
+
 
+
'''<font color="red">Note:</font>'''  The match accession will be used for building web links to the external database. The accession will be appended to the '''URL Prefix''' set earlier when the database record was first created.
+
 
+
Now select the NCBI nr database from the drop-down and click the radio button.  NCBI databases use a format that is compatible with BLAST. Therefore, the hit name, accession and description are handled differently in the BLAST XML results.  To correctly parse results from an NCBI database click the '''Use Genebank style parser''' checkbox.  This should disable all other fields and is all we need for this database.
+
 
+
===== Load the BLAST Results =====
+
 
+
Now we can create our analysis page. Navigate to the '''Create Content''' page and select the '''Analysis: BLAST''' content type.  Set the following values in the fields:
+
 
+
* Analysis Name:  blastx Citrus sinensis v1.0 genes vs ExPASy SwissProt
+
* Program: blastall
+
* Program Version: 2.2.25
+
* Algorithm: blastx
+
* Source name:  C. sinensis mRNA vs ExPASy SwissProt
+
* Time Executed:  (today's date)
+
* Materials & Methods:  C. sinensis mRNA sequences were BLAST'ed against the ExPASy SwissProt protein database using a local installation of BLAST on an in-house Linux server.  Expectation value was set at 1e-6
+
* BLAST Settings
+
** Database: ExPASy SwissProt
+
** BLAST XML File/Directory:  /var/www/sites/default/files/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out
+
** Parameters: -p blastx -e 1e-6 -m 7
+
** Submit a job to parse the XML output:  checked
+
** Keywords for custom search: checked
+
 
+
Click the '''Save''' button.  You can now see our new Analysis.
+
 
+
[[File:Tripal2.0_Blast_analysis.png]]
+
 
+
 
+
Now we need to manually run the job to parse the BLAST results:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
 
+
The results should now be loaded.  If we visit our feature page for feature 'orange1.1g015615m' (http://localhost/feature/citrus/sinensis/mRNA/PAC%3A18136219) we should now see BLAST results by clicking the 'Homology' link in the left table of contents.
+
 
+
[[File:Tripal2.0_feature_homology.png]]
+
 
+
 
+
Now we want to add the results for NCBI nr.  Repeat the steps above to add a new analysis with the following details:
+
 
+
* Analysis Name:  blastx Citrus sinensis v1.0 genes vs NCBI nr
+
* Program: blastall
+
* Program Version: 2.2.25
+
* Algorithm: blastx
+
* Source name:  C. sinensis mRNA vs NCBI nr
+
* Time Executed:  (today's date)
+
* Materials & Methods:  C. sinensis mRNA sequences were BLAST'ed against the NCBI nr protein database using a local installation of BLAST on an in-house Linux server.  Expectation value was set at 1e-6
+
* Blast Settings
+
** Database: NCBI nr
+
** Blast XML File/Directory:  /var/www/sites/default/files/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out
+
** Parameters: -p blastx -e 1e-6 -m 7
+
** Submit a job to parse the XML output:  checked
+
** Keywords for custom search: checked
+
 
+
Click the '''Save''' button and manually run the job:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
Return to the example feature page to view the newly added results: http://localhost/feature/citrus/sinensis/mRNA/PAC%3A18136219
+
 
+
==== Loading InterProScan Results ====
+
Now we want to load results from an InterProScan.  For this tutorial, these results were obtained by using a local installation of InterProScan installed on a computational cluster.  However, you may choose to use Blast2GO or the online InterProScan utility. Results should be saved in XML format.
+
 
+
 
+
To create an analysis, click the '''Add Content''' link in the administrative menu and select the content type '''Analysis: Interpro'''. Add the following values for this analysis
+
 
+
*Analysis Name: InterPro Annotations of C. sinensis v1.0
+
*Program: InterProScan
+
*Program Version: 4.8
+
*Algorithm: iprscan
+
*Source name: C. sinensis v1.0 mRNA
+
*Time Executed: (today's date)
+
*Materials & Methods: C. sinensis mRNA sequences were mapped to IPR domains and GO terms using a local installation of InterProScan executed on a computational cluster.  InterProScan data files used were MATCH_DATA_v32, DATA_v32.0 and PTHR_DATA v31.0.
+
*InterPro Settings
+
** InterProScan XML File/Directory: /var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml
+
** Check the box 'Submit a job to parse the Interpro XML output'
+
** Check the box 'Load GO terms'
+
** Parameters:  iprscan -cli -goterms -ipr -format xml
+
 
+
 
+
Click the '''Save''' button. You can now see our new Analysis.
+
 
+
[[File:Tripal2.0 feature interpro.png]]
+
 
+
 
+
Now we need to manually run the job to parse the InterPro results:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
 
+
The results should now be loaded. If we visit our feature page, http://localhost/orange1.1g015615m, we should now see InterPro results by clicking on the "Interpro Report" link on the right sidebar.
+
 
+
[[File:Tripal2.0_feature_interpro2.png]]
+
 
+
==== Loading KEGG Analysis Results ====
+
Now we want to load results from a KEGG/KAAS analysis (http://www.genome.ad.jp/tools/kaas/). The KAAS server receives as input a FASTA file of sequences and annotates those with KEGG orthologs and pathways.  The tool also generates a hierarchy (heir) output file.  This output file can be read directly by the Tripal Analysis KEGG module.
+
 
+
To create an analysis, click the '''Add Content''' link in the administrative menu and select the content type '''Analysis: KEGG'''. Add the following values for this analysis:
+
 
+
*Analysis Name: KEGG analysis of C. sinensis v1.0
+
*Program, Pipeline Name or Method Name : KEGG Automatic Annotation Server (KAAS)
+
*Program, Pipeline or Method version : 1.64a
+
*Source name: C. sinensis v1.0 genes
+
*Time Executed: (today's date)
+
*Materials & Methods: C. sinensis mRNA sequences were uploaded to the KEGG Automatic Annotation Server (KAAS) where they were mapped to KEGG pathways and orthologs. The SBH (single-directional best hit) was used with the genes data set being the defaults for genes.
+
*KEGG Settings
+
** KAAS hier.tar.gz Output File:  /var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz
+
** Check the box "Submit a job to parse the kegg output into Chado"
+
 
+
 
+
Click the '''Save''' button. You can now see our new Analysis.
+
 
+
[[File:Tripal2.0_kegg_analysis.png]]
+
 
+
Now we need to manually run the job to parse the KEGG results:
+
 
+
<pre class="enter">
+
cd /var/www;
+
drush trpjob-run administrator
+
</pre>
+
 
+
 
+
A KEGG report is available on the analysis and the organism page. Navigate to the ''Citrus sinensis'' organism page and click the '''KEGG Reports''' link in the '''Resources''' sidebar.  A page with instructions is visible:
+
 
+
[[File:Tripal2.0_organism_kegg.png]]
+
 
+
 
+
We have already loaded the data; therefore, we only need to populate the '''kegg_by_organism''' materialized view.  Click the link to populate the view. After populating the view we can return to the organism page and view the KEGG report:
+
 
+
[[File:Tripal2.0 organism kegg2.png]]
+
 
+
 
+
Site visitors can browse KEGG results by expanding the trees corresponding to the hierarchy terms.  This same report is also visible on the KEGG analysis page.
+
 
+
==== Viewing Assigned Terms ====
+
 
+
When we imported the InterPro analysis results, the InterPro terms were assigned to the mRNA features. We also requested when we created the InterPro Analysis page that it parse GO terms from the results. As a result, GO terms have also been assigned. Also, when importing the KEGG results, KEGG orthologs and pathways were assigned to features. Therefore, we now have a new '''Annotated Terms''' item in the table of contents.  For our example feature (http://localhost/feature/citrus/sinensis/mRNA/PAC%3A18136219), the results are as follows:
+
 
+
[[File:Tripal2.0_feature_terms.png]]
+
 
+
<font color="red">NOTE: the remainder of this section is not yet accurate for Tripal v2.0-alpha. The GO Reports have not yet been fully ported. Once that work is completed, this part of the tutorial will be updated </font>
+
 
+
Because we now have GO terms associated with features we can set up the GO report that appears on the organism page. Navigate to the Citrus sinensis organism page and click the '''Go Analysis Reports''' link in the '''Resources''' sidebar.  A page appears with instructions for the site administrator on how to make the report visible.
+
 
+
[[File:Triapl-GO-Report-NotSetup.png|800px]]
+
 
+
Follow the instructions as presented on the page. Briefly, you need to
+
 
+
#  Set the CV term paths for the three GO vocabularies.  This should have been done automatically when you loaded the Gene Ontology earlier in the Tutorial. 
+
#  Populate the '''go_count_analysis''' materialized view.
+
 
+
When complete the following report will be visible:
+
 
+
[[File:Tripal-GO-Report.png|800px]]
+
 
+
The GO report provides pie charts and an expandable tree for browsing results.  Clicking on a GO term in the tree will cause a box to appear with details about the term and a link to download a FASTA file of all features annotated with the term.  Notice that the graphs are quite simple and the graph for the '''cellular component''' is missing. This is because we only loaded GO assignments for a single gene.
+
 
+
=== Adding Publications ===
+
Tripal provides an interface for automatically and manually adding publications.  First we will manually add a new publication.  To do this, we must first enable the Tripal Pub module.  We have previously used Drush to install modules in this tutorial and the commands to install the Tripal Pub module are similar.  The Tripal Contact module is a dependency of the Tripal Pub module, so we must enable both:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush pm-enable tripal_contact tripal_pub
+
</pre>
+
 
+
You should see the following output in the terminal:
+
 
+
<pre>
+
The following extensions will be enabled: tripal_pub, tripal_contact
+
Do you really want to continue? (y/n): y
+
tripal_contact was enabled successfully.                                                                  [ok]
+
tripal_pub was enabled successfully.                                                                      [ok]
+
The directory sites/default/files/tripal/tripal_pub has been created.                                      [status]
+
Job 'Load OBO Tripal Publication' submitted.  Check the jobs page for status                              [status]
+
The directory sites/default/files/tripal/tripal_contact has been created.                                  [status]
+
Job 'Load OBO Tripal Contacts' submitted.  Check the jobs page for status                                  [status]
+
</pre>
+
 
+
You will notice that two jobs were submitted.  These jobs will load a contact and publication ontology.  The Tripal Contact and Pub ontologies are custom vocabularies used for organizing information about publications and contact information.  So, before we can add publications (or contacts) we need to run these jobs:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush trpjob-run administrator
+
</pre>
+
 
+
'''Note''':  Always remember to set permissions for any new modules that are installed.
+
 
+
==== Manually Adding a Publication ====
+
Now that the Tripal publication and contact ontologies are loaded we can add publications.  First, we will manually add a publication.  Click the '''Add Content''' link in the administrative menu and then '''Publication'''.
+
 
+
[[File:Tripal2.0_pub_create.png]]
+
 
+
We will add information about the Tripal publication.  Enter the following values:
+
 
+
* Publication Title:  Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases.
+
* Publication Type:  Journal Article
+
* Publication Year: 2013
+
* Citation: Sanderson LA, Ficklin SP, Cheng CH, Jung S, Feltus FA, Bett KE, Main D. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases. Database, Oct 25 2013. bat075
+
 
+
To further describe the publication we will add all other details as properties.  Select the property in the drop-down, add the text and click the '''add''' button for each of the following properties:
+
* Journal Name:  Database
+
* Abstract: Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including 'Feature Map', 'Genetic', 'Publication', 'Project', 'Contact' and the 'Natural Diversity' modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. DATABASE URL: http://tripal.info/.
+
* Publication Date: 2013 Oct 25
+
* Authors: Sanderson LA, Ficklin SP, Cheng CH, Jung S, Feltus FA, Bett KE, Main D
+
 
+
 
+
Next, to link this publication to its record in PubMed we need to add an entry in the section titled '''Relationships'''.  Add the following:
+
* PMID: 24163125
+
 
+
 
+
Now click the '''Save''' button at the bottom.
+
 
+
Our publication has been added and you should see the following page:
+
 
+
[[File:Tripal2.0_pub_new.png]]
+
 
+
 
+
Now we have a publication page, but the title links to the PubMed page for the article.  If we want the title to link to the article at the online journal instead, we can edit the publication by clicking the '''Edit''' link and adding a new property of type '''URL''' with the value:
+
 
+
<pre>http://database.oxfordjournals.org/content/2013/bat075.long</pre>
+
 
+
After saving the page, the title is now linked to the article on the Journal site rather than the PubMed site.  However, the link to PubMed is still found under the '''Cross References''' link.
+
 
+
==== Searching for Publications ====
+
By default, Tripal provides simple search tools for many data types (e.g. organisms, analyses, features, etc).  These can be found in the menu under '''Search  Data'''.  To search for publications, click the '''Publications''' link under '''Search Data'''.
+
 
+
On the search form, clicking the search button without providing any criteria will provide a list of all publications.  For this tutorial, we only have a single publication:
+
 
+
[[File:Tripal2.0 pub search.png]]
+
 
+
However, you will notice that if you try to select a criterion, nothing is available.  Tripal allows you to set which fields a user can use as criteria. In some cases not all fields will be appropriate given the publications available on the site. All of the properties available when adding a publication can be searched, but some properties like the '''URL''' may not be necessary for searching.  You can specify which fields to use for search criteria by clicking the '''Publication Module Settings Page''' in the administrator information box just above the search form.  At the top of the resulting page you will see a section for '''Searching Options''':
+
 
+
[[File:Tripal2.0 pub search options.png]]
+
 
+
Here you can select which properties a user can use for searching.  For this tutorial, find and check these options:
+
 
+
* Abstract
+
* Authors
+
* Journal Name
+
* Title
+
 
+
Then click the '''Save configuration''' button at the bottom.  If we return to the publication search page, we now have criteria for searching.
+
 
+
==== Bulk Import of Publications====
+
Tripal supports bulk importing of publications from remote databases such as NCBI PubMed and the USDA National Agricultural Library (AGL).  Support for PubMed is built into the Tripal module, but support for AGL requires some additional setup on the server.  You can find instructions for preparing the server for AGL on the '''Tripal''' &rarr; '''Chado Modules''' &rarr; '''Publications''' &rarr; '''Help''' page.  For this tutorial we will create an importer for PubMed.
+
 
+
Creation of an importer is an administrative function.  A publication importer is created by the site administrator and consists of a set of search criteria for finding multiple publications at one time.  When the importer is run, it will query the remote database, retrieve the publications that match the criteria and add them to the database.  Because we loaded genomic data for <em>Citrus sinensis</em> we will create an importer that will find all  publications related to this species. 
+
 
+
First, navigate to '''Tripal''' &rarr; '''Chado Modules''' &rarr; '''Publications''' &rarr; '''Publication Importers''' and click the link '''New Importer'''.  You will see the following page:
+
 
+
[[File:Tripal2.0_pub_new_importer.png]]
+
 
+
Enter the following values in the fields:
+
 
+
* Remote Database:  PubMed
+
* Loader Name:  Pubs for Citrus sinensis
+
* Criteria #1:
+
** Scope: Abstract/Title
+
** Search Terms:  Citrus sinensis
+
** is Phrase?: checked
+
 
+
Now, click the 'Test Importer' button.  This will connect to PubMed and search for all publications that match our provided criteria.  On the date this portion of the tutorial was written, 532 publications were found:
+
 
+
[[File:Tripal2.0_pub_new_importer_test.png]]
+
 
+
Now, save this importer.  You should see that we have one importer in the list:
+
 
+
[[File:Tripal2.0_pub_importer_list.png]]
+
 
+
We can use this importer to load all 532 publications related to <em>Citrus sinensis</em> from PubMed into our database (how to load these will be shown later).  However,  what if new publications are added?  We would like this importer to be run monthly so that we can automatically add new publications as they become available.  But we do not need to try to reload these 532 again. So, we will create a new importer that only finds publications within the last 30 days.  To do this, click the link '''New Importer'''.  Now, add the following criteria:
+
 
+
* Remote Database:  PubMed
+
* Loader Name:  Pubs for Citrus sinensis last 30 days
+
* Days since record modified: 30
+
* Criteria #1:
+
** Scope: Abstract/Title
+
** Search Terms:  Citrus sinensis
+
** is Phrase?: checked
+
 
+
Now, when we test the importer we find only 1 publication that has been added (created) in PubMed in the last 30 days:
+
 
+
[[File:Tripal2.0_pub_new_importer_test30.png]]
+
 
+
 
+
Save this importer. 
+
 
+
Next, there are two ways to import these publications.  The first is to import them manually using a Drush command.  Return to the terminal and run the following command:
+
 
+
<pre class="enter">
+
cd /var/www
+
drush tpubs-import
+
</pre>
+
 
+
You should see output to the terminal that begins like this:
+
 
+
<pre>
+
NOTE: Loading of publications is performed using a database transaction.
+
If the load fails or is terminated prematurely then the entire set of
+
insertions/updates is rolled back and will not be found in the database
+
 
+
Importing: Pubs for Citrus sinensis
+
</pre>
+
 
+
And as publications are imported each one is printed to the screen. The importer will pause while it requests 100 publications. It will then load those, then pause to request another 100 until it imports all publications that match the criteria.
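
The request-then-load cycle described above can be sketched in a few lines of Python.  This is only an illustration of the batching pattern, not Tripal's own code (Tripal is written in PHP).  The names <code>import_all</code>, <code>fetch_batch</code> and <code>load_record</code> are hypothetical stand-ins for the remote query and the local insert, and the fake data source simply reuses the count of 532 publications found earlier.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 100  # the tutorial states that publications are requested 100 at a time


def import_all(fetch_batch: Callable[[int, int], List[Dict]],
               load_record: Callable[[Dict], None],
               batch_size: int = BATCH_SIZE) -> int:
    """Request records in fixed-size batches until the source is exhausted.

    fetch_batch(start, count) is a hypothetical callable returning up to
    `count` records beginning at offset `start`; an empty list means the
    source has no more records.  Returns the number of records loaded.
    """
    start = 0
    loaded = 0
    while True:
        batch = fetch_batch(start, batch_size)  # the "pause" while requesting
        if not batch:
            break
        for record in batch:
            load_record(record)  # load each record of the batch in turn
            loaded += 1
        start += len(batch)
    return loaded


# Stand-in for the remote database: 532 fake records (assumption for the demo).
_fake_remote = [{"pmid": str(i)} for i in range(532)]


def fake_fetch(start: int, count: int) -> List[Dict]:
    """Return one slice of the fake remote data."""
    return _fake_remote[start:start + count]


imported: List[Dict] = []
total = import_all(fake_fetch, imported.append)
```

With 532 records the loop makes five full requests of 100, one request of 32, and a final empty request that ends the import.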
+
 
+
Some things to know about the publication importer:
+
# The importer keeps track of publications from the remote database using the publication accession (e.g. PubMed ID).
+
# If a publication with an accession (e.g. PubMed ID) already exists in the local database, the record will be updated.
+
# If a publication in the local database matches by title, journal and year with one that is to be imported, then the record will be updated.  You can change the requirement of which fields to match at the '''Tripal''' &rarr; '''Chado Modules''' &rarr; '''Publications''' &rarr; '''Settings''' page. On the settings page, look for the '''Import Settings''' section.
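
The three rules above amount to a "find, then update or insert" decision.  The following Python sketch shows one way such a decision could be expressed.  It is not the importer's actual implementation; the record layout, the function names and the second PubMed ID are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Fields used for the fallback match; the tutorial notes these are configurable.
MATCH_FIELDS: Tuple[str, ...] = ("title", "journal", "year")


def find_existing(local_pubs: List[Dict], incoming: Dict,
                  match_fields: Tuple[str, ...] = MATCH_FIELDS) -> Optional[Dict]:
    """Return the local record that `incoming` should update, or None."""
    # Rules 1 and 2: an accession (e.g. PubMed ID) identifies the record.
    accession = incoming.get("accession")
    if accession:
        for pub in local_pubs:
            if pub.get("accession") == accession:
                return pub
    # Rule 3: otherwise fall back to matching on descriptive fields.
    for pub in local_pubs:
        if all(pub.get(f) is not None and pub.get(f) == incoming.get(f)
               for f in match_fields):
            return pub
    return None


def import_pub(local_pubs: List[Dict], incoming: Dict) -> str:
    """Update a matching record in place, or append a new one."""
    existing = find_existing(local_pubs, incoming)
    if existing is not None:
        existing.update(incoming)
        return "updated"
    local_pubs.append(dict(incoming))
    return "inserted"


# A publication that was added by hand without an accession.
local = [{"accession": None, "title": "Tripal v1.1",
          "journal": "Database", "year": 2013}]

# Same title, journal and year: matched by rule 3 and updated.
r1 = import_pub(local, {"accession": "24163125", "title": "Tripal v1.1",
                        "journal": "Database", "year": 2013})
# Same accession, changed title: matched by the accession and updated.
r2 = import_pub(local, {"accession": "24163125", "title": "Tripal v1.1 (revised)",
                        "journal": "Database", "year": 2013})
# Unknown accession (a made-up ID) and no field match: inserted as new.
r3 = import_pub(local, {"accession": "99999999", "title": "Another paper",
                        "journal": "Database", "year": 2014})
```

Checking the accession first keeps a record stable even when its title is later corrected upstream; the field match only matters for records that have no accession yet.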
+
 
+
 
+
The second way to import publications is to add an entry to the UNIX cron.  We did this previously for the Tripal Jobs management system when we first installed Tripal.  We will add another entry for importing publications.  But first, now that we have imported all of the relevant pubs, we need to return to the importers list at '''Tripal''' &rarr; '''Chado Modules''' &rarr; '''Publications''' &rarr; '''Publication Importers''' and disable the first importer we created.  We do not want to run that importer again, as we've already imported all historical publications on record at PubMed.  Click the edit button next to the importer named <tt>Pubs for Citrus sinensis</tt>, click the '''disable''' checkbox and then save the importer.  The importer should now be disabled.
+
 
+
[[File:Tripal2.0_pub_impoter_disabled.png]]
+
 
+
Now we have the importer titled <tt>Pubs for Citrus sinensis last 30 days</tt> enabled. This is the importer we want to run on a monthly basis.  The cron entry will do this for us.  On the terminal open the crontab with the following command:
+
 
+
<pre class="enter">
+
sudo crontab -e
+
</pre>
+
 
+
Now add the following line to the bottom of the crontab:
+
 
+
<pre class="enter">
+
30 8 1,15 * *  su - www-data -c '/usr/local/drush/drush -r /var/www -l http://[site url] tpubs-import --report=[your email] > /dev/null'
+
</pre>
+
 
+
Where
+
* [site url] is the full URL of your site
+
* [your email] is the email address of the user that should receive an email containing a list of publications that were imported.  You can separate multiple email addresses with a comma.
+
 
+
The cron entry above will launch the importer at 8:30am on the first and fifteenth days of each month.  We run this importer twice a month so that if one run fails (e.g. the server is down), the other run still falls within the importer's 30-day window and picks up any new publications.
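
To make the schedule fields concrete, here is a small Python sketch that checks whether a given moment matches the expression <tt>30 8 1,15 * *</tt>.  It is a simplified reading of the cron format written for this tutorial, not a replacement for cron itself: the helper names are hypothetical, only <tt>*</tt>, single numbers and comma-separated lists are handled, and all five fields are simply required to match (which is sufficient here because the weekday field is <tt>*</tt>).

```python
from datetime import datetime


def field_matches(spec: str, value: int) -> bool:
    """Match one cron field that is either '*' or a comma-separated list."""
    if spec == "*":
        return True
    return value in {int(part) for part in spec.split(",")}


def cron_matches(expr: str, moment: datetime) -> bool:
    """Return True when `moment` satisfies all five fields of `expr`.

    Field order: minute, hour, day of month, month, day of week.
    Ranges, steps and names (e.g. 1-5, */2, MON) are not supported.
    """
    minute, hour, day, month, weekday = expr.split()
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (field_matches(minute, moment.minute)
            and field_matches(hour, moment.hour)
            and field_matches(day, moment.day)
            and field_matches(month, moment.month)
            and field_matches(weekday, cron_weekday))


SCHEDULE = "30 8 1,15 * *"

on_first = cron_matches(SCHEDULE, datetime(2014, 8, 1, 8, 30))       # 1st, 8:30am
on_fifteenth = cron_matches(SCHEDULE, datetime(2014, 8, 15, 8, 30))  # 15th, 8:30am
on_sixteenth = cron_matches(SCHEDULE, datetime(2014, 8, 16, 8, 30))  # wrong day
wrong_minute = cron_matches(SCHEDULE, datetime(2014, 8, 1, 8, 31))   # wrong minute
```

Only the first two checks match, which corresponds to the two runs per month described above.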
+
 
+
=== Drupal Views Integration ===
+
[https://drupal.org/project/views Drupal Views] is a powerful 3rd-party module that allows an authorized user to query database tables in novel and unique ways and to create custom pages and search forms. Tripal fully integrates the Chado database tables with Drupal Views.
+
 
+
==== Adding a New View ====
+
As an introduction to Drupal Views, we will create a new page without any PHP or HTML programming.  To create a new view navigate to '''Structure''' &rarr; '''Views''' and click the '''Add new view''' link near the top of the page.  Here you see a brief configuration page:
+
 
+
[[File:Tripal2.0 views new.png]]
+
 
+
Suppose we want to create a '''Species''' page for our site that lists all of the available species that our site houses. Tripal provides "teasers" for all of its pages.  A teaser is a brief set of contents about a page.  So, we want the '''Species''' page to be a list of organism page teasers.  To do this, enter the following:
+
 
+
# View name:  All species
+
# Show:  Content
+
# of type: Organism
+
# sorted by: Title
+
# Page title: Species
+
# Path:  http://localhost/species
+
# Display format: Unformatted, list of teasers, with links, without comments
+
# Items to Display: 10
+
# Use a Pager: checked
+
# Create a menu link: checked
+
##  Menu: Main menu
+
##  Link text: Species
+
 
+
Although we will be creating a list of species, we select '''Show''' as '''Content'''.  This is because all nodes (nodes are pages in Drupal lingo) have teasers, and we want to use the node teasers.  If, however, we wanted to use content directly from the Chado organism table then we would have selected '''Chado Organism'''.  The settings above also specify a page title, a filter to include only nodes (pages) of the Organism type, a URL for the page and details for adding the page to the main menu of the site.
+
 
+
Now we see the same form we saw before when editing an existing view, but there are no fields, sort criteria, or filters for this view.  The first thing we want to do is indicate to the view that we will be using Node teasers.  To do this, click the '''Row style''' link in the '''Basic Settings''' section.  A box appears towards the bottom with radio buttons.  Click the '''Save & Exit''' button.  There should now be a link in the main menu at the top left of the page titled '''Species'''.  If we click that link we can now see our new species page:
+
 
+
[[File:Tripal2.0 views species.png]]
+
 
+
==== Editing Existing Views ====
+
Many of the basic search tools available under the '''Search Data''' menu item are Drupal Views and hence can be customized using the '''Views''' interface.  As a brief introduction to views we will examine one of these views and customize it.  In order to use Views you need a basic understanding of the Chado tables.
+
 
+
First, navigate to '''Structure''' &rarr; '''Views'''
+
 
+
[[File:Tripal2.0_views1.png]]
+
 
+
Here we see the list of the views that have already been created.  All of the Tripal views were created automatically by Tripal modules when they were enabled.  At the bottom of the page are inactive views that come by default with the Views module.  Scroll to the view titled '''Feature User Search''' and click the 'Edit' button to the right of it.  You will see the following page:
+
 
+
[[File:Tripal2.0_views_feature_search.png]]
+
 
+
On this page you will see several sections:  Title, Format, Fields, Filter Criteria, Sort Criteria, Page Settings and a few others.  This tutorial will not describe all of these settings.  There are tutorials on the web that explain these fields in more detail.  For this tutorial we will discuss only a few of them.
+
 
+
Briefly, the '''Fields''' section lists the fields that will be used for the view. For example, in this view we will be using the feature's uniquename, name, type, organism common name and a few other fields from the Chado feature table.  We also have the node ID of the Drupal node that corresponds to the feature.
+
 
+
The '''Sort criteria''' section lists the order in which results will be shown.  The results will be sorted by organism common name, feature type and feature name, in that order.
+
 
+
The '''Filters''' section provides a set of criteria for limiting which records will be shown.  We want to limit the results by common name, feature type and feature name.  The filters are used to create the search form at the top of the features search page:
+
 
+
 
+
Suppose we do not like the way our search page behaves.  Currently, for features, when search results are returned we see the unique name, name, feature type, common name of the organism, sequence length, whether the sequence is obsolete and the date it was accessioned.  We do not want the sequence length, the obsolete flag or the accession date to appear in our search results.  And, suppose we want the genus and species to be listed instead of the common name.  We can make these changes by editing the view settings.
+
 
+
First, we need to add the genus and species as fields to show.  To do this, click '''Add''' in the header of the '''Fields''' section.  In the overlay a list of available fields will appear:
+
 
+
[[File:Tripal2.0 views fields.png]]
+
 
+
In the '''Filter''' drop-down you can see the '''Groups''' of fields associated with features. These groups correspond to Chado tables. Select the group '''Chado Organism'''.  This reduces the list of fields to only those associated with the Chado organism table.  Click the checkboxes for '''Chado Organism: Genus''' and '''Chado Organism: Species''' and click '''Apply (all displays)'''.
+
 
+
Next we see a set of configuration settings for the fields we selected.  In this case, we see configuration settings for the '''Genus''' field first.
+
 
+
[[File:Tripal2.0_views_fields_configure.png]]
+
 
+
Views lets you control much of how the field is seen (or not seen) in the page results.  Here we want to leave all the defaults.  Click the '''Apply (all displays)''' button at the bottom.  The configuration settings for the '''Species''' field now appear.  Leave the defaults as well and click '''Apply (all displays)'''.  Now that we are done configuring our new fields we can preview the changes by viewing the example at the bottom of the page.  Below is a screen shot of the view after we have added our new fields:
+
 
+
[[File:Tripal2.0 views preview.png]]
+
 
+
Now, we want to remove the unwanted fields.  In the '''Fields''' section, click on the field named '''Chado Feature:  Seqlen (Sequence Length)'''.  A configuration page appears similar to what we saw previously for '''Genus''' and '''Species'''.  Check the box '''Exclude from display'''.  This will leave the field present in the list of fields but will not show it in the resulting view.  Click the '''Apply (all displays)''' button.  Do the same for the common name and is obsolete fields.  Alternatively, if we no longer want to keep a field in the view at all, we could remove it by clicking the '''Remove''' button. Our view now appears as follows:
+
 
+
[[File:Tripal2.0 views preview2.png]]
+
 
+
Next, we want to limit results to only those with synced pages.  Currently any feature present in the database is visible in the list, but we only have pages for genes and mRNA features.  To limit the results only to synced features first locate the '''Filter Criteria''' section and click the '''Add''' button in the header.  In the '''Filter''' dropdown, select '''Content'''.  The list of available fields for a Drupal node are available to select. All nodes must have a node ID.  Therefore, we can filter results to exclude those that do not have a node ID.  Look for the element titled '''Content: Nid''', select it, and click the '''Apply (all displays)''' button.  A configuration screen appears:
+
 
+
[[File:Tripal2.0 views filters.png]]
+
 
+
We want to only show features that have a page which means that the Node ID (Nid) must not be empty.  Therefore, select the '''is not empty(NOT NULL)''' from the '''operator''' dropdown and click '''Apply (all displays)'''.  Now, only synced features (those with pages) will appear in the list.
+
 
+
[[File:Tripal2.0_views_preview3.png]]
+
 
+
Next, in the view preview the '''Type Id''' dropdown field used for filtering the results by feature type contains all available feature types. However, because we have limited our view to only show features with pages we also need to limit this list.  We can easily do this by clicking the '''Chado Feature: Type Id''' field in the '''Filter Criteria''' section.  On the configuration overlay that appears, click the '''Grouped Filters''' radio button under '''Filter type to expose'''.  A new box appears that allows you to add the values that should appear in the drop down.  We will limit search capabilities to just mRNA and genes as these are the two types of features that we have pages for.  Add these two values as shown in the screen shot below and click the '''Apply (all displays)''' button.
+
 
+
[[File:Tripal2.0 views filters2.png]]
+
 
+
Now, the '''Type Id''' dropdown only contains gene and mRNA.  We can similarly limit the values that appear in the '''Organism Common Name''' dropdown as well. 
+
 
+
 
+
If we click the '''Save''' button at the top right of the page, then this will save all of the changes we have made to the view and the '''Features''' search page under '''Search Data''' will be updated.
+
 
+
We have only touched on a few of the capabilities of the Views interface.  You can create advanced looking forms and pages using Views.
+
 
+
=== Customizing The Look-and-Feel of Tripal ===
+
The default look-and-feel of data presented by Tripal is set in Drupal-style template files.  These template files can be found inside of the Tripal theme and '''theme''' folder of the Tripal Extension modules.  Drupal allows you to customize the templates.  For this tutorial we will not cover customization of template files. However, a tutorial for customizing the look-and-feel of the site using templates can be found in the [http://gmod.org/wiki/Tripal_Developer%27s_Handbook Tripal Developer's Handbook].
+
 
+
=== Using the Bulk Loader ===
+
The bulk loader is a tool that Tripal provides for loading data contained in tab-delimited files.  So far we have loaded files in standard file formats (e.g. FASTA, GFF, OBO), but Chado can support a variety of different biological data types and there are often no community standard file formats.  For example, there is no file format for importing genotype and phenotype data.  That data can be stored in the feature, stock and natural diversity tables of Chado.  As another example, there is also no file format for bulk loading of organisms.  The Bulk Loader was introduced in Tripal v1.1 and provides a web interface for building custom data loaders.  The site developer creates the bulk loader "template".  This template can then be used and re-used for any tab-delimited file that follows the format described by the template.  Additionally, bulk loading templates can be exported, allowing Tripal sites to share loaders with one another.
+
 
+
To use the bulk loader you must be familiar with the Chado database schema and have an idea for where data should be stored.  It is best practice to consult the GMOD website or consult the Chado community (via the [https://lists.sourceforge.net/lists/listinfo/gmod-schema gmod-schema mailing list]) when deciding how to store data. 
+
 
+
<font color="red">'''Note:''' The bulk loader images shown here are from Tripal v2.0-alpha which looks very similar to the bulk loader in Tripal v1.1.  There are plans to simplify the interface before a stable release of Tripal v2.  Therefore, keep in mind that the interface may change</font>
+
 
+
This tutorial will show a brief example of how to use the Tripal bulk loader to import a list of organisms and associate them with their NCBI taxonomy IDs.  The input file we will use contains the list of all '''Fragaria''' (strawberry) species in NCBI at the time of the writing of this tutorial.  The file is available here:
+
 
+
* [[File:Fragaria.txt]]
+
 
+
Download this file to the /var/www/sites/default/files directory. The quickest method is to right-click on the link above to copy the URL location, then use wget to retrieve the file:
+
 
+
<pre class="enter">
+
cd /var/www/sites/default/files
+
wget http://gmod.org/mediawiki/images/a/a9/Fragaria.txt
+
</pre>
+
 
+
This file has three columns:  NCBI taxonomy ID, genus and species:
+
 
+
<pre>
+
3747    Fragaria        x ananassa             
+
57918  Fragaria        vesca         
+
60188  Fragaria        nubicola               
+
64939  Fragaria        iinumae       
+
64940  Fragaria        moschata               
+
64941  Fragaria        nilgerrensis           
+
64942  Fragaria        viridis       
+
</pre>
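
Because the species name can itself contain a space (e.g. "x ananassa"), the columns must be split on tabs rather than on whitespace.  The short Python sketch below shows this with an in-memory sample of the first two rows.  It is only an illustration of the file layout, not part of Tripal, and the function name <code>read_taxa</code> is hypothetical.

```python
import csv
import io

# In-memory sample holding the first two rows of the file, with explicit tabs.
SAMPLE = "3747\tFragaria\tx ananassa\n57918\tFragaria\tvesca\n"


def read_taxa(handle):
    """Parse rows of: NCBI taxonomy ID <TAB> genus <TAB> species."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Trailing whitespace and any extra empty columns are ignored.
        taxid, genus, species = (value.strip() for value in fields[:3])
        rows.append({"taxid": taxid, "genus": genus, "species": species})
    return rows


taxa = read_taxa(io.StringIO(SAMPLE))
```

To read the downloaded file instead of the sample, the same function can be given an open file handle, for example <code>open("Fragaria.txt", newline="")</code>.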
+
 
+
We want to add these species to Chado, and we want to associate the NCBI taxonomy with these organisms.  The first step is to decide where in Chado these data should go.  In Chado, organisms are stored in the '''organism''' table.  This table has the following fields:
+
 
+
{| border="1" cellpadding="3"
+
|+ organism Structure
+
|-
+
! FK
+
! Name
+
! Type
+
! Description
+
|- class="tr0"
+
|
+
| organism_id
+
| serial
+
| '' PRIMARY KEY ''
+
|- class="tr1"
+
|
+
| abbreviation
+
| character varying(255)
+
| '' ''
+
|- class="tr0"
+
|
+
| genus
+
| character varying(255)
+
| '' UNIQUE#1 NOT NULL ''
+
|- class="tr1"
+
|
+
| species
+
| character varying(255)
+
| '' UNIQUE#1 NOT NULL ''<br /><br />A type of organism is always uniquely identified by genus and species. When mapping from the NCBI taxonomy names.dmp file, this column must be used where it is present, as the common_name column is not always unique (e.g. environmental samples). If a particular strain or subspecies is to be represented, this is appended onto the species name. Follows standard NCBI taxonomy pattern.
+
|- class="tr0"
+
|
+
| common_name
+
| character varying(255)
+
| '' ''
+
|- class="tr1"
+
|
+
| comment
+
| text
+
| '' ''
+
|}
+
 
+
We can therefore store the second and third columns of the tab-delimited input file in the '''genus''' and '''species''' columns of the organism table.  In order to store an external database reference (such as the NCBI Taxonomy ID) we need to use the following tables:  '''db''', '''dbxref''', and '''organism_dbxref'''.  The '''db''' table will house the entry for the NCBI Taxonomy; the '''dbxref''' table will house the entry for the taxonomy ID and the '''organism_dbxref''' table will link the taxonomy ID stored in the '''dbxref''' table with the organism housed in the '''organism''' table.  For reference, the fields of these tables are as follows:
+
 
+
 
+
{| border="1" cellpadding="3"
+
|+ db Structure
+
|-
+
! F-Key
+
! Name
+
! Type
+
! Description
+
|- class="tr0"
+
|
+
| db_id
+
| serial
+
| '' PRIMARY KEY ''
+
|- class="tr1"
+
|
+
| name
+
| character varying(255)
+
| '' UNIQUE NOT NULL ''
+
|- class="tr0"
+
|
+
| description
+
| character varying(255)
+
| '' ''
+
|- class="tr1"
+
|
+
| urlprefix
+
| character varying(255)
+
| '' ''
+
|- class="tr0"
+
|
+
| url
+
| character varying(255)
+
| '' ''
+
|}
+
 
+
 
+
{| border="1" cellpadding="3"
+
|+ dbxref Structure
+
|-
+
! F-Key
+
! Name
+
! Type
+
! Description
+
|- class="tr0"
+
|
+
| dbxref_id
+
| serial
+
| '' PRIMARY KEY ''
+
|- class="tr1"
+
|
+
[[Chado_Tables#Table:_db| db]]
+
| db_id
+
| integer
+
| '' UNIQUE#1 NOT NULL ''
+
|- class="tr0"
+
|
+
| accession
+
| character varying(255)
+
| '' UNIQUE#1 NOT NULL ''<br /><br />The local part of the identifier. Guaranteed by the db authority to be unique for that db.
+
|- class="tr1"
+
|
+
| version
+
| character varying(255)
+
| ''<nowiki> UNIQUE#1 NOT NULL DEFAULT ''::character varying </nowiki>''
+
|- class="tr0"
+
|
+
| description
+
| text
+
| '' ''
+
|}
+
 
+
 
+
{| border="1" cellpadding="3"
+
|+ organism_dbxref Structure
+
|-
+
! FK
+
! Name
+
! Type
+
! Description
+
|- class="tr0"
+
|
+
| organism_dbxref_id
+
| serial
+
| '' PRIMARY KEY ''
+
|- class="tr1"
+
|
+
[[Chado_Tables#Table:_organism| organism]]
+
| organism_id
+
| integer
+
| '' UNIQUE#1 NOT NULL ''
+
|- class="tr0"
+
|
+
[[Chado_Tables#Table:_dbxref| dbxref]]
+
| dbxref_id
+
| integer
+
| '' UNIQUE#1 NOT NULL ''
+
|}
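
To see how one row of the input file spreads across these four tables, the following Python sketch rebuilds a cut-down version of them in an in-memory SQLite database and loads a single species.  This is a sketch under stated assumptions, not how the bulk loader works internally: Chado runs on PostgreSQL, only the columns needed for the example are kept, the function name <code>load_row</code> is hypothetical, and the database name <tt>NCBITaxon</tt> is an assumed value for the '''db''' entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the Chado tables described above.
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,
    genus   TEXT NOT NULL,
    species TEXT NOT NULL,
    UNIQUE (genus, species));
CREATE TABLE db (
    db_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE);
CREATE TABLE dbxref (
    dbxref_id INTEGER PRIMARY KEY,
    db_id     INTEGER NOT NULL REFERENCES db (db_id),
    accession TEXT NOT NULL,
    version   TEXT NOT NULL DEFAULT '',
    UNIQUE (db_id, accession, version));
CREATE TABLE organism_dbxref (
    organism_dbxref_id INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism (organism_id),
    dbxref_id   INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
    UNIQUE (organism_id, dbxref_id));
""")


def load_row(conn, db_name, taxid, genus, species):
    """Store one (taxid, genus, species) row, reusing records that exist."""
    cur = conn.cursor()
    # db: one record for the external database itself.
    cur.execute("INSERT OR IGNORE INTO db (name) VALUES (?)", (db_name,))
    db_id = cur.execute(
        "SELECT db_id FROM db WHERE name = ?", (db_name,)).fetchone()[0]
    # organism: genus and species come from columns two and three.
    cur.execute("INSERT OR IGNORE INTO organism (genus, species) VALUES (?, ?)",
                (genus, species))
    organism_id = cur.execute(
        "SELECT organism_id FROM organism WHERE genus = ? AND species = ?",
        (genus, species)).fetchone()[0]
    # dbxref: the taxonomy ID is the accession within that database.
    cur.execute("INSERT OR IGNORE INTO dbxref (db_id, accession) VALUES (?, ?)",
                (db_id, taxid))
    dbxref_id = cur.execute(
        "SELECT dbxref_id FROM dbxref WHERE db_id = ? AND accession = ?",
        (db_id, taxid)).fetchone()[0]
    # organism_dbxref: the link between the organism and its taxonomy ID.
    cur.execute("INSERT OR IGNORE INTO organism_dbxref (organism_id, dbxref_id) "
                "VALUES (?, ?)", (organism_id, dbxref_id))
    conn.commit()


load_row(conn, "NCBITaxon", "57918", "Fragaria", "vesca")
load_row(conn, "NCBITaxon", "57918", "Fragaria", "vesca")  # repeat adds nothing

linked = conn.execute("""
    SELECT o.genus, o.species, d.name, x.accession
    FROM organism o
    JOIN organism_dbxref ox ON ox.organism_id = o.organism_id
    JOIN dbxref x ON x.dbxref_id = ox.dbxref_id
    JOIN db d ON d.db_id = x.db_id
""").fetchall()
```

The unique constraints do the same job here as the UNIQUE markers in the tables above: loading the same row twice leaves exactly one record in each table.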
+
+
 
+
Before we can use the bulk loader we must enable the module. We can do this using a typical drush command:
+
 
+
<pre class="enter">
+
cd /var/www/
+
drush pm-enable tripal_bulk_loader
+
</pre>
+
 
+
To create a bulk loader template, navigate to '''Tripal''' &rarr; '''Chado Data Loaders''' &rarr; '''Bulk Loader''' &rarr; '''Templates''' and click the link '''Add Template'''. The following page appears:
+
 
+
[[File:Tripal2.0 bulk loader new.png]]
+
 
+
We need to first provide a name for our template.  Try to name templates in a way that is meaningful to others. Currently only site administrators can load files using the bulk loader, but future versions of Tripal will allow other privileged users to use the bulk loader templates.  Thus, it is important to name the templates so that others can easily identify the purpose of the template. For this example, enter the name '''NCBI Taxonomy Importer (taxid, genus, species)'''. The following page appears:
+
 
+
[[File:Tripal2.0 bulk loader create1.png]]
+
 
+
=== Advanced Features ===
+
==== The Tripal Bulk Loader ====
+
 
+
 
+
 
+
The Tripal Bulk Loader is a new feature added to version 1.0.  Often, data is not in common formats such as GFF, FASTA, GAF, InterPro XML, etc., but rather in Excel spreadsheets or tab-delimited or comma-separated files. The goal of the bulk loader is to enable a user to load data in these formats into the Chado schema. Currently, the bulk loader allows a site administrator to create custom loader templates that will allow a user to load tab-delimited files of any format.
+
 
+
Using the bulk loader web interface, the privileged user creates a "template" for loading a tab-delimited file.  This template specifies the fields in the Chado tables where the values in the tab-delimited file will be stored.  Once the template is fully defined, the privileged user saves the template for other users to use.  Another user can then load any tab-delimited file that matches the template.  The user can upload as many files as desired.
+
 
+
==== Creating Custom Modules ====
+
 
+
As mentioned early in the Tutorial, Tripal is a modular software package.  A Tripal API has been developed to help others who want to extend the functionality of Tripal. Anyone is welcome to develop modules for Tripal to suit their own needs and perhaps share them back with the community. The Tripal API can be found on the home page: http://tripal.sourceforge.net/. Information for developing new modules can be found in the [[Tripal Developer's Handbook]].
+
 
+
 
+
Modules that conform to the Tripal API and Drupal coding standards will be officially approved by the Tripal Developers Consortium.  These modules will be listed on the Tripal website and will be available in the Drupal module repository for download.
+
 
+
 
+
Anyone wishing to extend Tripal should sign up for the developers mailing list https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel and try to attend one of the monthly developer's meetings to discuss the desired extensions.
+

Latest revision as of 02:10, 29 August 2014

The Tripal v2.0 tutorial is now found at the Tripal website inside of the Tripal User's Guide.