GMOD Malaysia 2014/Tripal Tutorial

This Tripal tutorial was presented by Scott Cain at GMOD Malaysia 2014, February 2014. This tutorial requires Tripal version 1.1.

The most recent Tripal tutorial can be found at the Tripal Tutorial page.

This tutorial uses the AWS AMI ‘ named ‘ in the ‘

.

Welcome to the Tripal v1.1 Tutorial. Here you will find instructions for installation, usage and administration of a Tripal-based genome website. This tutorial guides the user through the process of installation, setup and data loading of genomic feature data and annotations.

Note: Tripal is provided free of charge, as-is, with no warranty or guarantee of fitness. The developers are committed to creating a platform usable by all and as bug-free as possible. However, bugs may be present, especially in the newest features. If you find problems or bugs, please report them either via the Tripal mailing list or by adding a bug report to the Tripal issue tracker.

What is Tripal

Tripal is a suite of PHP5 modules that bridges the Drupal Content Management System (CMS) and GMOD Chado. The goal is to simplify construction of a community genomics website so that individual labs or research communities can build a high-quality, standards-based website for data sharing and collaboration.

600px-WhatisTripal.png

Content Management System

Definition From Wikipedia:

A content management system (CMS) is the collection of procedures used to manage work flow in a collaborative environment. These procedures can be manual or computer-based. The procedures are designed to do the following:

In a CMS, data can be defined as nearly anything: documents, movies, pictures, phone numbers, scientific data, and so forth. CMSs are frequently used for storing, controlling, revising, semantically enriching, and publishing documentation. Serving as a central repository, the CMS increases the version level of new updates to an already existing file. Version control is one of the primary advantages of a CMS.

Drupal

Drupal is an open-source freely available CMS with thousands of users and existing sites. Features of Drupal

Drupal website: http://www.drupal.org
Drupal modules: http://www.drupal.org/project/modules
Drupal themes: http://www.drupal.org/project/themes

Tripal v1.1 is compatible with Drupal v6 and is the final major release compatible with Drupal 6. Later releases will be compatible with Drupal v7.

Chado

You can find more detailed information about Chado here: Chado - Getting Started. However, one thing to remember with regard to Tripal’s organization is that Chado has a modular structure:

Tripal is also modular along these same designations.

Goals of Tripal

Structure of Tripal

Tripal is a collection of modules that integrate with Drupal. These modules are divided into hierarchical categories:

TripalLayers.png

The Tripal Core level contains supportive functionality for all other modules. It contains

The Chado-centric modules provide:

Analysis modules provide

Applications:

Sites Running Tripal

Site Name | URL
Banana Genome Hub | http://banana-genome.cirad.fr/
Pulse Crops Genomics & Breeding | http://knowpulse2.usask.ca/portal/
Genome Database for Vaccinium | http://www.vaccinium.org
Genome Database for Rosaceae | http://www.rosaceae.org
Cool Season Food Legume Database | http://www.gabcsfl.org
Cacao Genome Database | http://www.cacaogenomedb.org
Fagaceae Genome Web | http://www.fagaceae.org
Citrus Genome Database | http://www.citrusgenomedb.org
Marine Genomics Project | http://www.marinegenomics.org

Resources

The Tripal home site where you can find everything about Tripal: http://tripal.info

GMOD Tripal mailing lists: http://gmod.org/wiki/GMOD_Mailing_Lists

GMOD Tutorials from previous GMOD schools: http://gmod.org/wiki/Tripal

Contributing Organizations

Individuals from these organizations have provided design and coding for Tripal v1.1

150px-USLogo.png 150px-WSULogo.png

Acknowledgments are extended to the Clemson University Genomics Institute where Tripal was first conceived and where development of earlier releases was performed.

150px-CUGILogo.png

Special thanks are also extended to the GMOD project for logistical support and community interaction.

Funding

Funding for Tripal v1.0 has been provided through various grants from various sources.

Publications

  1. Lacey-Anne Sanderson, Stephen P. Ficklin, Chun-Huai Cheng, Sook Jung, Frank A. Feltus, Kirstin E. Bett, Dorrie Main. Tripal v1.1: a Standards-based Platform for Construction of Online Genetic and Genomic Databases. Submitted for review June 2013.
  2. Stephen P. Ficklin, Lacey-Anne Sanderson, Chun-Huai Cheng, Margaret Staton, Taein Lee, Il-Hyung Cho, Sook Jung, Kirstin E Bett, Dorrie Main. Tripal: a construction Toolkit for Online Genome Databases. Database, Sept 2011. Vol 2011.

Pre-planning

IT Infrastructure

Tripal requires a server with adequate resources to handle the expected load, plus the systems administration skills to get the machine up and running, install the applications, and secure everything properly. Tripal requires a PostgreSQL database server, an Apache (or equivalent) web server, PHP5, and several configuration options to make it all work. Once these prerequisites are met, however, working with Drupal and Tripal is quite easy.

There are four ways you could make a Tripal/Drupal/Chado database web server available for your site:

  1. Option #1 In-house dedicated servers: You may have access to servers in your own department or group over which you have administrative control, and wish to install Tripal/Drupal/Chado on these.
  2. Option #2 Institutional IT support: Your institution may provide IT servers and support your efforts to install a website with a database backend.
  3. Option #3 Commercial web-hosting: If options #1 and #2 are not available to you, commercial web-hosting is an affordable option. For large databases you may require a dedicated server. Bluehost.com is a web hosting service that provides hosting compatible with Drupal, Tripal and its dependencies.
  4. Option #4 In the Cloud: Tripal is a part of the GMOD in the cloud Amazon AWS image created by GMOD. It is also accompanied by other GMOD tools such as GBrowse2, JBrowse, Apollo and WebApollo.

After selection of one of the options above you can arrange your database/webserver in the following ways:

  1. Arrangement #1: The database and web server are housed on a single server. This is the approach taken by this course. It is necessary to gain access to a machine with enough memory (RAM), hard disk speed and space, and processor power to handle both services.
  2. Arrangement #2: The database and web server are housed on different servers. This provides dedicated resources to each service (i.e. web and database).

Selection of an appropriate machine

Databases are typically bottle-necked by RAM and disk speed. Selection of the correct balance of RAM, disk speed, disk size and CPU speed is important and dependent on the size of the data. The best advice is to consult an IT professional who can recommend a server installation tailored for the expected size of your data.

Note: Tripal does require command-line access to the web server with adequate local file storage for loading of large data files. Be sure to check with your service provider to make sure command-line access is possible.

Technical Skills

Depending on your needs, you may need additional technical support.

Tripal already supports my data. What personnel do I need to maintain it?

Tripal does not yet support all of my data, but I want to use what’s been done and expand on it?

Why Use Tripal

Tripal v1.1 provides default views for most Chado data types. It also supports all of Chado in terms of data access. So why use Tripal?

  1. You want to use a community-supported common database infrastructure (i.e. Chado).
  2. You need a web interface but do not want to build one from scratch.
  3. You need content-management capabilities (distributed content editing, user management, social networking… i.e. Drupal)
  4. You want to contribute to a community effort to help build a tool others can use.
  5. You want to participate in a community with other database developers using the same technology and confronting similar problems.
  6. You want to use open-source and free technology!

Development and Production Instances

It is recommended that you have separate development and production instances of Tripal. The staging or development instance allows you to test new functionality, add customizations, or test modifications or additions to data without disturbing the production instance. The production instance serves content to the rest of the world. Once you are certain that customizations and new functionality work well on the development instance, you can re-implement or copy them over to the production site. Sometimes it may take a few trials to load data in the way you want. A development site lets you take time to test data loading prior to making it public. The development site can be password-protected to allow access only to site administrators, developers or collaborators.

Server Installation

The following instructions are for setting up Tripal on Ubuntu 12.04 server edition. Where users of other Linux versions have provided feedback, alternative command-line statements have been added to this tutorial. Unless specifically identified, all commands are for Ubuntu 12.04 Linux.

During installation of the Ubuntu 12.04 server please select the following software packages for installation:

After installation the Ubuntu Unity Desktop can be installed. For the virtual machine image that accompanies this tutorial, the following command was issued to install the desktop:

sudo apt-get install ubuntu-desktop

Reboot your server after installation of the Ubuntu Desktop.

Apache Setup

Apache is the web server software and should already be installed. On the Ubuntu server, navigate to your new website using this address: http://localhost/. You should see the text “It works!”.

ItWorks.png

Enable the rewrite module for apache. This is useful so that we can use Clean URLs with Drupal. Clean URLs are not required but make the page URLs easier to use. We’ll discuss clean URLs later.

   cd /etc/apache2/mods-enabled
   sudo ln -s ../mods-available/rewrite.load
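
To see what this symbolic link does, the sketch below recreates the two directories in a scratch location and links a stand-in rewrite.load file. The scratch directory and the file contents are assumptions made so the example can be run without touching /etc/apache2.

```shell
# Mock the Apache module directories in a throwaway location.
root="$(mktemp -d)"
mkdir "$root/mods-available" "$root/mods-enabled"
echo 'LoadModule rewrite_module /usr/lib/apache2/modules/mod_rewrite.so' \
  > "$root/mods-available/rewrite.load"

cd "$root/mods-enabled"
# With no link name given, ln creates ./rewrite.load pointing at the target.
ln -s ../mods-available/rewrite.load

ls -l rewrite.load    # shows the link and its relative target
cat rewrite.load      # reading through the link prints the stand-in LoadModule line
```

On Debian and Ubuntu, `sudo a2enmod rewrite` creates the same link for the real module.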

Next we need to edit the Apache configuration file to give Drupal full control of options within the document root. Edit the /etc/apache2/sites-available/default file:

   cd /etc/apache2/sites-available/
   sudo gedit default

And change the AllowOverride option from None to All:

   <Directory /var/www/>
      Options Indexes FollowSymLinks MultiViews
      AllowOverride All
      Order allow,deny
      allow from all
   </Directory>

Now restart Apache:

sudo /etc/init.d/apache2 restart
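
The AllowOverride change can also be scripted instead of made in an editor. The sketch below is one possible approach, run against a scratch copy of a sample configuration (the sample contents are an assumption, not the exact Ubuntu file). It uses a sed address range so that only the /var/www/ stanza is changed and other Directory stanzas keep their setting.

```shell
# Work on a scratch file so the example is safe to run anywhere.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<Directory />
   Options FollowSymLinks
   AllowOverride None
</Directory>
<Directory /var/www/>
   Options Indexes FollowSymLinks MultiViews
   AllowOverride None
   Order allow,deny
   allow from all
</Directory>
EOF

# Substitute only between the /var/www/ opening tag and its closing tag.
sed -i.bak '/<Directory \/var\/www\/>/,/<\/Directory>/ s/AllowOverride None/AllowOverride All/' "$conf"

# The root stanza still says None; the /var/www/ stanza now says All.
grep -n 'AllowOverride' "$conf"
```

To apply this to a real server, point conf at the actual configuration file and run the sed command with sudo. The .bak suffix keeps a backup of the original.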

Setup PHP

PHP comes loaded onto the server, but we need a few other packages. On Ubuntu, install them using the apt-get utility:

   sudo apt-get install php5-pgsql
   sudo apt-get install php5-gd

For newer versions of Ubuntu (e.g. 13.10) you will also want to install the php5-json package:

   sudo apt-get install php5-json

For SUSE Linux you may need to install the php-posix package:

   yum install php-posix

For RedHat Linux you may also need to install the php-process package:

   yum install php-process

Change some php settings (as root):

   cd /etc/php5/apache2
   sudo gedit php.ini

Set the memory_limit to something larger than 128M. The value should not exceed physical memory, so be conservative, but not overly so:

  memory_limit = 2048M

Now, restart the webserver:

  sudo /etc/init.d/apache2 restart
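
As a rough aid for choosing a value, the sketch below compares a candidate memory_limit (in megabytes) against the machine's physical memory. The helper name limit_fits and the use of /proc/meminfo (Linux only) are assumptions of this example, not part of PHP or Tripal.

```shell
# limit_fits LIMIT_MB TOTAL_MB: succeed when the limit is below total RAM.
limit_fits() {
  [ "$1" -lt "$2" ]
}

candidate_mb=2048

# MemTotal in /proc/meminfo is reported in kB; convert it to MB.
if [ -r /proc/meminfo ]; then
  total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
  if limit_fits "$candidate_mb" "$total_mb"; then
    echo "memory_limit = ${candidate_mb}M fits within ${total_mb} MB of RAM"
  else
    echo "memory_limit = ${candidate_mb}M exceeds ${total_mb} MB of RAM; choose a smaller value"
  fi
fi
```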

Install phpPgAdmin

phpPgAdmin is a nice web-based utility for easy administration of a PostgreSQL database. Note: PhpPgAdmin is not required for successful operation of Tripal but is very useful.

   sudo apt-get install phppgadmin

Next, we need to make changes to the configuration settings so that we can remotely access phppgadmin. To do this, edit the phppgadmin config file for apache:

   cd /etc/apache2/conf.d
   sudo gedit phppgadmin

Now, comment out the line that allows access to the local server only, and uncomment the line that allows access to anyone.

#allow from 127.0.0.0/255.0.0.0 ::1/128
allow from all

We also want to password-protect PhpPgAdmin using Apache’s access control mechanisms. To do this, add the following lines within the Directory stanza just below the line we just uncommented:

AuthType Basic
AuthName "Password Required"
AuthUserFile /usr/share/phppgadmin/.htpasswd
Require user tripaladmin

The lines above instruct apache to use basic authentication, that the password file is located at /usr/share/phppgadmin/.htpasswd and the only user allowed to login is ‘tripaladmin’. Save the configuration file. Next we need to create the password file:

   cd /usr/share/phppgadmin
   sudo htpasswd -c .htpasswd tripaladmin

The htpasswd command above will create the .htpasswd file and add the new user ‘tripaladmin’. You will need to set a password when requested. Finally, restart the webserver:

sudo /etc/init.d/apache2 restart

Now navigate to the URL http://localhost/phppgadmin and you should see the following:

Phppgadmin.png

The username ‘tripaladmin’ and the password you specified should be required when accessing the PhpPgAdmin page.

Database Setup

Drupal can run on a MySQL or PostgreSQL database, but Chado prefers PostgreSQL, so that is what we will use for both Drupal and Chado. We need to create the Drupal database. The following commands can be used to create a new database user and database.

First, become the ‘postgres’ user:

sudo su - postgres

Next, create the new ‘drupal’ user account. This account will not be a ‘superuser’ nor be allowed to create new roles, but should be allowed to create a database.

createuser -P drupal

When requested, enter an appropriate password:

  Enter password for new role: *********
  Enter it again:  *********
  Shall the new role be a superuser? (y/n) n
  Shall the new role be allowed to create databases? (y/n) y
  Shall the new role be allowed to create more new roles? (y/n) n

Finally, create the new database:

createdb drupal -O drupal

We no longer need to be the postgres user, so exit:

exit
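
For scripted installs, the interactive prompts above can be replaced by SQL. The sketch below only writes the statements to a scratch file and prints them. The password placeholder and the file location are assumptions of this example. Running the file as the postgres user with `psql -f` would create the same role and database as the createuser and createdb commands.

```shell
sql_file="$(mktemp)"

# NOSUPERUSER / CREATEDB / NOCREATEROLE mirror the n / y / n answers above.
# 'change-me' is a placeholder password; substitute your own.
cat > "$sql_file" <<'EOF'
CREATE ROLE drupal LOGIN NOSUPERUSER CREATEDB NOCREATEROLE PASSWORD 'change-me';
CREATE DATABASE drupal OWNER drupal;
EOF

cat "$sql_file"
```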

Install Drupal

Software Installation

We want to install Drupal into our web document root (/var/www). We want to be able to do this as our ‘ubuntu’ user. So, first, set the directory permissions to allow this:

  cd /var
  sudo chown -R ubuntu www
  sudo chgrp -R ubuntu www

In the commands above we set the owner and group of the directory to be ubuntu (our user and group).

Tripal currently requires version 6.x of Drupal. Drupal can be freely downloaded from the http://www.drupal.org website. At the writing of this Tutorial the most recent version of Drupal 6 is version 6.28. The software can be downloaded manually from the Drupal website through a web browser or we can use the ‘wget’ command to retrieve it:

   cd /var/www
   wget http://ftp.drupal.org/files/projects/drupal-6.28.tar.gz

Note: The current version of Drupal is Drupal 7.x. The major release v1.1 of Tripal is the final major release that will be compatible with Drupal 6.x. Future major releases of Tripal will be compatible with Drupal 7.x.

Next, we want to install Drupal. We will use the tar command to uncompress the software:

  cd /var/www
  tar -zxvf drupal-6.28.tar.gz

Notice that we now have a drupal-6.28 directory with all of the Drupal files. We want the Drupal files to be in our document root, not in a ‘drupal-6.28’ subdirectory. So, we’ll move the contents of the directory up one level:

mv drupal-6.28/* ./
mv drupal-6.28/.htaccess ./
mv index.html index.html.orig

Note: It is extremely important that the hidden file .htaccess is also moved (note the second ‘mv’ command above). Check to make sure this file is there:

   ls -l .htaccess

Notice that the last of the three mv commands renames the index.html file to index.html.orig. The index.html file was serving as the home page for the website. Drupal uses an index.php page for its home page, but the web server gives preference to the index.html page. So, we move it out of the way.
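
The separate mv for .htaccess is needed because the shell’s * pattern does not match names that begin with a dot. The sketch below reproduces this in a scratch directory with empty stand-in files (the layout is only a mock of the Drupal tarball):

```shell
work="$(mktemp -d)"
mkdir "$work/drupal-6.28"
touch "$work/drupal-6.28/index.php" "$work/drupal-6.28/.htaccess"

cd "$work"
mv drupal-6.28/* ./            # moves index.php, skips the hidden file
ls -A drupal-6.28              # .htaccess is still in the subdirectory

mv drupal-6.28/.htaccess ./    # move the hidden file explicitly
ls -A                          # .htaccess and index.php now sit beside the emptied subdirectory
```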

Configuration File

Next, we need to tell Drupal how to connect to our database. To do this we have to setup a configuration file. Drupal comes with an example configuration file which we can borrow.

First navigate to the location where the configuration file should go:

  cd /var/www/sites/default/

Next, copy the example configuration that already exists in the directory to be our actual configuration file by renaming it to settings.php.

  cp default.settings.php settings.php

Now, we need to edit the configuration file to tell Drupal how to connect to our database server. To do this we’ll use gedit, an easy-to-use text editor:

  gedit settings.php

Find the variable $db_url and set it to this

  $db_url = 'pgsql://drupal:********@localhost/drupal';

Replace the text ‘********’ with the database password for the user ‘drupal’ created previously.

Final directory creation

Finally, we need to create three new directories. The first is the files directory which Drupal uses for storing uploaded files.

  cd /var/www/sites/default
  mkdir files
  sudo chown ubuntu:www-data files
  sudo chmod g+rw files

The above commands create the directory and set its group to that of the web server (i.e. www-data) with read/write permissions. This way the web server can write to the directory, but so can we.

Also, we need to create two new directories, one for storing module files we’ll be installing and another for themes which we’ll also be installing later:

  cd /var/www/sites/all
  mkdir modules
  mkdir themes

Compatibility with other tools

We want to ensure that our Drupal installation doesn’t interfere with other web-based tools, such as GBrowse. We need to update a setting in the .htaccess file that came with Drupal so that it instructs the web server to look for both index.php and index.html files when serving pages.

Use ‘gedit’ to modify the /var/www/.htaccess file.

   cd /var/www
   gedit .htaccess

Locate the DirectoryIndex line and change it to match the following:

  DirectoryIndex index.php index.html

Web-based Steps

Navigate to the installation page of our new web site http://localhost/install.php

800px-Install1.png

Click the link in the middle section that reads Install Drupal in English

800px-Tripal Install2.png

When the progress bar completes, the page will switch to a configuration page with some final settings.

800px-Tripal-install3.png

Set the following

Now, click the Save and Continue button. You will see a message about being unable to send an email. This is safe to ignore as email capabilities are not fully enabled on this VMWare image. Your site is now enabled. Click the link Your new site:

800px-Tripal Install4.png

Drupal Cron Entry

The last step for installing Drupal is setting up the automated cron entry. Cron is a UNIX facility for scheduling jobs to run at specific intervals, and Drupal uses it to execute necessary housekeeping tasks automatically at a regular interval.

Drupal itself requires an entry in the crontab to function. To edit the crontab, launch the crontab editor:

  sudo crontab -e

The crontab opens in a terminal text editor such as nano.

Add this line to the crontab

  0,30 * * * * /usr/bin/wget -O - -q http://localhost/cron.php > /dev/null

Now save the changes. We have now added a UNIX cron job that will occur every 30 minutes that will execute the cron.php script and cause Drupal to perform housekeeping tasks.
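
To read the schedule: a crontab entry starts with five time fields (minute, hour, day of month, month, day of week) followed by the command to run. The sketch below splits the entry used above into those fields. It only parses the string and does not touch the crontab; the variable names are illustrative.

```shell
entry='0,30 * * * * /usr/bin/wget -O - -q http://localhost/cron.php > /dev/null'

# Disable globbing so the literal * fields are not expanded to file names.
set -f
set -- $entry
set +f

minute=$1 hour=$2 dom=$3 month=$4 dow=$5
shift 5
job="$*"

echo "minute:       $minute   (at :00 and :30 of the hour)"
echo "hour:         $hour"
echo "day of month: $dom"
echo "month:        $month"
echo "day of week:  $dow"
echo "command:      $job"
```

A * in a field means "every value", so this entry runs at minute 0 and minute 30 of every hour on every day.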

Drush

Drush is a command-line utility that allows for non-graphical access to the Drupal website. You can use it to automatically download and install themes and modules, clear the Drupal cache, upgrade the site and more. Tripal v1.0 supports Drush. For this tutorial we will use Drush and therefore we want the most recent version installed. Drush can be found on the Drupal website at http://drupal.org/project/drush.

To install Drush, first retrieve the most recent version from its Drupal project page. The current version at the writing of this document is version 7.x-5.9. While this version is intended for use with Drupal 7, it is backwards compatible with Drupal 6 and provides the most functionality.

We want Drush to reside in /usr/local which is where 3rd party software is normally installed. We’ll download the package to /usr/local/src and uncompress into /usr/local:

  cd /usr/local/src
  sudo wget http://ftp.drupal.org/files/projects/drush-7.x-5.9.tar.gz
  cd /usr/local
  sudo tar -zxvf src/drush-7.x-5.9.tar.gz

Next, we want the operating system to know about Drush. We’ll create a symbolic link to the Drush executable in the /usr/local/bin directory, where the operating system looks for executables:

  sudo ln -s /usr/local/drush/drush /usr/local/bin/drush

Finally, Drush needs to perform updates the first time it is run, so we’ll run it with elevated privileges (using sudo) so that it can perform its updates. After that, we no longer need ‘sudo’ to run drush:

  sudo drush

You must always run drush commands within the Drupal installation. It does not matter which subdirectory you are in, so long as you are within the Drupal directory structure. To see a list of available commands type the following:

  cd /var/www/
  drush

Explore Drupal

User Account Page

All users have an account page. Currently, we are logged in as the administrator. The account page is simple for now. Click the My account link on the left sidebar. You’ll see a brief history for the user and an Edit tab. Users can edit their own information using the edit interface:

800px-ExplorDrupal1.png

Creating Content

Creation of content in Drupal is very easy. Click the Create content link on the left sidebar.

800px-ExplorDrupal2.png

You’ll see two content types that come default with Drupal: Page and Story. Here is where a user can add simple new pages to the website without knowledge of HTML or CSS. Click the Page content type to see the interface for creating a new page:

800px-ExploreDrupal3.png

You’ll notice at the top a Title field and a Body text box. All pages require a title and typically have some sort of content entered in the body. Additionally, there are other options that allow someone to enter HTML if they would like, save revisions of a page to preserve a history and to set authoring and publishing information.

800px-ExploreDrupal4.png

For practice, try to create two new pages: a Home page and an About page for our site. First create the Home page and then create the About page. Add whatever text you like for the body.

Site Administration

Content Management

There are many options under the Administer link on the left sidebar. Here you can manage the site setup, monitor and control content, manage users and view reports.

800px-DrupalAdmin1.png

We will not explore all of the options here but will visit a few of the more important ones for this tutorial. First, click the Content Management link on the left sidebar. You’ll see different options.

800px-DrupalAdminContent.png

Click the Content link. The page shows all content available on the site. You will see the “About” and “Home” pages you created previously:

800px-DrupalContent.png

You’ll also notice a set of drop down boxes for filtering the content. For sites with many different content types and pages this helps to find content. You can use this list to click to view each page or to edit.

Site Building

Modules

Click the Site Building link on the left sidebar under the Administer link. You’ll see several new menu options: Blocks, Menus, Modules and Themes. First click Modules:

800px-AdminModules.png

Here is where you will see the various modules that make up Drupal. Take a minute to scroll through the list of these and read some of the descriptions. The modules you see here are core modules that come with Drupal. Those that are checked come pre-enabled. Those that are not checked we will need to install. For this tutorial we will need two additional modules that are not yet installed. Locate the modules Path and Search and check the box next to each of those. Scroll to the bottom and click ‘Save configuration’.

The Path and Search modules are now installed. The Search module enables site-wide searching capabilities for our site and the Path module enables alternative naming of page URLs (we will discuss later).

Themes

Next, click the Themes link under Administer → Site Building on the left sidebar.

800px-DrupalThemes.png

Here, you’ll see a list of themes that come with Drupal by default. If you scroll down you’ll see that one theme named Garland is enabled and set as default. The site currently uses the Garland theme. Change the theme by checking the Enable checkbox and the default radio button for the Pushbutton theme and then clicking Save configuration. Now you’ll see that the theme has changed.

800px-DrupalThemes2.png

Blocks

Blocks in Drupal are used to provide content in regions of a Drupal theme. For example, navigate to Administer → Site Building → Blocks.

You’ll see that regions of the theme have been identified. Within the Sky theme you can see the regions with dashed lines around them. Also, you’ll see a list of available blocks. You can select where blocks will appear by selecting the region in the drop down list. Blocks may also be hidden, if desired, by selecting <none> in the dropdown.

Drupal blocks1.png

Take time to turn blocks on and off to see where they appear. Re-arrange blocks by dragging and dropping the cross-hairs beside each one. Be sure to leave the blocks in the configuration shown in the image below when finished:

Drupal blocks2.png

Menus

Drupal provides an interface for working with menus, including adding new menu items to an existing menu or creating new menus. For the exercise in the Blocks section above we added the Primary links menu to the Content top section of the Sky theme. To view the Primary links menu, navigate to Administer → Site building → Menus.

Drupal menus1.png

Select the menu Primary links. You’ll see it currently has no items.

Drupal menus2.png

As a demonstration for working with menus we’ll add two menu items for the Home and About pages we created earlier. To do so, click the Add item tab. You will see a form for providing information about the menu item to be added.

Drupal menus3.png

The first field is the path. We need to find the path for our home page.

The path for a page can be found in the address bar for the page. In Drupal, pages of content are generally referred to as nodes. So, in the address bar for our home page you’ll see the address is http://localhost/node/1. Our about page should be http://localhost/node/2 (i.e. the first and second pages we created).

The path for each of these nodes is simply node/1 and node/2. Returning to our tab where we are adding a menu item, enter the path node/1. We will set the fields in this way:

The settings above will give the menu link a title of Home and put it on the Primary Links menu. We now have a Home menu item at the top just under the header, and our Home menu item now appears in the list of menu items for the Primary Links menu.

Druapl menus4.png

Use the instructions above to add a second menu item for our About page. Use the ‘weight’ value so that our Home link appears first and the About link appears second.

URL Path

As mentioned previously, the URL paths for our pages have node/1 and node/2 in the address. This is not very intuitive for site visitors. Earlier we enabled the Path module. This module will allow us to set a more human-readable path for our pages.

To set a path, click on our new About page in the new menu link at the top and click the Edit tab. Scroll to the bottom of the edit page and you’ll see a section titled URL path setting. Click to open this section.

Drupal url.png

Since this is our About page, we simply want the URL to be http://localhost/about. To do this, just add the word about in the text box. You will now notice that the URL for this page is no longer http://localhost/node/2 but http://localhost/about. Both links will still get you to our About page, however.

Now, use the instructions described above to set a path of ‘home’ for our home page.

Site Configuration

There are many options under the Administer → Site configuration page. Here we will look at only one of these for the moment: the Site Information page.

Drupal config.png

Here you will find the configuration options we set when installing the site. You can change the site name and add a slogan, mission and footer text to the site. Towards the bottom there is a text box titled Default front page. This is where we can tell Drupal to use the Home page we created as the first page visitors see when they view the site. In this text box enter the text node/1. This is the address of our home page. We must use the node number here and not the URL path of ‘home’ that we just created. Let’s change the name of our site from Tripal demo to My Community Genome Database and add a slogan: Resources for Community Genomics.

Now, click the Save configuration button at the bottom. You’ll see our site name has changed at the top. Also, if we click the logo image at the top of the site, it will take us to the front page with our new home page appearing.

User Accounts

For this tutorial, we will not discuss in depth the user management infrastructure except to point out:

Explore the Drupal User Management menu to see how users can be created and added to roles with specific permissions.

Prepare Drupal for Tripal

Theme Installation

Drupal allows us to install new themes. Installation of themes involves these steps:

  1. Locate and download a theme from the Drupal website (http://www.drupal.org/themes)
  2. Unpack the theme in the /var/www/sites/all/themes directory
  3. Return to the Drupal Administer → Site Building → Themes page and enable the theme

For this tutorial, we will use the Sky theme which is available from http://drupal.org/project/sky. We can use the drush utility to download the theme

  cd /var/www/sites/all/themes
  drush pm-download sky

This should unpack the theme for us. Now, navigate to Administer → Site Building → Themes and enable the ‘Sky’ theme and set it as default:

800px-DrupalThemes3.png

The sky theme was obtained at this address: http://drupal.org/project/sky

Theme Configuration

Here we return to theming. There are several configuration options that are available to help customize the theme for your site. These can be found by navigating to the Administer → Site Building → Themes page and clicking the Configure tab near the top.

Appearing under the Configure link will be a small menu listing every theme we have enabled. You should see the Sky theme at the end of this list. Click that theme because that is the one we are using and want to configure:

800px-DrupalSkyTheme.png

Here you can turn on and off the presence of the logo, site name, slogan, mission statement, etc. For this particular theme we can also adjust background colors and images, link colors, font style and size, and more. Notice that we added a slogan in a previous step but it did not appear anywhere on the site. To make it appear, check the box next to Slogan.

Also set the following for the theme:

Then, click the Save Configuration button at the bottom. The page is now a bit wider and our slogan appears.

3rd Party Modules

We can install new extension modules which we will need later. For this workshop we have several modules that we will need to install but which do not yet appear in the list of modules. To do this, we must follow these steps:

  1. Locate the extension modules from the Drupal website
  2. Retrieve the module using a drush command.
  3. Check for a README.txt or INSTALL.txt for any further instructions for installation of the module
  4. Return to the Drupal Administer -> Site Building -> Modules page and enable the module.

For an example, let’s install the Views module needed for this workshop. The Views module can be found here: http://drupal.org/project/views. We will download the current version as of the writing of this tutorial:

  cd /var/www/sites/all/modules
  drush pm-download views

Check the README for additional installation instructions

  cd views
  ls
  less README.txt

Use the space bar to scroll through the README.txt file. Press the ‘q’ key to quit.

There are no other installation steps besides what we’ve done. So return to the Administer -> Site Building -> Modules page and enable the Views module.

800px-DrupalViews.png

Notice that the Views package provided three different related modules and they all appear under a Views category.

Alternatively, you can enable the module using another drush command:

  drush pm-enable views views_ui views_export

For this tutorial, CCK, Views, Views Data Export, jQuery Update, and CKEditor should also be downloaded and installed following the same instructions:

drush pm-download views
drush pm-download views_data_export
drush pm-download cck
drush pm-download jquery_update
drush pm-download ckeditor

For CKEditor, the README file indicates we need to install the CKEditor library before we can enable this module. We must first download this package.

Here is a quick command for downloading the file:

  cd /var/www/sites/all/modules/ckeditor
  wget http://download.cksource.com/CKEditor/CKEditor/CKEditor%204.1.2/ckeditor_4.1.2_standard.zip

Now unzip the package and rename it according to the instructions

  unzip ckeditor_4.1.2_standard.zip

Once all installation steps have been completed, the Views Data Export, CCK, jQuery Update and CKEditor modules can be enabled with the following commands:

   drush pm-enable views_data_export
   drush pm-enable content fieldgroup content_permissions nodereference userreference text content_copy optionwidgets number
   drush pm-enable jquery_update
   drush pm-enable ckeditor

For reference, the modules installed above can be found here:

Configure CKEditor

Next, we need to configure the CKEditor which provides the Word-style interface for adding content. Navigate to ‘Administer’ -> ‘Site Configuration’ -> ‘CKEditor’. You will see a page similar to the following:

Tripal-v1.1 ckeditor1.png

Click the ‘Edit’ link beside ‘CKEditor Global Profile’. On the page that appears, we want to expand the ‘Visibility Settings’ and switch the radio button from ‘Exclude’ to ‘Include’. Then clear all of the entries in the textbox named ‘Fields to exclude/include’:

Tripal-v1.1 ckeditor2.png

Add the following lines to the textbox you just cleared:

page@node/add/page.edit-body
chado_organism@node/add/chado-organism.edit-description
chado_organism@node/*/edit.edit-description
chado_analysis@node/add/chado-analysis.edit-description
chado_analysis@node/*/edit.edit-description

This will disable the CKEditor for all text boxes except for generic pages, organism descriptions and analysis descriptions. We can return later to add any other textareas to the list. You can find the identifier, similar to those we added to the textbox above, underneath any compatible text box. CKEditor puts the identifier under each textbox for your reference. Simply copy and paste the identifier. For example, the screenshot from the Create Page page is shown below. Notice the CKEditor identifier for the textbox: sky:page@node/add/page.edit-body. This was one of the identifiers we used in the textbox above, but with the theme name (e.g. sky) removed.

Tripal v1.1-ckeditor3.png
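Removing the theme name from an identifier is a simple string operation. The following is only an illustrative sketch, not part of CKEditor or Tripal: the function name strip_theme_prefix is hypothetical, and it assumes identifiers have the form theme:rest, as in the example above.

```shell
# Hypothetical helper: remove the leading "theme:" part of a CKEditor
# field identifier such as "sky:page@node/add/page.edit-body".
strip_theme_prefix() {
  local id="$1"
  # ${id#*:} drops the shortest prefix ending in ":"; an identifier
  # without a colon is printed unchanged.
  printf '%s\n' "${id#*:}"
}

strip_theme_prefix "sky:page@node/add/page.edit-body"
```

The printed value is the form that goes into the ‘Fields to exclude/include’ textbox.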

Click the Update global profile button. Next, under the Profiles section, click the edit link next to Default profile. When the page appears, open the Editor Appearance section and set the Toolbar by clicking the full link. Finally, click the Save button.

Tripal Installation

Get the Software

To download Tripal and the Extension modules, change to the directory where Drupal keeps its modules:

cd /var/www/sites/all/modules

To obtain Tripal, issue the following git commands:

git clone http://git.drupal.org/sandbox/spficklin/1337878.git tripal
cd tripal
git checkout 6.x-1.1
cd ../

We also want to obtain several Extension modules that will be used in this tutorial. Those modules are available on the Extensions Page of the Tripal website. However, these extension modules are also available via a git repository, so we will use git commands to obtain them.

git clone http://git.drupal.org/sandbox/spficklin/1578226.git tripal_blast_analysis
cd tripal_blast_analysis
git checkout 6.x-1.1-tripal_v1.1
cd ../

git clone http://git.drupal.org/sandbox/spficklin/1578234.git tripal_kegg_analysis
cd tripal_kegg_analysis
git checkout 6.x-1.1-tripal_v1.1
cd ../

git clone http://git.drupal.org/sandbox/spficklin/1578232.git tripal_interpro_analysis
cd tripal_interpro_analysis
git checkout 6.x-1.1-tripal_v1.1
cd ../

git clone http://git.drupal.org/sandbox/spficklin/1578230.git tripal_go_analysis
cd tripal_go_analysis
git checkout 6.x-1.1-tripal_v1.1
cd ../

git clone http://git.drupal.org/sandbox/spficklin/1578246.git tripal_unigene_analysis
cd tripal_unigene_analysis
git checkout 6.x-1.1-tripal_v1.1
cd ../

The above commands will download the main tripal package as well as the Blast, KEGG, InterPro, GO and Unigene extension modules. Tripal also has a theme. Change to the theme directory:

cd /var/www/sites/all/themes

And issue the following git commands:

git clone http://git.drupal.org/sandbox/spficklin/1342972.git tripal_theme
cd tripal_theme
git checkout 6.x-1.1
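The clone-and-checkout steps for the extension modules above repeat the same commands with a different sandbox ID and directory name. As a sketch, the loop below prints those commands from a list copied from the steps above. It only echoes the commands (a dry run) and does not contact any server; the function name print_clone_commands is illustrative.

```shell
# Print (do not run) the git commands for each Tripal extension module.
# Each input line is "<sandbox id> <target directory>".
print_clone_commands() {
  while read -r id dir; do
    [ -n "$id" ] || continue   # skip blank lines
    echo "git clone http://git.drupal.org/sandbox/spficklin/$id.git $dir"
    # The subshell keeps the caller in the modules directory.
    echo "(cd $dir && git checkout 6.x-1.1-tripal_v1.1)"
  done
}

print_clone_commands <<'EOF'
1578226 tripal_blast_analysis
1578234 tripal_kegg_analysis
1578232 tripal_interpro_analysis
1578230 tripal_go_analysis
1578246 tripal_unigene_analysis
EOF
```

Piping the output to sh from within /var/www/sites/all/modules would execute the printed commands; reviewing them first is the point of the dry run.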

Installation

Previously in this Tutorial we enabled the Path and Search modules. The process for enabling the Tripal modules is the same. The site administrator can navigate to the Administer -> Site Building -> Modules page and enable each of the Tripal modules. However, Drush makes it easier to enable modules from the command-line. First, we must enable the tripal_core module. Enter the following command:

drush pm-enable tripal_core

Now that the core module is enabled, we must next install Chado. In the web browser, navigate to Administer -> Tripal Management -> Install Chado Schema. Since this is a fresh install, select the option to install Chado v1.2 and click the Install/Upgrade Chado button.

Chado install.png

After the button is clicked a message will appear stating “Job ‘Install Chado v1.2’ submitted. Check the jobs page for status”. Click the jobs page link to see the job that was submitted:

ChadoInstallJob.png

The job is waiting in the queue until the Tripal jobs system wakes and tries to run the job. The jobs management subsystem allows modules to submit long-running jobs, on behalf of site administrators or site visitors. Often, long running jobs can time out on the web server and fail to complete. The jobs system runs separately in the background using the command-line on an automated schedule but jobs are submitted through the web interface by users.

So, in the example above we now see a job for installing Chado. The job view page provides details such as the name of the job, the user who submitted the job, the dates on which the job was submitted and the job status.

Jobs in the queue can be executed in two ways:

When we installed Drupal we installed a cron job to allow the software to run housekeeping tasks on a regular basis. Tripal needs a cron entry as well to allow for regular execution of jobs in the queue. We will need to add a second cron entry:

   sudo crontab -e

A word on text editors such as nano.

Add this line to the crontab

   0,15,30,45 * * * * (cd /var/www; drush trpjob-run administrator ) > /dev/null
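If the setup is scripted rather than done in an editor, the same line can be appended only when it is not already present. The following is a minimal sketch, assuming the crontab has first been exported to a plain file; the helper name add_cron_line is hypothetical and is not part of Tripal or drush.

```shell
# Append a line to a crontab file only if an identical line is not
# already there (hypothetical helper).
add_cron_line() {
  local file="$1" line="$2"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

entry='0,15,30,45 * * * * (cd /var/www; drush trpjob-run administrator ) > /dev/null'
tmp=$(mktemp)
add_cron_line "$tmp" "$entry"
add_cron_line "$tmp" "$entry"   # second call changes nothing
cat "$tmp"
```

With a real crontab, the file would typically be produced with crontab -l and installed again with crontab followed by the file name.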

This cron entry runs the Tripal jobs every 15 minutes as the administrator user. For this tutorial we do not want to wait up to 15 minutes for our jobs to execute, so we will run the jobs manually. Tripal supports Drush and therefore has its own commands. We can use drush to manually launch the job:

drush trpjob-run administrator

We should now see the following text in the terminal window indicating that the installation of Chado was successful:

Tripal Job Launcher
Running as user 'administrator'
-------------------
Calling: tripal_core_install_chado(Install Chado v1.2, 1)
Creating 'chado' schema
Loading sites/all/modules/tripal/tripal_core/chado_schema/default_schema-1.2.sql...
Install of Chado v1.2 (Step 1 of 2) Successful!
Loading sites/all/modules/tripal/tripal_core/chado_schema/initialize-1.2.sql...
Install of Chado v1.2 (Step 2 of 2) Successful.
Installation Complete

Also, we see that the job has completed when refreshing the jobs management page:

ChadoInstallDone.png

Now that Chado is installed, we can continue with installation of the remaining Tripal modules. These modules should be installed in the following order one at a time. If you install them all at once you may encounter errors later. Install the modules in the following way (and order):

drush pm-enable tripal_db
drush pm-enable tripal_cv
drush pm-enable tripal_organism
drush pm-enable tripal_analysis
drush pm-enable tripal_feature
drush pm-enable tripal_views
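The one-at-a-time ordering above can also be scripted so that the run stops at the first module that fails to enable. The following is a sketch only: the function name enable_in_order and the DRUSH override variable are assumptions made for illustration, and -y is drush's option for answering yes to prompts.

```shell
# Enable modules one at a time, in the order given, stopping at the
# first failure. Set DRUSH=echo for a dry run that only prints the
# arguments that would be passed to drush.
enable_in_order() {
  local runner="${DRUSH:-drush}" module
  for module in "$@"; do
    "$runner" pm-enable -y "$module" || {
      echo "failed to enable $module; stopping" >&2
      return 1
    }
  done
}

# Dry run with the module order used in this tutorial.
DRUSH=echo enable_in_order tripal_db tripal_cv tripal_organism \
  tripal_analysis tripal_feature tripal_views
```

Without the DRUSH=echo prefix, and when run from /var/www, the same call would invoke drush once per module.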

Now, enable the remaining Tripal extension modules

drush pm-enable tripal_analysis_blast
drush pm-enable tripal_analysis_go
drush pm-enable tripal_analysis_interpro
drush pm-enable tripal_analysis_kegg

There are more Tripal modules that can be enabled (e.g. tripal_project, tripal_stock, etc.). But for this tutorial we will only be using the modules we enabled above.

The Tripal modules create directories in the /var/www/sites/default/files directory. By default, Drupal expects the ‘sites/default/files’ directory to be writeable by the web server. Because we installed the Tripal modules using Drush we need to reset the permissions for the web user. Execute the following commands to give the web user group permission to write to that directory:

sudo chown -R ubuntu:www-data /var/www/sites/default/files
sudo chmod -R g+rw /var/www/sites/default/files

The last component we need to enable is the Tripal base theme. This theme provides the necessary look-and-feel to the data presented by Tripal. Installation is the same as for modules:

drush pm-enable tripal

The Tripal theme is not a full Drupal theme. It is intended to be incorporated into the site’s primary theme. In this tutorial we are currently using the sky theme. Therefore, we need to inform Drupal that the sky theme will be using Tripal as a base theme. To do this, change to the sky theme directory:

cd /var/www/sites/all/themes/sky

And edit the sky.info file

gedit sky.info

And add the following line to the bottom of the file:

base theme = tripal

If you do not wish to use the Sky theme, you simply need to find the corresponding .info file for your default theme and add the same line to the file.

Tripal is now installed!

Controlled Vocabularies: Installing CVs

Before we can proceed with populating our Chado tables with genomic data we must first load some controlled vocabularies (i.e. ontologies). To do this, navigate to Administer -> Tripal Management -> Vocabularies. You’ll see a page describing the purpose of the module and how to use it. Click the link on the left sidebar titled ‘Load Ontology With OBO File’. You’ll see the following page:

Tripal-LoadOntology.png

The Ontology loader will allow you to select a pre-defined ontology from the drop down list or allow you to provide your own to be loaded. If you provide your own, you give the remote URL of the OBO file or provide the full path on the local web server where the OBO file is located. In the case of a remote URL, Tripal first downloads and then parses the OBO file for loading. If you do provide your own OBO file it will appear in the saved drop down list for loading of future updates to the ontology.

For this tutorial, we need to install these ontologies:

  1. Chado feature properties
  2. Relationship ontology
  3. Sequence ontology
  4. Gene ontology

Do so by selecting one and clicking the Submit button. Repeat this process for each of the ontologies listed. You’ll notice each time that a job is added to the jobs subsystem.

Now manually launch these jobs

cd /var/www
drush trpjob-run administrator

Note: Loading the Gene Ontology will take several hours.

Setting Permissions

Because we are logged on to the site as an admin user we are able to see all content. However, Drupal provides User Management infrastructure that allows the site admin to set which types of users can view the content on the site. By default there are two types of users: anonymous and authenticated. For this tutorial we want to set permissions so that anonymous visitors to the site can see the genomic content. To do this, navigate to Administer -> User Management -> Permissions. Here you will see permissions for all types of content.

Triapl-Permissions.png

Scroll through the list of permissions and set the following for both anonymous and authenticated users:

Each time you install a new module you should always check the Permissions page and set any new permissions that may have been added by the new module.

Using Tripal

Creating Organism Pages

There are two ways to create pages for organisms. If your organism is already in Chado then you can sync the organism. If it is not in Chado you will need to create it manually using the Tripal web interface. The following two sections describe both methods.

What if Our Organism is Already in Chado?

Now that we have Chado loaded and populated we would like to create a home page for our species. Chado comes pre-loaded with a few species already, so we will check to see if our organism is already present. To do this, navigate to Administer -> Tripal Management -> Organisms -> Configuration:

800px-TripalOrganisms.png

This configuration page has several different options. We will discuss two of these here. The first is the top section labeled Sync Organisms. In this section is a list of organisms. These are the organisms that come by default with Chado. If our organism is already in the list (e.g. Drosophila melanogaster) then we need to inform Drupal that we have data in Chado for which we would like a web page. This is what we call syncing. We need to sync Drupal and Chado so that Drupal knows about our organism. To do this, click the check box next to Drosophila melanogaster and then click the Submit Sync Job button.

As usual we want to run this job manually:

cd /var/www
drush trpjob-run administrator

Now that our organism is synced we should have a new page for Drosophila melanogaster. To find the page, click the Organisms menu item in the left side bar under Search Biological Data. This menu item was automatically added when we installed the Tripal Organism module. On this page we see a list of organisms that are present in Chado. Notice that only the fruitfly is clickable because it is the only organism synced.

TripalOrganismList.png

Now if we click the ‘fruitfly’ link it should take us to our new organism page:

TripalOrganismFruitFly.png

By default all Tripal pages have a center content section and a right side-bar section with links for Resources. However, this page is a bit empty. We need to add some details. Click the Edit tab towards the top of the page. Notice that two of the fields are missing content: the description and the organism image.

TripalOrganismFruitFly2.png

For the description add the following text (taken from wikipedia: http://en.wikipedia.org/wiki/Drosophila_melanogaster):

“The genome of D. melanogaster (sequenced in 2000, and curated at the FlyBase database) contains four pairs of chromosomes: an X/Y pair, and three autosomes labeled 2, 3, and 4. The fourth chromosome is so tiny that it is often ignored, aside from its important eyeless gene. The D. melanogaster sequenced genome of 165 million base pairs has been annotated and contains approximately 13,767 protein-coding genes, which comprise ~20% of the genome out of a total of an estimated 14,000 genes. More than 60% of the genome appears to be functional non-protein-coding DNA involved in gene expression control. Determination of sex in Drosophila occurs by the ratio of X chromosomes to autosomes, not because of the presence of a Y chromosome as in human sex determination. Although the Y chromosome is entirely heterochromatic, it contains at least 16 genes, many of which are thought to have male-related functions.”

For the image, download this image below and upload it using the interface on the page.

Dmel.jpg

Save the page. Now we have a more informative page:

TripalOrgPageDone.png

What if My Organism Is Not Present in Chado?

For this tutorial we will be loading data for Citrus sinensis (sweet orange), but this organism is not in Chado by default. We can easily add the organism using the Create Content page. You can find this link on the left side bar navigation menu. The Create Content page has many more content types than when we first saw it. Previously we only had Page and Story content types. Now we have more content types added by the Tripal Analysis, Organism, Feature and Extension modules.

Tripal-Create content.png

To add a new organism simply click the Organism link and fill in the fields with these values:

And, use the following image:

Citrus sinensis.jpg

Save the page and view the new Organism:

Tripal-Organism-Citrus.png

Creating an Analysis

For this tutorial, we will later import a set of genes, including associated mRNA, CDS, UTRs, etc. Tripal requires that an analysis be associated with all imported features. This has several advantages, including:

To create an analysis for loading our genomic data, navigate to the Create content page and click the Analysis link.

The analysis creation page will appear:

TripalCreateAnalysis.png

Here you can provide the necessary details to help others understand the source of your data. For this tutorial, enter the following:

<p>
    <strong><em>Note: </em>The following text comes from phytozome.org:</strong></p>
<p>
    <u>Genome Size / Loci</u><br />
    This version (v.1) of the assembly is 319 Mb spread over 12,574 scaffolds. Half the genome is accounted for by 236 scaffolds 251 kb or longer. The current gene set (orange1.1) integrates 3.8 million ESTs with homology and ab initio-based gene predictions (see below). 25,376 protein-coding loci have been predicted, each with a primary transcript. An additional 20,771 alternative transcripts have been predicted, generating a total of 46,147 transcripts. 16,318 primary transcripts have EST support over at least 50% of their length. Two-fifths of the primary transcripts (10,813) have EST support over 100% of their length.</p>
<p>
    <u>Sequencing Method</u><br />
    Genomic sequence was generated using a whole genome shotgun approach with 2Gb sequence coming from GS FLX Titanium; 2.4 Gb from FLX Standard; 440 Mb from Sanger paired-end libraries; 2.0 Gb from 454 paired-end libraries</p>
<p>
    <u>Assembly Method</u><br />
    The 25.5 million 454 reads and 623k Sanger sequence reads were generated by a collaborative effort by 454 Life Sciences, University of Florida and JGI. The assembly was generated by Brian Desany at 454 Life Sciences using the Newbler assembler.</p>
<p>
    <u>Identification of Repeats</u><br />
    A de novo repeat library was made by running RepeatModeler (Arian Smit, Robert Hubley) on the genome to produce a library of repeat sequences. Sequences with Pfam domains associated with non-TE functions were removed from the library of repeat sequences and the library was then used to mask 31% of the genome with RepeatMasker.</p>
<p>
    <u>EST Alignments</u><br />
    We aligned the sweet orange EST sequences using Brian Haas's PASA pipeline which aligns ESTs to the best place in the genome via gmap, then filters hits to ensure proper splice boundaries.</p>

Note: Above we entered HTML. This is not the easiest way to enter text, but makes it simple for this tutorial. When the ckeditor module is installed and properly set up, the user is provided with editor tools that make it much easier to add text to any page.

After saving, you should have the following analysis page:

Tripal-Analysis-Citrus.png

Creating a Database Cross Reference

For our site, we want to create gene pages with sequences and have those pages link back to JGI where we obtained the genes. Therefore, we want to add a database reference for JGI. To add a new external database, navigate to Administer -> Tripal Management -> Databases -> Add a Database. The resulting page provides fields for adding a new database:

Tripal-Add-Database.png

Enter the following values for the fields:

The URL prefix is important as it will be used to create the links on our gene pages. Our gene name will be appended to this URL to create the link that will take us to the corresponding gene page at JGI.

Click Add.

We now have added a new database!

Later we will also load Blast data. We need to create two new databases for those as well. Create the following entries for NCBI nr and ExPASy SwissProt:

Loading Feature Data

Now that we have our organism and whole genome analysis ready, we can begin loading genomic data. For this tutorial only a single gene from sweet orange will be loaded into the database. This is to ensure we can move through the tutorial rather quickly. The following datasets will be used for this tutorial:

Download these to the /var/www/sites/default/files directory. The quickest method is to copy the addresses of the links above, then use wget to retrieve the files:

  cd /var/www/sites/default/files
  #Note that these files no longer exist!
  wget http://gmod.org/mediawiki/images/d/dc/Citrus_sinensis-orange1.1g015632m.g.gff3
  wget http://gmod.org/mediawiki/images/8/87/Citrus_sinensis-scaffold00001.fasta
  wget http://gmod.org/mediawiki/images/9/90/Citrus_sinensis-orange1.1g015632m.g.fasta

Loading a GFF3 File

The gene features (e.g. gene, mRNA, 5_prime_UTRs, CDS, 3_prime_UTRs) are stored in the GFF3 file downloaded in the previous step. We will load this GFF3 file and consequently load our gene features into the database. Navigate to Administer -> Tripal Management -> Features -> Import a GFF3 file.

Tripal-importGFF.png

Perform the following:

  1. Enter the path on the file system where our GFF file resides (/var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.gff3)
  2. Choose the organism to which the GFF3 file belongs (in this case Citrus sinensis (sweet orange))
  3. Select the analysis named “Whole Genome Assembly and Annotation of Citrus sinensis…”.
  4. Leave all other options as default.

Finally, click the Import GFF3 file button. You’ll notice a job was submitted to the jobs subsystem. Now, to complete the process we need the job to run. We’ll do this manually:

cd /var/www;
drush trpjob-run administrator

Note: For very large GFF files the loader can take quite a while to complete. There should be no errors or warnings after loading the GFF file but there will be information about the time and memory used when loading the file. Below is similar to what you should see:

Tripal Job Launcher
Running as user 'administrator'
-------------------
Calling: tripal_feature_load_gff3(/var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.gff3, 13, 10, 0, 1, 0, 0, 1, , , 0, , , , 0, 8)

NOTE: Loading of this GFF file is performed using a database transaction.
If the load fails or is terminated prematurely then the entire set of
insertions/updates is rolled back and will not be found in the database

Opening /var/www/sites/default/files/Citrus_sinensis-orange1.1g015632m.g.gff3
Parsing Line 138 (100.00%). Memory: 39,138,672 bytes.
Setting ranks of children...
Done

Loading FASTA files

Using the Tripal GFF loader we were able to populate the database with the genomic features for our organism. However, those features now need nucleotide sequence data. To provide it, we will load the nucleotide sequences for the mRNA features and the scaffold sequence. Navigate to the Administer -> Tripal Management -> Features -> Import a Multi-FASTA file page:

Tripal-import fasta.png

Before loading the FASTA file we must first know the Sequence Ontology (SO) term that describes the sequences we are about to upload. We can find the appropriate SO terms from our GFF file. In the GFF file we see the SO terms that correspond to our FASTA files are ‘scaffold’ and ‘mRNA’.
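One way to check which SO terms a GFF3 file uses is to count the values in its third column. The following is a sketch using awk; the function name gff3_types is illustrative, and the sample input is the single mRNA line quoted in this tutorial, so only one type is reported here.

```shell
# Count the feature types (column 3) in GFF3 input, skipping comment
# lines and lines without the nine tab-separated GFF3 columns.
gff3_types() {
  awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ }
               END { for (t in n) print n[t] "\t" t }' "$@" | sort -k2
}

# Sample input: one tab-separated GFF3 line.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  scaffold00001 phytozome6 mRNA 4058460 4062210 . + . \
  'ID=PAC:18136217;Name=orange1.1g015632m;PACid=18136217;Parent=orange1.1g015632m.g' \
  | gff3_types
```

Run against the full file (for example gff3_types followed by the path of the GFF3 file), the output lists each type in the file with its count.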

IMPORTANT: It is important to ensure prior to importing, that the FASTA loader will be able to appropriately match the sequence in the FASTA file with existing sequences in the database. Before loading FASTA files take special care to ensure the definition line of your FASTA file can uniquely identify the feature for the specific organism and sequence type. For example, in our GFF file an mRNA feature appears as follows:

scaffold00001   phytozome6      mRNA    4058460 4062210 .       +       .       ID=PAC:18136217;Name=orange1.1g015632m;PACid=18136217;Parent=orange1.1g015632m.g

Note that for this mRNA feature the ID is PAC:18136217 and the name is orange1.1g015632m. In Chado, features always have a human readable name which does not need to be unique, and also a unique name which must be unique for the organism and SO type. In the GFF file, the ID becomes the unique name and the Name becomes the human readable name.
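The mapping from GFF3 attributes to Chado names can be checked from the command line before loading. The following sketch prints the ID (the future unique name) and the Name (the future human readable name) for every feature of a given type. The function name gff3_id_and_name is hypothetical, and this is not how Tripal itself parses the file.

```shell
# Print "<ID>\t<Name>" for each feature of the requested type, taken
# from the attributes in column 9 of tab-separated GFF3 input.
gff3_id_and_name() {
  awk -F '\t' -v type="$1" '
    $3 == type {
      id = ""; name = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^ID=/)   id   = substr(attrs[i], 4)
        if (attrs[i] ~ /^Name=/) name = substr(attrs[i], 6)
      }
      print id "\t" name
    }'
}

# The mRNA line shown above, written with explicit tabs.
printf 'scaffold00001\tphytozome6\tmRNA\t4058460\t4062210\t.\t+\t.\tID=PAC:18136217;Name=orange1.1g015632m;PACid=18136217;Parent=orange1.1g015632m.g\n' \
  | gff3_id_and_name mRNA
```

For the line above this prints PAC:18136217 and orange1.1g015632m separated by a tab.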

In our FASTA file the definition line for this mRNA is:

>orange1.1g015632m PAC:18136217 (mRNA) Citrus sinensis

By default Tripal will match the sequence in a FASTA file with the feature that matches the first word in the definition line. In this case the first word is orange1.1g015632m. As defined in the GFF file, the name and unique name are different for this mRNA. However, we can see that the first word in the definition line of the FASTA file is the name and the second is the unique name. Therefore, when we load the FASTA file we should specify that we are matching by the name because it appears first in the definition line.

If, however, we cannot guarantee that the feature name is unique, then we can use regular expressions in the Advanced Options to tell Tripal where to find the name or unique name in the definition line of our FASTA file.
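To get a feel for such regular expressions, they can be tried against a definition line on the command line first. The sketch below uses sed with extended regular expressions; the patterns and function names are illustrative and assume definition lines shaped like the one above. The exact regular expression syntax expected by the Tripal form may differ, so treat these as a starting point.

```shell
# Extract parts of a FASTA definition line of the form
# ">name uniquename (type) organism".
defline_name() {
  # first word after ">"
  printf '%s\n' "$1" | sed -E -n 's/^>([^ ]+).*/\1/p'
}
defline_uniquename() {
  # second word
  printf '%s\n' "$1" | sed -E -n 's/^>[^ ]+ +([^ ]+).*/\1/p'
}

defline='>orange1.1g015632m PAC:18136217 (mRNA) Citrus sinensis'
defline_name "$defline"
defline_uniquename "$defline"
```

For this definition line the two calls print orange1.1g015632m and PAC:18136217, matching the name and unique name from the GFF file.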

IMPORTANT: When loading FASTA files to update existing features, always choose “Update only” as the import method. Otherwise, Tripal may add the features in the FASTA file as new features if it cannot properly match them to existing features.

Now, enter the following values in the fields on the web form:

Click the Import Fasta File button, and a job will be added to the jobs system. Run the job:

cd /var/www
drush trpjob-run administrator

Next, do the same for the mRNA FASTA file:

Now run this job:

cd /var/www;
drush trpjob-run administrator

Now the scaffold sequence and mRNA sequences are loaded.

Note: It is not necessary to load the mRNA sequences as those can be derived from their alignments with the scaffold sequence. However, in Chado the feature table has a ‘residues’ column. Therefore, it is best practice to load the sequence when possible.

The FASTA loader has some advanced options which we will not cover in this tutorial. But briefly, the advanced options allow you to create relationships between features and associate them with external databases. For example, the definition line for an mRNA is:

>orange1.1g015632m PAC:18136217 (mRNA) Citrus sinensis

Here we have more information than just the feature name. We have a unique Phytozome accession number (e.g. PAC ID) for the mRNA. Using the External Database Reference section under Advanced Options we can provide the name of the database and a regular expression to tell the loader how to find the accession number in the definition line. However, the database cross-reference is present in the GFF file and this association has already been made so we do not need to make the association here.

If the name of the gene to which this mRNA belonged was also on the definition line, we could use the Relationships section to link this mRNA with its gene parent. Fortunately, this information is also in our GFF file and these relationships have already been made.

Creating Feature Pages

Now that we’ve loaded our feature data, we must “sync” them. Loading of the GFF file in the previous step has populated the feature tables of Chado for us, but now Drupal must know about these features. To sync features, navigate to Administer -> Tripal Management -> Features -> Sync Features (unlike for syncing organisms, features have a separate page for syncing).

Tripal-Sync-Features.png

Here we can specify the types of features to sync and the organism. This allows us to create feature pages for different types of features for different organisms. Enter into the Feature Types the features that should have pages on the site. In this case, we want gene and mRNA pages. Features of these types were present in our GFF file.

Next, select the organism “Citrus sinensis”, and click the “Sync all features” button. A job is then added to the jobs management system which we need to manually run rather than wait on the cron entry to run it.

cd /var/www
drush trpjob-run administrator

Our features are now synced:

Tripal-Features-Synced.png

Note: It is not necessary to sync all types of features in the GFF file. For example, do not sync the scaffold. The feature is large and would have many relationships to other features. Only sync features that you will want users to view. For example, each mRNA is composed of several CDS features. These CDS features do not need their own page and therefore do not need to be synced.

Now, we can view our gene and mRNA pages. Navigate to Search Biological Data -> Features. Select gene and mRNA in the Type select box and click the Show button. The resulting list shows all available gene and mRNA features.

Tripal-Search-Features.png

Here we can see the gene feature we added and its corresponding mRNAs. Click on the mRNA named orange1.1g015615m. You can now see the details page with available resources on the side bar:

Triapl-New-Feature-Page.png

Materialized Views

Chado is very efficient as a data warehouse but queries can become slow depending on the number of table joins and amount of data. To help simplify and speed up these queries, materialized views can be employed. For a materialized view, a table is created and then populated with the results of a pre-defined SQL query. Therefore, rather than executing the pre-defined query, which may take a long time, we query the materialized view, which is simpler and faster. A side effect, however, is redundant data, and the materialized view becomes stale if it is not updated regularly.

Tripal provides a mechanism for populating and updating these materialized views. These can be found on the Administer -> Tripal Management -> MViews page.

Tripal-MViews.png

Here we see several materialized views. These were installed automatically by the various Tripal modules. To update these views, click the Populate button for each one.

This will submit jobs to populate the views with data. Now, run the jobs:

cd /var/www
drush trpjob-run administrator

You can now see that all views are up-to-date on the Materialized Views Page. The number of rows in the view table is shown:

Tripal-MViews-Populated.png

Materialized views are most useful when creating custom pages where data is queried in novel ways.

Feature Page Configuration

Feature URLs

The feature configuration page allows us to perform configuration changes for the entire site. Navigate to the Administer -> Tripal Management -> Features -> Feature Configuration page.

TripalFeatureConfiguration.png

First, we can alter the feature URL path which is used to construct the URL that visitors can bookmark or link to for each feature. The URL would be http://localhost/[identifier] where [identifier] could be the name of the feature, unique name or internal ID number, or a combination of the organism, type and name of the feature. If we choose to use an internal ID, we can specify a prefix for the internal ID number when it appears on the URL path. For example, if we leave the default prefix of ‘ID’ and have a feature ID number of 283942, the feature ID would appear as ‘ID283942’.
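As a small illustration of how the identifier portion of the URL is formed when an internal ID is used, consider the following Python sketch. The function name feature_url is hypothetical and is not part of Tripal; it simply concatenates the prefix and the ID as described above.

```python
def feature_url(base_url, feature_id, prefix="ID"):
    """Build a feature URL from an internal ID and a prefix (illustrative only)."""
    return "{}/{}{}".format(base_url.rstrip("/"), prefix, feature_id)


# The default prefix 'ID' with feature ID 283942, as in the example above.
url = feature_url("http://localhost", 283942)
```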

For this tutorial, we want to reset the URL for all of the features we loaded previously. To do this, select the feature name option and then click the Set Feature URLs button. This will add a job to the jobs system. We want to execute this job manually:

cd /var/www
drush trpjob-run administrator

You should see the following output to the terminal:

Tripal Job Launcher (in parallel)
Running as user 'administrator'
-------------------
Calling: tripal_feature_set_urls(, 18)
Setting URL alias for orange1.1g015632m.g: node/4 => orange1.1g015632m.g
Setting URL alias for orange1.1g015632m: node/5 => orange1.1g015632m
Setting URL alias for orange1.1g015645m: node/6 => orange1.1g015645m
Setting URL alias for orange1.1g015615m: node/7 => orange1.1g015615m
Setting URL alias for orange1.1g015662m: node/8 => orange1.1g015662m
Setting URL alias for orange1.1g017341m: node/9 => orange1.1g017341m
Setting URL alias for orange1.1g018514m: node/10 => orange1.1g018514m
Setting URL alias for orange1.1g022520m: node/11 => orange1.1g022520m
Setting URL alias for orange1.1g022799m: node/12 => orange1.1g022799m
Setting URL alias for orange1.1g022797m: node/13 => orange1.1g022797m

Previously, URLs were set to use the Internal ID by default, but now the URLs for features are similar to: http://localhost/orange1.1g015632m.

Feature Browser

Next on the configuration page are the Feature Browser settings. By default, Tripal will provide a browser on the organism page that allows a visitor to easily find a feature. For large sites with many features this would be an inefficient way to find a specific feature, but it does allow visitors who simply want to explore the site to quickly find example pages. This browser will only show synced features and will only show features of the types specified in the Feature Types box. We want to show gene pages, so alter the contents of this box to contain only the word gene. Optionally, you can turn this browser off by setting the appropriate radio buttons. If we then navigate to the organism page for Citrus sinensis and click the link on the right sidebar titled Feature Browser, we can see the genes listed with links to each feature page.

Tripal-FeatureBrowser.png

Feature Summary Report

Next on the configuration page is the Feature Summary Report setting. By default, on the organism page, Tripal will provide a list of all features belonging to an organism and provide a pie-chart of this list. For example, below is a screen shot of the Data Type Summary on the Citrus sinensis page for the data we loaded.

Tripal-FeatureTypes.png

You can turn off this summary graph using the feature configuration page by setting the appropriate radio button in the Feature Summary Report section. You can also specify which feature types to show and can rename them to be more meaningful. To select which items will appear in this list, add the following contents to the Map feature types box:

CDS
five_prime_UTR = 5'UTR
three_prime_UTR = 3'UTR
mRNA
supercontig = Supercontig
gene = Gene

Now the Data Type Summary on the organism page appears as:

TripalDataTypeSummary.png

Note: The data type summary is only available when the organism_feature_count materialized view is populated. Each time new data is added, this materialized view should be updated so that the changes appear in the summary.
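To make the format of the Map feature types box concrete, here is a minimal Python sketch that reads the same lines into a mapping from feature type to display name. It is not Tripal's parser; the function name parse_type_map is hypothetical, and the rule that a type without an "=" keeps its own name is an assumption based on the example above.

```python
MAPPING_TEXT = """\
CDS
five_prime_UTR = 5'UTR
three_prime_UTR = 3'UTR
mRNA
supercontig = Supercontig
gene = Gene
"""


def parse_type_map(text):
    """Return {feature_type: display_name}; a bare type keeps its own name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sep, label = line.partition("=")
        name = name.strip()
        # Assumption: without an '=' the type is shown under its own name.
        mapping[name] = label.strip() if sep else name
    return mapping


type_map = parse_type_map(MAPPING_TEXT)
```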

Loading Functional Data Using Extension Modules

For this tutorial we will be loading functional data for our gene. To do this we will use the Blast, KEGG, and InterPro extension modules. These modules were installed previously. The Blast, KEGG and InterPro analyses were completed prior to this tutorial, and the results files are available for downloading:

Download these files to the /var/www/sites/default/files directory. To do so quickly run these commands:

  cd /var/www/sites/default/files
  # Note that these files no longer exist!
  wget http://gmod.org/mediawiki/images/0/0c/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml
  wget http://gmod.org/mediawiki/images/1/13/Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz
  wget http://gmod.org/mediawiki/images/e/e8/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out
  wget http://gmod.org/mediawiki/images/2/24/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out

Loading Blast Results

Configuring Blast Databases

Now that we have our features loaded we want to add some functional data as well. We need to create a new analysis page for our blast results. The Tripal Blast Analysis extension module will parse blast results and load them into Chado after a blast analysis page is created. However, before we create the page we need to ensure that the blast module can properly parse the blast hits. To do this, navigate to Administer → Tripal Management → Analyses → Blast Settings. The following page will appear.

Tripal-Blast-Settings.png

This page allows you to specify a different, more meaningful name for the database. This name will be displayed with blast results. You can also provide regular expressions for parsing blast hits. For example, the following is a line for a match from SwissProt:

sp|P43288|KSG1_ARATH Shaggy-related protein kinase alpha OS=Arabidopsis thaliana GN=ASK1 PE=2 SV=3

Here the hit name is “KSG1_ARATH”, the accession is “P43288”, the hit description is “Shaggy-related protein kinase alpha OS=Arabidopsis thaliana” and the organism is “Arabidopsis thaliana”. We need regular expressions to tell Tripal how to extract these parts from the match text. Because Tripal is a PHP application, the regular expressions follow PHP (Perl-compatible) syntax, which is documented in the PHP manual. The following regular expressions can be used to extract the hit name, the accession, the hit description and the organism for the example SwissProt line above:

Element          Regular expression
Hit Name         ^sp\|.*?\|(.*?)\s.*?$
Hit Description  ^sp\|.*?\|.*?\s(.*)$
Hit Accession    ^sp\|(.*?)\|.*?\s.*?$
Hit Organism     ^.*?OS=(.*?)\s\w\w=.*$
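To see what each expression captures, the patterns can be tried against the example SwissProt line. The sketch below uses Python's re module rather than PHP; for these particular patterns the two engines should behave the same, but that is an assumption worth verifying on your own data. The names PATTERNS and parse_hit are hypothetical and are not part of Tripal.

```python
import re

HIT = ("sp|P43288|KSG1_ARATH Shaggy-related protein kinase alpha "
       "OS=Arabidopsis thaliana GN=ASK1 PE=2 SV=3")

# The four expressions from the tutorial, unchanged.
PATTERNS = {
    "name": r"^sp\|.*?\|(.*?)\s.*?$",
    "description": r"^sp\|.*?\|.*?\s(.*)$",
    "accession": r"^sp\|(.*?)\|.*?\s.*?$",
    "organism": r"^.*?OS=(.*?)\s\w\w=.*$",
}


def parse_hit(line):
    """Apply each pattern and keep the first capture group (None if no match)."""
    parsed = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, line)
        parsed[key] = match.group(1) if match else None
    return parsed


hit = parse_hit(HIT)
for key, value in hit.items():
    print(key, "=>", value)
```

Note that, as written, the description pattern captures everything after the hit name, so the trailing GN=, PE= and SV= fields are included in the description.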

In this tutorial, we will be adding blast results for the two databases we created earlier in the tutorial: ExPASy SwissProt and NCBI nr. First, select ExPASy SwissProt from the drop-down menu, and add the following values to the fields.

Click Save Settings.

Note: The match accession will be used for building web links to the external database. The accession will be appended to the URL Prefix set earlier when the database record was first created.

Now select the NCBI nr database from the drop-down and click the radio button. NCBI databases use a format that is compatible with blast. Therefore, the hit name, accession and description are handled differently in the BLAST XML results. To correctly parse results from an NCBI database click the Use Genebank style parser checkbox. This should disable all other fields and is all we need for this database.

Load the Blast Results

Now we can create our analysis page. Navigate to the Create Content page and select the Analysis: Blast content type. Set the following values in the fields for this analysis:

Click the Save button. You can now see our new Analysis.

Tripal-Analysis-Blast.png

Now we need to manually run the job to parse the Blast results:

cd /var/www
drush trpjob-run administrator

The results should now be loaded. If we visit the page for the feature ‘orange1.1g015615m’ (http://localhost/orange1.1g015615m), we should now see blast results by clicking the ‘ExPASy SwissProt’ link on the right sidebar:

Tripal-Blast-Results.png

Now we want to add the results for NCBI nr. Repeat the steps above to add a new analysis with the following details:

Click the Save button and manually run the job:

cd /var/www
drush trpjob-run administrator

Return to the example feature page to view the newly added results: http://localhost/orange1.1g015615m

Loading InterProScan Results

Now we want to load results from an InterProScan. For this tutorial, these results were obtained by using a local installation of InterProScan installed on a computational cluster. However, you may choose to use Blast2GO or the online InterProScan utility. Results should be saved in XML format.

To create an analysis, navigate to the Create Content page and select the content type Analysis: Interpro. Add the following values for this analysis:

Click the Save button. You can now see our new Analysis.

Triapl-InterPro-Analysis.png

Now we need to manually run the job to parse the InterPro results:

cd /var/www
drush trpjob-run administrator

The results should now be loaded. If we visit our feature page, http://localhost/orange1.1g015615m, we should now see InterPro results by clicking on the “Interpro Report” link on the right sidebar.

Tripal-Interpro-Results.png

Viewing GO Terms

When we set up the InterPro analysis we requested that it parse GO terms from the InterProScan results. As a result, we now have a new GO Assignments item in the Resources sidebar. For our example feature (http://localhost/orange1.1g015615m), the results are as follows:

Tripal-GO-Results.png

Because we now have GO terms associated with features, we can set up the GO report that appears on the organism page. Navigate to the Citrus sinensis organism page and click the GO Analysis Reports link in the Resources sidebar. A page appears with instructions for the site administrator that explain how to make the report visible.

Triapl-GO-Report-NotSetup.png

Follow the instructions as presented on the page. Briefly, you need to:

  1. Set the CV term paths for the three GO vocabularies. This should have been done automatically when you loaded the Gene Ontology earlier in the Tutorial.
  2. Populate the go_count_analysis materialized view.

When complete the following report will be visible:

Tripal-GO-Report.png

The GO report provides pie charts and an expandable tree for browsing results. Clicking on a GO term in the tree will cause a box to appear with details about the term and a link to download a FASTA file of all features annotated with the term. Notice that the graphs are quite simple and the graph for the cellular component is missing. This is because we only loaded GO assignments for a single gene.

Loading KEGG Analysis Results

Now we want to load results from a KEGG/KAAS analysis (http://www.genome.ad.jp/tools/kaas/). The KAAS server receives as input a FASTA file of sequences and annotates them with KEGG terms. The tool also generates a hierarchy (heir) output file. This output file can be read directly by the Tripal Analysis KEGG module.

To create an analysis, navigate to the Create Content page and select the content type Analysis: KEGG. Add the following values for this analysis:

Click the Save button. You can now see our new Analysis.

Tripal-KEGG-Analysis.png

Now we need to manually run the job to parse the KEGG results:

cd /var/www
drush trpjob-run administrator

Note: currently in the development version of Tripal there is a bug where the KEGG data is not loaded when the job runs. You must run the job a second time for the KEGG results to be loaded. To do this, edit the analysis by clicking the Edit tab at the top of the analysis, re-check the box titled “Submit a job to parse the kegg output into Chado”, click Save and then re-run the job.

If we navigate to our feature page (http://localhost/orange1.1g015615m) we can now see the KEGG results by clicking the KEGG Assignments link in the Resources sidebar.

Tripal-KEGG-Results.png

Similar to the GO report, a KEGG report is also available on the analysis and organism pages. Navigate to the Citrus sinensis organism page and click the KEGG Analysis Reports link in the Resources sidebar. A page with instructions is visible:

Tripal-KEGG-Report-PRE.png

Follow the instructions on the page. Because we have already loaded the data, we only need to populate the kegg_by_organism materialized view. After populating the view we can return to the organism page and view the KEGG report:

Tripal-KEGG-Results-After.png

Site visitors can browse KEGG results by expanding the trees corresponding to the Brite terms.

Adding New Resources To Pages

Each page created by Tripal tends to have a Resources sidebar. We’ve seen this for the analysis, organism and feature pages. In some cases we may want to add new items to the Resources sidebar. In the case of our Citrus sinensis organism we want to add a Downloads link to the organism page for relevant data. We do this using the Content Construction Kit (CCK). The CCK is a third-party Drupal module that allows new fields to be added to any content. Using the CCK we will add new fields to our organism page which Tripal will recognize as new resources.

Earlier in the tutorial we installed the CCK, but for review, we can do so easily using drush:

drush pm-enable content
drush pm-enable text
drush pm-enable number

Now, navigate to Administer → Content management → Content Types. The following page appears:

Tripal-Content-types.png

Beside the Organism content type, click on the link manage fields. The field editor page appears:

Tripal-CCK-ManageFields.png

We need to add three specific fields. In the form elements just below the existing fields, add the following values in the New field section:

Important: be sure to spell the field name correctly.

Click the Save button and the following field setup page appears:

Tripal-CCK-FieldSetup.png

On this page, leave all fields as default except one. Set the number of values to Unlimited and click the Save field settings button. We can now see the new field:

Tripal-CCK-Field-RT.png

Finally, we don’t want this field to show up on the page but rather we want Tripal to handle it. If Tripal sees a field with the name ‘field_resource_titles’ it will automatically add it to the Resources sidebar and hence we do not need Drupal to display it. To exclude the field, click the Display fields tab. On this page there are two checkboxes, each under a column titled ‘Exclude’. Check both checkboxes and then click the Save button.

Tripal-CCK-Field-RT.png

Next, we need to add a second field to accompany the new field we just created. These two fields will work in conjunction with each other. One will serve as the title for the content and will appear on the resources sidebar as the link, and the other will house the content that will appear when the link is clicked. Follow the same steps as described previously to create a new field:

Click Save, then set the following:

Remember to click the Display fields tab to exclude this field from display.

We now have two fields that allow us to add new content to the pages: one lets us provide a title and the other lets us provide content. Suppose we also want to simply add links to the Resources sidebar. We can do this with a third field. Add the following new field:

On the settings page set:

Don’t forget to exclude this field as well!

Finally, we want to group these fields together. Otherwise, when editing the organism page these fields may not be next to each other. We can do this using the same form we used to add new fields. But this time we will add a group. Under the New group section add the following group:

Click ‘Save’. We should now have a new group entry in bold text. We can use the cross-hairs next to each field to drag and drop the fields we want into the new group. Do this with the Resource Titles, Resource Blocks and Resource Links fields, in that order. When done it should look as follows:

Tripal-v1.1 resource group.png

Click the Save button.

In summary, we created three new fields that will appear on our organism page. Tripal will recognize these new field names (field_resource_titles, field_resource_blocks and field_resource_links) and will automatically put new items on the Resources sidebar for titles and links. The resource blocks are then the text content that corresponds to the titles.

Now, we can return to our Citrus sinensis organism page, and click the ‘Edit’ tab at the top. Scroll to the section titled Additional Resources to see our newly added fields.

Tripal-v1.1 org edit resources.png

We will use these fields to add a link to the Phytozome page for Citrus sinensis and also a Downloads block that will allow the site visitor to download all of the data files we used in this tutorial. First, add the following text to the Resource Links text box:

C sinensis at Phytozome|http://www.phytozome.net/search.php?method=Org_Csinensis

The title that will appear in the sidebar occurs before the ‘|’ character. The link comes after. Now we want to add our downloads page. In the Resource Titles field add the text, ‘Downloads’. Then in the text area under Resource Blocks add the following text:

<p>The following annotation files are available for download:</p>
<b>Structural Annotations</b>
<table>
  <tr>
    <th>C. sinensis v1.0 scaffolds (FASTA format)</th>
    <td><a href="/sites/default/files/citrus_sinensis-scaffold00001.fasta">citrus_sinensis-scaffold00001.fasta</a></td>
  </tr>
  <tr>
    <th>C. sinensis v1.0 gene sequences (FASTA format)</th>
    <td><a href="/sites/default/files/citrus_sinensis-orange1.1g015632m.g.fasta">citrus_sinensis-orange1.1g015632m.g.fasta</a></td>
  </tr>
  <tr>
    <th>C. sinensis v1.0 genes (GFF3 format)</th>
    <td><a href="/sites/default/files/citrus_sinensis-orange1.1g015632m.g.gff3">citrus_sinensis-orange1.1g015632m.g.gff3</a></td>
  </tr>
</table>

<b>Functional Annotations</b>
<table>
  <tr>
    <th>Blast of C. sinensis v1.0 mRNA vs NCBI nr (XML format)</th>
    <td><a href="/sites/default/files/blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out">blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out</a></td>
  </tr>
  <tr>
    <th>Blast of C. sinensis v1.0 mRNA vs ExPASy SwissProt (XML format)</th>
    <td><a href="/sites/default/files/blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out">blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out</a></td>
  </tr>
  <tr>
    <th>InterPro analysis of C. sinensis v1.0 mRNA (XML format)</th>
    <td><a href="/sites/default/files/citrus_sinensis-orange1.1g015632m.g.iprscan.xml">citrus_sinensis-orange1.1g015632m.g.iprscan.xml</a></td>
  </tr>
  <tr>
    <th>KEGG analysis of C. sinensis v1.0 mRNA (KEGG heir format)</th>
    <td><a href="/sites/default/files/citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz">citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz</a></td>
  </tr>
</table>

Note: Above we added raw HTML for our downloads block. However, if you prefer, the ckeditor module can be configured so you do not need to use HTML.

Click Save. Now, the link to Phytozome is present in the Resources sidebar and the Download block appears as follows:

Tripal-Downloads.png

Because we set the fields to be unlimited in number you can add as many links or titles with blocks as you like by editing the organism page and adding more. Be sure that the order of the Titles and blocks correspond with each other.
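As a rough illustration of how the three fields relate, the following Python sketch splits a Resource Links entry on the ‘|’ character and pairs Resource Titles with Resource Blocks by position. This is not Tripal's code: the function names are hypothetical, and raising an error on a length mismatch is a choice made for the example only.

```python
def parse_resource_link(entry):
    """Split a Resource Links entry of the form 'title|url'."""
    title, _, url = entry.partition("|")
    return title.strip(), url.strip()


def pair_titles_with_blocks(titles, blocks):
    """Pair each title with the block at the same position."""
    if len(titles) != len(blocks):
        raise ValueError("each Resource Title needs a matching Resource Block")
    return list(zip(titles, blocks))


link = parse_resource_link(
    "C sinensis at Phytozome|http://www.phytozome.net/search.php?method=Org_Csinensis"
)
sidebar = pair_titles_with_blocks(
    ["Downloads"],
    ["<p>The following annotation files are available for download:</p>"],
)
```

Because the pairing is positional, the first title always goes with the first block, which is why the order of the two fields must agree.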

Finally, we can add resources for all Tripal content types (e.g. Analysis, Feature, etc.). Because we have already created the CCK fields, we don’t need to create them again; we simply reuse them on the other content types. For example, if we were to add these same fields to the Analysis content type, a new section called Existing field appears when managing the fields.

Tripal-CCK-Reuse.png

We can use this new section to add the three fields we created previously to the Analysis content type. For consistency we should use the same name for the label. And we should group these fields together as well.

Linking to Resource Blocks

As mentioned previously, each Tripal page typically has a Resources sidebar. In the previous section we learned how to add custom items to the sidebar using CCK fields. In some cases, we may want to link directly to these resource blocks. For example, the URL for our Citrus sinensis page has been:

http://localhost/node/2

On that page we have a feature browser, KEGG reports, GO reports, a data type summary and now we have a new Downloads block.

Suppose we want to have a new Downloads page that will provide links to all of the downloadable content on our site. We have downloadable content on our citrus page, but, it would be confusing if on our new Downloads page, we simply had a link to our organism page and expected users to click the Downloads link in the sidebar. What we really need is a link directly to the Downloads block on the Citrus sinensis page.

To create this link, first right-click on the link in the Resources sidebar and copy the link address. Paste the link in a text editor so you can see the full URL. For this tutorial, the new Downloads link is:

http://localhost/node/2#tripal_organism-resource_0-box

Notice the text after the hash symbol: tripal_organism-resource_0-box. If you split that text into three pieces, using the dash as a delimiter, then the second text (resource_0) is the text we will use to build the link. The direct link to the Downloads block on our organism page is then the concatenation of the URL for our species page with this text in the following way:

http://localhost/node/2?block=resource_0

Notice that we added the suffix ‘?block=’ followed by the text ‘resource_0’ to our original organism link. You can create links to any block on the Resources sidebar using this method. As another example, here is the direct link to the data type summary block on our organism page:

http://localhost/node/2?block=feature_counts

Suppose we do not want our organism page to have node/2 in the URL, but rather we want the name of the species in our URL. To do this, edit the organism and set the URL path settings to:

species/citrus_sinensis

The URL for our organism page now becomes:

http://localhost/species/citrus_sinensis

We can still use the same scheme described above to link to the resource blocks, but using the new URL for our species:

http://localhost/species/citrus_sinensis?block=resource_0
http://localhost/species/citrus_sinensis?block=feature_counts
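The steps for deriving a direct block link can be sketched in Python as follows. The function name block_link is hypothetical; it only automates the manual procedure described above (split the text after the hash on dashes, take the middle piece and append it as ‘?block=’), and it assumes the copied fragment always has exactly three dash-separated pieces.

```python
from urllib.parse import urlsplit


def block_link(copied_url, page_url=None):
    """Turn a copied sidebar link into a direct '?block=' link.

    copied_url: link address copied from the Resources sidebar.
    page_url:   optional alternative page URL (e.g. a URL alias).
    """
    parts = urlsplit(copied_url)
    pieces = parts.fragment.split("-")
    if len(pieces) != 3:
        raise ValueError("unexpected fragment: " + parts.fragment)
    block = pieces[1]  # the middle piece, e.g. 'resource_0'
    if page_url is None:
        page_url = "{}://{}{}".format(parts.scheme, parts.netloc, parts.path)
    return "{}?block={}".format(page_url, block)


copied = "http://localhost/node/2#tripal_organism-resource_0-box"
direct = block_link(copied)
aliased = block_link(copied, "http://localhost/species/citrus_sinensis")
```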

Adding Publications

Tripal v1.1 provides a new interface for automatically and manually adding publications. First we will manually add a new publication. To do this, we must first enable the Tripal Pub module. The Tripal Pub module can be installed using the site’s Modules page at Administer → Site Building → Modules, or using Drush. We have previously used Drush to install modules in this tutorial and the commands to install the Tripal Pub module are similar:

cd /var/www
drush pm-enable tripal_pub

You should see the following output in the terminal:

The following extensions will be enabled: tripal_pub, tripal_contact
Do you really want to continue? (y/n): y
tripal_contact was enabled successfully.                                                                   [ok]
tripal_pub was enabled successfully.                                                                       [ok]
The directory sites/default/files/tripal/tripal_pub has been created.                                      [status]
Job 'Load OBO Tripal Publication' submitted.  Check the jobs page for status                               [status]
The directory sites/default/files/tripal/tripal_contact has been created.                                  [status]
Job 'Load OBO Tripal Contacts' submitted.  Check the jobs page for status                                  [status]

You will notice that the Tripal Contact module was also installed, and two jobs were submitted. These jobs will load a contact and publication ontology. The Tripal Contact and Pub ontologies are custom vocabularies used for organizing information about publications and contact information. So, before we can add publications (or contacts) we need to run these jobs:

cd /var/www
drush trpjob-run administrator

Note: Always remember to set permissions for any new modules that are installed. Permissions can be set at Administer → User Management → Permissions. For this tutorial we do not need to set permissions because we are logged in using the administrator account, but for a real site permissions are necessary.

Manually Adding a Publication

Now that the Tripal publication and contact ontologies are loaded we can add publications. First, we will manually add a publication. Navigate to Create Content and click Publication.

Tripal-v1.1 create publication.png

We will add information about the Tripal publication. Enter the following values:

Our publication has been added and you should see the following page:

Tripal-v1.1 manual publication.png

Note: It seems a bug is present here. If you add additional properties, but accidentally forget a required field when saving, you will receive a message that the field is missing, which you can correct. However, when you click the save button again you will receive a page full of text and not the screen shot above. The publication was, however, saved and you can find it by using the publication search tool at http://localhost/find/publications. This bug will be corrected in future updates of Tripal.

Now we have a publication page. It is informative, but it doesn’t link to the website where the publication can be found. When we created the publication we did not add a ‘URL’ property. Edit the page and add a new property of type URL with the value:

http://database.oxfordjournals.org/content/2011/bar044.long

After saving the page, the title is now linked to the full article.

Searching for Publications

By default, Tripal provides a simple search interface for many data types (e.g. organisms, analyses, features, etc). These can be found in the menu under Search Biological Data. However, the publications search tool is not in this list. It may be there in the future, but for now it can be found at this URL: http://[your site name]/find/publications.

Clicking the search button without providing any criteria will provide a list of all publications. For this tutorial, we only have a single publication:

Tripal-v1.1 pub search.png

However, you will notice that if you try to select a search criterion, nothing is available. Tripal allows you to set which fields a user can use as criteria. In some cases not all fields will be appropriate given the publications available on the site. All of the properties available when adding a publication can be searched, but some properties like the URL may not be necessary for searching. You can specify which fields to use for search criteria by navigating to Administer → Tripal Management → Publications → Configuration. At the top of the page you will see a section for Searching Options:

Tripal-v1.1 pubconfig searchoptions.png

Here you can select which properties a user can use for searching. For this tutorial, find and check these options:

Then click the Save configuration button at the bottom. If we return to the publication search page, we now have criteria for searching.

Bulk Import of Publications

Tripal supports bulk importing of publications from remote databases such as NCBI PubMed and the USDA National Agricultural Library (AGL). Support for PubMed is built into the Tripal module, but support for AGL requires some additional setup on the server. You can find instructions for preparing the server for AGL on the Administer → Tripal Management → Publications page in the Setup Instructions section. For this tutorial we will create an importer for PubMed.

Creation of an importer is an administrative function. A publication importer is created by the site administrator and consists of a set of search criteria for finding multiple publications at one time. When the importer is run, it will query the remote database, retrieve the publications that match the criteria and add them to the database. Because we loaded genomic data for Citrus sinensis we will create an importer that will find all publications related to this species.

First, navigate to Administer → Tripal Management → Publications → Add an Importer. You will see the following page:

Tripal-v1.1 pubimporter1.png

Enter the following values in the fields:

Now, click the ‘Test Importer’ button. This will connect to PubMed and search for all publications that match our provided criteria. On the date this portion of the tutorial was written, 473 publications were found:

Tripal-v1.1 pubimporter2.png

Now, save this importer. You should see that we have one importer in the list:

Tripal-v1.1 pubimporter3.png

We can use this importer to load all 473 publications about Citrus sinensis from PubMed into our database (how to load these will be shown later). However, what if new publications are added? We would like this importer to be run monthly so that we can automatically add new publications as they become available. But we do not need to try to reload these 473 again. So, we will create a new importer that only finds publications within the last 30 days. To do this, click the link Create a new publication importer just below the list of importers. Now, add the following criteria:

Now, when we test the importer we find only 2 publications that have been added (created) in PubMed in the last 30 days:

Tripal-v1.1 pubimporter4.png

Save this importer.

Next, there are two ways to import these publications. The first is to manually import them using a Drush command. Return to the terminal and run the following command:

cd /var/www
drush tpubs-import

You should see output to the terminal that begins like this:

NOTE: Loading of publications is performed using a database transaction.
If the load fails or is terminated prematurely then the entire set of
insertions/updates is rolled back and will not be found in the database

Importing: Pubs for Citrus sinensis

And as publications are imported each one is printed to the screen. The importer will pause while it requests 100 publications. It will then load those, then pause to request another 100 until it imports all publications that match the criteria.

Some things to know about the publication importer:

  1. The importer keeps track of publications from the remote database using the publication accession (e.g. PubMed ID).
  2. If a publication with an accession (e.g. PubMed ID) already exists in the local database, the record will be updated.
  3. If a publication in the local database matches by title, journal and year with one that is to be imported, then the record will be updated. You can change which fields must match on the Administer → Tripal Management → Publications → Configuration page.
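The matching rules in the list above can be sketched as follows. This is a simplified Python illustration and not the importer's actual code: the function names, the dictionary layout and the sample records (including the accession strings) are all made up for the example.

```python
def find_existing(local_pubs, incoming, match_fields=("title", "journal", "year")):
    """Return the local record that `incoming` should update, or None."""
    # Rule 2: match on the remote accession (e.g. a PubMed ID) first.
    for pub in local_pubs:
        if incoming.get("accession") and pub.get("accession") == incoming["accession"]:
            return pub
    # Rule 3: otherwise match on the configured fields.
    for pub in local_pubs:
        if all(pub.get(f) == incoming.get(f) for f in match_fields):
            return pub
    return None


def import_pub(local_pubs, incoming):
    """Update a matching record or insert a new one; report what happened."""
    existing = find_existing(local_pubs, incoming)
    if existing is not None:
        # Keep existing values when the incoming record has none.
        existing.update({k: v for k, v in incoming.items() if v is not None})
        return "updated"
    local_pubs.append(dict(incoming))
    return "inserted"


# Made-up records, for illustration only.
db = []
results = [
    import_pub(db, {"accession": "PMID:1", "title": "A", "journal": "J", "year": 2011}),
    import_pub(db, {"accession": "PMID:1", "title": "A (revised)", "journal": "J", "year": 2011}),
    import_pub(db, {"accession": None, "title": "A (revised)", "journal": "J", "year": 2011}),
    import_pub(db, {"accession": "PMID:2", "title": "B", "journal": "J", "year": 2012}),
]
```

Running the same importer twice therefore should not create duplicate records, as long as either the accession or the configured fields match.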

The second way to import publications is to add an entry to the UNIX cron. We did this previously for the Tripal Jobs management system when we first installed Tripal. We will add another entry for importing publications. But first, now that we have imported all of the relevant pubs, we need to return to the importers list at Administer → Tripal Management → Publications → Importers List and disable the first importer we created. We do not want to run that importer again, as we’ve already imported all historical publications on record at PubMed. Click the edit button next to the importer named Pubs for Citrus sinensis, check the disable checkbox and then save the template. The template should now be disabled.

Tripal-v1.1 pubimporter5.png

Now we have the importer titled Pubs for Citrus sinensis last 30 days enabled. This is the importer we want to run on a monthly basis. The cron entry will do this for us. On the terminal open the crontab with the following command:

sudo crontab -e

Now add the following line to the bottom of the crontab:

30 8 1,15 * *  su - www-data -c 'cd /var/www; /usr/local/drush/drush -l http://[site url] tpubs-import --report=[your email] > /dev/null'

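Where [site url] is the full URL of your site and [your email] is the email address that should receive the report of imported publications.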

The cron entry above will launch the importer at 8:30am on the first and fifteenth days of the month. We run this importer twice a month so that if one run fails (e.g. the server is down), the other run can still pick up the new publications.
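To double-check how the five time fields are read, the following Python sketch expands the schedule for a given month. It only handles the small subset of cron syntax used here (`*` and comma-separated integers) and is meant as an illustration, not a general cron parser.

```python
import calendar
from datetime import datetime

CRON_SCHEDULE = "30 8 1,15 * *"  # minute, hour, day of month, month, day of week


def expand(field, low, high):
    """Expand '*' or a comma-separated list of integers into a list of values."""
    if field == "*":
        return list(range(low, high + 1))
    return [int(value) for value in field.split(",")]


def runs_in_month(schedule, year, month):
    """List the times at which the schedule fires during the given month."""
    minute, hour, day_of_month, month_field, day_of_week = schedule.split()
    if day_of_week != "*":
        raise ValueError("this sketch only handles '*' in the day-of-week field")
    if month not in expand(month_field, 1, 12):
        return []
    last_day = calendar.monthrange(year, month)[1]
    return [
        datetime(year, month, day, hr, mn)
        for day in expand(day_of_month, 1, last_day)
        if day <= last_day
        for hr in expand(hour, 0, 23)
        for mn in expand(minute, 0, 59)
    ]


for run in runs_in_month(CRON_SCHEDULE, 2014, 2):
    print(run.strftime("%Y-%m-%d %H:%M"))
```

For February 2014 this prints 2014-02-01 08:30 and 2014-02-15 08:30, matching the description above.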

Drupal Views Integration

Editing Existing Views

Drupal Views is a powerful 3rd-party module that allows an authorized user to query database tables in novel ways and to create custom pages and search forms. Tripal has fully integrated the Chado database tables with Drupal Views. The basic search tools available under the Search Biological Data menu are all Drupal Views. As a brief introduction to Views we will examine one of these views and customize it. In order to use Views you need a basic understanding of the Chado tables.

First, navigate to Administer → Site building → Views.

Tripal-v1.1 views1.png

Here we see the list of the views that have already been created. All of the active views were created automatically by Tripal modules when they were installed. At the bottom of the page are inactive views that come by default with the Views module. Scroll to the view titled feature_listing and click the ‘Edit’ button to the right of it. You will see the following page:

Tripal-v1.1 feature listing1.png

On this page you will see several sections: View settings, Basic settings, Relationships, Arguments, Fields, Sort criteria, and Filters. This tutorial will not describe all of these settings. There are tutorials on the web that explain these settings in more detail. For this tutorial we will discuss only a few of them.

The Basic settings section provides details about how the view behaves, such as how many records to show at one time, how the results should be laid out on the page, etc. For example, for this view, the results are displayed in a table format, with 50 items per page.

The Fields section lists the fields that will be used for the view. For example, in this view we will be using the feature’s uniquename, name, type, organism common name and a few other fields from the Chado feature table. We also have the node ID of the Drupal node that corresponds to the feature.

The Sort criteria section lists the order in which results will be shown. The results will be sorted by organism common name, feature type and feature name, in that order.

The Filters section provides a set of criteria for limiting which records will be shown. We want to limit the results by common name, feature type and feature name. The filters are used to create the search form at the top of the features search page:

Tripal-Search-Features.png
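It can help to think of the Fields, Sort criteria and Filters sections as the parts of a SQL query. The Python sketch below builds a heavily simplified copy of three Chado tables in an in-memory SQLite database and runs a query comparable to the view. The table and column names follow Chado, but the reduced column set, the sample feature names and the query itself are assumptions for illustration; the SQL that Views actually generates will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, genus TEXT, species TEXT, common_name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, organism_id INTEGER, type_id INTEGER,
                      name TEXT, uniquename TEXT);
INSERT INTO organism VALUES (1, 'Citrus', 'sinensis', 'Sweet orange');
INSERT INTO cvterm VALUES (1, 'gene'), (2, 'mRNA');
-- Made-up feature names for illustration only.
INSERT INTO feature VALUES
  (1, 1, 2, 'example-mRNA-1', 'example-mRNA-1.v1'),
  (2, 1, 1, 'example-gene-2', 'example-gene-2.v1'),
  (3, 1, 1, 'example-gene-1', 'example-gene-1.v1');
""")

# Fields -> SELECT list, Filters -> WHERE clause, Sort criteria -> ORDER BY,
# items per page -> LIMIT.
QUERY = """
SELECT f.uniquename, f.name, t.name AS type, o.common_name
FROM feature f
JOIN organism o ON o.organism_id = f.organism_id
JOIN cvterm t ON t.cvterm_id = f.type_id
WHERE (:type IS NULL OR t.name = :type)
ORDER BY o.common_name, t.name, f.name
LIMIT 50
"""

genes = conn.execute(QUERY, {"type": "gene"}).fetchall()
everything = conn.execute(QUERY, {"type": None}).fetchall()
for row in genes:
    print(row)
```

Leaving a filter empty (here, passing `None` for the type) returns all features, which is how an exposed filter on the search form behaves when the visitor does not fill it in.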

A single view can have multiple displays. On the left hand side is the interface for adding additional displays:

Tripal-v1.1 feature listing displays.png

The first item titled features_all provides all of the default settings for the view. The next item titled Page is the page display. If you click Page you will see the same sections, plus a new section titled Page settings. Notice that the page settings contain the URL path (e.g. chado/features) for the view. This is the URL we use on the site to search for features (e.g. http://localhost/chado/features). We can customize this page by altering any of the settings. Those that are light gray and italic are inherited from the default view settings. We can test changes to the view using the Live preview section at the bottom of the view configuration page.

Note: always check whether you are editing the default settings for the entire view or the settings for a single display.

Now, suppose we do not like the way our search page looks. Currently, for features, when search results are returned we see the unique name, name, feature type, common name of the organism, sequence length, whether the sequence is obsolete and the date it was accessioned. We do not want the sequence length, the obsolete flag or the accession date to appear in our search results. And suppose we want the genus and species to be listed instead of the common name. We can make these changes by editing the view settings.

First, we need to add the genus and species as fields to show. To do this, click the small plus in the header of the Fields section. A new section towards the bottom of the setting sections will appear with a long list of fields you can select from and a drop down to let you filter the list:

Tripal-v1.1 feature listing add field.png

In the drop down you can see all of the groups of fields associated with features. These groups correspond to Chado tables. Select the group Chado Organism. This reduces the list of fields to only those associated with the Chado organism table. Check the boxes for Chado Organism: Genus and Chado Organism: Species and click Save.

Next we see a set of configuration settings for the fields we selected. In this case, we see configuration settings for the Genus field first.

Tripal-v1.1 feature listing add field2.png

Views lets you control much of how the field is seen (or not seen) in the page results. Here we want to leave all the defaults. Click the Update default display button at the bottom. This will ensure that this change is made to our current page and to the default settings as well. The configuration settings for the Species field now appear. Leave the defaults as well and click Update default display. Now that we are done configuring our new fields we can preview the changes using the Live preview section. Below is a screen shot of the view after we have added our new fields:

Tripal-v1.1 feature listing live preview1.png

Now, we want to remove the unwanted fields. In the Fields section, click on the field named Chado Feature: Timeaccessioned. A configuration page appears similar to what we saw previously for Genus and Species. Check the box Exclude from display. This will leave the field present in the view but will not show it. Click the Update default display button. Alternatively, if we no longer want to keep this field in the view, then we could remove the field by clicking the Remove button. Do the same for the common name, sequence length and is obsolete fields. Our view now appears as follows:

Tripal-v1.1 feature listing live preview2.png

If we click the Save button at the bottom, this will save all of the changes we have made to the view and the Features search page under Search Biological Data will be updated.

We have only touched on a few of the capabilities of the Views interface. You can create advanced looking forms and pages using Views.

Adding a New View

To create a new view navigate to Administer → Site building → Views and click the Add tab near the top of the page. Here you see a set of fields. There are two that are required: the View name and the View type.

Tripal-v1.1 views new.png

Suppose we want to create a ‘Species’ page for our site that lists all of the species that our site houses. To do this, enter the following:

  1. View name: all_species
  2. View type: Node

Although we are creating a list of species, we select ‘Node’ as the view type. This is because all nodes (nodes are pages in Drupal lingo) have teasers. A teaser is a brief summary of a page's content that only shows up in certain cases. Tripal provides teasers for all of its pages, so we want to use the node teasers in our list of species. If, however, we wanted to use content directly from the Chado organism table, then we would have selected Chado Organism as the view type. Click the Next button at the bottom.

Now we see the same form we saw before when editing an existing view, but there are no fields, sort criteria, or filters for this view. The first thing we want to do is indicate to the view that we will be using Node teasers. To do this, click the Row style link in the Basic Settings section. A box appears towards the bottom with radio buttons. Click the button for Node and then Update.

Tripal-v1.1 views row style.png

In the next box that appears, select Teaser in the drop down and click Update.

Tripal-v1.1 views row style2.png

Next, we want to limit the nodes that are shown to only those for organisms. In the Filters section, click the small plus in the section header. Select Node in the Groups drop down, check the box Node: Type and then click Add. This will allow us to filter on the node type. A new section appears allowing you to select which node type to filter. Check the Organism box and click Update.

Tripal-v1.1 views filter1.png

Next we want to sort the list by title so that our species are in alphabetical order. In the Sort criteria section, click the small plus in the section header. In the settings select the group Node, check the box Node: Title and click the Add button. Then choose Ascending and click Update. We now have a sorted list of species showing up in the Live preview section:

Tripal-v1.1 views species previiew1.png

Notice in the Basic settings section that we are limiting the view to show only 10 items (see the Items to display setting). If we ever have more than 10 species we will want to increase that number, or we will want to add a pager. A pager is an interface that appears at the bottom of the page and allows a visitor to cycle between pages of content. If we have more than 10 species then the first page will show the first 10 species and an interface will be provided to let the user move to the next set of 10, and so on. Click the Use pager setting and change it to Full pager. The pager will not be visible unless we have more than 10 species pages.
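The arithmetic behind the pager is simple, and a short Python sketch makes it concrete. The function names and the placeholder species titles are hypothetical; Views performs the equivalent work internally.

```python
import math


def page_count(total_items, per_page=10):
    """Number of pages needed; an empty list still renders as one page."""
    return max(1, math.ceil(total_items / per_page))


def page_of(items, page, per_page=10):
    """Return the items shown on a zero-based page number."""
    start = page * per_page
    return items[start:start + per_page]


# 23 placeholder titles, already in the alphabetical order set by the sort criteria.
species = sorted("Species {:02d}".format(i) for i in range(1, 24))

pages = page_count(len(species))
last_page = page_of(species, pages - 1)
pager_visible = pages > 1
print(pages, len(last_page), pager_visible)  # 3 3 True
```

With 23 species and 10 items per page there are three pages, the last holding the remaining three species. With 10 or fewer species there is only one page, so no pager is shown.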

Finally, we want to create a display of type Page for this view. On the top left-hand side of the configuration settings, just above the Add display button, select Page in the drop down and click Add display. All of the fields are now greyed and italicized, indicating that this new page inherits all of the view settings we just created. A new Page settings section has also appeared. We need a URL for this species page, so click the Path setting and add the following text:

species

This will create a page at the URL http://localhost/species.

Next, click the Menu setting. Then click the Normal menu entry radio button, set the title to Species and click Update:

Tripal-v1.1 views menu setup.png

Finally click Save to save this view. We now have a new menu item named Species on the main menu, and if we click that menu item we now see our new species page:

Tripal-v1.1 views new final.png
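Conceptually, the view we just built asks Drupal for published nodes of one type, sorted by title, ten at a time. The Python sketch below expresses that as a query against a minimal stand-in for Drupal's node table in SQLite. The node type machine name `chado_organism` and the sample rows are assumptions for illustration, and the real node table has many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, title TEXT);
-- Made-up nodes; only the organism nodes should appear on the species page.
INSERT INTO node (type, title) VALUES
  ('chado_organism', 'Citrus sinensis'),
  ('chado_feature', 'example-gene-1'),
  ('chado_organism', 'Citrus clementina'),
  ('page', 'About this site');
""")

# Filter: Node: Type, Sort: Node: Title ascending, Items to display: 10.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM node WHERE type = ? ORDER BY title ASC LIMIT 10",
        ("chado_organism",),
    )
]
print(titles)  # ['Citrus clementina', 'Citrus sinensis']
```

The Row style setting (Node, Teaser) then decides how each returned node is rendered, which is why no individual fields had to be added to this view.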

Limitations of views

Customizing The Look-and-Feel of Tripal

The default look-and-feel of data presented by Tripal is set in Drupal-style template files. These template files can be found inside of the Tripal theme and theme folder of the Tripal Extension modules. Drupal allows you to customize the templates. For this tutorial we will not cover customization of template files. However, a tutorial for customizing the look-and-feel of the site using templates can be found in the Developers Handbook.

Advanced Features

The Tripal Bulk Loader

The Tripal Bulk Loader is a new feature added in version 1.0. Often, data is not in common formats such as GFF, FASTA, GAF, InterPro XML, etc., but rather in Excel spreadsheets or tab-delimited or comma-separated files. The goal of the bulk loader is to enable a user to load data in these formats into the Chado schema. Currently, the bulk loader allows a site administrator to create custom loader templates that will allow a user to load tab-delimited files of any format.

Using the bulk loader web interface, a privileged user creates a “template” for loading a tab-delimited file. This template specifies the fields in the Chado tables in which the values from the tab-delimited file will be stored. Once the template is fully defined, the privileged user saves the template for other users to use. Another user can then load any tab-delimited file that matches the template. The user can upload as many files as desired.
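The idea of a template can be illustrated with a small Python sketch that maps the columns of a tab-delimited file onto Chado table fields. Real templates are built and stored through the Tripal web interface; the dictionary layout, the function name and the sample rows here are hypothetical and only show the column-to-field mapping.

```python
import csv
import io

# Hypothetical template: column index -> (Chado table, field).
TEMPLATE = {
    0: ("organism", "genus"),
    1: ("organism", "species"),
    2: ("organism", "common_name"),
}


def apply_template(tsv_text, template):
    """Turn each tab-delimited row into a {table: {field: value}} record."""
    records = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        record = {}
        for column, (table, field) in template.items():
            record.setdefault(table, {})[field] = row[column]
        records.append(record)
    return records


sample = "Citrus\tsinensis\tSweet orange\nCitrus\tclementina\tClementine\n"
records = apply_template(sample, TEMPLATE)
for record in records:
    print(record)
```

Any file with the same column layout can be run through the same template, which is why a template only needs to be defined once and can then be reused by other users.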

Creating Custom Modules

As mentioned earlier in the tutorial, Tripal is a modular software package. A Tripal API has been developed to help others who want to extend the functionality of Tripal. Anyone is welcome to develop modules for Tripal to suit their own needs and perhaps share them back with the community. The Tripal API can be found on the home page: http://tripal.sourceforge.net/. Information for developing new modules can be found in the Tripal Developer’s Handbook.

Modules that conform to the Tripal API and Drupal coding standards will be officially approved by the Tripal Developers Consortium. These modules will be listed on the Tripal website and will be available in the Drupal module repository for download.

Anyone wishing to extend Tripal should sign up for the developers mailing list at https://lists.sourceforge.net/lists/listinfo/gmod-tripal-devel and try to attend one of the monthly developers' meetings to discuss the desired extensions.
