Difference between revisions of "Chado Django HOWTO"

(This HOWTO describes how to use the Django framework to interact with a Chado database)
 
m (Text replace - "</python>" to "</syntaxhighlight>")
 
(22 intermediate revisions by 3 users not shown)
Line 1: Line 1:
Victor de Jager
+
--[[User:Vdejager|Vdejager]] 11:07, 1 September 2008 (UTC)
victor.de.jager@<removethis>.nbic.nl
+
27 August 2008
+
  
 +
= Chado access with Django HOWTO =
  
 
== Abstract ==
 
== Abstract ==
 
+
This [[:Category:HOWTO|HOWTO]] describes how to use the [http://www.djangoproject.com/ Django] (Python based) framework for accessing a [[Chado]] database. The Django framework can be used to create web interfaces and command line tools using the Python language.
this HOWTO describes how to use the Django (Python based) framework for accessing a Chado database. The Django framework can be used to create web interfaces and commandline tools using the Python language.
+
  
 
== Introduction ==
 
== Introduction ==
During the first GMOD Summerschool and GMOD Usermeeting a great deal was learned about Chado and the surrounding GMOD Tools. Specifically that one should try not to change the Chado scheme (althouhgh some do with very good reasons) and secondly not to change code of third party tools, perl modules etc in order to make them work with Chado. (or at least if they are bug fixes, give them back to the community). This will break upgradability and platform independance of those tools.
+
During the [[2008 GMOD Summer School|first GMOD Summer School]] and the [[July 2008 GMOD Meeting]] a great deal was learned about [[Chado]] and the surrounding [[GMOD Components|GMOD tools]]. Two lessons stand out: first, one should try not to change the Chado schema (although some do so for very good reasons); second, one should not change the code of third-party tools, Perl modules, etc. in order to make them work with Chado (or, if the changes are bug fixes, at least give them back to the community). Such changes break the upgradability and platform independence of those tools.
  
== Why Django ==
+
== Why Django? ==
Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design and adheres the DRY (Don't Repeat Yourself) principle.
+
Some reasons to use [http://www.djangoproject.com/ Django] as a web framework:
  
=== high performance ====
+
=== High Performance ===
Developed and used over two years by a fast-moving online-news operation, Django was designed to handle two challenges: the intensive deadlines of a newsroom and the stringent requirements of the experienced Web developers who wrote it. Although most genome annotation databases probably won't have to endure a milion hits per hour they will be able to benefit from a lot of optimization strategies applied to high traffic sites like query caching and lazy querying methods.
+
Django is a high-level [http://www.python.org/ Python] web framework that encourages rapid development and clean, pragmatic design, and adheres to the [http://docs.djangoproject.com/en/dev/misc/design-philosophies/#dry DRY (Don't Repeat Yourself)] principle.
 +
Developed and used over two years by a fast-moving online-news operation, Django was designed to handle two challenges: the intensive deadlines of a newsroom and the stringent requirements of the experienced Web developers who wrote it. Although most genome annotation databases probably won't have to endure a million hits per hour they will be able to benefit from a lot of optimization strategies applied to high traffic sites like query caching and lazy querying methods.
  
=== structure ===
+
=== Structure ===
 
Django lets you structure the design of a site to a high degree without giving up any flexibility or portability. Django certainly does not give you an out of the box website, but gives you a flexible and highly documented framework that is well maintained by a large community.
 
Django lets you structure the design of a site to a high degree without giving up any flexibility or portability. Django certainly does not give you an out of the box website, but gives you a flexible and highly documented framework that is well maintained by a large community.
  
This makes Django a nice choice for data disclosure projects like a webstite on top of a Chado database. There are other such frameworks like Turbogears (Python), Hibernate(Java), Ruby on Rails and Catalyst(Perl). Choose what you like and write a howto as well. Python is the most used language in our lab and thus an obvious first choice. (and the inventor is Dutch, Guido van Rossum, http://www.python.org/~guido/ , employed by Google.)
+
This makes Django a nice choice for data disclosure projects like a website on top of a Chado database. There are other such frameworks, like [http://turbogears.org/ TurboGears] (Python), [http://www.hibernate.org/ Hibernate] (Java), [http://www.rubyonrails.org/ Ruby on Rails] and [http://www.catalystframework.org/ Catalyst] (Perl). Choose what you like and write a HOWTO as well. Python is the most used language in [http://www2.cmbi.ru.nl/groups/bacterial-genomics/research/ our lab] and thus an obvious first choice. (And its inventor, [http://www.python.org/~guido/ Guido van Rossum], is Dutch and employed by Google.)
 +
 
 +
In an ideal world one would be able to upgrade the Django framework code without breaking anything. I have been doing so for almost a year with some other sites under development; only the last major changes to Django broke a site, and how and why to fix those is well described in the Django documentation.
 +
 
 +
Also, since the Chado schema is bigger than most schemas, the models will be generated or regenerated automatically.
 +
Any model specific functionality is attached to the model classes in such a way that the models can be upgraded independently without breaking the website code.
  
=== our goal ===
 
We will use the Django framework as showcase for disclosing our microbial genome information.
 
  
 +
=== Our Goal ===
 +
We will use the Django framework as a showcase for annotating and disclosing our microbial genome database.
  
 
== Prerequisites ==
 
== Prerequisites ==
* A working Chado database. It should work with most versions. This howto was created using version 1.01 of the schema.
+
* If you are not familiar with Django, start reading the tutorial at http://docs.djangoproject.com (stable) or http://www.djangoproject.com/documentation/ (development)
* Python, at least 2.4, but preferrably version 2.5, this is probably already installed during your Linux setup.
+
* A working Chado database. It should work with most recent versions. This HOWTO was created using version 1.01 of the schema.
* Apache 2 with mod_python installed. alternatively you may setup a mod_wsgi server as described in http://ericholscher.com/blog/2008/jul/8/setting-django-and-mod_wsgi/
+
* Python, at least version 2.4 but preferably 2.5. It was probably already installed during your Linux setup.
* psycopg2, the python postgres interface, which should be found in your Linx distribution or can be snatched from http://www.initd.org/
+
* Apache 2 with mod_python installed. Alternatively, you may set up a mod_wsgi server as described in [http://ericholscher.com/blog/2008/jul/8/setting-django-and-mod_wsgi/ Django and mod-wsgi].
 +
* psycopg2, the Python PostgreSQL interface, which should be available in your Linux distribution or can be downloaded from http://www.initd.org/
 
* Django of course. This howto is written with the Django version 1.0 beta 2, actually revision 8791 from the Django SVN repository which should be virtually identical to version 1.0.
 
* Django of course. This howto is written with the Django version 1.0 beta 2, actually revision 8791 from the Django SVN repository which should be virtually identical to version 1.0.
* please make shure mod_python works as described in http://www.djangoproject.com/documentation/modpython/
+
* Please make sure mod_python works as described in http://www.djangoproject.com/documentation/modpython/
* try to get the django welcome screen before continuing.
+
* Try to get the Django welcome screen before continuing the project creation step.
  
=== Important Django URLS ===
+
=== Important Django URLs ===
* http://www.djangoproject.com/  (the projects home)  
+
* http://www.djangoproject.com/ (the project's home)
 
* http://docs.djangoproject.com/en/dev/contents/ (a MUST READ if you are not familiar with the Django framework, try the tutorial)
 
* http://docs.djangoproject.com/en/dev/contents/ (a MUST READ if you are not familiar with the Django framework, try the tutorial)
 
* http://code.djangoproject.com/wiki/BackwardsIncompatibleChanges (have this at hand when you are following the SVN developer version of Django trunk)
 
* http://code.djangoproject.com/wiki/BackwardsIncompatibleChanges (have this at hand when you are following the SVN developer version of Django trunk)
 +
* http://www.djangoproject.com/community/ (a lot of talk, tips and code links)
 
* http://www.djangosnippets.org/ (all kinds of handy code snippets)
 
* http://www.djangosnippets.org/ (all kinds of handy code snippets)
 
* http://www.python.org (python documentation)
 
* http://www.python.org (python documentation)
 
  
 
== Preparations ==
 
== Preparations ==
  
From this point on it is assumed you have read the Django introduction and tutorial on the djangoproject website.
+
From this point on it is assumed you have read the [http://www.djangoproject.com/documentation/ Django introduction and tutorial] on the Django project website.  
  
In an ideal world one would be able to upgrade the Django framework code without breaking anything (a practise I have been doing for almost a year with some other sites under development, only the last major changes to Django broke a site (but how and why to fix those is well described in the Django documentation)
+
=== Create a Django project ===
 +
A Django project consists of a tree of files under a certain directory. This directory may be inside a user's home dir or inside a specific location where all Django projects are stored. When a Django website is created following the guidelines in the official documentation it should be a minimal task to change locations or even servers making deployment a breeze.
  
Also, since the Chado scheme is bigger than most schemes, the models should be generated ore regenerated automatically.
+
Inside your home directory create a Django project with the following command:
Any model specific functionality should be attached to the model classes in such a way that the models can be upgraded independently without breaking the website code.
+
  
 +
    django-admin.py startproject <your project name>
 +
    ''example /home/gmod/projects/django-admin startproject microgear''
  
=== create a Django project ===
+
This will create a directory that contains several files:
A Django project consists of a tree of files under a certain directory (preventing scattered code). this directory may be inside a user's homedir or inside a specific location where all Djano projects are stored. When a Django website is created following the guidelines in the official documentation it should be a minimal task to change locations or even servers making deployment a breeze.
+
  
Inside your home directory we create a Django project with the following command.
+
    [http://docs.python.org/tut/node8.html#packages __init__.py]
 +
    manage.py
 +
    settings.py
 +
    urls.py
  
django-admin.py startproject <your project name>
+
We start by changing the <tt>settings.py</tt> file and filling in some options:
  
This will create a directory that contains several files
+
    DATABASE_ENGINE = 'postgresql_psycopg2'            # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
 +
    DATABASE_NAME = 'dev_chado_03'                   # Or path to database file if using sqlite3.
 +
    DATABASE_USER = 'chado'                            # Not used with sqlite3.
 +
    DATABASE_PASSWORD = '<your password>'              # Not used with sqlite3.
 +
    DATABASE_HOST = ''                                # Set to empty string for localhost (uses sockets)
 +
                                                      # Set to machine IP to force tcp connection. Not used with sqlite3.
 +
    DATABASE_PORT = ''                                # Set to empty string for default. Not used with sqlite3.
  
__init__.py
+
* Make sure you set <tt>MEDIA_ROOT</tt>, <tt>MEDIA_URL</tt> and <tt>ADMIN_MEDIA_PREFIX</tt> as described in the Django manual.
manage.py
+
* Make <tt>site_media/</tt> a symlink in your project dir pointing to a directory under your web server's document root. This is where all your static files go (PDFs, JPGs, PNGs, etc.).
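
As a rough sketch of the three media settings, assuming the hypothetical project location <tt>/home/gmod/projects/microgear</tt> from the example above and a <tt>site_media/</tt> symlink inside it, the relevant part of <tt>settings.py</tt> might look like the following. The URL values are assumptions for illustration only; check the Django manual for the values that fit your server.

```python
import os

# Hypothetical project location, taken from the startproject example above.
PROJECT_DIR = '/home/gmod/projects/microgear'

# Filesystem path of the site_media/ symlink; it should end with a slash.
MEDIA_ROOT = os.path.join(PROJECT_DIR, 'site_media') + '/'

# URL under which the web server serves the files in MEDIA_ROOT (assumed value).
MEDIA_URL = '/site_media/'

# URL prefix for the admin site's own media; it must differ from MEDIA_URL.
ADMIN_MEDIA_PREFIX = '/media/'
```

Keeping the paths relative to one <tt>PROJECT_DIR</tt> value means that moving the project to another location or server requires changing a single line.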
settings.py
+
urls.py
+
  
We start by changing the settings.py file and filling in some options:
+
Save the file and we are ready for the model generation part.
  
DATABASE_ENGINE = 'postgresql_psycopg2'            # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
+
== The Django Model Philosophy ==
DATABASE_NAME = 'dev_chado_03'                   # Or path to database file if using sqlite3.
+
DATABASE_USER = 'chado'                            # Not used with sqlite3.
+
DATABASE_PASSWORD = '<no i'm not giving you mine>' # Not used with sqlite3.
+
DATABASE_HOST = ''                                # Set to empty string for localhost (uses sockets)
+
  # Set to machine IP to force tcp connection. Not used with sqlite3.
+
DATABASE_PORT = ''                                # Set to empty string for default. Not used with sqlite3.
+
  
make sure you set MEDIA_ROOT, MEDIA_URL and ADMIN_MEDIA_PREFIX as described in the Django manual.
+
A model is the single, definitive source of data about your data. It contains the essential fields and behaviors of the data you’re storing. Django follows the DRY Principle. The goal is to define your data model in one place and automatically derive things from it.
  
make site_media/ a symlink in your project dir pointing to a directory on your webserver's document root. This is where all your static files go (pdf's, jpgs,pngs etc)
+
This is not going to work for a Chado database, since the schema is predefined and works a little differently from how Django would normally create it. Django also does not know how to create views and such, although it can use them perfectly well, as we will see later.
  
save the file and we are ready for the model generation part.
+
=== Creating a Django App ===
 +
First create a Django application inside your project directory. Switch to your project directory and create an application framework with the command:
  
 +
  ./manage.py startapp <your application name>
  
=== Creating a Django app ===
+
  ''example /home/gmod/projects/microgear/manage.py startapp gmod''
First create a Django application inside you project directory. Switch to your project directory and create an application framework ith the command: ./manage.py startapp gmod
+
This will create a directory inside your project drectory named gmod and contains all file scaffolds we will need later.
+
  
  
=== Creating the models ===
+
This will create a directory named <tt>gmod</tt> inside your project directory, containing all the file scaffolds we will need later.
Now we switch back to our project directory.
+
  
./manage.py inspectdb > unsortedmodels.py
+
=== Creating the Models ===
 +
Now we switch back to our project directory and enter the following command.
  
This will create a raw models.py with a model for each table and view in the Postgres database. We will need to edit this file a bit with a PERL script.
+
  ./manage.py inspectdb > unsortedmodels.py
  
Each foreign key relation should have a unique name in Django to support reverse relationships. The following perl code will create these unique names.
+
This will create a raw models file, <tt>unsortedmodels.py</tt>, with a model for each table and view in the specified [[:Category:PostgreSQL|Postgres]] database. We will need to edit this file a bit with a Perl script.
The code rewrites the models in such a way that one could query reverse relations like this:
+
  
feature.featureloc_feature_set.all() # returns all feature locations for a given feature
+
Each foreign key relation should have a unique name in Django to support [http://www.djangoproject.com/documentation/db-api/#backward reverse relationships]. The following Perl code will create these unique names. The code rewrites the models in such a way that these reverse relations are supported using a model method with the following name:
  
The code will also create an admin.py file for linking the models to the admin site (handy for smaller size tables like the organism table.
+
  model.relatedmodelname_field_set.(queryfilters)
  
Perl code:
 
  
 +
  ''example: Feature.objects.filter(featureloc_feature_set__srcfeature__uniquename='NC_004567')''
  
#########################################################################
+
:The table [[Chado Sequence Module#Table: featureloc|featureloc]] has two foreign keys to the table [[Chado Sequence Module#Table: feature|feature]], one through the field 'feature' and the other through the field 'srcfeature'. The above Django queryset will return all features that are referenced by featureloc records that have 'NC_004567' as source feature value.
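
The renaming step can also be sketched in Python. The following is not the <tt>sortmodels.pl</tt> script, only a minimal illustration of the idea, assuming that <tt>inspectdb</tt> writes one field per line and that the wanted name follows the pattern visible in names such as <tt>featureloc_feature_set</tt> (model, field, then <tt>_set</tt>). The function name <tt>add_related_names</tt> and the sample input are made up for this example.

```python
import re

# A model class header as written by inspectdb.
CLASS_LINE = re.compile(r"^class (\w+)\(models\.Model\):")
# A foreign key field line; the target model may be quoted or unquoted.
FK_LINE = re.compile(r"""^(\s+)(\w+) = models\.ForeignKey\((['"]?)(\w+)\3(.*)\)\s*$""")


def add_related_names(source):
    """Give every ForeignKey a unique related_name: <model>_<field>_set."""
    out = []
    model = None
    for line in source.splitlines():
        header = CLASS_LINE.match(line)
        if header:
            model = header.group(1).lower()
        fk = FK_LINE.match(line)
        # Lines with a trailing comment or an existing related_name are left alone.
        if fk and model and "related_name" not in line:
            indent, field, _quote, target, rest = fk.groups()
            line = "%s%s = models.ForeignKey('%s', related_name=\"%s_%s_set\"%s)" % (
                indent, field, target, model, field, rest)
        out.append(line)
    return "\n".join(out)


# Invented sample input with two foreign keys to the same table.
sample = """class Featureloc(models.Model):
    featureloc_id = models.IntegerField(primary_key=True)
    feature = models.ForeignKey(Feature)
    srcfeature = models.ForeignKey(Feature, null=True, blank=True)
"""

print(add_related_names(sample))
```

Quoting the target model name, as the sketch does, lets a model refer to another model that is defined later in the same file.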
sortmodels.pl.gz
+
#########################################################################
+
  
usage: perl sortmodels.pl unsortedmodels.py models.py <project> <app>
 
  
The resulting files, models.py and admin.py should be copied to the <app> directory.
+
The code will also create an <tt>admin.py</tt> file for linking the models to the admin site (handy for smaller size tables like the [[Chado Organism Module#Table: organism|organism]], [[Chado General Module#Table: db|db]] or [[Chado CV Module#Table: cv|cv]] tables).
Have a look at these files. A model in Django representing a database table looks like this:
+
  
class Feature(models.Model):
+
The Perl code is available at http://www.cmbi.ru.nl/~vdejager/gmod/sortmodel.pl.gz. Feel free to change and republish it, since my Perl is a bit rusty.
    feature_id = models.IntegerField(primary_key=True)
+
    dbxref = models.ForeignKey('Dbxref', related_name="feature_dbxref_set")
+
    organism = models.ForeignKey('Organism', related_name="feature_organism_set")
+
    name = models.CharField(max_length=255)
+
    uniquename = models.TextField()
+
    residues = models.TextField()
+
    seqlen = models.IntegerField()
+
    md5checksum = models.TextField() # This field type is a guess.
+
    type = models.ForeignKey('Cvterm', related_name="feature_type_set")
+
    is_analysis = models.BooleanField()
+
    is_obsolete = models.BooleanField()
+
    timeaccessioned = models.DateTimeField()
+
    timelastmodified = models.DateTimeField()
+
  
    class Meta:
+
Usage:
app_label="<your app name>"
+
perl sortmodels.pl unsortedmodels.py models.py <project> <app>
        db_table = u'feature'
+
  
 +
The resulting files, <tt>models.py</tt> and <tt>admin.py</tt> should be copied to the <app> directory.
 +
Have a look at these files. A model in Django representing a database table looks like this:
 +
<syntaxhighlight lang="python">
 +
    class Feature(models.Model):
 +
        feature_id = models.IntegerField(primary_key=True)
 +
        dbxref = models.ForeignKey('Dbxref', related_name="feature_dbxref_set")
 +
        organism = models.ForeignKey('Organism', related_name="feature_organism_set")
 +
        name = models.CharField(max_length=255)
 +
        uniquename = models.TextField()
 +
        residues = models.TextField()
 +
        seqlen = models.IntegerField()
 +
        md5checksum = models.TextField() # This field type is a guess.
 +
        type = models.ForeignKey('Cvterm', related_name="feature_type_set")
 +
        is_analysis = models.BooleanField()
 +
        is_obsolete = models.BooleanField()
 +
        timeaccessioned = models.DateTimeField()
 +
        timelastmodified = models.DateTimeField()
 +
</syntaxhighlight>
  
===  creating model specific functions ===
+
===  Creating Model Specific Functions ===
in Django it is possible to specify so called model methods. These model methods describe the way a model behaves and can add certain functionality to a model. A special model method called __unicode__ describes how to display the standard name of a model instance (representing a record in the database). We use these methods to get something readable while playing with the command line further in this tutorial
+
In Django it is possible to specify so-called ''model methods''. These model methods describe the way a model behaves and can add certain functionality to a model. A special model method called <tt>__unicode__</tt> describes how to display the standard name of a model instance (representing a record in the database). We use these methods to get something readable while playing with the command line later in this tutorial.
  
We could create this model definition by editing the classes in model.py, but instead we will use a common python pattern.
+
We could create this model definition by editing the classes in <tt>models.py</tt>, but instead we will use a common Python pattern.
We create a new file called modeldefs.py. Inside this file we will create all our model methods and link them together inside the special __init__.py file that is used to initialize the classes in Python.
+
  
modeldefs.py:
+
We create a new file called <tt>modeldefs.py</tt>. Inside this file we will create all our model methods, and we link them to the models inside the special <tt>__init__.py</tt> file that is used to initialize the [http://docs.python.org/tut/node8.html#packages package] in Python.
  
#this file contains all the model methods we will attach to the specific models in the __init__.py file
+
<tt>modeldefs.py</tt>:
# one method may be attached to different model adhering to the DRY principle
+
<syntaxhighlight lang="python">
#
+
    #this file contains all the model methods we will attach to the specific models in the __init__.py file
#The line below imports all the Chado models
+
    # one method may be attached to different model adhering to the DRY principle
from <project>.<app>.models import *
+
    #
 +
    #The line below imports all the Chado models
 +
    from <project>.<app>.models import *
  
#this is a generic method definition for model, returning the value of the field called 'name'
+
    #this is a generic method definition for model, returning the value of the field called 'name'
def unicode_name(self):
+
    def unicode_name(self):
    return self.name
+
        return self.name
  
  
# this is a method definition for the 'Organism' model, returning the value of the field called
+
    # this is a method definition for the 'Organism' model, returning the value of the field called
# 'common_name'
+
    # 'common_name'
def unicode_common_name(self):
+
    def unicode_common_name(self):
    return self.common_name
+
        return self.common_name
 +
</syntaxhighlight>
  
 +
=== Attaching the Model Method Definitions to Specific Models ===
 +
<tt>__init__.py</tt>:
 +
<syntaxhighlight lang="python">
 +
    # this file attaches defined methods to specific models
 +
    #
 +
    # import the model method definitions
 +
    from <project>.<app>.modeldefs import *
  
=== attaching the model method defnitions to specific models ===
+
    setattr(Organism, '__unicode__', unicode_common_name)
__init__.py:
+
# this file attaches defined methods to specific models
+
#
+
# import the model method definitions
+
from <project>.<app>.modeldefs import *
+
  
setattr(Organism, '__unicode__', unicode_common_name)
+
    setattr(Cv, '__unicode__', unicode_name)
 +
    setattr(Db, '__unicode__', unicode_name)
 +
    setattr(Cvterm, '__unicode__', unicode_name)
 +
    setattr(Feature, '__unicode__', unicode_name)
  
setattr(Cv, '__unicode__', unicode_name)
+
    setattr(Featureloc, '__unicode__', location)
setattr(Db, '__unicode__', unicode_name)
+
</syntaxhighlight>
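
The attachment pattern can be tried without a database. The sketch below uses plain Python classes as stand-ins for the generated models, and it includes a hypothetical <tt>location</tt> method, since <tt>modeldefs.py</tt> as listed above does not define one. The methods are attached under both <tt>__unicode__</tt> and <tt>__str__</tt>, because Django on Python 2 calls <tt>__unicode__</tt> while Python 3 uses <tt>__str__</tt>.

```python
class Organism(object):
    """Stand-in for the generated Organism model (not a real Django model)."""

    def __init__(self, common_name):
        self.common_name = common_name


class Featureloc(object):
    """Stand-in for the generated Featureloc model (not a real Django model)."""

    def __init__(self, fmin, fmax, strand):
        self.fmin = fmin
        self.fmax = fmax
        self.strand = strand


def unicode_common_name(self):
    # Same method as in modeldefs.py.
    return self.common_name


def location(self):
    # Hypothetical method: shows a location as "fmin..fmax (strand)".
    return "%d..%d (%+d)" % (self.fmin, self.fmax, self.strand)


# Attach the methods after the classes exist, as __init__.py does.
for method_name in ("__unicode__", "__str__"):
    setattr(Organism, method_name, unicode_common_name)
    setattr(Featureloc, method_name, location)

print(str(Organism("Lactobacillus_plantarum")))
print(str(Featureloc(10000, 20000, 1)))
```

Because the methods live outside the generated classes, <tt>models.py</tt> can be regenerated at any time without losing them.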
setattr(Cvterm, '__unicode__', unicode_name)
+
setattr(Feature, '__unicode__', unicode_name)
+
  
setattr(Featureloc, '__unicode__', location)
+
=== Link Everything Together ===
 +
Go to your project directory to change the files below:
  
 +
'''In <tt>settings.py</tt>:'''
  
 +
In addition to the standard entries, the <tt>INSTALLED_APPS</tt> section should contain:
 +
<syntaxhighlight lang="python">
 +
    'django.contrib.admin',
 +
    '<project>.<app>',
 +
</syntaxhighlight>
 +
''Note the comma after the last item. Python strictly requires it only for a one-element tuple, but keeping it makes it safer to add entries later.''
  
 +
'''In <tt>urls.py</tt>:'''
  
=== link everyling together ===
+
Uncomment all lines marked as necessary for the automatic admin site.
  
in settings.py:
+
=== Finalizing ===
the INSTALLED_APPS section should contain besides the standard settings.
+
Once this has been inserted we need to run one other command. From the command line inside your <project> run
 +
    ./manage.py syncdb
 +
This will install all the tables necessary for the Django Admin application.
 +
You are now ready to continue building a website or run scripts using the Django framework against a Chado database.
 +
Alternatively, you should be able to go to the admin page of your website and see the models listed in the <tt>@adminmodels</tt> array in the <tt>sortmodels.pl</tt> script.
  
    'django.contrib.admin',
+
''example: http://localhost/microgear/admin/'' (although this URL depends on ''how'' you install your Django sites).
    '<project>.<app>.',
+
**note the comma at the last item, this is a python requisite
+
  
=== finalizing ===
+
== Using Django From the Command Line ==
Once this has been inserted we need to run one other command. From the commandline inside your <project> run
+
./manage syncdb
+
this will install all the tables necessary for the Django Admin application.
+
You are now ready to continue building a website or run a scripts using the Django framework against a Chado database.
+
  
 +
(You may want to install [http://code.google.com/p/django-command-extensions/wiki/InstallationInstructions Django commandline extensions].)
  
== using Django from the command line ==
+
=== Starting an Interactive Python Shell ===
(you may want to install Django commandline extentions from http://code.google.com/p/django-command-extensions/wiki/InstallationInstructions )
+
 
+
=== staring an interactive python shell
+
 
+
  
 
Inside your project dir
 
Inside your project dir
./manage shell
+
    ./manage.py shell
 +
    >>> from <project>.<app>.models import *
  
from <project>.<app>.models import *
+
=== Querying the Database ===
  
=== querying the database ===
+
See the [http://www.djangoproject.com/documentation/db-api/ Django database API documentation] for an explanation of all database API methods.
  
 
Show all Organisms in the database:
 
Show all Organisms in the database:
  
Organism.objects.all()
+
    >>> Organism.objects.all()
  
All Features from a specific organism (See http://www.djangoproject.com/documentation/db-api/ for an explanation of all database api methods):
+
All Features from a specific organism:
  
Feature.objects.filter(Organism__common_name__iexact='Lactobacillus_plantarum')
+
    >>> Feature.objects.filter(organism__common_name__iexact='Lactobacillus_plantarum')
  
 
All Features from a specific source feature between a start and stop location:
 
All Features from a specific source feature between a start and stop location:
  
Feature.featureloc_feature_set.filter(strand__exact=1).filter(fmin__gte=10000).filter(fmax__lte=20000)
+
    >>> Feature.objects.filter(featureloc_feature_set__strand__exact=1, featureloc_feature_set__fmin__gte=10000, featureloc_feature_set__fmax__lte=20000)
  
=== stacking queries ===
+
=== Stacking Queries ===
  
=== show me the generated SQL ===
+
Using Q objects
It is possible to see the SQL Django generates using the follwing commands
+
  
Make sure your Django DEBUG setting is set to True. Then, just do this:
+
=== Show Me the Generated SQL ===
 +
It is possible to see the [[Glossary#SQL|SQL]] that Django generates using the following commands.
  
>>> from django.db import connection
+
Make sure your Django <tt>DEBUG</tt> setting is set to <tt>True</tt>. Then, just do this:
>>> connection.queries
+
  
connection.queries is only available if DEBUG is True. It’s a list of dictionaries in order of query execution. Each dictionary has the following:
+
    >>> from django.db import connection
 +
    >>> connection.queries
  
``sql`` -- The raw SQL statement
+
<tt>connection.queries</tt> is only available if <tt>DEBUG</tt> is <tt>True</tt>. It’s a list of dictionaries in order of query execution. Each dictionary has the following:
``time`` -- How long the statement took to execute, in seconds.
+
  
connection.queries includes all SQL statements — INSERTs, UPDATES, SELECTs, etc. Each time your app hits the database, the query will be recorded.
+
* <tt>sql</tt> -- The raw SQL statement
 +
* <tt>time</tt> -- How long the statement took to execute, in seconds.
  
 +
<tt>connection.queries</tt> includes all SQL statements: INSERTs, UPDATEs, SELECTs, etc. Each time your app hits the database, the query will be recorded.
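
Because <tt>connection.queries</tt> is an ordinary list of dictionaries, it is easy to summarize. The helper below is a hypothetical convenience function, not part of Django, and the sample entries are invented so that the sketch runs without a database; in a shell session you would pass <tt>connection.queries</tt> instead.

```python
def summarize_queries(queries):
    """Return (number of queries, total seconds, slowest entry or None)."""
    # float() copes with the time being stored as a string or as a number.
    total = sum(float(q["time"]) for q in queries)
    slowest = max(queries, key=lambda q: float(q["time"])) if queries else None
    return len(queries), total, slowest


# Invented entries with the two keys described above.
sample_queries = [
    {"sql": "SELECT * FROM organism", "time": "0.002"},
    {"sql": "SELECT * FROM feature WHERE organism_id = 1", "time": "0.015"},
]

count, total, slowest = summarize_queries(sample_queries)
print("%d queries, %.3f seconds in total" % (count, total))
print("slowest: %s" % slowest["sql"])
```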
  
== running commandline python scripts using Django for database interaction ==  
+
== Running Command Line Python Scripts Using Django for Database Interaction ==
  
  
Line 246: Line 263:
  
  
== example website ==
+
== Example website ==
 
  
 
[[Category:HOWTO]]
 
[[Category:HOWTO]]

Latest revision as of 18:11, 9 October 2012

--Vdejager 11:07, 1 September 2008 (UTC)

Chado access with Django HOWTO

Abstract

This HOWTO describes how to use the Django (Python based) framework for accessing a Chado database. The Django framework can be used to create web interfaces and command line tools using the Python language.

Introduction

During the first GMOD Summer School and the July 2008 GMOD Meeting a great deal was learned about Chado and the surrounding GMOD tools. Two lessons stand out: first, one should try not to change the Chado schema (although some do so for very good reasons); second, one should not change the code of third-party tools, Perl modules, etc. in order to make them work with Chado (or, if the changes are bug fixes, at least give them back to the community). Such changes break the upgradability and platform independence of those tools.

Why Django?

Some reasons to use Django as a web framework:

High Performance

Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design, and adheres to the DRY (Don't Repeat Yourself) principle. Developed and used over two years by a fast-moving online-news operation, Django was designed to handle two challenges: the intensive deadlines of a newsroom and the stringent requirements of the experienced Web developers who wrote it. Although most genome annotation databases probably won't have to endure a million hits per hour, they will be able to benefit from many of the optimization strategies applied to high-traffic sites, like query caching and lazy querying methods.

Structure

Django lets you structure the design of a site to a high degree without giving up any flexibility or portability. Django certainly does not give you an out of the box website, but gives you a flexible and highly documented framework that is well maintained by a large community.

This makes Django a nice choice for data disclosure projects like a website on top of a Chado database. There are other such frameworks, like TurboGears (Python), Hibernate (Java), Ruby on Rails and Catalyst (Perl). Choose what you like and write a HOWTO as well. Python is the most used language in our lab and thus an obvious first choice. (And its inventor, Guido van Rossum, is Dutch and employed by Google.)

In an ideal world one would be able to upgrade the Django framework code without breaking anything. I have been doing so for almost a year with some other sites under development; only the last major changes to Django broke a site, and how and why to fix those is well described in the Django documentation.

Also, since the Chado schema is bigger than most schemas, the models will be generated or regenerated automatically. Any model specific functionality is attached to the model classes in such a way that the models can be upgraded independently without breaking the website code.


Our Goal

We will use the Django framework as a showcase for annotating and disclosing our microbial genome database.

Prerequisites

  • If you are not familiar with Django, start reading the tutorial at http://docs.djangoproject.com (stable) or http://www.djangoproject.com/documentation/ (development)
  • A working Chado database. It should work with most recent versions. This howto was created using version 1.01 of the schema.
  • Python, at least version 2.4 but preferably 2.5. This is probably already installed with your Linux setup.
  • Apache 2 with mod_python installed. Alternatively, you may set up a mod_wsgi server as described in Django and mod-wsgi.
  • psycopg2, the Python PostgreSQL interface, which should be available in your Linux distribution or can be downloaded from http://www.initd.org/
  • Django, of course. This howto was written with Django version 1.0 beta 2 (actually revision 8791 from the Django SVN repository), which should be virtually identical to version 1.0.
  • Please make sure mod_python works as described in http://www.djangoproject.com/documentation/modpython/
  • Try to get the Django welcome screen before continuing with the project creation step.

Important Django Urls

Preparations

From this point on it is assumed you have read the Django introduction and tutorial on the Django project website.

Create a Django project

A Django project consists of a tree of files under a certain directory. This directory may be inside a user's home directory or inside a specific location where all Django projects are stored. When a Django website is created following the guidelines in the official documentation, changing locations or even servers should be a minimal task, which makes deployment a breeze.

Inside your home directory create a Django project with the following command:

   django-admin.py startproject <your project name>
   example (run from /home/gmod/projects): django-admin.py startproject microgear

This will create a directory that contains several files:

   __init__.py
   manage.py
   settings.py
   urls.py

We start by changing the settings.py file and filling in some options:

   DATABASE_ENGINE = 'postgresql_psycopg2'            # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
   DATABASE_NAME = 'dev_chado_03'                     # Or path to database file if using sqlite3.
   DATABASE_USER = 'chado'                            # Not used with sqlite3.
   DATABASE_PASSWORD = "<no, I'm not giving you mine>" # Not used with sqlite3.
   DATABASE_HOST = ''                                 # Set to empty string for localhost (uses sockets).
                                                      # Set to machine IP to force a TCP connection. Not used with sqlite3.
   DATABASE_PORT = ''                                 # Set to empty string for default. Not used with sqlite3.
  • Make sure you set MEDIA_ROOT, MEDIA_URL and ADMIN_MEDIA_PREFIX as described in the Django manual.
  • Make site_media/ a symlink in your project directory pointing to a directory under your web server's document root. This is where all your static files go (PDFs, JPGs, PNGs, etc.).
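
Before moving on, it can be useful to check the database values independently of Django. The sketch below shows one possible way: it builds a libpq-style connection string from the same values and passes it to psycopg2.connect. The helper names build_dsn and try_connect are hypothetical and not part of Django or psycopg2, and values containing spaces would need extra quoting that this sketch does not handle.

```python
def build_dsn(name, user, password="", host="", port=""):
    """Build a libpq-style connection string from Django-style settings values."""
    parts = [("dbname", name), ("user", user), ("password", password),
             ("host", host), ("port", port)]
    # Empty strings mean "use the default", so they are left out, mirroring
    # how an empty DATABASE_HOST or DATABASE_PORT is treated in settings.py.
    return " ".join("%s=%s" % (key, value) for key, value in parts if value)


def try_connect(dsn):
    """Open and close one connection; psycopg2 raises an error on failure."""
    import psycopg2  # imported here so build_dsn also works without the driver
    connection = psycopg2.connect(dsn)
    connection.close()
    return True


# Values taken from the example settings above (hypothetical for your site).
dsn = build_dsn("dev_chado_03", "chado")
```

Calling try_connect(dsn) on the database host should then either return True or raise an error that points at the setting that is wrong.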

Save the file and we are ready for the model generation part.

The Django Model Philosophy

A model is the single, definitive source of data about your data. It contains the essential fields and behaviors of the data you’re storing. Django follows the DRY Principle. The goal is to define your data model in one place and automatically derive things from it.

This is not going to work for a Chado database, since the schema is predefined and is structured a little differently from how Django would normally create it. Django also does not know how to create views and such, although it can use them perfectly well, as we will see later.

Creating a Django App

First create a Django application inside your project directory. Switch to your project directory and create an application framework with the command:

  ./manage.py startapp <your application name>
  example: /home/gmod/projects/microgear/manage.py startapp gmod


This will create a directory named gmod inside your project directory, containing all the file scaffolds we will need later.

Creating the Models

Now we switch back to our project directory and enter the following command:

  ./manage.py inspectdb > unsortedmodels.py

This will create a raw models file (unsortedmodels.py) with a model for each table and view in the specified Postgres database. We will need to edit this file a bit with a Perl script.

Each foreign key relation should have a unique name in Django to support reverse relationships. The Perl script linked below creates these unique names. It rewrites the models in such a way that the reverse relations are available through a related manager with the following name:

  <model instance>.<relatedmodelname>_<field>_set.filter(<query filters>)


  example: feature.featureloc_feature_set.filter(srcfeature__uniquename__exact='NC_004567')
The table featureloc has two foreign keys to the table feature, one through the field 'feature' and the other through the field 'srcfeature'. For a Feature instance named feature, the above Django queryset returns the featureloc records that reference it through the field 'feature' and that have 'NC_004567' as source feature (assuming the accession is stored in the uniquename field of the source feature).
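
The naming rule can be illustrated in Python. The following is only a sketch of the rule as it appears in the generated model listing (related_name="<model>_<field>_set"); it is not the Perl script itself, and the function name add_related_names and the two regular expressions are assumptions made for this example.

```python
import re

# Matches the start of a generated model class, e.g. "class Featureloc(models.Model):"
CLASS_RE = re.compile(r"^class (\w+)\(models\.Model\):")
# Matches a foreign key line, with the target model quoted or unquoted.
FK_RE = re.compile(r"^(\s+)(\w+) = models\.ForeignKey\('?(\w+)'?(.*)\)\s*$")


def add_related_names(source):
    """Give every ForeignKey a unique related_name of the form <model>_<field>_set."""
    current_model = None
    rewritten = []
    for line in source.splitlines():
        class_match = CLASS_RE.match(line)
        if class_match:
            current_model = class_match.group(1).lower()
        fk_match = FK_RE.match(line)
        if fk_match and current_model and "related_name" not in line:
            indent, field, target, rest = fk_match.groups()
            line = "%s%s = models.ForeignKey('%s'%s, related_name=\"%s_%s_set\")" % (
                indent, field, target, rest, current_model, field)
        rewritten.append(line)
    return "\n".join(rewritten)


raw = "\n".join([
    "class Featureloc(models.Model):",
    "    feature = models.ForeignKey(Feature)",
    "    srcfeature = models.ForeignKey(Feature, null=True)",
    "    fmin = models.IntegerField()",
])
result = add_related_names(raw)
```

Because the model name and the field name are both part of the generated name, the two foreign keys from featureloc to feature end up with different reverse names (featureloc_feature_set and featureloc_srcfeature_set) and no longer clash.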


The code will also create an admin.py file for linking the models to the admin site (handy for smaller size tables like the organism, db or cv tables).

The Perl code is available at http://www.cmbi.ru.nl/~vdejager/gmod/sortmodel.pl.gz. Feel free to change and republish it, since my Perl is a bit rusty.

Usage:

   perl sortmodels.pl unsortedmodels.py models.py <project> <app>

The resulting files, models.py and admin.py, should be copied to the <app> directory. Have a look at these files. A model in Django representing a database table looks like this:

    class Feature(models.Model):
        feature_id = models.IntegerField(primary_key=True)
        dbxref = models.ForeignKey('Dbxref', related_name="feature_dbxref_set")
        organism = models.ForeignKey('Organism', related_name="feature_organism_set")
        name = models.CharField(max_length=255)
        uniquename = models.TextField()
        residues = models.TextField()
        seqlen = models.IntegerField()
        md5checksum = models.TextField() # This field type is a guess.
        type = models.ForeignKey('Cvterm', related_name="feature_type_set")
        is_analysis = models.BooleanField()
        is_obsolete = models.BooleanField()
        timeaccessioned = models.DateTimeField()
        timelastmodified = models.DateTimeField()

Creating Model Specific Functions

In Django it is possible to specify so-called model methods. These model methods describe the way a model behaves and can add functionality to a model. A special model method called __unicode__ describes how to display the standard name of a model instance (representing a record in the database). We use these methods to get something readable while working on the command line later in this tutorial.

We could create these model methods by editing the classes in models.py, but those edits would be lost whenever the models are regenerated. Instead we will use a common Python pattern.

We create a new file called modeldefs.py. Inside this file we will create all our model methods, and we attach them to the models inside the special __init__.py file that is used to initialize the package in Python.

modeldefs.py:

    # This file contains all the model methods we will attach to specific models in the __init__.py file.
    # One method may be attached to several models, adhering to the DRY principle.
    #
    # The line below imports all the Chado models.
    from <project>.<app>.models import *
 
    # This is a generic method definition for a model, returning the value of the field called 'name'.
    def unicode_name(self):
        return self.name
 
 
    # this is a method definition for the 'Organism' model, returning the value of the field called
    # 'common_name'
    def unicode_common_name(self):
        return self.common_name

Attaching the Model Method Definitions to Specific Models

__init__.py:

    # this file attaches defined methods to specific models
    #
    # import the model method definitions
    from <project>.<app>.modeldefs import *
 
    setattr(Organism, '__unicode__', unicode_common_name)
 
    setattr(Cv, '__unicode__', unicode_name)
    setattr(Db, '__unicode__', unicode_name)
    setattr(Cvterm, '__unicode__', unicode_name)
    setattr(Feature, '__unicode__', unicode_name)
 
    setattr(Featureloc, '__unicode__', location)
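
The last line of the listing attaches a method named location, which does not appear in the modeldefs.py listing shown earlier. A minimal sketch of what such a method could look like is given below. It is an assumption, not the author's code: the output format is invented for illustration, and a plain Python class stands in for the generated Featureloc model so that the attach-by-setattr pattern can be tried without a database.

```python
class Featureloc(object):
    """Plain stand-in for the generated Featureloc model (not a Django model)."""

    def __init__(self, fmin, fmax, strand):
        self.fmin = fmin
        self.fmax = fmax
        self.strand = strand


def location(self):
    """Hypothetical display method: 'fmin..fmax (strand)'."""
    strand_symbol = {1: "+", -1: "-"}.get(self.strand, ".")
    return "%d..%d (%s)" % (self.fmin, self.fmax, strand_symbol)


# Same pattern as in __init__.py: the function becomes a method of the class,
# so every instance, including ones created earlier, picks it up.
setattr(Featureloc, "__unicode__", location)

label = Featureloc(10000, 20000, 1).__unicode__()
```

Since the method lives outside models.py, regenerating the models leaves it untouched; only the setattr line has to keep pointing at a class that still exists.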

Link Everything Together

Go to your project directory to change the files below:

In settings.py:

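The INSTALLED_APPS section should contain, besides the standard entries, the following:

    'django.contrib.admin',
    '<project>.<app>',

Note the comma after the last item. Python does not strictly require it here, but it is required for single-item tuples and it prevents mistakes when entries are added later.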

In urls.py:

Uncomment all lines marked as necessary for the automatic admin site.

Finalizing

Once this has been inserted we need to run one other command. From the command line, inside your <project> directory, run:

   ./manage.py syncdb

This will install all the tables necessary for the Django Admin application. You are now ready to continue building a website or run scripts using the Django framework against a Chado database. Alternatively, you should be able to go to the admin page of your website and see the models listed in the @adminmodels array in the sortmodels.pl script.

example: http://localhost/microgear/admin/ (although this URL depends on how you install your Django sites).

Using Django From the Command Line

(You may want to install Django commandline extensions.)

Starting an Interactive Python Shell

Inside your project directory:

   ./manage.py shell
   >>> from <project>.<app>.models import *

Querying the Database

See the Django database API documentation for an explanation of all database API methods.

Show all Organisms in the database:

   >>> Organism.objects.all()

All Features from a specific organism:

   >>> Feature.objects.filter(organism__common_name__iexact='Lactobacillus_plantarum')

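All Features located on a specific source feature, on the forward strand, between a start and stop location (assuming the source feature accession is stored in its uniquename field):

   >>> Feature.objects.filter(featureloc_feature_set__srcfeature__uniquename__exact='NC_004567', featureloc_feature_set__strand__exact=1, featureloc_feature_set__fmin__gte=10000, featureloc_feature_set__fmax__lte=20000)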
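
The strand, fmin and fmax conditions in the query above select locations that lie entirely inside the window, not locations that merely overlap it. The pure-Python sketch below mirrors those three conditions on a few made-up records so that the selection can be checked without a database; the function name within_window and the example records are hypothetical.

```python
def within_window(locations, fmin, fmax, strand=1):
    """Return the locations on the given strand that lie entirely inside [fmin, fmax]."""
    return [loc for loc in locations
            if loc["strand"] == strand
            and loc["fmin"] >= fmin      # same test as fmin__gte
            and loc["fmax"] <= fmax]     # same test as fmax__lte


# Made-up example records, not taken from a real genome.
locations = [
    {"name": "gene_a", "fmin": 12000, "fmax": 13500, "strand": 1},
    {"name": "gene_b", "fmin": 9500, "fmax": 10500, "strand": 1},    # starts before the window
    {"name": "gene_c", "fmin": 15000, "fmax": 16000, "strand": -1},  # on the reverse strand
]
selected = within_window(locations, 10000, 20000)
```

A feature such as gene_b, which overlaps the start of the window, is excluded; to include overlapping features the two comparisons would have to be changed to fmax greater than the window start and fmin less than the window end.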

Stacking Queries

Using Q objects

Show Me the Generated SQL

It is possible to see the SQL that Django generates. Make sure your Django DEBUG setting is set to True, then do this:

   >>> from django.db import connection
   >>> connection.queries

connection.queries is only available if DEBUG is True. It’s a list of dictionaries in order of query execution. Each dictionary has the following:

  • sql -- The raw SQL statement
  • time -- How long the statement took to execute, in seconds.

connection.queries includes all SQL statements (INSERTs, UPDATEs, SELECTs, etc.). Each time your app hits the database, the query will be recorded.
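
Since connection.queries is a plain list of dictionaries, it can be summarized with ordinary Python. The helper below is a sketch under the assumption that each entry has the 'sql' and 'time' keys described above; the name summarize_queries and the sample entries are hypothetical. It takes the list as an argument, so it can be tried on hand-made data as well as on the real connection.queries.

```python
def summarize_queries(queries):
    """Return the number of queries, their total time and the slowest statement."""
    total_time = sum(float(query["time"]) for query in queries)
    slowest_sql = None
    if queries:
        # float() accepts both numeric values and strings such as '0.015'
        slowest = max(queries, key=lambda query: float(query["time"]))
        slowest_sql = slowest["sql"]
    return {"count": len(queries), "total_time": total_time, "slowest_sql": slowest_sql}


# Hand-made sample in the same shape as connection.queries.
sample = [
    {"sql": "SELECT * FROM organism", "time": "0.002"},
    {"sql": "SELECT * FROM feature WHERE organism_id = 1", "time": "0.015"},
]
summary = summarize_queries(sample)
```

In the interactive shell one would call summarize_queries(connection.queries) after running a few querysets to see which statement dominates the run time.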

Running commandline python scripts using Django for database interaction

Tips and tricks

BioPython interaction

Example website