Difference between revisions of "Chado Full Text Search"

From GMOD
Jump to: navigation, search
(New page: Starting with the 0.30 release of the Chado GBrowse adaptor (Bio::DB::Das::Chado), there is support for full text searching of names and attributes of features in Chado through the GBrows...)
 
m (Text replace - "</sql>" to "</syntaxhighlight>")
 
(10 intermediate revisions by 3 users not shown)
Line 1: Line 1:
Starting with the 0.30 release of the Chado GBrowse adaptor (Bio::DB::Das::Chado), there is  support for full text searching of names and attributes of features in Chado through the GBrowse "Landmark or Region" search box.  This functionality is present largely due to the initial work by Leighton Pritchard, and we'd like to thank him for contributing to the Chado GBrowse adaptor.
+
Starting with the 0.30 release of the [[Chado]] [[GBrowse Adaptors|GBrowse adaptor]] ({{CPAN|Bio::DB::Das::Chado}}), there is  support for full text searching of names and attributes of features in Chado through the GBrowse "Landmark or Region" search box.  This functionality is present largely due to the initial work by [http://www.scri.ac.uk/staff/leightonpritchard Leighton Pritchard], and we'd like to thank him for contributing to the Chado GBrowse adaptor.
  
 
==Prerequisites==
 
==Prerequisites==
  
*PostgreSQL 8.4
+
* [[PostgreSQL]] 8.4
*Chado 1.11
+
* [[Chado]] 1.11
*GBrowse 1.70 or 2.0
+
* [[GBrowse]] 1.70 or 2.0
  
 
==Configuration==
 
==Configuration==
  
Before full text searching can be used, the database must be prepared.  Full text searching requires that a view that is present in the standard Chado distribution be materialized (that is, turned into a table), as well as adding a column to a few tables.  The Chado GBrowse adaptor comes with a script, gmod_chado_fts_prep.pl, that will perform these actions:
+
Before full text searching can be used, the database must be prepared.  Full text searching requires that a view that is present in the standard Chado distribution be materialized (that is, turned into a table), as well as adding a column to a few tables.  The Chado GBrowse adaptor comes with a script, <tt>gmod_chado_fts_prep.pl</tt>, that will perform these actions:
  
 
*Validate that you have the prerequisite software.
 
*Validate that you have the prerequisite software.
*Add "searchable_name" column to the feature table.
+
*Add "searchable_name" column to the {{ChadoMTLink|Sequence|feature|feature}} table.
*Add "searchable_synonym_sgml" column to the synonym table.
+
*Add "searchable_synonym_sgml" column to the {{ChadoMTLink|Sequence|synonym|synonym}} table.
*Add "searchable_accession" column to the dbxref table.
+
*Add "searchable_accession" column to the {{ChadoMTLink|General|dbxref|dbxref}} table.
 
*Add triggers on these tables so that when the name or accession is added or modified, the "searchable" column will be updated too.
 
*Add triggers on these tables so that when the name or accession is added or modified, the "searchable" column will be updated too.
*Execute the gmod_materialized_view_tool.pl to materialize the all_feature_names view.
+
*Execute the <tt>gmod_materialized_view_tool.pl</tt> to materialize the all_feature_names view.
  
In order to turn on full text searching through GBrowse, the "-fulltext 1" argument must be passed to the Chado GBrowse adaptor from the configuration file.  Like this example:
+
In order to turn on full text searching through GBrowse, the <tt>-fulltext 1</tt> argument must be passed to the Chado GBrowse adaptor from the configuration file.  Like this example:
  
<code>
+
[main:database]
[main:database]
+
db_adaptor    = Bio::DB::Das::Chado
db_adaptor    = Bio::DB::Das::Chado
+
db_args      = -dsn dbi:Pg:dbname=chado;host=localhost;port=5432
db_args      = -dsn dbi:Pg:dbname=chado;host=localhost;port=5432
+
 
                 -user cain
 
                 -user cain
 
                 -organism 'Saccharomyces cerevisiae'
 
                 -organism 'Saccharomyces cerevisiae'
 
                 -srcfeatureslice 1
 
                 -srcfeatureslice 1
 
                 -fulltext 1
 
                 -fulltext 1
</code>
 
  
 
==Maintaining the materialized view==
 
==Maintaining the materialized view==
  
If the data in your Chado database is changing, it is a good idea to periodically update the all_feature_names materialized view.  The gmod_materialized_view_tool.pl
+
If the data in your [[Chado]] database is changing, it is a good idea to periodically update the all_feature_names materialized view.  The <tt>gmod_materialized_view_tool.pl</tt> has a function for examining the database and updating materialized views.  This functionality can be run via a {{ManPageLink|8|cron|<tt>cron</tt>}} job to make it happen on a regular basis:
 +
 
 +
  1 0 * * * gmod_materialized_view_tool.pl --automatic
 +
 
 +
In this example, <tt>cron</tt> will examine the database, and if a view hasn't be updated recently, it will update the materialized at one minute past midnight, everyday.
 +
 
 +
To determine if a materialized view is out of date, the script only checks when was the last time it was updated, and if it is older than the configured time, the script recreates the materialized view (that is, there is no process for "marking" a view as out of date, because for instance, data has been inserted into one of the underlying tables).  When the all_feature_names materialized view was created with the above script, the frequency was set at daily, but this can be modified by updating the materialized_view table, and changing the value in refresh_time, which is an integer number of seconds.  The default value is 86400 (24 hours * 60 minutes * 60 seconds), but can be changed to weekly (604800) or any other value desired:
 +
 
 +
<syntaxhighlight lang="sql">
 +
  UPDATE materialized_view SET refresh_time = 604800 WHERE name = 'all_feature_names';
 +
</syntaxhighlight>
 +
 
 +
[[Category:Chado]]
 +
[[Category:GBrowse]]

Latest revision as of 23:33, 8 October 2012

Starting with the 0.30 release of the Chado GBrowse adaptor (Bio::DB::Das::Chado), there is support for full text searching of names and attributes of features in Chado through the GBrowse "Landmark or Region" search box. This functionality is present largely due to the initial work by Leighton Pritchard, and we'd like to thank him for contributing to the Chado GBrowse adaptor.

Prerequisites

Configuration

Before full text searching can be used, the database must be prepared. Full text searching requires that a view that is present in the standard Chado distribution be materialized (that is, turned into a table), as well as adding a column to a few tables. The Chado GBrowse adaptor comes with a script, gmod_chado_fts_prep.pl, that will perform these actions:

  • Validate that you have the prerequisite software.
  • Add "searchable_name" column to the feature table.
  • Add "searchable_synonym_sgml" column to the synonym table.
  • Add "searchable_accession" column to the dbxref table.
  • Add triggers on these tables so that when the name or accession is added or modified, the "searchable" column will be updated too.
  • Execute the gmod_materialized_view_tool.pl to materialize the all_feature_names view.

In order to turn on full text searching through GBrowse, the -fulltext 1 argument must be passed to the Chado GBrowse adaptor from the configuration file. Like this example:

[main:database]
db_adaptor    = Bio::DB::Das::Chado
db_args       = -dsn dbi:Pg:dbname=chado;host=localhost;port=5432
               -user cain
               -organism 'Saccharomyces cerevisiae'
               -srcfeatureslice 1
               -fulltext 1

Maintaining the materialized view

If the data in your Chado database is changing, it is a good idea to periodically update the all_feature_names materialized view. The gmod_materialized_view_tool.pl has a function for examining the database and updating materialized views. This functionality can be run via a cron job to make it happen on a regular basis:

 1 0 * * * gmod_materialized_view_tool.pl --automatic

In this example, cron will examine the database, and if a view hasn't be updated recently, it will update the materialized at one minute past midnight, everyday.

To determine if a materialized view is out of date, the script only checks when was the last time it was updated, and if it is older than the configured time, the script recreates the materialized view (that is, there is no process for "marking" a view as out of date, because for instance, data has been inserted into one of the underlying tables). When the all_feature_names materialized view was created with the above script, the frequency was set at daily, but this can be modified by updating the materialized_view table, and changing the value in refresh_time, which is an integer number of seconds. The default value is 86400 (24 hours * 60 minutes * 60 seconds), but can be changed to weekly (604800) or any other value desired:

  UPDATE materialized_view SET refresh_time = 604800 WHERE name = 'all_feature_names';