Difference between revisions of "Apollo PureJDBCTransactionWriter HOWTO"

From GMOD
Jump to: navigation, search
m (Changed FAQ formatting to match other FAQ pages.)
m (Configure and/or create name adapter)
 
(8 intermediate revisions by 4 users not shown)
Line 4: Line 4:
 
=Background=
 
=Background=
  
The Apollo chado/JDBC data adapter currently supports '''two''' methods for saving changes made in Apollo to a chado database via JDBC.  These two methods are implemented by the following (Java) classes:
+
The [[Apollo]] [[chado]]/JDBC data adapter currently supports '''two''' methods for saving changes made in Apollo to a chado database via JDBC.  These two methods are implemented by the following (Java) classes:
  
:* [http://gmod.cvs.sourceforge.net/gmod/apollo/src/java/apollo/dataadapter/chado/jdbc/JDBCTransactionWriter.java apollo.dataadapter.chado.jdbc.JDBCTransactionWriter]
+
:* {{SF_SVN|apollo/trunk/src/java/apollo/dataadapter/chado/jdbc/JDBCTransactionWriter.java apollo.dataadapter.chado.jdbc.JDBCTransactionWriter}}
:* [http://gmod.cvs.sourceforge.net/gmod/apollo/src/java/apollo/dataadapter/chado/jdbc/PureJDBCTransactionWriter.java apollo.dataadapter.chado.jdbc.PureJDBCTransactionWriter]
+
:* {{SF_SVN|apollo/trunk/src/java/apollo/dataadapter/chado/jdbc/PureJDBCTransactionWriter.java apollo.dataadapter.chado.jdbc.PureJDBCTransactionWriter}}
  
In theory these two classes represent different implementations of the same basic functionality.  In practice, however, there are a number of operational differences between the two.  One of the most significant is that the first class, JDBCTransactionWriter--which was also the first of the two to be implemented--will only work if the target chado database has been preloaded with a specific set of triggers and stored procedures.  The second class, PureJDBCTransactionWriter, does not require any custom stored procedures or triggers, relying instead on Java and JDBC to implement the logic encapsulated in the stored procedures (hence the "Pure JDBC" designation.)  The remainder of this document deals almost entirely with how to configure Apollo to use the second write method/class, the PureJDBCTransactionWriter.  However, there are pros and cons to both of these classes/methods, which should be considered before settling on one or the other for a particular application.  For more information about the differences between the two (albeit from the perspective of the initial author of the second class), and the reasons for adding the PureJDBCTransactionWriter, please see the detailed commit logs for the class, in particular the comments that accompanied the initial commit of this class: http://gmod.cvs.sourceforge.net/gmod/apollo/src/java/apollo/dataadapter/chado/jdbc/PureJDBCTransactionWriter.java?view=log#rev1.1
+
In theory these two classes represent different implementations of the same basic functionality.  In practice, however, there are a number of operational differences between the two.  One of the most significant is that the first class, JDBCTransactionWriter--which was also the first of the two to be implemented--will only work if the target chado database has been preloaded with a specific set of triggers and stored procedures.  The second class, PureJDBCTransactionWriter, does not require any custom stored procedures or triggers, relying instead on Java and JDBC to implement the logic encapsulated in the stored procedures (hence the "Pure JDBC" designation.)  The remainder of this document deals almost entirely with how to configure Apollo to use the second write method/class, the PureJDBCTransactionWriter.  However, there are pros and cons to both of these classes/methods, which should be considered before settling on one or the other for a particular application.  For more information about the differences between the two (albeit from the perspective of the initial author of the second class), and the reasons for adding the PureJDBCTransactionWriter, please see the detailed commit logs for the class, in particular the comments that accompanied the initial commit of [http://gmod.svn.sourceforge.net/viewvc/gmod/apollo/trunk/src/java/apollo/dataadapter/chado/jdbc/PureJDBCTransactionWriter.java?view=log#rev8349 PureJDBCTransactionWriter.java].
  
 
=Prerequisites=
 
=Prerequisites=
  
:* An Apollo-compatible chado database
+
:* An [[Apollo]]-compatible [[Chado]] database
:* A copy of Apollo built from the latest CVS HEAD
+
:* A copy of Apollo built from the latest [[SVN]] trunk.
:* Any configuration files or settings required to get Apollo to '''read''' from the aforementioned chado database using the chado/JDBC adapter.
+
:* Any configuration files or settings required to get Apollo to '''read''' from the aforementioned Chado database using the Chado/JDBC adapter.
 
:* An indefatigable devotion to working with beta code
 
:* An indefatigable devotion to working with beta code
  
Line 24: Line 24:
 
   SaveClonedFeaturesInTransactions "true"
 
   SaveClonedFeaturesInTransactions "true"
  
Set the above parameter to '''true''' to activate a reverse-compatible change to the way that Apollo tracks transactions internally that is essential for the PureJDBCTransactionWriter to work correctly.  A more detailed explanation of this parameter and the reason for its existence can be found here: http://gmod.cvs.sourceforge.net/gmod/apollo/src/java/apollo/config/Config.java?view=log#rev1.65
+
Set the above parameter to '''true''' to activate a reverse-compatible change to the way that Apollo tracks transactions internally that is essential for the PureJDBCTransactionWriter to work correctly.  A more detailed explanation of this parameter and the reason for its existence can be found [http://gmod.svn.sourceforge.net/viewvc/gmod/apollo/trunk/src/java/apollo/config/Config.java?view=log#rev8352 in the Subversion log].
  
 
==conf/chado-adapter.xml==
 
==conf/chado-adapter.xml==
Line 51: Line 51:
 
There are two name adapter implementations that are known to work and have been tested with the PureJDBCTransactionWriter:
 
There are two name adapter implementations that are known to work and have been tested with the PureJDBCTransactionWriter:
  
:* [http://gmod.cvs.sourceforge.net/gmod/apollo/src/java/apollo/config/TigrAnnotNameAdapter.java?view=log apollo.config.TigrAnnotNameAdapter ]
+
:* {{SF_SVN|apollo/trunk/src/java/apollo/config/TigrAnnotNameAdapter.java|apollo.config.TigrAnnotNameAdapter}}
:* [http://gmod.cvs.sourceforge.net/gmod/apollo/src/java/apollo/config/ParameciumNameAdapter.java?view=log apollo.config.ParameciumNameAdapter]
+
:* {{SF_SVN|apollo/trunk/src/java/apollo/config/ParameciumNameAdapter.java|apollo.config.ParameciumNameAdapter}}
  
It is possible that other name adapters will work without modification but this has not been verified.  If it is necessary to create a new name adapter it should be possible to modify one of the above name adapters to work with a different database.  The ParameciumNameAdapter works with a PostgreSQL chado database and queries the database itself (at feature creation time) to generate new unique ids.  The TigrAnnotNameAdapter, on the other hand, works with a Sybase chado database and relies on an external program (the path to which is defined by the '''TigrAnnotNameAdapterNewNameCommand''' global configuration parameter in apollo.cfg) to generate new ids.
+
It is possible that other name adapters will work without modification but this has not been verified.  If it is necessary to create a new name adapter it should be possible to modify one of the above name adapters to work with a different database.  The ParameciumNameAdapter works with a [[PostgreSQL]] [[Chado]] database and queries the database itself (at feature creation time) to generate new unique ids.  The TigrAnnotNameAdapter, on the other hand, works with a Sybase Chado database and relies on an external program (the path to which is defined by the '''TigrAnnotNameAdapterNewNameCommand''' global configuration parameter in <tt>apollo.cfg</tt>) to generate new ids.
  
If an existing name adapter (other than the two mentioned above) does not work with the PureJDBCTransactionWriter, it may be because the adapter has not been updated to take into account changes in the ApolloNameAdapterI interface that were introduced to acommodate the new writer class.  See http://gmod.cvs.sourceforge.net/gmod/apollo/src/java/apollo/config/ApolloNameAdapterI.java?view=log#rev1.27 and http://gmod.cvs.sourceforge.net/gmod/apollo/src/java/apollo/config/ApolloNameAdapterI.java?view=log#rev1.25 for additional details.
+
If an existing name adapter (other than the two mentioned above) does not work with the PureJDBCTransactionWriter, it may be because the adapter has not been updated to take into account changes in the ApolloNameAdapterI interface that were introduced to accommodate the new writer class.  See [http://gmod.svn.sourceforge.net/viewvc/gmod/apollo/trunk/src/java/apollo/config/ApolloNameAdapterI.java?view=log#rev8463 revision 8463] and [http://gmod.svn.sourceforge.net/viewvc/gmod/apollo/trunk/src/java/apollo/config/ApolloNameAdapterI.java?view=log#rev8370 revision 8370] for additional details.
  
 
=Test saving edits=
 
=Test saving edits=
Line 78: Line 78:
 
These two parameters allow one to specify that lookups should be done using '''only''' the chado feature.uniquename (by setting <queryFeatureIdWithName> to '''false''') or '''only''' the chado feature.name (by setting <queryFeatureIdWithUniquename> to '''false''').  Depending on the DBMS and database configuration this may result in faster feature_id lookups.  This is an issue that affects the PureJDBCTransactionWriter because a typical commit/save operation can entail numerous feature_id lookups and therefore the speed of the lookups can have a substantial impact on performance.  If both of these parameters are set to '''false''' then the chado/JDBC adapter will generate a runtime error.  If both are set to '''true''' (or are omitted) then the adapter reverts to its original behavior.
 
These two parameters allow one to specify that lookups should be done using '''only''' the chado feature.uniquename (by setting <queryFeatureIdWithName> to '''false''') or '''only''' the chado feature.name (by setting <queryFeatureIdWithUniquename> to '''false''').  Depending on the DBMS and database configuration this may result in faster feature_id lookups.  This is an issue that affects the PureJDBCTransactionWriter because a typical commit/save operation can entail numerous feature_id lookups and therefore the speed of the lookups can have a substantial impact on performance.  If both of these parameters are set to '''false''' then the chado/JDBC adapter will generate a runtime error.  If both are set to '''true''' (or are omitted) then the adapter reverts to its original behavior.
  
=Known bugs and/or limitations=
+
=Known limitations=
  
 +
:* The current PureJDBCTransactionWriter implementation has no support for exon sharing.  In other words, if two different splice forms of the same Apollo-created gene have a shared exon, the writer will insert an exon feature (and featureloc) for each of them.
 +
:* PureJDBCTransactionWriter currently does not support name adapters that assign "temp" ids (which are unique only within the current Apollo session) that are later replaced by "permanent" ids (which are unique within the target database.)
 
:* Since both JDBC writer classes rely on saving a list of Apollo transactions to the database they can generally only be used to "Save" to the same chado database from which the annotations were first read.
 
:* Since both JDBC writer classes rely on saving a list of Apollo transactions to the database they can generally only be used to "Save" to the same chado database from which the annotations were first read.
:* A corollary of the fact that the JDBC writer classes work almost solely off the transaction list (i.e., a set of diffs to a known database state) is that they have no direct way to detect if the database has been edited by another process sometime between loading the annotations and saving the transactions that correspond to the user's edits.
+
:* A corollary of the fact that the JDBC writer classes work almost solely off the transaction list (i.e., a set of diffs to a known database state) is that they have no direct way to detect if the database has been edited by another process sometime between loading the annotations and saving the transactions that correspond to the user's edits.  It is currently assumed that curators and database editors use a manual protocol to avoid write conflicts (e.g., "checking out" contigs by editing a wiki page or master list.)
 +
 
 +
=Known bugs=
 +
 
 +
:* In some circumstances the PureJDBCTransactionWriter will write overly long amino acid sequences to the database (i.e., sequences that contain in-frame stops).  This problem is not always apparent because when Apollo re-loads the relevant feature(s) it appears to automatically correct the erroneous translation.
 
:* Certain save operations (e.g., for information that can only be edited in the Apollo Annotation Info Editor) may not be supported by the PureJDBCTransactionWriter.
 
:* Certain save operations (e.g., for information that can only be edited in the Apollo Annotation Info Editor) may not be supported by the PureJDBCTransactionWriter.
  
Line 90: Line 96:
  
 
=Authors=
 
=Authors=
+
 
 
* [[User:Jcrabtree|Jcrabtree]]
 
* [[User:Jcrabtree|Jcrabtree]]
  
 +
[[Category:Apollo]]
 +
[[Category:Chado]]
 +
[[Category:Java]]
 
[[Category:HOWTO]]
 
[[Category:HOWTO]]

Latest revision as of 21:58, 15 September 2009

This HOWTO contains instructions for using the Apollo PureJDBCTransactionWriter, a component of the chado/JDBC adapter that allows Apollo to save changes directly to a chado database via JDBC, without having to install any stored procedures or triggers in the target database.


Background

The Apollo chado/JDBC data adapter currently supports two methods for saving changes made in Apollo to a chado database via JDBC. These two methods are implemented by the following (Java) classes:

In theory these two classes represent different implementations of the same basic functionality. In practice, however, there are a number of operational differences between the two. One of the most significant is that the first class, JDBCTransactionWriter--which was also the first of the two to be implemented--will only work if the target chado database has been preloaded with a specific set of triggers and stored procedures. The second class, PureJDBCTransactionWriter, does not require any custom stored procedures or triggers, relying instead on Java and JDBC to implement the logic encapsulated in the stored procedures (hence the "Pure JDBC" designation.) The remainder of this document deals almost entirely with how to configure Apollo to use the second write method/class, the PureJDBCTransactionWriter. However, there are pros and cons to both of these classes/methods, which should be considered before settling on one or the other for a particular application. For more information about the differences between the two (albeit from the perspective of the initial author of the second class), and the reasons for adding the PureJDBCTransactionWriter, please see the detailed commit logs for the class, in particular the comments that accompanied the initial commit of PureJDBCTransactionWriter.java.

Prerequisites

  • An Apollo-compatible Chado database
  • A copy of Apollo built from the latest SVN trunk.
  • Any configuration files or settings required to get Apollo to read from the aforementioned Chado database using the Chado/JDBC adapter.
  • An indefatigable devotion to working with beta code

Edit configuration files

conf/apollo.cfg

 SaveClonedFeaturesInTransactions	"true"

Set the above parameter to true to activate a reverse-compatible change to the way that Apollo tracks transactions internally that is essential for the PureJDBCTransactionWriter to work correctly. A more detailed explanation of this parameter and the reason for its existence can be found in the Subversion log.

conf/chado-adapter.xml

Note that all of the following are properties of the <chadoInstance> that corresponds to the target chado database(s):

 <pureJDBCWriteMode>true</pureJDBCWriteMode>

Setting this to true instructs Apollo to use the PureJDBCTransactionWriter for writes to any database associated with this chadoInstance. If set to false (the default value) then the JDBCTransactionWriter will be used instead.

 <pureJDBCCopyOnWrite>false</pureJDBCCopyOnWrite>

This should be set to false or omitted from the configuration file, as the "copy on write" mode has not yet been implemented. At issue here is whether Apollo should perform an "in-place" update of the chado database (i.e., false, the current default) or attempt to maintain the complete history of annotations/edits by creating a copy of the affected portions of the data model and using chado feature_relationship rows of type 'derives_from' to track the changes (i.e., true, the as-yet unimplemented "copy on write" mode.)

 <pureJDBCNoCommit>false</pureJDBCNoCommit>

Set this to false for production use; when set to true the PureJDBCTransactionWriter will issue an SQL ROLLBACK at the very end of each Save operation. This can be useful during testing as it allows one to check whether the writer is generating the correct sequence of SQL statements, but without changing the database. Note however that its utility is frequently limited to a single "Save" operation, because the SQL statements generated by the writer in subsequent save operations will often depend on changes that were made permanent in a previous Save. In other words, Apollo assumes that its internal data model is in agreement with the database after every "Save", and this assumption is invalidated by the ROLLBACKs generated in "no commit" mode.

Note that the <writebackXmlTemplateFile> is not used by the PureJDBCTransactionWriter. One advantage of the PureJDBCTransactionWriter over the JDBCTransactionWriter is that it delegates any and all update operations to the JdbcChadoAdapter, making it unnecessary to store duplicated configuration information in another file.

Style and tiers files

Create .style and .tiers files as described in the Apollo user guide, but note that the name adapters assigned to feature types in the .tiers files (via the name_method parameter) must be configured as described below.

Configure and/or create name adapter

Another crucial difference between the PureJDBCTransactionWriter and the JDBCTransactionWriter is that the former assigns (permanent) names to new features at the time they are created, whereas the latter assigns names at the time they are inserted into the database, and uses temporary names/ids in the interim.

There are two name adapter implementations that are known to work and have been tested with the PureJDBCTransactionWriter:

It is possible that other name adapters will work without modification but this has not been verified. If it is necessary to create a new name adapter it should be possible to modify one of the above name adapters to work with a different database. The ParameciumNameAdapter works with a PostgreSQL Chado database and queries the database itself (at feature creation time) to generate new unique ids. The TigrAnnotNameAdapter, on the other hand, works with a Sybase Chado database and relies on an external program (the path to which is defined by the TigrAnnotNameAdapterNewNameCommand global configuration parameter in apollo.cfg) to generate new ids.

If an existing name adapter (other than the two mentioned above) does not work with the PureJDBCTransactionWriter, it may be because the adapter has not been updated to take into account changes in the ApolloNameAdapterI interface that were introduced to accommodate the new writer class. See revision 8463 and revision 8370 for additional details.

Test saving edits

Test the new name adapter with <pureJDBCNoCommit> set to true and <logDirectory> set to a writable log directory (these are both properties of the <chadoInstance> in the chado-adapter.xml configuration file.) This will allow all the basic write functions (delete gene, split transcript, add new transcript, merge transcripts, etc.) to be debugged without modifying (and possibly corrupting) the database. In the case an error occurs the relevant Log4J log files (in <logDirectory>) can be examined (e.g., using chainsaw) to determine what went wrong.

Other notes

The following changes to the chado/JDBC adapter are not all directly related to the development of the PureJDBCTransactionWriter, but may be of use when setting up the Apollo configuration to *read* from the chado database. These are all properties of the <chadoInstance> in chado-adapter.xml:

 <logDirectory>/home/username/.apollo</logDirectory>

This parameter was introduced as part of the project to convert Apollo to use Log4J for all logging operations (versus writing directly to stderr/stdout.) When this parameter is defined the chado/JDBC adapter will automatically create two timestamped log files in the specified directory for each and every Apollo session. These log files will contain a record of all the SQL statements executed by the chado/JDBC adapter and, in the case of the PureJDBCTransactionWriter, a record of the mapping from Apollo transactions to JDBC/SQL statements. These logs are written in Log4J's XML format and are best examined using a tool like chainsaw.

 <queryFeatureIdWithUniquename>true</queryFeatureIdWithUniquename>
 <queryFeatureIdWithName>false</queryFeatureIdWithName>

The original JdbcChadoAdapter would query for chado feature_ids using both the chado feature.uniquename and chado feature.name, using an SQL query like the following:

 SELECT f.feature_id FROM feature f WHERE (f.uniquename = 'genename' OR f.name = 'genename')

These two parameters allow one to specify that lookups should be done using only the chado feature.uniquename (by setting <queryFeatureIdWithName> to false) or only the chado feature.name (by setting <queryFeatureIdWithUniquename> to false). Depending on the DBMS and database configuration this may result in faster feature_id lookups. This is an issue that affects the PureJDBCTransactionWriter because a typical commit/save operation can entail numerous feature_id lookups and therefore the speed of the lookups can have a substantial impact on performance. If both of these parameters are set to false then the chado/JDBC adapter will generate a runtime error. If both are set to true (or are omitted) then the adapter reverts to its original behavior.

Known limitations

  • The current PureJDBCTransactionWriter implementation has no support for exon sharing. In other words, if two different splice forms of the same Apollo-created gene have a shared exon, the writer will insert an exon feature (and featureloc) for each of them.
  • PureJDBCTransactionWriter currently does not support name adapters that assign "temp" ids (which are unique only within the current Apollo session) that are later replaced by "permanent" ids (which are unique within the target database.)
  • Since both JDBC writer classes rely on saving a list of Apollo transactions to the database they can generally only be used to "Save" to the same chado database from which the annotations were first read.
  • A corollary of the fact that the JDBC writer classes work almost solely off the transaction list (i.e., a set of diffs to a known database state) is that they have no direct way to detect if the database has been edited by another process sometime between loading the annotations and saving the transactions that correspond to the user's edits. It is currently assumed that curators and database editors use a manual protocol to avoid write conflicts (e.g., "checking out" contigs by editing a wiki page or master list.)

Known bugs

  • In some circumstances the PureJDBCTransactionWriter will write overly long amino acid sequences to the database (i.e., sequences that contain in-frame stops). This problem is not always apparent because when Apollo re-loads the relevant feature(s) it appears to automatically correct the erroneous translation.
  • Certain save operations (e.g., for information that can only be edited in the Apollo Annotation Info Editor) may not be supported by the PureJDBCTransactionWriter.

FAQ

Is it possible to use both JDBCTransactionWriter and PureJDBCTransactionWriter in the same Apollo install?

Yes, this is possible, at least in theory. The choice of write method can be made on a per-chadoInstance basis in the chado-adapter.xml configuration file, by setting the <pureJDBCWriteMode> flag. Although it is necessary to set the global configuration option SaveClonedFeaturesInTransactions to true in order for the PureJDBCTransactionWriter to work, the setting of this parameter should not affect the JDBCTransactionWriter in any way.

Authors