Modware Presentation (http://gmod.org/wiki/Modware_Presentation), revision of 2007-02-22<br />
<hr />
<div>Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Modware uses this code, but strips out all non-standard GMOD code<br />
* Provides nice interface over stock GMOD installation<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods such as name(), primary_id(), external_ids() <br />
** Abstract factory for various feature types<br />
* Lazy: information is retrieved only when you ask for it, then cached for fast retrieval the next time it is required<br />
* Uses Bioperl and its objects<br />
** Each feature subclass has a bioperl() method that returns an appropriate BioPerl object.<br />
** Bioperl object manipulation used to update feature coordinates<br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene which isn't the same as ...<br />
* Any feature type not explicitly supported in Modware::Feature class is blessed as a Modware::Feature::GENERIC class<br />
** Has a start/stop coordinate on a genomic sequence feature (no structure like a transcript with exons)<br />
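<br />
The factory, fallback, and lazy-loading behavior listed above can be illustrated with a minimal, self-contained sketch. This is not Modware's code: the <code>Sketch::</code> package names, the <code>fetch_description_from_db</code> helper, and the small type map are assumptions made up for illustration, and the "database" is just a counter.<br />
<br />
```perl
use v5.14;
use warnings;

# Counts simulated database round trips so the caching is observable.
our $db_calls = 0;

# Hypothetical stand-in for a query against the database.
sub fetch_description_from_db {
    my ($name) = @_;
    $db_calls++;
    return "description of $name";
}

package Sketch::Feature {
    # Types with a dedicated subclass (illustrative subset only).
    my %class_for = (
        gene => 'Sketch::Feature::GENE',
        mRNA => 'Sketch::Feature::MRNA',
    );

    # Factory constructor: bless into the type-specific class when one
    # exists, otherwise fall back to the GENERIC class.
    sub new {
        my ( $class, %args ) = @_;
        my $target = $class_for{ $args{-type} } // 'Sketch::Feature::GENERIC';
        return bless { name => $args{-name}, type => $args{-type} }, $target;
    }

    sub name { $_[0]{name} }
    sub type { $_[0]{type} }

    # Lazy accessor: query on first use, serve from the cache afterwards.
    sub description {
        my ($self) = @_;
        $self->{description} //= main::fetch_description_from_db( $self->{name} );
        return $self->{description};
    }
}

package Sketch::Feature::GENE    { our @ISA = ('Sketch::Feature'); }
package Sketch::Feature::MRNA    { our @ISA = ('Sketch::Feature'); }
package Sketch::Feature::GENERIC { our @ISA = ('Sketch::Feature'); }

my $gene  = Sketch::Feature->new( -type => 'gene',       -name => 'xfile' );
my $other = Sketch::Feature->new( -type => 'polyA_site', -name => 'site1' );

print ref($gene),  "\n";    # Sketch::Feature::GENE
print ref($other), "\n";    # Sketch::Feature::GENERIC

$gene->description() for 1 .. 3;
print "database calls: $db_calls\n";    # database calls: 1
```
<br />
Calling the accessor three times triggers only one simulated query, which is the trade-off the "lazy" bullet describes: nothing is fetched until it is needed, and nothing is fetched twice.<br />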
<br />
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
use Bio::SeqIO;<br />
use Modware::Feature;<br />
<br />
my $seq_io = new Bio::SeqIO(<br />
    -file   => "../data/fake_chromosome.txt",<br />
    -format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
    -type        => 'chromosome',<br />
    -bioperl     => $seq,<br />
    -description => "This is a test",<br />
    -name        => 'Fake',<br />
    -source      => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
 srcFeature_id: Id of genomic sample<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
    -type        => 'gene',<br />
    -name        => 'xfile',<br />
    -description => 'A test gene for GMOD meeting',<br />
    -source      => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1: exon_2: <br />
start: 13691 start: 14687<br />
end: 13767 end: 14720<br />
strand: 1 strand: 1<br />
srcFeature_id: Id of genomic sample srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
use Bio::SeqFeature::Gene::Exon;<br />
use Bio::SeqFeature::Gene::Transcript;<br />
<br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
    -start     => 13691,<br />
    -end       => 13767,<br />
    -strand    => 1,<br />
    -is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
    -start     => 14687,<br />
    -end       => 14720,<br />
    -strand    => 1,<br />
    -is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
    -type              => 'mRNA',<br />
    -bioperl           => $bioperl_mrna,<br />
    -source            => 'GMOD 2007 Demo',<br />
    -reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report. This package uses Modware and Bioperl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Modware::Gene is a subclass: a Gene is-a Feature.<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass? More flexibility the way shown here<br />
<br />
<perl><br />
package GMODWriter;<br />
<br />
sub Write_gene_report {<br />
    my ($self, $gene) = @_;<br />
    my $symbol = $gene->name();<br />
<br />
    my @synonyms = @{ $gene->synonyms() };<br />
    # join with ", " so the output matches the requested report format<br />
    my $syn_string  = join ", ", @synonyms;<br />
    my $description = $gene->description();<br />
    my $type        = $gene->type();<br />
<br />
    # get features associated with the gene that are of type 'mRNA'<br />
    my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
    # use bioperl method to get exons from mRNA<br />
    my @exons = $mrna->bioperl->exons_ordered();<br />
    # Modware will return a nice fasta file for you.<br />
    my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
<br />
    # Now print the actual report<br />
    print "symbol: $symbol\n";<br />
    print "synonyms: $syn_string\n";<br />
    print "description: $description\n";<br />
    print "type: $type\n";<br />
<br />
    my $count = 0;<br />
    foreach my $exon (@exons) {<br />
        $count++;<br />
        print "exon${count} start: " . $exon->start() . "\n";<br />
        print "exon${count} end: " . $exon->end() . "\n";<br />
    }<br />
    print $fasta;<br />
}<br />
. . .<br />
</perl><br />
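<br />
The <code>sequence( -type => 'cds' )</code> call hides the splicing step. As a rough illustration only (not Modware's implementation), the following self-contained sketch concatenates exon subsequences from a reference string using 1-based, inclusive coordinates on the forward strand. The toy reference sequence, the coordinates, and the <code>spliced_sequence</code> helper are assumptions for the example. For the real xfile gene the two exons are 77 and 34 bases long, which matches the 111-base coding sequence in the report.<br />
<br />
```perl
use v5.14;
use warnings;

# Splice exons out of a reference sequence. Coordinates are 1-based and
# inclusive; only the forward strand is handled in this sketch.
sub spliced_sequence {
    my ( $reference, @exons ) = @_;
    return join '',
        map  { substr( $reference, $_->{start} - 1, $_->{end} - $_->{start} + 1 ) }
        sort { $a->{start} <=> $b->{start} } @exons;
}

# Toy data (an assumption): two exons separated by a seven-base intron.
my $toy_reference = 'CCATGGCGGTTTTAGTAACC';
my @toy_exons     = ( { start => 3, end => 8 }, { start => 16, end => 18 } );

my $cds = spliced_sequence( $toy_reference, @toy_exons );
print ">toy cds\n$cds\n";    # prints ">toy cds" then "ATGGCGTAA"
```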
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the name symbol to x-file and retrieve the changed record. Regenerate the gene report.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval {<br />
    # get xfile gene<br />
    my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
    # change the name<br />
    $xfile_gene->name( 'x-file' );<br />
    # write changes to database<br />
    $xfile_gene->update();<br />
<br />
    # we can use the original object if we want, but instead<br />
    # we refetch from the database to 'prove' the name has been changed<br />
    my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
    # use our GMODWriter package to write report for x-file<br />
    GMODWriter->Write_gene_report( $xfile_gene2 );<br />
};<br />
if ($@) {<br />
    warn $@;<br />
    # on failure, roll back the transaction on the database handle<br />
    Modware::DBH->new->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results );<br />
</perl><br />
<br />
<br />
<perl><br />
sub Write_search_results {<br />
    my ($self, $itr) = @_;<br />
    # loop through iterator<br />
    while (my $gene = $itr->next) {<br />
        # print the requested information<br />
        print $gene->feature_id . "\t" . $gene->name .<br />
            "\t" . $gene->organism_name . "\n";<br />
    }<br />
}<br />
</perl><br />
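<br />
The search returns an iterator rather than a list, so results are consumed one at a time with <code>next</code>. The self-contained sketch below shows one way a wildcard search and such an iterator could fit together. It is not how Modware implements <code>Search_by_name</code>: the <code>Sketch::Iterator</code> class, the in-memory <code>@genes</code> rows, and the wildcard-to-regex translation are all assumptions for illustration.<br />
<br />
```perl
use v5.14;
use warnings;

package Sketch::Iterator {
    sub new {
        my ( $class, @items ) = @_;
        return bless { items => [@items] }, $class;
    }

    # Hand back one item per call; undef signals the end of the results.
    sub next {
        my ($self) = @_;
        return shift @{ $self->{items} };
    }
}

# Hypothetical in-memory stand-in for gene rows in the database.
our @genes = (
    { feature_id => 1323, name => 'x-file', organism_name => 'Xenopus laevis' },
    { feature_id => 1324, name => 'x-men',  organism_name => 'Xenopus laevis' },
    { feature_id => 1325, name => 'x-ray',  organism_name => 'Xenopus laevis' },
    { feature_id => 1400, name => 'actin',  organism_name => 'Xenopus laevis' },
);

# Translate a '*' wildcard pattern into an anchored regular expression,
# quoting every other character so it matches literally.
sub search_by_name {
    my ($pattern) = @_;
    my $regex = join '.*', map { quotemeta($_) } split /\*/, $pattern, -1;
    return Sketch::Iterator->new( grep { $_->{name} =~ /^$regex$/ } @genes );
}

my $results = search_by_name('x-*');
while ( my $gene = $results->next ) {
    print join( "\t", @{$gene}{qw(feature_id name organism_name)} ), "\n";
}
```
<br />
An iterator keeps the calling code the same whether the result set holds three genes or thirty thousand, since the caller never needs the whole list at once.<br />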
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
# set is_deleted = 1; this also sets is_available to 0,<br />
# so the gene is hidden from searches.<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results );<br />
</perl><br />
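<br />
Note that the gene is flagged rather than physically removed. A minimal, self-contained sketch of that "soft delete" idea follows. The in-memory <code>@features</code> rows and the helper names are assumptions for illustration, and the sketch does not show how Modware stores or checks the flag.<br />
<br />
```perl
use v5.14;
use warnings;

# Hypothetical in-memory rows standing in for stored genes.
our @features = (
    { feature_id => 1323, name => 'x-file', is_deleted => 0 },
    { feature_id => 1324, name => 'x-men',  is_deleted => 0 },
    { feature_id => 1325, name => 'x-ray',  is_deleted => 0 },
);

# Soft delete: flag the matching row instead of removing it.
sub soft_delete {
    my ($name) = @_;
    $_->{is_deleted} = 1 for grep { $_->{name} eq $name } @features;
    return;
}

# Searches skip flagged rows, so a soft-deleted gene no longer appears.
sub visible_names {
    return map { $_->{name} } grep { !$_->{is_deleted} } @features;
}

soft_delete('x-ray');
print join( ', ', visible_names() ), "\n";            # x-file, x-men
print scalar(@features), " rows still stored\n";      # 3 rows still stored
```
<br />
Because the row is kept, the deletion can be undone by clearing the flag, at the cost of every search having to filter on it.<br />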
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* The sample scripts demoed here are available<br />
** sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI may adjust to the change automatically, depending on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Specify the type of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O’Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD</div>
GMOD Middleware (http://gmod.org/wiki/GMOD_Middleware), revision of 2007-02-21<br />
<hr />
<div>==Executive Summary==<br />
<br />
Although GMOD uses the Chado schema for its underlying database, each group has developed a separate interface to their databases. This meeting was designed to review the experiences each group has had with its particular solution and to recommend a "best practice." <br />
<br />
All participants had developed some type of Object-Relational Mapping (ORM) tool. The tools fell into two general classes. The first class, which includes Hibernate (Java), iBatis (Java), and Chado::AutoDBI (Perl) examines the relational schema and configuration files to create the middleware automatically, or create a relational schema and middleware automatically from a data model (InterMine). The second class involved hand-coding an API to create an object-oriented interface to Chado. Of the tools presented, Modware, built on top of Chado::AutoDBI, was the most mature and feature-rich.<br />
<br />
The consensus from the meeting was that while fully automatic tools are convenient, there is considerable value in creating a hand-crafted, predictable API that reflects the biological data closely. The approach taken by Modware, which starts from an auto-generated interface and then adds a hand-coded API layer, is both comprehensible and powerful. The hand-edited layer will serve the common cases in detail while the auto-generated layer retains the full flexibility of the Chado schema for the cases that fall outside the hand-edited API. We recommend that implementors of Perl-based GMOD tools seriously consider Modware for their middleware interface to Chado, and that the implementors of Java-based tools collaborate to create a Java API that parallels Modware's. <br />
<br />
<br />
__TOC__<br />
<br clear='all'><br />
<br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19, 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it is also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all within GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It is expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that a scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates given are exact.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the name symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters addressed the assigned problems, and all packages could perform the required operations where appropriate: the GBrowse (DasI) Adaptor and InterMine, being read-only packages, did not carry out the edit/delete tasks. The packages differ in many ways in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
==The Middleware Packages==<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in almost all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]), the Chado schema is popular and, given its complexity, serves as a good test schema for this exercise. The primary focus in the talks was on functionality from the perspective of writing code and extending the software; less attention was given to performance. Each presenter focussed on their middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that acts as an ORM tool. The Chado::AutoDBI interface is autogenerated directly from the Chado schema using a template adapted from the [http://turnkey.sf.net Turnkey] project. This greatly reduces the amount of time developers need to spend maintaining the API since the code can just be regenerated when the schema changes. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. However, the Turnkey project can be used to produce an AutoDBI equivalent for other database schemas.<br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore, Chado::AutoDBI wraps the entire Chado schema, whereas Modware objects address only the subset of the Chado schema that maps to Bioperl objects. Neither package requires configuration: both are pre-configured for Chado, and all one needs to do is provide connection information (server, username, password, etc.). The schema could, in theory, be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.); this flexibility is built into Class::DBI.<br />
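<br />
The relationship between the two layers can be sketched in a few lines of plain Perl. This is an illustration only and uses neither Class::DBI nor Modware: the <code>Sketch::</code> classes are hypothetical, and the schema is reduced to two tables with simplified columns (in Chado, types are references to controlled-vocabulary terms rather than plain strings). The loop plays the role of the generated table-level interface, and <code>Sketch::Gene</code> plays the role of the hand-coded biological API on top of it.<br />
<br />
```perl
use v5.14;
use warnings;

# Hypothetical, heavily reduced schema description: table => columns.
my %schema = (
    feature     => [qw(feature_id name type)],
    featureprop => [qw(feature_id type value)],
);

# "Auto-generated" layer: one class per table and one accessor per column,
# built in a loop so it can be regenerated whenever the schema changes.
for my $table ( keys %schema ) {
    my $class = 'Sketch::Row::' . ucfirst $table;
    no strict 'refs';
    *{"${class}::new"} = sub { my ( $c, %row ) = @_; return bless {%row}, $c };
    for my $column ( @{ $schema{$table} } ) {
        *{"${class}::$column"} = sub { $_[0]{$column} };
    }
}

# Hand-coded layer: a biological object that hides where the data lives.
package Sketch::Gene {
    sub new {
        my ( $class, %args ) = @_;
        return bless { row => $args{row}, props => $args{props} || [] }, $class;
    }

    sub name { $_[0]{row}->name }

    # The description is stored as a property row, not on the feature row.
    sub description {
        my ($self) = @_;
        my ($prop) = grep { $_->type eq 'description' } @{ $self->{props} };
        return $prop ? $prop->value : undef;
    }
}

my $gene = Sketch::Gene->new(
    row   => Sketch::Row::Feature->new( feature_id => 1, name => 'xfile', type => 'gene' ),
    props => [
        Sketch::Row::Featureprop->new(
            feature_id => 1,
            type       => 'description',
            value      => 'A test gene for GMOD meeting',
        ),
    ],
);
print $gene->name, ': ', $gene->description, "\n";
```
<br />
Code written against the hand-coded class only asks for a gene's description; code that needs something the hand-coded class does not cover can still drop down to the table-level objects.<br />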
<br />
The Java packages, Hibernate and iBatis, are far more general than the two packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
InterMine is another Java-based generic ORM tool that differs from the others by being tuned for read-only performance. It has been designed to support large complex queries and bulk data production efficiently. Performance of live databases can be tuned through the generation of pre-computed tables: a query optimiser intercepts all queries and re-writes them to make use of the available pre-computed tables. Like the other ORM packages, maintenance is minimised through extensive automatic code generation: when the database object model changes, recompilation updates Java classes, the relational schema, the web application and mappings between them. Currently InterMine systems require Postgres but could be extended to other platforms that support the <code>EXPLAIN</code> statement used by the query optimiser. Apart from Java and pilot Perl APIs, a variant of OQL, IQL, is provided.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
Both Abator and iBatis share one limitation: one cannot auto-generate higher-level objects, such as Genes. In fairness, this would not really be possible to do accurately for any system, because the database schema does not contain enough information to derive all of the necessary relationships. In both packages the auto-generation code only mimics the schema, and it would take additional work to get a package to automatically create and relate subsets of data based on "join tables" like feature_relationship property. This mostly has to do with the circular nature of the schema.<br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the two Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do so. It is probably the more flexible of the two with respect to language, since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
Hibernate can read a database schema and generate Java code representing the database objects or tables. In this exercise the developer chose not to use this method, because one must create a file telling the code auto-generation what to use for indexes, which fields should be complex objects, and so on. It was judged to be less work to write the code by hand.<br />
<br />
For each object mapped to the database, one can create an XML file that tells Hibernate which database fields to map to specific object properties. Additionally, one can implement <code>equals()</code> and <code>hashCode()</code> for each object based on the table's unique constraints. After that, Hibernate takes care of all the Create-Retrieve-Update-Delete (CRUD) and transaction functionality.<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise performance comparisons were made between iBatis and Hibernate.<br />
<br />
======Configuration======<br />
<br />
The following general steps are performed:<br />
<br />
# Create an XML file of tables the developer wants to create objects for <br />
#* Including specifying the method for retrieving auto-generated sequence values from inserts<br />
#* This file could also be easily auto-generated<br />
# Make some type adjustments for some variables <br />
#* For example, iBatis mapped some things to Java strings that should have been Integers<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
<br />
=====InterMine=====<br />
<br />
From the [http://www.intermine.org InterMine Web site]:<br />
<br />
''InterMine is a general-purpose object-oriented data warehouse<br />
system developed as part of the [http://www.flymine.org FlyMine]<br />
project and made available as stand-alone open-source software. It is<br />
able to create a query-optimised data warehouse with a powerful web<br />
interface for any data model.<br />
<br />
InterMine.bio is an extension to InterMine that defines a biological<br />
data model (based on the [http://song.sourceforge.net/ Sequence<br />
Ontology]) and is able to integrate data from many standard formats<br />
and databases used in biology. Instances of InterMine.bio are created<br />
by specifying the sources to load and configuring the particular<br />
organisms and data files required. A framework is provided for adding<br />
new sources to load custom data, and each new source can easily extend<br />
the data model.''<br />
<br />
======Abstraction======<br />
<br />
An InterMine data model is defined at the object level and a database<br />
schema is automatically generated. The database schema is entirely<br />
hidden; queries are performed on the object model by a Java API,<br />
an OQL-like query language (IQL), or a pilot Perl API.<br />
<br />
======Performance======<br />
<br />
Following build, InterMine-based systems are read-only. In contrast<br />
to transactional systems this makes it easier to focus on query<br />
performance. In particular, InterMine makes it straightforward to<br />
manage controlled denormalisation of data to enhance query<br />
performance: a generic query optimiser intercepts queries and<br />
transparently re-writes them to make use of precomputed tables. New<br />
precomputed tables can be added at any time, allowing performance<br />
tuning of the live database. Performance is also enhanced through the<br />
use of a large object cache in the web application.<br />
<br />
======Configuration======<br />
<br />
An InterMine object model is defined as an XML file; Java business<br />
objects and a relational schema are automatically generated from it.<br />
The mapping between objects and the database is thus handled<br />
automatically with no additional configuration. Generation of<br />
appropriate indexes is also automatic. InterMine cannot access an<br />
existing database schema but could import data from one by defining<br />
an object model and importing the data.<br />
<br />
======Documentation======<br />
<br />
Documentation on InterMine's functionality and instructions for setting<br />
up a new instance are provided at http://trac.flymine.org<br />
<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
The Chado::AutoDBI tool is based on Class::DBI and maps objects directly to tables in the Chado schema. It provides a very easy Perl interface for low-level access to the database. It is currently used by the [http://www.gmod.org/gmodweb GMODWeb] project, as a bulk loader for the Chado database, and as the underlying ORM tool in Modware. Chado::AutoDBI is automatically generated from the Chado database schema, which makes it very easy to update when changes are made. This code autogeneration process was adapted from the [http://turnkey.sf.net Turnkey] project, which is a generic ORM/website code generation tool.<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes, Chado::AutoDBI can be quickly autogenerated again to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
Chado::AutoDBI is distributed with the Chado schema available at the [http://sf.net/projects/gmod GMOD] website. For more information see the [http://gmod.cvs.sourceforge.net/gmod/schema/chado/README.AutoDBI?view=markup README].<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it can create BioPerl object representations of features directly from Chado; this could either be highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires minimal configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes, the underlying Chado::AutoDBI should be able to adjust to the change. Some convenience views are created when installing Modware (e.g. V_MRNA_FEATURES).<br />
<br />
======Documentation======<br />
<br />
BioPerl-style documentation, written in POD for all methods, is available at http://gmod-ware.sourceforge.net/doc/.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
Since it implements the Bio::DasI interface, the abstraction is similar to what one would see with BioPerl/BioDas features.<br />
<br />
======Performance======<br />
Bio::DB::Das::Chado performs relatively well since it is designed to be a database adaptor to drive a Generic Genome Browser (GBrowse) instance. Though it does not perform as well as the {{BPM|Bio::DB::GFF}} and {{BPM|Bio::DB::SeqFeature}} adaptors, those are for databases designed specifically for quick retrieval of data for GBrowse, whereas Chado is designed as a complete data warehouse. The main way to get better performance for the Chado GBrowse adaptor is to implement materialized views of the GFF or SeqFeature schema inside Chado.<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration; it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well documented: it comprises about a dozen methods and three classes, all documented.<br />
<br />
<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
==Object-Relational Mapping Principles==<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements.<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
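<br />
To make the DAO approach concrete, here is a minimal hand-written sketch in Perl. It assumes the DBI and DBD::SQLite modules are installed and uses an in-memory table standing in for Chado's cvterm table; the CvtermDAO class and its method names are hypothetical and exist only for this illustration.<br />
<br />
<perl>
use strict;
use warnings;
use DBI;

package CvtermDAO;    # hypothetical hand-written DAO for a cvterm-like table

sub new {
    my ( $class, $dbh ) = @_;
    return bless { dbh => $dbh }, $class;
}

# Create: insert a row and return the generated primary key.
sub create {
    my ( $self, %f ) = @_;
    $self->{dbh}->do(
        'INSERT INTO cvterm (cv_id, name, definition, dbxref_id) VALUES (?, ?, ?, ?)',
        undef, @f{qw(cv_id name definition dbxref_id)}
    );
    return $self->{dbh}->last_insert_id( undef, undef, 'cvterm', 'cvterm_id' );
}

# Retrieve: one row as a hash reference, or undef if there is no such row.
sub retrieve {
    my ( $self, $id ) = @_;
    return $self->{dbh}->selectrow_hashref(
        'SELECT * FROM cvterm WHERE cvterm_id = ?', undef, $id );
}

# Update: change the name of an existing row.
sub update_name {
    my ( $self, $id, $name ) = @_;
    return $self->{dbh}->do( 'UPDATE cvterm SET name = ? WHERE cvterm_id = ?', undef, $name, $id );
}

# Delete: remove a row by primary key.
sub remove {
    my ( $self, $id ) = @_;
    return $self->{dbh}->do( 'DELETE FROM cvterm WHERE cvterm_id = ?', undef, $id );
}

package main;

# In-memory stand-in for the real table (an assumption of this sketch).
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 } );
$dbh->do('CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT, definition TEXT, dbxref_id INTEGER)');

my $dao = CvtermDAO->new($dbh);
my $id  = $dao->create( cv_id => 1, name => 'DUMMY TERM', dbxref_id => 125 );
$dao->update_name( $id, 'RENAMED TERM' );
my $row = $dao->retrieve($id);
print "$row->{cvterm_id}: $row->{name}\n";
$dao->remove($id);
</perl>
<br />
Every statement here is written and maintained by hand, which is the manual effort an ORM layer is meant to remove.<br />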
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
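<br />
For the last point, the Class::DBI documentation describes an idiom for running several operations in one transaction. The sketch below follows that idiom and assumes that Chado::DBI, the base class used in the examples in this document, is a Class::DBI subclass with a configured connection. The usage is shown only as comments because it needs a live database, and the term values are made up.<br />
<br />
<perl>
use strict;
use warnings;

package Chado::DBI;    # assumed to inherit from Class::DBI elsewhere

# Run a code reference inside a transaction and roll back if it dies.
sub do_transaction {
    my ( $class, $code ) = @_;

    # Turn off AutoCommit for this scope only; the commit happens when
    # the localized value is restored on leaving the sub.
    local $class->db_Main->{AutoCommit};

    eval { $code->() };
    if ( my $error = $@ ) {
        eval { $class->dbi_rollback };    # the rollback itself may fail
        die $error;
    }
    return;
}

package main;

# Usage (requires a database connection):
#
#   Chado::DBI->do_transaction( sub {
#       Chado::Cvterm->create( { name => 'term A', cv_id => 1, dbxref_id => 125 } );
#       Chado::Cvterm->create( { name => 'term B', cv_id => 1, dbxref_id => 126 } );
#   } );
</perl>
<br />
Either both terms are stored or neither is, provided the underlying database supports transactions.<br />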
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
my $term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve by primary key<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update ($term is an application object holding the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update();    # write the changes unless autoupdate is on<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for a controlled data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: FlyBase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues in the course of bulk operations, even when doing no more than producing simple print() output.<br />
There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching. With XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* Retrieving certain types of data may require a long path<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure is not stored in memory; data is flushed out as it is processed<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# add_constructor() defines a class method, long_names, that returns the rows whose name is longer than 15 characters<br />
<br />
__PACKAGE__->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# set_sql() registers custom SQL under the name "xfiles"<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
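<br />
Class::DBI turns declarations like the two above into class methods. The following sketch shows how they might be called, assuming the declarations were added to the Chado::Feature class generated by Chado::AutoDBI and that a Chado database connection is configured. The report_custom_queries function is a hypothetical wrapper written for this illustration.<br />
<br />
<perl>
use strict;
use warnings;

# Assumes Chado::AutoDBI is loaded, its connection is configured, and the
# declarations above were placed in the Chado::Feature class.
sub report_custom_queries {

    # add_constructor(long_names => ...) creates a class method that
    # returns the matching rows as objects.
    my @long = Chado::Feature->long_names;
    printf "%d\t%s\n", $_->feature_id, $_->uniquename for @long;

    # set_sql(xfiles => ...) also creates search_xfiles(), which builds
    # objects from the selected primary key column.
    my @xfiles = Chado::Feature->search_xfiles;
    print $_->feature_id, "\n" for @xfiles;

    return scalar(@long) + scalar(@xfiles);
}
</perl>
<br />
Both calls load every matching row as an object, so the earlier caveat about bulk operations applies here as well.<br />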
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
Modware seeks to provide an object-oriented Perl interface to Chado to allow for rapid application development without worrying about the complex details of the schema on a day-to-day basis.<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: dictyBase<br />
* Support: Fully supported by authors. Sourceforge infrastructure (bug reports, mailing lists)<br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To roll back at any time, simply insert '''new Modware::DBH->rollback()''' into the script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Features in Modware map to familiar BioPerl (Bio::Seq and Bio::SeqFeature) objects that can be accessed through a 'bioperl' method. Modware internally links Bio::SeqFeature objects to the proper Bio::Seq objects to provide maximum functionality from the BioPerl libraries.<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
* Currently Alpha release (hoping for user feedback to create Beta)<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance is slower because of its object-oriented nature<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: Several through GBrowse, none that I know of as a scripting tool.<br />
* Support: [http://lists.sourceforge.net/lists/listinfo/gmod-gbrowse GBrowse mailing list]<br />
* Third party code: None.<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
Clearly, Bio::DB::Das::Chado is designed for use with GBrowse.<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source: [http://www.sanger.ac.uk/pathogens Pathogen Sequencing Unit], Sanger Institute<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: In the future this will be freely available and used by [http://www.genedb.org GeneDB] and [http://www.sanger.ac.uk/Software/Artemis Artemis]<br />
* Support: In development<br />
* Third party code: iBatis and Spring/Hibernate<br />
<br />
====Overview====<br />
<br />
We're using a common interface with two different implementations: an iBatis and a Hibernate one. This gives us the ability to choose either implementation depending upon the requirements of the application. <br />
<br />
<br />
====iBatis====<br />
See also [[#iBatis_and_Abator|iBatis and Abator]]<br />
<br />
=====Technical Overview=====<br />
<br />
* Database connectivity: Dynamically via Java code or properties file<br />
* Transaction support: Yes<br />
* Code generation: Using common interface which was originally automatically generated, but mappings hand-generated<br />
<br />
=====Advantages=====<br />
<br />
* Direct access to SQL<br />
<br />
=====Limitations=====<br />
<br />
* Not completely pluggable between both engines<br />
* Lazy loading is either on or off<br />
<br />
====Hibernate====<br />
See also [[#Hibernate_2|Hibernate]]<br />
<br />
=====Technical Overview=====<br />
<br />
* Database connectivity: Via usual Spring configuration methods<br />
* Transaction support: Yes<br />
* Code generation: Interface and Hibernate implementation originally auto-generated, then hand-edited<br />
<br />
=====Advantages=====<br />
* Complete coverage of core schema (except Phenotype module)<br />
* Choice of writing in either object-level query language or SQL<br />
<br />
=====Limitations=====<br />
* Not completely pluggable between both engines<br />
* Not currently using subclassing, e.g. just a Feature, not Gene, Exon, etc.<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===GUS Web Development Kit (WDK)===<br />
<br />
====Background====<br />
<br />
* Source: http://www.gusdb.org/WDK<br />
* Language: XML/Java/JSP<br />
* Authors: GUS WDK development team<br />
* Users: PlasmoDB/ApiDB/CryptoDB/ToxoDB, others under development<br />
* Support: GUS WDK development team<br />
* Third party code: Struts/JSP/Ajax/Axis<br />
* Platform requirements: any Oracle, PostgreSQL or MySQL database; Linux; Tomcat<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: JDBC<br />
* Transaction support: read-only presentation layer object model<br />
* Code generation: java classes generated from detailed object specification in XML<br />
<br />
====Advantages====<br />
<br />
* Configurable coarse grained layer tailored for the presentation layer<br />
* Configurable in XML by non-programmers<br />
* Seamless integration with front-end query engine and web page generator<br />
<br />
====Limitations====<br />
<br />
* read-only<br />
* not designed as a general purpose ORT<br />
* configuration is complicated<br />
<br />
====Presentation by Steve Fischer====<br />
<br />
[[GUS_WDK_Presentation|GUS WDK Presentation]]<br />
<br />
===InterMine===<br />
<br />
====Background====<br />
<br />
* Source: http://www.intermine.org<br />
* Language: Java<br />
* Authors: FlyMine/InterMine development team<br />
* Users: FlyMine, StemCellMine, others under development<br />
* Support: FlyMine/InterMine development team<br />
* Third party code: Struts/JSP/Ajax<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: JDBC<br />
* Transaction support: basic during build (doesn't support concurrent writes) as database is optimised for high read-only query performance<br />
* Code generation: extensive use of automatic code generation<br />
<br />
====Limitations====<br />
<br />
* slow, but improving, build times (loads and integrates ~25m objects in ~36 hours)<br />
* configuration complicated but being simplified<br />
* deals well with overlapping features, but currently limited support for querying locations in DNA or protein sequences<br />
* export still limited to tab or comma-delimited or FASTA formats though great flexibility in choice of output columns and their order<br />
* sequences not handled very well<br />
** e.g. each chromosome sequence is stored in one big text field in PostgreSQL<br />
* can't yet do queries involving sizes of collections of things<br />
** e.g. find the genes with only 1 transcript<br />
* doesn't yet support left outer join behaviour (under development)<br />
<br />
====Presentation by Gos Micklem====<br />
<br />
[[InterMine_Presentation|InterMine Presentation]]<br />
<br />
<br />
<br />
==Wiki Authors==<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Steve Fischer, PlasmoDB/GUS<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase<br />
* Gos Micklem, FlyMine/InterMine</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T22:57:58Z<p>165.124.152.78: /* The Middleware Packages */</p>
<hr />
<div>==Executive Summary==<br />
<br />
Although GMOD uses the Chado schema for its underlying database, each group has developed a separate interface to their databases. This meeting was designed to review the experiences each group has had with its particular solution and to recommend a "best practice." <br />
<br />
All participants had developed some type of Object-Relational Mapping (ORM) tool. The tools fell into two general classes. The first class, which includes Hibernate (Java), iBatis (Java), and Chado::AutoDBI (Perl) examines the relational schema and configuration files to create the middleware automatically. The second class involved hand-coding an API to create an object-oriented interface to Chado. Of the tools presented, Modware, built on top of Chado::AutoDBI, was the most mature and feature-rich.<br />
<br />
The consensus from the meeting was that while fully automatic tools are convenient, there is considerable value in creating a hand-crafted predictable API that reflects the biological data closely. The approach taken by Modware, which starts from an auto-generated interface and then adds a hand-coded API layer, is both comprehensible and powerful. We recommend that implementors of Perl-based GMOD tools seriously consider Modware for their middleware interface to Chado, and that the implementors of Java-based tools collaborate to create a Java API that parallels Modware's.<br />
<br />
<br />
__TOC__<br />
<br clear='all'><br />
<br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
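<br />
In this problem the coding sequence is simply the concatenation of the exon sub-sequences taken from the source feature. The following is a minimal sketch of that step in plain Perl; it is not taken from any of the presented packages. It assumes 1-based, inclusive coordinates on the forward strand (under that assumption the two xfile exons span 77 + 34 = 111 bases, which matches the length of the sequence shown above). The <code>spliced_sequence</code> function and the toy genomic sequence are hypothetical.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Concatenate exon sub-sequences cut out of a genomic sequence.<br />
# Assumption: coordinates are 1-based and inclusive, forward strand only.<br />
sub spliced_sequence {<br />
    my ($genomic, @exons) = @_;<br />
    my $spliced = '';<br />
    # Process exons in ascending coordinate order<br />
    for my $exon (sort { $a->{start} <=> $b->{start} } @exons) {<br />
        my $length = $exon->{end} - $exon->{start} + 1;<br />
        # substr() is 0-based, hence the "- 1"<br />
        $spliced .= substr($genomic, $exon->{start} - 1, $length);<br />
    }<br />
    return $spliced;<br />
}<br />
<br />
# Toy data for illustration, not the Fake Chromosome sequence<br />
my $genomic = 'AAACCCGGGTTT';<br />
my @exons   = ({ start => 1, end => 3 }, { start => 7, end => 9 });<br />
print spliced_sequence($genomic, @exons), "\n";   # prints AAAGGG<br />
</perl><br />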
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change its symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters addressed the assigned problems, and all packages could perform the required operations except the GBrowse (DasI) Adaptor, which is read-only software. The packages differed considerably in how they solved the problems; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]), the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software, and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools.<br />
<br />
A key difference is that Modware defines an interface that avoids schema semantics, whereas the Chado::AutoDBI API resembles that of Class::DBI and closely maps objects to schema tables. Furthermore, Chado::AutoDBI wraps the entire Chado schema, whereas Modware objects address only a subset of it. Neither package requires configuration: both are pre-configured for Chado, and all one needs to do is connect to an existing schema. In theory this schema could be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.), since this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the two packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
Both Abator and iBatis share one limitation, which is that one cannot auto-generate higher level objects, such as Genes. In fairness one must add that this would not really be possible to do accurately for any system: there is not sufficient information present in the database schema to derive all of the relationships necessary. In both packages the auto-generation code only mimics the schema, and it would take additional work to get a package to automatically create and relate subsets of data based on "join tables" like feature_relationship property. This mostly has to do with the circular nature of the schema.<br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the two Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do this. It is probably considered the more flexible of the two with respect to language, since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
Hibernate can read a database schema and generate Java code representing the database objects or tables. In this exercise the developer chose not to use this method because one must create a file telling the code auto-generator what to use for indexes, which fields should be complex objects, and so on. It was judged to be less work to write the code by hand.<br />
<br />
For each object mapped to the database, one can create an XML file that tells Hibernate which database fields to map to specific object properties. Additionally, one can implement <code>equals()</code> and <code>hashCode()</code> methods for each object based on the table's unique constraints. After that, Hibernate takes care of all the Create-Retrieve-Update-Delete (CRUD) and transaction functionality.<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between iBatis and Hibernate were made.<br />
<br />
======Configuration======<br />
<br />
The following general steps are performed:<br />
<br />
# Create an XML file of tables the developer wants to create objects for <br />
#* Including specifying the method for retrieving auto-generated sequence values from inserts<br />
#* This file could also be easily auto-generated<br />
# Make some type adjustments for some variables <br />
#* For example, iBatis mapped some things to Java strings that should have been Integers<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it can create BioPerl object representations of features directly from Chado; this could be either highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change. Some convenience views are created when installing Modware (e.g. V_MRNA_FEATURES).<br />
<br />
======Documentation======<br />
<br />
BioPerl-style documentation, written in POD for all methods, is available at http://gmod-ware.sourceforge.net/doc/.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
Since it implements the Bio::DasI interface, the abstraction is similar to what one would see with BioPerl/BioDas features.<br />
<br />
======Performance======<br />
Bio::DB::Das::Chado performs relatively well since it is designed to be a database adaptor to drive a Generic Genome Browser (GBrowse) instance. Though it does not perform as well as the {{BPM|Bio::DB::GFF}} and {{BPM|Bio::DB::SeqFeature}} adaptors, those are for databases designed specifically for quick retrieval of data for GBrowse, whereas Chado is designed as a complete data warehouse. The main way to get better performance for the Chado GBrowse adaptor is to implement materialized views of the GFF or SeqFeature schema inside Chado.<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration; it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well documented: it comprises about a dozen methods and three classes, all of which are documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), as defined by Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
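<br />
To make the contrast concrete, the hand-written DAO approach can be sketched as follows. This is an illustration only: the <code>CvtermDAO</code> class and its method names are hypothetical, the table is a simplified stand-in for Chado's <code>cvterm</code>, and an in-memory SQLite database (which requires DBD::SQLite) is assumed so that the example is self-contained. Every statement here is written by hand, which is the work an ORM layer aims to remove.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
use DBI;<br />
<br />
package CvtermDAO;   # hypothetical class, not part of GMOD<br />
<br />
sub new {<br />
    my ($class, $dbh) = @_;<br />
    return bless { dbh => $dbh }, $class;<br />
}<br />
<br />
# Create: insert a row and return its new primary key<br />
sub create {<br />
    my ($self, %field) = @_;<br />
    $self->{dbh}->do(<br />
        'INSERT INTO cvterm (cv_id, name, definition) VALUES (?, ?, ?)',<br />
        undef, $field{cv_id}, $field{name}, $field{definition});<br />
    return $self->{dbh}->last_insert_id(undef, undef, 'cvterm', 'cvterm_id');<br />
}<br />
<br />
# Retrieve: return the row as a hash reference, or undef if it is absent<br />
sub retrieve {<br />
    my ($self, $id) = @_;<br />
    return $self->{dbh}->selectrow_hashref(<br />
        'SELECT * FROM cvterm WHERE cvterm_id = ?', undef, $id);<br />
}<br />
<br />
# Update: change the name of an existing row<br />
sub update_name {<br />
    my ($self, $id, $name) = @_;<br />
    $self->{dbh}->do('UPDATE cvterm SET name = ? WHERE cvterm_id = ?',<br />
        undef, $name, $id);<br />
}<br />
<br />
# Delete: remove the row<br />
sub remove {<br />
    my ($self, $id) = @_;<br />
    $self->{dbh}->do('DELETE FROM cvterm WHERE cvterm_id = ?', undef, $id);<br />
}<br />
<br />
package main;<br />
<br />
# Assumption: in-memory SQLite stands in for a real Chado database<br />
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });<br />
$dbh->do('CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT, definition TEXT)');<br />
<br />
my $dao = CvtermDAO->new($dbh);<br />
my $id  = $dao->create(cv_id => 1, name => 'DUMMY TERM', definition => 'a test term');<br />
$dao->update_name($id, 'RENAMED TERM');<br />
my $name_after_update = $dao->retrieve($id)->{name};<br />
print "$name_after_update\n";   # prints RENAMED TERM<br />
$dao->remove($id);<br />
my $row_after_remove = $dao->retrieve($id);   # undef, the row is gone<br />
</perl><br />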
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to the get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
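 ## Create: insert a new row and get back the corresponding object<br />
 my $term_dbobj = Chado::Cvterm->create({<br />
    name       => "DUMMY TERM",<br />
    cv_id      => 1,<br />
    dbxref_id  => 125<br />
 });<br />
 <br />
 ## Retrieve: fetch the row whose primary key (cvterm_id) is 2<br />
 $term_dbobj = Chado::Cvterm->retrieve(2);<br />
 <br />
 ## Update: $term is assumed to be an existing object holding the new values<br />
 $term_dbobj->name( $term->name() );<br />
 $term_dbobj->definition( $term->definition() );<br />
 $term_dbobj->update();   # write the changes, unless autoupdate is enabled<br />
 <br />
 ## Delete<br />
 $term_dbobj->delete();<br />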
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: FlyBase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues during bulk operations, even when doing nothing more than issuing simple print() statements.<br />
There are many caching parameters available in Hibernate, but the underlying problem is that Chado is recursive or cyclical.<br />
XORT does some simple, automatic caching, and with XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not held in memory; data is flushed out as processing proceeds<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
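 # add_constructor creates a class method, long_names(), that returns<br />
 # the objects matching the given WHERE clause fragment<br />
 <br />
 __PACKAGE__->add_constructor(long_names => qq{ length(name) > 15 });<br />
 <br />
 # Custom SQL: set_sql defines a named statement, available as sql_xfiles()<br />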
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
Modware seeks to provide an object-oriented Perl interface to Chado to allow for rapid application development without worrying about the complex details of the schema on a day-to-day basis.<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: dictyBase<br />
* Support: Fully supported by authors. Sourceforge infrastructure (bug reports, mailing lists)<br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install.<br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To roll back at any time, simply insert '''new Modware::DBH->rollback()''' into your script.<br />
* Code generation: No automatic code generation.<br />
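<br />
The rollback behaviour described above ultimately rests on ordinary database transactions. The following minimal sketch shows that pattern with plain DBI calls; it does not use Modware::DBH, whose interface is not reproduced here. An in-memory SQLite database (requiring DBD::SQLite), a simplified <code>feature</code> table and the <code>feature_count</code> helper are all assumptions made for illustration.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
use DBI;<br />
<br />
# Assumption: in-memory SQLite stands in for a real Chado database<br />
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',<br />
    { RaiseError => 1, AutoCommit => 1 });<br />
$dbh->do('CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT)');<br />
<br />
# Hypothetical helper: number of rows currently in the table<br />
sub feature_count {<br />
    my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM feature');<br />
    return $count;<br />
}<br />
<br />
# First transaction: insert a gene, then abandon the change<br />
$dbh->begin_work;<br />
$dbh->do('INSERT INTO feature (name) VALUES (?)', undef, 'xfile');<br />
$dbh->rollback;<br />
my $after_rollback = feature_count();<br />
<br />
# Second transaction: insert a gene and keep the change<br />
$dbh->begin_work;<br />
$dbh->do('INSERT INTO feature (name) VALUES (?)', undef, 'x-men');<br />
$dbh->commit;<br />
my $after_commit = feature_count();<br />
<br />
# prints "after rollback: 0, after commit: 1"<br />
print "after rollback: $after_rollback, after commit: $after_commit\n";<br />
</perl><br />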
<br />
====Special topics====<br />
<br />
* Features in Modware map to familiar BioPerl (Bio::Seq and Bio::SeqFeature) objects that can be accessed through a 'bioperl' method. Modware internally links Bio::SeqFeature objects to the proper Bio::Seq objects to provide maximum functionality from the BioPerl libraries.<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
* Currently Alpha release (hoping for user feedback to create Beta)<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance is slower because of its object-oriented nature<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: Several through GBrowse, none that I know of as a scripting tool.<br />
* Support: [http://lists.sourceforge.net/lists/listinfo/gmod-gbrowse GBrowse mailing list]<br />
* Third party code: None.<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
Clearly, Bio::DB::Das::Chado is designed for use with GBrowse.<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use both Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T22:52:00Z<p>165.124.152.78: /* The Middleware Packages */</p>
<hr />
<div>==Executive Summary==<br />
<br />
Although GMOD uses the Chado schema for its underlying database, each group has developed a separate interface to their databases. This meeting was designed to review the experiences each group has had with its particular solution and to recommend a "best practice." <br />
<br />
All participants had developed some type of Object-Relational Mapping (ORM) tool. The tools fell into two general classes. The first class, which includes Hibernate (Java), iBatis (Java), and Chado::AutoDBI (Perl) examines the relational schema and configuration files to create the middleware automatically. The second class involved hand-coding an API to create an object-oriented interface to Chado. Of the tools presented, Modware, built on top of Chado::AutoDBI, was the most mature and feature-rich.<br />
<br />
The consensus from the meeting was that while fully automatic tools are convenient, there is considerable value in creating a hand-crafted, predictable API that reflects the biological data closely. The approach taken by Modware, which starts from an auto-generated interface and then adds a hand-coded API layer, is both comprehensible and powerful. We recommend that implementors of Perl-based GMOD tools seriously consider Modware for their middleware interface to Chado, and that the implementors of Java-based tools collaborate to create a Java API that parallels Modware's.<br />
<br />
<br />
__TOC__<br />
<br clear='all'><br />
<br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given as exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
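Chado's <code>featureloc</code> table stores interbase coordinates (<code>fmin</code>, <code>fmax</code>), so the exact coordinates above have to be shifted before loading. The following is a minimal sketch of that conversion, assuming the coordinates given are 1-based and inclusive. The helper name <code>to_featureloc</code> is hypothetical and is not part of any package discussed here.<br />

```perl
use strict;
use warnings;

# Convert an exact (assumed 1-based, inclusive) start/end pair into the
# interbase fmin/fmax pair used by Chado's featureloc table.
# to_featureloc is a hypothetical helper, not part of any package described here.
sub to_featureloc {
    my ($start, $end, $strand) = @_;
    die "start must not exceed end\n" if $start > $end;
    return { fmin => $start - 1, fmax => $end, strand => $strand };
}

# The two exons of the xfile mRNA from Problem 1.
my @exons = ( [ 13691, 13767, 1 ], [ 14687, 14720, 1 ] );

for my $exon (@exons) {
    my $loc = to_featureloc(@$exon);
    printf "fmin=%d fmax=%d strand=%d length=%d\n",
        $loc->{fmin}, $loc->{fmax}, $loc->{strand},
        $loc->{fmax} - $loc->{fmin};
}
```

With interbase coordinates the length of a feature is simply <code>fmax - fmin</code>, which is one reason the convention is convenient.<br />
<br />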
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
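The coding sequence in this report is obtained by concatenating the exon regions of the genomic sequence. The sketch below shows the idea on a toy sequence, assuming 1-based inclusive coordinates and forward-strand exons only. The helper <code>splice_exons</code> and the toy sequence are hypothetical and do not come from any of the packages discussed.<br />

```perl
use strict;
use warnings;

# Build a coding sequence by concatenating exon substrings of a source
# sequence. Coordinates are assumed to be 1-based and inclusive, and all
# exons are assumed to lie on the forward strand.
# splice_exons is a hypothetical helper, not part of any package described here.
sub splice_exons {
    my ($seq, @exons) = @_;
    my $cds = '';
    # Process exons in genomic order regardless of the order given.
    for my $exon (sort { $a->[0] <=> $b->[0] } @exons) {
        my ($start, $end) = @$exon;
        die "exon $start..$end lies outside the sequence\n"
            if $start < 1 || $end > length($seq);
        $cds .= substr($seq, $start - 1, $end - $start + 1);
    }
    return $cds;
}

# Toy source sequence standing in for the fake chromosome.
my $genomic = 'ccATGGCGttttTAAcc';
my $cds = splice_exons($genomic, [ 3, 8 ], [ 13, 15 ]);
print ">toy cds\n$cds\n";
```

For '''xfile''' the two exons span 13767 - 13691 + 1 = 77 and 14720 - 14687 + 1 = 34 bases, 111 in total, which matches the length of the coding sequence shown in the report.<br />
<br />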
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
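The filtering and formatting in this problem can be sketched without a database. The example below works on an in-memory list and is only an illustration: the helper <code>find_by_prefix</code> is hypothetical, and the last row of the sample data is made up to show a non-matching gene. Against a real database the same filter would more likely be expressed as a <code>LIKE 'x-%'</code> condition or through the search methods of the middleware in use.<br />

```perl
use strict;
use warnings;

# In-memory stand-in for rows of the feature table joined to organism.
# The first three rows echo the sample output above; the last row is a
# made-up non-matching gene added for illustration.
my @genes = (
    { feature_id => 1325, symbol => 'x-ray',  organism => 'Xenopus laevis' },
    { feature_id => 1323, symbol => 'x-file', organism => 'Xenopus laevis' },
    { feature_id => 1324, symbol => 'x-men',  organism => 'Xenopus laevis' },
    { feature_id => 1400, symbol => 'actin',  organism => 'Xenopus laevis' },
);

# Return the rows whose symbol starts with the given prefix, ordered by id.
# find_by_prefix is a hypothetical helper; call it in list context.
sub find_by_prefix {
    my ($prefix, @rows) = @_;
    return sort { $a->{feature_id} <=> $b->{feature_id} }
           grep { index($_->{symbol}, $prefix) == 0 } @rows;
}

for my $gene (find_by_prefix('x-', @genes)) {
    printf "%d\t%s\t%s\n", @{$gene}{qw(feature_id symbol organism)};
}
```

Using <code>index</code> rather than a regular expression avoids having to escape characters in the prefix.<br />
<br />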
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters addressed the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ considerably in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one day meeting heard presentations from developers using both Perl and Java middleware and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]) the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. <br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore, Chado::AutoDBI wraps the entire Chado schema whereas Modware objects only address a subset of it. Neither of these packages requires configuration: they are pre-configured for Chado and all one needs to do is connect to an existing schema. This schema, in theory, could be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.) because this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the two packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
Both Abator and iBatis share one limitation, which is that one cannot auto-generate higher level objects, such as Genes. In fairness one must add that this would not really be possible to do accurately for any system. There is not sufficient information present in the database schema to derive all of the relationships necessary. In both packages the auto-generation code only mimics the schema and it would take additional work to get a package to automatically create and relate subsets of data based on "join tables" like feature_relationship property. This mostly has to do with the circular nature of the schema.<br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the two Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do this. It is probably considered the more flexible of the two with respect to language since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
Hibernate has the ability to read a database schema and generate Java code representing the database objects or tables. In this exercise the developer chose not to use this method because one must create a file telling the code auto-generation what to use for indexes, which fields should be complex objects, and so on. It was judged to be less work to just go ahead and make the code by hand.<br />
<br />
For each object mapped to the database, one can create an XML file that tells Hibernate which database fields to map to specific object properties. Additionally, one can implement equals() and hashCode() functions for each object based on the table's unique constraints. After doing that, Hibernate takes care of all the Create-Retrieve-Update-Delete (CRUD) and transaction functionality.<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between iBatis and Hibernate were made.<br />
<br />
======Configuration======<br />
<br />
The following general steps are performed:<br />
<br />
# Create an XML file of tables the developer wants to create objects for <br />
#* Including specifying the method for retrieving auto-generated sequence values from inserts<br />
#* This file could also be easily auto-generated<br />
# Make some type adjustments for some variables <br />
#* For example, iBatis mapped some things to Java strings that should have been Integers<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses the [http://bioperl.org Bioperl] objects as accessors; this could either be highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change. Some convenience views are created when installing Modware (e.g. V_MRNA_FEATURES).<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation, written in POD for all methods, is available at http://gmod-ware.sourceforge.net/doc/.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
Since it implements the Bio::DasI interface, the abstraction is similar to what one would see with BioPerl/BioDas features.<br />
<br />
======Performance======<br />
Bio::DB::Das::Chado performs relatively well since it is designed to be a database adaptor to drive a Generic Genome Browser (GBrowse) instance. Though it does not perform as well as the {{BPM|Bio::DB::GFF}} and {{BPM|Bio::DB::SeqFeature}} adaptors, those are for databases designed specifically for quick retrieval of data for GBrowse, whereas Chado is designed as a complete data warehouse. The main way to get better performance for the Chado GBrowse adaptor is to implement materialized views of the GFF or SeqFeature schema inside Chado.<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration; it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well documented: it comprises about a dozen methods and three classes, all of them documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
* Developer writes a class which contains one attribute for each field in the table<br />
* Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements.<br />
* Object Relational Mapping (ORM), WikiPedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
* Developer needs to configure the ORM<br />
* Less manual coding<br />
* CRUD methods are automatically generated by the ORM layer<br />
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to the get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
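The effect of mapping column names to get/set methods can be imitated in plain Perl. The sketch below does not use Class::DBI and does not touch a database; it only shows how one combined accessor per column can be generated from a list of column names. The package name <code>Toy::Cvterm</code> and its constructor are hypothetical.<br />

```perl
use strict;
use warnings;

package Toy::Cvterm;

# Columns of the cvterm table as shown in this section.
my @columns = qw(cvterm_id cv_id name definition dbxref_id);

# Generate one combined get/set method per column name.
for my $column (@columns) {
    no strict 'refs';    # needed to install subs by name
    *{ __PACKAGE__ . "::$column" } = sub {
        my $self = shift;
        $self->{$column} = shift if @_;    # called with a value: set
        return $self->{$column};           # always return current value
    };
}

# Hypothetical in-memory constructor; nothing is written to a database.
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

package main;

my $term = Toy::Cvterm->new(name => 'DUMMY TERM', cv_id => 1);
$term->definition('A placeholder term');
printf "%s (cv %d): %s\n", $term->name, $term->cv_id, $term->definition;
```

A real ORM layer adds the part that matters, namely reading the column list from the database and turning accessor calls into SQL, but the calling style is the same as in the Class::DBI examples in this document.<br />
<br />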
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
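## Create<br />
my $term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
<br />
## Retrieve (by primary key)<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update<br />
## $term is assumed to be another object with name() and definition() accessors<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
## Write the changes to the database (needed unless autoupdate is enabled)<br />
$term_dbobj->update();<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />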
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: FlyBase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues even when only running simple print() statements in the course of doing bulk operations. There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. XORT does some simple, and automatic, caching. With XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not stored in memory; data is flushed out as processing goes along<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Use add_constructor to add a constructor that selects rows with long names<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
Modware seeks to provide an object-oriented Perl interface to Chado to allow for rapid application development without worrying about the complex details of the schema on a day-to-day basis.<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: dictyBase<br />
* Support: Fully supported by authors. Sourceforge infrastructure (bug reports, mailing lists)<br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To roll back at any time, simply insert '''new Modware::DBH->rollback()''' into the script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Features in Modware map to familiar Bioperl (Bio::Seq and Bio::SeqFeature) objects that can be accessed through a 'bioperl' method. Modware internally links Bio::SeqFeature objects to the proper Bio::Seq objects to provide maximum functionality from the BioPerl libraries.<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
* Currently Alpha release (hoping for user feedback to create Beta)<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance is slower because of its object-oriented nature<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: Several through GBrowse, none that I know of as a scripting tool.<br />
* Support: [http://lists.sourceforge.net/lists/listinfo/gmod-gbrowse GBrowse mailing list]<br />
* Third party code: None.<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
Clearly, Bio::DB::Das::Chado is designed for use with GBrowse.<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but it may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T18:28:41Z<p>165.124.152.78: /* Executive Summary */</p>
<hr />
<div>==Executive Summary==<br />
<br />
''The meeting participants examined two types of Object-Relational Mapping (ORM) tools.'' Hibernate (Java), iBatis (Java), and Chado::AutoDBI (Perl) tools will examine any relational schema and configuration files to create a middleware ''de novo''. Modware is hand-coded using Chado::AutoDBI to create an object-oriented interface to Chado that does not use schema semantics and more closely resembles biological semantics.<br />
<br />
''A predictable API is desirable.'' The participants' view is that an API that is easily understood and reflects current models of biological data is the most desirable API. Currently Modware (in conjunction with Chado::AutoDBI) comes closest to providing this comprehensibility.<br />
<br />
__TOC__<br />
<br clear='all'><br />
<br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
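<br />
As a hedged illustration of the first note: Chado's <code>featureloc</code> table stores interbase coordinates (<code>fmin</code>, <code>fmax</code>), so start and end values like those in the gene descriptions typically need converting before insertion. The sketch assumes that "exact coordinates" means 1-based, inclusive positions; the function name <code>to_featureloc</code> is hypothetical and is not part of any package discussed here.<br />
<br />
<perl><br />
 use strict;<br />
 use warnings;<br />
 <br />
 # Convert a 1-based, inclusive (start, end) pair into the interbase<br />
 # (fmin, fmax) pair used by featureloc: fmin = start - 1, fmax = end.<br />
 sub to_featureloc {<br />
     my ($start, $end) = @_;<br />
     die "start must be >= 1 and <= end\n" if $start < 1 || $start > $end;<br />
     return ( fmin => $start - 1, fmax => $end );<br />
 }<br />
 <br />
 # First exon of xfile: 13691..13767 becomes fmin=13690, fmax=13767 (77 bases)<br />
 my %loc = to_featureloc(13691, 13767);<br />
 printf "fmin=%d fmax=%d length=%d\n", $loc{fmin}, $loc{fmax}, $loc{fmax} - $loc{fmin};<br />
</perl><br />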
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
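<br />
One possible way to assemble such a report, sketched in plain Perl without any middleware. The record layout, the toy genomic sequence, and the function names <code>spliced_sequence</code> and <code>gene_report</code> are assumptions made for illustration; a real solution would fetch the gene and the [[FakeChromosome|Fake Chromosome]] sequence through the middleware under test. Only forward-strand exons with 1-based, inclusive coordinates are handled.<br />
<br />
<perl><br />
 use strict;<br />
 use warnings;<br />
 <br />
 # Toy data standing in for what the middleware would return (hypothetical)<br />
 my $genomic = 'ccATGGCAttttTGAcc';<br />
 my %gene = (<br />
     symbol      => 'toy-gene',<br />
     synonyms    => [ 'mulder', 'scully' ],<br />
     description => 'A test gene for GMOD meeting',<br />
     type        => 'gene',<br />
     exons       => [ { start => 3, end => 8 }, { start => 13, end => 15 } ],<br />
 );<br />
 <br />
 # Concatenate the exon subsequences in coordinate order<br />
 sub spliced_sequence {<br />
     my ($sequence, @exons) = @_;<br />
     my @sorted = sort { $a->{start} <=> $b->{start} } @exons;<br />
     return join '', map { substr($sequence, $_->{start} - 1, $_->{end} - $_->{start} + 1) } @sorted;<br />
 }<br />
 <br />
 # Build the report text in the layout shown above<br />
 sub gene_report {<br />
     my ($gene, $sequence) = @_;<br />
     my @lines = (<br />
         "symbol: $gene->{symbol}",<br />
         'synonyms: ' . join(', ', @{ $gene->{synonyms} }),<br />
         "description: $gene->{description}",<br />
         "type: $gene->{type}",<br />
     );<br />
     my $n = 0;<br />
     for my $exon (@{ $gene->{exons} }) {<br />
         $n++;<br />
         push @lines, "exon$n start: $exon->{start}", "exon$n end: $exon->{end}";<br />
     }<br />
     push @lines, ">$gene->{symbol} cds";<br />
     # Wrap the sequence at 60 characters per line, FASTA style<br />
     push @lines, unpack('(A60)*', spliced_sequence($sequence, @{ $gene->{exons} }));<br />
     return join("\n", @lines) . "\n";<br />
 }<br />
 <br />
 print gene_report(\%gene, $genomic);<br />
</perl><br />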
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
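<br />
The search and listing can be sketched in plain Perl as below. The in-memory rows and the function name <code>genes_with_prefix</code> are hypothetical stand-ins for whatever the middleware returns; against a database the same filter would usually be expressed as a condition such as <code>LIKE 'x-%'</code> on the gene name.<br />
<br />
<perl><br />
 use strict;<br />
 use warnings;<br />
 <br />
 # Hypothetical rows, as a middleware layer might return them<br />
 my @genes = (<br />
     { feature_id => 1325, symbol => 'x-ray',  organism => 'Xenopus laevis' },<br />
     { feature_id => 1323, symbol => 'x-file', organism => 'Xenopus laevis' },<br />
     { feature_id => 1324, symbol => 'x-men',  organism => 'Xenopus laevis' },<br />
     { feature_id => 1400, symbol => 'abcA',   organism => 'Xenopus laevis' },<br />
 );<br />
 <br />
 # Keep the rows whose symbol starts with the prefix, ordered by feature_id<br />
 sub genes_with_prefix {<br />
     my ($prefix, @rows) = @_;<br />
     my @hits = grep { index($_->{symbol}, $prefix) == 0 } @rows;<br />
     return sort { $a->{feature_id} <=> $b->{feature_id} } @hits;<br />
 }<br />
 <br />
 for my $gene (genes_with_prefix('x-', @genes)) {<br />
     print join("\t", @{$gene}{qw(feature_id symbol organism)}), "\n";<br />
 }<br />
</perl><br />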
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differed considerably in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one day meeting heard presentations from developers using both Perl and Java middleware and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]) the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. <br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore Chado::AutoDBI wraps the entire Chado schema whereas Modware objects only address those parts of the Chado schema that map to Bioperl objects. Neither of these packages requires configuration: they are pre-configured for Chado and all one needs to do is connect to an existing schema. This schema, in theory, could be running on any popular RDBMS (Postgres, Mysql, Oracle, etc.); this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the 2 packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, Mysql, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
Both Abator and iBatis share one limitation which is that one can not <br />
auto-generate higher level objects, such as Genes. In fairness one must add that this would not really <br />
be possible to do accurately for any system. There is not sufficient <br />
information present in the database schema to derive all of the <br />
relationships necessary. In both packages the auto-generation code only mimics the schema and it would take additional work to get <br />
a package to automatically create and relate subsets of data based on "join <br />
tables" like feature_relationship property. This mostly has to do <br />
with the circular nature of the schema.<br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the 2 Java packages: it allows you to work with the relational database with the least exposure to SQL if you choose to do this. It is probably considered the more flexible of the 2 with respect to language since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
Hibernate has the ability to read a database schema and generate Java <br />
Code representing the database objects or tables. In this exercise the developer chose not to use <br />
this method because one must create a file telling the code auto-generation what to use for indexes, which fields should be complex <br />
objects, and so on. It was judged to be less work to just go ahead and make <br />
the code by hand.<br />
<br />
For each object mapped to the database, one can create an XML file <br />
that tells Hibernate which database fields to map to specific object <br />
properties. Additionally, one can implement equals() and hashCode() <br />
functions for each object based on the table unique constraints. <br />
After doing that, Hibernate takes care of all the Create-Retrieve-Update-Delete (CRUD) and transaction <br />
functionality.<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons were made between iBatis and Hibernate.<br />
<br />
======Configuration======<br />
<br />
The following general steps are performed:<br />
<br />
# Create an XML file of tables the developer wants to create objects for <br />
#* Including specifying the method for retrieving auto-generated sequence values from inserts<br />
#* This file could also be easily auto-generated<br />
# Make some type adjustments for some variables <br />
#* For example, iBatis mapped some things to Java strings that should have been Integers<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses the [http://bioperl.org Bioperl] objects as accessors; this could either be highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change. Some convenience views are created when installing Modware (e.g. V_MRNA_FEATURES).<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation at http://gmod-ware.sourceforge.net/doc/, written in POD for all methods.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
Since it implements the Bio::DasI interface, the abstraction is similar to what one would see with BioPerl/BioDas features.<br />
<br />
======Performance======<br />
Bio::DB::Das::Chado performs relatively well since it is designed to be a database adaptor to drive a Generic Genome Browser (GBrowse) instance. Though it does not perform as well as the {{BPM|Bio::DB::GFF}} and {{BPM|Bio::DB::SeqFeature}} adaptors, those are for databases designed specifically for quick retrieval of data for GBrowse, whereas Chado is designed as a complete data warehouse. The main way to get better performance for the Chado GBrowse adaptor is to implement materialized views of the GFF or SeqFeature schema inside Chado.<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration; it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well documented: it consists of about a dozen methods and three classes, all documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
* Developer writes a class which contains one attribute for each field in the table<br />
* Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements.<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
* Developer needs to configure the ORM<br />
* Less manual coding<br />
* CRUD methods are automatically generated by the ORM layer<br />
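<br />
The hand-written DAO approach from the first bullets can be sketched as follows. The class name <code>CvtermDAO</code> and its methods are hypothetical. To stay self-contained, each method only returns the SQL string and bind values that a real DAO would hand to DBI; it does not execute them. An ORM layer would generate the equivalent of this code automatically.<br />
<br />
<perl><br />
 package CvtermDAO;    # hypothetical DAO for the cvterm table<br />
 use strict;<br />
 use warnings;<br />
 <br />
 # One attribute per column of the table<br />
 my @FIELDS = qw(cvterm_id cv_id name definition dbxref_id);<br />
 <br />
 sub new {<br />
     my ($class, %args) = @_;<br />
     my %self = map { $_ => $args{$_} } @FIELDS;<br />
     return bless \%self, $class;<br />
 }<br />
 <br />
 # Create: INSERT using only the columns that have a value<br />
 sub insert_statement {<br />
     my ($self) = @_;<br />
     my @cols = grep { defined $self->{$_} } @FIELDS;<br />
     my $sql = sprintf 'INSERT INTO cvterm (%s) VALUES (%s)',<br />
         join(', ', @cols), join(', ', ('?') x @cols);<br />
     return ($sql, @{$self}{@cols});<br />
 }<br />
 <br />
 # Update: write name and definition back, keyed on the primary key<br />
 sub update_statement {<br />
     my ($self) = @_;<br />
     return ('UPDATE cvterm SET name = ?, definition = ? WHERE cvterm_id = ?',<br />
         $self->{name}, $self->{definition}, $self->{cvterm_id});<br />
 }<br />
 <br />
 # Delete: remove the row, keyed on the primary key<br />
 sub delete_statement {<br />
     my ($self) = @_;<br />
     return ('DELETE FROM cvterm WHERE cvterm_id = ?', $self->{cvterm_id});<br />
 }<br />
 <br />
 package main;<br />
 <br />
 my $term = CvtermDAO->new(name => 'DUMMY TERM', cv_id => 1, dbxref_id => 125);<br />
 my ($sql, @bind) = $term->insert_statement;<br />
 print "$sql\n";              # INSERT INTO cvterm (cv_id, name, dbxref_id) VALUES (?, ?, ?)<br />
 print join(', ', @bind), "\n";    # 1, DUMMY TERM, 125<br />
</perl><br />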
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to the get/set methods<br />
* Can be used with transactions<br />
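<br />
To illustrate the idea of mapping column names to get/set methods, here is a minimal sketch in plain Perl. It is not how Class::DBI is implemented and it never touches a database; the package name <code>Sketch::Cvterm</code> is hypothetical. It only shows how one combined accessor per column can be generated from a list of column names.<br />
<br />
<perl><br />
 package Sketch::Cvterm;    # hypothetical, for illustration only<br />
 use strict;<br />
 use warnings;<br />
 <br />
 my @COLUMNS = qw(cvterm_id cv_id name definition dbxref_id);<br />
 <br />
 sub new {<br />
     my ($class, %row) = @_;<br />
     my $self = { %row };<br />
     return bless $self, $class;<br />
 }<br />
 <br />
 # Install one method per column: called with an argument it sets the value,<br />
 # called without one it returns the current value.<br />
 for my $column (@COLUMNS) {<br />
     no strict 'refs';<br />
     *{ __PACKAGE__ . "::$column" } = sub {<br />
         my $self = shift;<br />
         $self->{$column} = shift if @_;<br />
         return $self->{$column};<br />
     };<br />
 }<br />
 <br />
 package main;<br />
 <br />
 my $term = Sketch::Cvterm->new(name => 'DUMMY TERM', cv_id => 1);<br />
 $term->definition('a placeholder');<br />
 print $term->name, ': ', $term->definition, "\n";    # DUMMY TERM: a placeholder<br />
</perl><br />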
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
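 ## Create: insert a new row and return the object that wraps it<br />
 my $term_dbobj = Chado::Cvterm->create({<br />
     name => "DUMMY TERM",<br />
     cv_id => 1,<br />
     dbxref_id => 125<br />
     });<br />
 <br />
 ## Retrieve: fetch the row whose primary key is 2<br />
 $term_dbobj = Chado::Cvterm->retrieve(2);<br />
 <br />
 ## Update: set new column values, then write them to the database<br />
 ## ($term is assumed to be an existing object that carries the new values)<br />
 $term_dbobj->name( $term->name() );<br />
 $term_dbobj->definition( $term->definition() );<br />
 $term_dbobj->update();    # needed unless autoupdate is switched on<br />
 <br />
 ## Delete: remove the row from the database<br />
 $term_dbobj->delete();<br />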
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
Flybase tried Hibernate but encountered performance issues during bulk operations, even when doing little more than simple print() statements.<br />
There are many caching parameters available in Hibernate but the problem is that Chado is recursive or cyclical. <br />
XORT does some simple, and automatic, caching. With XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not stored in memory; data is flushed out as processing proceeds<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add the add_constructor for looking for name lengths<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
Modware seeks to provide an object-oriented Perl interface to Chado to allow for rapid application development without worrying about the complex details of the schema on a day-to-day basis.<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: dictyBase<br />
* Support: Fully supported by authors. Sourceforge infrastructure (bug reports, mailing lists)<br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To rollback at any time, simply insert '''new Modware::DBH->rollback()''' into script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Features in Modware map to familiar Bioperl (Bio::Seq and Bio::SeqFeature) objects that can be accessed through a 'bioperl' method. <br />
Modware internally links Bio::SeqFeature objects to the proper Bio::Seq objects to provide maximum functionality from the BioPerl libraries.<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
* Currently Alpha release (hoping for user feedback to create Beta)<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance is slower by object-oriented nature<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: Several through GBrowse, none that I know of as a scripting tool.<br />
* Support: [http://lists.sourceforge.net/lists/listinfo/gmod-gbrowse GBrowse mailing list]<br />
* Third party code: None.<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
Clearly, Bio::DB::Das::Chado is designed for use with GBrowse.<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T17:51:56Z<p>165.124.152.78: /* Configuration */</p>
<hr />
<div>==Executive Summary==<br />
<br />
''The meeting participants examined two types of Object-Relational Mapping (ORM) tools.'' The Hibernate and iBatis tools will examine any relational schema and create middleware ''de novo'' whereas Modware and Chado::AutoDBI are hand-built to match GMOD's Chado relational schema.<br />
<br />
''A predictable API is desirable.'' The participants' view is that an API that is easily understood and reflects current models of biological data is the most desirable API. Currently the Modware and Chado::AutoDBI packages come closest to providing this comprehensibility.<br />
<br />
__TOC__<br />
<br clear='all'><br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. It is also assumed that such a focus will lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
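<br />
One detail that Problem 1 leaves implicit is how the listed exon coordinates relate to the values stored in <code>featureloc</code>. The following minimal Perl sketch assumes that the listed coordinates are 1-based and inclusive (which is consistent with the 111-base coding sequence shown in Problem 2, 77 + 34 bases) and that <code>featureloc.fmin</code> and <code>featureloc.fmax</code> follow Chado's interbase convention. The helper name <code>to_interbase</code> is hypothetical and does not belong to any of the packages discussed here.<br />
<br />
```perl
use strict;
use warnings;

# Exon coordinates for gene xfile as listed in Problem 1
# (assumption: 1-based, inclusive).
my @exons = (
    { start => 13691, end => 13767, strand => 1 },
    { start => 14687, end => 14720, strand => 1 },
);

# Convert one exon to interbase coordinates: the start moves down by one,
# the end is unchanged, so fmax - fmin equals the exon length.
sub to_interbase {
    my ($exon) = @_;
    return {
        fmin   => $exon->{start} - 1,
        fmax   => $exon->{end},
        strand => $exon->{strand},
    };
}

for my $exon (@exons) {
    my $loc = to_interbase($exon);
    printf "fmin=%d fmax=%d strand=%d length=%d\n",
        $loc->{fmin}, $loc->{fmax}, $loc->{strand}, $loc->{fmax} - $loc->{fmin};
}
```
<br />
Middleware that accepts BioPerl-style coordinates may perform this conversion internally, so it is worth checking which convention a given package expects before inserting locations.<br />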
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
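<br />
The derivation of a coding sequence from exon coordinates can be sketched in plain Perl as follows. This is an illustration only: it assumes 1-based inclusive coordinates and forward-strand exons, the sub names <code>spliced_sequence</code> and <code>to_fasta</code> are hypothetical, and the genomic string is a toy sequence rather than the Fake Chromosome. A real solution would read the sequence and coordinates through the middleware under test.<br />
<br />
```perl
use strict;
use warnings;

# Join exon subsequences (1-based, inclusive, forward strand only)
# into one spliced sequence, in order of exon start.
sub spliced_sequence {
    my ($genomic, @exons) = @_;
    my $spliced = '';
    for my $exon (sort { $a->{start} <=> $b->{start} } @exons) {
        my $length = $exon->{end} - $exon->{start} + 1;
        $spliced .= substr($genomic, $exon->{start} - 1, $length);
    }
    return $spliced;
}

# Format a sequence as FASTA with at most 60 residues per line.
sub to_fasta {
    my ($id, $seq) = @_;
    my @lines = $seq =~ /(.{1,60})/g;
    return join("\n", ">$id", @lines) . "\n";
}

my $genomic = 'CCATGGCGTTTTTTTAAATAAGG';    # toy sequence, not the Fake Chromosome
my @exons   = ( { start => 3, end => 8 }, { start => 16, end => 21 } );

print to_fasta('toy cds', spliced_sequence($genomic, @exons));
```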
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
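<br />
The update-then-report cycle can be sketched without any database at all. The plain Perl below is a hedged illustration in which the gene is held in an ordinary hash and <code>gene_report</code> is a hypothetical helper; with real middleware the hash would be replaced by whatever object the package returns, and the coding sequence would be appended to the report.<br />
<br />
```perl
use strict;
use warnings;

# Build the text report for a gene described by a plain hash.
sub gene_report {
    my ($gene) = @_;
    my @lines = (
        "symbol: $gene->{symbol}",
        'synonyms: ' . join(', ', @{ $gene->{synonyms} }),
        "description: $gene->{description}",
        "type: $gene->{type}",
    );
    my $n = 0;
    for my $exon (@{ $gene->{exons} }) {
        $n++;
        push @lines, "exon$n start: $exon->{start}", "exon$n end: $exon->{end}";
    }
    return join("\n", @lines) . "\n";
}

my %gene = (
    symbol      => 'xfile',
    synonyms    => [ 'mulder', 'scully' ],
    description => 'A test gene for GMOD meeting',
    type        => 'gene',
    exons       => [
        { start => 13691, end => 13767 },
        { start => 14687, end => 14720 },
    ],
);

$gene{symbol} = 'x-file';     # the update requested in this problem
print gene_report(\%gene);    # regenerate the report from the changed record
```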
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
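<br />
The search and delete steps in Problems 4 and 5 can be illustrated with an in-memory stand-in for the gene records. This is a sketch under stated assumptions, not any package's API: the hash <code>%genes</code>, the extra ''actin'' record and the sub <code>search_by_prefix</code> are hypothetical, and against a database the prefix test would typically become a <code>LIKE 'x-%'</code> condition.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for gene rows, keyed by id.
# The ids and organism mirror the sample output above.
my %genes = (
    1323 => { symbol => 'x-file', organism => 'Xenopus laevis' },
    1324 => { symbol => 'x-men',  organism => 'Xenopus laevis' },
    1325 => { symbol => 'x-ray',  organism => 'Xenopus laevis' },
    1400 => { symbol => 'actin',  organism => 'Xenopus laevis' },
);

# Return one tab-separated report line per gene whose symbol starts
# with $prefix, ordered by id.
sub search_by_prefix {
    my ($store, $prefix) = @_;
    return map  { join("\t", $_, $store->{$_}{symbol}, $store->{$_}{organism}) }
           grep { index($store->{$_}{symbol}, $prefix) == 0 }
           sort { $a <=> $b } keys %$store;
}

print "$_\n" for search_by_prefix(\%genes, 'x-');    # Problem 4
delete $genes{1325};                                  # Problem 5: delete x-ray by id
print "$_\n" for search_by_prefix(\%genes, 'x-');    # x-ray no longer listed
```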
<br />
=====Problem Results=====<br />
<br />
All presenters addressed the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ considerably in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]), the Chado schema is popular and, given its complexity, serves as a good test schema for this exercise. The primary focus in the talks was on functionality from the perspective of writing code and extending the software; less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. <br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore, Chado::AutoDBI wraps the entire Chado schema whereas Modware objects only address those parts of the Chado schema that map to Bioperl objects. Neither of these packages requires configuration: they are pre-configured for Chado and all one needs to do is connect to an existing schema. In theory this schema could be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.); this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the two packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
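<br />
The advice to test a tool before trusting it with bulk work can be followed with Perl's core <code>Benchmark</code> module. The sketch below is deliberately artificial: it measures only the in-memory cost of building one object per row against handling plain array rows, with no database involved, and the class <code>Row</code> and both subs are hypothetical. A real test would time the middleware in question against the actual Chado instance and data volume.<br />
<br />
```perl
use strict;
use warnings;
use Benchmark qw(timethese);

{
    # Hypothetical row class standing in for an ORM-generated object.
    package Row;
    sub new  { my ($class, %fields) = @_; return bless {%fields}, $class }
    sub fmin { $_[0]{fmin} }
    sub fmax { $_[0]{fmax} }
}

# Sum feature lengths by constructing one object per row.
sub total_length_objects {
    my ($n) = @_;
    my $total = 0;
    for my $i (1 .. $n) {
        my $row = Row->new(fmin => $i, fmax => $i + 100);
        $total += $row->fmax() - $row->fmin();
    }
    return $total;
}

# Sum the same lengths from plain array rows, with no object construction.
sub total_length_plain {
    my ($n) = @_;
    my $total = 0;
    for my $i (1 .. $n) {
        my @row = ($i, $i + 100);
        $total += $row[1] - $row[0];
    }
    return $total;
}

# Run each approach 20 times over 50,000 rows and print the timings.
timethese(20, {
    objects => sub { total_length_objects(50_000) },
    plain   => sub { total_length_plain(50_000) },
});
```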
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
Both Abator and iBatis share one limitation, which is that one cannot auto-generate higher-level objects, such as Genes. In fairness one must add that this would not really be possible to do accurately for any system: there is not sufficient information present in the database schema to derive all of the relationships necessary. In both packages the auto-generation code only mimics the schema, and it would take additional work to get a package to automatically create and relate subsets of data based on "join tables" like feature_relationship property. This mostly has to do with the circular nature of the schema.<br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the two Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do this. It is probably also the more flexible of the two with respect to language, since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
Hibernate has the ability to read a database schema and generate Java code representing the database objects or tables. In this exercise the developer chose not to use this method because one must create a file telling the code auto-generation what to use for indexes, which fields should be complex objects, and so on. It was judged to be less work to just go ahead and make the code by hand.<br />
<br />
For each object mapped to the database, one can create an XML file that tells Hibernate which database fields to map to specific object properties. Additionally, one can implement equals() and hashCode() functions for each object based on the table's unique constraints. After doing that, Hibernate takes care of all the Create-Retrieve-Update-Delete (CRUD) and transaction functionality.<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between iBatis and Hibernate were made.<br />
<br />
======Configuration======<br />
<br />
The following general steps are performed:<br />
<br />
# Create an XML file of tables the developer wants to create objects for <br />
#* Including specifying the method for retrieving auto-generated sequence values from inserts<br />
#* This file could also be easily auto-generated<br />
# Make some type adjustments for some variables <br />
#* For example, iBatis mapped some things to Java strings that should have been Integers<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration: it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses the [http://bioperl.org Bioperl] objects as accessors; this could either be highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration: it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change. Some convenience views are created when installing Modware (i.e. V_MRNA_FEATURES).<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation, written in POD for all methods, is available at http://gmod-ware.sourceforge.net/doc/.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
<br />
======Performance======<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration, it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well-documented, about a dozen methods and three classes, all documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
* Developer writes a class which contains one attribute for each field in the table<br />
* Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements.<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
* Developer needs to configure the ORM<br />
* Less manual coding<br />
* CRUD methods are automatically generated by the ORM layer<br />
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
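<perl><br />
## Create: insert a new row into cvterm<br />
my $term_dbobj = Chado::Cvterm->create({<br />
    name      => "DUMMY TERM",<br />
    cv_id     => 1,<br />
    dbxref_id => 125<br />
});<br />
<br />
## Retrieve: fetch the row whose primary key is 2<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update: copy values from $term (another term object, assumed to exist here)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
## Write the changed columns to the database (needed unless autoupdate is enabled)<br />
$term_dbobj->update();<br />
<br />
## Delete: remove the row from the database<br />
$term_dbobj->delete();<br />
</perl><br />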
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues in the course of bulk operations, even when doing nothing more than simple print() statements. There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. XORT does some simple, automatic caching, and with XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure is not stored in memory; data is flushed out as it goes<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Use add_constructor to define a search for names longer than 15 characters<br />
<br />
__PACKAGE__->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
Modware seeks to provide an object-oriented perl interface to Chado to allow for rapid application development without worrying about the complex details of the schema on a day-to-day basis.<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: dictyBase<br />
* Support: Fully supported by authors. Sourceforge infrastructure (bug reports, mailing lists)<br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To rollback at any time, simply insert '''new Modware::DBH->rollback()''' into script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Features in Modware map to familiar Bioperl (Bio::Seq and Bio::SeqFeature) objects that can be accessed through a 'bioperl' method. Modware internally links Bio::SeqFeature objects to the proper Bio::Seq objects to provide maximum functionality from the BioPerl libraries.<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
* Currently Alpha release (hoping for user feedback to create Beta)<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance is slower by object-oriented nature<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
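<br />
For readers unfamiliar with the ''find or save'' idea mentioned in the first limitation, the pattern is sketched here in Perl, the language of this page's other examples. It is a minimal in-memory illustration and says nothing about how the PSU interface or Hibernate are implemented: the hash <code>%cvterm_by_name</code>, the counter and the sub <code>find_or_save</code> are all hypothetical, and a database-backed version would also need to guard the lookup and insert with a transaction.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical in-memory table keyed by a unique constraint (here: the term name).
my %cvterm_by_name;
my $next_id = 1;

# Return the existing row for $name, or store a new row and return it.
# The //= operator only evaluates its right-hand side when no row exists yet.
sub find_or_save {
    my ($name) = @_;
    $cvterm_by_name{$name} //= { cvterm_id => $next_id++, name => $name };
    return $cvterm_by_name{$name};
}

my $first  = find_or_save('gene');    # saved with a new id
my $second = find_or_save('gene');    # found, not saved again
my $third  = find_or_save('exon');    # saved with the next id

printf "%s=%d %s=%d %s=%d\n",
    map { $_->{name}, $_->{cvterm_id} } $first, $second, $third;
```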
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T17:48:06Z<p>165.124.152.78: /* Background */</p>
<hr />
<div>==Executive Summary==<br />
<br />
''The meeting participants examined two types of Object-Relational Mapping (ORM) tools.'' The Hibernate and iBatis tools will examine any relational schema and create middleware ''de novo'' whereas Modware and Chado::AutoDBI are hand-built to match GMOD's Chado relational schema.<br />
<br />
''A predictable API is desirable.'' The participants' view is that an API that is easily understood and reflects current models of biological data is the most desirable API. Currently the Modware and Chado::AutoDBI packages come closest to providing this comprehensibility.<br />
<br />
__TOC__<br />
<br clear='all'><br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19, 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. It's also assumed that such a focus will lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each presenter was asked to show how their package addresses a set of sample problems, shown below. These example problems are thought to typify some of the common functions that scientists may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
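<br />
Before loading, the three gene descriptions above can be held in a plain Perl data structure. The following is only a sketch: the variable and key names are assumptions made for illustration and belong to no middleware package, and <code>$src_feature_id</code> is a placeholder for the id of the loaded genomic sample.<br />
<br />
<perl><br />
 use strict;<br />
 use warnings;<br />
 <br />
 # Hypothetical placeholder: the feature_id of the loaded genomic sample.<br />
 my $src_feature_id = 1;<br />
 <br />
 # Key names are illustrative only, not part of any middleware API.<br />
 my @genes = (<br />
     {<br />
         symbol      => 'xfile',<br />
         synonyms    => [ 'mulder', 'scully' ],<br />
         description => 'A test gene for GMOD meeting',<br />
         exons       => [<br />
             { start => 13691, end => 13767, strand => 1, srcfeature_id => $src_feature_id },<br />
             { start => 14687, end => 14720, strand => 1, srcfeature_id => $src_feature_id },<br />
         ],<br />
     },<br />
     {<br />
         symbol   => 'x-men',<br />
         synonyms => ['wolverine'],<br />
         exons    => [<br />
             { start => 12648, end => 13136, strand => 1, srcfeature_id => $src_feature_id },<br />
         ],<br />
     },<br />
     {<br />
         symbol   => 'x-ray',<br />
         synonyms => [],<br />
         exons    => [<br />
             { start => 1703, end => 1900, strand => 1, srcfeature_id => $src_feature_id },<br />
         ],<br />
     },<br />
 );<br />
 <br />
 # Print a one-line summary per gene.<br />
 for my $gene (@genes) {<br />
     printf "%s: %d exon(s)\n", $gene->{symbol}, scalar @{ $gene->{exons} };<br />
 }<br />
</perl><br />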
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
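<br />
The coding sequence in this report is the concatenation of the exon subsequences. The sketch below shows that step in plain Perl, assuming 1-based inclusive coordinates and forward-strand exons only; the function name and the toy sequence are made up for illustration. Under that assumption the two '''xfile''' exons span 77 and 34 bases, which matches the 111 bases of the sequence shown above.<br />
<br />
<perl><br />
 use strict;<br />
 use warnings;<br />
 <br />
 # Join exon subsequences (1-based, inclusive coordinates, forward strand only).<br />
 sub spliced_sequence {<br />
     my ( $genomic, @exons ) = @_;<br />
     my $cds = '';<br />
     for my $exon ( sort { $a->{start} <=> $b->{start} } @exons ) {<br />
         my $length = $exon->{end} - $exon->{start} + 1;<br />
         $cds .= substr( $genomic, $exon->{start} - 1, $length );<br />
     }<br />
     return $cds;<br />
 }<br />
 <br />
 # Toy stand-in for the chromosome sequence; not the real Fake Chromosome.<br />
 my $genomic = 'AAACCCGGGTTT';<br />
 my @exons   = ( { start => 1, end => 3 }, { start => 7, end => 9 } );<br />
 print spliced_sequence( $genomic, @exons ), "\n";    # prints "AAAGGG"<br />
</perl><br />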
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
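<br />
For reference, this search can also be written directly against Chado with Perl's DBI, without any middleware. The sketch below assumes that the gene symbol is stored in <code>feature.name</code> and that the connection string, user and password are placeholders for your own installation.<br />
<br />
<perl><br />
 use strict;<br />
 use warnings;<br />
 use DBI;<br />
 <br />
 # Connection details are placeholders; adjust them for your installation.<br />
 my $dbh = DBI->connect( 'dbi:Pg:dbname=chado', 'user', 'password',<br />
     { RaiseError => 1, AutoCommit => 1 } );<br />
 <br />
 # Join each feature to its type (cvterm) and organism,<br />
 # keeping only genes whose name matches the pattern.<br />
 my $sth = $dbh->prepare(q{<br />
     SELECT f.feature_id, f.name, o.genus, o.species<br />
     FROM feature f<br />
     JOIN cvterm   t ON t.cvterm_id   = f.type_id<br />
     JOIN organism o ON o.organism_id = f.organism_id<br />
     WHERE t.name = 'gene'<br />
       AND f.name LIKE ?<br />
     ORDER BY f.feature_id<br />
 });<br />
 $sth->execute('x-%');<br />
 <br />
 while ( my ( $id, $name, $genus, $species ) = $sth->fetchrow_array ) {<br />
     print "$id\t$name\t$genus $species\n";<br />
 }<br />
 $dbh->disconnect;<br />
</perl><br />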
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. There are many differences between packages in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]), the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software, and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. <br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore Chado::AutoDBI wraps the entire Chado schema whereas Modware objects only address those parts of the Chado schema that map to Bioperl objects. Neither of these packages requires configuration: they are pre-configured for Chado and all one needs to do is connect to an existing schema. This schema, in theory, could be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.); this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the 2 packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
Both Abator and iBatis share one limitation, which is that one cannot auto-generate higher level objects, such as Genes. In fairness one must add that this would not really be possible to do accurately for any system: there is not sufficient information present in the database schema to derive all of the relationships necessary. In both packages the auto-generation code only mimics the schema, and it would take additional work to get a package to automatically create and relate subsets of data based on "join tables" like feature_relationship property. This mostly has to do with the circular nature of the schema.<br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the 2 Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do this. It is probably the more flexible of the 2 with respect to language, since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
Hibernate has the ability to read a database schema and generate Java code representing the database objects or tables. In this exercise the developer chose not to use this method because one must create a file telling the code auto-generation what to use for indexes, which fields should be complex objects, and so on. It was judged to be less work to write the code by hand.<br />
<br />
For each object mapped to the database, one can create an XML file that tells Hibernate which database fields to map to specific object properties. Additionally, one can implement equals() and hashCode() functions for each object based on the table's unique constraints. After that, Hibernate takes care of all the Create-Retrieve-Update-Delete (CRUD) and transaction functionality.<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between iBatis and Hibernate were made.<br />
<br />
======Configuration======<br />
<br />
The following general steps are performed:<br />
<br />
# Create an XML file of tables the developer wants to create objects for <br />
#* Including specifying the method for retrieving auto-generated sequence values from inserts<br />
#* This file could also be easily auto-generated<br />
# Make some type adjustments for some variables <br />
#* For example, iBatis mapped some things to Java strings that should have been Integers<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration: it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses the [http://bioperl.org Bioperl] objects as accessors; this could either be highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration: it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation at http://gmod-ware.sourceforge.net/doc/, written in POD for all methods.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
<br />
======Performance======<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration: it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well-documented, about a dozen methods and three classes, all documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
* Developer writes a class which contains one attribute for each field in the table<br />
* Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements.<br />
* Object Relational Mapping (ORM), from Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
* Developer needs to configure the ORM<br />
* Less manual coding<br />
* CRUD methods are automatically generated by the ORM layer<br />
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
 ## Create<br />
 my $term_dbobj = Chado::Cvterm->create({<br />
   name       => "DUMMY TERM",<br />
   cv_id      => 1,<br />
   dbxref_id  => 125<br />
 });<br />
<br />
 ## Retrieve (by primary key)<br />
 $term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
 ## Update ($term is an application object holding the new values)<br />
 $term_dbobj->name( $term->name() );<br />
 $term_dbobj->definition( $term->definition() );<br />
 $term_dbobj->update();   # write the changes; needed unless autoupdate is on<br />
<br />
 ## Delete<br />
 $term_dbobj->delete();<br />
</perl><br />
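<br />
Since Class::DBI can be used with transactions, the CRUD calls above can be wrapped so that they are committed or rolled back together. The sketch below follows the pattern described in the Class::DBI documentation and assumes that the <code>Chado::Cvterm</code> class defined earlier has been loaded; the helper name and the new term name are illustrative.<br />
<br />
<perl><br />
 use strict;<br />
 use warnings;<br />
 <br />
 # Run a code reference inside a transaction.<br />
 sub do_transaction {<br />
     my ( $class, $code ) = @_;<br />
 <br />
     # Turn off AutoCommit for this scope only; leaving the scope<br />
     # restores AutoCommit, which commits the pending work.<br />
     local $class->db_Main->{AutoCommit};<br />
 <br />
     eval { $code->() };<br />
     if ( my $error = $@ ) {<br />
         eval { $class->dbi_rollback };    # the rollback itself may fail<br />
         die $error;<br />
     }<br />
     return;<br />
 }<br />
 <br />
 # Usage: the update is kept only if the whole block succeeds.<br />
 do_transaction(<br />
     'Chado::Cvterm',<br />
     sub {<br />
         my $term = Chado::Cvterm->retrieve(2);<br />
         $term->name('RENAMED TERM');<br />
         $term->update;<br />
     }<br />
 );<br />
</perl><br />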
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: FlyBase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues during bulk operations, even when doing nothing more than simple print() statements. There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching. With XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure is not stored in memory; data is flushed out as it is processed<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add the add_constructor for looking for name lengths<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
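<br />
The two declarations above add methods to the class in which they appear. The following sketch shows how they might be called, assuming they were made inside a Class::DBI subclass that is named <code>Chado::Feature</code> here purely for illustration.<br />
<br />
<perl><br />
 # add_constructor creates a class method that returns the matching objects.<br />
 my @long_named = Chado::Feature->long_names;<br />
 <br />
 # set_sql creates sql_xfiles, which returns a prepared statement handle.<br />
 my $sth = Chado::Feature->sql_xfiles;<br />
 $sth->execute;<br />
 while ( my ($feature_id) = $sth->fetchrow_array ) {<br />
     print "$feature_id\n";<br />
 }<br />
</perl><br />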
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
Modware seeks to provide an object-oriented Perl interface to Chado to allow for rapid application development without worrying about the complex details of the schema on a day-to-day basis.<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: dictyBase<br />
* Support: Fully supported by authors. Sourceforge infrastructure (bug reports, mailing lists)<br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To roll back at any time, simply insert '''new Modware::DBH->rollback()''' into the script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Features in Modware map to familiar Bioperl (Bio::Seq and Bio::SeqFeature) objects that can be accessed through a 'bioperl' method. Modware internally links Bio::SeqFeature objects to the proper Bio::Seq objects to provide maximum functionality from the BioPerl libraries.<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
* Currently Alpha release (hoping for user feedback to create Beta)<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance is slower because of its object-oriented nature<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T17:46:58Z<p>165.124.152.78: /* Limitations */</p>
<hr />
<div>==Executive Summary==<br />
<br />
''The meeting participants examined two types of Object-Relational Mapping (ORM) tools.'' The Hibernate and iBatis tools will examine any relational schema and create middleware ''de novo'' whereas Modware and Chado::AutoDBI are hand-built to match GMOD's Chado relational schema.<br />
<br />
''A predictable API is desirable.'' The participants' view is that an API that is easily understood and reflects current models of biological data is the most desirable API. Currently the Modware and Chado::AutoDBI packages come closest to providing this comprehensibility.<br />
<br />
__TOC__<br />
<br clear='all'><br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation was to select or recommend certain packages over others, specifically within the GMOD context. The assumption is that such recommendations will focus the developers' effort on a smaller number of packages. It is also assumed that this focus will lead to greater support for and use of the recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
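<br />
The exon coordinates above are 1-based and inclusive: the two xfile exons span 77 and 34 bases, which together give the 111 bases of the coding sequence shown. A minimal Perl sketch of that splicing step follows. It uses a short made-up sequence rather than the Fake Chromosome, handles the forward strand only, and <code>splice_exons</code> is a hypothetical helper, not part of any package discussed here.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical helper: join the exon substrings of a source sequence.<br />
# Coordinates are 1-based and inclusive, as in the problem statement.<br />
sub splice_exons {<br />
    my ($src, @exons) = @_;<br />
    my $cds = '';<br />
    for my $exon (sort { $a->{start} <=> $b->{start} } @exons) {<br />
        my $length = $exon->{end} - $exon->{start} + 1;<br />
        $cds .= substr($src, $exon->{start} - 1, $length);<br />
    }<br />
    return $cds;<br />
}<br />
<br />
# Toy data, not the Fake Chromosome.<br />
my $src   = 'AAATGGCCTTTAAGGG';<br />
my @exons = ({ start => 3, end => 8 }, { start => 11, end => 13 });<br />
my $cds   = splice_exons($src, @exons);<br />
<br />
# Print as FASTA, 60 bases per line.<br />
print ">toy cds\n";<br />
print "$_\n" for unpack('(A60)*', $cds);<br />
</perl><br />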
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters addressed the assigned problems, and all packages could perform the required operations except the GBrowse (DasI) Adaptor, which is read-only software. The packages differ considerably in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]), the Chado schema is popular and, given its complexity, serves as a good test schema for this exercise. The primary focus in the talks was on functionality from the perspective of writing code and extending the software; less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. <br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore, Chado::AutoDBI wraps the entire Chado schema whereas Modware objects address only those parts of the Chado schema that map to Bioperl objects. Neither package requires configuration: both are pre-configured for Chado and all one needs to do is connect to an existing schema. This schema could, in theory, be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.); this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the two packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
Both Abator and iBatis share one limitation: one cannot auto-generate higher-level objects, such as Genes. In fairness, this would not really be possible to do accurately for any system, because the database schema does not contain enough information to derive all of the necessary relationships. In both packages the auto-generation code only mimics the schema, and it would take additional work to get a package to automatically create and relate subsets of data based on "join tables" like feature_relationship property. This mostly has to do with the circular nature of the schema.<br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the two Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose. It is probably also the more flexible of the two with respect to language, since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
Hibernate can read a database schema and generate Java code representing the database objects or tables. In this exercise the developer chose not to use this method, because one must create a file telling the code auto-generation what to use for indexes, which fields should be complex objects, and so on. It was judged to be less work to write the code by hand.<br />
<br />
For each object mapped to the database, one can create an XML file that tells Hibernate which database fields to map to specific object properties. Additionally, one can implement <code>equals()</code> and <code>hashCode()</code> methods for each object based on the table's unique constraints. After that, Hibernate takes care of all the Create-Retrieve-Update-Delete (CRUD) and transaction functionality.<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons made between iBatis and Hibernate.<br />
<br />
======Configuration======<br />
<br />
The following general steps are performed:<br />
<br />
# Create an XML file of tables the developer wants to create objects for <br />
#* Including specifying the method for retrieving auto-generated sequence values from inserts<br />
#* This file could also be easily auto-generated<br />
# Make some type adjustments for some variables <br />
#* For example, iBatis mapped some things to Java strings that should have been Integers<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses [http://bioperl.org Bioperl] objects as accessors; this could be either highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation at http://gmod-ware.sourceforge.net/doc/, written in POD for all methods.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
<br />
======Performance======<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration; it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well documented: it has about a dozen methods and three classes, all documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
* Developer writes a class which contains one attribute for each field in the table<br />
* Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements.<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
* Developer needs to configure the ORM<br />
* Less manual coding<br />
* CRUD methods are automatically generated by the ORM layer<br />
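<br />
As a rough illustration of the Data Access Object approach listed above, the following Perl sketch wraps the <code>cvterm</code> table by hand. The class name <code>CvtermDAO</code> and its method names are hypothetical, and <code>$dbh</code> is assumed to be a connected DBI database handle. Every statement has to be written and maintained manually, which is the work an ORM layer takes over.<br />
<br />
<perl><br />
package CvtermDAO;   # hypothetical class name<br />
use strict;<br />
use warnings;<br />
<br />
# $dbh is assumed to be a connected DBI database handle.<br />
sub new {<br />
    my ($class, $dbh) = @_;<br />
    return bless { dbh => $dbh }, $class;<br />
}<br />
<br />
# Retrieve: one cvterm row as a hash reference, or undef if not found.<br />
sub retrieve {<br />
    my ($self, $cvterm_id) = @_;<br />
    return $self->{dbh}->selectrow_hashref(<br />
        'SELECT cvterm_id, cv_id, name, definition, dbxref_id<br />
           FROM cvterm WHERE cvterm_id = ?',<br />
        undef, $cvterm_id,<br />
    );<br />
}<br />
<br />
# Update: each kind of change needs its own hand-written SQL.<br />
sub update_name {<br />
    my ($self, $cvterm_id, $name) = @_;<br />
    return $self->{dbh}->do(<br />
        'UPDATE cvterm SET name = ? WHERE cvterm_id = ?',<br />
        undef, $name, $cvterm_id,<br />
    );<br />
}<br />
<br />
# Delete: remove one row by primary key.<br />
sub remove {<br />
    my ($self, $cvterm_id) = @_;<br />
    return $self->{dbh}->do(<br />
        'DELETE FROM cvterm WHERE cvterm_id = ?', undef, $cvterm_id,<br />
    );<br />
}<br />
<br />
1;<br />
</perl><br />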
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to the get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
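<br />
The base class <code>Chado::DBI</code> used above is not shown in the presentation. A minimal sketch of what such a class could look like follows, assuming a PostgreSQL database and the Class::DBI::Pg module, which supplies <code>set_up_table</code>. The DSN, user name, and password are placeholders, and this is not necessarily how the GMOD distribution defines the class.<br />
<br />
<perl><br />
package Chado::DBI;<br />
use strict;<br />
use warnings;<br />
use base 'Class::DBI::Pg';<br />
<br />
# Placeholder connection details; replace with those of your Chado install.<br />
__PACKAGE__->connection('dbi:Pg:dbname=chado', 'chado_user', 'secret');<br />
<br />
1;<br />
</perl><br />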
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
my $term_dbobj = Chado::Cvterm->create({<br />
    name      => "DUMMY TERM",<br />
    cv_id     => 1,<br />
    dbxref_id => 125<br />
});<br />
<br />
## Retrieve<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update ($term is another object that supplies the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update();   # write the changes (needed unless autoupdate is on)<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: FlyBase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues during bulk operations, even when doing nothing more than simple print() statements. There are many caching parameters available in Hibernate, but the underlying problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching, and handles recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
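<br />
The cyclical relationships mentioned above are the reason a naive recursive walk over <code>feature_relationship</code> can loop forever. The following Perl sketch is an illustration only, not how XORT or Hibernate work internally: it walks a small made-up relationship table and remembers which features it has already visited. The feature names and the <code>reachable</code> function are hypothetical.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Toy stand-in for feature_relationship rows: feature => [related features].<br />
# 'geneA' and 'geneB' point at each other, which forms a cycle.<br />
my %related = (<br />
    geneA   => ['allele1', 'geneB'],<br />
    geneB   => ['geneA'],<br />
    allele1 => ['genotype1'],<br />
);<br />
<br />
# Iterative walk that records visited features, so a cycle is skipped<br />
# instead of being followed again and again.<br />
sub reachable {<br />
    my ($start) = @_;<br />
    my %seen;<br />
    my @stack = ($start);<br />
    while (@stack) {<br />
        my $id = pop @stack;<br />
        next if $seen{$id}++;<br />
        push @stack, @{ $related{$id} || [] };<br />
    }<br />
    return sort keys %seen;<br />
}<br />
<br />
print join(', ', reachable('geneA')), "\n";<br />
</perl><br />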
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure is not stored in memory; data is flushed out as it is processed<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add a constructor that finds features with long names<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
Modware seeks to provide an object-oriented Perl interface to Chado to allow for rapid application development without worrying about the complex details of the schema on a day-to-day basis.<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary, since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To rollback at any time, simply insert '''new Modware::DBH->rollback()''' into script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Features in Modware map to familiar Bioperl (Bio::Seq and Bio::SeqFeature) objects that can be accessed through a 'bioperl' method. Modware internally links Bio::SeqFeature objects to the proper Bio::Seq objects to provide maximum functionality from the BioPerl libraries.<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
* Currently Alpha release (hoping for user feedback to create Beta)<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance is slower because of its object-oriented nature<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T17:45:58Z<p>165.124.152.78: /* Modware */</p>
<hr />
<div>==Executive Summary==<br />
<br />
''The meeting participants examined two types of Object-Relational Mapping (ORM) tools.'' The Hibernate and iBatis tools will examine any relational schema and create middleware ''de novo'' whereas Modware and Chado::AutoDBI are hand-built to match GMOD's Chado relational schema.<br />
<br />
''A predictable API is desirable.'' The participants' view is that an API that is easily understood and reflects current models of biological data is the most desirable API. Currently the Modware and Chado::AutoDBI packages come closest to providing this comprehensibility.<br />
<br />
__TOC__<br />
<br clear='all'><br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19, 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked with making specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. It is also assumed that such a focus will lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each presenter was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that scientists may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates given are exact.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
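The coding sequence in this report is the concatenation of the exon subsequences of the mRNA. Exon 1 spans 77 bases (13691 to 13767) and exon 2 spans 34 bases (14687 to 14720), which matches the 111 bases shown and suggests that the coordinates are 1-based and inclusive. A minimal Java sketch of that step follows. The class and method names are hypothetical, only the forward strand (strand 1) is handled, and the code is not taken from any of the packages presented.<br />
<br />
```java
/** Hypothetical helper: builds a CDS from exon coordinates on a genomic sequence. */
class CdsSketch {

    /**
     * Concatenates exon subsequences in the order given.
     * Assumption: each exon is {start, end}, 1-based and inclusive, on the forward strand.
     */
    static String splice(String genomic, int[][] exons) {
        StringBuilder cds = new StringBuilder();
        for (int[] exon : exons) {
            // Java ranges are 0-based and end-exclusive, so only the start shifts by one.
            cds.append(genomic, exon[0] - 1, exon[1]);
        }
        return cds.toString();
    }

    /** Formats a sequence as FASTA with 60 residues per line, as in the report. */
    static String toFasta(String header, String sequence) {
        StringBuilder out = new StringBuilder(">").append(header).append('\n');
        for (int i = 0; i < sequence.length(); i += 60) {
            out.append(sequence, i, Math.min(i + 60, sequence.length())).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy data standing in for the fake chromosome and the xfile exons.
        String genomic = "AAATGGCCCTTTAAGG";
        int[][] exons = { {3, 6}, {10, 12} };
        System.out.print(toFasta("toy cds", splice(genomic, exons)));
    }
}
```
<br />
With a real middleware package the genomic sequence and exon coordinates would come from the <code>feature</code> and <code>featureloc</code> tables rather than from literals.<br />
<br />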
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters addressed the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ considerably in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]) the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software, and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. <br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore Chado::AutoDBI wraps the entire Chado schema where Modware objects only address those parts of the Chado schema that map to Bioperl objects. Neither of these packages requires configuration: they are pre-configured for Chado and all one needs to do is connect to an existing schema. This schema, in theory, could be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.); this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the two packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
Both Abator and iBatis share one limitation, which is that one cannot auto-generate higher level objects, such as Genes. In fairness one must add that this would not really be possible to do accurately for any system: there is not sufficient information present in the database schema to derive all of the relationships necessary. In both packages the auto-generation code only mimics the schema, and it would take additional work to get a package to automatically create and relate subsets of data based on "join tables" like feature_relationship property. This mostly has to do with the circular nature of the schema.<br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the 2 Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do this. It is probably considered the more flexible of the 2 with respect to language since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
Hibernate has the ability to read a database schema and generate Java code representing the database objects or tables. In this exercise the developer chose not to use this method because one must create a file telling the code auto-generation what to use for indexes, which fields should be complex objects, and so on. It was judged to be less work to just go ahead and write the code by hand.<br />
<br />
For each object mapped to the database, one can create an XML file that tells Hibernate which database fields to map to specific object properties. Additionally, one can implement equals() and hashCode() functions for each object based on the table unique constraints. After doing that, Hibernate takes care of all the Create-Retrieve-Update-Delete (CRUD) and transaction functionality.<br />
<br />
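Basing equals() and hashCode() on a table's unique constraint rather than on its generated primary key means that an object compares equal to itself before and after it is saved. The following is a minimal sketch of a hand-written class in that style, not code from the presentation. The class name is hypothetical, and it assumes that the unique key of the Chado <code>feature</code> table is (organism_id, uniquename, type_id); the identifier values in <code>main</code> are made up.<br />
<br />
```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical hand-written class for one row of the Chado feature table. */
class FeatureRow {
    private Integer featureId;          // generated primary key; null until the row is saved
    private final int organismId;       // assumed part of the unique constraint
    private final String uniquename;    // assumed part of the unique constraint
    private final int typeId;           // assumed part of the unique constraint

    FeatureRow(int organismId, String uniquename, int typeId) {
        this.organismId = organismId;
        this.uniquename = Objects.requireNonNull(uniquename);
        this.typeId = typeId;
    }

    Integer getFeatureId() { return featureId; }
    void setFeatureId(Integer featureId) { this.featureId = featureId; }

    // Equality ignores featureId, so saving an object does not change its identity.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof FeatureRow)) return false;
        FeatureRow that = (FeatureRow) other;
        return organismId == that.organismId
                && typeId == that.typeId
                && uniquename.equals(that.uniquename);
    }

    // Must use exactly the fields that equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(organismId, uniquename, typeId);
    }

    public static void main(String[] args) {
        FeatureRow unsaved = new FeatureRow(1, "xfile", 42);
        FeatureRow saved = new FeatureRow(1, "xfile", 42);
        saved.setFeatureId(1323);

        Set<FeatureRow> seen = new HashSet<>();
        seen.add(unsaved);
        seen.add(saved);
        System.out.println(seen.size());
    }
}
```
<br />
Because both objects share the same key columns, the set holds a single element even though only one of them has a <code>feature_id</code>.<br />
<br />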
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between iBatis and Hibernate were made.<br />
<br />
======Configuration======<br />
<br />
The following general steps are performed:<br />
<br />
# Create an XML file of tables the developer wants to create objects for <br />
#* Including specifying the method for retrieving auto-generated sequence values from inserts<br />
#* This file could also be easily auto-generated<br />
# Make some type adjustments for some variables <br />
#* For example, iBatis mapped some things to Java strings that should have been Integers<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration: it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses the [http://bioperl.org Bioperl] objects as accessors; this could either be highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration: it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation, written in POD for all methods, is available at http://gmod-ware.sourceforge.net/doc/.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
<br />
======Performance======<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration: it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is small and well documented: about a dozen methods and three classes, all documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), as defined by Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
<br />
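The shape of a hand-written DAO can be sketched in Java as follows. This is an illustration only: the class and interface names are hypothetical, the attributes follow the <code>cvterm</code> columns listed under Class::DBI in this presentation, and an in-memory map stands in for the database so that the example runs without a JDBC driver. In a real DAO each method would hold JDBC code and the corresponding SQL statement.<br />
<br />
```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** One attribute per column of the cvterm table. */
class CvtermBean {
    Integer cvtermId;    // assigned when the row is created
    int cvId;
    String name;
    String definition;
    int dbxrefId;
}

/** The CRUD contract that the application codes against. */
interface CvtermDao {
    int create(CvtermBean term);
    Optional<CvtermBean> retrieve(int cvtermId);
    boolean update(CvtermBean term);
    boolean delete(int cvtermId);
}

/** In-memory stand-in; a JDBC-backed DAO would run INSERT/SELECT/UPDATE/DELETE here. */
class InMemoryCvtermDao implements CvtermDao {
    private final Map<Integer, CvtermBean> rows = new LinkedHashMap<>();
    private int nextId = 1;

    @Override
    public int create(CvtermBean term) {
        term.cvtermId = nextId++;           // mimics a database sequence
        rows.put(term.cvtermId, term);
        return term.cvtermId;
    }

    @Override
    public Optional<CvtermBean> retrieve(int cvtermId) {
        return Optional.ofNullable(rows.get(cvtermId));
    }

    @Override
    public boolean update(CvtermBean term) {
        if (term.cvtermId == null || !rows.containsKey(term.cvtermId)) {
            return false;                   // nothing to update
        }
        rows.put(term.cvtermId, term);
        return true;
    }

    @Override
    public boolean delete(int cvtermId) {
        return rows.remove(cvtermId) != null;
    }
}
```
<br />
Every table needs a bean and a DAO of this kind, which is the manual coding that an ORM layer aims to remove.<br />
<br />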
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
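## Create: insert a new row and return the corresponding object<br />
my $term_dbobj = Chado::Cvterm->create({<br />
    name      => "DUMMY TERM",<br />
    cv_id     => 1,<br />
    dbxref_id => 125,<br />
});<br />
<br />
## Retrieve: fetch a row by its primary key<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update: copy values from $term, an application object defined elsewhere<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
## Write the pending changes to the database (needed unless autoupdate is enabled)<br />
$term_dbobj->update();<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />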
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building a schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: FlyBase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues in the course of bulk operations that did no more than execute simple print() statements. There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching, and with XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not held in memory; data is flushed out as it is processed<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Use add_constructor to define a search for long names<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
Modware seeks to provide an object-oriented Perl interface to Chado to allow for rapid application development without worrying about the complex details of the schema on a day-to-day basis.<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration necessary since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To roll back at any time, simply insert '''new Modware::DBH->rollback()''' into your script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Features in Modware map to familiar Bioperl (Bio::Seq and Bio::SeqFeature) objects that can be accessed through a 'bioperl' method. Modware internally links Bio::SeqFeature objects to the proper Bio::Seq objects to provide maximum functionality from the BioPerl libraries.<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
* Currently an Alpha release (hoping for user feedback to create a Beta)<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance is slower because of its object-oriented nature<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T17:40:10Z<p>165.124.152.78: /* Special topics */</p>
<hr />
<div><big>Executive Summary</big><br />
<br />
''The meeting participants examined two types of Object-Relational Mapping (ORM) tools.'' The Hibernate and iBatis tools will examine any relational schema and create middleware ''de novo'' whereas Modware and Chado::AutoDBI are hand-built to match GMOD's Chado relational schema.<br />
<br />
''A predictable API is desirable.'' The participants' view is that an API that is easily understood and reflects current models of biological data is the most desirable API. Currently the Modware and Chado::AutoDBI packages come closest to providing this comprehensibility.<br />
<br />
<br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. It's also assumed that such a focus will lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each presenter was asked to show how their package would address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that a scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems, and the demonstrations were not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
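<br />
The coordinates in this report can be checked against the sequence: read as 1-based, inclusive coordinates the two exons are 77 and 34 bases long, which adds up to the 111 bases of the CDS shown above. The following is a minimal sketch of that splicing step in plain Perl, independent of any of the middleware packages. The helper name <code>splice_exons</code> and the toy sequence are hypothetical, and only forward-strand (strand 1) exons are handled.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical helper: concatenate exon sequences cut from a genomic sequence.<br />
# Coordinates are assumed to be 1-based and inclusive, forward strand only.<br />
sub splice_exons {<br />
    my ($genomic, @exons) = @_;<br />
    my $cds = '';<br />
    # Process exons in genomic order, whatever order they were passed in<br />
    for my $exon (sort { $a->{start} <=> $b->{start} } @exons) {<br />
        my $length = $exon->{end} - $exon->{start} + 1;<br />
        # substr() is 0-based, so shift the 1-based start by one<br />
        $cds .= substr($genomic, $exon->{start} - 1, $length);<br />
    }<br />
    return $cds;<br />
}<br />
<br />
# Toy data, not the test chromosome used at the meeting<br />
my $genomic = 'AAACCCGGGTTT';<br />
my @exons   = ({ start => 7, end => 9 }, { start => 1, end => 3 });<br />
print splice_exons($genomic, @exons), "\n";   # prints AAAGGG<br />
</perl><br />
<br />
A middleware package would normally hide this step behind its own accessors; the sketch only makes the coordinate arithmetic explicit.<br />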
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the gene symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters addressed the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ considerably in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]) the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software, and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. <br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore, Chado::AutoDBI wraps the entire Chado schema whereas Modware objects address only those parts of the Chado schema that map to Bioperl objects. Neither of these packages requires configuration: they are pre-configured for Chado and all one needs to do is connect to an existing schema. This schema could, in theory, be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.); this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the two packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the two Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do this. It is probably the more flexible of the two with respect to language, since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between iBatis and Hibernate were made.<br />
<br />
======Configuration======<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses [http://bioperl.org Bioperl] objects as accessors; this could be either highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation at http://gmod-ware.sourceforge.net/doc/, written in POD for all methods.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
<br />
======Performance======<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration, it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well documented: it comprises about a dozen methods and three classes, all documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), from Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
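## Create: insert a new row and return the corresponding object<br />
my $term_dbobj = Chado::Cvterm->create({<br />
    name      => "DUMMY TERM",<br />
    cv_id     => 1,<br />
    dbxref_id => 125<br />
});<br />
<br />
## Retrieve: fetch a row by its primary key<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update: copy values from $term (another term object, defined elsewhere)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update();   # write the changes to the database (needed unless autoupdate is on)<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />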
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides a painless round trip of data between the application and the database.<br />
* Reduces the amount of SQL code and allows a programmatic style of interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues in the course of bulk operations, even when doing no more than simple print() statements. There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching, and with XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not held in memory; data is flushed out as processing proceeds<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
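# add_constructor builds a search method from an SQL WHERE fragment;<br />
# here, long_names() returns the rows whose name is longer than 15 characters<br />
<br />
__PACKAGE__->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# set_sql registers a custom SQL statement under the given name<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />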
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To roll back at any time, simply insert '''new Modware::DBH->rollback()''' into your script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Modware's API documentation is complete and easily browsed at [http://gmod-ware.sourceforge.net/doc/ http://gmod-ware.sourceforge.net/doc/]<br />
* Plenty of other documentation (quick starts with simple use cases) at [http://gmod-ware.sourceforge.net http://gmod-ware.sourceforge.net]<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T17:36:55Z<p>165.124.152.78: /* Technical Overview */</p>
<hr />
<div><big>Executive Summary</big><br />
<br />
''The meeting participants examined two types of Object-Relational Mapping (ORM) tools.'' The Hibernate and iBatis tools will examine any relational schema and create middleware ''de novo'' whereas Modware and Chado::AutoDBI are hand-built to match GMOD's Chado relational schema.<br />
<br />
''A predictable API is desirable.'' The participants' view is that an API that is easily understood and reflects current models of biological data is the most desirable API. Currently the Modware and Chado::AutoDBI packages come closest to providing this comprehensibility.<br />
<br />
<br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
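<br />
The relationship between the exon coordinates and the coding sequence in this report can be sketched in plain Perl. This is a minimal illustration only and is not taken from any of the middleware packages: the <code>splice_exons</code> helper and the toy genomic sequence are hypothetical, and the coordinates are assumed to be 1-based and inclusive on the forward strand, as in Problem 1.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical helper: join exon subsequences into one spliced sequence.<br />
# Exon coordinates are assumed to be 1-based, inclusive, forward strand.<br />
sub splice_exons {<br />
    my ( $genomic, @exons ) = @_;<br />
    my $spliced = '';<br />
    for my $exon ( sort { $a->{start} <=> $b->{start} } @exons ) {<br />
        my $length = $exon->{end} - $exon->{start} + 1;<br />
        # substr() is 0-based, so shift the 1-based start by one<br />
        $spliced .= substr( $genomic, $exon->{start} - 1, $length );<br />
    }<br />
    return $spliced;<br />
}<br />
<br />
# Toy data, not the Fake Chromosome sequence<br />
my $genomic = 'AAATGGCCCCCTAAGG';<br />
my @exons   = ( { start => 3, end => 6 }, { start => 12, end => 14 } );<br />
print ">toy cds\n", splice_exons( $genomic, @exons ), "\n";<br />
</perl><br />
<br />
With the xfile exons (13691-13767 and 14687-14720) the same arithmetic gives 77 + 34 = 111 bases, which matches the length of the coding sequence shown in the report.<br />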
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
=====Problem Results=====<br />
<br />
All presenters addressed the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ considerably in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one day meeting heard presentations from developers using both Perl and Java middleware and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]) the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools. <br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore Chado::AutoDBI wraps the entire Chado schema, whereas Modware objects only address those parts of the Chado schema that map to Bioperl objects. Neither of these packages requires configuration: they are pre-configured for Chado and all one needs to do is connect to an existing schema. This schema, in theory, could be running on any popular RDBMS (Postgres, MySQL, Oracle, etc.); this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the two packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, MySQL, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the two Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do this. It is probably the more flexible of the two with respect to language, since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between iBatis and Hibernate were made.<br />
<br />
======Configuration======<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses the [http://bioperl.org Bioperl] objects as accessors; this could be either highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation at http://gmod-ware.sourceforge.net/doc/, written in POD for all methods.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
<br />
======Performance======<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration; it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well-documented, about a dozen methods and three classes, all documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
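<br />
The hand-written DAO approach listed above can be sketched as follows. This is a minimal illustration, assuming an open DBI database handle connected to a database containing the Chado <code>cvterm</code> table; the <code>CvtermDAO</code> class and its method names are hypothetical and are not part of GMOD, Modware, or Class::DBI.<br />
<br />
<perl><br />
package CvtermDAO;<br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical DAO for the cvterm table; $dbh is an open DBI handle<br />
sub new {<br />
    my ( $class, $dbh ) = @_;<br />
    return bless { dbh => $dbh }, $class;<br />
}<br />
<br />
# Create: insert one row, passing the field values as bind parameters<br />
sub create {<br />
    my ( $self, %f ) = @_;<br />
    return $self->{dbh}->do(<br />
        'INSERT INTO cvterm (cv_id, name, definition, dbxref_id) VALUES (?, ?, ?, ?)',<br />
        undef, @f{qw(cv_id name definition dbxref_id)}<br />
    );<br />
}<br />
<br />
# Retrieve: fetch one row as a hash reference, keyed by column name<br />
sub retrieve {<br />
    my ( $self, $id ) = @_;<br />
    return $self->{dbh}->selectrow_hashref(<br />
        'SELECT * FROM cvterm WHERE cvterm_id = ?', undef, $id );<br />
}<br />
<br />
# Update: change the name of one row<br />
sub rename {<br />
    my ( $self, $id, $name ) = @_;<br />
    return $self->{dbh}->do(<br />
        'UPDATE cvterm SET name = ? WHERE cvterm_id = ?', undef, $name, $id );<br />
}<br />
<br />
# Delete: remove one row<br />
sub remove {<br />
    my ( $self, $id ) = @_;<br />
    return $self->{dbh}->do(<br />
        'DELETE FROM cvterm WHERE cvterm_id = ?', undef, $id );<br />
}<br />
<br />
1;<br />
</perl><br />
<br />
Every table needs a class of this kind, with its SQL maintained by hand; this is the manual coding that an ORM layer is meant to reduce.<br />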
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
my $term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve (by primary key)<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update ($term is assumed to be another term object holding the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update(); # write the changes to the database<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: FlyBase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate, but encountered performance issues in the course of bulk operations, even when doing no more than simple print() statements. There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching. With XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure not stored in memory, you flush out data as it goes<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add a constructor that finds rows whose name is longer than 15 characters<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install. <br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To roll back at any time, simply insert '''new Modware::DBH->rollback()''' into the script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-02-06T17:35:48Z<p>165.124.152.78: /* Technical Overview */</p>
<hr />
<div><big>Executive Summary</big><br />
<br />
''The meeting participants examined two types of Object-Relational Mapping (ORM) tools.'' The Hibernate and iBatis tools will examine any relational schema and create middleware ''de novo'' whereas Modware and Chado::AutoDBI are hand-built to match GMOD's Chado relational schema.<br />
<br />
''A predictable API is desirable.'' The participants' view is that an API that is easily understood and reflects current models of biological data is the most desirable API. Currently the Modware and Chado::AutoDBI packages come closest to providing this comprehensibility.<br />
<br />
<br />
==Middleware for Chado databases==<br />
<br />
===Participants===<br />
<br />
On January 19, 2007, representatives of the major model organism databases (MODs) and other interested parties met to discuss and compare Middleware packages used by developers working for the MODs. The workshop attendees were tasked to make specific recommendations. Such recommendations will focus the efforts of the MODs on specific packages and lead to code re-use and more feature-rich middleware.<br />
<br />
After an introductory presentation attendees listened to a series of presentations on the different Middleware packages. Each presenter described the features of the package, both positive and negative, and showed how the package would be used to address a specific set of test problems. The attendees reassembled in a general discussion group to discuss the presentations and to develop a consensus statement. This document represents the consensus of the workshop and shows the individual presentations.<br />
<br />
<br />
===Introduction===<br />
<br />
A group of some 50 GMOD developers gathered at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
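<br />
As a small illustration of the coordinate note above, the following sketch converts the listed exon ranges into the interbase <code>fmin</code>/<code>fmax</code> values that Chado stores in <code>featureloc</code>. It assumes that ''exact coordinates'' means 1-based, inclusive positions; <code>to_interbase</code> is a hypothetical helper and not part of any package discussed here.<br />
<br />
 <perl>
 use strict;
 use warnings;

 # Convert a 1-based, inclusive range into interbase coordinates.
 # (Assumption: the ranges in Problem 1 are 1-based and inclusive.)
 sub to_interbase {
     my ( $start, $end ) = @_;
     die "start must not exceed end\n" if $start > $end;
     return ( $start - 1, $end );    # fmin counts the bases before the feature
 }

 # Exons of the xfile mRNA, taken from the problem statement.
 my @exons = ( [ 13691, 13767 ], [ 14687, 14720 ] );

 for my $exon (@exons) {
     my ( $fmin, $fmax ) = to_interbase(@$exon);
     my $length = $fmax - $fmin;     # interbase length needs no "+ 1"
     print "fmin=$fmin fmax=$fmax length=$length\n";
 }
 </perl>
<br />
Under this assumption only the start value shifts by one, and the feature length is simply <code>fmax - fmin</code>.<br />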
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
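<br />
The expected report can be checked for internal consistency without a database. The sketch below, which uses only the values printed above, compares the summed exon lengths with the length of the coding sequence and tests for a start and a stop codon. It treats the exon ranges as inclusive, which is an assumption about the problem statement.<br />
<br />
 <perl>
 use strict;
 use warnings;

 # Exon ranges and coding sequence copied from the expected report.
 my @exons = ( [ 13691, 13767 ], [ 14687, 14720 ] );
 my $cds = join '',
   'ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT',
   'GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA';

 # Sum the inclusive exon lengths (end - start + 1).
 my $exon_total = 0;
 $exon_total += $_->[1] - $_->[0] + 1 for @exons;

 my %checks = (
     'length matches exons' => length($cds) == $exon_total,
     'whole codons'         => length($cds) % 3 == 0,
     'starts with ATG'      => $cds =~ /^ATG/ ? 1 : 0,
     'ends with stop codon' => $cds =~ /(?:TAA|TAG|TGA)$/ ? 1 : 0,
 );
 printf "%-22s %s\n", $_, ( $checks{$_} ? 'ok' : 'FAILED' ) for sort keys %checks;
 </perl>
<br />
A check of this kind is a cheap way to confirm that a middleware package assembled the exon structure as intended before the sequences are compared by eye.<br />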
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
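<br />
The search-then-delete flow of Problems 4 and 5 can be sketched against an in-memory hash instead of a database. The identifiers and organism are those of the sample listing; the fourth gene, <code>xylose</code>, is a hypothetical addition that shows the prefix is matched literally, and <code>search_by_prefix</code> is an illustrative helper, not the API of any package discussed here.<br />
<br />
 <perl>
 use strict;
 use warnings;

 # Stand-in for rows joined from the feature and organism tables.
 my %genes = (
     1323 => { symbol => 'x-file', organism => 'Xenopus laevis' },
     1324 => { symbol => 'x-men',  organism => 'Xenopus laevis' },
     1325 => { symbol => 'x-ray',  organism => 'Xenopus laevis' },
     1326 => { symbol => 'xylose', organism => 'Xenopus laevis' },    # hypothetical
 );

 # Return one report line per gene whose symbol starts with $prefix.
 sub search_by_prefix {
     my ($prefix) = @_;
     return map { sprintf '%d %s %s', $_, @{ $genes{$_} }{qw(symbol organism)} }
       grep { index( $genes{$_}{symbol}, $prefix ) == 0 }
       sort { $a <=> $b } keys %genes;
 }

 # Problem 4: list all genes whose symbol starts with "x-".
 print "$_\n" for search_by_prefix('x-');

 # Problem 5: delete by identifier, then repeat the search.
 delete $genes{1325};
 print "$_\n" for search_by_prefix('x-');
 </perl>
<br />
Against a real Chado database the same two steps would be a pattern match on the feature name followed by a delete keyed on <code>feature_id</code>.<br />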
<br />
=====Problem Results=====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ considerably in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
===The Middleware Packages===<br />
<br />
The one day meeting heard presentations from developers using both Perl and Java middleware and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. [http://biosql.org BioSQL]) the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software and less attention was given to performance. Each presenter focussed on their middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman, written before this meeting).<br />
<br />
The Chado::AutoDBI middleware package is built on Perl's Class::DBI, a module that can be considered either an ORM tool or a tool to build ORM tools. Modware is built on top of Chado::AutoDBI. Both Chado::AutoDBI and Modware are built specifically for the Chado schema and aren't general tools.<br />
<br />
A key difference is that Modware uses Bioperl as its programmatic interface whereas the Chado::AutoDBI API resembles that of Class::DBI. Furthermore Chado::AutoDBI wraps the entire Chado schema whereas Modware objects only address those parts of the Chado schema that map to Bioperl objects. Neither of these packages requires configuration: they are pre-configured for Chado and all one needs to do is connect to an existing schema. This schema, in theory, could be running on any popular RDBMS (Postgres, Mysql, Oracle, etc.) because this flexibility is built into Class::DBI.<br />
<br />
The Java packages, Hibernate and iBatis, are far more general than the 2 packages discussed above and one could use them with any relational schema. One distinguishing feature is the languages that these packages use: Hibernate tends to present more Java to the programmer whereas iBatis presents both Java and XML. Both allow you to address the schema with SQL should you choose to do so. Hibernate also introduces its own language, HQL, a Java-SQL hybrid. Both would need some configuration to connect to the Chado schema but this should not be considered a significant barrier. Both packages can be used with any popular RDBMS (Postgres, Mysql, Oracle, etc.).<br />
<br />
XORT is not a typical ORM tool like these other packages but has been included here because of its utility in bulk operations. This capability is one that the ORM tools are thought not to do very well as the serial construction and destruction of objects is typically not fast, and constructing very large numbers of objects simultaneously consumes quite a bit of memory. This is not to say that one shouldn't use an ORM tool for bulk operations but that one should test the tool in question and not assume its performance is adequate to the given task. Such a test may show an entirely different approach, like that of XORT, is more appropriate.<br />
<br />
====Java Middleware====<br />
<br />
The Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or a disadvantage. Both Hibernate and iBatis are designed to operate with any relational schema, not just Chado. <br />
<br />
=====Hibernate=====<br />
<br />
From the [http://www.hibernate.org Hibernate Web site]:<br />
<br />
''Hibernate lets you develop persistent classes following object-oriented idiom - including association, inheritance, polymorphism, composition, and collections. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL, or with an object-oriented Criteria and Example API. Unlike many other persistence solutions, Hibernate does not hide the power of SQL from you and guarantees that your investment in relational technology and knowledge is as valid as always.''<br />
<br />
Hibernate is thought to work best when used in conjunction with a schema that's been designed with objects in mind, an ''object-oriented schema''.<br />
<br />
======Abstraction======<br />
<br />
Hibernate is the more abstracted of the 2 Java packages: it allows you to work with the relational database with the least exposure to SQL, if you choose to do this. It is probably considered the more flexible of the 2 with respect to language since one can program in Java or HQL (Hibernate Query Language), a hybrid between SQL and Java.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between Hibernate and iBatis were made.<br />
<br />
======Configuration======<br />
<br />
<br />
======Documentation======<br />
<br />
Hibernate is a popular and well-supported tool with extensive documentation.<br />
<br />
=====iBatis=====<br />
<br />
From the [http://ibatis.apache.org iBatis Web site]:<br />
<br />
''The Data Mapper framework (a.k.a. SQL Maps) will help to significantly reduce the amount of Java and .NET code that is normally needed to access a relational database. This framework maps classes to SQL statements using a very simple XML descriptor. Simplicity is the biggest advantage of iBATIS over other frameworks and object relational mapping tools. To use iBATIS you need only be familiar with your own application domain objects (basic JavaBeans or .NET classes), XML, and SQL. There is very little else to learn. There is no complex scheme required to join tables or execute complex queries. Using iBATIS you have the full power of real SQL at your fingertips. The iBATIS Data Mapper framework can map nearly any database to any object model and is very tolerant of legacy designs, or even bad designs. This is all achieved without special database tables, peer objects or code generation.''<br />
<br />
======Abstraction======<br />
<br />
iBatis does not attempt to achieve abstraction in the way that other Java ORM tools do and assumes that viewing SQL is an advantage, not a disadvantage.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons between iBatis and Hibernate were made.<br />
<br />
======Configuration======<br />
<br />
======Documentation======<br />
<br />
iBatis is a popular and well-supported tool with extensive documentation.<br />
<br />
====Perl Middleware====<br />
<br />
The Perl approaches used only the Perl language (the Java packages all used Java plus XML, to some degree). The XORT application is not, strictly speaking, ''middleware'' but has proven to be very useful in bulk operations using the Chado schema and Chado XML though in principle it can be used with any relational schema.<br />
<br />
=====Chado::AutoDBI=====<br />
<br />
======Abstraction======<br />
<br />
Chado::AutoDBI objects map directly to the Chado tables so it could be said that Chado::AutoDBI is as abstract as Chado itself. Therefore one needs to become somewhat familiar with Chado itself in order to use Chado::AutoDBI.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases. On the other hand the presenters were reluctant to recommend their packages for ''bulk'' operations. <br />
<br />
======Configuration======<br />
<br />
Chado::AutoDBI requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Class::DBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
=====Modware=====<br />
<br />
======Abstraction======<br />
<br />
Modware has a higher level of abstraction than that provided by Chado::AutoDBI. The critical point is that it uses the [http://bioperl.org Bioperl] objects as accessors, which could be either highly suitable or not at all appropriate for a given environment.<br />
<br />
======Performance======<br />
<br />
No pairwise comparisons of performance were done using Perl middleware. All packages were deemed to give adequate performance when used to connect UIs to underlying databases.<br />
<br />
======Configuration======<br />
<br />
Modware requires no configuration; it is designed to interact with the Chado schema out-of-the-box. If the schema changes then the underlying Chado::AutoDBI should be able to adjust to the change.<br />
<br />
======Documentation======<br />
<br />
Bioperl-style documentation at http://gmod-ware.sourceforge.net/doc/, written in POD for all methods.<br />
<br />
=====GBrowse (DasI)=====<br />
<br />
======Abstraction======<br />
<br />
======Performance======<br />
<br />
======Configuration======<br />
<br />
This package needs no configuration; it is pre-configured for Chado.<br />
<br />
======Documentation======<br />
<br />
The DasI interface is well-documented, about a dozen methods and three classes, all documented.<br />
<br />
===Getting More Information===<br />
<br />
The issues around using and developing middleware are of general interest in GMOD. If you have questions about middleware we suggest that you contact the GMOD Development list rather than contacting individual developers. You can sign up for the list here:<br />
<br />
https://lists.sourceforge.net/lists/listinfo/gmod-devel<br />
<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
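<br />
The semantic gap can be made concrete with a minimal sketch that flattens one nested gene object into rows shaped like the Chado <code>feature</code>, <code>feature_relationship</code> and <code>featureloc</code> tables. It is an illustration only: the feature names are hypothetical, type names stand in for the <code>cvterm</code> identifiers that Chado actually stores, and the exon range is assumed to be 1-based and inclusive.<br />
<br />
 <perl>
 use strict;
 use warnings;

 # Object view: one nested structure for the x-men gene from Problem 1.
 my $gene = {
     name  => 'x-men',
     mrnas => [ { name => 'x-men-mRNA', exons => [ [ 12648, 13136 ] ] } ],
 };

 # Relational view: flat rows in three tables.
 my ( @feature, @featureloc, @feature_relationship );
 my $next_id = 1;

 # Add one feature row and return its generated identifier.
 sub add_feature {
     my ( $type, $name ) = @_;
     push @feature, { feature_id => $next_id, type => $type, name => $name };
     return $next_id++;
 }

 # Record that $subject_id is part of $object_id.
 sub add_part_of {
     my ( $subject_id, $object_id ) = @_;
     push @feature_relationship,
       { subject_id => $subject_id, object_id => $object_id, type => 'part_of' };
     return;
 }

 my $gene_id = add_feature( gene => $gene->{name} );
 for my $mrna ( @{ $gene->{mrnas} } ) {
     my $mrna_id = add_feature( mRNA => $mrna->{name} );
     add_part_of( $mrna_id, $gene_id );
     my $n = 0;
     for my $exon ( @{ $mrna->{exons} } ) {
         my $exon_id = add_feature( exon => $mrna->{name} . '-exon' . ++$n );
         add_part_of( $exon_id, $mrna_id );
         push @featureloc,
           { feature_id => $exon_id, fmin => $exon->[0] - 1, fmax => $exon->[1] };
     }
 }

 printf "%d feature rows, %d relationship rows, %d location rows\n",
   scalar @feature, scalar @feature_relationship, scalar @featureloc;
 </perl>
<br />
Even this one-exon gene spreads across three tables, and reassembling the object means walking the relationship rows back up. That bookkeeping is what an ORM layer is meant to hide.<br />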
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to the get/set methods<br />
* Can be used with transactions<br />
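<br />
How column names can become get/set methods is sketched below in plain Perl. This is not how Class::DBI is implemented and nothing here touches a database; <code>Sketch::Cvterm</code> is a hypothetical class that only illustrates generating one combined accessor per column from a column list.<br />
<br />
 <perl>
 use strict;
 use warnings;

 package Sketch::Cvterm;

 # Column names of the cvterm table as listed in this presentation.
 my @columns = qw(cvterm_id cv_id name definition dbxref_id);

 # Generate one combined get/set method per column.
 for my $column (@columns) {
     no strict 'refs';
     *{ __PACKAGE__ . "::$column" } = sub {
         my $self = shift;
         $self->{$column} = shift if @_;    # called with a value: set it
         return $self->{$column};           # always return the current value
     };
 }

 # Build an in-memory object from column/value pairs.
 sub new {
     my ( $class, %args ) = @_;
     return bless {%args}, $class;
 }

 package main;

 my $term = Sketch::Cvterm->new( name => 'DUMMY TERM', cv_id => 1 );
 $term->definition('A placeholder term');
 printf "%s (cv %d): %s\n", $term->name, $term->cv_id, $term->definition;
 </perl>
<br />
An ORM layer adds the missing half: each accessor is backed by a table column, so setting a value eventually results in an SQL <code>UPDATE</code>.<br />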
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
 ## Create<br />
 $term_dbobj = Chado::Cvterm->create({<br />
     name      => "DUMMY TERM",<br />
     cv_id     => 1,<br />
     dbxref_id => 125<br />
 });<br />
 ## Retrieve (by primary key)<br />
 $term_dbobj = Chado::Cvterm->retrieve(2);<br />
 <br />
 ## Update ($term is assumed to be an existing object holding the new values)<br />
 $term_dbobj->name( $term->name() );<br />
 $term_dbobj->definition( $term->definition );<br />
 $term_dbobj->update();   # write the changes back (needed unless autoupdate is on)<br />
 <br />
 ## Delete<br />
 $term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
Flybase tried Hibernate but encountered performance issues in the course of bulk operations, even when doing no more than simple print() statements. There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical.<br />
XORT does some simple, and automatic, caching. With XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure not stored in memory, you flush out data as it goes<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
[[XORT Presentation|XORT Presentation]]<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
 # Add a constructor that finds rows whose name is longer than 15 characters<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
[[Chado::AutoDBI Presentation|Chado::AutoDBI Presentation]]<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses the Chado::AutoDBI connection from GMOD. No connection configuration is necessary since Modware is built on top of GMOD and the connection is configured on GMOD install.<br />
* Transaction support: Transactions are fully supported. The database handle is available as a singleton through Modware::DBH. To roll back at any time, insert <pre>new Modware::DBH->rollback()</pre> into your script.<br />
* Code generation: No automatic code generation.<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
[[Modware Presentation|Modware Presentation]]<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects ''via'' Perl's DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
[[GBrowse (DasI) Presentation|GBrowse (DasI) Adaptor Presentation]]<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
[[iBatis Presentation|iBatis Presentation]]<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
[[Hibernate Presentation|Hibernate Presentation]]<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey, Tim Carver<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
[[PSU Presentation|PSU Presentation]]<br />
<br />
===Wiki Authors===<br />
<br />
* Jeff Bowes, XenBase<br />
* Robert Bruggner, VectorBase<br />
* Scott Cain, GMOD<br />
* Josh Goodman, FlyBase<br />
* Eric Just, DictyBase<br />
* Sohel Merchant, DictyBase<br />
* Brian O'Connor, UCLA<br />
* Brian Osborne, GMOD<br />
* Chinmay Patel, GeneDB<br />
* Pinglei Zhou, FlyBase</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-01-26T20:24:04Z<p>165.124.152.78: /* Why Modware Was Developed */</p>
<hr />
<div>==Middleware for Chado databases==<br />
<br />
===Authors===<br />
<br />
* Jeff Bowes<br />
* Robert Bruggner<br />
* Scott Cain<br />
* Josh Goodman<br />
* Eric Just<br />
* Sohel Merchant<br />
* Brian O'Connor<br />
* Brian Osborne<br />
* Chinmay Patel<br />
* Pinglei Zhou<br />
<br />
===Middleware Evaluation January 2007===<br />
<br />
A group of some 50 GMOD developers met at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
===Introduction===<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
===Conclusions===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. In every case the focus was a system that connects to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. BioSQL), the Chado schema is popular and, given its complexity, serves as a good test schema for this exercise. The talks concentrated on functionality from the perspective of writing code and extending the software; less attention was given to performance. Each presenter focused on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman).<br />
<br />
====Problem Assignments====<br />
<br />
All presenters addressed the assigned problems, and all packages could perform the required operations except for the GBrowse (DasI) Adaptor, which is read-only software. The packages clearly differ in how they solved the problems; please see the presentations themselves for these details.<br />
<br />
The Perl approaches used only the Perl language whereas the Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or disadvantage.<br />
<br />
====Java Middleware====<br />
<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
=====Abstraction=====<br />
<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
<br />
====Perl Middleware====<br />
<br />
<br />
=====Abstraction=====<br />
<br />
Modware provides a higher level of abstraction than Chado::AutoDBI.<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
The DasI interface is well documented: it consists of about a dozen methods and three classes, all of which are documented.<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* Real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* An RDBMS has fixed types, but object-oriented languages allow more complicated user-defined types<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), as defined by Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
<br />
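To make the DAO approach concrete, here is a minimal sketch of a hand-written DAO for the cvterm table. The class name <code>CvtermDAO</code> and its methods are hypothetical. So that the sketch runs without a database, each method only returns the SQL text and bind values rather than executing them.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical hand-written DAO: one attribute per column of cvterm.
package CvtermDAO;

my @COLUMNS = qw(cvterm_id cv_id name definition dbxref_id);

sub new {
    my ($class, %args) = @_;
    my %self = map { $_ => $args{$_} } @COLUMNS;
    return bless \%self, $class;
}

# The developer writes the SQL for every CRUD operation by hand.
sub insert_statement {
    my ($self) = @_;
    # The primary key is assumed to be assigned by the database.
    my @cols = grep { $_ ne 'cvterm_id' && defined $self->{$_} } @COLUMNS;
    my $sql  = sprintf 'INSERT INTO cvterm (%s) VALUES (%s)',
        join(', ', @cols), join(', ', ('?') x @cols);
    return ($sql, map { $self->{$_} } @cols);
}

sub delete_statement {
    my ($self) = @_;
    return ('DELETE FROM cvterm WHERE cvterm_id = ?', $self->{cvterm_id});
}

package main;

my $term = CvtermDAO->new(cv_id => 1, name => 'DUMMY TERM', dbxref_id => 125);
my ($sql, @bind) = $term->insert_statement;
print "$sql\n";
print join(', ', @bind), "\n";
```
<br />
In a real DAO the statement and bind values would be passed to DBI, for example with <code>$dbh->do($sql, undef, @bind)</code>. An ORM layer generates the equivalent of these methods from the table definition.<br />
<br />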
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
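The mapping of column names to get/set methods can be illustrated with a toy base class. This is a sketch of the general idea only, not Class::DBI's implementation. The <code>Toy::Record</code> and <code>Toy::Cvterm</code> packages are hypothetical and keep their values in memory instead of in a database.<br />
<br />
```perl
use strict;
use warnings;

# Toy base class (hypothetical, not Class::DBI itself): it generates
# one combined get/set method per declared column name.
package Toy::Record;

sub columns {
    my ($class, @cols) = @_;
    no strict 'refs';
    for my $col (@cols) {
        *{"${class}::$col"} = sub {
            my $self = shift;
            $self->{$col} = shift if @_;   # setter when a value is passed
            return $self->{$col};          # getter otherwise
        };
    }
    return;
}

sub new {
    my ($class, %values) = @_;
    my %self = %values;
    return bless \%self, $class;
}

# A "table class" only has to list its columns.
package Toy::Cvterm;
use parent -norequire, 'Toy::Record';
Toy::Cvterm->columns(qw(cvterm_id cv_id name definition dbxref_id));

package main;

my $term = Toy::Cvterm->new(cv_id => 1, name => 'DUMMY TERM');
$term->definition('A placeholder term');
printf "%s: %s\n", $term->name, $term->definition;
```
<br />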
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
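my $term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve (by primary key)<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update ($term is assumed to be an object holding the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update(); # write the changed columns to the database<br />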
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues during bulk operations, even when the code did nothing more than simple print() statements. There are many caching parameters available in Hibernate, but the underlying problem is that Chado is recursive or cyclical.<br />
XORT does some simple, automatic caching, and with XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** Every table must have an internal integer primary key<br />
** Every table must have unique key(s)<br />
* Retrieving certain types of data may require following a long path<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not held in memory; data is flushed out as processing proceeds<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
This Wiki section is an edited version of [[Media:XORT.pdf|Josh and Pinglei's presentation]].<br />
<br />
<br />
<br />
=====Introduction=====<br />
<br />
* An XML-database mapping system for data exchange between DB and XML-driven application<br />
* XORT can handle typical XML, it's not Chado-specific<br />
* Developed and supported by Pinglei Zhou at FlyBase Harvard; currently at version 0.007<br />
* Used at all FlyBase sites<br />
** Harvard has extensive library of Perl modules for generating ChadoXML<br />
* Written in Perl<br />
* Required perl modules:<br />
** XML::Parser::PerlSAX<br />
** Unicode::String<br />
** XML::DOM<br />
** DBI<br />
<br />
=====Chado XML=====<br />
<br />
* Is ChadoXML necessary? No, but it may help you.<br />
* ChadoXML assists with incremental updates, if you want to avoid flush-and-reload.<br />
* While updates can be achieved with other middleware (for example, Perl Class::DBI or Java Hibernate), ChadoXML additionally provides a way to archive your transactions.<br />
* It provides bulk update and download, which other methods either lack or handle inefficiently<br />
<br />
=====Components=====<br />
<br />
* Database & Schema<br />
* ChadoXML Specification<br />
* DumpSpec<br />
** DumpSpec files are simple XML files that tell XORT what to do<br />
** DumpSpec files are ''language independent'', being XML<br />
** It's fairly easy for those who know the schema to read these files and understand what the operation is<br />
<br />
=====Highlights of Chado XML Specification=====<br />
<br />
* Unique representation of a specific database schema<br />
* Does away with internal primary key values<br />
* Static vs. Operational<br />
* Encoding for non-ASCII characters<br />
* Macro mechanism (object reference)<br />
<br />
=====Putting it together: New FlyBase dataflow Part 1=====<br />
<br />
There are three Flybase sites, and most curation is done at Harvard and<br />
Cambridge. Proforma is the curation format at Cambridge and Harvard, but<br />
Harvard also curates with Apollo and ChadoXML.<br />
<br />
Once the data is in Chado (the reporting instance), there is a denormalization<br />
step in moving it to a read-only database. From the read-only database, XORT<br />
dumps ChadoXML for reporting purposes. XSLT version 2 is then used to transform<br />
the ChadoXML into HTML and GFF: HTML for human-readable reports, GFF for<br />
GBrowse and for various power users.<br />
<br />
1.a. Proforma (FlyBase Cambridge) is converted to ChadoXML<br />
<br />
1.b. ChadoXML is created by Apollo (Harvard)<br />
<br />
1.c. ChadoXML is created by Java SEAN (Harvard)<br />
<br />
2. All ChadoXML is loaded into Chado by XORT<br />
<br />
=====Putting it together: New FlyBase dataflow Part 2=====<br />
<br />
3. Chado (Harvard) is denormalized and loaded into Chado (Indiana)<br />
<br />
4. ChadoXML is created from Chado using XORT<br />
<br />
5.a. GFF and FASTA are created from ChadoXML<br />
<br />
5.b. HTML is created from ChadoXML<br />
<br />
=====Data & Report Generation=====<br />
<br />
* Content of all output files is controlled by XML dumpspecs.<br />
** Dumpspecs are language independent.<br />
** Easily readable (with knowledge of Chado structure).<br />
* All XML transformation steps are done with XSLT v2.<br />
** Saxon XSLT (http://saxon.sourceforge.net/)<br />
** ChadoXML is split into individual chunks before XSLT processing to accommodate large file sizes.<br />
** Extremely fast. We can process all data for ~60,000 Drosophila genes in under 30 minutes.<br />
<br />
=====Hibernate & XORT=====<br />
<br />
* Hibernate didn't scale well when dealing with 5,000+ features in bulk.<br />
** The test was simply calling <code>print()</code> statements<br />
* Performance tweaks for Hibernate can be quite complicated to setup for bulk operations.<br />
* XORT is currently handling ~6 million features in production with only minor performance problems.<br />
* XORT is much more language independent.<br />
<br />
=====Support for complex transactions using XORT=====<br />
<br />
For example:<br />
<br />
* Find all records linked to a record using dumpspec<br />
* Merge gene x into y, each with thousands of records attached<br />
<br />
Step 1. Dump all data using a simple dumpspec<br />
<xml><br />
<chado><br />
<feature dump="all"><br />
<uniquename test="eq">x</uniquename><br />
</feature><br />
</chado><br />
</xml><br />
Step 2. Delete feature x from the DB, with triggers to clean up orphan records if necessary<br />
<br />
Step 3. Edit the output XML, changing uniquename x to y, then load the edited file back into the DB<br />
<br />
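The edit in step 3 could be scripted rather than done by hand. The following is a rough sketch in plain Perl. The <code>$dump</code> string is a hypothetical, heavily abbreviated stand-in for the XML produced in step 1, and <code>rename_uniquename</code> is an illustrative helper, not part of XORT. For real dumps an XML parser would be a safer choice than a regular expression.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical, abbreviated stand-in for the XML dumped in step 1.
my $dump = <<'XML';
<chado>
  <feature>
    <uniquename>x</uniquename>
    <name>x</name>
  </feature>
</chado>
XML

# Rewrite only <uniquename> elements whose content is exactly $old.
# $new is assumed to need no XML escaping.
sub rename_uniquename {
    my ($xml, $old, $new) = @_;
    $xml =~ s{<uniquename>\Q$old\E</uniquename>}{<uniquename>$new</uniquename>}g;
    return $xml;
}

print rename_uniquename($dump, 'x', 'y');
```
<br />
Only the uniquename element changes; other elements that happen to contain the same text, such as the name, are left alone.<br />
<br />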
=====CHIA (Chado Interface Application)=====<br />
<br />
A Java application that organizes SQL and XORT functionality for internal users, e.g.:<br />
<br />
* Dump chado-XML for gene regions for Apollo curation<br />
* Organize and execute “canned” SQL queries<br />
* Serve IDs for curators (in development)<br />
* Browse Chado dynamically without writing SQL statements<br />
<br />
CHIA is being designed to be extensible for adding new functionality as needed.<br />
<br />
<br />
=====Documentation=====<br />
<br />
* ''Using Chado to Store Genome Annotation Data''<br />
** Current Protocols in Bioinformatics (Baxevanis, A.D., and Davison, D.B., eds) 2, 9.6.1-9.6.28.<br />
* XORT specification docs<br />
* XORT draft (unpublished)<br />
* GMOD case demo procedure<br />
** All in the doc directory of XORT package, http://www.gmod.org<br />
<br />
=====Acknowledgements=====<br />
<br />
* William Gelbart<br />
* Chris Mungall<br />
* David Emmert <br />
* Mark Gibson<br />
* Stan Letovsky <br />
* Nomi Harris<br />
* Frank Smutniak <br />
* Suzanna Lewis<br />
* Peili Zhang <br />
* Haiyan Zhang <br />
* Aubrey de Grey<br />
* Andy Schroeder <br />
* Don Gilbert<br />
* Susan Russo<br />
* Mark Zytkovicz<br />
* Scott Cain<br />
* Lincoln Stein<br />
* Victor Strelets<br />
* Robert Wilson<br />
* Paul Leyland<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this, but Chado::AutoDBI is not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Use add_constructor to define a search for features with long names<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
This Wiki section is an edited version of [[Media:AutoDBI.pdf|Brian's presentation]].<br />
<br />
=====Relation to Turnkey=====<br />
<br />
Turnkey is a package that auto-generates Web sites given a relational<br />
schema, based on SQL::Translator<br />
<br />
* Turnkey authors: Allen Day, Scott Cain, Brian O'Connor<br />
* Turnkey and Chado::AutoDBI objects are essentially the same<br />
<br />
=====Technical Overview=====<br />
<br />
* Code Generation<br />
<br />
=====Project Overview=====<br />
<br />
Convert SQL Queries/Inserts/Deletes -> Object Calls<br />
<sql><br />
INSERT INTO feature (organism_id, name)<br />
VALUES (1, 'foo');<br />
</sql><br />
To:<br />
<perl><br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Database connection: use a base class<br />
* Set up base object and connect, then create a ''table object'' to access primary key. <br />
* Class::DBI can find and insert records in other tables, based on foreign keys.<br />
<br />
<perl><br />
package Turnkey::Model::DBI;<br />
use base qw(Class::DBI::Pg);<br />
<br />
my ($dsn, $name, $pass);<br />
$dsn = "dbi:Pg:host=localhost;dbname=chado;port=5432";<br />
$name = "postgres";<br />
$pass = "";<br />
<br />
Turnkey::Model::DBI->set_db('Main', $dsn, $name, $pass, {AutoCommit => 1});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
<br />
<perl><br />
package Turnkey::Model::Feature;<br />
use base 'Turnkey::Model::DBI';<br />
<br />
Turnkey::Model::Feature->set_up_table('feature');<br />
<br />
#<br />
# Primary key accessors<br />
#<br />
<br />
sub id { shift->feature_id }<br />
sub feature { shift->feature_id }<br />
</perl><br />
<br />
* Data field accessors are provided by Class::Accessor<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** has_a<br />
<br />
<perl><br />
#<br />
# has_a<br />
#<br />
Turnkey::Model::Feature->has_a( type_id => "Turnkey::Model::Cvterm" );<br />
sub cvterm { return shift->type_id; }<br />
</perl><br />
<br />
* Basic ORM Object: Feature<br />
** has_many<br />
<br />
<perl><br />
#<br />
# has_many<br />
#<br />
Turnkey::Model::Feature->has_many('feature_synonym_feature_id', <br />
'Turnkey::Model::Feature_Synonym' => 'feature_id');<br />
sub feature_synonyms { return shift->feature_synonym_feature_id; }<br />
<br />
Turnkey::Model::Feature->has_many('featureprop_feature_id', <br />
'Turnkey::Model::Featureprop' => 'feature_id');<br />
sub featureprops { return shift->featureprop_feature_id; }<br />
</perl><br />
<br />
* Can traverse tables, such as going from FEATURE to FEATUREPROP <br />
** Declare on the ''table object'' that it has_a() or has_many() keys corresponding to a key in another ''table object''<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** skipping linker tables for has_many<br />
<br />
<perl><br />
# skip over feature_synonym table<br />
#<br />
# method 1<br />
#<br />
sub synonyms { my $self = shift; return map $_->synonym_id, $self->feature_synonyms; }<br />
#<br />
# method 2<br />
#<br />
Turnkey::Model::Feature->has_many( synonyms2 =><br />
['Turnkey::Model::Feature_Synonym' => 'synonym_id']);<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Transactions<br />
** Chado::AutoDBI supports transactions, and one can wrap the transaction in an eval()<br />
<perl><br />
sub do_transaction {<br />
my $class = shift;<br />
my ( $code ) = @_;<br />
# Turn off AutoCommit for this scope.<br />
# A commit will occur at the exit of this block automatically,<br />
# when the local AutoCommit goes out of scope.<br />
local $class->db_Main->{ AutoCommit };<br />
<br />
# Execute the required code inside the transaction.<br />
eval { $code->() };<br />
if ( $@ ) {<br />
my $commit_error = $@;<br />
eval { $class->dbi_rollback }; # might also die!<br />
die $commit_error;<br />
}<br />
}<br />
</perl><br />
<br />
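The eval-and-rollback pattern used by <code>do_transaction</code> can be tried without a database. The sketch below replaces the database handle with a hypothetical <code>Mock::Handle</code> that merely records which calls it received, and uses an explicit commit instead of the scoped AutoCommit trick. <code>run_in_transaction</code> is an illustrative name, not a Class::DBI or Chado::AutoDBI method.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical stand-in for a database handle; it only records calls.
package Mock::Handle;
sub new      { return bless { calls => [] }, shift }
sub commit   { push @{ $_[0]{calls} }, 'commit' }
sub rollback { push @{ $_[0]{calls} }, 'rollback' }
sub calls    { return @{ $_[0]{calls} } }

package main;

# Run $code; commit on success, roll back and re-throw on failure.
sub run_in_transaction {
    my ($handle, $code) = @_;
    my $ok = eval { $code->(); 1 };
    if (!$ok) {
        my $error = $@ || 'unknown error';
        eval { $handle->rollback };   # the rollback itself might die
        die $error;                   # report the original error
    }
    $handle->commit;
    return;
}

my $good = Mock::Handle->new;
run_in_transaction($good, sub { 1 });
print join(',', $good->calls), "\n";

my $bad = Mock::Handle->new;
eval { run_in_transaction($bad, sub { die "insert failed\n" }) };
print join(',', $bad->calls), " after error: $@";
```
<br />
The original error is saved before the rollback is attempted, so a failing rollback cannot overwrite the message that the caller sees.<br />
<br />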
=====Technical Overview=====<br />
<br />
* Lazy Loading<br />
** One can either do automated creation of objects or explicitly dictate which fields are incorporated into object<br />
<perl><br />
Turnkey::Model::Feature->columns( Primary => qw/feature_id/ );<br />
Turnkey::Model::Feature->columns( Essential => qw/name organism_id type_id/ );<br />
Turnkey::Model::Feature->columns( Others => qw/residues .../ );<br />
</perl><br />
<br />
Typically:<br />
<br />
<perl><br />
Turnkey::Model::Feature->set_up_table('feature');<br />
</perl><br />
<br />
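The effect of column groups on lazy loading can be sketched as follows. This is a toy model, not Class::DBI code: the <code>Lazy::Feature</code> package, its in-memory table, and the <code>$FETCHES</code> counter (which stands in for database round trips) are all hypothetical.<br />
<br />
```perl
use strict;
use warnings;

package Lazy::Feature;

# Hypothetical in-memory "table": feature_id => row.
my %TABLE = (
    7 => { feature_id => 7, name => 'xfile', organism_id => 1,
           type_id => 2, residues => 'ATGGCG' },
);

# Column groups, loosely modelled on the Essential/Others split above.
my %GROUP = (
    Essential => [qw(name organism_id type_id)],
    Others    => [qw(residues)],
);

our $FETCHES = 0;   # counts simulated database round trips

# Retrieving an object loads nothing except the primary key.
sub retrieve {
    my ($class, $id) = @_;
    return bless { feature_id => $id }, $class;
}

# The first read of a column loads and caches its whole group.
sub get {
    my ($self, $col) = @_;
    if (!exists $self->{$col}) {
        for my $cols (values %GROUP) {
            next unless grep { $_ eq $col } @$cols;
            $FETCHES++;
            my $row = $TABLE{ $self->{feature_id} };
            @{$self}{@$cols} = @{$row}{@$cols};
        }
    }
    return $self->{$col};
}

package main;

my $f = Lazy::Feature->retrieve(7);
print $f->get('name'), "\n";       # loads the Essential group
print $f->get('type_id'), "\n";    # served from the cache
print $f->get('residues'), "\n";   # loads the Others group
print "fetches: $Lazy::Feature::FETCHES\n";
```
<br />
Three reads cost two simulated fetches, because <code>type_id</code> arrives together with <code>name</code>. The potentially large <code>residues</code> column is only loaded when it is actually requested.<br />
<br />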
=====Problem 1=====<br />
<br />
* Create Feature & Add Description<br />
<br />
<perl><br />
# now create mRNA feature<br />
<br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
<br />
# create description<br />
<br />
my $featureprop = Turnkey::Model::Featureprop->find_or_create({<br />
value => 'A test gene for GMOD meeting',<br />
feature_id => $feature,<br />
type_id => $note_cvterm,<br />
});<br />
</perl><br />
<br />
=====Problem 2=====<br />
<br />
* Retrieve a Feature via Searching<br />
** Search using strings or identifiers; in scalar context a search returns an iterator object, in list context a list of objects<br />
<br />
<perl><br />
# objects for global use<br />
<br />
# the organism for our new feature<br />
my $organism = Turnkey::Model::Organism->search(abbreviation => "S.cerevisiae")->next;<br />
<br />
# the cvterm for a "Note"<br />
my $note_cvterm = Turnkey::Model::Cvterm->retrieve(2);<br />
<br />
# searching name by wildcard<br />
<br />
my @results = Turnkey::Model::Feature->search_like(name => 'x-%');<br />
</perl><br />
<br />
=====Problems 3, 4, & 5=====<br />
<br />
* Update a Feature<br />
<br />
<perl><br />
# update the xfile gene name<br />
<br />
$feature->name("x-file");<br />
$feature->update();<br />
</perl><br />
<br />
* Delete a Feature<br />
<br />
<perl><br />
# now delete the x-file feature<br />
<br />
$feature->delete();<br />
</perl><br />
<br />
=====Things Chado::AutoDBI does well=====<br />
<br />
* Easy to use<br />
* Easy to port<br />
* Use with other DBs<br />
** Both Oracle and Postgres used currently<br />
* Autogenerated via Turnkey<br />
* find_or_create method<br />
* Performance is not as bad as you might guess<br />
** Due to Lazy loading<br />
** Even whole genome operations are feasible<br />
<br />
Note that speed is relative: hand-written SQL that is poorly chosen can perform badly, and in such cases the Chado::AutoDBI approach will be speedier.<br />
<br />
<br />
=====For More Information=====<br />
<br />
* Class::DBI<br />
** http://www.class-dbi.com<br />
** http://search.cpan.org<br />
<br />
* Turnkey<br />
** http://turnkey.sf.net<br />
<br />
* Biopackages<br />
** http://biopackages.net<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses Chado::AutoDBI to connect. Connection is configured on GMOD install.<br />
* Transaction support: See Chado::AutoDBI talk.<br />
* Code generation: No automatic code generation<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Modware uses this code, but strips out all non-standard GMOD code<br />
* Provides nice interface over stock GMOD installation<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods such as name(), primary_id(), external_ids() <br />
** Abstract factory for various feature types<br />
* Lazy: information is only retrieved when you ask for it, but cached for speedy retrieval the next time it is required<br />
* Uses BioPerl and its objects<br />
** Each feature subclass has a bioperl() method that returns an appropriate BioPerl object.<br />
** BioPerl object manipulation is used to update feature coordinates<br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene which isn't the same as ...<br />
* Any feature type not explicitly supported in the Modware::Feature class is blessed as a Modware::Feature::GENERIC object, which has a start/stop coordinate on a genomic sequence feature (no structure like a transcript with exons)<br />
<br />
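The factory behaviour described above, including the fallback to a generic class, can be sketched in a few lines. The <code>Toy::Feature</code> classes below are hypothetical and only illustrate the pattern; Modware's real classes, constructor arguments and internals may differ.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical parent class that doubles as a factory.
package Toy::Feature;

my %SUBCLASS = (
    gene       => 'Toy::Feature::GENE',
    mrna       => 'Toy::Feature::MRNA',
    chromosome => 'Toy::Feature::CHROMOSOME',
);

# The requested type decides the class of the returned object;
# unsupported types fall back to a generic class.
sub new {
    my ($class, %args) = @_;
    my $type   = lc($args{-type} // '');
    my $target = $SUBCLASS{$type} // 'Toy::Feature::GENERIC';
    my %self   = %args;
    return bless \%self, $target;
}

# Common method shared by every feature type.
sub name { return $_[0]{-name} }

package Toy::Feature::GENE;
use parent -norequire, 'Toy::Feature';

package Toy::Feature::MRNA;
use parent -norequire, 'Toy::Feature';

package Toy::Feature::CHROMOSOME;
use parent -norequire, 'Toy::Feature';

package Toy::Feature::GENERIC;
use parent -norequire, 'Toy::Feature';

package main;

for my $type (qw(gene mRNA tRNA)) {
    my $feature = Toy::Feature->new(-type => $type, -name => 'xfile');
    printf "%-5s -> %s\n", $type, ref $feature;
}
```
<br />
Callers always go through the parent constructor, so support for a new feature type can be added by registering one more subclass without changing application code.<br />
<br />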
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
my $seq_io = new Bio::SeqIO(<br />
-file => "../data/fake_chromosome.txt",<br />
-format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
-type => 'chromosome',<br />
-bioperl => $seq,<br />
-description => "This is a test",<br />
-name => 'Fake',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
-type => 'gene',<br />
-name => 'xfile',<br />
-description => 'A test gene for GMOD meeting',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1:<br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2:<br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 13691,<br />
-end => 13767,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 14687,<br />
-end => 14720,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
-type => 'mRNA',<br />
-bioperl => $bioperl_mrna,<br />
-source => 'GMOD 2007 Demo',<br />
-reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report. This package<br />
uses Modware and BioPerl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Gene is-a Feature.<br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass the BioPerl class? Containing the BioPerl object, as shown here, gives more flexibility<br />
<br />
<perl><br />
package GMODWriter; <br />
sub Write_gene_report {<br />
my ($self, $gene) = @_;<br />
my $symbol = $gene->name();<br />
<br />
my @synonyms = @{ $gene->synonyms() };<br />
my $syn_string = join ", ", @synonyms;<br />
my $description = $gene->description();<br />
my $type = $gene->type();<br />
# get features associated with the gene that are of type 'mRNA'<br />
my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
# use bioperl method to get exons from mRNA<br />
my @exons = $mrna->bioperl->exons_ordered();<br />
# Modware will return a nice fasta file for you.<br />
my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
# Now print the actual report<br />
print "symbol: $symbol\n";<br />
print "synonyms: $syn_string\n";<br />
print "description: $description\n";<br />
print "type: $type\n";<br />
<br />
my $count = 0;<br />
foreach my $exon (@exons ) {<br />
$count++;<br />
print "exon${count} start: ".$exon->start()."\n";<br />
<br />
print "exon${count} end: ".$exon->end()."\n";<br />
<br />
}<br />
print "$fasta";<br />
}<br />
. . .<br />
</perl><br />
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the symbol to x-file, retrieve the changed record, and regenerate the gene report.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval{<br />
<br />
# get xfile gene<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
# change the name<br />
$xfile_gene->name( 'x-file' );<br />
# write changes to database<br />
$xfile_gene->update();<br />
<br />
# we can use the original object if we want, but instead<br />
# we refetch from the database to 'prove' the name has been changed<br />
my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
# use our GMODWriter package to write report for x-file<br />
GMODWriter->Write_gene_report( $xfile_gene2 );<br />
<br />
};<br />
if ($@){<br />
warn $@;<br />
new Modware::DBH->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-" (search pattern "x-*"). From the results, produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::Search::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
<perl><br />
sub Write_search_results {<br />
my ($self, $itr) = @_;<br />
# loop through iterator<br />
while (my $gene = $itr->next) {<br />
# print the requested information<br />
print $gene->feature_id . "\t" . $gene->name .<br />
"\t" . $gene->organism_name . "\n";<br />
}<br />
}<br />
</perl><br />
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
# set is_deleted = 1; this also sets is_available to 0,<br />
# so the gene is hidden from searches.<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* Sample scripts demoed here are available<br />
** sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI will automatically adjust to the change, depends on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Specify the type of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O’Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects via Perl DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
This Wiki section is an edited version of [[Media:DasI_middleware.pdf|Scott's presentation]].<br />
<br />
=====Create the database=====<br />
<br />
$ perl Makefile.PL<br />
$ make<br />
$ sudo make install<br />
$ make load_schema<br />
$ make prepdb # now with Xenopus!<br />
$ make ontologies # load rel, SO, featureprop<br />
<br />
=====Problem 1 - Loading Data=====<br />
<br />
Create some GFF from the specifications:<br />
<br />
fake_chromosome example chromosome 1 15017 . . . ID=fake_chromosome;Name=fake_chromosome<br />
fake_chromosome example gene 13691 14720 . + . ID=xfile;Name=xfile;Alias=mulder,scully;Note=A test gene for GMOD meeting<br />
fake_chromosome example mRNA 13691 14720 . + . ID=xfile_mRNA;Parent=xfile<br />
fake_chromosome example exon 13691 13767 . + . Parent=xfile_mRNA<br />
fake_chromosome example exon 14687 14720 . + . Parent=xfile_mRNA<br />
fake_chromosome example gene 12648 13136 . + . ID=x-men<br />
<br />
Gene inserted as GFF using a standard Bioperl bulk loader:<br />
<br />
<code>$ gmod_bulk_load_gff3.pl -g sample.gff</code><br />
<br />
''...lots of output...''<br />
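<br />
For readers who want to see what the loader has to pull out of each line, here is a minimal sketch in Java (the language of the later sections of this page). It assumes standard GFF3 layout: nine tab-separated columns with <code>key=value</code> pairs separated by semicolons in the last one. The class name <code>Gff3LineSketch</code> is hypothetical and is not part of the bulk loader.<br />
<br />
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of gmod_bulk_load_gff3.pl:
// parses one GFF3 feature line into its coordinates and attributes.
class Gff3LineSketch {
    final String seqid;
    final String type;
    final int start;
    final int end;
    final Map<String, String> attributes = new LinkedHashMap<>();

    Gff3LineSketch(String line) {
        // GFF3 columns are tab-separated: seqid, source, type, start, end,
        // score, strand, phase, attributes
        String[] col = line.split("\t");
        if (col.length != 9) {
            throw new IllegalArgumentException("expected 9 tab-separated columns");
        }
        seqid = col[0];
        type = col[2];
        start = Integer.parseInt(col[3]);
        end = Integer.parseInt(col[4]);
        // Column 9 holds key=value pairs separated by semicolons
        for (String pair : col[8].split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                attributes.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
    }
}
```
<br />
Multi-valued attributes such as <code>Alias=mulder,scully</code> are kept as a single string here; splitting on commas is left to the caller.<br />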
<br />
=====Adaptor Components=====<br />
<br />
* Bio::DB::Das::Chado<br />
** Database connection object<br />
* Bio::DB::Das::Chado::Segment<br />
** Object for any range of DNA<br />
* Bio::DB::Das::Chado::Segment::Feature<br />
<br />
=====Use Bio::DB::Das::Chado=====<br />
<br />
<perl><br />
use Bio::DB::Das::Chado;<br />
<br />
my $chado = Bio::DB::Das::Chado->new(<br />
-dsn => "dbi:Pg:dbname=test",<br />
-user=> "scott",<br />
-pass=> "" ) || die "no new chado";<br />
<br />
my $gene_name = 'xfile';<br />
<br />
my ($gene_fo) = $chado->get_features_by_name($gene_name);<br />
</perl><br />
<br />
=====Problem 2 - Use Some Accessors=====<br />
<br />
<perl><br />
print "symbol: " . $gene_fo->display_name."\n";<br />
print "synonyms: " . join(', ',$gene_fo->synonyms)."\n";<br />
print "description: " . $gene_fo->notes."\n";<br />
print "type: " . $gene_fo->type."\n";<br />
<br />
my ($mRNA) = $gene_fo->sub_SeqFeature();<br />
my @exons = $mRNA->sub_SeqFeature();<br />
<br />
my $exon_count = 0;<br />
my $cds_seq = '';<br />
<br />
for my $exon (@exons) {<br />
next unless ($exon->type->method eq 'exon');<br />
$exon_count++;<br />
print "exon$exon_count start: " . $exon->start."\n";<br />
print "exon$exon_count end: " . $exon->end. "\n";<br />
$cds_seq .= $exon->seq->seq; # the first seq call returns a Bio::Seq object, the second gets the DNA string from Bio::Seq<br />
} <br />
</perl><br />
<br />
=====Bulk Output=====<br />
<br />
<perl><br />
my $gene_name = 'x-*';<br />
<br />
my @genes = $chado->get_features_by_name(<br />
-name => $gene_name,<br />
-class=> 'gene' );<br />
<br />
for my $gene (@genes) {<br />
print join("\t",<br />
$gene->feature_id,<br />
$gene->display_name,<br />
$gene->organism),"\n";<br />
}<br />
</perl><br />
<br />
Or see your report in GBrowse<br />
<br />
=====Advantages=====<br />
<br />
* Comes 'for free' with GBrowse<br />
** GBrowse will run with any DasI-compatible interface<br />
* Uses 'familiar' BioPerl idioms, very similar to widely used Bio::DB::GFF (though with fewer methods)<br />
<br />
<br />
=====Conclusion=====<br />
<br />
* Not suitable as a 'general' middleware layer<br />
** May be suitable for some applications, particularly if they are similar to GBrowse or other uses of Bio::DB::GFF<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
Jeff Bowes, Xenbase, University of Calgary. This Wiki section is an edited version of [[Media:iBatis.pdf|Jeff's presentation]].<br />
<br />
=====iBatis=====<br />
<br />
* iBatis<br />
** Light-weight framework<br />
** Still based on SQL but eliminates the repetitive drudgery of JDBC<br />
** You can tune a query by re-writing the SQL in XML & the API does not change.<br />
* iBatis does not create your database in memory as objects<br />
* Shallow learning curve<br />
* Manually create a Java class and SQL map to describe higher-level objects<br />
** Example: ''Gene''<br />
* Support for inheritance<br />
** Inheritance in result maps allows a fair amount of re-use. <br />
* Supports different transaction schemes<br />
** For example, JDBC, Java Transaction API<br />
<br />
=====Abator=====<br />
<br />
* Generates iBatis CRUD objects by introspecting database tables<br />
* Abator creates ''SQL in XML'' files (SQL Map files) and Java classes <br />
** Within these files is a Result Map section.<br />
* Abator config files are simple: they set the connection parameters and say where the generated files go.<br />
* In the SQL Map files you can specify how to find parent ids, such as feature_id. <br />
<br />
=====Abator Example=====<br />
<xml><br />
<abatorConfiguration><br />
<abatorContext> <!-- TODO: Add Database Connection Information --><br />
<jdbcConnection driverClass="COM.ibm.db2.jdbc.app.DB2Driver"<br />
connectionURL="jdbc:db2:XBDV05"<br />
userId="db2inst1"<br />
password="*******"><br />
<classPathEntry location="/Program Files/IBM/SQLLIB/java/db2java.zip" /><br />
</jdbcConnection><br />
<br />
<javaModelGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.model"<br />
targetProject="gene" /><br />
<sqlMapGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.sql"<br />
targetProject="gene" /><br />
<daoGenerator type="IBATIS"<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.dao"<br />
targetProject="gene" /><br />
</abatorContext><br />
</abatorConfiguration><br />
</xml><br />
<br />
=====Abator Example=====<br />
<nowiki><br />
<table schema="db2inst1" tableName="synonym"></nowiki><br />
<generatedKey column="synonym_id" sqlStatement="VALUES PREVVAL FOR<br />
synonym_seq" identity="true" /><br />
<columnOverride column="CREATED_BY" jdbcType="INTEGER" /><br />
<columnOverride column="MODIFIED_BY" jdbcType="INTEGER" /><br />
<nowiki></table></nowiki><br />
<br />
=====Abator=====<br />
<br />
Works as:<br />
<br />
* Eclipse plug-in<br />
* ANT<br />
* Standalone<br />
<br />
=====DAO Methods=====<br />
<br />
* Insert (Feature)<br />
* Update (Feature)<br />
* DeletebyKey (FeatureKey)<br />
* SelectbyKey (FeatureKey)<br />
* SelectbyExample (FeatureExample)<br />
* DeletebyExample (FeatureExample)<br />
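<br />
To make the shape of these methods concrete, the following is a minimal in-memory sketch. It is not Abator output: the class <code>FeatureDaoSketch</code>, its simplified <code>Feature</code> row, and the use of a <code>Predicate</code> in place of a generated Example class are all assumptions made for illustration.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a generated feature DAO.
// It only mirrors the method list above; no database is involved.
class FeatureDaoSketch {

    // Simplified stand-in for one row of the feature table
    static final class Feature {
        Integer featureId;
        String name;
        Feature(String name) { this.name = name; }
    }

    private final Map<Integer, Feature> rows = new LinkedHashMap<>();
    private int nextId = 1; // plays the role of the database sequence

    // Insert: stores the row and returns the generated key
    Integer insert(Feature f) {
        f.featureId = nextId++;
        rows.put(f.featureId, f);
        return f.featureId;
    }

    // Update: replaces the stored row, returns the number of rows changed
    int update(Feature f) {
        if (f.featureId == null || !rows.containsKey(f.featureId)) return 0;
        rows.put(f.featureId, f);
        return 1;
    }

    Feature selectByKey(Integer key) { return rows.get(key); }

    int deleteByKey(Integer key) { return rows.remove(key) == null ? 0 : 1; }

    // "By example": the predicate stands in for a generated Example object
    List<Feature> selectByExample(Predicate<Feature> example) {
        List<Feature> out = new ArrayList<>();
        for (Feature f : rows.values()) {
            if (example.test(f)) out.add(f);
        }
        return out;
    }

    int deleteByExample(Predicate<Feature> example) {
        int before = rows.size();
        rows.values().removeIf(example);
        return before - rows.size();
    }
}
```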
<br />
=====Insert=====<br />
<xml><br />
<insert id="abatorgenerated_insert" parameterClass=<br />
"org.gmod.architecture.framwork.bakeoff.abator.model.FeatureWithBLOBs"><br />
insert into db2inst1.feature<br />
(DBXREF_ID, ORGANISM_ID, NAME, UNIQUENAME,<br />
RESIDUES, SEQLEN, MD5CHECKSUM, TYPE_ID, IS_ANALYSIS,<br />
IS_OBSOLETE, CREATED_BY)<br />
values (#dbxrefId:INTEGER#, #organismId:INTEGER#, #name:VARCHAR#,<br />
#uniquename:VARCHAR#, #residues:CLOB#, #seqlen:INTEGER#,<br />
#md5checksum:CHAR#, #typeId:INTEGER#,<br />
#isAnalysis:SMALLINT#, #isObsolete:SMALLINT#,<br />
#createdBy:INTEGER#)<br />
<br />
<selectKey resultClass="java.lang.Integer" keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</insert><br />
</xml><br />
<br />
=====Insert=====<br />
<xml><br />
<selectKey resultClass="java.lang.Integer"<br />
keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</xml><br />
<br />
=====Problem 1 - Insert=====<br />
<br />
<java><br />
try {<br />
sqlMap.startTransaction();<br />
pGene.id =featureDAO.insert(pGene.getFeatureWithBLOBs());<br />
featurepropDAO.insert(pGene.getPropertyDescription());<br />
pGene.featurelocId = featurelocDAO.insert(pGene<br />
.getFeaturelocWithBLOBS());<br />
pGene = insertExons(pGene);<br />
insertSynonyms(pGene);<br />
sqlMap.commitTransaction();<br />
} catch (Exception e) {<br />
System.out.println(e);<br />
throw (e);<br />
} finally {<br />
sqlMap.endTransaction();<br />
}<br />
</java><br />
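<br />
The start / commit / end pattern above relies on <code>endTransaction()</code> rolling back any transaction that was never committed. The sketch below imitates that control flow with a hypothetical <code>TransactionSketch</code> class that only records what happens. It is not the iBatis API, and unlike the example above it swallows the exception so that the outcome can be returned.<br />
<br />
```java
// Hypothetical stand-in for the three transaction calls used above;
// it records the sequence of events instead of talking to a database.
class TransactionSketch {
    private boolean active;
    private boolean committed;
    final StringBuilder log = new StringBuilder();

    void startTransaction() {
        active = true;
        committed = false;
        log.append("start;");
    }

    void commitTransaction() {
        committed = true;
        log.append("commit;");
    }

    // Ending a transaction that was never committed rolls it back
    void endTransaction() {
        if (active && !committed) log.append("rollback;");
        active = false;
        log.append("end;");
    }

    // Runs the work inside start/commit/end; returns true if it committed
    boolean run(Runnable work) {
        try {
            startTransaction();
            work.run();
            commitTransaction();
            return true;
        } catch (RuntimeException e) {
            return false; // the example above rethrows instead
        } finally {
            endTransaction(); // always runs, commit or not
        }
    }
}
```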
<br />
=====Transactions=====<br />
<br />
* SQLMap<br />
* JDBC<br />
* JTA - Java Transaction API<br />
** 2-Phase commit<br />
* Hibernate<br />
* External (Customized)<br />
<br />
=====Retrieval=====<br />
<br />
symbol: xfile<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: start: 13691 end: 13767 <br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: start: 14687 end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2 - Master Detail Reports=====<br />
<br />
Account for cycles or recursion in Master Detail Report. <br />
<br />
<xml><br />
<resultMap id="SelectGeneResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Gene" groupBy="id"><br />
<result column="FEATURE_ID" property="id" jdbcType="INTEGER"/><br />
<result column="GENE_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="DESCRIPTION" property="description"<br />
jdbcType="VARCHAR" /><br />
<result column="TYPE_ID" property="typeId" jdbcType="INTEGER" /><br />
<result property="exons" resultMap = "gene.SelectExonResults"/><br />
</resultMap><br />
<br />
<resultMap id="SelectExonResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Exon"><br />
<result column="EXON_ID" property="id" jdbcType="INTEGER"/><br />
<result column="EXON_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="EXON_RESIDUES" property="residues" jdbcType="CLOB" /><br />
<result column="STRAND" property="strand" jdbcType="INTEGER" /><br />
<result column="FMIN" property="fmin" jdbcType="INTEGER" /><br />
<result column="FMAX" property="fmax" jdbcType="INTEGER" /><br />
<result column="SRCFEATURE_ID" property="sourceFeatureId"<br />
jdbcType="INTEGER" /><br />
</resultMap><br />
</xml><br />
<br />
=====Master Detail Report=====<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
6129482 x-files gene 14687 14720<br />
<br />
Becomes:<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
14687 14720<br />
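<br />
The collapse shown above is what <code>groupBy="id"</code> in the result map asks for: rows that share a gene id become one gene object with a list of exons. A minimal sketch of that grouping in plain Java follows. The class names are hypothetical and the row layout (gene_id, symbol, fmin, fmax) is an assumption taken from the table above, not iBatis internals.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of master-detail grouping; not iBatis code.
class MasterDetailSketch {

    static final class Exon {
        final int fmin;
        final int fmax;
        Exon(int fmin, int fmax) { this.fmin = fmin; this.fmax = fmax; }
    }

    static final class Gene {
        final int id;
        final String symbol;
        final List<Exon> exons = new ArrayList<>();
        Gene(int id, String symbol) { this.id = id; this.symbol = symbol; }
    }

    // Each row is {gene_id, symbol, fmin, fmax}, one row per exon
    static List<Gene> group(List<Object[]> rows) {
        Map<Integer, Gene> byId = new LinkedHashMap<>();
        for (Object[] r : rows) {
            int id = (Integer) r[0];
            // Create the master object the first time its id is seen
            Gene g = byId.computeIfAbsent(id, k -> new Gene(k, (String) r[1]));
            // Every row contributes one detail object
            g.exons.add(new Exon((Integer) r[2], (Integer) r[3]));
        }
        return new ArrayList<>(byId.values());
    }
}
```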
<br />
=====Dynamic Queries=====<br />
<br />
* Gene Name (Description)<br />
** Feature, Featureprop<br />
* Symbol<br />
** Feature<br />
* Feature Synonyms<br />
** Feature, Feature_Synonym, Synonym<br />
* Ortholog Synonyms<br />
** Feature, Feature_relationship, Feature, Feature Synonyms<br />
<br />
=====Dynamic Queries=====<br />
<br />
FROM<br />
CAT_X_GENE_V gc<br />
<isEqual<br />
prepend="," property="searchSymbol"<br />
compareValue="true"><br />
GENE_SYMBOLS s<br />
</isEqual><br />
<br />
<isEqual prepend=","<br />
property="searchNcbi" <br />
compareValue="true"><br />
NCBI_GI n<br />
</isEqual><br />
<br />
=====Dynamic Queries=====<br />
<br />
<dynamic prepend="WHERE"><br />
<isEqual prepend="AND" property="searchNameOnly"<br />
compareValue="true"><br />
<iterate property="searchTokens" conjunction="AND" <br />
open=" (" close=") "><br />
LOWER(VARCHAR(gc.longname)) LIKE <br />
LOWER(CAST(#searchTokens[]:VARCHAR# AS VARCHAR(512)))<br />
</iterate><br />
</isEqual><br />
<br />
Iterate is very useful for multiple search terms.<br />
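<br />
To show what the dynamic elements expand to, here is a sketch that builds the equivalent SQL by hand in Java. It is a simplification under stated assumptions: the <code>SELECT *</code> prefix is invented (the slides show only the FROM and WHERE fragments), the VARCHAR casts are dropped, and <code>DynamicWhereSketch</code> is a hypothetical name.<br />
<br />
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of the dynamic WHERE clause above.
class DynamicWhereSketch {
    final String sql;
    final List<String> params;

    DynamicWhereSketch(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    static DynamicWhereSketch build(boolean searchNameOnly, List<String> searchTokens) {
        // The SELECT list is an assumption; the slides show only FROM onward
        StringBuilder sql = new StringBuilder("SELECT * FROM CAT_X_GENE_V gc");
        List<String> params = new ArrayList<>();
        // Mirrors the isEqual test on searchNameOnly
        if (searchNameOnly && !searchTokens.isEmpty()) {
            sql.append(" WHERE (");
            // Mirrors iterate: one condition per token, joined by AND
            for (int i = 0; i < searchTokens.size(); i++) {
                if (i > 0) sql.append(" AND ");
                sql.append("LOWER(gc.longname) LIKE LOWER(?)");
                params.add(searchTokens.get(i));
            }
            sql.append(")");
        }
        return new DynamicWhereSketch(sql.toString(), params);
    }
}
```
<br />
The point of the XML form is that this string building stays out of the Java code: one mapped statement covers every combination of search options.<br />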
<br />
=====Miscellaneous Features=====<br />
<br />
* Supports various data sources<br />
** Simple JDBC<br />
** DBCP – Apache Connection Pooling<br />
** JNDI – Java Naming and Directory Interface<br />
* Very flexible<br />
* Local caching of results<br />
** Lazy loading<br />
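<br />
Lazy loading with a local cache can be pictured with the small sketch below: the value is fetched on first access and reused afterwards. <code>LazySketch</code> is a hypothetical class written for this page and says nothing about how iBatis implements the feature.<br />
<br />
```java
import java.util.function.Supplier;

// Hypothetical illustration of lazy loading with a one-value cache.
class LazySketch<T> {
    private final Supplier<T> loader; // stands in for the database query
    private T value;
    private boolean loaded;
    int loads; // how many times the loader actually ran

    LazySketch(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            // First access: run the query and remember the result
            value = loader.get();
            loaded = true;
            loads++;
        }
        return value;
    }
}
```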
<br />
=====Support=====<br />
<br />
* In GMOD used by<br />
** Xenbase, Artemis at Sanger<br />
* Many other users<br />
** e.g. MySpace.com<br />
* Top level Apache Project<br />
** www.ibatis.apache.org<br />
* Active community<br />
<br />
<br />
=====What iBatis Does Well=====<br />
<br />
* Does not hide SQL<br />
** No new query language to learn<br />
* Separates and groups SQL<br />
* Simple!!<br />
** Light wrapper - No real tweaks<br />
* Does the job well<br />
* Excellent support for Master-Detail<br />
* Dynamically generated queries <br />
** You can structure conditions around clauses in SQL<br />
** One XML statement can represent many variations on a query<br />
<br />
=====Acknowledgements=====<br />
<br />
GMOD<br />
* Eric Just<br />
* Everyone else<br />
<br />
iBatis Developers<br />
* Kevin Snyder<br />
* Chris Jarabek<br />
* Ross Gibb<br />
<br />
PI<br />
* Peter Vize<br />
<br />
Financial Support<br />
* Alberta Heritage Foundation for Medical Research<br />
* Alberta Network for Proteomics Innovation<br />
* University of Calgary, Faculty of Science<br />
* University of Calgary Dept. of Computer Science<br />
* NICHD<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
Chado API via Java & Hibernate, Robert Bruggner, VectorBase.org. This Wiki section is an edited version of [[Media:HibernateChadoAPI.pdf|Robert's presentation]].<br />
<br />
=====Overview=====<br />
<br />
* Background<br />
* Quick Hibernate Overview<br />
* Hibernate Connectivity and O/R Mapping Example<br />
* GMOD Demo<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
=====Background=====<br />
<br />
* VectorBase<br />
** A bioinformatic resource center for invertebrate vectors of human pathogens<br />
* Responsible for storage and display of multiple organisms’ genomes<br />
** Anopheles gambiae, Aedes aegypti, Ixodes scapularis, Culex pipiens and so on....<br />
* Want to store data for many organisms; Chado is a natural choice<br />
* Ensembl Genome Browser already used for ''A. gambiae''<br />
** Wrote Ensembl API Database adaptor for Chado... Not maintainable.<br />
* Use Both Databases<br />
** Transfer genomic data from Ensembl to Chado<br />
** Search Engine and Indexer using Lucene<br />
** Run DAS<br />
** Export data via ChadoXML and GFF3<br />
* Need API for Database I/O<br />
<br />
=====Hibernate Background=====<br />
<br />
* They say: “A powerful, high performance object/relational persistence and query service.”<br />
* Automates the persistence of plain old Java objects (POJO)<br />
** User maps their POJO properties to database tables via XML (HBM File).<br />
** There are Hibernate tools that generate HBMs<br />
*** Configurable in the sense that one can generate classes whose get and set methods map one-to-one to table fields.<br />
* Persist a specific object by storing it in the database.<br />
* Intelligent Database I/O <br />
** Smart detection of ''Dirty Properties'' when performing Save / Update / Delete.<br />
** Cascadable Save / Update / Delete for complex objects.<br />
* Everything's done within the scope of a transaction.<br />
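<br />
The idea behind dirty-property detection can be sketched as a comparison between a snapshot taken at load time and the current values. The class below is a hypothetical illustration of that idea only; it is not Hibernate's implementation, and the generated SQL string is a simplification.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of snapshot-based dirty checking.
class DirtyCheckSketch {
    private final Map<String, Object> snapshot; // values as loaded
    private final Map<String, Object> current;  // values as the application sees them

    DirtyCheckSketch(Map<String, Object> loaded) {
        this.snapshot = new LinkedHashMap<>(loaded);
        this.current = new LinkedHashMap<>(loaded);
    }

    void set(String property, Object value) {
        current.put(property, value);
    }

    // Properties whose current value differs from the loaded snapshot
    List<String> dirtyProperties() {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!Objects.equals(e.getValue(), snapshot.get(e.getKey()))) {
                dirty.add(e.getKey());
            }
        }
        return dirty;
    }

    // The UPDATE that would be needed; empty string when nothing changed
    String updateSql(String table, String idColumn) {
        List<String> dirty = dirtyProperties();
        if (dirty.isEmpty()) return "";
        StringBuilder sb = new StringBuilder("UPDATE " + table + " SET ");
        for (int i = 0; i < dirty.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(dirty.get(i)).append(" = ?");
        }
        return sb.append(" WHERE ").append(idColumn).append(" = ?").toString();
    }
}
```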
<br />
=====Hibernate Database Connectivity=====<br />
<br />
* Configure Hibernate in hibernate.cfg.xml<br />
* Define a Data Source<br />
** We use a simple, single JDBC connection to Chado<br />
** Can be configured to use a connection pool or data source accessible by the Java Naming and Directory Interface (JNDI).<br />
** Define a connection “dialect”<br />
** org.hibernate.dialect.PostgreSQLDialect<br />
* Describe the relationship between Java objects and database tables<br />
** Use XML to describe where to store POJO property data in the database<br />
* Create a new Hibernate Session based on the configuration<br />
* Begin a transaction to start performing work<br />
<br />
=====POJO and HBM Example file - CV=====<br />
<br />
<java><br />
public class CV {<br />
<br />
private int cv_id;<br />
private String name;<br />
private String definition;<br />
<br />
public property gettersandsetters() {<br />
....<br />
}<br />
<br />
public boolean equals(CV comparaCV) {<br />
....<br />
}<br />
public int hashCode(){<br />
...<br />
}<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CV" table="cv"><br />
<br />
<id name="cv_id" column="cv_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cv_cv_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<property name="name" column="name" type="java.lang.String" not-null="true"/><br />
<br />
<property name="definition" column="definition" type="java.lang.String"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
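<br />
The <code>equals</code> and <code>hashCode</code> bodies are elided in the slide. One common way to write them for a persistent class is to compare a business key rather than the database-assigned id, so that an object compares the same before and after it is saved. The sketch below assumes that <code>name</code> is such a key for a CV; the class <code>CvSketch</code> is hypothetical and is not the VectorBase code.<br />
<br />
```java
import java.util.Objects;

// Hypothetical POJO showing equality on a business key.
class CvSketch {
    private int cvId;          // assigned by the database, ignored in equality
    private final String name; // assumed here to identify a CV

    CvSketch(String name) {
        this.name = name;
    }

    int getCvId() { return cvId; }
    void setCvId(int cvId) { this.cvId = cvId; }
    String getName() { return name; }

    // Takes Object so that it overrides Object.equals and is used by collections
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CvSketch)) return false;
        return Objects.equals(name, ((CvSketch) other).name);
    }

    // Must agree with equals: equal objects return equal hash codes
    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}
```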
<br />
=====HBM Example CVTerm=====<br />
<br />
<java><br />
public class CVTerm {<br />
<br />
private int cvterm_id;<br />
<br />
private CV cv;<br />
<br />
private String name;<br />
<br />
private String definition;<br />
<br />
private DBXref dbxref;<br />
<br />
private int is_obsolete;<br />
<br />
private int is_relationshiptype;<br />
<br />
.....<br />
<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CVTerm" table="cvterm"><br />
<br />
<id name="cvterm_id" column="cvterm_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cvterm_cvterm_id_seq</param><br />
<br />
</generator><br />
<br />
</id><br />
<br />
<many-to-one name="cv" class="org.vectorbase.chadoAPI.chadoObjects.CV" column="cv_id" <br />
not-null="true" cascade="save-update"/><br />
<br />
<property name="name" not-null="true" type="java.lang.String"/><br />
<br />
<property name="definition"/><br />
<br />
<one-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" cascade="all"/><br />
<br />
<property name="is_obsolete"/><br />
<br />
<property name="is_relationshiptype"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====Hibernate Object Retrieve=====<br />
<br />
Objects can be retrieved using the Java API, Hibernate Query Language (HQL), or SQL. This example loads by ID and by HQL.<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class, 1);<br />
<br />
// Alternatively, load a CVTerm using HQL<br />
cvt = (CVTerm) session.createQuery("from CVTerm where name=?").setString(0, "name").uniqueResult();<br />
<br />
// Print out the name of the cvterm<br />
System.out.println(cvt.getName());<br />
<br />
// Get the cv that the cvterm is associated with<br />
// Hibernate doesn’t return the cv_id - it returns a CV Object.<br />
CV cv = cvt.getCv();<br />
<br />
// Print out the cv’s name<br />
System.out.println(cv.getName());<br />
</java><br />
<br />
=====Hibernate Object Update=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Change cvt’s name<br />
cvt.setName("New CVTerm name");<br />
<br />
// Save!<br />
// Generated SQL updates “Dirty” properties (name, in this case)<br />
session.save(cvt);<br />
<br />
// Commit data to database<br />
session.getTransaction().commit();<br />
</java><br />
<br />
=====Hibernate Save=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
// Build a session factory first and begin a transaction (not shown)<br />
<br />
// Make a new CV<br />
CV new_cv = new CV();<br />
new_cv.setName("New CV");<br />
new_cv.setDefinition("New CV Def");<br />
<br />
// Make a new cvterm for that cv<br />
CVTerm new_cvterm = new CVTerm();<br />
new_cvterm.setName("New CVTerm Name");<br />
// ..... save dbxref etc......<br />
<br />
// Add that CVTerm to our new CV<br />
new_cv.addCVTerm(new_cvterm);<br />
<br />
// Save the new data...<br />
// Hibernate recognizes that it has to first save new_cv, then save new_cvterm.<br />
session.save(new_cvterm);<br />
<br />
session.getTransaction().commit();<br />
<br />
// You can see the new id’s assigned by the database<br />
System.out.println(new_cv.getCv_id());<br />
System.out.println(new_cvterm.getCvterm_id());<br />
</java><br />
<br />
=====Inheritance=====<br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.Feature" table="feature" discriminator-value="not null"><br />
<br />
<id name="feature_id" column="feature_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">feature_feature_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<discriminator column="type_id" type="integer" insert="false"/><br />
<br />
<many-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" <br />
column="dbxref_id" cascade="all"/><br />
<br />
<many-to-one name="organism" class="org.vectorbase.chadoAPI.chadoObjects.Organism" <br />
column="organism_id" not-null="true" cascade="save-update"/><br />
<br />
<property name="name"/><br />
.....<br />
<br />
<hibernate-mapping> <br />
<br />
<subclass name="org.vectorbase.chadoAPI.chadoFeatures.Gene" <br />
extends="org.vectorbase.chadoAPI.chadoObjects.Feature" discriminator-value="767"><br />
<br />
</subclass><br />
</hibernate-mapping><br />
</xml><br />
Write custom methods for specific sub-classes<br />
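<br />
The discriminator mapping above tells Hibernate which Java subclass to instantiate for a given <code>type_id</code>. A minimal sketch of that idea in plain Java follows; it is not Hibernate code, and the <code>instantiate()</code> helper and <code>describe()</code> method are hypothetical. The value 767 is taken from the mapping above and is presumably the <code>cvterm_id</code> for "gene" in that particular database, so it should not be assumed to hold elsewhere.<br />
<br />
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class DiscriminatorSketch {

    /** Hypothetical base class and subclass mirroring the mapping. */
    static class Feature {
        String describe() { return "generic feature"; }
    }

    static class Gene extends Feature {
        // A type-specific method can live on the subclass
        @Override
        String describe() { return "gene"; }
    }

    // Discriminator value (type_id) -> constructor of the subclass to use
    private static final Map<Integer, Supplier<Feature>> SUBCLASSES = new HashMap<>();
    static {
        SUBCLASSES.put(767, Gene::new);
    }

    /** Picks the subclass registered for typeId, falling back to plain Feature. */
    static Feature instantiate(int typeId) {
        return SUBCLASSES.getOrDefault(typeId, Feature::new).get();
    }

    public static void main(String[] args) {
        System.out.println(instantiate(767).describe());
        System.out.println(instantiate(1).describe());
    }
}
```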
<br />
=====ChadoAPI=====<br />
<br />
* POJO Mappings<br />
** CV, CVTerm, DB, DBXref, Feature, FeatureCVTerm, FeatureDBXref, FeatureLoc, FeatureProp, FeatureRelationship, FeatureSynonym, Organism, Pub, Synonym<br />
* Extended Features<br />
** Chromosome, Gene, Transcript, Exon, Protein<br />
* Constants<br />
** CVTerms, FeatureFeatureRelationships, Ontologies<br />
* Special<br />
** ChadoAdaptor<br />
<br />
=====Problem 1 - GMOD Example=====<br />
<br />
<java><br />
// Set up our session and begin transaction<br />
Session session = HibernateUtil.getSessionFactory().getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Make a Chado adaptor and load up some utility objects<br />
ChadoAdaptor ca = new ChadoAdaptor();<br />
Chromosome c = ca.fetchChromosomeByUniqueName("fake_chromosome");<br />
Pub null_pub = ca.fetchPubByPubID(1);<br />
Organism agambiae = ca.fetchOrganismByScientificName("Anopheles","gambiae");<br />
<br />
// Begin GMOD Demo Code<br />
<br />
// Make our new gene;<br />
Gene xfile = new Gene();<br />
xfile.setOrganism(agambiae);<br />
xfile.setUniquename("xfile");<br />
xfile.setDescription("A test gene for GMOD meeting");<br />
<br />
/* Set the location of our gene. No need to set coordinates because they'll be updated<br />
* based on the exon boundaries. <br />
*/<br />
FeatureLoc xfile_loc = new FeatureLoc();<br />
xfile_loc.setSrcfeature(c);<br />
xfile_loc.setStrand(1);<br />
xfile.setFeatureLoc(xfile_loc);<br />
<br />
// Add synonyms to xfile<br />
xfile.createNewFeatureSynonym("mulder", null_pub, CVTerms.EXACT_SYNONYM);<br />
xfile.createNewFeatureSynonym("scully", null_pub, CVTerms.EXACT_SYNONYM);<br />
</java><br />
<br />
=====Problem 2 - GMOD Example=====<br />
<br />
<java><br />
// Create a new transcript for our gene.<br />
Transcript t = xfile.createGeneTranscript("xfile-RA");<br />
<br />
// Create some exons for that transcript.<br />
t.createTranscriptExon("xfile:1", 13691, 13767);<br />
t.createTranscriptExon("xfile:2", 14687, 14720);<br />
<br />
// Save our new gene<br />
session.save(xfile);<br />
System.out.println("xfile feature_id is " + xfile.getFeature_id());<br />
<br />
// Fetch our saved gene from the database<br />
Gene xfile_r = ca.fetchGeneByUniqueName("xfile");<br />
System.out.println("symbol: " + xfile_r.getUniquename());<br />
System.out.print("synonyms: ");<br />
for (FeatureSynonym fs : xfile_r.getFeatureSynonyms()){<br />
<br />
System.out.print(fs.getSynonym().getName() + " ");<br />
}<br />
<br />
System.out.println("description: " + xfile_r.getDescription());<br />
System.out.println("type: " + xfile_r.getType().getName());<br />
<br />
for (Transcript tx : xfile_r.fetchAllTranscripts()){<br />
for (Exon e : tx.fetchAllExons()){<br />
System.out.println(e.getUniquename() + " Start:\t" + e.getFeatureLoc().getFmin());<br />
System.out.println(e.getUniquename() + " End:\t" + e.getFeatureLoc().getFmax());<br />
System.out.println("\tSrcFeatureID: " + e.getFeatureLoc().getSrcfeature().getFeature_id());<br />
}<br />
System.out.println(">" + tx.getUniquename());<br />
System.out.println(tx.generateTranscriptSequenceFromExons().toUpperCase());<br />
}<br />
</java><br />
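<br />
The call to <code>generateTranscriptSequenceFromExons()</code> above is not shown in the presentation. A minimal sketch of what such a method might do, assuming forward-strand exons given in order with 1-based inclusive coordinates (as in the problem statement), is shown below. The class and method names are hypothetical and this is not the ChadoAPI implementation. Note that Chado's <code>featureloc</code> table itself stores interbase coordinates, where <code>fmin</code> is the 1-based start minus one.<br />
<br />
```java
import java.util.Arrays;
import java.util.List;

class ExonSpliceSketch {

    /** Hypothetical exon with 1-based, inclusive coordinates on the forward strand. */
    static class Exon {
        final int start;
        final int end;
        Exon(int start, int end) { this.start = start; this.end = end; }
    }

    /** Concatenates the exon sequences in the order given and upper-cases the result. */
    static String splice(String chromosome, List<Exon> exons) {
        StringBuilder transcript = new StringBuilder();
        for (Exon exon : exons) {
            // 1-based inclusive [start, end] corresponds to Java indices [start - 1, end)
            transcript.append(chromosome, exon.start - 1, exon.end);
        }
        return transcript.toString().toUpperCase();
    }

    public static void main(String[] args) {
        // Toy chromosome; the real one would be the loaded genomic sequence
        String chromosome = "aaccggtt";
        List<Exon> exons = Arrays.asList(new Exon(1, 2), new Exon(5, 6));
        System.out.println(splice(chromosome, exons));
    }
}
```
<br />
With the xfile exons (13691 to 13767 and 14687 to 14720) this approach yields 77 + 34 = 111 bases, which matches the length of the coding sequence in the expected report.<br />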
<br />
=====Problems 3, 4, & 5 - GMOD Update & Delete=====<br />
<br />
<java><br />
// Let's update our name...<br />
xfile_r.setUniquename("x-file");<br />
<br />
session.save(xfile_r);<br />
<br />
// Not part of the ChadoAdaptor utility object, but a good example of HQL<br />
List<Gene> genes = (List<Gene>)session.createQuery("from Gene where uniquename like ?").setString(0, "x-%").list();<br />
<br />
for (Gene g : genes){<br />
<br />
System.out.println(g.getFeature_id() + <br />
"\t" + g.getUniquename() + <br />
"\t" + g.getOrganism().getGenus() +<br />
" " + g.getOrganism().getSpecies());<br />
}<br />
<br />
// Deleting... hmm...<br />
Gene delete_me = ca.fetchGeneByUniqueName("x-ray");<br />
session.delete(delete_me);<br />
<br />
// All Finished<br />
session.getTransaction().commit();<br />
</java><br />
<br />
<br />
<br />
=====What Hibernate Does Well=====<br />
<br />
* Hibernate can be configured to perform specialized functions<br />
** For example, it has its own notion of a cascade<br />
* Flexible with respect to language<br />
** Java, Hibernate Query Language, or SQL<br />
* Any JDBC driver<br />
<br />
=====Acknowledgements=====<br />
<br />
* VectorBase People<br />
** Frank Collins, EO Stinson, Ryan Butler<br />
* GMOD<br />
* NIAID<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
This Wiki section is an edited version of [[Media:PSU.pdf|Chinmay's presentation]].<br />
<br />
=====GeneDB=====<br />
<br />
* GeneDB is the organism data and annotation database for the Pathogen Sequencing Unit (PSU) at the Sanger Institute, UK<br />
* Contains 37 organisms, which is expected to grow to 62<br />
* Currently migrating to chado schema<br />
* Java API with two engines Hibernate & iBatis<br />
** Two teams, Artemis and GeneDB, took different approaches<br />
<br />
=====Technical - Connections=====<br />
<br />
Connections are configured in the Spring configuration file<br />
<xml><br />
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"><br />
<property name="driverClassName" value="org.postgresql.Driver" /><br />
<property name="url" value="jdbc:postgresql://holly.sanger.ac.uk:5432/chado" /><br />
<property name="username" value="DELIBERATELY_BOGUS_NAME"/><br />
<property name="password" value="WIBBLE" /><br />
</bean><br />
</xml><br />
* Uses a connection pool<br />
* Connection to the database is specified graphically, so the iBatis configuration file has variables for the location:<br />
<xml><br />
<property name="JDBC.Driver" value="org.postgresql.Driver"/><br />
<br />
<property name="JDBC.ConnectionURL" value="jdbc:postgresql://${chado}"/><br />
<br />
<property name="JDBC.Username" value="${username}"/><br />
<br />
<property name="JDBC.Password" value="${password}"/><br />
</xml><br />
<br />
* provide database location, username & password<br />
* select what to open in Artemis from a scrollable list of features with residues (organisms are in separate Postgres schemas)<br />
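<br />
The <code>${chado}</code>, <code>${username}</code> and <code>${password}</code> variables in the configuration above are filled in from values supplied at run time. The following is a minimal sketch of that kind of substitution, not iBatis's own code: the <code>resolve()</code> helper is hypothetical, and the example host and database name are made up.<br />
<br />
```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSketch {

    // Matches ${name} and captures the name
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${name} in the template with the matching property value. */
    static String resolve(String template, Properties values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer resolved = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = values.getProperty(key);
            if (value == null) {
                throw new IllegalArgumentException("No value supplied for " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value from being interpreted
            matcher.appendReplacement(resolved, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(resolved);
        return resolved.toString();
    }

    public static void main(String[] args) {
        Properties login = new Properties();
        login.setProperty("chado", "localhost:5432/chado"); // made-up location
        System.out.println(resolve("jdbc:postgresql://${chado}", login));
    }
}
```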
<br />
=====Technical - Code Generation=====<br />
<br />
* The shared interface and hibernate implementation were originally generated<br />
* There's no explicit code generation (although the Spring and Hibernate runtimes may use it behind the scenes)<br />
<br />
=====Technical - Transactions=====<br />
<br />
* Transactions are fully supported<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use them behind the scenes)<br />
<br />
=====Problems 1, 2, & 3=====<br />
<br />
Creating a gene<br />
<java><br />
genes[0] = new Feature(ORG, GENE, "xfile", false, false, now, now);<br />
<br />
genes[0].setSeqLen(1029); <br />
sequenceDao.persist(genes[0]);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, genes[0], 13691, false, 14720, false, (short)1, 0, 0 ,0);<br />
<br />
sequenceDao.persist(loc);<br />
<br />
addFeatureProp(genes[0], "description", "A test gene for GMOD meeting");<br />
<br />
addSynonymsToFeature(genes[0], "mulder", "scully");<br />
<br />
createExon("exon1", genes[0], 13691, 13767, now, 0);<br />
<br />
createExon("exon2", genes[0], 14687, 14720, now, 1);<br />
</java><br />
<br />
Retrieve a gene<br />
<java><br />
Feature f = sequenceDao.getFeatureByUniqueName("xfile");<br />
displayGene(f);<br />
</java><br />
<br />
Update a gene<br />
<java><br />
genes[0].setUniqueName("x-file");<br />
<br />
sequenceDao.merge(genes[0]);<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<java><br />
private Feature createExon(String name, Feature gene, int min, int max, Timestamp now, int rank) {<br />
<br />
Feature exon = new Feature(ORG, EXON, name, false, false, now, now);<br />
exon.setSeqLen(max-min);<br />
sequenceDao.persist(exon);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, exon, min, false, max, false, <br />
(short)1, 0, 0 ,0);<br />
sequenceDao.persist(loc);<br />
<br />
return exon;<br />
<br />
}<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<xml><br />
<st:section name="Naming" id="gene_naming" collapsed="false" collapsible="false"<br />
hideIfEmpty="true"><br />
<dl><br />
<dt><b>symbol:</b></dt><br />
<dd>${feature.uniqueName}</dd><br />
</dl><br />
<db:synonym name="synonym" var="name" collection="${feature.featureSynonyms}"><br />
<br /><b>Synonym:</b> <db:list-string collection="${name}" /><br />
</db:synonym><br />
<dt><b>Type:</b></dt><br />
<dd>${feature.cvTerm.name}</dd><br />
<br />
<st:section name="Exons" collapsed="false" collapsible="true" hideIfEmpty="true"><br />
<display:table name="exons" uid="tmp" pagesize="30" class="simple" cellspacing="0"<br />
cellpadding="4"><br />
<display:column property="uniqueName" title="Exon"/><br />
<display:column property="featureLocsForSrcFeatureId.fmin" title="Start"/><br />
<display:column property="featureLocsForSrcFeatureId.fmax" title="end"/><br />
</display:table><br />
</st:section><br />
<br />
<st:section name="cds" collapsible="true"><br />
<b>${feature.residues}</b><br />
</st:section><br />
</xml><br />
<br />
Specialized functionality, such as a cascading delete, is handled by the database</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-01-26T20:23:01Z<p>165.124.152.78: /* Why Modware Was Developed */</p>
<hr />
<div>==Middleware for Chado databases==<br />
<br />
===Authors===<br />
<br />
* Jeff Bowes<br />
* Robert Bruggner<br />
* Scott Cain<br />
* Josh Goodman<br />
* Eric Just<br />
* Sohel Merchant<br />
* Brian O'Connor<br />
* Brian Osborne<br />
* Chinmay Patel<br />
* Pinglei Zhou<br />
<br />
===Middleware Evaluation January 2007===<br />
<br />
A group of some 50 GMOD developers met at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
===Introduction===<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates given are exact.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
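The sequence portion of the report above is in FASTA format, with 60 bases per line. A minimal sketch of that formatting step follows; the <code>toFasta()</code> helper is a hypothetical name and is not taken from any of the packages discussed here.<br />
<br />
```java
class FastaSketch {

    /** Formats a sequence as a FASTA record with at most width characters per line. */
    static String toFasta(String header, String sequence, int width) {
        StringBuilder record = new StringBuilder(">").append(header).append('\n');
        for (int i = 0; i < sequence.length(); i += width) {
            int lineEnd = Math.min(i + width, sequence.length());
            record.append(sequence, i, lineEnd).append('\n');
        }
        return record.toString();
    }

    public static void main(String[] args) {
        // A short toy sequence wrapped at 4 characters to keep the output small
        System.out.print(toFasta("xfile cds", "ACGTACGTAC", 4));
    }
}
```
<br />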
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the gene symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
===Conclusions===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. BioSQL), the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software, and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman).<br />
<br />
====Problem Assignments====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. Clearly one can see many differences between packages in how the problems were solved; please see the presentations themselves for these details. <br />
<br />
The Perl approaches used only the Perl language whereas the Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or disadvantage.<br />
<br />
====Java Middleware====<br />
<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
=====Abstraction=====<br />
<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
<br />
====Perl Middleware====<br />
<br />
<br />
=====Abstraction=====<br />
<br />
Modware provides a higher level of abstraction than Chado::AutoDBI.<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
DasI interface is well-documented, about a dozen methods and three classes, all documented.<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** "ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database."<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
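<br />
The DAO approach described above can be sketched as follows, using the columns of the <code>cvterm</code> table. This is a minimal illustration under stated assumptions: the class and interface names are hypothetical, and an in-memory map stands in for the JDBC code and SQL statements that a real DAO would contain.<br />
<br />
```java
import java.util.HashMap;
import java.util.Map;

class CvTermDaoSketch {

    /** One attribute per column of the cvterm table. */
    static class CvTerm {
        int cvtermId;
        int cvId;
        String name;
        String definition;
        int dbxrefId;
    }

    /** The CRUD operations a hand-written DAO would expose. */
    interface CvTermDao {
        int create(CvTerm term);
        CvTerm retrieve(int cvtermId);
        void update(CvTerm term);
        void delete(int cvtermId);
    }

    /** In-memory stand-in; a real DAO would run INSERT/SELECT/UPDATE/DELETE via JDBC. */
    static class InMemoryCvTermDao implements CvTermDao {
        private final Map<Integer, CvTerm> rows = new HashMap<>();
        private int nextId = 1;

        @Override
        public int create(CvTerm term) {
            term.cvtermId = nextId++;
            rows.put(term.cvtermId, term);
            return term.cvtermId;
        }

        @Override
        public CvTerm retrieve(int cvtermId) {
            return rows.get(cvtermId);
        }

        @Override
        public void update(CvTerm term) {
            if (!rows.containsKey(term.cvtermId)) {
                throw new IllegalArgumentException("No cvterm with id " + term.cvtermId);
            }
            rows.put(term.cvtermId, term);
        }

        @Override
        public void delete(int cvtermId) {
            rows.remove(cvtermId);
        }
    }

    public static void main(String[] args) {
        CvTermDao dao = new InMemoryCvTermDao();
        CvTerm term = new CvTerm();
        term.name = "DUMMY TERM";
        int id = dao.create(term);
        System.out.println(dao.retrieve(id).name);
    }
}
```
<br />
With an ORM, the equivalent of <code>InMemoryCvTermDao</code> does not have to be written by hand for each table.<br />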
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
$term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
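## Update<br />
## ($term is assumed to be another object holding the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
## Changes are not written to the database until update() is called<br />
$term_dbobj->update();<br />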
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well with a controlled data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
Flybase tried Hibernate, but encountered performance issues in bulk operations, even in a test that did nothing more than<br />
call simple print() statements. There are many caching parameters available in Hibernate, but the underlying problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching and handles recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not held in memory; data is flushed out as processing proceeds<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
This Wiki section is an edited version of [[Media:XORT.pdf|Josh and Pinglei's presentation]].<br />
<br />
<br />
<br />
=====Introduction=====<br />
<br />
* An XML-database mapping system for data exchange between a database and XML-driven applications<br />
* XORT can handle typical XML; it's not Chado-specific<br />
* Developed and supported by Pinglei Zhou at FlyBase Harvard; currently at version 0.007.<br />
* Used at all FlyBase sites<br />
** Harvard has extensive library of Perl modules for generating ChadoXML<br />
* Written in Perl<br />
* Required perl modules:<br />
** XML::Parser::PerlSAX<br />
** Unicode::String<br />
** XML::DOM<br />
** DBI<br />
<br />
=====Chado XML=====<br />
<br />
* Is ChadoXML necessary? No, but it may help you.<br />
* ChadoXML assists with incremental updates, if you want to avoid flush-and-reload.<br />
* While updates can be achieved with other middleware (for example, Perl Class::DBI or Java Hibernate), ChadoXML provides an additional feature: a way to archive your transactions.<br />
* It provides bulk update and download, which other methods lack or handle inefficiently<br />
<br />
=====Components=====<br />
<br />
* Database & Schema<br />
* ChadoXML Specification<br />
* DumpSpec<br />
** DumpSpec files are simple XML files that tell XORT what to do<br />
** DumpSpec files are ''language independent'', being XML<br />
** It's fairly easy for those who know the schema to read these files and understand what the operation is<br />
<br />
=====Highlights of Chado XML Specification=====<br />
<br />
* Unique representation of a specific database schema<br />
* Does away with internal primary key values<br />
* Static vs. Operational<br />
* Encoding for non-ASCII characters<br />
* Macro mechanism (object reference)<br />
<br />
=====Putting it together: New FlyBase dataflow Part 1=====<br />
<br />
There are three Flybase sites, and most curation is done at Harvard and<br />
Cambridge. Proforma is the curation format at Cambridge and Harvard, but<br />
Harvard also curates with Apollo and ChadoXML.<br />
<br />
Once the data is in Chado, the reporting instance, there's a denormalization step<br />
in moving it to a read-only database. From the read-only database, XORT produces<br />
ChadoXML dumps for reporting purposes. XSLT version 2 is then used to<br />
transform the ChadoXML into HTML and GFF: HTML for human-readable reports,<br />
and GFF for GBrowse and for various power<br />
users.<br />
<br />
1.a. Proforma (FlyBase Cambridge) is converted to ChadoXML<br />
<br />
1.b. ChadoXML is created by Apollo (Harvard)<br />
<br />
1.c. ChadoXML is created by Java SEAN (Harvard)<br />
<br />
2. All ChadoXML is loaded into Chado by XORT<br />
<br />
=====Putting it together: New FlyBase dataflow Part 2=====<br />
<br />
3. Chado (Harvard) is denormalized and loaded into Chado (Indiana)<br />
<br />
4. ChadoXML is created from Chado using XORT<br />
<br />
5.a. GFF and Fasta are created from ChadoXML<br />
<br />
5.b. HTML is created from ChadoXML<br />
<br />
=====Data & Report Generation=====<br />
<br />
* Content of all output files is controlled by XML dumpspecs.<br />
** Dumpspecs are language independent.<br />
** Easily readable (with knowledge of Chado structure).<br />
* All XML transformation steps are done with XSLT v2.<br />
** Saxon XSLT (http://saxon.sourceforge.net/)<br />
** ChadoXML is split into individual chunks before XSLT processing to accommodate large file sizes.<br />
** Extremely fast. We can process all data for ~60,000 Drosophila genes in under 30 minutes.<br />
<br />
=====Hibernate & XORT=====<br />
<br />
* Hibernate didn't scale well when dealing with 5,000+ features in bulk.<br />
** The test was simply calling <code>print()</code> statements<br />
* Performance tweaks for Hibernate can be quite complicated to setup for bulk operations.<br />
* XORT is currently handling ~6 million features in production with only minor performance problems.<br />
* XORT is much more language independent.<br />
<br />
=====Support for complex transactions using XORT=====<br />
<br />
For example:<br />
<br />
* Find all records linked to a record using dumpspec<br />
* Merge gene x into y, each with thousands of records attached<br />
<br />
Step 1. Dump all data using a simple dumpspec<br />
<xml><br />
<chado><br />
<feature dump="all"><br />
<uniquename test="eq">x</uniquename><br />
</feature><br />
</chado><br />
</xml><br />
Step 2. Delete feature x from the DB, with triggers to clean up orphan records if necessary<br />
<br />
Step 3. Edit the output xml, change uniquename x to y, then load the edited file back to DB<br />
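<br />
Step 3 can be done by hand or scripted. XORT itself is written in Perl; the following Java sketch only illustrates the editing step, using the standard XML APIs. It assumes that the dumped file contains <code>uniquename</code> elements like the one in the dumpspec above; the <code>rename()</code> helper is hypothetical, and it renames every <code>uniquename</code> element whose text equals the old name, which may be broader than is wanted for a real dump.<br />
<br />
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RenameUniquenameSketch {

    /** Returns the XML with every uniquename equal to 'from' changed to 'to'. */
    static String rename(String xml, String from, String to) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList names = doc.getElementsByTagName("uniquename");
            for (int i = 0; i < names.getLength(); i++) {
                Node name = names.item(i);
                if (from.equals(name.getTextContent())) {
                    name.setTextContent(to);
                }
            }

            // Serialize the edited document back to a string
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not edit XML", e);
        }
    }

    public static void main(String[] args) {
        String dump = "<chado><feature><uniquename>x</uniquename></feature></chado>";
        System.out.println(rename(dump, "x", "y"));
    }
}
```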
<br />
=====CHIA (Chado Interface Application)=====<br />
<br />
A Java application that organizes SQL and XORT functionality for internal users, e.g.:<br />
<br />
* Dump chado-XML for gene regions for Apollo curation<br />
* Organize and execute “canned” SQL queries<br />
* Serve IDs for curators (in development)<br />
* Dynamically browse Chado without writing SQL statements<br />
<br />
CHIA is being designed to be extensible for adding new functionality as needed.<br />
<br />
<br />
=====Documentation=====<br />
<br />
* ''Using Chado to Store Genome Annotation Data''<br />
** Current Protocols in Bioinformatics (Baxevanis, A.D., and Davison, D.B., eds) 2, 9.6.1-9.6.28.<br />
* XORT specification docs<br />
* XORT draft (unpublished)<br />
* GMOD case demo procedure<br />
** All in the doc directory of XORT package, http://www.gmod.org<br />
<br />
=====Acknowledgements=====<br />
<br />
* William Gelbart <br />
* Chris Mungall<br />
* David Emmert <br />
* Mark Gibson<br />
* Stan Letovsky <br />
* Nomi Harris<br />
* Frank Smutniak <br />
* Suzanna Lewis<br />
* Peili Zhang <br />
* Haiyan Zhang <br />
* Aubrey de Grey<br />
* Andy Schroeder <br />
* Don Gilbert<br />
* Susan Russo<br />
* Mark Zytkovicz<br />
* Scott Cain<br />
* Lincoln Stein<br />
* Victor Strelets<br />
* Robert Wilson<br />
* Paul Leyland<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add a constructor that finds features whose name is longer than 15 characters<br />
<br />
__PACKAGE__->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
This Wiki section is an edited version of [[Media:AutoDBI.pdf|Brian's presentation]].<br />
<br />
=====Relation to Turnkey=====<br />
<br />
Turnkey is a package that auto-generates Web sites from a relational schema, based on SQL::Translator.<br />
<br />
* Turnkey authors: Allen Day, Scott Cain, Brian O'Connor<br />
* Turnkey and Chado::AutoDBI objects are essentially the same<br />
<br />
=====Technical Overview=====<br />
<br />
* Code Generation<br />
<br />
=====Project Overview=====<br />
<br />
Convert SQL Queries/Inserts/Deletes -> Object Calls<br />
<sql><br />
INSERT INTO feature (organism_id, name)<br />
VALUES (1, 'foo');<br />
</sql><br />
To:<br />
<perl><br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Database connection: use a base class<br />
* Set up base object and connect, then create a ''table object'' to access primary key. <br />
* Class::DBI can find and insert records in other tables, based on foreign keys.<br />
<br />
<perl><br />
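package Turnkey::Model::DBI;<br />
use base qw(Class::DBI::Pg);<br />
<br />
# Connection settings for a local Chado database<br />
my ($dsn, $name, $pass);<br />
$dsn = "dbi:Pg:host=localhost;dbname=chado;port=5432";<br />
$name = "postgres";<br />
$pass = "";<br />
<br />
# Register the connection under the name 'Main' for all subclasses<br />
Turnkey::Model::DBI->set_db('Main', $dsn, $name, $pass, {AutoCommit => 1});<br />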
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
<br />
<perl><br />
package Turnkey::Model::Feature;<br />
use base 'Turnkey::Model::DBI';<br />
<br />
Turnkey::Model::Feature->set_up_table('feature');<br />
<br />
#<br />
# Primary key accessors<br />
#<br />
<br />
sub id { shift->feature_id }<br />
sub feature { shift->feature_id }<br />
</perl><br />
<br />
* Data field accessors are generated by Class::Accessor<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** has_a<br />
<br />
<perl><br />
#<br />
# has_a<br />
#<br />
Turnkey::Model::Feature->has_a( type_id => "Turnkey::Model::Cvterm" );<br />
sub cvterm { return shift->type_id; }<br />
</perl><br />
<br />
* Basic ORM Object: Feature<br />
** has_many<br />
<br />
<perl><br />
#<br />
# has_many<br />
#<br />
Turnkey::Model::Feature->has_many('feature_synonym_feature_id', <br />
'Turnkey::Model::Feature_Synonym' => 'feature_id');<br />
sub feature_synonyms { return shift->feature_synonym_feature_id; }<br />
<br />
Turnkey::Model::Feature->has_many('featureprop_feature_id', <br />
'Turnkey::Model::Featureprop' => 'feature_id');<br />
sub featureprops { return shift->featureprop_feature_id; }<br />
</perl><br />
<br />
* Can traverse tables, such as going from FEATURE to FEATUREPROP <br />
** Declare on one ''table object'' that it has_a() or has_many() of another ''table object'', naming the key column that links them<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** skipping linker tables for has_many<br />
<br />
<perl><br />
# skip over feature_synonym table<br />
#<br />
# method 1<br />
#<br />
sub synonyms { my $self = shift; return map $_->synonym_id, $self->feature_synonyms; }<br />
#<br />
# method 2<br />
#<br />
Turnkey::Model::Feature->has_many( synonyms2 =><br />
['Turnkey::Model::Feature_Synonym' => 'synonym_id']);<br />
</perl><br />
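<br />
Method 1 above amounts to a map over the linker rows. The idea can be seen without a database in the sketch below, where plain hashes stand in for table rows. The data and the <code>synonyms_for</code> helper are hypothetical and are not part of Chado::AutoDBI.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Stand-ins for the feature_synonym (linker) and synonym tables<br />
my @feature_synonym = (<br />
{ feature_id => 1, synonym_id => 10 },<br />
{ feature_id => 1, synonym_id => 11 },<br />
{ feature_id => 2, synonym_id => 12 },<br />
);<br />
my %synonym = ( 10 => 'mulder', 11 => 'scully', 12 => 'wolverine' );<br />
<br />
# Hop over the linker rows and return the synonym names directly<br />
sub synonyms_for {<br />
my ($feature_id) = @_;<br />
return map { $synonym{ $_->{synonym_id} } }<br />
grep { $_->{feature_id} == $feature_id } @feature_synonym;<br />
}<br />
<br />
print join(', ', synonyms_for(1)), "\n"; # prints mulder, scully<br />
</perl><br />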
<br />
=====Technical Overview=====<br />
<br />
* Transactions<br />
** Chado::AutoDBI supports transactions, and one can wrap the transaction in an eval()<br />
<perl><br />
sub do_transaction {<br />
my $class = shift;<br />
my ( $code ) = @_;<br />
# Turn off AutoCommit for this scope.<br />
# A commit will occur at the exit of this block automatically,<br />
# when the local AutoCommit goes out of scope.<br />
local $class->db_Main->{ AutoCommit };<br />
<br />
# Execute the required code inside the transaction.<br />
eval { $code->() };<br />
if ( $@ ) {<br />
my $commit_error = $@;<br />
eval { $class->dbi_rollback }; # might also die!<br />
die $commit_error;<br />
}<br />
}<br />
</perl><br />
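<br />
To follow the control flow without a database, here is a self-contained sketch of the same eval-and-rollback pattern. <code>MockDB</code> and <code>in_transaction</code> are hypothetical stand-ins and are not part of Class::DBI or Chado::AutoDBI; the mock simply holds pending rows until commit.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# MockDB is a hypothetical in-memory stand-in for a database handle<br />
package MockDB;<br />
sub new { return bless { rows => [], pending => [] }, shift }<br />
sub insert { my ($self, $row) = @_; push @{ $self->{pending} }, $row; return }<br />
sub commit { my $self = shift; push @{ $self->{rows} }, @{ $self->{pending} }; $self->{pending} = []; return }<br />
sub rollback { my $self = shift; $self->{pending} = []; return }<br />
sub rows { return @{ shift->{rows} } }<br />
<br />
package main;<br />
<br />
# Run $code as one unit: commit on success, roll back and re-throw on failure<br />
sub in_transaction {<br />
my ($db, $code) = @_;<br />
my $ok = eval { $code->($db); $db->commit; 1 };<br />
return 1 if $ok;<br />
my $error = $@;<br />
$db->rollback;<br />
die $error;<br />
}<br />
<br />
my $db = MockDB->new;<br />
in_transaction($db, sub { $_[0]->insert('xfile') });<br />
eval { in_transaction($db, sub { $_[0]->insert('x-men'); die "constraint violated\n" }) };<br />
print "rows: ", join(', ', $db->rows), "\n"; # only the first insert was committed<br />
</perl><br />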
<br />
=====Technical Overview=====<br />
<br />
* Lazy Loading<br />
** One can either do automated creation of objects or explicitly dictate which fields are incorporated into object<br />
<perl><br />
Turnkey::Model::Feature->columns( Primary => qw/feature_id/ );<br />
Turnkey::Model::Feature->columns( Essential => qw/name organism_id type_id/ );<br />
Turnkey::Model::Feature->columns( Others => qw/residues .../ );<br />
</perl><br />
<br />
Typically:<br />
<br />
<perl><br />
Turnkey::Model::Feature->set_up_table('feature');<br />
</perl><br />
<br />
=====Problem 1=====<br />
<br />
* Create Feature & Add Description<br />
<br />
<perl><br />
# now create mRNA feature<br />
<br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
<br />
# create description<br />
<br />
my $featureprop = Turnkey::Model::Featureprop->find_or_create({<br />
value => 'A test gene for GMOD meeting',<br />
feature_id => $feature,<br />
type_id => $note_cvterm,<br />
});<br />
</perl><br />
<br />
=====Problem 2=====<br />
<br />
* Retrieve a Feature via Searching<br />
** Search using strings or identifiers; a search returns an iterator object in scalar context, or a list of objects in list context<br />
<br />
<perl><br />
# objects for global use<br />
<br />
# the organism for our new feature<br />
my $organism = Turnkey::Model::Organism->search(abbreviation => "S.cerevisiae")->next;<br />
<br />
# the cvterm for a "Note"<br />
my $note_cvterm = Turnkey::Model::Cvterm->retrieve(2);<br />
<br />
# searching name by wildcard<br />
<br />
my @results = Turnkey::Model::Feature->search_like(name => 'x-%');<br />
</perl><br />
<br />
=====Problems 3, 4, & 5=====<br />
<br />
* Update a Feature<br />
<br />
<perl><br />
# update the xfile gene name<br />
<br />
$feature->name("x-file");<br />
$feature->update();<br />
</perl><br />
<br />
* Delete a Feature<br />
<br />
<perl><br />
# now delete the x-file feature<br />
<br />
$feature->delete();<br />
</perl><br />
<br />
=====Things Chado::AutoDBI does well=====<br />
<br />
* Easy to use<br />
* Easy to port<br />
* Use with other DBs<br />
** Both Oracle and Postgres used currently<br />
* Autogenerated via Turnkey<br />
* find_or_create method<br />
* Performance is not as bad as you might guess<br />
** Due to Lazy loading<br />
** Even whole genome operations are feasible<br />
<br />
Note that speed is relative: badly written SQL can also perform poorly, and in that case the Chado::AutoDBI approach may well be faster.<br />
<br />
<br />
=====For More Information=====<br />
<br />
* Class::DBI<br />
** http://www.class-dbi.com<br />
** http://search.cpan.org<br />
<br />
* Turnkey<br />
** http://turnkey.sf.net<br />
<br />
* Biopackages<br />
** http://biopackages.net<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses Chado::AutoDBI to connect. Connection is configured on GMOD install.<br />
* Transaction support: See Chado::AutoDBI talk.<br />
* Code generation: No automatic code generation<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Modware uses this code, but strips out all non-standard GMOD code<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods such as name(), primary_id(), external_ids() <br />
** Abstract factory for various feature types<br />
* Lazy: information is only retrieved when you ask for it, but cached for speedy retrieval the next time it is required<br />
* Uses Bioperl and its objects<br />
** Each different feature subclass has a bioperl() method that returns an appropriate BioPerl object.<br />
** Bioperl object manipulation used to update feature coordinates<br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene which isn't the same as ...<br />
* Any feature type not explicitly supported in the Modware::Feature class is blessed as a Modware::Feature::GENERIC class, which has a start/stop coordinate on a genomic sequence feature (no structure like a transcript with exons)<br />
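<br />
The lazy-but-cached behavior described in the list above can be sketched in a few lines of plain Perl. <code>LazyFeature</code> and its <code>fetch</code> callback are hypothetical and are not Modware classes; the callback stands in for a database query.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
package LazyFeature;<br />
<br />
# 'fetch' is a code reference that stands in for an expensive database query<br />
sub new {<br />
my ($class, %args) = @_;<br />
return bless { fetch => $args{fetch}, cache => {} }, $class;<br />
}<br />
<br />
# Retrieve a field on first request only; later calls are served from the cache<br />
sub get {<br />
my ($self, $field) = @_;<br />
$self->{cache}{$field} = $self->{fetch}->($field)<br />
unless exists $self->{cache}{$field};<br />
return $self->{cache}{$field};<br />
}<br />
<br />
package main;<br />
<br />
my $queries = 0;<br />
my $feature = LazyFeature->new(<br />
fetch => sub { my $field = shift; $queries++; return "value of $field" },<br />
);<br />
<br />
$feature->get('name') for 1 .. 3; # three reads of the same field<br />
print "queries issued: $queries\n"; # the callback ran only once<br />
</perl><br />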
<br />
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
my $seq_io = new Bio::SeqIO(<br />
-file => "../data/fake_chromosome.txt",<br />
-format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
-type => 'chromosome',<br />
-bioperl => $seq,<br />
-description => "This is a test",<br />
-name => 'Fake',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
-type => 'gene',<br />
-name => 'xfile',<br />
-description => 'A test gene for GMOD meeting',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1: exon_2: <br />
start: 13691 start: 14687<br />
end: 13767 end: 14720<br />
strand: 1 strand: 1<br />
srcFeature_id: Id of genomic sample srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 13691,<br />
-end => 13767,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 14687,<br />
-end => 14720,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
-type => 'mRNA',<br />
-bioperl => $bioperl_mrna,<br />
-source => 'GMOD 2007 Demo',<br />
-reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report. This package uses Modware and BioPerl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Gene is-a Feature.<br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass? Holding the BioPerl object inside the Modware object is more flexible<br />
<br />
<perl><br />
package GMODWriter; <br />
sub Write_gene_report {<br />
my ($self, $gene) = @_;<br />
my $symbol = $gene->name();<br />
<br />
my @synonyms = @{ $gene->synonyms() };<br />
my $syn_string = join ", ", @synonyms;<br />
my $description = $gene->description();<br />
my $type = $gene->type();<br />
# get features associated with the gene that are of type 'mRNA'<br />
my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
# use bioperl method to get exons from mRNA<br />
my @exons = $mrna->bioperl->exons_ordered();<br />
# Modware will return a nice fasta file for you.<br />
my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
# Now print the actual report<br />
print "symbol: $symbol\n";<br />
print "synonyms: $syn_string\n";<br />
print "description: $description\n";<br />
print "type: $type\n";<br />
<br />
my $count = 0;<br />
foreach my $exon (@exons ) {<br />
$count++;<br />
print "exon${count} start: ".$exon->start()."\n";<br />
<br />
print "exon${count} end: ".$exon->end()."\n";<br />
<br />
}<br />
print "$fasta";<br />
}<br />
. . .<br />
</perl><br />
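<br />
As a quick sanity check on the report, the CDS length should equal the summed exon lengths, because BioPerl coordinates are 1-based and inclusive and both exons are coding. The sketch below uses only the exon coordinates and the sequence from the xfile report; it makes no Modware or BioPerl calls, and the variable names are illustrative.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Exon coordinates from the xfile report (1-based, inclusive)<br />
my @exons = ( [ 13691, 13767 ], [ 14687, 14720 ] );<br />
<br />
# CDS sequence from the report, joined into one string<br />
my $cds = join '',<br />
'ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT',<br />
'GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA';<br />
<br />
# An inclusive range start..end covers end - start + 1 bases<br />
my $exon_total = 0;<br />
$exon_total += $_->[1] - $_->[0] + 1 for @exons;<br />
<br />
printf "exon total: %d, cds length: %d\n", $exon_total, length $cds;<br />
print "lengths agree\n" if $exon_total == length $cds;<br />
</perl><br />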
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the gene symbol to x-file and retrieve the changed record. Regenerate the gene report.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval{<br />
<br />
# get xfile gene<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
# change the name<br />
$xfile_gene->name( 'x-file' );<br />
# write changes to database<br />
$xfile_gene->update();<br />
<br />
# we can use the original object if we want, but instead<br />
# we refetch from the database to 'prove' the name has been changed<br />
my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
# use our GMODWriter package to write report for x-file<br />
GMODWriter->Write_gene_report( $xfile_gene2 );<br />
<br />
};<br />
if ($@){<br />
warn $@;<br />
Modware::DBH->new->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use Modware::Search::Gene;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results );<br />
</perl><br />
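<br />
Problem 4 uses a shell-style wildcard ('x-*'), while the Class::DBI example earlier on this page used the SQL LIKE form ('x-%'). The slides do not show how Modware translates between the two, so the following is only a sketch of one plausible translation. <code>wildcard_to_like</code> is a hypothetical helper, not a Modware function, and it assumes backslash is the LIKE escape character.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical helper: turn a shell-style pattern into an SQL LIKE pattern<br />
sub wildcard_to_like {<br />
my ($pattern) = @_;<br />
$pattern =~ s/([%_\\])/\\$1/g; # escape characters that are special to LIKE<br />
$pattern =~ tr/*?/%_/; # * matches any run, ? matches one character<br />
return $pattern;<br />
}<br />
<br />
print wildcard_to_like('x-*'), "\n"; # prints x-%<br />
</perl><br />
<br />
The result would then be passed as a bind value, for example to Class::DBI's search_like().<br />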
<br />
<br />
<perl><br />
sub Write_search_results {<br />
my ($self, $itr) = @_;<br />
# loop through iterator<br />
while (my $gene = $itr->next) {<br />
# print the requested information<br />
print $gene->feature_id . "\t" . $gene->name .<br />
"\t" . $gene->organism_name . "\n";<br />
}<br />
}<br />
</perl><br />
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
# set is_deleted = 1; this also sets is_available to 0,<br />
# so the gene is hidden from searches<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results );<br />
</perl><br />
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* Sample scripts demoed here are available<br />
** sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI may adjust automatically, depending on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Specify the type of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O'Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects via Perl DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
This Wiki section is an edited version of [[Media:DasI_middleware.pdf|Scott's presentation]].<br />
<br />
=====Create the database=====<br />
<br />
$ perl Makefile.PL<br />
$ make<br />
$ sudo make install<br />
$ make load_schema<br />
$ make prepdb # now with Xenopus!<br />
$ make ontologies # load rel, SO, featureprop<br />
<br />
=====Problem 1 - Loading Data=====<br />
<br />
Create some GFF from the specifications:<br />
<br />
fake_chromosome example chromosome 1 15017 . . . ID=fake_chromosome;Name=fake_chromosome<br />
fake_chromosome example gene 13691 14720 . + . ID=xfile;Name=xfile;Alias=mulder,scully;Note=A test gene for GMOD meeting<br />
fake_chromosome example mRNA 13691 14720 . + . ID=xfile_mRNA;Parent=xfile<br />
fake_chromosome example exon 13691 13767 . + . Parent=xfile_mRNA<br />
fake_chromosome example exon 14687 14720 . + . Parent=xfile_mRNA<br />
fake_chromosome example gene 12648 13136 . + . ID=x-men<br />
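<br />
GFF3 columns are tab-separated, so lines like these are easy to generate from the gene specification. The following is a minimal sketch in plain Perl. <code>gff3_line</code> is a hypothetical helper and is not part of the GMOD loader, and attribute values are not escaped here, which real GFF3 requires for characters such as ';', '=' and ','.<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical helper: build one GFF3 line from the nine standard columns<br />
sub gff3_line {<br />
my (%f) = @_;<br />
# attributes are given as ordered [tag, value] pairs<br />
my $attrs = join ';', map { "$_->[0]=$_->[1]" } @{ $f{attributes} };<br />
# score and phase are left as '.' (not applicable)<br />
return join "\t",<br />
$f{seqid}, $f{source}, $f{type}, $f{start}, $f{end},<br />
'.', $f{strand}, '.', $attrs;<br />
}<br />
<br />
print gff3_line(<br />
seqid => 'fake_chromosome', source => 'example', type => 'exon',<br />
start => 13691, end => 13767, strand => '+',<br />
attributes => [ [ Parent => 'xfile_mRNA' ] ],<br />
), "\n";<br />
</perl><br />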
<br />
Gene inserted as GFF using a standard Bioperl bulk loader:<br />
<br />
<code>$ gmod_bulk_load_gff3.pl -g sample.gff</code><br />
<br />
''...lots of output...''<br />
<br />
=====Adaptor Components=====<br />
<br />
* Bio::DB::Das::Chado<br />
** Database connection object<br />
* Bio::DB::Das::Chado::Segment<br />
** Object for any range of DNA<br />
* Bio::DB::Das::Chado::Segment::Feature<br />
<br />
=====Use Bio::DB::Das::Chado=====<br />
<br />
<perl><br />
use Bio::DB::Das::Chado;<br />
<br />
my $chado = Bio::DB::Das::Chado->new(<br />
-dsn => "dbi:Pg:dbname=test",<br />
-user=> "scott",<br />
-pass=> "" ) || die "no new chado";<br />
<br />
my $gene_name = 'xfile';<br />
<br />
my ($gene_fo) = $chado->get_features_by_name($gene_name);<br />
</perl><br />
<br />
=====Problem 2 - Use Some Accessors=====<br />
<br />
<perl><br />
print "symbol: " . $gene_fo->display_name."\n";<br />
print "synonyms: " . join(', ',$gene_fo->synonyms)."\n";<br />
print "description: " . $gene_fo->notes."\n";<br />
print "type: " . $gene_fo->type."\n";<br />
<br />
my ($mRNA) = $gene_fo->sub_SeqFeature();<br />
my @exons = $mRNA->sub_SeqFeature();<br />
<br />
my $exon_count = 0; # running exon number for the report<br />
my $cds_seq = ''; # concatenated exon sequence<br />
<br />
for my $exon (@exons) {<br />
next unless ($exon->type->method eq 'exon');<br />
$exon_count++;<br />
print "exon$exon_count start: " . $exon->start."\n";<br />
print "exon$exon_count end: " . $exon->end. "\n";<br />
# the first seq call returns a Bio::Seq object, the second gets the DNA string from Bio::Seq<br />
$cds_seq .= $exon->seq->seq;<br />
}<br />
</perl><br />
<br />
=====Bulk Output=====<br />
<br />
<perl><br />
my $gene_name = 'x-*';<br />
<br />
my @genes = $chado->get_features_by_name(<br />
-name => $gene_name,<br />
-class=> 'gene' );<br />
<br />
for my $gene (@genes) {<br />
print join("\t",<br />
$gene->feature_id,<br />
$gene->display_name,<br />
$gene->organism),"\n";<br />
}<br />
</perl><br />
<br />
Or see your report in GBrowse<br />
<br />
=====Advantages=====<br />
<br />
* Comes 'for free' with GBrowse<br />
** GBrowse will run with any DasI-compatible interface<br />
* Uses 'familiar' BioPerl idioms, very similar to widely used Bio::DB::GFF (though with fewer methods)<br />
<br />
<br />
=====Conclusion=====<br />
<br />
* Not suitable as a 'general' middleware layer<br />
** May be suitable for some applications, particularly if they are similar to GBrowse or other uses of Bio::DB::GFF<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
Jeff Bowes, Xenbase, University of Calgary. This Wiki section is an edited version of [[Media:iBatis.pdf|Jeff's presentation]].<br />
<br />
=====iBatis=====<br />
<br />
* iBatis<br />
** Light-weight framework<br />
** Still based on SQL but eliminates the repetitive drudgery of JDBC<br />
** You can tune a query by rewriting the SQL in XML, and the API does not change.<br />
* iBatis does not create your database in memory as objects<br />
* Shallow learning curve<br />
* Manually create a Java class and SQL map to describe higher-level objects<br />
** Example: ''Gene''<br />
* Support for inheritance<br />
** Inheritance in result maps, allows fair amount of re-use. <br />
* Supports different transaction schemes<br />
** For example, JDBC, Java Transaction API<br />
<br />
=====Abator=====<br />
<br />
* Generates iBatis CRUD objects by introspecting database tables<br />
* Abator creates ''SQL in XML'' files (SQL Map files) and Java classes <br />
** Within these files is a Result Map section.<br />
* Abator config files are simple: they set connection parameters and tell Abator where the files are.<br />
* In the SQL Map files you can specify how to find parent ids, such as feature_id.<br />
<br />
=====Abator Example=====<br />
<xml><br />
<abatorConfiguration><br />
<abatorContext> <!-- TODO: Add Database Connection Information --><br />
<jdbcConnection driverClass="COM.ibm.db2.jdbc.app.DB2Driver"<br />
connectionURL="jdbc:db2:XBDV05"<br />
userId="db2inst1"<br />
password="*******"><br />
<classPathEntry location="/Program Files/IBM/SQLLIB/java/db2java.zip" /><br />
</jdbcConnection><br />
<br />
<javaModelGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.model"<br />
targetProject="gene" /><br />
<sqlMapGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.sql"<br />
targetProject="gene" /><br />
<daoGenerator type="IBATIS"<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.dao"<br />
targetProject="gene" /><br />
</abatorContext><br />
</abatorConfiguration><br />
</xml><br />
<br />
=====Abator Example=====<br />
<nowiki><br />
<table schema="db2inst1" tableName="synonym"></nowiki><br />
<generatedKey column="synonym_id" sqlStatement="VALUES PREVVAL FOR<br />
synonym_seq" identity="true" /><br />
<columnOverride column="CREATED_BY" jdbcType="INTEGER" /><br />
<columnOverride column="MODIFIED_BY" jdbcType="INTEGER" /><br />
<nowiki></table></nowiki><br />
<br />
=====Abator=====<br />
<br />
Works as:<br />
<br />
* Eclipse plug-in<br />
* ANT<br />
* Standalone<br />
<br />
=====DAO Methods=====<br />
<br />
* Insert (Feature)<br />
* Update (Feature)<br />
* DeletebyKey (FeatureKey)<br />
* SelectbyKey (FeatureKey)<br />
* SelectbyExample (FeatureExample)<br />
* DeletebyExample (FeatureExample)<br />
<br />
=====Insert=====<br />
<xml><br />
<insert id="abatorgenerated_insert" parameterClass=<br />
"org.gmod.architecture.framwork.bakeoff.abator.model.FeatureWithBLOBs"><br />
insert into db2inst1.feature<br />
(DBXREF_ID, ORGANISM_ID, NAME, UNIQUENAME,<br />
RESIDUES, SEQLEN, MD5CHECKSUM, TYPE_ID, IS_ANALYSIS,<br />
IS_OBSOLETE, CREATED_BY)<br />
values (#dbxrefId:INTEGER#, #organismId:INTEGER#, #name:VARCHAR#,<br />
#uniquename:VARCHAR#, #residues:CLOB#, #seqlen:INTEGER#,<br />
#md5checksum:CHAR#, #typeId:INTEGER#,<br />
#isAnalysis:SMALLINT#, #isObsolete:SMALLINT#,<br />
#createdBy:INTEGER#)<br />
<br />
<selectKey resultClass="java.lang.Integer" keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</insert><br />
</xml><br />
<br />
=====Insert=====<br />
<xml><br />
<selectKey resultClass="java.lang.Integer"<br />
keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</xml><br />
<br />
=====Problem 1 - Insert=====<br />
<br />
<java><br />
try {<br />
    // Run all inserts for the gene inside a single transaction<br />
    sqlMap.startTransaction();<br />
    pGene.id = featureDAO.insert(pGene.getFeatureWithBLOBs());<br />
    featurepropDAO.insert(pGene.getPropertyDescription());<br />
    pGene.featurelocId = featurelocDAO.insert(pGene.getFeaturelocWithBLOBS());<br />
    pGene = insertExons(pGene);<br />
    insertSynonyms(pGene);<br />
    sqlMap.commitTransaction();<br />
} catch (Exception e) {<br />
    System.out.println(e);<br />
    throw e;<br />
} finally {<br />
    // Ends the transaction; uncommitted work is rolled back<br />
    sqlMap.endTransaction();<br />
}<br />
</java><br />
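<br />
The try / commit / finally shape above matters because ending a transaction that was never committed discards its work. The following is a minimal plain-Java sketch of that shape only. <code>FakeSqlMap</code> and <code>insertGene</code> are hypothetical stand-ins written for this illustration and are not part of the iBatis API.<br />

```java
import java.util.ArrayList;
import java.util.List;

class TransactionShapeSketch {

    /** In-memory stand-in for a transactional client; not the iBATIS API. */
    static final class FakeSqlMap {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        void startTransaction() { pending.clear(); }
        void insert(String row) { pending.add(row); }
        void commitTransaction() { committed.addAll(pending); pending.clear(); }
        /** Discards whatever was not committed, which acts as a rollback. */
        void endTransaction() { pending.clear(); }
    }

    /** Inserts a gene and its exon together, or nothing at all. */
    static void insertGene(FakeSqlMap sqlMap, String gene, String exon) {
        try {
            sqlMap.startTransaction();
            sqlMap.insert(gene);
            if (exon == null) {
                throw new IllegalArgumentException("exon is required");
            }
            sqlMap.insert(exon);
            sqlMap.commitTransaction();
        } finally {
            // Always reached; only uncommitted work is thrown away here
            sqlMap.endTransaction();
        }
    }

    public static void main(String[] args) {
        FakeSqlMap sqlMap = new FakeSqlMap();
        insertGene(sqlMap, "xfile", "xfile:1");
        try {
            insertGene(sqlMap, "x-ray", null);
        } catch (IllegalArgumentException e) {
            System.out.println("rolled back: " + e.getMessage());
        }
        System.out.println(sqlMap.committed);
    }
}
```

In this sketch the failed second call leaves no partial gene behind, which is the property the real insert code relies on.<br />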
<br />
=====Transactions=====<br />
<br />
* SQLMap<br />
* JDBC<br />
* JTA - Java Transaction API<br />
** 2-Phase commit<br />
* Hibernate<br />
* External (Customized)<br />
<br />
=====Retrieval=====<br />
<br />
symbol: xfile<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: start: 13691 end: 13767 <br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: start: 14687 end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2 - Master Detail Reports=====<br />
<br />
Account for cycles or recursion in Master Detail Report. <br />
<br />
<xml><br />
<resultMap id="SelectGeneResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Gene" groupBy="id"><br />
<result column="FEATURE_ID" property="id" jdbcType="INTEGER"/><br />
<result column="GENE_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="DESCRIPTION" property="description"<br />
jdbcType="VARCHAR" /><br />
<result column="TYPE_ID" property="typeId" jdbcType="INTEGER" /><br />
<result property="exons" resultMap = "gene.SelectExonResults"/><br />
</resultMap><br />
<br />
<resultMap id="SelectExonResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Exon"><br />
<result column="EXON_ID" property="id" jdbcType="INTEGER"/><br />
<result column="EXON_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="EXON_RESIDUES" property="residues" jdbcType="CLOB" /><br />
<result column="STRAND" property="strand" jdbcType="INTEGER" /><br />
<result column="FMIN" property="fmin" jdbcType="INTEGER" /><br />
<result column="FMAX" property="fmax" jdbcType="INTEGER" /><br />
<result column="SRCFEATURE_ID" property="sourceFeatureId"<br />
jdbcType="INTEGER" /><br />
</resultMap><br />
</xml><br />
<br />
=====Master Detail Report=====<br />
<br />
 gene_id   Symbol    Type   Fmin    Fmax<br />
 6129482   x-files   gene   13691   13767<br />
 6129482   x-files   gene   14687   14720<br />
<br />
Becomes:<br />
<br />
 gene_id   Symbol    Type   Fmin    Fmax<br />
 6129482   x-files   gene   13691   13767<br />
                            14687   14720<br />
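<br />
The collapse shown above is a grouping step: rows that share a key are merged into one master record that carries a list of detail rows. The following plain-Java sketch illustrates that idea without iBatis. The <code>Row</code>, <code>GeneReport</code> and <code>group</code> names are hypothetical and exist only for this example.<br />

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MasterDetailSketch {

    /** One flat row of the joined query result (hypothetical shape). */
    static final class Row {
        final int geneId;
        final String symbol;
        final int fmin;
        final int fmax;

        Row(int geneId, String symbol, int fmin, int fmax) {
            this.geneId = geneId;
            this.symbol = symbol;
            this.fmin = fmin;
            this.fmax = fmax;
        }
    }

    /** Master record; each detail entry is an {fmin, fmax} pair. */
    static final class GeneReport {
        final int geneId;
        final String symbol;
        final List<int[]> exons = new ArrayList<>();

        GeneReport(int geneId, String symbol) {
            this.geneId = geneId;
            this.symbol = symbol;
        }
    }

    /** Merges rows with the same gene id, keeping first-seen order. */
    static List<GeneReport> group(List<Row> rows) {
        Map<Integer, GeneReport> byId = new LinkedHashMap<>();
        for (Row r : rows) {
            GeneReport g = byId.get(r.geneId);
            if (g == null) {
                g = new GeneReport(r.geneId, r.symbol);
                byId.put(r.geneId, g);
            }
            g.exons.add(new int[] {r.fmin, r.fmax});
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(6129482, "x-files", 13691, 13767));
        rows.add(new Row(6129482, "x-files", 14687, 14720));
        for (GeneReport g : group(rows)) {
            System.out.println(g.geneId + " " + g.symbol + " exons=" + g.exons.size());
        }
    }
}
```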
<br />
=====Dynamic Queries=====<br />
<br />
* Gene Name (Description)<br />
** Feature, Featureprop<br />
* Symbol<br />
** Feature<br />
* Feature Synonyms<br />
** Feature, Feature_Synonym, Synonym<br />
* Ortholog Synonyms<br />
** Feature, Feature_relationship, Feature, Feature Synonyms<br />
<br />
=====Dynamic Queries=====<br />
<br />
FROM<br />
CAT_X_GENE_V gc<br />
<isEqual prepend=","<br />
property="searchSymbol"<br />
compareValue="true"><br />
GENE_SYMBOLS s<br />
</isEqual><br />
<br />
<isEqual prepend=","<br />
property="searchNcbi" <br />
compareValue="true"><br />
NCBI_GI n<br />
</isEqual><br />
<br />
=====Dynamic Queries=====<br />
<br />
<dynamic prepend="WHERE"><br />
<isEqual prepend="AND" property="searchNameOnly"<br />
compareValue="true"><br />
<iterate property="searchTokens" conjunction="AND" <br />
open=" (" close=") "><br />
LOWER(VARCHAR(gc.longname)) LIKE <br />
LOWER(CAST(#searchTokens[]:VARCHAR# AS VARCHAR(512)))<br />
</iterate><br />
</isEqual><br />
</dynamic><br />
<br />
The <code>iterate</code> element is very useful for handling multiple search terms.<br />
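<br />
To make the effect of the iteration concrete, here is a plain-Java sketch of the clause it is meant to produce: one LIKE test per search token, joined with AND and wrapped in parentheses. The method name <code>buildWhere</code> is hypothetical, the column expression is simplified, and returning an empty string when there is nothing to search on is an assumption of this sketch rather than documented iBatis behavior.<br />

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DynamicWhereSketch {

    /**
     * Builds the WHERE clause for a name search: one LIKE test per token,
     * joined with AND and wrapped in parentheses. Returns an empty string
     * when the name search is switched off or there are no tokens.
     */
    static String buildWhere(boolean searchNameOnly, List<String> searchTokens) {
        if (!searchNameOnly || searchTokens.isEmpty()) {
            return "";
        }
        List<String> tests = new ArrayList<>();
        for (int i = 0; i < searchTokens.size(); i++) {
            // Token values are bound as parameters; only placeholders appear in the SQL text
            tests.add("LOWER(gc.longname) LIKE LOWER(?)");
        }
        return "WHERE (" + String.join(" AND ", tests) + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildWhere(true, Arrays.asList("%heat%", "%shock%")));
    }
}
```

One statement therefore covers any number of search terms, which is the point of expressing the query dynamically.<br />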
<br />
=====Miscellaneous Features=====<br />
<br />
* Supports various data sources<br />
** Simple JDBC<br />
** DBCP – Apache Connection Pooling<br />
** JNDI – Java Naming and Directory Interface<br />
* Very flexible<br />
* Local caching of results<br />
** Lazy loading<br />
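<br />
Lazy loading combined with local caching means a value is fetched the first time it is requested and reused afterwards. The following generic plain-Java sketch shows the pattern. The <code>Lazy</code> class is hypothetical and does not reflect how iBatis implements it internally.<br />

```java
import java.util.function.Supplier;

class LazyValueSketch {

    /** Holds a value that is loaded on first access and cached afterwards. */
    static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                value = loader.get(); // the expensive fetch happens only here
                loaded = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        Lazy<String> residues = new Lazy<>(() -> {
            calls[0]++;
            return "ACGT";
        });
        residues.get();
        residues.get();
        System.out.println("loader calls: " + calls[0]);
    }
}
```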
<br />
=====Support=====<br />
<br />
* In GMOD used by<br />
** Xenbase, Artemis at Sanger<br />
* Many other users<br />
** e.g. MySpace.com<br />
* Top level Apache Project<br />
** www.ibatis.apache.org<br />
* Active community<br />
<br />
<br />
=====What iBatis Does Well=====<br />
<br />
* Does not hide SQL<br />
** No new query language to learn<br />
* Separates and groups SQL<br />
* Simple!!<br />
** Light wrapper - No real tweaks<br />
* Does the job well<br />
* Excellent support for Master-Detail<br />
* Dynamically generated queries <br />
** You can structure conditions around clauses in SQL<br />
** One XML statement can represent many variations on a query<br />
<br />
=====Acknowledgements=====<br />
<br />
GMOD<br />
* Eric Just<br />
* Everyone else<br />
<br />
iBatis Developers<br />
* Kevin Snyder<br />
* Chris Jarabek<br />
* Ross Gibb<br />
<br />
PI<br />
* Peter Vize<br />
<br />
Financial Support<br />
* Alberta Heritage Foundation for Medical Research<br />
* Alberta Network for Proteomics Innovation<br />
* University of Calgary, Faculty of Science<br />
* University of Calgary Dept. of Computer Science<br />
* NICHD<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
Chado API via Java & Hibernate, Robert Bruggner, VectorBase.org. This Wiki section is an edited version of [[Media:HibernateChadoAPI.pdf|Robert's presentation]].<br />
<br />
=====Overview=====<br />
<br />
* Background<br />
* Quick Hibernate Overview<br />
* Hibernate Connectivity and O/R Mapping Example<br />
* GMOD Demo<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
=====Background=====<br />
<br />
* VectorBase<br />
** A bioinformatic resource center for invertebrate vectors of human pathogens<br />
* Responsible for storage and display of multiple organisms’ genomes<br />
** ''Anopheles gambiae'', ''Aedes aegypti'', ''Ixodes scapularis'', ''Culex pipiens'' and so on<br />
* Want to store data for many organisms, so Chado is a natural choice<br />
* Ensembl Genome Browser already used for ''A. gambiae''<br />
** Wrote Ensembl API Database adaptor for Chado... Not maintainable.<br />
* Use Both Databases<br />
** Transfer genomic data from Ensembl to Chado<br />
** Search Engine and Indexer using Lucene<br />
** Run DAS<br />
** Export data via ChadoXML and GFF3<br />
* Need API for Database I/O<br />
<br />
=====Hibernate Background=====<br />
<br />
* They say: “A powerful, high performance object/relational persistence and query service.”<br />
* Automates the persistence of plain old Java objects (POJO)<br />
** User maps their POJO properties to database tables via XML (HBM File).<br />
** There are Hibernate tools that generate HBMs<br />
*** Configurable in the sense that one can create get & set methods that map one-to-one to fields.<br />
* Persist a specific object by storing it in the database.<br />
* Intelligent Database I/O <br />
** Smart detection of ''Dirty Properties'' when performing Save / Update / Delete.<br />
** Cascadable Save / Update / Delete for complex objects.<br />
* Everything's done within the scope of a transaction.<br />
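<br />
The idea behind detecting ''Dirty Properties'' can be sketched as a comparison between the values an object had when it was loaded and the values it has at save time. The plain-Java sketch below illustrates only that idea. Hibernate's own implementation is more involved, and the <code>dirtyProperties</code> method is hypothetical.<br />

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DirtyCheckSketch {

    /** Returns the names of properties whose current value differs from the loaded snapshot. */
    static List<String> dirtyProperties(Map<String, Object> snapshot, Map<String, Object> current) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            if (!Objects.equals(snapshot.get(entry.getKey()), entry.getValue())) {
                dirty.add(entry.getKey());
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        // State as it was read from the database
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("name", "Old CVTerm name");
        snapshot.put("definition", "A definition");

        // State after the application changed one property
        Map<String, Object> current = new LinkedHashMap<>(snapshot);
        current.put("name", "New CVTerm name");

        // Only these columns would need to appear in the UPDATE statement
        System.out.println(dirtyProperties(snapshot, current));
    }
}
```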
<br />
=====Hibernate Database Connectivity=====<br />
<br />
* Configure Hibernate in hibernate.cfg.xml<br />
* Define a Data Source<br />
** We use a simple, single JDBC connection to Chado<br />
** Can be configured to use a connection pool or data source accessible by the Java Naming and Directory Interface (JNDI).<br />
** Define a connection “dialect”, for example <code>org.hibernate.dialect.PostgreSQLDialect</code><br />
* Describe the relationship between Java objects and database tables<br />
** Use XML to describe where to store POJO property data in the database<br />
* Create a new Hibernate Session based on the configuration<br />
* Begin a transaction to start performing work<br />
<br />
=====POJO and HBM Example file - CV=====<br />
<br />
<java><br />
public class CV {<br />
<br />
private int cv_id;<br />
private String name;<br />
private String definition;<br />
<br />
// Property getters and setters, e.g. getName() and setName(String)<br />
....<br />
<br />
@Override<br />
public boolean equals(Object other) {<br />
....<br />
}<br />
<br />
@Override<br />
public int hashCode() {<br />
...<br />
}<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CV" table="cv"><br />
<br />
<id name="cv_id" column="cv_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cv_cv_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<property name="name" column="name" type="java.lang.String" not-null="true"/><br />
<br />
<property name="definition" column="definition" type="java.lang.String"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====HBM Example CVTerm=====<br />
<br />
<java><br />
public class CVTerm {<br />
<br />
private int cvterm_id;<br />
<br />
private CV cv;<br />
<br />
private String name;<br />
<br />
private String definition;<br />
<br />
private DBXref dbxref;<br />
<br />
private int is_obsolete;<br />
<br />
private int is_relationshiptype;<br />
<br />
.....<br />
<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CVTerm" table="cvterm"><br />
<br />
<id name="cvterm_id" column="cvterm_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cvterm_cvterm_id_seq</param><br />
<br />
</generator><br />
<br />
</id><br />
<br />
<many-to-one name="cv" class="org.vectorbase.chadoAPI.chadoObjects.CV" column="cv_id" <br />
not-null="true" cascade="save-update"/><br />
<br />
<property name="name" not-null="true" type="java.lang.String"/><br />
<br />
<property name="definition"/><br />
<br />
<one-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" cascade="all"/><br />
<br />
<property name="is_obsolete"/><br />
<br />
<property name="is_relationshiptype"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====Hibernate Object Retrieve=====<br />
<br />
One can use Java, Hibernate Query Language (HQL), or SQL; this example uses HQL.<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Load a CVTerm using HQL<br />
CVTerm cvtByName = (CVTerm) session.createQuery("from CVTerm where name=?").setString(0, "name").uniqueResult();<br />
<br />
// Print out the name of the cvterm<br />
System.out.println(cvt.getName());<br />
<br />
// Get the cv that the cvterm is associated with<br />
// Hibernate doesn’t return the cv_id - it returns a CV Object.<br />
CV cv = cvt.getCv();<br />
<br />
// Print out the cv’s name<br />
System.out.println(cv.getName());<br />
</java><br />
<br />
=====Hibernate Object Update=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Change cvt's name<br />
cvt.setName("New CVTerm name");<br />
<br />
// Save!<br />
// Generated SQL updates "dirty" properties (name, in this case)<br />
session.save(cvt);<br />
<br />
// Commit data to database<br />
session.getTransaction().commit();<br />
</java><br />
<br />
=====Hibernate Save=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
// Build a session factory first, get the session and begin a transaction (not shown)<br />
<br />
// Make a new CV<br />
CV new_cv = new CV();<br />
new_cv.setName("New CV");<br />
new_cv.setDefinition("New CV Def");<br />
<br />
// Make a new cvterm for that cv<br />
CVTerm new_cvterm = new CVTerm();<br />
new_cvterm.setName("New CVTerm Name");<br />
// ..... save dbxref etc......<br />
<br />
// Add that CVTerm to our new CV<br />
new_cv.addCVTerm(new_cvterm);<br />
<br />
// Save the new data...<br />
// Hibernate recognizes that it has to first save new_cv, then save new_cvterm.<br />
session.save(new_cvterm);<br />
<br />
session.getTransaction().commit();<br />
<br />
// You can see the new ids assigned by the database<br />
System.out.println(new_cv.getCv_id());<br />
System.out.println(new_cvterm.getCvterm_id());<br />
</java><br />
<br />
=====Inheritance=====<br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.Feature" table="feature" discriminator-value="not null"><br />
<br />
<id name="feature_id" column="feature_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">feature_feature_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<discriminator column="type_id" type="integer" insert="false"/><br />
<br />
<many-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" <br />
column="dbxref_id" cascade="all"/><br />
<br />
<many-to-one name="organism" class="org.vectorbase.chadoAPI.chadoObjects.Organism" <br />
column="organism_id" not-null="true" cascade="save-update"/><br />
<br />
<property name="name"/><br />
.....<br />
</class><br />
</hibernate-mapping><br />
<br />
<hibernate-mapping> <br />
<br />
<subclass name="org.vectorbase.chadoAPI.chadoFeatures.Gene" <br />
extends="org.vectorbase.chadoAPI.chadoObjects.Feature" discriminator-value="767"><br />
<br />
</subclass><br />
</hibernate-mapping><br />
</xml><br />
Custom methods can then be written for specific subclasses, such as <code>Gene</code>.<br />
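<br />
The discriminator mapping above amounts to choosing a Java class from the value of <code>type_id</code>. The plain-Java sketch below shows that selection by hand. The nested <code>Feature</code> and <code>Gene</code> classes and the <code>instantiate</code> method are simplified stand-ins for illustration, and the value 767 is taken from the mapping above (such ids are specific to one database instance).<br />

```java
class DiscriminatorSketch {

    /** Discriminator value for genes, as used in the mapping; database-specific. */
    static final int GENE_TYPE_ID = 767;

    /** Simplified stand-in for the mapped Feature class. */
    static class Feature {
        final int typeId;

        Feature(int typeId) {
            this.typeId = typeId;
        }

        String describe() {
            return "generic feature";
        }
    }

    /** Subclass that can carry gene-specific methods. */
    static class Gene extends Feature {
        Gene(int typeId) {
            super(typeId);
        }

        @Override
        String describe() {
            return "gene";
        }
    }

    /** Picks the class for a row: the gene id maps to Gene, anything else to Feature. */
    static Feature instantiate(int typeId) {
        if (typeId == GENE_TYPE_ID) {
            return new Gene(typeId);
        }
        return new Feature(typeId);
    }

    public static void main(String[] args) {
        System.out.println(instantiate(767).describe());
        System.out.println(instantiate(5).describe());
    }
}
```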
<br />
=====ChadoAPI=====<br />
<br />
* POJO Mappings<br />
** CV, CVTerm, DB, DBXref, Feature, FeatureCVTerm, FeatureDBXref, FeatureLoc, FeatureProp, FeatureRelationship, FeatureSynonym, Organism, Pub, Synonym<br />
* Extended Features<br />
** Chromosome, Gene, Transcript, Exon, Protein<br />
* Constants<br />
** CVTerms, FeatureFeatureRelationships, Ontologies<br />
* Special<br />
** ChadoAdaptor<br />
<br />
=====Problem 1 - GMOD Example=====<br />
<br />
<java><br />
// Set up our session and begin transaction<br />
Session session = HibernateUtil.getSessionFactory().getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Make a Chado adaptor and load up some utility objects<br />
ChadoAdaptor ca = new ChadoAdaptor();<br />
Chromosome c = ca.fetchChromosomeByUniqueName("fake_chromosome");<br />
Pub null_pub = ca.fetchPubByPubID(1);<br />
Organism agambiae = ca.fetchOrganismByScientificName("Anopheles","gambiae");<br />
<br />
// Begin GMOD Demo Code<br />
<br />
// Make our new gene;<br />
Gene xfile = new Gene();<br />
xfile.setOrganism(agambiae);<br />
xfile.setUniquename("xfile");<br />
xfile.setDescription("A test gene for GMOD meeting");<br />
<br />
/* Set the location of our gene. No need to set coordinates because they'll be updated<br />
* based on the exon boundaries. <br />
*/<br />
FeatureLoc xfile_loc = new FeatureLoc();<br />
xfile_loc.setSrcfeature(c);<br />
xfile_loc.setStrand(1);<br />
xfile.setFeatureLoc(xfile_loc);<br />
<br />
// Add synonyms to xfile<br />
xfile.createNewFeatureSynonym("mulder", null_pub, CVTerms.EXACT_SYNONYM);<br />
xfile.createNewFeatureSynonym("scully", null_pub, CVTerms.EXACT_SYNONYM);<br />
</java><br />
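<br />
The comment in the example above says the gene's coordinates are updated from the exon boundaries. A minimal plain-Java sketch of that calculation follows, using the exon coordinates of the <code>xfile</code> sample gene. The <code>span</code> method is hypothetical and is not part of the ChadoAPI code.<br />

```java
class GeneSpanSketch {

    /**
     * Returns {min start, max end} over all exons.
     * Each exon is given as a {start, end} pair.
     */
    static int[] span(int[][] exons) {
        if (exons.length == 0) {
            throw new IllegalArgumentException("at least one exon is required");
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int[] exon : exons) {
            min = Math.min(min, exon[0]);
            max = Math.max(max, exon[1]);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[][] exons = {{13691, 13767}, {14687, 14720}};
        int[] gene = span(exons);
        System.out.println("gene spans " + gene[0] + ".." + gene[1]);
    }
}
```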
<br />
=====Problem 2 - GMOD Example=====<br />
<br />
<java><br />
// Create a new transcript for our gene.<br />
Transcript t = xfile.createGeneTranscript("xfile-RA");<br />
<br />
// Create some exons for that transcript.<br />
t.createTranscriptExon("xfile:1", 13691, 13767);<br />
t.createTranscriptExon("xfile:2", 14687, 14720);<br />
<br />
// Save our new gene<br />
session.save(xfile);<br />
System.out.println("xfile feature_id is " + xfile.getFeature_id());<br />
<br />
// Fetch our saved gene from the database<br />
Gene xfile_r = ca.fetchGeneByUniqueName("xfile");<br />
System.out.println("symbol: " + xfile_r.getUniquename());<br />
System.out.print("synonyms: ");<br />
for (FeatureSynonym fs : xfile_r.getFeatureSynonyms()){<br />
<br />
System.out.print(fs.getSynonym().getName() + " ");<br />
}<br />
<br />
System.out.println("description: " + xfile_r.getDescription());<br />
System.out.println("type: " + xfile_r.getType().getName());<br />
<br />
for (Transcript tx : xfile_r.fetchAllTranscripts()){<br />
for (Exon e : tx.fetchAllExons()){<br />
System.out.println(e.getUniquename() + " Start:\t" + e.getFeatureLoc().getFmin());<br />
System.out.println(e.getUniquename() + " End:\t" + e.getFeatureLoc().getFmax());<br />
System.out.println("\tSrcFeatureID: " + e.getFeatureLoc().getSrcfeature().getFeature_id());<br />
}<br />
System.out.println(">" + tx.getUniquename());<br />
System.out.println(tx.generateTranscriptSequenceFromExons().toUpperCase());<br />
}<br />
</java><br />
<br />
=====Problems 3, 4, & 5 - GMOD Update & Delete=====<br />
<br />
<java><br />
// Let's update our name...<br />
xfile_r.setUniquename("x-file");<br />
<br />
session.save(xfile_r);<br />
<br />
// Not part of the ChadoAdaptor utility object, but a good example of HQL<br />
List<Gene> genes = (List<Gene>)session.createQuery("from Gene where uniquename like ?").setString(0,"x-%").list();<br />
<br />
for (Gene g : genes){<br />
<br />
System.out.println(g.getFeature_id() + <br />
"\t" + g.getUniquename() + <br />
"\t" + g.getOrganism().getGenus() +<br />
" " + g.getOrganism().getSpecies());<br />
}<br />
<br />
// Deleting... hmm...<br />
Gene delete_me = ca.fetchGeneByUniqueName("x-ray");<br />
session.delete(delete_me);<br />
<br />
// All Finished<br />
session.getTransaction().commit();<br />
</java><br />
<br />
<br />
<br />
=====What Hibernate Does Well=====<br />
<br />
* Hibernate can be configured to perform specialized functions<br />
** For example, it has its own notion of a cascade<br />
* Flexible with respect to language<br />
** Java, Hibernate Query Language, or SQL<br />
* Any JDBC driver<br />
<br />
=====Acknowledgements=====<br />
<br />
* VectorBase People<br />
** Frank Collins, EO Stinson, Ryan Butler<br />
* GMOD<br />
* NIAID<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
This Wiki section is an edited version of [[Media:PSU.pdf|Chinmay's presentation]].<br />
<br />
=====GeneDB=====<br />
<br />
* GeneDB is the organism data and annotation database for the Pathogen Sequencing Unit (PSU) at the Sanger Institute, UK<br />
* Contains 37 organisms, which is expected to grow to 62<br />
* Currently migrating to chado schema<br />
* Java API with two engines Hibernate & iBatis<br />
** Two teams, Artemis and GeneDB, took different approaches<br />
<br />
=====Technical - Connections=====<br />
<br />
Connections are configured in the Spring configuration file<br />
<xml><br />
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"><br />
<property name="driverClassName" value="org.postgresql.Driver" /><br />
<property name="url" value="jdbc:postgresql://holly.sanger.ac.uk:5432/chado" /><br />
<property name="username" value="DELIBERATELY_BOGUS_NAME"/><br />
<property name="password" value="WIBBLE" /><br />
</bean><br />
</xml><br />
* Uses a connection pool<br />
* Connection to the database is specified graphically, so the iBatis configuration file has variables for the location:<br />
<xml><br />
<property name="JDBC.Driver" value="org.postgresql.Driver"/><br />
<br />
<property name="JDBC.ConnectionURL" value="jdbc:postgresql://${chado}"/><br />
<br />
<property name="JDBC.Username" value="${username}"/><br />
<br />
<property name="JDBC.Password" value="${password}"/><br />
</xml><br />
<br />
* Provide the database location, username and password<br />
* Select what to open in Artemis from a scrollable list of features with residues (organisms are in separate Postgres schemas)<br />
<br />
=====Technical - Code Generation=====<br />
<br />
* The shared interface and Hibernate implementation were originally generated<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use it behind the scenes)<br />
<br />
=====Technical - Transactions=====<br />
<br />
* Transactions are fully supported<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use them behind the scenes)<br />
<br />
=====Problems 1, 2, & 3=====<br />
<br />
Creating a gene<br />
<java><br />
genes[0] = new Feature(ORG, GENE, "xfile", false, false, now, now);<br />
<br />
genes[0].setSeqLen(1029); <br />
sequenceDao.persist(genes[0]);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, genes[0], 13691, false, 14720, false, (short)1, 0, 0 ,0);<br />
<br />
sequenceDao.persist(loc);<br />
<br />
addFeatureProp(genes[0], "description", "A test gene for GMOD meeting");<br />
<br />
addSynonymsToFeature(genes[0], "mulder", "scully");<br />
<br />
createExon("exon1", genes[0], 13691, 13767, now, 0);<br />
<br />
createExon("exon2", genes[0], 14687, 14720, now, 1);<br />
</java><br />
<br />
Retrieve a gene<br />
<java><br />
Feature f = sequenceDao.getFeatureByUniqueName("xfile");<br />
displayGene(f);<br />
</java><br />
<br />
Update a gene<br />
<java><br />
genes[0].setUniqueName("x-file");<br />
<br />
sequenceDao.merge(genes[0]);<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<java><br />
private Feature createExon(String name, Feature gene, int min, int max, Timestamp now, int rank) {<br />
<br />
Feature exon = new Feature(ORG, EXON, name, false, false, now, now);<br />
exon.setSeqLen(max-min);<br />
sequenceDao.persist(exon);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, exon, min, false, max, false, <br />
(short)1, 0, 0 ,0);<br />
sequenceDao.persist(loc);<br />
<br />
return exon;<br />
<br />
}<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<xml><br />
<st:section name="Naming" id="gene_naming" collapsed="false" collapsible="false"<br />
hideIfEmpty="true"><br />
<dl><br />
<dt><b>symbol:</b></dt><br />
<dd>${feature.uniqueName}</dd><br />
</dl><br />
<db:synonym name="synonym" var="name" collection="${feature.featureSynonyms}"><br />
<br /><b>Synonym:</b> <db:list-string collection="${name}" /><br />
</db:synonym><br />
<dt><b>Type:</b></dt><br />
<dd>${feature.cvTerm.name}</dd><br />
<br />
<st:section name="Exons" collapsed="false" collapsible="true" hideIfEmpty="true"><br />
<display:table name="exons" uid="tmp" pagesize="30" class="simple" cellspacing="0"<br />
cellpadding="4"><br />
<display:column property="uniqueName" title="Exon"/><br />
<display:column property="featureLocsForSrcFeatureId.fmin" title="Start"/><br />
<display:column property="featureLocsForSrcFeatureId.fmax" title="end"/><br />
</display:table><br />
</st:section><br />
<br />
<st:section name="cds" collapsible="true"><br />
<b>${feature.residues}</b><br />
</st:section><br />
</xml><br />
<br />
Specialized functionality, such as a cascading delete, is handled by the database.</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-01-26T20:21:30Z<p>165.124.152.78: /* Modware Features */</p>
<hr />
<div>==Middleware for Chado databases==<br />
<br />
===Authors===<br />
<br />
* Jeff Bowes<br />
* Robert Bruggner<br />
* Scott Cain<br />
* Josh Goodman<br />
* Eric Just<br />
* Sohel Merchant<br />
* Brian O'Connor<br />
* Brian Osborne<br />
* Chinmay Patel<br />
* Pinglei Zhou<br />
<br />
===Middleware Evaluation January 2007===<br />
<br />
A group of some 50 GMOD developers met at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
===Introduction===<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. It's also assumed that such a focus will lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
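<br />
The coding sequence in this report is derived from the exon coordinates of the mRNA. A minimal sketch of that derivation in plain Perl follows, assuming 1-based inclusive coordinates on the forward strand. The subroutine name <code>cds_from_exons</code> and the toy sequence are hypothetical and are not part of any GMOD package:<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical helper: splice exons out of a genomic sequence.<br />
# Each exon is [start, end], 1-based and inclusive, forward strand only.<br />
sub cds_from_exons {<br />
    my ( $genomic, @exons ) = @_;<br />
    my $cds = '';<br />
    for my $exon ( sort { $a->[0] <=> $b->[0] } @exons ) {<br />
        my ( $start, $end ) = @$exon;<br />
        # substr() is 0-based, so shift the start by one<br />
        $cds .= substr( $genomic, $start - 1, $end - $start + 1 );<br />
    }<br />
    return $cds;<br />
}<br />
<br />
# Toy example: exons at 3..5 and 9..10 of a 12-base sequence<br />
my $cds = cds_from_exons( 'AACCGTTTATGG', [ 3, 5 ], [ 9, 10 ] );<br />
print ">toy cds\n$cds\n";<br />
</perl><br />
<br />
With the '''xfile''' exons (13691-13767 and 14687-14720) the same arithmetic gives 77 + 34 = 111 bases, which matches the length of the sequence shown in the report.<br />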
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
===Conclusions===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. BioSQL), the Chado schema is popular and, given its complexity, serves as a good test schema for this exercise. The talks focused primarily on functionality from the perspective of writing code and extending the software, and less attention was given to performance. Each presenter focused on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman).<br />
<br />
====Problem Assignments====<br />
<br />
All presenters addressed the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ in many ways in how the problems were solved; please see the presentations themselves for these details.<br />
<br />
The Perl approaches used only the Perl language whereas the Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or disadvantage.<br />
<br />
====Java Middleware====<br />
<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
=====Abstraction=====<br />
<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
<br />
====Perl Middleware====<br />
<br />
<br />
=====Abstraction=====<br />
<br />
Modware provides a higher-level abstraction than Chado::AutoDBI.<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
The DasI interface is small and well documented: about a dozen methods and three classes, all documented.<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), as defined by Wikipedia:<br />
** "ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database."<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
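<br />
The mapping from column names to get/set methods can be pictured with a plain-Perl sketch. This is not how Class::DBI is implemented internally, and no database is involved; the package <code>Toy::Cvterm</code> and the helper <code>make_accessors</code> are hypothetical names used only for illustration:<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
package Toy::Cvterm;<br />
<br />
# Hypothetical helper: install one get/set method per column name<br />
sub make_accessors {<br />
    my ( $class, @columns ) = @_;<br />
    for my $column (@columns) {<br />
        no strict 'refs';<br />
        *{"${class}::${column}"} = sub {<br />
            my $self = shift;<br />
            $self->{$column} = shift if @_;    # called with a value: set<br />
            return $self->{$column};           # always return the current value<br />
        };<br />
    }<br />
    return;<br />
}<br />
<br />
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }<br />
<br />
# One accessor per column of the cvterm table<br />
Toy::Cvterm->make_accessors(qw(cvterm_id cv_id name definition dbxref_id));<br />
<br />
package main;<br />
<br />
my $term = Toy::Cvterm->new( cvterm_id => 2, name => 'DUMMY TERM' );<br />
$term->definition('A placeholder term');             # set<br />
print $term->name, ': ', $term->definition, "\n";    # get<br />
</perl><br />
<br />
An ORM layer adds the part this sketch leaves out: reading the column list from the table itself and turning accessor calls into SQL.<br />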
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
 ## Create<br />
 my $term_dbobj = Chado::Cvterm->create({<br />
   name => "DUMMY TERM",<br />
   cv_id => 1,<br />
   dbxref_id => 125<br />
 });<br />
 ## Retrieve (by primary key)<br />
 $term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
 ## Update ($term is assumed to be another object holding the new values)<br />
 $term_dbobj->name( $term->name() );<br />
 $term_dbobj->definition( $term->definition() );<br />
 $term_dbobj->update();  # write the changes back, unless autoupdate is on<br />
<br />
 ## Delete<br />
 $term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate, but encountered performance issues in bulk operations, even in a test that did nothing more than call simple print() statements.<br />
There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching, and with XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* Retrieving certain types of data may require following a long path<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure is not stored in memory; data is flushed out as it is processed<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
This Wiki section is an edited version of [[Media:XORT.pdf|Josh and Pinglei's presentation]].<br />
<br />
<br />
<br />
=====Introduction=====<br />
<br />
* An XML-database mapping system for data exchange between DB and XML-driven application<br />
* XORT can handle typical XML, it's not Chado-specific<br />
* Developed and supported by Pinglei Zhou at FlyBase Harvard; currently at version 0.007.<br />
* Used at all FlyBase sites<br />
** Harvard has extensive library of Perl modules for generating ChadoXML<br />
* Written in Perl<br />
* Required perl modules:<br />
** XML::Parser::PerlSAX<br />
** Unicode::String<br />
** XML::DOM<br />
** DBI<br />
<br />
=====Chado XML=====<br />
<br />
* Is ChadoXML necessary? No, but it may help you.<br />
* ChadoXML assists with incremental updates, if you want to avoid flush-and-reload.<br />
* While updates can be achieved with other middleware (for example, Perl Class::DBI or Java Hibernate), ChadoXML provides an additional feature: a way to archive your transactions.<br />
* It provides bulk update and download, which other methods either lack or handle inefficiently<br />
<br />
=====Components=====<br />
<br />
* Database & Schema<br />
* ChadoXML Specification<br />
* DumpSpec<br />
** DumpSpec files are simple XML files that tell XORT what to do<br />
** DumpSpec files are ''language independent'', being XML<br />
** It's fairly easy for those who know the schema to read these files and understand what the operation is<br />
<br />
=====Highlights of Chado XML Specification=====<br />
<br />
* Unique representation of a specific database schema<br />
* Does away with internal primary key values<br />
* Static vs. Operational<br />
* Encoding for non-ASCII characters<br />
* Macro mechanism (object reference)<br />
<br />
=====Putting it together: New FlyBase dataflow Part 1=====<br />
<br />
There are three Flybase sites, and most curation is done at Harvard and<br />
Cambridge. Proforma is the curation format at Cambridge and Harvard, but<br />
Harvard also curates with Apollo and ChadoXML.<br />
<br />
Once the data is in Chado (the reporting instance), a denormalization step<br />
moves it to a read-only database. From the read-only database, XORT<br />
dumps ChadoXML for reporting purposes. Version 2 of XSLT is then used to<br />
transform the ChadoXML into HTML and GFF. The HTML provides<br />
human-readable reports; the GFF is for GBrowse and for various power<br />
users.<br />
<br />
1.a. Proforma (FlyBase Cambridge) is converted to ChadoXML<br />
<br />
1.b. ChadoXML is created by Apollo (Harvard)<br />
<br />
1.c. ChadoXML is created by Java SEAN (Harvard)<br />
<br />
2. All ChadoXML is loaded into Chado by XORT<br />
<br />
=====Putting it together: New FlyBase dataflow Part 2=====<br />
<br />
3. Chado (Harvard) is denormalized and loaded into Chado (Indiana)<br />
<br />
4. ChadoXML is created from Chado using XORT<br />
<br />
5.a. GFF and Fasta are created from ChadoXML<br />
<br />
5.b. HTML is created from ChadoXML<br />
<br />
=====Data & Report Generation=====<br />
<br />
* Content of all output files is controlled by XML dumpspecs.<br />
** Dumpspecs are language independent.<br />
** Easily readable (with knowledge of Chado structure).<br />
* All XML transformation steps are done with XSLT v2.<br />
** Saxon XSLT (http://saxon.sourceforge.net/)<br />
** ChadoXML is split into individual chunks before XSLT processing to accommodate large file sizes.<br />
** Extremely fast. We can process all data for ~60,000 Drosophila genes in under 30 minutes.<br />
<br />
=====Hibernate & XORT=====<br />
<br />
* Hibernate didn't scale well when dealing with 5,000+ features in bulk.<br />
** The test was simply calling <code>print()</code> statements<br />
* Performance tweaks for Hibernate can be quite complicated to set up for bulk operations.<br />
* XORT is currently handling ~6 million features in production with only minor performance problems.<br />
* XORT is much more language independent.<br />
<br />
=====Support for complex transactions using XORT=====<br />
<br />
For example:<br />
<br />
* Find all records linked to a record using dumpspec<br />
* Merge gene x into y, each with thousands of records attached<br />
<br />
Step 1. Dump all data using a simple dumpspec<br />
 <xml><br />
 <chado><br />
   <feature dump="all"><br />
     <uniquename test="eq">x</uniquename><br />
   </feature><br />
 </chado><br />
 </xml><br />
Step 2. Delete feature x from the DB, with triggers to clean up orphan records if necessary<br />
<br />
Step 3. Edit the output XML, changing uniquename x to y, then load the edited file back into the DB<br />
<br />
=====CHIA (Chado Interface Application)=====<br />
<br />
A Java application that organizes SQL and XORT functionality for internal users, e.g.:<br />
<br />
* Dump chado-XML for gene regions for Apollo curation<br />
* Organize and execute “canned” SQL queries<br />
* Serve IDs for curators (in development)<br />
* Dynamically browse Chado without writing SQL statements<br />
<br />
CHIA is being designed to be extensible for adding new functionality as needed.<br />
<br />
<br />
=====Documentation=====<br />
<br />
* ''Using Chado to Store Genome Annotation Data''<br />
** Current Protocols in Bioinformatics (Baxevanis, A.D., and Davison, D.B., eds) 2, 9.6.1-9.6.28.<br />
* XORT specification docs<br />
* XORT draft (unpublished)<br />
* GMOD case demo procedure<br />
** All in the doc directory of XORT package, http://www.gmod.org<br />
<br />
=====Acknowledgements=====<br />
<br />
* William Gelbart <br />
* Chris Mungall<br />
* David Emmert <br />
* Mark Gibson<br />
* Stan Letovsky <br />
* Nomi Harris<br />
* Frank Smutniak <br />
* Suzanna Lewis<br />
* Peili Zhang <br />
* Haiyan Zhang <br />
* Aubrey de Grey<br />
* Andy Schroeder <br />
* Don Gilbert<br />
* Susan Russo<br />
* Mark Zythovicz <br />
* Scott Cain<br />
* Lincoln Stein<br />
* Victor Strelets<br />
* Robert Wilson<br />
* Paul Leyland<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add the add_constructor for looking for name lengths<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
This Wiki section is an edited version of [[Media:AutoDBI.pdf|Brian's presentation]].<br />
<br />
=====Relation to Turnkey=====<br />
<br />
Turnkey is a package that auto-generates Web sites given a relational<br />
schema, based on SQL::Translator<br />
<br />
* Turnkey authors: Allen Day, Scott Cain, Brian O'Connor<br />
* Turnkey and Chado::AutoDBI objects are essentially the same<br />
<br />
=====Technical Overview=====<br />
<br />
* Code Generation<br />
<br />
=====Project Overview=====<br />
<br />
Convert SQL Queries/Inserts/Deletes -> Object Calls<br />
<sql><br />
INSERT INTO feature (organism_id, name)<br />
VALUES (1, 'foo');<br />
</sql><br />
To:<br />
<perl><br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Database connection: use a base class<br />
* Set up base object and connect, then create a ''table object'' to access primary key. <br />
* Class::DBI can find and insert records into other table, based on foreign key.<br />
<br />
<perl><br />
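 # Base class shared by all table classes<br />
 package Turnkey::Model::DBI;<br />
 use base qw(Class::DBI::Pg);<br />
 <br />
 # Connection settings for a local PostgreSQL Chado database<br />
 my ($dsn, $name, $pass);<br />
 $dsn = "dbi:Pg:host=localhost;dbname=chado;port=5432";<br />
 $name = "postgres";<br />
 $pass = "";<br />
 <br />
 Turnkey::Model::DBI->set_db('Main', $dsn, $name, $pass, {AutoCommit => 1});<br />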
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
<br />
<perl><br />
package Turnkey::Model::Feature;<br />
use base 'Turnkey::Model::DBI';<br />
<br />
Turnkey::Model::Feature->set_up_table('feature');<br />
<br />
#<br />
# Primary key accessors<br />
#<br />
<br />
sub id { shift->feature_id }<br />
sub feature { shift->feature_id }<br />
</perl><br />
<br />
* data field accessors by Class::Accessor<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** has_a<br />
<br />
<perl><br />
#<br />
# has_a<br />
#<br />
Turnkey::Model::Feature->has_a( type_id => "Turnkey::Model::Cvterm" );<br />
sub cvterm { return shift->type_id; }<br />
</perl><br />
<br />
* Basic ORM Object: Feature<br />
** has_many<br />
<br />
<perl><br />
#<br />
# has_many<br />
#<br />
Turnkey::Model::Feature->has_many('feature_synonym_feature_id', <br />
'Turnkey::Model::Feature_Synonym' => 'feature_id');<br />
sub feature_synonyms { return shift->feature_synonym_feature_id; }<br />
<br />
Turnkey::Model::Feature->has_many('featureprop_feature_id', <br />
'Turnkey::Model::Featureprop' => 'feature_id');<br />
sub featureprops { return shift->featureprop_feature_id; }<br />
</perl><br />
<br />
* Can traverse tables, such as going from FEATURE to FEATUREPROP <br />
** Tell base object that the ''table object'' has_a() or has_many() keys corresponding to some key in other ''table object'' <br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** skipping linker tables for has_many<br />
<br />
<perl><br />
# skip over feature_synonym table<br />
#<br />
# method 1<br />
#<br />
sub synonyms { my $self = shift; return map $_->synonym_id, $self->feature_synonyms; }<br />
#<br />
# method 2<br />
#<br />
Turnkey::Model::Feature->has_many( synonyms2 =><br />
['Turnkey::Model::Feature_Synonym' => 'synonym_id']);<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Transactions<br />
** Chado::AutoDBI supports transactions, and one can wrap the transaction in an eval()<br />
<perl><br />
sub do_transaction {<br />
my $class = shift;<br />
my ( $code ) = @_;<br />
# Turn off AutoCommit for this scope.<br />
# A commit will occur at the exit of this block automatically,<br />
# when the local AutoCommit goes out of scope.<br />
local $class->db_Main->{ AutoCommit };<br />
<br />
# Execute the required code inside the transaction.<br />
eval { $code->() };<br />
if ( $@ ) {<br />
my $commit_error = $@;<br />
eval { $class->dbi_rollback }; # might also die!<br />
die $commit_error;<br />
}<br />
}<br />
</perl><br />
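<br />
The control flow of such a wrapper can be tried out without a database. The sketch below swaps in a different technique from the code above: it commits explicitly instead of relying on a localized AutoCommit, and it uses a hypothetical <code>Toy::Handle</code> class that only records calls in place of a real DBI handle. The name <code>run_in_transaction</code> is likewise invented for this illustration:<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical stand-in for a database handle; it only records calls<br />
package Toy::Handle;<br />
sub new      { my $class = shift; return bless { log => [] }, $class }<br />
sub commit   { my $self = shift; push @{ $self->{log} }, 'commit';   return }<br />
sub rollback { my $self = shift; push @{ $self->{log} }, 'rollback'; return }<br />
<br />
package main;<br />
<br />
# Run $code; commit on success, roll back and rethrow on failure<br />
sub run_in_transaction {<br />
    my ( $dbh, $code ) = @_;<br />
    my $ok = eval { $code->(); 1 };<br />
    if ( !$ok ) {<br />
        my $error = $@;             # save before the next eval resets $@<br />
        eval { $dbh->rollback };    # the rollback itself might also die<br />
        die $error;<br />
    }<br />
    $dbh->commit;<br />
    return;<br />
}<br />
<br />
my $dbh = Toy::Handle->new;<br />
run_in_transaction( $dbh, sub { print "insert gene\n" } );<br />
eval { run_in_transaction( $dbh, sub { die "insert failed\n" } ) };<br />
print "log: @{ $dbh->{log} }\n";<br />
</perl><br />
<br />
The first call succeeds and is committed; the second dies inside the code block, so it is rolled back and the error is passed on to the caller.<br />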
<br />
=====Technical Overview=====<br />
<br />
* Lazy Loading<br />
** One can either do automated creation of objects or explicitly dictate which fields are incorporated into object<br />
<perl><br />
Turnkey::Model::Feature->columns( Primary => qw/feature_id/ );<br />
Turnkey::Model::Feature->columns( Essential => qw/name organism_id type_id/ );<br />
Turnkey::Model::Feature->columns( Others => qw/residues .../ );<br />
</perl><br />
<br />
Typically:<br />
<br />
<perl><br />
Turnkey::Model::Feature->set_up_table('feature');<br />
</perl><br />
<br />
=====Problem 1=====<br />
<br />
* Create Feature & Add Description<br />
<br />
<perl><br />
# now create mRNA feature<br />
<br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
<br />
# create description<br />
<br />
my $featureprop = Turnkey::Model::Featureprop->find_or_create({<br />
value => 'A test gene for GMOD meeting',<br />
feature_id => $feature,<br />
type_id => $note_cvterm,<br />
});<br />
</perl><br />
<br />
=====Problem 2=====<br />
<br />
* Retrieve a Feature via Searching<br />
** Search using strings or identifiers, a search will return an iterator object<br />
<br />
<perl><br />
# objects for global use<br />
<br />
# the organism for our new feature<br />
my $organism = Turnkey::Model::Organism->search(abbreviation => "S.cerevisiae")->next;<br />
<br />
# the cvterm for a "Note"<br />
my $note_cvterm = Turnkey::Model::Cvterm->retrieve(2);<br />
<br />
# searching name by wildcard<br />
<br />
my @results = Turnkey::Model::Feature->search_like(name => 'x-%');<br />
</perl><br />
<br />
=====Problems 3, 4, & 5=====<br />
<br />
* Update a Feature<br />
<br />
<perl><br />
# update the xfile gene name<br />
<br />
$feature->name("x-file");<br />
$feature->update();<br />
</perl><br />
<br />
* Delete a Feature<br />
<br />
<perl><br />
# now delete the x-file feature<br />
<br />
$feature->delete();<br />
</perl><br />
<br />
=====Things Chado::AutoDBI does well=====<br />
<br />
* Easy to use<br />
* Easy to port<br />
* Use with other DBs<br />
** Both Oracle and Postgres used currently<br />
* Autogenerated via Turnkey<br />
* find_or_create method<br />
* Performance is not as bad as you might guess<br />
** Due to Lazy loading<br />
** Even whole genome operations are feasible<br />
<br />
Note that speed is relative: poorly written SQL can perform badly, and in such cases the Chado::AutoDBI approach will be faster.<br />
<br />
<br />
=====For More Information=====<br />
<br />
* Class::DBI<br />
** http://www.class-dbi.com<br />
** http://search.cpan.org<br />
<br />
* Turnkey<br />
** http://turnkey.sf.net<br />
<br />
* Biopackages<br />
** http://biopackages.net<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses Chado::AutoDBI to connect. Connection is configured on GMOD install.<br />
* Transaction support: See Chado::AutoDBI talk.<br />
* Code generation: No automatic code generation<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Some DictyBase-specific code used there<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods such as name(), primary_id(), external_ids() <br />
** Abstract factory for various feature types<br />
* Lazy: information is only retrieved when you ask for it, but cached for speedy retrieval the next time it is required<br />
* Uses BioPerl and its objects<br />
** Each different feature subclass has a bioperl() method that returns an appropriate BioPerl object.<br />
** BioPerl object manipulation used to update feature coordinates<br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene which isn't the same as ...<br />
* Any feature type not explicitly supported in Modware::Feature class is blessed as a Modware::Feature::GENERIC class which has a start/stop coordinate on a genomic sequence feature (no structure like a transcript with exons)<br />
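<br />
The factory behaviour described in this list, including the fallback to a generic class, can be sketched in plain Perl. This is an illustration of the pattern only, not Modware's source: the <code>Toy::Feature</code> packages are hypothetical, and the sketch registers just two types, so every other type falls through to the generic class:<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
package Toy::Feature;<br />
<br />
# Feature types that have a dedicated subclass in this sketch<br />
my %class_for = (<br />
    gene => 'Toy::Feature::GENE',<br />
    mrna => 'Toy::Feature::MRNA',<br />
);<br />
<br />
# Factory constructor: choose the subclass from the -type argument<br />
sub new {<br />
    my ( $class, %args ) = @_;<br />
    my $type   = lc( $args{-type} // '' );<br />
    my $target = $class_for{$type} // 'Toy::Feature::GENERIC';<br />
    return bless { type => $type, name => $args{-name} }, $target;<br />
}<br />
<br />
sub name { return $_[0]{name} }    # common method shared by all subclasses<br />
<br />
package Toy::Feature::GENE;<br />
our @ISA = ('Toy::Feature');<br />
<br />
package Toy::Feature::MRNA;<br />
our @ISA = ('Toy::Feature');<br />
sub exons { return @{ $_[0]{exons} || [] } }    # type-specific method<br />
<br />
package Toy::Feature::GENERIC;<br />
our @ISA = ('Toy::Feature');<br />
<br />
package main;<br />
<br />
for my $type (qw(gene mRNA polyA_site)) {<br />
    my $feature = Toy::Feature->new( -type => $type, -name => 'demo' );<br />
    print "$type => ", ref($feature), "\n";<br />
}<br />
</perl><br />
<br />
The caller always writes <code>Toy::Feature->new</code> and never names a subclass, which keeps the choice of class out of application code.<br />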
<br />
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
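<br />
Lazy evaluation with caching can be illustrated as follows. This is a sketch of the general idea rather than Modware's implementation: <code>Toy::LazyFeature</code> is a hypothetical class, and a code reference stands in for the database query:<br />
<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
package Toy::LazyFeature;<br />
<br />
# $fetch is a code reference standing in for a database query<br />
sub new {<br />
    my ( $class, $fetch ) = @_;<br />
    return bless { fetch => $fetch, fetch_count => 0 }, $class;<br />
}<br />
<br />
# Retrieve the description on first use, then serve it from the cache<br />
sub description {<br />
    my $self = shift;<br />
    if ( !exists $self->{description} ) {<br />
        $self->{fetch_count}++;<br />
        $self->{description} = $self->{fetch}->();<br />
    }<br />
    return $self->{description};<br />
}<br />
<br />
package main;<br />
<br />
my $feature = Toy::LazyFeature->new( sub { 'A test gene for GMOD meeting' } );<br />
print $feature->description, "\n" for 1 .. 3;<br />
print "fetches: $feature->{fetch_count}\n";<br />
</perl><br />
<br />
The description is printed three times but fetched only once; nothing is fetched at all if the method is never called.<br />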
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
my $seq_io = new Bio::SeqIO(<br />
-file => "../data/fake_chromosome.txt",<br />
-format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
-type => 'chromosome',<br />
-bioperl => $seq,<br />
-description => "This is a test",<br />
-name => 'Fake',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
       srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
-type => 'gene',<br />
-name => 'xfile',<br />
-description => 'A test gene for GMOD meeting',<br />
     -source         => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1: exon_2: <br />
start: 13691 start: 14687<br />
end: 13767 end: 14720<br />
strand: 1 strand: 1<br />
srcFeature_id: Id of genomic sample srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 13691,<br />
-end => 13767,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 14687,<br />
-end => 14720,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
-type => 'mRNA',<br />
-bioperl => $bioperl_mrna,<br />
-source => 'GMOD 2007 Demo',<br />
-reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report. This package<br />
uses Modware and BioPerl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Gene is-a Feature.<br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass? More flexibility the way shown here<br />
<br />
<perl><br />
package GMODWriter; <br />
sub Write_gene_report {<br />
my ($self, $gene) = @_;<br />
my $symbol = $gene->name();<br />
<br />
my @synonyms = @{ $gene->synonyms() };<br />
 my $syn_string  = join ", ", @synonyms;<br />
my $description = $gene->description();<br />
my $type = $gene->type();<br />
# get features associated with the gene that are of type 'mRNA'<br />
my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
# use bioperl method to get exons from mRNA<br />
my @exons = $mrna->bioperl->exons_ordered();<br />
# Modware will return a nice fasta file for you.<br />
my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
# Now print the actual report<br />
print "symbol: $symbol\n";<br />
print "synonyms: $syn_string\n";<br />
print "description: $description\n";<br />
print "type: $type\n";<br />
<br />
my $count = 0;<br />
foreach my $exon (@exons ) {<br />
$count++;<br />
print "exon${count} start: ".$exon->start()."\n";<br />
<br />
print "exon${count} end: ".$exon->end()."\n";<br />
<br />
}<br />
print "$fasta";<br />
}<br />
. . .<br />
</perl><br />
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the symbol to x-file and retrieve the changed record. Regenerate the gene report.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval{<br />
<br />
# get xfile gene<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
# change the name<br />
$xfile_gene->name( 'x-file' );<br />
# write changes to database<br />
$xfile_gene->update();<br />
<br />
# we can use the original object if we want, but instead<br />
# we refetch from the database to 'prove' the name has been changed<br />
my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
# use our GMODWriter package to write report for x-file<br />
GMODWriter->Write_gene_report( $xfile_gene2 );<br />
<br />
};<br />
if ($@){<br />
warn $@;<br />
new Modware::DBH->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
<perl><br />
sub Write_search_results {<br />
my ($self, $itr) = @_;<br />
# loop through iterator<br />
while (my $gene = $itr->next) {<br />
# print the requested information<br />
print $gene->feature_id . "\t" . $gene->name .<br />
"\t" . $gene->organism_name . "\n";<br />
}<br />
}<br />
</perl><br />
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
 # set is_deleted = 1; this also sets is_available to 0,<br />
 # so the gene is hidden from searches.<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* Sample scripts demoed here are available<br />
** sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI will automatically adjust to the change, depends on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Specify the type of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O'Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects via Perl DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware but if you use Chado and GBrowse may be useful<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
This Wiki section is an edited version of [[Media:DasI_middleware.pdf|Scott's presentation]].<br />
<br />
=====Create the database=====<br />
<br />
$ perl Makefile.PL<br />
$ make<br />
$ sudo make install<br />
$ make load_schema<br />
$ make prepdb # now with Xenopus!<br />
$ make ontologies # load rel, SO, featureprop<br />
<br />
=====Problem 1 - Loading Data=====<br />
<br />
Create some GFF from the specifications:<br />
<br />
fake_chromosome example chromosome 1 15017 . . . ID=fake_chromosome;Name=fake_chromosome<br />
fake_chromosome example gene 13691 14720 . + . ID=xfile;Name=xfile;Alias=mulder,scully;Note=A test gene for GMOD meeting<br />
fake_chromosome example mRNA 13691 14720 . + . ID=xfile_mRNA;Parent=xfile<br />
fake_chromosome example exon 13691 13767 . + . Parent=xfile_mRNA<br />
fake_chromosome example exon 14687 14720 . + . Parent=xfile_mRNA<br />
fake_chromosome example gene 12648 13136 . + . ID=x-men<br />
<br />
Gene inserted as GFF using a standard Bioperl bulk loader:<br />
<br />
<code>$ gmod_bulk_load_gff3.pl -g sample.gff</code><br />
<br />
''...lots of output...''<br />
<br />
=====Adaptor Components=====<br />
<br />
* Bio::DB::Das::Chado<br />
** Database connection object<br />
* Bio::DB::Das::Chado::Segment<br />
** Object for any range of DNA<br />
* Bio::DB::Das::Chado::Segment::Feature<br />
<br />
=====Use Bio::DB::Das::Chado=====<br />
<br />
<perl><br />
use Bio::DB::Das::Chado;<br />
<br />
my $chado = Bio::DB::Das::Chado->new(<br />
-dsn => "dbi:Pg:dbname=test",<br />
-user=> "scott",<br />
-pass=> "" ) || die "no new chado";<br />
<br />
my $gene_name = 'xfile';<br />
<br />
my ($gene_fo) = $chado->get_features_by_name($gene_name);<br />
</perl><br />
<br />
=====Problem 2 - Use Some Accessors=====<br />
<br />
<perl><br />
print "symbol: " . $gene_fo->display_name."\n";<br />
print "synonyms: " . join(', ',$gene_fo->synonyms)."\n";<br />
print "description: " . $gene_fo->notes."\n";<br />
print "type: " . $gene_fo->type."\n";<br />
<br />
my ($mRNA) = $gene_fo->sub_SeqFeature();<br />
my @exons = $mRNA->sub_SeqFeature();<br />
<br />
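 my $exon_count = 0;    # running exon number for the report<br />
 my $cds_seq    = '';   # concatenated exon sequence<br />
 for my $exon (@exons) {<br />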
next unless ($exon->type->method eq 'exon');<br />
$exon_count++;<br />
print "exon$exon_count start: " . $exon->start."\n";<br />
print "exon$exon_count end: " . $exon->end. "\n";<br />
$cds_seq .= $exon->seq->seq; # the first seq call returns a Bio::Seq object, the second gets the DNA string from Bio::Seq<br />
} <br />
</perl><br />
<br />
=====Bulk Output=====<br />
<br />
<perl><br />
my $gene_name = 'x-*';<br />
<br />
my @genes = $chado->get_features_by_name(<br />
-name => $gene_name,<br />
-class=> 'gene' );<br />
<br />
for my $gene (@genes) {<br />
print join("\t",<br />
$gene->feature_id,<br />
$gene->display_name,<br />
$gene->organism),"\n";<br />
}<br />
</perl><br />
<br />
Or see your report in GBrowse<br />
<br />
=====Advantages=====<br />
<br />
* Comes 'for free' with GBrowse<br />
** GBrowse will run with any DasI-compatible interface<br />
* Uses 'familiar' BioPerl idioms, very similar to widely used Bio::DB::GFF (though with fewer methods)<br />
<br />
<br />
=====Conclusion=====<br />
<br />
* Not suitable as a 'general' middleware layer<br />
** May be suitable for some applications, particularly if they are similar to GBrowse or other uses of Bio::DB::GFF<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
Jeff Bowes, Xenbase, University of Calgary. This Wiki section is an edited version of [[Media:iBatis.pdf|Jeff's presentation]].<br />
<br />
=====iBatis=====<br />
<br />
* iBatis<br />
** Light-weight framework<br />
** Still based on SQL but eliminates the repetitive drudgery of JDBC<br />
** You can tune a query by re-writing the SQL in XML & the API does not change.<br />
* iBatis does not create your database in memory as objects<br />
* Shallow learning curve<br />
* Manually create a Java class and SQL map to describe higher-level objects<br />
** Example: ''Gene''<br />
* Support for inheritance<br />
** Inheritance in result maps, allows fair amount of re-use. <br />
* Supports different transaction schemes<br />
** For example, JDBC, Java Transaction API<br />
<br />
=====Abator=====<br />
<br />
* Generates iBatis CRUD objects by introspecting database tables<br />
* Abator creates ''SQL in XML'' files (SQL Map files) and Java classes <br />
** Within these files is a Result Map section.<br />
* Abator config files are simple: they set the connection parameters and specify where the files are located.<br />
* In the SQL Map files you can specify how to find parent ids, such as feature_id. <br />
<br />
=====Abator Example=====<br />
<xml><br />
<abatorConfiguration><br />
<abatorContext> <!-- TODO: Add Database Connection Information --><br />
<jdbcConnection driverClass="COM.ibm.db2.jdbc.app.DB2Driver"<br />
connectionURL="jdbc:db2:XBDV05"<br />
userId="db2inst1"<br />
          password="*******"><br />
<classPathEntry location="/Program Files/IBM/SQLLIB/java/db2java.zip" /><br />
</jdbcConnection><br />
<br />
<javaModelGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.model"<br />
targetProject="gene" /><br />
<sqlMapGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.sql"<br />
targetProject="gene" /><br />
<daoGenerator type="IBATIS"<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.dao"<br />
targetProject="gene" /><br />
   </abatorContext><br />
 </abatorConfiguration><br />
</xml><br />
<br />
=====Abator Example=====<br />
<nowiki><br />
<table schema="db2inst1" tableName="synonym"></nowiki><br />
<generatedKey column="synonym_id" sqlStatement="VALUES PREVVAL FOR<br />
synonym_seq" identity="true" /><br />
<columnOverride column="CREATED_BY" jdbcType="INTEGER" /><br />
<columnOverride column="MODIFIED_BY" jdbcType="INTEGER" /><br />
<nowiki></table></nowiki><br />
<br />
=====Abator=====<br />
<br />
Works as:<br />
<br />
* Eclipse plug-in<br />
* ANT<br />
* Standalone<br />
<br />
=====DAO Methods=====<br />
<br />
* Insert (Feature)<br />
* Update (Feature)<br />
* DeletebyKey (FeatureKey)<br />
* SelectbyKey (FeatureKey)<br />
* SelectbyExample (FeatureExample)<br />
* DeletebyExample (FeatureExample)<br />
<br />
=====Insert=====<br />
<xml><br />
<insert id="abatorgenerated_insert" parameterClass=<br />
"org.gmod.architecture.framwork.bakeoff.abator.model.FeatureWithBLOBs"><br />
insert into db2inst1.feature<br />
(DBXREF_ID, ORGANISM_ID, NAME, UNIQUENAME,<br />
RESIDUES, SEQLEN, MD5CHECKSUM, TYPE_ID, IS_ANALYSIS,<br />
IS_OBSOLETE, CREATED_BY)<br />
values (#dbxrefId:INTEGER#, #organismId:INTEGER#, #name:VARCHAR#,<br />
#uniquename:VARCHAR#, #residues:CLOB#, #seqlen:INTEGER#,<br />
#md5checksum:CHAR#, #typeId:INTEGER#,<br />
#isAnalysis:SMALLINT#, #isObsolete:SMALLINT#,<br />
#createdBy:INTEGER#)<br />
<br />
<selectKey resultClass="java.lang.Integer" keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</insert><br />
</xml><br />
<br />
=====Insert=====<br />
<xml><br />
<selectKey resultClass="java.lang.Integer"<br />
keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</xml><br />
<br />
=====Problem 1 - Insert=====<br />
<br />
<java><br />
try {<br />
sqlMap.startTransaction();<br />
pGene.id =featureDAO.insert(pGene.getFeatureWithBLOBs());<br />
featurepropDAO.insert(pGene.getPropertyDescription());<br />
pGene.featurelocId = featurelocDAO.insert(pGene<br />
.getFeaturelocWithBLOBS());<br />
pGene = insertExons(pGene);<br />
insertSynonyms(pGene);<br />
sqlMap.commitTransaction();<br />
} catch (Exception e) {<br />
System.out.println(e);<br />
throw (e);<br />
} finally {<br />
sqlMap.endTransaction();<br />
}<br />
</java><br />
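<br />
The try / commit / finally pattern above relies on the transaction being rolled back when <code>endTransaction()</code> is reached without a prior commit. The following is a minimal stand-alone sketch of that behaviour. It does not use iBatis: <code>SketchTransaction</code> and <code>TransactionSketch</code> are hypothetical in-memory stand-ins introduced only for illustration.<br />
<br />
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for a transactional store; not part of iBatis.
class SketchTransaction {
    private final List<String> committed = new ArrayList<>();
    private List<String> pending = null;

    void startTransaction() {
        pending = new ArrayList<>();
    }

    void insert(String row) {
        if (pending == null) {
            throw new IllegalStateException("no transaction started");
        }
        pending.add(row);
    }

    // Makes all pending rows permanent.
    void commitTransaction() {
        committed.addAll(pending);
        pending.clear();
    }

    // Discards anything still pending, i.e. rolls back uncommitted work.
    void endTransaction() {
        pending = null;
    }

    List<String> rows() {
        return committed;
    }
}

class TransactionSketch {
    // Inserts all rows in one transaction; a failure leaves the store unchanged.
    static boolean insertAll(SketchTransaction store, List<String> rows) {
        try {
            store.startTransaction();
            for (String row : rows) {
                if (row == null) {
                    throw new IllegalArgumentException("null row");
                }
                store.insert(row);
            }
            store.commitTransaction();
            return true;
        } catch (RuntimeException e) {
            return false;
        } finally {
            // Always reached; rolls back if the commit was never executed.
            store.endTransaction();
        }
    }

    public static void main(String[] args) {
        SketchTransaction store = new SketchTransaction();
        System.out.println(insertAll(store, Arrays.asList("gene", "exon_1", "exon_2")));
        System.out.println(insertAll(store, Arrays.asList("synonym", null)));
        System.out.println(store.rows());
    }
}
```
<br />
The second call fails part-way through, so its first row is discarded along with the rest and the store still holds only the rows from the first call.<br />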
<br />
=====Transactions=====<br />
<br />
* SQLMap<br />
* JDBC<br />
* JTA - Java Transaction API<br />
** 2-Phase commit<br />
* Hibernate<br />
* External (Customized)<br />
<br />
=====Retrieval=====<br />
<br />
symbol: xfile<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: start: 13691 end: 13767 <br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: start: 14687 end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2 - Master Detail Reports=====<br />
<br />
Account for cycles or recursion in Master Detail Report. <br />
<br />
<xml><br />
<resultMap id="SelectGeneResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Gene" groupBy="id"><br />
<result column="FEATURE_ID" property="id" jdbcType="INTEGER"/><br />
<result column="GENE_NAME" property="name" jdbcType="VARCHAR" /><br />
      <result column="DESCRIPTION" property="description"<br />
jdbcType="VARCHAR" /><br />
<result column="TYPE_ID" property="typeId" jdbcType="INTEGER" /><br />
<result property="exons" resultMap = "gene.SelectExonResults"/><br />
</resultMap><br />
<br />
<resultMap id="SelectExonResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Exon"><br />
<result column="EXON_ID" property="id" jdbcType="INTEGER"/><br />
<result column="EXON_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="EXON_RESIDUES" property="residues" jdbcType="CLOB" /><br />
<result column="STRAND" property="strand" jdbcType="INTEGER" /><br />
<result column="FMIN" property="fmin" jdbcType="INTEGER" /><br />
<result column="FMAX" property="fmax" jdbcType="INTEGER" /><br />
<result column="SRCFEATURE_ID" property="sourceFeatureId"<br />
jdbcType="INTEGER" /><br />
</resultMap><br />
</xml><br />
<br />
=====Master Detail Report=====<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
6129482 x-files gene 14687 14720<br />
<br />
Becomes:<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
14687 14720<br />
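<br />
The collapsing of repeated master rows shown above can be illustrated without iBatis. This is a rough sketch only: <code>MasterDetailSketch</code>, <code>Row</code> and <code>Gene</code> are hypothetical names, and the grouping is done by hand here, whereas in iBatis it is driven by the <code>groupBy</code> attribute of the result map.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MasterDetailSketch {
    // One flat row of a gene-to-exon join, as in the first table above.
    static class Row {
        final int geneId;
        final String symbol;
        final int fmin;
        final int fmax;

        Row(int geneId, String symbol, int fmin, int fmax) {
            this.geneId = geneId;
            this.symbol = symbol;
            this.fmin = fmin;
            this.fmax = fmax;
        }
    }

    // The master object: one gene holding its list of exon coordinates.
    static class Gene {
        final int id;
        final String symbol;
        final List<int[]> exons = new ArrayList<>();

        Gene(int id, String symbol) {
            this.id = id;
            this.symbol = symbol;
        }
    }

    // Groups flat rows by gene id, keeping the order in which genes first appear.
    static List<Gene> groupByGene(List<Row> rows) {
        Map<Integer, Gene> byId = new LinkedHashMap<>();
        for (Row r : rows) {
            Gene g = byId.get(r.geneId);
            if (g == null) {
                g = new Gene(r.geneId, r.symbol);
                byId.put(r.geneId, g);
            }
            g.exons.add(new int[] { r.fmin, r.fmax });
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(6129482, "x-files", 13691, 13767));
        rows.add(new Row(6129482, "x-files", 14687, 14720));
        for (Gene g : groupByGene(rows)) {
            System.out.println(g.id + " " + g.symbol);
            for (int[] exon : g.exons) {
                System.out.println("  " + exon[0] + " " + exon[1]);
            }
        }
    }
}
```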
<br />
=====Dynamic Queries=====<br />
<br />
* Gene Name (Description)<br />
** Feature, Featureprop<br />
* Symbol<br />
** Feature<br />
* Feature Synonyms<br />
** Feature, Feature_Synonym, Synonym<br />
* Ortholog Synonyms<br />
** Feature, Feature_relationship, Feature, Feature Synonyms<br />
<br />
=====Dynamic Queries=====<br />
<br />
FROM<br />
CAT_X_GENE_V gc<br />
  <isEqual prepend=","<br />
           property="searchSymbol"<br />
           compareValue="true"><br />
GENE_SYMBOLS s<br />
</isEqual><br />
<br />
<isEqual prepend=","<br />
property="searchNcbi" <br />
compareValue="true"><br />
NCBI_GI n<br />
</isEqual><br />
<br />
=====Dynamic Queries=====<br />
<br />
<dynamic prepend="WHERE"><br />
   <isEqual prepend="AND" property="searchNameOnly"<br />
compareValue="true"><br />
<iterate property="searchTokens" conjunction="AND" <br />
open=" (" close=") "><br />
LOWER(VARCHAR(gc.longname)) LIKE <br />
LOWER(CAST(#searchTokens[]:VARCHAR# AS VARCHAR(512)))<br />
</iterate><br />
</isEqual><br />
<br />
The iterate tag is very useful for handling multiple search terms.<br />
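<br />
To make the effect of the iterate tag concrete, the sketch below builds by hand the kind of clause it expands to: one condition per search token, joined by the conjunction and wrapped in the open and close strings. iBatis does this itself from the XML; <code>DynamicWhereSketch</code> and <code>buildWhere</code> are hypothetical names, and the column expression is simplified from the example above.<br />
<br />
```java
import java.util.Arrays;
import java.util.List;

class DynamicWhereSketch {
    // Returns a WHERE clause with one LIKE condition per token, or "" if there are none.
    // Values are left as ? placeholders to be bound by the caller.
    static String buildWhere(List<String> tokens) {
        if (tokens.isEmpty()) {
            return "";
        }
        StringBuilder sql = new StringBuilder("WHERE (");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sql.append(" AND ");
            }
            sql.append("LOWER(gc.longname) LIKE LOWER(?)");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildWhere(Arrays.asList("%x%", "%file%")));
    }
}
```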
<br />
=====Miscellaneous Features=====<br />
<br />
* Supports various data sources<br />
** Simple JDBC<br />
** DBCP – Apache Connection Pooling<br />
** JNDI – Java Naming and Directory Interface<br />
* Very flexible<br />
* Local caching of results<br />
** Lazy loading<br />
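<br />
Lazy loading with local caching means that a value is fetched only on first access and then reused. The following is a generic sketch of the idea, not iBatis code: <code>LazySketch</code> is a hypothetical class, and the loader stands in for whatever query would fetch the value.<br />
<br />
```java
import java.util.function.Supplier;

class LazySketch<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded = false;
    private int loadCount = 0;

    LazySketch(Supplier<T> loader) {
        this.loader = loader;
    }

    // Runs the loader on first access only; later calls return the cached value.
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loadCount++;
        }
        return value;
    }

    // Number of times the loader has actually run.
    int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazySketch<String> residues = new LazySketch<>(() -> "ATGGCGTTAG");
        System.out.println(residues.loadCount());
        System.out.println(residues.get());
        System.out.println(residues.get());
        System.out.println(residues.loadCount());
    }
}
```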
<br />
=====Support=====<br />
<br />
* In GMOD used by<br />
** Xenbase, Artemis at Sanger<br />
* Many other users<br />
** e.g. MySpace.com<br />
* Top level Apache Project<br />
** www.ibatis.apache.org<br />
* Active community<br />
<br />
<br />
=====What iBatis Does Well=====<br />
<br />
* Does not hide SQL<br />
** No new query language to learn<br />
* Separates and groups SQL<br />
* Simple!!<br />
** Light wrapper - No real tweaks<br />
* Does the job well<br />
* Excellent support for Master-Detail<br />
* Dynamically generated queries <br />
** You can structure conditions around clauses in SQL<br />
** One XML statement can represent many variations on a query<br />
<br />
=====Acknowledgements=====<br />
<br />
GMOD<br />
* Eric Just<br />
* Everyone else<br />
<br />
iBatis Developers<br />
* Kevin Snyder<br />
* Chris Jarabek<br />
* Ross Gibb<br />
<br />
PI<br />
* Peter Vize<br />
<br />
Financial Support<br />
* Alberta Heritage Foundation for Medical Research<br />
* Alberta Network for Proteomics Innovation<br />
* University of Calgary, Faculty of Science<br />
* University of Calgary Dept. of Computer Science<br />
* NICHD<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
Chado API via Java & Hibernate, Robert Bruggner, VectorBase.org. This Wiki section is an edited version of [[Media:HibernateChadoAPI.pdf|Robert's presentation]].<br />
<br />
=====Overview=====<br />
<br />
* Background<br />
* Quick Hibernate Overview<br />
* Hibernate Connectivity and O/R Mapping Example<br />
* GMOD Demo<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
=====Background=====<br />
<br />
* VectorBase<br />
** A bioinformatic resource center for invertebrate vectors of human pathogens<br />
* Responsible for storage and display of multiple organisms’ genomes<br />
** Anopheles gambiae, Aedes aegypti, Ixodes scapularis, Culex pipiens and so on....<br />
* Want to store data for many organisms, so Chado is a natural choice<br />
* Ensembl Genome Browser already used for ''A. gambiae''<br />
** Wrote Ensembl API Database adaptor for Chado... Not maintainable.<br />
* Use Both Databases<br />
** Transfer genomic data from Ensembl to Chado<br />
** Search Engine and Indexer using Lucene<br />
** Run DAS<br />
** Export data via ChadoXML and GFF3<br />
* Need API for Database I/O<br />
<br />
=====Hibernate Background=====<br />
<br />
* They say: “A powerful, high performance object/relational persistence and query service.”<br />
* Automates the persistence of plain old Java objects (POJO)<br />
** User maps their POJO properties to database tables via XML (HBM File).<br />
** There are Hibernate tools that generate HBMs<br />
*** Configurable in the sense that one can create get & set tables where the methods map one-to-one to fields.<br />
* Persist a specific object by storing it in the database.<br />
* Intelligent Database I/O <br />
** Smart detection of ''Dirty Properties'' when performing Save / Update / Delete.<br />
** Cascadable Save / Update / Delete for complex objects.<br />
* Everything's done within the scope of a transaction.<br />
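<br />
Detection of dirty properties can be pictured as comparing an object's current values with a snapshot taken when it was loaded, so that only changed properties need to be written. The sketch below shows that comparison in plain Java. It is not Hibernate's implementation: <code>DirtyCheckSketch</code> and the map-based property representation are assumptions made for illustration.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DirtyCheckSketch {
    // Returns the names of properties whose current value differs from the snapshot.
    static List<String> dirtyProperties(Map<String, Object> snapshot,
                                        Map<String, Object> current) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            if (!Objects.equals(snapshot.get(entry.getKey()), entry.getValue())) {
                dirty.add(entry.getKey());
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("name", "xfile");
        snapshot.put("description", "A test gene for GMOD meeting");

        // Copy the loaded state, then change one property.
        Map<String, Object> current = new LinkedHashMap<>(snapshot);
        current.put("name", "x-file");

        System.out.println(dirtyProperties(snapshot, current));
    }
}
```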
<br />
=====Hibernate Database Connectivity=====<br />
<br />
* Configure Hibernate in hibernate.cfg.xml<br />
* Define a Data Source<br />
** We use a simple, single JDBC connection to Chado<br />
** Can be configured to use a connection pool or data source accessible by the Java Naming and Directory Interface (JNDI).<br />
** Define a connection “dialect”<br />
** org.hibernate.dialect.PostgreSQLDialect<br />
* Describe the relationship between Java objects and database tables<br />
** Use XML to describe where to store POJO property data in the database<br />
* Create a new Hibernate Session based on the configuration<br />
* Begin a transaction to start performing work<br />
<br />
=====POJO and HBM Example file - CV=====<br />
<br />
<java><br />
public class CV {<br />
<br />
  private int cv_id;<br />
private String name;<br />
private String definition;<br />
<br />
public property gettersandsetters() {<br />
....<br />
}<br />
<br />
public boolean equals(CV comparaCV) {<br />
....<br />
}<br />
public int hashCode(){<br />
...<br />
}<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CV" table="cv"><br />
<br />
<id name="cv_id" column="cv_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cv_cv_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
    <property name="name" column="name" type="java.lang.String" not-null="true"/><br />
<br />
    <property name="definition" column="definition" type="java.lang.String"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
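<br />
The POJO above elides its accessors and its <code>equals</code> and <code>hashCode</code> bodies. One way they might be filled in is sketched below, under the assumption that the controlled vocabulary name serves as the business key. The class is named <code>CvSketch</code> to make clear that it is an illustration and not the VectorBase class. <code>equals</code> takes an <code>Object</code> parameter so that it overrides <code>Object.equals</code> rather than overloading it.<br />
<br />
```java
import java.util.Objects;

class CvSketch {
    private int cvId;
    private String name;
    private String definition;

    public int getCvId() { return cvId; }
    public void setCvId(int cvId) { this.cvId = cvId; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getDefinition() { return definition; }
    public void setDefinition(String definition) { this.definition = definition; }

    // Equality is based on name only (assumed business key), not on the
    // database-generated id, which is unset before the object is first saved.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof CvSketch)) {
            return false;
        }
        return Objects.equals(name, ((CvSketch) other).name);
    }

    // Must agree with equals: objects with equal names get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hashCode(name);
    }

    public static void main(String[] args) {
        CvSketch saved = new CvSketch();
        saved.setCvId(7);
        saved.setName("sequence");

        CvSketch unsaved = new CvSketch();
        unsaved.setName("sequence");

        System.out.println(saved.equals(unsaved));
    }
}
```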
<br />
=====HBM Example CVTerm=====<br />
<br />
<java><br />
public class CVTerm {<br />
<br />
private int cvterm_id;<br />
<br />
private CV cv;<br />
<br />
private String name;<br />
<br />
private String definition;<br />
<br />
private DBXref dbxref;<br />
<br />
private int is_obsolete;<br />
<br />
private int is_relationshiptype;<br />
<br />
.....<br />
<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CVTerm" table="cvterm"><br />
<br />
<id name="cvterm_id" column="cvterm_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cvterm_cvterm_id_seq</param><br />
<br />
</generator><br />
<br />
</id><br />
<br />
<many-to-one name="cv" class="org.vectorbase.chadoAPI.chadoObjects.CV" column="cv_id" <br />
not-null="true" cascade="save-update"/><br />
<br />
<property name="name" not-null="true" type="java.lang.String"/><br />
<br />
<property name="definition"/><br />
<br />
<one-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" cascade="all"/><br />
<br />
<property name="is_obsolete"/><br />
<br />
<property name="is_relationshiptype"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====Hibernate Object Retrieve=====<br />
<br />
One can use Java, Hibernate Query Language (HQL), or SQL; this example uses HQL.<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class, 1);<br />
<br />
// Alternatively, load a CVTerm using HQL (reuses the variable declared above)<br />
cvt = (CVTerm) session.createQuery("from CVTerm where name=?").setString(0, "name").uniqueResult();<br />
<br />
// Print out the name of the cvterm<br />
System.out.println(cvt.getName());<br />
<br />
// Get the cv that the cvterm is associated with<br />
// Hibernate doesn’t return the cv_id - it returns a CV Object.<br />
CV cv = cvt.getCv();<br />
<br />
// Print out the cv’s name<br />
System.out.println(cv.getName());<br />
</java><br />
<br />
=====Hibernate Object Update=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Change cvt’s name<br />
cvt.setName("New CVTerm name");<br />
<br />
// Save!<br />
// Generated SQL updates “Dirty” properties (name, in this case)<br />
session.save(cvt);<br />
<br />
// Commit data to database<br />
session.getTransaction().commit();<br />
</java><br />
<br />
=====Hibernate Save=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
// Build a session factory first and begin a transaction (not shown)<br />
<br />
// Make a new CV<br />
CV new_cv = new CV();<br />
new_cv.setName("New CV");<br />
new_cv.setDefinition("New CV Def");<br />
<br />
// Make a new cvterm for that cv<br />
CVTerm new_cvterm = new CVTerm();<br />
new_cvterm.setName("New CVTerm Name");<br />
// ..... save dbxref etc......<br />
<br />
// Add that CVTerm to our new CV<br />
new_cv.addCVTerm(new_cvterm);<br />
<br />
// Save the new data...<br />
// Hibernate recognizes that it has to first save new_cv, then save new_cvterm.<br />
session.save(new_cvterm);<br />
<br />
session.getTransaction().commit();<br />
<br />
// You can see the new id’s assigned by the database<br />
System.out.println(new_cv.getCv_id());<br />
System.out.println(new_cvterm.getCvterm_id());<br />
</java><br />
<br />
=====Inheritance=====<br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.Feature" table="feature" discriminator-value="not null"><br />
<br />
<id name="feature_id" column="feature_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">feature_feature_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<discriminator column="type_id" type="integer" insert="false"/><br />
<br />
<many-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" <br />
column="dbxref_id" cascade="all"/><br />
<br />
<many-to-one name="organism" class="org.vectorbase.chadoAPI.chadoObjects.Organism" <br />
column="organism_id" not-null="true" cascade="save-update"/><br />
<br />
<property name="name"/><br />
.....<br />
<br />
<hibernate-mapping> <br />
<br />
<subclass name="org.vectorbase.chadoAPI.chadoFeatures.Gene" <br />
extends="org.vectorbase.chadoAPI.chadoObjects.Feature" discriminator-value="767"><br />
<br />
</subclass><br />
</hibernate-mapping><br />
</xml><br />
Write custom methods for specific sub-classes<br />
<br />
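The mapping above keeps every feature in one table and lets the <code>type_id</code> column decide which Java subclass a row becomes. The plain-Java sketch below mimics that dispatch without Hibernate, only to show where subclass-specific methods would live. The <code>fromRow</code> factory and the <code>fastaHeader</code> method are hypothetical names chosen for illustration; the discriminator value 767 for genes is the only detail taken from the mapping.

```java
// Conceptual sketch only: Hibernate performs this dispatch internally,
// driven by the <discriminator> and <subclass> mapping elements.
class Feature {
    final int typeId;
    final String uniquename;

    Feature(int typeId, String uniquename) {
        this.typeId = typeId;
        this.uniquename = uniquename;
    }

    // Hypothetical factory standing in for Hibernate's row-to-object step.
    static Feature fromRow(int typeId, String uniquename) {
        if (typeId == Gene.TYPE_ID) {
            return new Gene(uniquename);
        }
        return new Feature(typeId, uniquename); // any other feature type
    }
}

class Gene extends Feature {
    static final int TYPE_ID = 767; // discriminator-value from the mapping

    Gene(String uniquename) {
        super(TYPE_ID, uniquename);
    }

    // Hypothetical gene-specific method, available only on the subclass.
    String fastaHeader() {
        return ">" + uniquename;
    }
}
```

A caller that receives a <code>Feature</code> can test it with <code>instanceof Gene</code> before using the gene-specific method, which is roughly what code working with the mapped subclasses would do.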
=====ChadoAPI=====<br />
<br />
* POJO Mappings<br />
** CV, CVTerm, DB, DBXref, Feature, FeatureCVTerm, FeatureDBXref, FeatureLoc, FeatureProp, FeatureRelationship, FeatureSynonym, Organism, Pub, Synonym<br />
* Extended Features<br />
** Chromosome, Gene, Transcript, Exon, Protein<br />
* Constants<br />
** CVTerms, FeatureFeatureRelationships, Ontologies<br />
* Special<br />
** ChadoAdaptor<br />
<br />
=====Problem 1 - GMOD Example=====<br />
<br />
<java><br />
// Set up our session and begin transaction<br />
Session session = HibernateUtil.getSessionFactory().getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Make a chado adaptor and load up some utility objects<br />
ChadoAdaptor ca = new ChadoAdaptor();<br />
Chromosome c = ca.fetchChromosomeByUniqueName("fake_chromosome");<br />
Pub null_pub = ca.fetchPubByPubID(1);<br />
Organism agambiae = ca.fetchOrganismByScientificName("Anopheles","gambiae");<br />
<br />
// Begin GMOD Demo Code<br />
<br />
// Make our new gene;<br />
Gene xfile = new Gene();<br />
xfile.setOrganism(agambiae);<br />
xfile.setUniquename("xfile");<br />
xfile.setDescription("A test gene for GMOD meeting");<br />
<br />
/* Set the location of our gene. No need to set coordinates because they'll be updated<br />
* based on the exon boundaries. <br />
*/<br />
FeatureLoc xfile_loc = new FeatureLoc();<br />
xfile_loc.setSrcfeature(c);<br />
xfile_loc.setStrand(1);<br />
xfile.setFeatureLoc(xfile_loc);<br />
<br />
// Add synonyms to xfile<br />
xfile.createNewFeatureSynonym("mulder", null_pub, CVTerms.EXACT_SYNONYM);<br />
xfile.createNewFeatureSynonym("scully", null_pub, CVTerms.EXACT_SYNONYM);<br />
</java><br />
<br />
=====Problem 2 - GMOD Example=====<br />
<br />
<java><br />
// Create a new transcript for our gene.<br />
Transcript t = xfile.createGeneTranscript("xfile-RA");<br />
<br />
// Create some exons for that transcript.<br />
t.createTranscriptExon("xfile:1", 13691, 13767);<br />
t.createTranscriptExon("xfile:2", 14687, 14720);<br />
<br />
// Save our new gene<br />
session.save(xfile);<br />
System.out.println("xfile feature_id is " + xfile.getFeature_id());<br />
<br />
// Fetch our saved gene from the database<br />
Gene xfile_r = ca.fetchGeneByUniqueName("xfile");<br />
System.out.println("symbol: " + xfile_r.getUniquename());<br />
System.out.print("synonyms: ");<br />
for (FeatureSynonym fs : xfile_r.getFeatureSynonyms()){<br />
<br />
System.out.print(fs.getSynonym().getName() + " ");<br />
}<br />
<br />
System.out.println("description: " + xfile_r.getDescription());<br />
System.out.println("type: " + xfile_r.getType().getName());<br />
<br />
for (Transcript tx : xfile_r.fetchAllTranscripts()){<br />
for (Exon e : tx.fetchAllExons()){<br />
System.out.println(e.getUniquename() + " Start:\t" + e.getFeatureLoc().getFmin());<br />
System.out.println(e.getUniquename() + " End:\t" + e.getFeatureLoc().getFmax());<br />
System.out.println("\tSrcFeatureID: " + e.getFeatureLoc().getSrcfeature().getFeature_id());<br />
}<br />
System.out.println(">" + tx.getUniquename());<br />
System.out.println(tx.generateTranscriptSequenceFromExons().toUpperCase());<br />
}<br />
</java><br />
<br />
=====Problems 3, 4, & 5 - GMOD Update & Delete=====<br />
<br />
<java><br />
// Let's update our name...<br />
xfile_r.setUniquename("x-file");<br />
<br />
session.save(xfile_r);<br />
<br />
// Not part of the ChadoAdaptor utility object, but a good example of HQL<br />
List<Gene> genes = (List<Gene>)session.createQuery("from Gene where uniquename like ?").setString(0,"x-%").list();<br />
<br />
for (Gene g : genes){<br />
<br />
System.out.println(g.getFeature_id() + <br />
"\t" + g.getUniquename() + <br />
"\t" + g.getOrganism().getGenus() +<br />
" " + g.getOrganism().getSpecies());<br />
}<br />
<br />
// Deleting... hmm...<br />
Gene delete_me = ca.fetchGeneByUniqueName("x-ray");<br />
session.delete(delete_me);<br />
<br />
// All Finished<br />
session.getTransaction().commit();<br />
</java><br />
<br />
<br />
<br />
=====What Hibernate Does Well=====<br />
<br />
* Hibernate can be configured to perform specialized functions<br />
** For example, it has its own notion of a cascade<br />
* Flexible with respect to language<br />
** Java, Hibernate Query Language, or SQL<br />
* Any JDBC driver<br />
<br />
=====Acknowledgements=====<br />
<br />
* VectorBase People<br />
** Frank Collins, EO Stinson, Ryan Butler<br />
* GMOD<br />
* NIAID<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
This Wiki section is an edited version of [[Media:PSU.pdf|Chinmay's presentation]].<br />
<br />
=====GeneDB=====<br />
<br />
* GeneDB is the organism data and annotation database for the Pathogen Sequencing Unit (PSU) at the Sanger Institute, UK<br />
* Contains 37 organisms, which is expected to grow to 62<br />
* Currently migrating to chado schema<br />
* Java API with two engines Hibernate & iBatis<br />
** Two teams, Artemis and GeneDB, took different approaches<br />
<br />
=====Technical - Connections=====<br />
<br />
Connections are configured in the Spring configuration file<br />
<xml><br />
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"><br />
<property name="driverClassName" value="org.postgresql.Driver" /><br />
<property name="url" value="jdbc:postgresql://holly.sanger.ac.uk:5432/chado" /><br />
<property name="username" value="DELIBERATELY_BOGUS_NAME"/><br />
<property name="password" value="WIBBLE" /><br />
</bean><br />
</xml><br />
* Uses a connection pool<br />
* Connection to the database is specified graphically, so the iBatis configuration file has variables for the location:<br />
<xml><br />
<property name="JDBC.Driver" value="org.postgresql.Driver"/><br />
<br />
<property name="JDBC.ConnectionURL" value="jdbc:postgresql://${chado}"/><br />
<br />
<property name="JDBC.Username" value="${username}"/><br />
<br />
<property name="JDBC.Password" value="${password}"/><br />
</xml><br />
<br />
* provide database location, username & password<br />
* select what to open in Artemis from a scrollable list of features with residues (organisms are in separate Postgres schemas)<br />
<br />
=====Technical - Code Generation=====<br />
<br />
* The shared interface and hibernate implementation were originally generated<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use it behind the scenes)<br />
<br />
=====Technical - Transactions=====<br />
<br />
* Transactions are fully supported<br />
<br />
=====Problems 1, 2, & 3=====<br />
<br />
Creating a gene<br />
<java><br />
genes[0] = new Feature(ORG, GENE, "xfile", false, false, now, now);<br />
<br />
genes[0].setSeqLen(1029); <br />
sequenceDao.persist(genes[0]);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, genes[0], 13691, false, 14720, false, (short)1, 0, 0 ,0);<br />
<br />
sequenceDao.persist(loc);<br />
<br />
addFeatureProp(genes[0], "description", "A test gene for GMOD meeting");<br />
<br />
addSynonymsToFeature(genes[0], "mulder", "scully");<br />
<br />
createExon("exon1", genes[0], 13691, 13767, now, 0);<br />
<br />
createExon("exon2", genes[0], 14687, 14720, now, 1);<br />
</java><br />
<br />
Retrieve a gene<br />
<java><br />
Feature f = sequenceDao.getFeatureByUniqueName("xfile");<br />
displayGene(f);<br />
</java><br />
<br />
Update a gene<br />
<java><br />
genes[0].setUniqueName("x-file");<br />
<br />
sequenceDao.merge(genes[0]);<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<java><br />
private Feature createExon(String name, Feature gene, int min, int max, Timestamp now, int rank) {<br />
<br />
Feature exon = new Feature(ORG, EXON, name, false, false, now, now);<br />
exon.setSeqLen(max-min);<br />
sequenceDao.persist(exon);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, exon, min, false, max, false, <br />
(short)1, 0, 0 ,0);<br />
sequenceDao.persist(loc);<br />
<br />
return exon;<br />
<br />
}<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<xml><br />
<st:section name="Naming" id="gene_naming" collapsed="false" collapsible="false"<br />
hideIfEmpty="true"><br />
<dl><br />
<dt><b>symbol:</b></dt><br />
<dd>${feature.uniqueName}</dd><br />
</dl><br />
<db:synonym name="synonym" var="name" collection="${feature.featureSynonyms}"><br />
<br /><b>Synonym:</b> <db:list-string collection="${name}" /><br />
</db:synonym><br />
<dt><b>Type:</b></dt><br />
<dd>${feature.cvTerm.name}</dd><br />
<br />
<st:section name="Exons" collapsed="false" collapsible="true" hideIfEmpty="true"><br />
<display:table name="exons" uid="tmp" pagesize="30" class="simple" cellspacing="0"<br />
cellpadding="4"><br />
<display:column property="uniqueName" title="Exon"/><br />
<display:column property="featureLocsForSrcFeatureId.fmin" title="Start"/><br />
<display:column property="featureLocsForSrcFeatureId.fmax" title="end"/><br />
</display:table><br />
</st:section><br />
<br />
<st:section name="cds" collapsible="true"><br />
<b>${feature.residues}</b><br />
</st:section><br />
</xml><br />
<br />
Specialized functionality, such as a cascading delete, is handled by the database</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-01-26T19:57:30Z<p>165.124.152.78: /* Technical Overview */</p>
<hr />
<div>==Middleware for Chado databases==<br />
<br />
===Authors===<br />
<br />
* Jeff Bowes<br />
* Robert Bruggner<br />
* Scott Cain<br />
* Josh Goodman<br />
* Eric Just<br />
* Sohel Merchant<br />
* Brian O'Connor<br />
* Brian Osborne<br />
* Chinmay Patel<br />
* Pinglei Zhou<br />
<br />
===Middleware Evaluation January 2007===<br />
<br />
A group of some 50 GMOD developers met at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
===Introduction===<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
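The coding sequence in this report is the concatenation of the exon ranges cut from the source sequence. The following is a minimal Java sketch of that step, assuming the exon coordinates are 1-based and inclusive (this matches the 111-base sequence shown above: 77 + 34 bases) and that all exons lie on the forward strand. The <code>CdsReport</code> class and its method names are hypothetical and are not part of any package presented at the meeting.

```java
final class CdsReport {
    /**
     * Concatenates exon ranges, in the order given, from the source sequence.
     * Each exon is {start, end} in 1-based, inclusive coordinates (an assumption).
     */
    static String splice(String source, int[][] exons) {
        StringBuilder cds = new StringBuilder();
        for (int[] exon : exons) {
            // append(CharSequence, start, end) is 0-based and end-exclusive,
            // hence start - 1 while end stays unchanged.
            cds.append(source, exon[0] - 1, exon[1]);
        }
        return cds.toString();
    }

    /** Formats a FASTA record, wrapping the sequence at the given line width. */
    static String toFasta(String header, String seq, int width) {
        StringBuilder out = new StringBuilder(">").append(header).append('\n');
        for (int i = 0; i < seq.length(); i += width) {
            out.append(seq, i, Math.min(i + width, seq.length())).append('\n');
        }
        return out.toString();
    }
}
```

With the fake chromosome loaded into a string, the xfile report would call <code>CdsReport.splice(chromosome, new int[][] {{13691, 13767}, {14687, 14720}})</code> and pass the result to <code>toFasta</code> with a width of 60.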
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
===Conclusions===<br />
<br />
The one day meeting heard presentations from developers using both Perl and Java middleware and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. BioSQL) the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software, and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman).<br />
<br />
====Problem Assignments====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ in many ways in how the problems were solved; please see the presentations themselves for these details. <br />
<br />
The Perl approaches used only the Perl language whereas the Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or disadvantage.<br />
<br />
====Java Middleware====<br />
<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
=====Abstraction=====<br />
<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
<br />
====Perl Middleware====<br />
<br />
<br />
=====Abstraction=====<br />
<br />
Modware provides a higher level of abstraction than Chado::AutoDBI.<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
DasI interface is well-documented, about a dozen methods and three classes, all documented.<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), as defined by Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
<br />
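To make the DAO approach above concrete, here is a minimal Java sketch for the Chado <code>cvterm</code> table (columns <code>cvterm_id</code>, <code>cv_id</code>, <code>name</code>, <code>definition</code>, <code>dbxref_id</code>). The class and method names are hypothetical. So that the sketch runs without a database, the implementation keeps rows in a map; a real DAO would hold JDBC code and one SQL statement per method instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** One attribute per column of the cvterm table. */
final class Cvterm {
    int cvtermId;      // primary key, assigned on create
    int cvId;
    String name;
    String definition;
    int dbxrefId;
}

/** Hand-written CRUD contract; a JDBC version would wrap one SQL statement per method. */
interface CvtermDao {
    int create(Cvterm term);
    Optional<Cvterm> retrieve(int cvtermId);
    boolean update(Cvterm term);
    boolean delete(int cvtermId);
}

/** In-memory stand-in for the JDBC implementation (an assumption made for runnability). */
final class InMemoryCvtermDao implements CvtermDao {
    private final Map<Integer, Cvterm> rows = new HashMap<>();
    private int nextId = 1; // plays the role of the database sequence

    @Override
    public int create(Cvterm term) {
        term.cvtermId = nextId++;
        rows.put(term.cvtermId, term);
        return term.cvtermId;
    }

    @Override
    public Optional<Cvterm> retrieve(int cvtermId) {
        return Optional.ofNullable(rows.get(cvtermId));
    }

    @Override
    public boolean update(Cvterm term) {
        // replace() only succeeds when a row with this key already exists
        return rows.replace(term.cvtermId, term) != null;
    }

    @Override
    public boolean delete(int cvtermId) {
        return rows.remove(cvtermId) != null;
    }
}
```

Every new table needs another class, interface and implementation of this kind, which is the manual effort an ORM layer is meant to remove.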
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* The table column names are mapped to the get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
$term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
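## Update ($term is assumed to be an object holding the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update();    # write the changed columns to the database<br />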
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
Flybase tried Hibernate, but encountered performance issues during bulk operations even when the work was limited to simple print() statements.<br />
There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching. With XORT you can handle recursive or cyclical operations more easily. In common operations such as merging genes Chado users will encounter this issue routinely.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not stored in memory; data is flushed out as processing proceeds<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
This Wiki section is an edited version of [[Media:XORT.pdf|Josh and Pinglei's presentation]].<br />
<br />
<br />
<br />
=====Introduction=====<br />
<br />
* An XML-database mapping system for data exchange between DB and XML-driven application<br />
* XORT can handle typical XML, it's not Chado-specific<br />
* Developed and supported by Pinglei Zhou at FlyBase Harvard; the current version is 0.007.<br />
* Used at all FlyBase sites<br />
** Harvard has extensive library of Perl modules for generating ChadoXML<br />
* Written in Perl<br />
* Required perl modules:<br />
** XML::Parser::PerlSAX<br />
** Unicode::String<br />
** XML::DOM<br />
** DBI<br />
<br />
=====Chado XML=====<br />
<br />
* Is ChadoXML necessary? No, but it may help you.<br />
* ChadoXML assists with incremental updates, if you want to avoid flush-and-reload.<br />
* While updates can be achieved by other middleware (for example, Perl Class::DBI or Java Hibernate), ChadoXML provides an additional feature: a way to archive your transactions.<br />
* It provides bulk update and download, which other methods either lack or perform inefficiently<br />
<br />
=====Components=====<br />
<br />
* Database & Schema<br />
* ChadoXML Specification<br />
* DumpSpec<br />
** DumpSpec files are simple XML files that tell XORT what to do<br />
** DumpSpec files are ''language independent'', being XML<br />
** It's fairly easy for those who know the schema to read these files and understand what the operation is<br />
<br />
=====Highlights of Chado XML Specification=====<br />
<br />
* Unique representation of a specific database schema<br />
* Does away with internal primary key values<br />
* Static vs. Operational<br />
* Encoding for non-ASCII characters<br />
* Macro mechanism (object reference)<br />
<br />
=====Putting it together: New FlyBase dataflow Part 1=====<br />
<br />
There are three Flybase sites, and most curation is done at Harvard and<br />
Cambridge. Proforma is the curation format at Cambridge and Harvard, but<br />
Harvard also curates with Apollo and ChadoXML.<br />
<br />
Once in Chado, the reporting instance, there's a denormalization step<br />
in moving data to a read-only database. From the read-only database<br />
XORT dumps ChadoXML for reporting purposes. Once ChadoXML is created,<br />
XSLT version 2 is used to create HTML and GFF. The HTML is for<br />
human-readable reports; the GFF is for GBrowse and for various power<br />
users.<br />
<br />
1.a. Proforma (FlyBase Cambridge) is converted to ChadoXML<br />
<br />
1.b. ChadoXML is created by Apollo (Harvard)<br />
<br />
1.c. ChadoXML is created by Java SEAN (Harvard)<br />
<br />
2. All ChadoXML is loaded into Chado by XORT<br />
<br />
=====Putting it together: New FlyBase dataflow Part 2=====<br />
<br />
3. Chado (Harvard) is denormalized and loaded into Chado (Indiana)<br />
<br />
4. ChadoXML is created from Chado using XORT<br />
<br />
5.a. GFF and FASTA are created from ChadoXML<br />
<br />
5.b. HTML is created from Chado XML<br />
<br />
=====Data & Report Generation=====<br />
<br />
* Content of all output files is controlled by XML dumpspecs.<br />
** Dumpspecs are language independent.<br />
** Easily readable (with knowledge of Chado structure).<br />
* All XML transformation steps are done with XSLT v2.<br />
** Saxon XSLT (http://saxon.sourceforge.net/)<br />
** ChadoXML is split into individual chunks before XSLT processing to accommodate large file sizes.<br />
** Extremely fast. We can process all data for ~60,000 Drosophila genes in under 30 minutes.<br />
<br />
=====Hibernate & XORT=====<br />
<br />
* Hibernate didn't scale well when dealing with 5,000+ features in bulk.<br />
** The test was simply calling <code>print()</code> statements<br />
* Performance tweaks for Hibernate can be quite complicated to set up for bulk operations.<br />
* XORT is currently handling ~6 million features in production with only minor performance problems.<br />
* XORT is much more language independent.<br />
<br />
=====Support for complex transactions using XORT=====<br />
<br />
For example:<br />
<br />
* Find all records linked to a record using dumpspec<br />
* Merge gene x into y, each with thousands of records attached<br />
<br />
Step 1. Dump all data using a simple dumpspec<br />
<xml><br />
<chado><br />
<feature dump="all"><br />
<uniquename test="eq">x</uniquename><br />
</feature><br />
</chado><br />
</xml><br />
Step 2. Delete feature x from the database, with triggers to clean up orphan records if necessary<br />
<br />
Step 3. Edit the output xml, change uniquename x to y, then load the edited file back to DB<br />
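<br />
The edit in Step 3 can be scripted. The following is a minimal sketch in plain Perl, assuming the old name appears in the dump only inside <code>uniquename</code> elements. The <code>rename_uniquename</code> helper and the sample XML string are hypothetical, and the substitution is plain text matching rather than real XML parsing, so a production script would more likely use an XML parser.<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Hypothetical helper: rewrite <uniquename>OLD</uniquename> to NEW<br />
# in a ChadoXML string. Plain text substitution, not an XML parser.<br />
sub rename_uniquename {<br />
    my ($xml, $old, $new) = @_;<br />
    (my $edited = $xml) =~ s{<uniquename>\Q$old\E</uniquename>}{<uniquename>$new</uniquename>}g;<br />
    return $edited;<br />
}<br />
<br />
# Made-up fragment standing in for the dump produced in Step 1<br />
my $dump = "<chado><feature><uniquename>x</uniquename></feature></chado>";<br />
print rename_uniquename($dump, 'x', 'y'), "\n";<br />
</perl><br />
The edited string would then be written to a file and loaded back with XORT, as Step 3 describes.<br />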
<br />
=====CHIA (Chado Interface Application)=====<br />
<br />
A Java application that organizes SQL and XORT functionality for internal users, e.g.:<br />
<br />
* Dump chado-XML for gene regions for Apollo curation<br />
* Organize and execute “canned” SQL queries<br />
* Serve IDs for curators (in development)<br />
* Browse Chado dynamically without writing SQL statements<br />
<br />
CHIA is being designed to be extensible for adding new functionality as needed.<br />
<br />
<br />
=====Documentation=====<br />
<br />
* ''Using Chado to Store Genome Annotation Data''<br />
** Current Protocols in Bioinformatics (Baxevanis, A.D., and Davison, D.B., eds) 2, 9.6.1-9.6.28.<br />
* XORT specification docs<br />
* XORT draft (unpublished)<br />
* GMOD case demo procedure<br />
** All in the doc directory of XORT package, http://www.gmod.org<br />
<br />
=====Acknowledgements=====<br />
<br />
* William Gelbart <br />
* Chris Mungall<br />
* David Emmert <br />
* Mark Gibson<br />
* Stan Letovsky <br />
* Nomi Harris<br />
* Frank Smutniak <br />
* Suzanna Lewis<br />
* Peili Zhang <br />
* Haiyan Zhang <br />
* Aubrey de Grey<br />
* Andy Schroeder <br />
* Don Gilbert<br />
* Susan Russo<br />
* Mark Zytkovicz <br />
* Scott Cain<br />
* Lincoln Stein<br />
* Victor Strelets<br />
* Robert Wilson<br />
* Paul Leyland<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Use add_constructor to define a search for features with long names<br />
<br />
__PACKAGE__->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
This Wiki section is an edited version of [[Media:AutoDBI.pdf|Brian's presentation]].<br />
<br />
=====Relation to Turnkey=====<br />
<br />
Turnkey is a package that auto-generates Web sites given a relational<br />
schema, based on SQL::Translator<br />
<br />
* Turnkey authors: Allen Day, Scott Cain, Brian O'Connor<br />
* Turnkey and Chado::AutoDBI objects are essentially the same<br />
<br />
=====Technical Overview=====<br />
<br />
* Code Generation<br />
<br />
=====Project Overview=====<br />
<br />
Convert SQL Queries/Inserts/Deletes -> Object Calls<br />
<sql><br />
INSERT INTO feature (organism_id, name)<br />
VALUES (1, 'foo');<br />
</sql><br />
To:<br />
<perl><br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Database connection: use a base class<br />
* Set up the base object and connect, then create a ''table object'' to access the primary key. <br />
* Class::DBI can find and insert records in other tables, based on foreign keys.<br />
<br />
<perl><br />
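package Turnkey::Model::DBI;<br />
use base qw(Class::DBI::Pg);<br />
<br />
# Connection settings for the local Chado database<br />
my ($dsn, $name, $pass);<br />
$dsn = "dbi:Pg:host=localhost;dbname=chado;port=5432";<br />
$name = "postgres";<br />
$pass = "";<br />
<br />
# All table classes inherit this connection from the base class<br />
Turnkey::Model::DBI->set_db('Main', $dsn, $name, $pass, {AutoCommit => 1});<br />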
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
<br />
<perl><br />
package Turnkey::Model::Feature;<br />
use base 'Turnkey::Model::DBI';<br />
<br />
Turnkey::Model::Feature->set_up_table('feature');<br />
<br />
#<br />
# Primary key accessors<br />
#<br />
<br />
sub id { shift->feature_id }<br />
sub feature { shift->feature_id }<br />
</perl><br />
<br />
* Data field accessors are provided by Class::Accessor<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** has_a<br />
<br />
<perl><br />
#<br />
# has_a<br />
#<br />
Turnkey::Model::Feature->has_a( type_id => "Turnkey::Model::Cvterm" );<br />
sub cvterm { return shift->type_id; }<br />
</perl><br />
<br />
* Basic ORM Object: Feature<br />
** has_many<br />
<br />
<perl><br />
#<br />
# has_many<br />
#<br />
Turnkey::Model::Feature->has_many('feature_synonym_feature_id', <br />
'Turnkey::Model::Feature_Synonym' => 'feature_id');<br />
sub feature_synonyms { return shift->feature_synonym_feature_id; }<br />
<br />
Turnkey::Model::Feature->has_many('featureprop_feature_id', <br />
'Turnkey::Model::Featureprop' => 'feature_id');<br />
sub featureprops { return shift->featureprop_feature_id; }<br />
</perl><br />
<br />
* Can traverse tables, such as going from FEATURE to FEATUREPROP <br />
** Tell the base object that a ''table object'' has_a() or has_many() relationship through a key that corresponds to a key in another ''table object'' <br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** skipping linker tables for has_many<br />
<br />
<perl><br />
# skip over feature_synonym table<br />
#<br />
# method 1<br />
#<br />
sub synonyms { my $self = shift; return map $_->synonym_id, $self->feature_synonyms; }<br />
#<br />
# method 2<br />
#<br />
Turnkey::Model::Feature->has_many( synonyms2 =><br />
['Turnkey::Model::Feature_Synonym' => 'synonym_id']);<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Transactions<br />
** Chado::AutoDBI supports transactions, and one can wrap the transaction in an eval()<br />
<perl><br />
sub do_transaction {<br />
my $class = shift;<br />
my ( $code ) = @_;<br />
# Turn off AutoCommit for this scope.<br />
# A commit will occur at the exit of this block automatically,<br />
# when the local AutoCommit goes out of scope.<br />
local $class->db_Main->{ AutoCommit };<br />
<br />
# Execute the required code inside the transaction.<br />
eval { $code->() };<br />
if ( $@ ) {<br />
my $commit_error = $@;<br />
eval { $class->dbi_rollback }; # might also die!<br />
die $commit_error;<br />
}<br />
}<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Lazy Loading<br />
** One can either do automated creation of objects or explicitly dictate which fields are incorporated into object<br />
<perl><br />
Turnkey::Model::Feature->columns( Primary => qw/feature_id/ );<br />
Turnkey::Model::Feature->columns( Essential => qw/name organism_id type_id/ );<br />
Turnkey::Model::Feature->columns( Others => qw/residues .../ );<br />
</perl><br />
<br />
Typically:<br />
<br />
<perl><br />
Turnkey::Model::Feature->set_up_table('feature');<br />
</perl><br />
<br />
=====Problem 1=====<br />
<br />
* Create Feature & Add Description<br />
<br />
<perl><br />
# now create mRNA feature<br />
<br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
<br />
# create description<br />
<br />
my $featureprop = Turnkey::Model::Featureprop->find_or_create({<br />
value => 'A test gene for GMOD meeting',<br />
feature_id => $feature,<br />
type_id => $note_cvterm,<br />
});<br />
</perl><br />
<br />
=====Problem 2=====<br />
<br />
* Retrieve a Feature via Searching<br />
** Search using strings or identifiers; a search returns an iterator object in scalar context and a list of objects in list context<br />
<br />
<perl><br />
# objects for global use<br />
<br />
# the organism for our new feature<br />
my $organism = Turnkey::Model::Organism->search(abbreviation => "S.cerevisiae")->next;<br />
<br />
# the cvterm for a "Note"<br />
my $note_cvterm = Turnkey::Model::Cvterm->retrieve(2);<br />
<br />
# searching name by wildcard<br />
<br />
my @results = Turnkey::Model::Feature->search_like(name => 'x-%');<br />
</perl><br />
<br />
=====Problems 3, 4, & 5=====<br />
<br />
* Update a Feature<br />
<br />
<perl><br />
# update the xfile gene name<br />
<br />
$feature->name("x-file");<br />
$feature->update();<br />
</perl><br />
<br />
* Delete a Feature<br />
<br />
<perl><br />
# now delete the x-file feature<br />
<br />
$feature->delete();<br />
</perl><br />
<br />
=====Things Chado::AutoDBI does well=====<br />
<br />
* Easy to use<br />
* Easy to port<br />
* Use with other DBs<br />
** Both Oracle and Postgres used currently<br />
* Autogenerated via Turnkey<br />
* find_or_create method<br />
* Performance is not as bad as you might guess<br />
** Due to Lazy loading<br />
** Even whole genome operations are feasible<br />
<br />
Note that speed is relative: poorly written SQL can perform badly, and in such cases the Chado::AutoDBI approach can be faster.<br />
<br />
<br />
=====For More Information=====<br />
<br />
* Class::DBI<br />
** http://www.class-dbi.com<br />
** http://search.cpan.org<br />
<br />
* Turnkey<br />
** http://turnkey.sf.net<br />
<br />
* Biopackages<br />
** http://biopackages.net<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Uses Chado::AutoDBI to connect. Connection is configured on GMOD install.<br />
* Transaction support: See Chado::AutoDBI talk.<br />
* Code generation: No automatic code generation<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Some DictyBase-specific code used there<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods<br />
** Abstract factory for various feature types<br />
* Lazy: information is only retrieved when you ask for it, but cached for speedy retrieval the next time it is required<br />
* Uses Bioperl and its objects<br />
** Common methods such as name(), primary_id(), external_ids() <br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene which isn't the same as ...<br />
* Any feature type not explicitly supported in the Modware::Feature class is blessed as a Modware::Feature::GENERIC class, which has a start/stop coordinate on a genomic sequence feature (no structure like a transcript with exons)<br />
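<br />
The two ideas above, a factory that falls back to a generic class and lazy retrieval with caching, can be sketched in plain Perl. This is only an illustration under stated assumptions: the <code>Sketch::Feature</code> packages, the <code>%CLASS_FOR</code> table and the <code>$FETCHES</code> counter are hypothetical and are not Modware's actual code or API.<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
package Sketch::Feature;    # hypothetical stand-in, not Modware code<br />
<br />
my %CLASS_FOR = ( gene => 'Sketch::Feature::GENE' );<br />
our $FETCHES = 0;           # counts simulated database hits<br />
<br />
# Factory: bless into a type-specific class, or GENERIC if unsupported<br />
sub new {<br />
    my ($class, %args) = @_;<br />
    my $target = $CLASS_FOR{ $args{type} } || 'Sketch::Feature::GENERIC';<br />
    return bless { type => $args{type} }, $target;<br />
}<br />
<br />
# Lazy accessor: fetch on the first call, then serve the cached value<br />
sub description {<br />
    my $self = shift;<br />
    unless ( exists $self->{description} ) {<br />
        $FETCHES++;         # pretend this is a database query<br />
        $self->{description} = "a $self->{type} feature";<br />
    }<br />
    return $self->{description};<br />
}<br />
<br />
package Sketch::Feature::GENE;    our @ISA = ('Sketch::Feature');<br />
package Sketch::Feature::GENERIC; our @ISA = ('Sketch::Feature');<br />
<br />
package main;<br />
<br />
my $gene  = Sketch::Feature->new( type => 'gene' );<br />
my $other = Sketch::Feature->new( type => 'polyA_site' );<br />
$gene->description for 1 .. 3;    # only the first call counts as a fetch<br />
print ref($gene), " ", ref($other), " fetches=$Sketch::Feature::FETCHES\n";<br />
</perl><br />
Calling <code>description()</code> three times increments the counter only once, which is the caching behaviour the ''Lazy'' bullet describes.<br />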
<br />
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
my $seq_io = new Bio::SeqIO(<br />
-file => "../data/fake_chromosome.txt",<br />
-format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
-type => 'chromosome',<br />
-bioperl => $seq,<br />
-description => "This is a test",<br />
-name => 'Fake',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
-type => 'gene',<br />
-name => 'xfile',<br />
-description => 'A test gene for GMOD meeting',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 13691,<br />
-end => 13767,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 14687,<br />
-end => 14720,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
-type => 'mRNA',<br />
-bioperl => $bioperl_mrna,<br />
-source => 'GMOD 2007 Demo',<br />
-reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report. This package<br />
uses Modware and BioPerl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Gene is-a Feature.<br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass? Containing the BioPerl object, as shown here, gives more flexibility<br />
<br />
<perl><br />
package GMODWriter; <br />
sub Write_gene_report {<br />
my ($self, $gene) = @_;<br />
my $symbol = $gene->name();<br />
<br />
my @synonyms = @{ $gene->synonyms() };<br />
my $syn_string = join ", ", @synonyms; # ", " matches the requested report format<br />
my $description = $gene->description();<br />
my $type = $gene->type();<br />
# get features associated with the gene that are of type 'mRNA'<br />
my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
# use bioperl method to get exons from mRNA<br />
my @exons = $mrna->bioperl->exons_ordered();<br />
# Modware will return a nice fasta file for you.<br />
my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
# Now print the actual report<br />
print "symbol: $symbol\n";<br />
print "synonyms: $syn_string\n";<br />
print "description: $description\n";<br />
print "type: $type\n";<br />
<br />
my $count = 0;<br />
foreach my $exon (@exons ) {<br />
$count++;<br />
print "exon${count} start: ".$exon->start()."\n";<br />
<br />
print "exon${count} end: ".$exon->end()."\n";<br />
<br />
}<br />
print "$fasta";<br />
}<br />
. . .<br />
</perl><br />
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the name symbol to x-file and retrieve the changed record. Regenerate gene report<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval{<br />
<br />
# get xfile gene<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
# change the name<br />
$xfile_gene->name( 'x-file' );<br />
# write changes to database<br />
$xfile_gene->update();<br />
<br />
# we can use the original object if we want, but instead<br />
# we refetch from the database to 'prove' the name has been changed<br />
my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
# use our GMODWriter package to write report for x-file<br />
GMODWriter->Write_gene_report( $xfile_gene2 );<br />
<br />
};<br />
if ($@){<br />
warn $@;<br />
new Modware::DBH->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
<br />
<perl><br />
sub Write_search_results {<br />
my ($self, $itr) = @_;<br />
# loop through iterator<br />
while (my $gene = $itr->next) {<br />
# print the requested information<br />
print $gene->feature_id . "\t" . $gene->name .<br />
"\t" . $gene->organism_name . "\n";<br />
}<br />
}<br />
</perl><br />
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
# set is_deleted = 1; this also sets is_available to 0,<br />
# so the gene is 'hidden' and no longer visible to searches<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* The sample scripts demoed here are available<br />
** See the sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI may adjust to the change automatically, depending on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Specify the type of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O'Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects via Perl DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but it may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
This Wiki section is an edited version of [[Media:DasI_middleware.pdf|Scott's presentation]].<br />
<br />
=====Create the database=====<br />
<br />
$ perl Makefile.PL<br />
$ make<br />
$ sudo make install<br />
$ make load_schema<br />
$ make prepdb # now with Xenopus!<br />
$ make ontologies # load rel, SO, featureprop<br />
<br />
=====Problem 1 - Loading Data=====<br />
<br />
Create some GFF from the specifications:<br />
<br />
fake_chromosome example chromosome 1 15017 . . . ID=fake_chromosome;Name=fake_chromosome<br />
fake_chromosome example gene 13691 14720 . + . ID=xfile;Name=xfile;Alias=mulder,scully;Note=A test gene for GMOD meeting<br />
fake_chromosome example mRNA 13691 14720 . + . ID=xfile_mRNA;Parent=xfile<br />
fake_chromosome example exon 13691 13767 . + . Parent=xfile_mRNA<br />
fake_chromosome example exon 14687 14720 . + . Parent=xfile_mRNA<br />
fake_chromosome example gene 12648 13136 . + . ID=x-men<br />
<br />
Gene inserted as GFF using a standard Bioperl bulk loader:<br />
<br />
<code>$ gmod_bulk_load_gff3.pl -g sample.gff</code><br />
<br />
''...lots of output...''<br />
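<br />
To see how the ninth GFF3 column carries the names, aliases and parent links used above, here is a minimal sketch in plain Perl. It assumes tab-separated columns, as GFF3 requires, and the <code>parse_gff3_line</code> helper is hypothetical; it is not part of the bulk loader and ignores details such as URL-escaped characters.<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Split a GFF3 line into its nine tab-separated columns and parse<br />
# column 9 (tag=value pairs separated by ';') into a hash of lists.<br />
sub parse_gff3_line {<br />
    my ($line) = @_;<br />
    chomp $line;<br />
    my @col = split /\t/, $line;<br />
    my %attr;<br />
    for my $pair ( split /;/, $col[8] ) {<br />
        my ($tag, $value) = split /=/, $pair, 2;<br />
        $attr{$tag} = [ split /,/, $value ];    # multiple values are comma-separated<br />
    }<br />
    return { type => $col[2], start => $col[3], end => $col[4], attr => \%attr };<br />
}<br />
<br />
# The xfile gene line from the example, without its Note attribute<br />
my $line = join "\t", 'fake_chromosome', 'example', 'gene', 13691, 14720,<br />
    '.', '+', '.', 'ID=xfile;Name=xfile;Alias=mulder,scully';<br />
my $f = parse_gff3_line($line);<br />
print "$f->{type} $f->{start}-$f->{end} aliases: @{ $f->{attr}{Alias} }\n";<br />
</perl><br />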
<br />
=====Adaptor Components=====<br />
<br />
* Bio::DB::Das::Chado<br />
** Database connection object<br />
* Bio::DB::Das::Chado::Segment<br />
** Object for any range of DNA<br />
* Bio::DB::Das::Chado::Segment::Feature<br />
<br />
=====Use Bio::DB::Das::Chado=====<br />
<br />
<perl><br />
use Bio::DB::Das::Chado;<br />
<br />
my $chado = Bio::DB::Das::Chado->new(<br />
-dsn => "dbi:Pg:dbname=test",<br />
-user=> "scott",<br />
-pass=> "" ) || die "no new chado";<br />
<br />
my $gene_name = 'xfile';<br />
<br />
my ($gene_fo) = $chado->get_features_by_name($gene_name);<br />
</perl><br />
<br />
=====Problem 2 - Use Some Accessors=====<br />
<br />
<perl><br />
print "symbol: " . $gene_fo->display_name."\n";<br />
print "synonyms: " . join(', ',$gene_fo->synonyms)."\n";<br />
print "description: " . $gene_fo->notes."\n";<br />
print "type: " . $gene_fo->type."\n";<br />
<br />
my ($mRNA) = $gene_fo->sub_SeqFeature();<br />
my @exons = $mRNA->sub_SeqFeature();<br />
<br />
my $exon_count = 0;<br />
my $cds_seq = '';<br />
for my $exon (@exons) {<br />
next unless ($exon->type->method eq 'exon');<br />
$exon_count++;<br />
print "exon$exon_count start: " . $exon->start."\n";<br />
print "exon$exon_count end: " . $exon->end. "\n";<br />
# the first seq call returns a Bio::Seq object, the second gets the DNA string from Bio::Seq<br />
$cds_seq .= $exon->seq->seq;<br />
} <br />
</perl><br />
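<br />
The loop above builds the CDS by concatenating exon sequences. As a rough check of the coordinate arithmetic, the sketch below does the same with plain Perl strings, assuming 1-based inclusive coordinates as in GFF. The reference sequence is a made-up stand-in for fake_chromosome, so only the resulting length is meaningful.<br />
<perl><br />
use strict;<br />
use warnings;<br />
<br />
# Exon coordinates from the xfile example (1-based, inclusive)<br />
my @exons = ( [ 13691, 13767 ], [ 14687, 14720 ] );<br />
<br />
# Toy reference sequence: a made-up stand-in for fake_chromosome<br />
my $chromosome = 'ACGT' x 4000;    # 16,000 bases<br />
<br />
my $cds = '';<br />
for my $exon (@exons) {<br />
    my ($start, $end) = @$exon;<br />
    # substr() offsets are 0-based, so shift the start by one<br />
    $cds .= substr( $chromosome, $start - 1, $end - $start + 1 );<br />
}<br />
print "CDS length: ", length($cds), "\n";    # 77 + 34 = 111<br />
</perl><br />
The two exons contribute 77 and 34 bases, giving 111 in total, which is also the length of the xfile CDS shown in the Modware report.<br />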
<br />
=====Bulk Output=====<br />
<br />
<perl><br />
my $gene_name = 'x-*';<br />
<br />
my @genes = $chado->get_features_by_name(<br />
-name => $gene_name,<br />
-class=> 'gene' );<br />
<br />
for my $gene (@genes) {<br />
print join("\t",<br />
$gene->feature_id,<br />
$gene->display_name,<br />
$gene->organism),"\n";<br />
}<br />
</perl><br />
<br />
Or see your report in GBrowse<br />
<br />
=====Advantages=====<br />
<br />
* Comes 'for free' with GBrowse<br />
** GBrowse will run with any DasI-compatible interface<br />
* Uses 'familiar' BioPerl idioms, very similar to widely used Bio::DB::GFF (though with fewer methods)<br />
<br />
<br />
=====Conclusion=====<br />
<br />
* Not suitable as a 'general' middleware layer<br />
** May be suitable for some applications, particularly if they are similar to GBrowse or other uses of Bio::DB::GFF<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
Jeff Bowes, Xenbase, University of Calgary. This Wiki section is an edited version of [[Media:iBatis.pdf|Jeff's presentation]].<br />
<br />
=====iBatis=====<br />
<br />
* iBatis<br />
** Light-weight framework<br />
** Still based on SQL but eliminates the repetitive drudgery of JDBC<br />
** You can tune a query by re-writing the SQL in XML & the API does not change.<br />
* iBatis does not create your database in memory as objects<br />
* Shallow learning curve<br />
* Manually create a Java class and SQL map to describe higher-level objects<br />
** Example: ''Gene''<br />
* Support for inheritance<br />
** Result maps can inherit from one another, which allows a fair amount of re-use.<br />
* Supports different transaction schemes<br />
** For example, JDBC, Java Transaction API<br />
<br />
=====Abator=====<br />
<br />
* Generates iBatis CRUD objects by introspecting database tables<br />
* Abator creates ''SQL in XML'' files (SQL Map files) and Java classes <br />
** Within these files is a Result Map section.<br />
* Abator config files are simple: they set the connection parameters and say where the generated files should go.<br />
* In the SQL Map files you can specify how to find parent ids, such as feature_id. <br />
<br />
=====Abator Example=====<br />
<xml><br />
<abatorConfiguration><br />
  <abatorContext> <!-- TODO: Add Database Connection Information --><br />
    <jdbcConnection driverClass="COM.ibm.db2.jdbc.app.DB2Driver"<br />
        connectionURL="jdbc:db2:XBDV05"<br />
        userId="db2inst1"<br />
        password="*******"><br />
      <classPathEntry location="/Program Files/IBM/SQLLIB/java/db2java.zip" /><br />
    </jdbcConnection><br />
<br />
    <javaModelGenerator<br />
        targetPackage="org.gmod.architecture.framwork.bakeoff.abator.model"<br />
        targetProject="gene" /><br />
    <sqlMapGenerator<br />
        targetPackage="org.gmod.architecture.framwork.bakeoff.abator.sql"<br />
        targetProject="gene" /><br />
    <daoGenerator type="IBATIS"<br />
        targetPackage="org.gmod.architecture.framwork.bakeoff.abator.dao"<br />
        targetProject="gene" /><br />
  </abatorContext><br />
</abatorConfiguration><br />
</xml><br />
<br />
=====Abator Example=====<br />
<nowiki><br />
<table schema="db2inst1" tableName="synonym"></nowiki><br />
<generatedKey column="synonym_id" sqlStatement="VALUES PREVVAL FOR<br />
synonym_seq" identity="true" /><br />
<columnOverride column="CREATED_BY" jdbcType="INTEGER" /><br />
<columnOverride column="MODIFIED_BY" jdbcType="INTEGER" /><br />
<nowiki></table></nowiki><br />
<br />
=====Abator=====<br />
<br />
Works as:<br />
<br />
* Eclipse plug-in<br />
* ANT<br />
* Standalone<br />
<br />
=====DAO Methods=====<br />
<br />
* Insert (Feature)<br />
* Update (Feature)<br />
* DeletebyKey (FeatureKey)<br />
* SelectbyKey (FeatureKey)<br />
* SelectbyExample (FeatureExample)<br />
* DeletebyExample (FeatureExample)<br />
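<br />
The method list above can be pictured with a minimal in-memory sketch. It is not Abator output: the class names <code>FeatureRow</code> and <code>InMemoryFeatureDao</code>, the map-backed storage, and the reduction of ''select by example'' to an exact name match are all assumptions made for illustration.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row object; real generated model classes carry every column.
class FeatureRow {
    Integer featureId;
    String uniquename;

    FeatureRow(String uniquename) {
        this.uniquename = uniquename;
    }
}

// In-memory stand-in for a generated DAO. Storage is a map, not a database.
class InMemoryFeatureDao {
    private final Map<Integer, FeatureRow> table = new LinkedHashMap<>();
    private int sequence = 0;

    // Insert assigns the next key, stores the row, and returns the key.
    Integer insert(FeatureRow row) {
        row.featureId = ++sequence;
        table.put(row.featureId, row);
        return row.featureId;
    }

    // Update replaces the stored row with the same key; returns rows affected.
    int update(FeatureRow row) {
        if (row.featureId == null || !table.containsKey(row.featureId)) {
            return 0;
        }
        table.put(row.featureId, row);
        return 1;
    }

    FeatureRow selectByKey(Integer key) {
        return table.get(key);
    }

    int deleteByKey(Integer key) {
        return table.remove(key) == null ? 0 : 1;
    }

    // "By example" is reduced here to an exact match on uniquename.
    List<FeatureRow> selectByExample(String uniquename) {
        List<FeatureRow> hits = new ArrayList<>();
        for (FeatureRow row : table.values()) {
            if (row.uniquename.equals(uniquename)) {
                hits.add(row);
            }
        }
        return hits;
    }
}
```
<br />
The point of the generated layer is that callers see only these few methods per table; the SQL behind each one lives in the SQL Map file.<br />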
<br />
=====Insert=====<br />
<xml><br />
<insert id="abatorgenerated_insert" parameterClass=<br />
"org.gmod.architecture.framwork.bakeoff.abator.model.FeatureWithBLOBs"><br />
insert into db2inst1.feature<br />
(DBXREF_ID, ORGANISM_ID, NAME, UNIQUENAME,<br />
RESIDUES, SEQLEN, MD5CHECKSUM, TYPE_ID, IS_ANALYSIS,<br />
IS_OBSOLETE, CREATED_BY)<br />
values (#dbxrefId:INTEGER#, #organismId:INTEGER#, #name:VARCHAR#,<br />
#uniquename:VARCHAR#, #residues:CLOB#, #seqlen:INTEGER#,<br />
#md5checksum:CHAR#, #typeId:INTEGER#,<br />
#isAnalysis:SMALLINT#, #isObsolete:SMALLINT#,<br />
#createdBy:INTEGER#)<br />
<br />
<selectKey resultClass="java.lang.Integer" keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</insert><br />
</xml><br />
<br />
=====Insert=====<br />
<xml><br />
<selectKey resultClass="java.lang.Integer"<br />
keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</xml><br />
<br />
=====Problem 1 - Insert=====<br />
<br />
<java><br />
try {<br />
  // Everything between start and commit succeeds or fails as one unit<br />
  sqlMap.startTransaction();<br />
  pGene.id = featureDAO.insert(pGene.getFeatureWithBLOBs());<br />
  featurepropDAO.insert(pGene.getPropertyDescription());<br />
  pGene.featurelocId = featurelocDAO.insert(pGene.getFeaturelocWithBLOBS());<br />
  pGene = insertExons(pGene);<br />
  insertSynonyms(pGene);<br />
  sqlMap.commitTransaction();<br />
} catch (Exception e) {<br />
  System.out.println(e);<br />
  throw e;<br />
} finally {<br />
  // Ends the transaction; uncommitted work is rolled back<br />
  sqlMap.endTransaction();<br />
}<br />
</java><br />
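<br />
The start / commit / end pattern in this example can be tried out without a database. The sketch below is a toy: <code>ToyStore</code> and <code>ToyInsertDemo</code> are hypothetical names, not iBatis classes, and a null row merely simulates a failure partway through the transaction.<br />
<br />
```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a transactional store; not the iBatis SqlMapClient.
class ToyStore {
    private final List<String> committed = new ArrayList<>();
    private List<String> pending = null;

    void startTransaction() {
        pending = new ArrayList<>();
    }

    void insert(String row) {
        if (pending == null) {
            throw new IllegalStateException("no transaction in progress");
        }
        pending.add(row);
    }

    void commitTransaction() {
        committed.addAll(pending);
        pending.clear();
    }

    // Ending without a prior commit discards the pending rows (a rollback).
    void endTransaction() {
        pending = null;
    }

    List<String> rows() {
        return committed;
    }
}

class ToyInsertDemo {
    // Inserts all rows or none; a null row simulates a mid-transaction failure.
    static void insertAll(ToyStore store, String... rows) {
        try {
            store.startTransaction();
            for (String row : rows) {
                if (row == null) {
                    throw new IllegalArgumentException("bad row");
                }
                store.insert(row);
            }
            store.commitTransaction();
        } finally {
            store.endTransaction();
        }
    }
}
```
<br />
Because <code>endTransaction()</code> sits in the <code>finally</code> block, a failure before the commit leaves the store exactly as it was.<br />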
<br />
=====Transactions=====<br />
<br />
* SQLMap<br />
* JDBC<br />
* JTA - Java Transaction API<br />
** 2-Phase commit<br />
* Hibernate<br />
* External (Customized)<br />
<br />
=====Retrieval=====<br />
<br />
symbol: xfile<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: start: 13691 end: 13767 <br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: start: 14687 end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2 - Master Detail Reports=====<br />
<br />
Account for cycles or recursion in Master Detail Report. <br />
<br />
<xml><br />
<resultMap id="SelectGeneResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Gene" groupBy="id"><br />
<result column="FEATURE_ID" property="id" jdbcType="INTEGER"/><br />
<result column="GENE_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="DESCRIPTION" property="description"<br />
jdbcType="VARCHAR" /><br />
<result column="TYPE_ID" property="typeId" jdbcType="INTEGER" /><br />
<result property="exons" resultMap = "gene.SelectExonResults"/><br />
</resultMap><br />
<br />
<resultMap id="SelectExonResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Exon"><br />
<result column="EXON_ID" property="id" jdbcType="INTEGER"/><br />
<result column="EXON_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="EXON_RESIDUES" property="residues" jdbcType="CLOB" /><br />
<result column="STRAND" property="strand" jdbcType="INTEGER" /><br />
<result column="FMIN" property="fmin" jdbcType="INTEGER" /><br />
<result column="FMAX" property="fmax" jdbcType="INTEGER" /><br />
<result column="SRCFEATURE_ID" property="sourceFeatureId"<br />
jdbcType="INTEGER" /><br />
</resultMap><br />
</xml><br />
<br />
=====Master Detail Report=====<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
6129482 x-files gene 14687 14720<br />
<br />
Becomes:<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
14687 14720<br />
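<br />
The collapse shown above, where rows sharing a gene id become one master record with several detail rows, can be sketched in plain Java. This illustrates the grouping idea only and is not iBatis code; <code>FlatRow</code>, <code>GeneReport</code> and <code>MasterDetail</code> are hypothetical names.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One flat row of the joined query: gene columns plus one exon's coordinates.
class FlatRow {
    final int geneId;
    final String symbol;
    final int fmin;
    final int fmax;

    FlatRow(int geneId, String symbol, int fmin, int fmax) {
        this.geneId = geneId;
        this.symbol = symbol;
        this.fmin = fmin;
        this.fmax = fmax;
    }
}

// Master record holding its detail rows as {fmin, fmax} pairs.
class GeneReport {
    final int geneId;
    final String symbol;
    final List<int[]> exons = new ArrayList<>();

    GeneReport(int geneId, String symbol) {
        this.geneId = geneId;
        this.symbol = symbol;
    }
}

class MasterDetail {
    // Collapse rows that share a gene id into one master with many details.
    static List<GeneReport> group(List<FlatRow> rows) {
        Map<Integer, GeneReport> byId = new LinkedHashMap<>();
        for (FlatRow row : rows) {
            GeneReport gene = byId.get(row.geneId);
            if (gene == null) {
                gene = new GeneReport(row.geneId, row.symbol);
                byId.put(row.geneId, gene);
            }
            gene.exons.add(new int[] {row.fmin, row.fmax});
        }
        return new ArrayList<>(byId.values());
    }
}
```
<br />
Fed the two rows of the first table, <code>group</code> returns a single <code>GeneReport</code> with two exon entries, which matches the second table.<br />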
<br />
=====Dynamic Queries=====<br />
<br />
* Gene Name (Description)<br />
** Feature, Featureprop<br />
* Symbol<br />
** Feature<br />
* Feature Synonyms<br />
** Feature, Feature_Synonym, Synonym<br />
* Ortholog Synonyms<br />
** Feature, Feature_relationship, Feature, Feature Synonyms<br />
<br />
=====Dynamic Queries=====<br />
<br />
FROM<br />
CAT_X_GENE_V gc<br />
<isEqual prepend=","<br />
property="searchSymbol"<br />
compareValue="true"><br />
GENE_SYMBOLS s<br />
</isEqual><br />
<br />
<isEqual prepend=","<br />
property="searchNcbi" <br />
compareValue="true"><br />
NCBI_GI n<br />
</isEqual><br />
<br />
=====Dynamic Queries=====<br />
<br />
<dynamic prepend="WHERE"><br />
<isEqual prepend="AND" property="searchNameOnly"<br />
compareValue="true"><br />
<iterate property="searchTokens" conjunction="AND" <br />
open=" (" close=") "><br />
LOWER(VARCHAR(gc.longname)) LIKE <br />
LOWER(CAST(#searchTokens[]:VARCHAR# AS VARCHAR(512)))<br />
</iterate><br />
</isEqual><br />
<br />
The iterate element is very useful for handling multiple search terms. <br />
<br />
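What the iterate element contributes can be sketched in plain Java: one condition per search token, joined by a conjunction and wrapped in parentheses. This is an illustration under assumptions, not iBatis internals: the class name <code>DynamicWhere</code> is hypothetical, the column <code>gc.longname</code> is borrowed from the fragment above, and wrapping each token in <code>%</code> is a choice made here.<br />
<br />
```java
import java.util.ArrayList;
import java.util.List;

class DynamicWhere {
    final String sql;
    final List<String> params;

    private DynamicWhere(String sql, List<String> params) {
        this.sql = sql;
        this.params = params;
    }

    // One LIKE condition per token, joined with AND inside parentheses.
    // With no tokens, no WHERE clause is produced at all.
    static DynamicWhere forTokens(List<String> tokens) {
        List<String> params = new ArrayList<>();
        StringBuilder sql = new StringBuilder();
        for (String token : tokens) {
            sql.append(params.isEmpty() ? "WHERE (" : " AND ");
            sql.append("LOWER(gc.longname) LIKE LOWER(?)");
            // Assumption: match the token anywhere in the name.
            params.add("%" + token + "%");
        }
        if (!params.isEmpty()) {
            sql.append(")");
        }
        return new DynamicWhere(sql.toString(), params);
    }
}
```
<br />
The values travel as bind parameters rather than being spliced into the SQL text, so the same statement shape serves any number of search terms.<br />
<br />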
=====Miscellaneous Features=====<br />
<br />
* Supports various data sources<br />
** Simple JDBC<br />
** DBCP – Apache Connection Pooling<br />
** JNDI – Java Naming and Directory Interface<br />
* Very flexible<br />
* Local caching of results<br />
** Lazy loading<br />
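<br />
Lazy loading combined with local caching amounts to deferring a query until its result is first needed and then reusing that result. A minimal sketch, assuming a hypothetical <code>Lazy</code> wrapper (not an iBatis class) and ignoring thread safety:<br />
<br />
```java
import java.util.function.Supplier;

// Defers an expensive load until first use, then caches the result.
class Lazy<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded = false;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    // Runs the loader at most once; later calls return the cached value.
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}
```
<br />
A property such as a feature's residues could be wrapped as <code>new Lazy<String>(() -> loadResidues(id))</code>, where <code>loadResidues</code> stands for whatever query fetches the sequence; the query then runs only if something actually asks for it.<br />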
<br />
=====Support=====<br />
<br />
* In GMOD used by<br />
** Xenbase, Artemis at Sanger<br />
* Many other users<br />
** e.g. MySpace.com<br />
* Top level Apache Project<br />
** http://ibatis.apache.org/<br />
* Active community<br />
<br />
<br />
=====What iBatis Does Well=====<br />
<br />
* Does not hide SQL<br />
** No new query language to learn<br />
* Separates and groups SQL<br />
* Simple!!<br />
** Light wrapper - No real tweaks<br />
* Does the job well<br />
* Excellent support for Master-Detail<br />
* Dynamically generated queries <br />
** You can structure conditions around clauses in SQL<br />
** One XML statement can represent many variations on a query<br />
<br />
=====Acknowledgements=====<br />
<br />
GMOD<br />
* Eric Just<br />
* Everyone else<br />
<br />
iBatis Developers<br />
* Kevin Snyder<br />
* Chris Jarabek<br />
* Ross Gibb<br />
<br />
PI<br />
* Peter Vize<br />
<br />
Financial Support<br />
* Alberta Heritage Foundation for Medical Research<br />
* Alberta Network for Proteomics Innovation<br />
* University of Calgary, Faculty of Science<br />
* University of Calgary Dept. of Computer Science<br />
* NICHD<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
Chado API via Java & Hibernate, Robert Bruggner, VectorBase.org. This Wiki section is an edited version of [[Media:HibernateChadoAPI.pdf|Robert's presentation]].<br />
<br />
=====Overview=====<br />
<br />
* Background<br />
* Quick Hibernate Overview<br />
* Hibernate Connectivity and O/R Mapping Example<br />
* GMOD Demo<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
=====Background=====<br />
<br />
* VectorBase<br />
** A bioinformatic resource center for invertebrate vectors of human pathogens<br />
* Responsible for storage and display of multiple organisms’ genomes<br />
** Anopheles gambiae, Aedes aegypti, Ixodes scapularis, Culex pipiens and so on....<br />
* Want to store data for many organisms; Chado is a natural choice<br />
* Ensembl Genome Browser already used for ''A. gambiae''<br />
** Wrote Ensembl API Database adaptor for Chado... Not maintainable.<br />
* Use Both Databases<br />
** Transfer genomic data from Ensembl to Chado<br />
** Search Engine and Indexer using Lucene<br />
** Run DAS<br />
** Export data via ChadoXML and GFF3<br />
* Need API for Database I/O<br />
<br />
=====Hibernate Background=====<br />
<br />
* They say: “A powerful, high performance object/relational persistence and query service.”<br />
* Automates the persistence of plain old Java objects (POJO)<br />
** User maps their POJO properties to database tables via XML (HBM File).<br />
** There are Hibernate tools that generate HBMs<br />
*** Configurable in the sense that one can create get & set methods that map one-to-one to fields.<br />
* Persist a specific object by storing it in the database.<br />
* Intelligent Database I/O <br />
** Smart detection of ''Dirty Properties'' when performing Save / Update / Delete.<br />
** Cascadable Save / Update / Delete for complex objects.<br />
* Everything's done within the scope of a transaction.<br />
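<br />
Detection of ''Dirty Properties'' can be pictured as comparing an object's current values with a snapshot taken when it was loaded. The sketch below shows that idea only and makes no claim about Hibernate's implementation; <code>DirtyCheck</code> is a hypothetical name, and whether Hibernate's own UPDATE statement lists only the changed columns depends on the mapping configuration.<br />
<br />
```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class DirtyCheck {
    // Return only the properties whose current value differs from the snapshot.
    static Map<String, Object> dirtyProperties(Map<String, Object> snapshot,
                                               Map<String, Object> current) {
        Map<String, Object> dirty = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!Objects.equals(snapshot.get(e.getKey()), e.getValue())) {
                dirty.put(e.getKey(), e.getValue());
            }
        }
        return dirty;
    }

    // Build an UPDATE statement that touches only the dirty columns.
    static String updateSql(String table, String keyColumn, Map<String, Object> dirty) {
        StringBuilder sql = new StringBuilder("UPDATE " + table + " SET ");
        boolean first = true;
        for (String column : dirty.keySet()) {
            if (!first) {
                sql.append(", ");
            }
            sql.append(column).append(" = ?");
            first = false;
        }
        return sql.append(" WHERE ").append(keyColumn).append(" = ?").toString();
    }
}
```
<br />
If nothing differs from the snapshot, <code>dirtyProperties</code> returns an empty map and no statement needs to be issued at all.<br />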
<br />
=====Hibernate Database Connectivity=====<br />
<br />
* Configure Hibernate in hibernate.cfg.xml<br />
* Define a Data Source<br />
** We use a simple, single JDBC connection to Chado<br />
** Can be configured to use a connection pool or data source accessible by the Java Naming and Directory Interface (JNDI).<br />
** Define a connection “dialect”<br />
*** org.hibernate.dialect.PostgreSQLDialect<br />
* Describe the relationship between Java objects and database tables<br />
** Use XML to describe where to store POJO property data in the database<br />
* Create a new Hibernate Session based on the configuration<br />
* Begin a transaction to start performing work<br />
<br />
=====POJO and HBM Example file - CV=====<br />
<br />
<java><br />
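public class CV {<br />
<br />
  private int cv_id;<br />
  private String name;<br />
  private String definition;<br />
<br />
  // Property getters and setters, one pair per field<br />
  public int getCv_id() { return cv_id; }<br />
  public void setCv_id(int cv_id) { this.cv_id = cv_id; }<br />
  public String getName() { return name; }<br />
  public void setName(String name) { this.name = name; }<br />
  public String getDefinition() { return definition; }<br />
  public void setDefinition(String definition) { this.definition = definition; }<br />
<br />
  // equals(Object) and hashCode() are overridden together;<br />
  // their bodies are elided, as in the original presentation<br />
  public boolean equals(Object other) {<br />
    ....<br />
  }<br />
<br />
  public int hashCode() {<br />
    ....<br />
  }<br />
}<br />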
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CV" table="cv"><br />
<br />
<id name="cv_id" column="cv_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cv_cv_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<property name="name" column="name" type="java.lang.String" not-null="true"/><br />
<br />
<property name="definition" column="definition" type="java.lang.String"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====HBM Example CVTerm=====<br />
<br />
<java><br />
public class CVTerm {<br />
<br />
private int cvterm_id;<br />
<br />
private CV cv;<br />
<br />
private String name;<br />
<br />
private String definition;<br />
<br />
private DBXref dbxref;<br />
<br />
private int is_obsolete;<br />
<br />
private int is_relationshiptype;<br />
<br />
.....<br />
<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CVTerm" table="cvterm"><br />
<br />
<id name="cvterm_id" column="cvterm_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cvterm_cvterm_id_seq</param><br />
<br />
</generator><br />
<br />
</id><br />
<br />
<many-to-one name="cv" class="org.vectorbase.chadoAPI.chadoObjects.CV" column="cv_id" <br />
not-null="true" cascade="save-update"/><br />
<br />
<property name="name" not-null="true" type="java.lang.String"/><br />
<br />
<property name="definition"/><br />
<br />
<one-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" cascade="all"/><br />
<br />
<property name="is_obsolete"/><br />
<br />
<property name="is_relationshiptype"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====Hibernate Object Retrieve=====<br />
<br />
One can use Java, Hibernate Query Language (HQL), or SQL; this example uses HQL.<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class, 1);<br />
<br />
// Alternatively, load a CVTerm by name using HQL<br />
CVTerm cvtByName = (CVTerm) session.createQuery("from CVTerm where name=?").setString(0, "name").uniqueResult();<br />
<br />
// Print out the name of the cvterm<br />
System.out.println(cvt.getName());<br />
<br />
// Get the cv that the cvterm is associated with<br />
// Hibernate doesn’t return the cv_id - it returns a CV Object.<br />
CV cv = cvt.getCv();<br />
<br />
// Print out the cv’s name<br />
System.out.println(cv.getName());<br />
</java><br />
<br />
=====Hibernate Object Update=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Change cvt’s name<br />
cvt.setName("New CVTerm name");<br />
<br />
// Save!<br />
// Generated SQL updates “Dirty” properties (name, in this case)<br />
session.save(cvt);<br />
<br />
// Commit data to database<br />
session.getTransaction().commit();<br />
</java><br />
<br />
=====Hibernate Save=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
// Build a session factory first and begin a transaction (not shown)<br />
<br />
// Make a new CV<br />
CV new_cv = new CV();<br />
new_cv.setName("New CV");<br />
new_cv.setDefinition("New CV Def");<br />
<br />
// Make a new cvterm for that cv<br />
CVTerm new_cvterm = new CVTerm();<br />
new_cvterm.setName("New CVTerm Name");<br />
// ..... save dbxref etc......<br />
<br />
// Add that CVTerm to our new CV<br />
new_cv.addCVTerm(new_cvterm);<br />
<br />
// Save the new data...<br />
// Hibernate recognizes that it has to first save new_cv, then save new_cvterm.<br />
session.save(new_cvterm);<br />
<br />
session.getTransaction().commit();<br />
<br />
// You can see the new id’s assigned by the database<br />
System.out.println(new_cv.getCv_id());<br />
System.out.println(new_cvterm.getCvterm_id());<br />
</java><br />
<br />
=====Inheritance=====<br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.Feature" table="feature" <br />
discriminator-value="not null"><br />
<br />
<id name="feature_id" column="feature_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">feature_feature_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<discriminator column="type_id" type="integer" insert="false"/><br />
<br />
<many-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" <br />
column="dbxref_id" cascade="all"/><br />
<br />
<many-to-one name="organism" class="org.vectorbase.chadoAPI.chadoObjects.Organism" <br />
column="organism_id" not-null="true" cascade="save-update"/><br />
<br />
<property name="name"/><br />
.....<br />
<br />
<hibernate-mapping> <br />
<br />
<subclass name="org.vectorbase.chadoAPI.chadoFeatures.Gene" <br />
extends="org.vectorbase.chadoAPI.chadoObjects.Feature" discriminator-value="767"><br />
<br />
</subclass><br />
</hibernate-mapping><br />
</xml><br />
Write custom methods for specific sub-classes<br />
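<br />
The discriminator mapping above selects a Java subclass from the value in the type_id column. A plain-Java sketch of that selection follows; the names <code>BaseFeature</code>, <code>GeneFeature</code> and <code>FeatureFactory</code> are hypothetical and this is not how Hibernate is implemented. The value 767 is taken from the mapping shown and should be read as an example, since cvterm ids differ between Chado installations.<br />
<br />
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class BaseFeature {
    String describe() {
        return "feature";
    }
}

// A subclass is the place for type-specific methods.
class GeneFeature extends BaseFeature {
    @Override
    String describe() {
        return "gene";
    }
}

class FeatureFactory {
    private static final Map<Integer, Supplier<BaseFeature>> BY_TYPE_ID = new HashMap<>();

    static {
        // Example discriminator value for genes (installation-specific).
        BY_TYPE_ID.put(767, GeneFeature::new);
    }

    // Pick the subclass from the row's type_id, falling back to the base class.
    static BaseFeature forTypeId(int typeId) {
        Supplier<BaseFeature> ctor = BY_TYPE_ID.get(typeId);
        return ctor == null ? new BaseFeature() : ctor.get();
    }
}
```
<br />
Rows whose type_id has no registered subclass still load, but only as the generic base class.<br />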
<br />
=====ChadoAPI=====<br />
<br />
* POJO Mappings<br />
** CV, CVTerm, DB, DBXref, Feature, FeatureCVTerm, FeatureDBXref, FeatureLoc, FeatureProp, FeatureRelationship, FeatureSynonym, Organism, Pub, Synonym<br />
* Extended Features<br />
** Chromosome, Gene, Transcript, Exon, Protein<br />
* Constants<br />
** CVTerms, FeatureFeatureRelationships, Ontologies<br />
* Special<br />
** ChadoAdaptor<br />
<br />
=====Problem 1 - GMOD Example=====<br />
<br />
<java><br />
// Set up our session and begin transaction<br />
Session session = HibernateUtil.getSessionFactory().getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Make a Chado adaptor and load up some utility objects<br />
ChadoAdaptor ca = new ChadoAdaptor();<br />
Chromosome c = ca.fetchChromosomeByUniqueName("fake_chromosome");<br />
Pub null_pub = ca.fetchPubByPubID(1);<br />
Organism agambiae = ca.fetchOrganismByScientificName("Anopheles","gambiae");<br />
<br />
// Begin GMOD Demo Code<br />
<br />
// Make our new gene;<br />
Gene xfile = new Gene();<br />
xfile.setOrganism(agambiae);<br />
xfile.setUniquename("xfile");<br />
xfile.setDescription("A test gene for GMOD meeting");<br />
<br />
/* Set the location of our gene. No need to set coordinates because they'll be updated<br />
* based on the exon boundaries. <br />
*/<br />
FeatureLoc xfile_loc = new FeatureLoc();<br />
xfile_loc.setSrcfeature(c);<br />
xfile_loc.setStrand(1);<br />
xfile.setFeatureLoc(xfile_loc);<br />
<br />
// Add synonyms to xfile<br />
xfile.createNewFeatureSynonym("mulder", null_pub, CVTerms.EXACT_SYNONYM);<br />
xfile.createNewFeatureSynonym("scully", null_pub, CVTerms.EXACT_SYNONYM);<br />
</java><br />
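<br />
The comment in this example says the gene's coordinates are derived from its exon boundaries. One way to picture that derivation, as a sketch with a hypothetical <code>ExonBounds</code> helper rather than the ChadoAPI code itself:<br />
<br />
```java
import java.util.Arrays;
import java.util.List;

class ExonBounds {
    // Each exon is a {fmin, fmax} pair; the gene spans the smallest fmin
    // to the largest fmax.
    static int[] geneBounds(List<int[]> exons) {
        if (exons.isEmpty()) {
            throw new IllegalArgumentException("a gene needs at least one exon");
        }
        int fmin = Integer.MAX_VALUE;
        int fmax = Integer.MIN_VALUE;
        for (int[] exon : exons) {
            fmin = Math.min(fmin, exon[0]);
            fmax = Math.max(fmax, exon[1]);
        }
        return new int[] {fmin, fmax};
    }

    public static void main(String[] args) {
        // The two exon coordinate pairs used elsewhere in this presentation
        int[] bounds = geneBounds(Arrays.asList(
                new int[] {13691, 13767},
                new int[] {14687, 14720}));
        System.out.println(bounds[0] + ".." + bounds[1]);
    }
}
```
<br />
Run with the two demo exons, this prints <code>13691..14720</code>.<br />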
<br />
=====Problem 2 - GMOD Example=====<br />
<br />
<java><br />
// Create a new transcript for our gene.<br />
Transcript t = xfile.createGeneTranscript("xfile-RA");<br />
<br />
// Create some exons for that transcript.<br />
t.createTranscriptExon("xfile:1", 13691, 13767);<br />
t.createTranscriptExon("xfile:2", 14687, 14720);<br />
<br />
// Save our new gene<br />
session.save(xfile);<br />
System.out.println("xfile feature_id is " + xfile.getFeature_id());<br />
<br />
// Fetch our saved gene from the database<br />
Gene xfile_r = ca.fetchGeneByUniqueName("xfile");<br />
System.out.println("symbol: " + xfile_r.getUniquename());<br />
System.out.print("synonyms: ");<br />
for (FeatureSynonym fs : xfile_r.getFeatureSynonyms()){<br />
<br />
System.out.print(fs.getSynonym().getName() + " ");<br />
}<br />
<br />
System.out.println("description: " + xfile_r.getDescription());<br />
System.out.println("type: " + xfile_r.getType().getName());<br />
<br />
for (Transcript tx : xfile_r.fetchAllTranscripts()){<br />
for (Exon e : tx.fetchAllExons()){<br />
System.out.println(e.getUniquename() + " Start:\t" + e.getFeatureLoc().getFmin());<br />
System.out.println(e.getUniquename() + " End:\t" + e.getFeatureLoc().getFmax());<br />
System.out.println("\tSrcFeatureID: " + e.getFeatureLoc().getSrcfeature().getFeature_id());<br />
}<br />
System.out.println(">" + tx.getUniquename());<br />
System.out.println(tx.generateTranscriptSequenceFromExons().toUpperCase());<br />
}<br />
</java><br />
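<br />
The call to <code>generateTranscriptSequenceFromExons()</code> is not shown in the presentation. As a rough idea of what such a method has to do, the sketch below concatenates exon subsequences in coordinate order. It is a guess at the technique, not the ChadoAPI implementation: <code>SpliceSketch</code> is a hypothetical name, only the forward strand is handled, and fmin/fmax are treated as 0-based, half-open offsets in the spirit of Chado's interbase coordinates.<br />
<br />
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SpliceSketch {
    // Concatenate exon subsequences in ascending fmin order (forward strand only).
    // Each exon is a {fmin, fmax} pair covering source[fmin, fmax).
    static String splice(String source, List<int[]> exons) {
        List<int[]> ordered = new ArrayList<>(exons);
        ordered.sort(Comparator.comparingInt((int[] e) -> e[0]));
        StringBuilder sb = new StringBuilder();
        for (int[] exon : ordered) {
            sb.append(source, exon[0], exon[1]);
        }
        return sb.toString();
    }
}
```
<br />
A reverse-strand transcript would additionally need the result reverse-complemented, which is left out here.<br />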
<br />
=====Problems 3, 4, & 5 - GMOD Update & Delete=====<br />
<br />
<java><br />
// Let's update our name...<br />
xfile_r.setUniquename("x-file");<br />
<br />
session.save(xfile_r);<br />
<br />
// Not part of the ChadoAdaptor utility object, but a good example of HQL<br />
List<Gene> genes = (List<Gene>)session.createQuery("from Gene where uniquename like ?").setString(0,"x-%").list();<br />
<br />
for (Gene g : genes){<br />
<br />
System.out.println(g.getFeature_id() + <br />
"\t" + g.getUniquename() + <br />
"\t" + g.getOrganism().getGenus() +<br />
" " + g.getOrganism().getSpecies());<br />
}<br />
<br />
// Deleting... hmm...<br />
Gene delete_me = ca.fetchGeneByUniqueName("x-ray");<br />
session.delete(delete_me);<br />
<br />
// All Finished<br />
session.getTransaction().commit();<br />
</java><br />
<br />
<br />
<br />
=====What Hibernate Does Well=====<br />
<br />
* Hibernate can be configured to perform specialized functions<br />
** For example, it has its own notion of a cascade<br />
* Flexible with respect to language<br />
** Java, Hibernate Query Language, or SQL<br />
* Any JDBC driver<br />
<br />
=====Acknowledgements=====<br />
<br />
* VectorBase People<br />
** Frank Collins, EO Stinson, Ryan Butler<br />
* GMOD<br />
* NIAID<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
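<br />
The ''find or save'' behaviour named in the first limitation can be supplied by a small helper in application code. A minimal sketch, assuming a toy map-backed repository (<code>FindOrSave</code> is a hypothetical name, not part of Hibernate or of the PSU interface):<br />
<br />
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy repository keyed by a unique name; not a Hibernate Session.
class FindOrSave<T> {
    private final Map<String, T> store = new HashMap<>();

    // Return the stored object for this name, creating and storing it if absent.
    T findOrSave(String uniqueName, Function<String, T> factory) {
        T existing = store.get(uniqueName);
        if (existing != null) {
            return existing;
        }
        T created = factory.apply(uniqueName);
        store.put(uniqueName, created);
        return created;
    }

    int size() {
        return store.size();
    }
}
```
<br />
Against a real database the lookup and the insert would have to run inside one transaction, so that two callers cannot both decide the row is missing.<br />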
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
This Wiki section is an edited version of [[Media:PSU.pdf|Chinmay's presentation]].<br />
<br />
=====GeneDB=====<br />
<br />
* GeneDB is the organism data and annotation database for the Pathogen Sequencing Unit (PSU) at the Sanger Institute, UK<br />
* Contains 37 organisms, a number expected to grow to 62<br />
* Currently migrating to the Chado schema<br />
* Java API with two engines: Hibernate and iBatis<br />
** Two teams, Artemis and GeneDB, took different approaches<br />
<br />
=====Technical - Connections=====<br />
<br />
Connections are configured in the Spring configuration file<br />
<xml><br />
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"><br />
<property name="driverClassName" value="org.postgresql.Driver" /><br />
<property name="url" value="jdbc:postgresql://holly.sanger.ac.uk:5432/chado" /><br />
<property name="username" value="DELIBERATELY_BOGUS_NAME"/><br />
<property name="password" value="WIBBLE" /><br />
</bean><br />
</xml><br />
* Uses a connection pool<br />
* Connection to the database is specified graphically, so the iBatis configuration file has variables for the location:<br />
<xml><br />
<property name="JDBC.Driver" value="org.postgresql.Driver"/><br />
<br />
<property name="JDBC.ConnectionURL" value="jdbc:postgresql://${chado}"/><br />
<br />
<property name="JDBC.Username" value="${username}"/><br />
<br />
<property name="JDBC.Password" value="${password}"/><br />
</xml><br />
<br />
* The user provides the database location, username and password<br />
* The user then selects what to open in Artemis from a scrollable list of features with residues (organisms are in separate Postgres schemas)<br />
<br />
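The <code>${chado}</code>, <code>${username}</code> and <code>${password}</code> variables in the configuration above are filled in from values supplied at run time. The substitution step itself can be sketched as follows; <code>Placeholders</code> is a hypothetical helper written for illustration, not the iBatis property mechanism, and the host in the usage note is invented.<br />
<br />
```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Placeholders {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value; unknown names are left untouched.
    static String resolve(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value != null ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```
<br />
With <code>chado</code> mapped to <code>localhost:5432/chado</code>, the template <code>jdbc:postgresql://${chado}</code> resolves to <code>jdbc:postgresql://localhost:5432/chado</code>.<br />
<br />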
=====Technical - Code Generation=====<br />
<br />
* The shared interface and hibernate implementation were originally generated<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may generate code behind the scenes)<br />
<br />
=====Technical - Transactions=====<br />
<br />
* Transactions are fully supported<br />
<br />
=====Problems 1, 2, & 3=====<br />
<br />
Creating a gene<br />
<java><br />
genes[0] = new Feature(ORG, GENE, "xfile", false, false, now, now);<br />
<br />
genes[0].setSeqLen(1029); <br />
sequenceDao.persist(genes[0]);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, genes[0], 13691, false, 14720, false, (short)1, 0, 0 ,0);<br />
<br />
sequenceDao.persist(loc);<br />
<br />
addFeatureProp(genes[0], "description", "A test gene for GMOD meeting");<br />
<br />
addSynonymsToFeature(genes[0], "mulder", "scully");<br />
<br />
createExon("exon1", genes[0], 13691, 13767, now, 0);<br />
<br />
createExon("exon2", genes[0], 14687, 14720, now, 1);<br />
</java><br />
<br />
Retrieve a gene<br />
<java><br />
Feature f = sequenceDao.getFeatureByUniqueName("xfile");<br />
displayGene(f);<br />
</java><br />
<br />
Update a gene<br />
<java><br />
genes[0].setUniqueName("x-file");<br />
<br />
sequenceDao.merge(genes[0]);<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<java><br />
private Feature createExon(String name, Feature gene, int min, int max, Timestamp now, int rank) {<br />
<br />
Feature exon = new Feature(ORG, EXON, name, false, false, now, now);<br />
exon.setSeqLen(max-min);<br />
sequenceDao.persist(exon);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, exon, min, false, max, false, <br />
(short)1, 0, 0 ,0);<br />
sequenceDao.persist(loc);<br />
<br />
return exon;<br />
<br />
}<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<xml><br />
<st:section name="Naming" id="gene_naming" collapsed="false" collapsible="false"<br />
hideIfEmpty="true"><br />
<dl><br />
<dt><b>symbol:</b></dt><br />
<dd>${feature.uniqueName}</dd><br />
</dl><br />
<db:synonym name="synonym" var="name" collection="${feature.featureSynonyms}"><br />
<br /><b>Synonym:</b> <db:list-string collection="${name}" /><br />
</db:synonym><br />
<dt><b>Type:</b></dt><br />
<dd>${feature.cvTerm.name}</dd><br />
<br />
<st:section name="Exons" collapsed="false" collapsible="true" hideIfEmpty="true"><br />
<display:table name="exons" uid="tmp" pagesize="30" class="simple" cellspacing="0"<br />
cellpadding="4"><br />
<display:column property="uniqueName" title="Exon"/><br />
<display:column property="featureLocsForSrcFeatureId.fmin" title="Start"/><br />
<display:column property="featureLocsForSrcFeatureId.fmax" title="end"/><br />
</display:table><br />
</st:section><br />
<br />
<st:section name="cds" collapsible="true"><br />
<b>${feature.residues}</b><br />
</st:section><br />
</xml><br />
<br />
Specialized functionality, such as a cascading delete, is handled by the database</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-01-26T19:56:12Z<p>165.124.152.78: /* Background */</p>
<hr />
<div>==Middleware for Chado databases==<br />
<br />
===Authors===<br />
<br />
* Jeff Bowes<br />
* Robert Bruggner<br />
* Scott Cain<br />
* Josh Goodman<br />
* Eric Just<br />
* Sohel Merchant<br />
* Brian O'Connor<br />
* Brian Osborne<br />
* Chinmay Patel<br />
* Pinglei Zhou<br />
<br />
===Middleware Evaluation January 2007===<br />
<br />
A group of some 50 GMOD developers met at the annual meeting to discuss middleware. This one-day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
===Introduction===<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. It is also assumed that such a focus will lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The start and end positions are given as exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
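<br />
As a sanity check on the report above, the exon coordinates can be reconciled with the coding sequence by simple arithmetic. The sketch below assumes that start and end are 1-based, inclusive positions (an assumption; the problem statement only calls them exact coordinates), so an exon spans <code>end - start + 1</code> bases. The class and method names are illustrative and do not come from any of the packages discussed here.<br />
<br />
```java
final class ExonLengthCheck {
    /** Length of a span whose start and end are both 1-based and inclusive (assumed). */
    static int inclusiveLength(int start, int end) {
        if (end < start) {
            throw new IllegalArgumentException("end must not be smaller than start");
        }
        return end - start + 1;
    }

    /** Sum of the exon lengths; exons is an array of {start, end} pairs. */
    static int splicedLength(int[][] exons) {
        int total = 0;
        for (int[] exon : exons) {
            total += inclusiveLength(exon[0], exon[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Exon coordinates and coding sequence copied from the xfile report.
        int[][] xfileExons = { {13691, 13767}, {14687, 14720} };
        String cds = "ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT"
                   + "GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA";
        System.out.println("spliced exon length: " + splicedLength(xfileExons));
        System.out.println("CDS length in report: " + cds.length());
    }
}
```
<br />
Under that assumption the two exons contribute 77 and 34 bases, 111 in total, which matches the 111-base sequence shown in the report.<br />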
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
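<br />
The search and report can be sketched independently of any particular middleware. The example below filters an in-memory list by symbol prefix and prints one line per match. The <code>Gene</code> class, its field names and the extra non-matching gene are hypothetical; a real package would normally push the prefix match down to the database (for example as a SQL <code>LIKE 'x-%'</code> condition) rather than filter in memory.<br />
<br />
```java
import java.util.ArrayList;
import java.util.List;

final class GeneSearchReport {
    /** Minimal stand-in for a gene row; the field names are illustrative. */
    static final class Gene {
        final int featureId;
        final String symbol;
        final String organism;

        Gene(int featureId, String symbol, String organism) {
            this.featureId = featureId;
            this.symbol = symbol;
            this.organism = organism;
        }
    }

    /** Returns one "id symbol organism" line per gene whose symbol starts with prefix. */
    static List<String> report(List<Gene> genes, String prefix) {
        List<String> lines = new ArrayList<>();
        for (Gene g : genes) {
            if (g.symbol.startsWith(prefix)) {
                lines.add(g.featureId + " " + g.symbol + " " + g.organism);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Gene> genes = new ArrayList<>();
        genes.add(new Gene(1323, "x-file", "Xenopus laevis"));
        genes.add(new Gene(1324, "x-men", "Xenopus laevis"));
        genes.add(new Gene(1325, "x-ray", "Xenopus laevis"));
        genes.add(new Gene(1400, "yfp", "Xenopus laevis")); // hypothetical non-matching gene
        for (String line : report(genes, "x-")) {
            System.out.println(line);
        }
    }
}
```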
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
===Conclusions===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. BioSQL), the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software, and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman).<br />
<br />
====Problem Assignments====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ in many ways in how the problems were solved; please see the presentations themselves for these details. <br />
<br />
The Perl approaches used only the Perl language whereas the Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or disadvantage.<br />
<br />
====Java Middleware====<br />
<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
=====Abstraction=====<br />
<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
<br />
====Perl Middleware====<br />
<br />
<br />
=====Abstraction=====<br />
<br />
Modware provides a higher level of abstraction than Chado::AutoDBI.<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
The DasI interface is well documented: it consists of about a dozen methods and three classes, all of which are documented.<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), as defined by Wikipedia:<br />
** "ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database."<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
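<br />
To make the DAO approach concrete, here is a minimal sketch of a hand-written DAO for the <code>cvterm</code> table. It is not taken from the presentation: the class names are hypothetical, and a <code>HashMap</code> stands in for the JDBC calls and SQL statements that a real DAO would contain, so that the example runs without a database. The comments mark where each SQL statement would go.<br />
<br />
```java
import java.util.HashMap;
import java.util.Map;

final class CvtermDaoSketch {
    /** One attribute per column of the cvterm table. */
    static final class Cvterm {
        int cvtermId;
        int cvId;
        String name;
        String definition;
        int dbxrefId;
    }

    /** Hand-written DAO; the map replaces the JDBC/SQL code (assumption for the sketch). */
    static final class CvtermDao {
        private final Map<Integer, Cvterm> rows = new HashMap<>();
        private int nextId = 1;

        /** A real DAO would issue an INSERT here. */
        Cvterm create(int cvId, String name, int dbxrefId) {
            Cvterm t = new Cvterm();
            t.cvtermId = nextId++;
            t.cvId = cvId;
            t.name = name;
            t.dbxrefId = dbxrefId;
            rows.put(t.cvtermId, t);
            return t;
        }

        /** A real DAO would issue a SELECT by primary key here. */
        Cvterm retrieve(int cvtermId) {
            return rows.get(cvtermId);
        }

        /** A real DAO would issue an UPDATE here. */
        void update(Cvterm t) {
            if (!rows.containsKey(t.cvtermId)) {
                throw new IllegalArgumentException("no such cvterm: " + t.cvtermId);
            }
            rows.put(t.cvtermId, t);
        }

        /** A real DAO would issue a DELETE here; returns true if a row was removed. */
        boolean delete(int cvtermId) {
            return rows.remove(cvtermId) != null;
        }
    }
}
```
<br />
Every table needs a class like this when DAOs are written by hand, which is the repetitive work an ORM layer aims to generate automatically.<br />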
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
my $term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update ($term is assumed to be an in-memory object holding the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update(); # write the changed columns back to the database<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for controlled Data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* Flybase examined iBatis and Hibernate, both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate, but even a test that did nothing more than call <code>print()</code> during bulk operations ran into performance issues. There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical.<br />
XORT does some simple, automatic caching, and with XORT you can handle recursive or cyclical operations more easily. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure is not stored in memory; data is flushed out as it is processed<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
This Wiki section is an edited version of [[Media:XORT.pdf|Josh and Pinglei's presentation]].<br />
<br />
<br />
<br />
=====Introduction=====<br />
<br />
* An XML-database mapping system for data exchange between a database and XML-driven applications<br />
* XORT can handle typical XML; it is not Chado-specific<br />
* Developed and supported by Pinglei Zhou at FlyBase Harvard; currently at version 0.007.<br />
* Used at all FlyBase sites<br />
** Harvard has extensive library of Perl modules for generating ChadoXML<br />
* Written in Perl<br />
* Required perl modules:<br />
** XML::Parser::PerlSAX<br />
** Unicode::String<br />
** XML::DOM<br />
** DBI<br />
<br />
=====Chado XML=====<br />
<br />
* Is ChadoXML necessary? No, but it may help you.<br />
* ChadoXML assists with incremental updates, if you want to avoid flush-and-reload.<br />
* While updates can be achieved with other middleware (for example, Perl Class::DBI or Java Hibernate), ChadoXML additionally gives you a way to archive your transactions.<br />
* It provides bulk update and download, which other methods either lack or perform inefficiently.<br />
<br />
=====Components=====<br />
<br />
* Database & Schema<br />
* ChadoXML Specification<br />
* DumpSpec<br />
** DumpSpec files are simple XML files that tell XORT what to do<br />
** DumpSpec files are ''language independent'', being XML<br />
** It's fairly easy for those who know the schema to read these files and understand what the operation is<br />
<br />
=====Highlights of Chado XML Specification=====<br />
<br />
* Unique representation of a specific database schema<br />
* Does away with internal primary key values<br />
* Static vs. Operational<br />
* Encoding for non-ASCII characters<br />
* Macro mechanism (object reference)<br />
<br />
=====Putting it together: New FlyBase dataflow Part 1=====<br />
<br />
There are three Flybase sites, and most curation is done at Harvard and<br />
Cambridge. Proforma is the curation format at Cambridge and Harvard, but<br />
Harvard also curates with Apollo and ChadoXML.<br />
<br />
Once in Chado, the reporting instance, there's a denormalization step<br />
in moving data to a read-only database. Once in the read-only database there are<br />
dumps, for reporting purposes, using XORT to create ChadoXML. Once<br />
ChadoXML is created, version 2 of XSLT is used to create HTML and GFF. HTML<br />
is for human-readable reports, and GFF is for GBrowse and for various power<br />
users.<br />
<br />
1.a. Proforma (FlyBase Cambridge) is converted to ChadoXML<br />
<br />
1.b. ChadoXML is created by Apollo (Harvard)<br />
<br />
1.c. ChadoXML is created by Java SEAN (Harvard)<br />
<br />
2. All ChadoXML is loaded into Chado by XORT<br />
<br />
=====Putting it together: New FlyBase dataflow Part 2=====<br />
<br />
3. Chado (Harvard) is denormalized and loaded into Chado (Indiana)<br />
<br />
4. ChadoXML is created from Chado using XORT<br />
<br />
5.a. GFF and Fasta are created from ChadoXML<br />
<br />
5.b. HTML is created from ChadoXML<br />
<br />
=====Data & Report Generation=====<br />
<br />
* Content of all output files is controlled by XML dumpspecs.<br />
** Dumpspecs are language independent.<br />
** Easily readable (with knowledge of Chado structure).<br />
* All XML transformation steps are done with XSLT v2.<br />
** Saxon XSLT (http://saxon.sourceforge.net/)<br />
** ChadoXML is split into individual chunks before XSLT processing to accommodate large file sizes.<br />
** Extremely fast. We can process all data for ~60,000 Drosophila genes in under 30 minutes.<br />
<br />
=====Hibernate & XORT=====<br />
<br />
* Hibernate didn't scale well when dealing with 5,000+ features in bulk.<br />
** The test was simply calling <code>print()</code> statements<br />
* Performance tweaks for Hibernate can be quite complicated to set up for bulk operations.<br />
* XORT is currently handling ~6 million features in production with only minor performance problems.<br />
* XORT is much more language independent.<br />
<br />
=====Support for complex transactions using XORT=====<br />
<br />
For example:<br />
<br />
* Find all records linked to a record using dumpspec<br />
* Merge gene x into y, each with thousands of records attached<br />
<br />
Step 1. Dump all data using a simple dumpspec<br />
<xml><br />
<chado><br />
<feature dump="all"><br />
<uniquename test="eq">x</uniquename><br />
</feature><br />
</chado><br />
</xml><br />
Step 2. Delete feature x from DB, with triggers to clean orphan records, if necessary<br />
<br />
Step 3. Edit the output xml, change uniquename x to y, then load the edited file back to DB<br />
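<br />
The edit in Step 3 can be scripted rather than done by hand. The following sketch uses only the XML classes that ship with the JDK to change every <code>uniquename</code> element whose text is <code>x</code> to <code>y</code>. It assumes the dumped file stores the name in a <code>uniquename</code> element, as the dumpspec above suggests; the class and method names are hypothetical and are not part of XORT. Loading the edited file back into the database is still done with XORT.<br />
<br />
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class UniquenameRewriter {
    /** Returns chadoXml with every uniquename element equal to from changed to to. */
    static String rename(String chadoXml, String from, String to) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(chadoXml)));

            // Visit every <uniquename> element, at any depth, and rewrite exact matches.
            NodeList names = doc.getElementsByTagName("uniquename");
            for (int i = 0; i < names.getLength(); i++) {
                if (from.equals(names.item(i).getTextContent().trim())) {
                    names.item(i).setTextContent(to);
                }
            }

            // Serialize the modified DOM back to a string without an XML declaration.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not rewrite ChadoXML", e);
        }
    }
}
```
<br />
Note that this rewrites nested features with the same unique name as well, so the result should be reviewed before it is loaded.<br />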
<br />
=====CHIA (Chado Interface Application)=====<br />
<br />
A Java application that organizes SQL and XORT functionality for internal users, e.g.:<br />
<br />
* Dump chado-XML for gene regions for Apollo curation<br />
* Organize and execute “canned” SQL queries<br />
* Serve IDs for curators (in development)<br />
* Dynamically browse Chado without writing SQL statements<br />
<br />
CHIA is being designed to be extensible for adding new functionality as needed.<br />
<br />
<br />
=====Documentation=====<br />
<br />
* ''Using Chado to Store Genome Annotation Data''<br />
** Current Protocols in Bioinformatics (Baxevanis, A.D., and Davison, D.B., eds) 2, 9.6.1-9.6.28.<br />
* XORT specification docs<br />
* XORT draft (unpublished)<br />
* GMOD case demo procedure<br />
** All in the doc directory of XORT package, http://www.gmod.org<br />
<br />
=====Acknowledgements=====<br />
<br />
* William Gelbart <br />
* Chris Mungall<br />
* David Emmert <br />
* Mark Gibson<br />
* Stan Letovsky <br />
* Nomi Harris<br />
* Frank Smutniak <br />
* Suzanna Lewis<br />
* Peili Zhang <br />
* Haiyan Zhang <br />
* Aubrey de Grey<br />
* Andy Schroeder <br />
* Don Gilbert<br />
* Susan Russo<br />
* Mark Zythovicz <br />
* Scott Cain<br />
* Lincoln Stein<br />
* Victor Strelets<br />
* Robert Wilson<br />
* Paul Leyland<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add the add_constructor for looking for name lengths<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
This Wiki section is an edited version of [[Media:AutoDBI.pdf|Brian's presentation]].<br />
<br />
=====Relation to Turnkey=====<br />
<br />
Turnkey is a package that auto-generates Web sites given a relational<br />
schema, based on SQL::Translator<br />
<br />
* Turnkey authors: Allen Day, Scott Cain, Brian O'Connor<br />
* Turnkey and Chado::AutoDBI objects are essentially the same<br />
<br />
=====Technical Overview=====<br />
<br />
* Code Generation<br />
<br />
=====Project Overview=====<br />
<br />
Convert SQL Queries/Inserts/Deletes -> Object Calls<br />
<sql><br />
INSERT INTO feature (organism_id, name)<br />
VALUES (1, 'foo');<br />
</sql><br />
To:<br />
<perl><br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Database connection: use a base class<br />
* Set up base object and connect, then create a ''table object'' to access primary key. <br />
* Class::DBI can find and insert records into other tables, based on foreign keys.<br />
<br />
<perl><br />
# Base class that holds the database connection for all table classes<br />
package Turnkey::Model::DBI;<br />
use base qw(Class::DBI::Pg);<br />
<br />
my ($dsn, $name, $pass);<br />
$dsn = "dbi:Pg:host=localhost;dbname=chado;port=5432";<br />
$name = "postgres";<br />
$pass = "";<br />
<br />
Turnkey::Model::DBI->set_db('Main', $dsn, $name, $pass, {AutoCommit => 1});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
<br />
<perl><br />
package Turnkey::Model::Feature;<br />
use base 'Turnkey::Model::DBI';<br />
<br />
Turnkey::Model::Feature->set_up_table('feature');<br />
<br />
#<br />
# Primary key accessors<br />
#<br />
<br />
sub id { shift->feature_id }<br />
sub feature { shift->feature_id }<br />
</perl><br />
<br />
* data field accessors by Class::Accessor<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** has_a<br />
<br />
<perl><br />
#<br />
# has_a<br />
#<br />
Turnkey::Model::Feature->has_a( type_id => "Turnkey::Model::Cvterm" );<br />
sub cvterm { return shift->type_id; }<br />
</perl><br />
<br />
* Basic ORM Object: Feature<br />
** has_many<br />
<br />
<perl><br />
#<br />
# has_many<br />
#<br />
Turnkey::Model::Feature->has_many('feature_synonym_feature_id', <br />
'Turnkey::Model::Feature_Synonym' => 'feature_id');<br />
sub feature_synonyms { return shift->feature_synonym_feature_id; }<br />
<br />
Turnkey::Model::Feature->has_many('featureprop_feature_id', <br />
'Turnkey::Model::Featureprop' => 'feature_id');<br />
sub featureprops { return shift->featureprop_feature_id; }<br />
</perl><br />
<br />
* Can traverse tables, such as going from FEATURE to FEATUREPROP <br />
** Tell the base object that the ''table object'' has_a() or has_many() keys corresponding to some key in another ''table object'' <br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** skipping linker tables for has_many<br />
<br />
<perl><br />
# skip over feature_synonym table<br />
#<br />
# method 1<br />
#<br />
sub synonyms { my $self = shift; return map $_->synonym_id, $self->feature_synonyms; }<br />
#<br />
# method 2<br />
#<br />
Turnkey::Model::Feature->has_many( synonyms2 =><br />
['Turnkey::Model::Feature_Synonym' => 'synonym_id']);<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Transactions<br />
** Chado::AutoDBI supports transactions, and one can wrap the transaction in an eval()<br />
<perl><br />
sub do_transaction {<br />
my $class = shift;<br />
my ( $code ) = @_;<br />
# Turn off AutoCommit for this scope.<br />
# A commit will occur at the exit of this block automatically,<br />
# when the local AutoCommit goes out of scope.<br />
local $class->db_Main->{ AutoCommit };<br />
<br />
# Execute the required code inside the transaction.<br />
eval { $code->() };<br />
if ( $@ ) {<br />
my $commit_error = $@;<br />
eval { $class->dbi_rollback }; # might also die!<br />
die $commit_error;<br />
}<br />
}<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Lazy Loading<br />
** One can either do automated creation of objects or explicitly dictate which fields are incorporated into object<br />
<perl><br />
Turnkey::Model::Feature->columns( Primary => qw/feature_id/ );<br />
Turnkey::Model::Feature->columns( Essential => qw/name organism_id type_id/ );<br />
Turnkey::Model::Feature->columns( Others => qw/residues .../ );<br />
</perl><br />
<br />
Typically:<br />
<br />
<perl><br />
Turnkey::Model::Feature->set_up_table('feature');<br />
</perl><br />
<br />
=====Problem 1=====<br />
<br />
* Create Feature & Add Description<br />
<br />
<perl><br />
# now create mRNA feature<br />
<br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
<br />
# create description<br />
<br />
my $featureprop = Turnkey::Model::Featureprop->find_or_create({<br />
value => 'A test gene for GMOD meeting',<br />
feature_id => $feature,<br />
type_id => $note_cvterm,<br />
});<br />
</perl><br />
<br />
=====Problem 2=====<br />
<br />
* Retrieve a Feature via Searching<br />
** Search using strings or identifiers, a search will return an iterator object<br />
<br />
<perl><br />
# objects for global use<br />
<br />
# the organism for our new feature<br />
my $organism = Turnkey::Model::Organism->search(abbreviation => "S.cerevisiae")->next;<br />
<br />
# the cvterm for a "Note"<br />
my $note_cvterm = Turnkey::Model::Cvterm->retrieve(2);<br />
<br />
# searching name by wildcard<br />
<br />
my @results = Turnkey::Model::Feature->search_like(name => 'x-%');<br />
</perl><br />
<br />
=====Problems 3, 4, & 5=====<br />
<br />
* Update a Feature<br />
<br />
<perl><br />
# update the xfile gene name<br />
<br />
$feature->name("x-file");<br />
$feature->update();<br />
</perl><br />
<br />
* Delete a Feature<br />
<br />
<perl><br />
# now delete the x-file feature<br />
<br />
$feature->delete();<br />
</perl><br />
<br />
=====Things Chado::AutoDBI does well=====<br />
<br />
* Easy to use<br />
* Easy to port<br />
* Use with other DBs<br />
** Both Oracle and Postgres used currently<br />
* Autogenerated via Turnkey<br />
* find_or_create method<br />
* Performance is not as bad as you might guess<br />
** Due to Lazy loading<br />
** Even whole genome operations are feasible<br />
<br />
Note that speed is relative: hand-written SQL can also perform badly if it is the wrong SQL, and in that case the Chado::AutoDBI approach will be speedier.<br />
<br />
<br />
=====For More Information=====<br />
<br />
* Class::DBI<br />
** http://www.class-dbi.com<br />
** http://search.cpan.org<br />
<br />
* Turnkey<br />
** http://turnkey.sf.net<br />
<br />
* Biopackages<br />
** http://biopackages.net<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Contact: e-just [at] northwestern.edu<br />
* Users: DictyBase<br />
* Support: <br />
* Third party code: GMOD, BioPerl<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Some DictyBase-specific code used there<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods<br />
** Abstract factory for various feature types<br />
* Lazy: information is only retrieved when you ask for it, but cached for speedy retrieval the next time it is required<br />
* Uses Bioperl and its objects<br />
** Common methods such as name(), primary_id(), external_ids() <br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene which isn't the same as ...<br />
* Any feature type not explicitly supported in Modware::Feature class is blessed <br />
as a Modware::Feature::GENERIC class which has a start/stop coordinate on a genomic <br />
sequence feature (no structure like a trasncript with exons)<br />
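<br />
The abstract-factory behaviour and the GENERIC fallback listed above can be sketched roughly as follows. This is a language-neutral illustration written in Java (the language of the Java examples elsewhere on this page), not Modware's Perl implementation; the class names <code>Feature</code>, <code>GeneFeature</code>, <code>ChromosomeFeature</code>, <code>GenericFeature</code> and <code>FeatureFactory</code> are hypothetical.<br />
<br />
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical base class: behaviour shared by every feature type.
class Feature {
    private final String name;
    Feature(String name) { this.name = name; }
    String name() { return name; }
    String kind() { return "feature"; }
}

// Type-specific subclasses (only two are shown).
class GeneFeature extends Feature {
    GeneFeature(String name) { super(name); }
    @Override String kind() { return "gene"; }
}

class ChromosomeFeature extends Feature {
    ChromosomeFeature(String name) { super(name); }
    @Override String kind() { return "chromosome"; }
}

// Fallback for any type that has no dedicated subclass.
class GenericFeature extends Feature {
    GenericFeature(String name) { super(name); }
    @Override String kind() { return "generic"; }
}

// Maps a type string to a constructor; unknown types fall back to GenericFeature.
class FeatureFactory {
    private static final Map<String, Function<String, Feature>> CONSTRUCTORS = new HashMap<>();
    static {
        CONSTRUCTORS.put("gene", GeneFeature::new);
        CONSTRUCTORS.put("chromosome", ChromosomeFeature::new);
    }

    static Feature create(String type, String name) {
        return CONSTRUCTORS.getOrDefault(type, GenericFeature::new).apply(name);
    }
}
```
<br />
Under these assumptions, <code>FeatureFactory.create("gene", "xfile")</code> yields a <code>GeneFeature</code>, while an unregistered type such as <code>"tRNA"</code> yields a <code>GenericFeature</code>.<br />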
<br />
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
my $seq_io = new Bio::SeqIO(<br />
-file => "../data/fake_chromosome.txt",<br />
-format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
-type => 'chromosome',<br />
-bioperl => $seq,<br />
-description => "This is a test",<br />
-name => 'Fake',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
-type => 'gene',<br />
-name => 'xfile',<br />
-description => 'A test gene for GMOD meeting',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1: exon_2: <br />
start: 13691 start: 14687<br />
end: 13767 end: 14720<br />
strand: 1 strand: 1<br />
srcFeature_id: Id of genomic sample srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 13691,<br />
-end => 13767,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 14687,<br />
-end => 14720,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
-type => 'mRNA',<br />
-bioperl => $bioperl_mrna,<br />
-source => 'GMOD 2007 Demo',<br />
-reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report; this package uses Modware and Bioperl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Gene is-a Feature.<br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass? More flexibility the way shown here<br />
<br />
<perl><br />
package GMODWriter; <br />
sub Write_gene_report {<br />
my ($self, $gene) = @_;<br />
my $symbol = $gene->name();<br />
<br />
my @synonyms = @{ $gene->synonyms() };<br />
my $syn_string = join ", ", @synonyms;<br />
my $description = $gene->description();<br />
my $type = $gene->type();<br />
# get features associated with the gene that are of type 'mRNA'<br />
my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
# use bioperl method to get exons from mRNA<br />
my @exons = $mrna->bioperl->exons_ordered();<br />
# Modware will return a nice fasta file for you.<br />
my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
# Now print the actual report<br />
print "symbol: $symbol\n";<br />
print "synonyms: $syn_string\n";<br />
print "description: $description\n";<br />
print "type: $type\n";<br />
<br />
my $count = 0;<br />
foreach my $exon (@exons ) {<br />
$count++;<br />
print "exon${count} start: ".$exon->start()."\n";<br />
<br />
print "exon${count} end: ".$exon->end()."\n";<br />
<br />
}<br />
print "$fasta";<br />
}<br />
. . .<br />
</perl><br />
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the name symbol to x-file and retrieve the changed record. Regenerate gene report<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval{<br />
<br />
# get xfile gene<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
# change the name<br />
$xfile_gene->name( 'x-file' );<br />
# write changes to database<br />
$xfile_gene->update();<br />
<br />
# we can use the original object if we want, but instead<br />
# we refetch from the database to 'prove' the name has been changed<br />
my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
# use our GMODWriter package to write report for x-file<br />
GMODWriter->Write_gene_report( $xfile_gene2 );<br />
<br />
};<br />
if ($@){<br />
warn $@;<br />
new Modware::DBH->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<br />
<perl><br />
sub Write_search_results {<br />
my ($self, $itr) = @_;<br />
# loop through iterator<br />
while (my $gene = $itr->next) {<br />
# print the requested information<br />
print $gene->feature_id . "\t" . $gene->name .<br />
"\t" . $gene->organism_name . "\n";<br />
}<br />
}<br />
</perl><br />
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
# set is_deleted = 1; this also sets is_available to 0,<br />
# so the gene is 'hidden' from (no longer visible to) searches.<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* Sample scripts demoed here are available<br />
** sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI will adjust automatically, depending on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Type the kind of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O’Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects via Perl DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
This Wiki section is an edited version of [[Media:DasI_middleware.pdf|Scott's presentation]].<br />
<br />
=====Create the database=====<br />
<br />
$ perl Makefile.PL<br />
$ make<br />
$ sudo make install<br />
$ make load_schema<br />
$ make prepdb # now with Xenopus!<br />
$ make ontologies # load rel, SO, featureprop<br />
<br />
=====Problem 1 - Loading Data=====<br />
<br />
Create some GFF from the specifications:<br />
<br />
fake_chromosome example chromosome 1 15017 . . . ID=fake_chromosome;Name=fake_chromosome<br />
fake_chromosome example gene 13691 14720 . + . ID=xfile;Name=xfile;Alias=mulder,scully;Note=A test gene for GMOD meeting<br />
fake_chromosome example mRNA 13691 14720 . + . ID=xfile_mRNA;Parent=xfile<br />
fake_chromosome example exon 13691 13767 . + . Parent=xfile_mRNA<br />
fake_chromosome example exon 14687 14720 . + . Parent=xfile_mRNA<br />
fake_chromosome example gene 12648 13136 . + . ID=x-men<br />
<br />
Gene inserted as GFF using a standard Bioperl bulk loader:<br />
<br />
<code>$ gmod_bulk_load_gff3.pl -g sample.gff</code><br />
<br />
''...lots of output...''<br />
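<br />
The GFF3 lines above carry most of their information in the ninth column, as semicolon-separated <code>tag=value</code> pairs. The following is a minimal sketch of how such a line breaks down, written in Java; the <code>Gff3Line</code> class is hypothetical and is not part of the GMOD loader. Real GFF3 is tab-delimited; the sketch splits on any whitespace so that the lines can be pasted as displayed here.<br />
<br />
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of gmod_bulk_load_gff3.pl.
class Gff3Line {
    final String seqId, source, type, strand;
    final int start, end;
    final Map<String, String> attributes = new LinkedHashMap<>();

    Gff3Line(String line) {
        // A limit of 9 keeps spaces inside the attribute column (e.g. in Note=...).
        String[] col = line.trim().split("\\s+", 9);
        if (col.length != 9) {
            throw new IllegalArgumentException("expected 9 columns: " + line);
        }
        seqId = col[0];
        source = col[1];
        type = col[2];
        start = Integer.parseInt(col[3]);
        end = Integer.parseInt(col[4]);
        strand = col[6];
        // Column 9 holds semicolon-separated tag=value pairs.
        for (String pair : col[8].split(";")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                attributes.put(kv[0], kv[1]);
            }
        }
    }
}
```
<br />
For the xfile gene line, this gives the type <code>gene</code>, coordinates 13691 to 14720, and the attributes ID, Name, Alias and Note; the Alias value is itself a comma-separated list.<br />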
<br />
=====Adaptor Components=====<br />
<br />
* Bio::DB::Das::Chado<br />
** Database connection object<br />
* Bio::DB::Das::Chado::Segment<br />
** Object for any range of DNA<br />
* Bio::DB::Das::Chado::Segment::Feature<br />
<br />
=====Use Bio::DB::Das::Chado=====<br />
<br />
<perl><br />
use Bio::DB::Das::Chado;<br />
<br />
my $chado = Bio::DB::Das::Chado->new(<br />
-dsn => "dbi:Pg:dbname=test",<br />
-user=> "scott",<br />
-pass=> "" ) || die "no new chado";<br />
<br />
my $gene_name = 'xfile';<br />
<br />
my ($gene_fo) = $chado->get_features_by_name($gene_name);<br />
</perl><br />
<br />
=====Problem 2 - Use Some Accessors=====<br />
<br />
<perl><br />
print "symbol: " . $gene_fo->display_name."\n";<br />
print "synonyms: " . join(', ',$gene_fo->synonyms)."\n";<br />
print "description: " . $gene_fo->notes."\n";<br />
print "type: " . $gene_fo->type."\n";<br />
<br />
my ($mRNA) = $gene_fo->sub_SeqFeature();<br />
my @exons = $mRNA->sub_SeqFeature();<br />
<br />
my ($exon_count, $cds_seq) = (0, '');<br />
for my $exon (@exons) {<br />
next unless ($exon->type->method eq 'exon');<br />
$exon_count++;<br />
print "exon$exon_count start: " . $exon->start."\n";<br />
print "exon$exon_count end: " . $exon->end. "\n";<br />
$cds_seq .= $exon->seq->seq; # the first seq call returns a Bio::Seq object, the second gets the DNA string from Bio::Seq<br />
} <br />
</perl><br />
<br />
=====Bulk Output=====<br />
<br />
<perl><br />
my $gene_name = 'x-*';<br />
<br />
my @genes = $chado->get_features_by_name(<br />
-name => $gene_name,<br />
-class=> 'gene' );<br />
<br />
for my $gene (@genes) {<br />
print join("\t",<br />
$gene->feature_id,<br />
$gene->display_name,<br />
$gene->organism),"\n";<br />
}<br />
</perl><br />
<br />
Or see your report in GBrowse<br />
<br />
=====Advantages=====<br />
<br />
* Comes 'for free' with GBrowse<br />
** GBrowse will run with any DasI-compatible interface<br />
* Uses 'familiar' BioPerl idioms, very similar to widely used Bio::DB::GFF (though with fewer methods)<br />
<br />
<br />
=====Conclusion=====<br />
<br />
* Not suitable as a 'general' middleware layer<br />
** May be suitable for some applications, particularly if they are similar to GBrowse or other uses of Bio::DB::GFF<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
Jeff Bowes, Xenbase, University of Calgary. This Wiki section is an edited version of [[Media:iBatis.pdf|Jeff's presentation]].<br />
<br />
=====iBatis=====<br />
<br />
* iBatis<br />
** Light-weight framework<br />
** Still based on SQL but eliminates the repetitive drudgery of JDBC<br />
** You can tune a query by re-writing the SQL in XML & the API does not change.<br />
* iBatis does not create your database in memory as objects<br />
* Shallow learning curve<br />
* Manually create a Java class and SQL map to describe higher-level objects<br />
** Example: ''Gene''<br />
* Support for inheritance<br />
** Inheritance in result maps, allows fair amount of re-use. <br />
* Supports different transaction schemes<br />
** For example, JDBC, Java Transaction API<br />
<br />
=====Abator=====<br />
<br />
* Generates iBatis CRUD objects by introspecting database tables<br />
* Abator creates ''SQL in XML'' files (SQL Map files) and Java classes <br />
** Within these files is a Result Map section.<br />
* Abator config files are simple: they set connection parameters and say where the files are.<br />
* In the SQL Map files you can specify how to find parent ids, such as feature_id.<br />
<br />
=====Abator Example=====<br />
<xml><br />
<abatorConfiguration><br />
<abatorContext> <!-- TODO: Add Database Connection Information --><br />
<jdbcConnection driverClass="COM.ibm.db2.jdbc.app.DB2Driver"<br />
connectionURL="jdbc:db2:XBDV05"<br />
userId="db2inst1"<br />
password="*******"><br />
<classPathEntry location="/Program Files/IBM/SQLLIB/java/db2java.zip" /><br />
</jdbcConnection><br />
<br />
<javaModelGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.model"<br />
targetProject="gene" /><br />
<sqlMapGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.sql"<br />
targetProject="gene" /><br />
<daoGenerator type="IBATIS"<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.dao"<br />
targetProject="gene" /><br />
</abatorContext><br />
</abatorConfiguration><br />
</xml><br />
<br />
=====Abator Example=====<br />
<nowiki><br />
<table schema="db2inst1" tableName="synonym"></nowiki><br />
<generatedKey column="synonym_id" sqlStatement="VALUES PREVVAL FOR synonym_seq" identity="true" /><br />
<columnOverride column="CREATED_BY" jdbcType="INTEGER" /><br />
<columnOverride column="MODIFIED_BY" jdbcType="INTEGER" /><br />
<nowiki></table></nowiki><br />
<br />
=====Abator=====<br />
<br />
Works as:<br />
<br />
* Eclipse plug-in<br />
* ANT<br />
* Standalone<br />
<br />
=====DAO Methods=====<br />
<br />
* Insert (Feature)<br />
* Update (Feature)<br />
* DeletebyKey (FeatureKey)<br />
* SelectbyKey (FeatureKey)<br />
* SelectbyExample (FeatureExample)<br />
* DeletebyExample (FeatureExample)<br />
<br />
=====Insert=====<br />
<xml><br />
<insert id="abatorgenerated_insert" parameterClass=<br />
"org.gmod.architecture.framwork.bakeoff.abator.model.FeatureWithBLOBs"><br />
insert into db2inst1.feature<br />
(DBXREF_ID, ORGANISM_ID, NAME, UNIQUENAME,<br />
RESIDUES, SEQLEN, MD5CHECKSUM, TYPE_ID, IS_ANALYSIS,<br />
IS_OBSOLETE, CREATED_BY)<br />
values (#dbxrefId:INTEGER#, #organismId:INTEGER#, #name:VARCHAR#,<br />
#uniquename:VARCHAR#, #residues:CLOB#, #seqlen:INTEGER#,<br />
#md5checksum:CHAR#, #typeId:INTEGER#,<br />
#isAnalysis:SMALLINT#, #isObsolete:SMALLINT#,<br />
#createdBy:INTEGER#)<br />
<br />
<selectKey resultClass="java.lang.Integer" keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</insert><br />
</xml><br />
<br />
=====Insert=====<br />
<xml><br />
<selectKey resultClass="java.lang.Integer"<br />
keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</xml><br />
<br />
=====Problem 1 - Insert=====<br />
<br />
<java><br />
try {<br />
sqlMap.startTransaction();<br />
pGene.id =featureDAO.insert(pGene.getFeatureWithBLOBs());<br />
featurepropDAO.insert(pGene.getPropertyDescription());<br />
pGene.featurelocId = featurelocDAO.insert(pGene<br />
.getFeaturelocWithBLOBS());<br />
pGene = insertExons(pGene);<br />
insertSynonyms(pGene);<br />
sqlMap.commitTransaction();<br />
} catch (Exception e) {<br />
System.out.println(e);<br />
throw (e);<br />
} finally {<br />
sqlMap.endTransaction();<br />
}<br />
</java><br />
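<br />
The start / commit / end-in-finally shape used above can be exercised without a database. The sketch below is an illustration only: <code>RecordingTx</code> is a hypothetical stand-in that records calls, not the iBatis API, and it assumes that ending a transaction that was never committed rolls it back.<br />
<br />
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transaction manager; it only records what happened.
class RecordingTx {
    final List<String> log = new ArrayList<>();
    private boolean committed;

    void start() { committed = false; log.add("start"); }
    void commit() { committed = true; log.add("commit"); }
    // Assumption: ending an uncommitted transaction rolls it back.
    void end() { log.add(committed ? "end" : "rollback"); }
}

class TxDemo {
    // Runs the work inside a transaction; end() is reached on success and on failure.
    static void inTransaction(RecordingTx tx, Runnable work) {
        try {
            tx.start();
            work.run();
            tx.commit();
        } finally {
            tx.end();
        }
    }
}
```
<br />
If the work throws, <code>commit()</code> is skipped and the <code>finally</code> block still calls <code>end()</code>, so a failed insert of the gene, its exons or its synonyms leaves nothing half-written.<br />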
<br />
=====Transactions=====<br />
<br />
* SQLMap<br />
* JDBC<br />
* JTA - Java Transaction API<br />
** 2-Phase commit<br />
* Hibernate<br />
* External (Customized)<br />
<br />
=====Retrieval=====<br />
<br />
symbol: xfile<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: start: 13691 end: 13767 <br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: start: 14687 end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2 - Master Detail Reports=====<br />
<br />
Account for cycles or recursion in Master Detail Report. <br />
<br />
<xml><br />
<resultMap id="SelectGeneResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Gene" groupBy="id"><br />
<result column="FEATURE_ID" property="id" jdbcType="INTEGER"/><br />
<result column="GENE_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="DESCRIPTION" property="description"<br />
jdbcType="VARCHAR" /><br />
<result column="TYPE_ID" property="typeId" jdbcType="INTEGER" /><br />
<result property="exons" resultMap = "gene.SelectExonResults"/><br />
</resultMap><br />
<br />
<resultMap id="SelectExonResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Exon"><br />
<result column="EXON_ID" property="id" jdbcType="INTEGER"/><br />
<result column="EXON_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="EXON_RESIDUES" property="residues" jdbcType="CLOB" /><br />
<result column="STRAND" property="strand" jdbcType="INTEGER" /><br />
<result column="FMIN" property="fmin" jdbcType="INTEGER" /><br />
<result column="FMAX" property="fmax" jdbcType="INTEGER" /><br />
<result column="SRCFEATURE_ID" property="sourceFeatureId"<br />
jdbcType="INTEGER" /><br />
</resultMap><br />
</xml><br />
<br />
=====Master Detail Report=====<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
6129482 x-files gene 14687 14720<br />
<br />
Becomes:<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
14687 14720<br />
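<br />
The collapse from two flat rows into one gene with two exon ranges can be sketched in plain Java. This only illustrates the grouping idea behind <code>groupBy</code>; it is not how iBatis implements it, and the <code>GeneReport</code> and <code>MasterDetail</code> names are hypothetical.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical master object: one gene with its detail rows (exon ranges).
class GeneReport {
    final int geneId;
    final String symbol;
    final List<int[]> exons = new ArrayList<>(); // each entry is {fmin, fmax}

    GeneReport(int geneId, String symbol) {
        this.geneId = geneId;
        this.symbol = symbol;
    }
}

class MasterDetail {
    // Each row is {gene_id, symbol, fmin, fmax}, as a joined query would return it.
    static List<GeneReport> group(List<Object[]> rows) {
        Map<Integer, GeneReport> byId = new LinkedHashMap<>(); // keeps first-seen order
        for (Object[] row : rows) {
            int id = (Integer) row[0];
            // Create the master object the first time its key is seen.
            GeneReport gene = byId.computeIfAbsent(id, k -> new GeneReport(k, (String) row[1]));
            gene.exons.add(new int[] { (Integer) row[2], (Integer) row[3] });
        }
        return new ArrayList<>(byId.values());
    }
}
```
<br />
Feeding the two rows for gene 6129482 into <code>MasterDetail.group</code> returns a single <code>GeneReport</code> holding both exon ranges.<br />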
<br />
=====Dynamic Queries=====<br />
<br />
* Gene Name (Description)<br />
** Feature, Featureprop<br />
* Symbol<br />
** Feature<br />
* Feature Synonyms<br />
** Feature, Feature_Synonym, Synonym<br />
* Ortholog Synonyms<br />
** Feature, Feature_relationship, Feature, Feature Synonyms<br />
<br />
=====Dynamic Queries=====<br />
<br />
FROM<br />
CAT_X_GENE_V gc<br />
<isEqual prepend=","<br />
property="searchSymbol"<br />
compareValue="true"><br />
GENE_SYMBOLS s<br />
</isEqual><br />
<br />
<isEqual prepend=","<br />
property="searchNcbi" <br />
compareValue="true"><br />
NCBI_GI n<br />
</isEqual><br />
<br />
=====Dynamic Queries=====<br />
<br />
<dynamic prepend="WHERE"><br />
<isEqual prepend="AND" property="searchNameOnly"<br />
compareValue="true"><br />
<iterate property="searchTokens" conjunction="AND" <br />
open=" (" close=") "><br />
LOWER(VARCHAR(gc.longname)) LIKE <br />
LOWER(CAST(#searchTokens[]:VARCHAR# AS VARCHAR(512)))<br />
</iterate><br />
</isEqual><br />
<br />
Iterate is very useful for multiple search terms.<br />
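<br />
What the <code>iterate</code> fragment produces for a list of search tokens can be approximated in plain Java as below. This is a sketch under assumptions, not iBatis code: the <code>NameSearch</code> class and the <code>SELECT *</code> column list are hypothetical, and the DB2-specific casts are left out.<br />
<br />
```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical builder mirroring the <dynamic>/<iterate> fragment above.
class NameSearch {
    final String sql;
    final List<String> params = new ArrayList<>();

    NameSearch(List<String> tokens) {
        // The column list is an assumption; the original fragment does not show it.
        StringBuilder q = new StringBuilder("SELECT * FROM CAT_X_GENE_V gc");
        if (!tokens.isEmpty()) {
            // One LIKE condition per token, joined with AND inside parentheses.
            StringJoiner conditions = new StringJoiner(" AND ", " WHERE (", ")");
            for (String token : tokens) {
                conditions.add("LOWER(gc.longname) LIKE LOWER(?)");
                params.add(token); // bound as a parameter, never concatenated into the SQL
            }
            q.append(conditions);
        }
        sql = q.toString();
    }
}
```
<br />
With no tokens the WHERE clause is omitted entirely; with several tokens each one contributes a condition, which is the "one statement, many variations" behaviour.<br />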
<br />
=====Miscellaneous Features=====<br />
<br />
* Supports various data sources<br />
** Simple JDBC<br />
** DBCP – Apache Connection Pooling<br />
** JNDI – Java Naming and Directory Interface<br />
* Very flexible<br />
* Local caching of results<br />
** Lazy loading<br />
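<br />
Lazy loading with a local cache boils down to "load on first access, reuse afterwards". A minimal single-threaded sketch in Java follows; the <code>Lazy</code> class is hypothetical and does not show how iBatis implements the feature.<br />
<br />
```java
import java.util.function.Supplier;

// Hypothetical lazy holder: the loader runs on first access only, then the value is cached.
// Not thread-safe; a real implementation would need synchronisation.
class Lazy<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) { this.loader = loader; }

    T get() {
        if (!loaded) {
            value = loader.get(); // e.g. a database query in a real application
            loaded = true;
        }
        return value;
    }
}
```
<br />
Wrapping an expensive lookup (for example, fetching sequence residues) in such a holder means the query runs only if the value is actually requested, and only once.<br />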
<br />
=====Support=====<br />
<br />
* In GMOD used by<br />
** Xenbase, Artemis at Sanger<br />
* Many other users<br />
** e.g. MySpace.com<br />
* Top level Apache Project<br />
** www.ibatis.apache.org<br />
* Active community<br />
<br />
<br />
=====What iBatis Does Well=====<br />
<br />
* Does not hide SQL<br />
** No new query language to learn<br />
* Separates and groups SQL<br />
* Simple!!<br />
** Light wrapper - No real tweaks<br />
* Does the job well<br />
* Excellent support for Master-Detail<br />
* Dynamically generated queries <br />
** You can structure conditions around clauses in SQL<br />
** One XML statement can represent many variations on a query<br />
<br />
=====Acknowledgements=====<br />
<br />
GMOD<br />
* Eric Just<br />
* Everyone else<br />
<br />
iBatis Developers<br />
* Kevin Snyder<br />
* Chris Jarabek<br />
* Ross Gibb<br />
<br />
PI<br />
* Peter Vize<br />
<br />
Financial Support<br />
* Alberta Heritage Foundation for Medical Research<br />
* Alberta Network for Proteomics Innovation<br />
* University of Calgary, Faculty of Science<br />
* University of Calgary Dept. of Computer Science<br />
* NICHD<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
Chado API via Java & Hibernate, Robert Bruggner, VectorBase.org. This Wiki section is an edited version of [[Media:HibernateChadoAPI.pdf|Robert's presentation]].<br />
<br />
=====Overview=====<br />
<br />
* Background<br />
* Quick Hibernate Overview<br />
* Hibernate Connectivity and O/R Mapping Example<br />
* GMOD Demo<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
=====Background=====<br />
<br />
* VectorBase<br />
** A bioinformatic resource center for invertebrate vectors of human pathogens<br />
* Responsible for storage and display of multiple organisms’ genomes<br />
** Anopheles gambiae, Aedes aegypti, Ixodes scapularis, Culex pipiens and so on....<br />
* Want to store data for many organisms: Chado is a natural choice<br />
* Ensembl Genome Browser already used for ''A. gambiae''<br />
** Wrote Ensembl API Database adaptor for Chado... Not maintainable.<br />
* Use Both Databases<br />
** Transfer genomic data from Ensembl to Chado<br />
** Search Engine and Indexer using Lucene<br />
** Run DAS<br />
** Export data via ChadoXML and GFF3<br />
* Need API for Database I/O<br />
<br />
=====Hibernate Background=====<br />
<br />
* They say: “A powerful, high performance object/relational persistence and query service.”<br />
* Automates the persistence of plain old Java objects (POJO)<br />
** User maps their POJO properties to database tables via XML (HBM File).<br />
** There are Hibernate tools that generate HBMs<br />
*** Configurable in the sense that one can create get & set methods that map one-to-one to fields.<br />
* Persist a specific object by storing it in the database.<br />
* Intelligent Database I/O <br />
** Smart detection of ''Dirty Properties'' when performing Save / Update / Delete.<br />
** Cascadable Save / Update / Delete for complex objects.<br />
* Everything's done within the scope of a transaction.<br />
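<br />
The idea behind dirty-property detection can be illustrated without Hibernate itself. The following is a minimal conceptual sketch in plain Java, not Hibernate's actual implementation: it compares a snapshot of the loaded values with the current values and builds an UPDATE that touches only the changed columns. The class and method names (DirtyCheckSketch, dirtyProperties, updateSql) and the sample values are assumptions made for illustration.<br />
<br />
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Conceptual sketch of snapshot-based dirty checking; not Hibernate's actual code. */
class DirtyCheckSketch {

    /** Returns the names of the properties whose current value differs from the snapshot. */
    static List<String> dirtyProperties(Map<String, Object> snapshot, Map<String, Object> current) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            if (!Objects.equals(snapshot.get(e.getKey()), e.getValue())) {
                dirty.add(e.getKey());
            }
        }
        return dirty;
    }

    /** Builds an UPDATE statement that touches only the dirty columns (expects at least one). */
    static String updateSql(String table, String idColumn, List<String> dirty) {
        StringBuilder sql = new StringBuilder("UPDATE " + table + " SET ");
        for (int i = 0; i < dirty.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(dirty.get(i)).append(" = ?");
        }
        return sql.append(" WHERE ").append(idColumn).append(" = ?").toString();
    }

    public static void main(String[] args) {
        // State as loaded from the cvterm table (values are made up)
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("name", "old name");
        snapshot.put("definition", "a definition");

        // State after the application changed the name
        Map<String, Object> current = new LinkedHashMap<>(snapshot);
        current.put("name", "New CVTerm name");

        List<String> dirty = dirtyProperties(snapshot, current);
        // prints: UPDATE cvterm SET name = ? WHERE cvterm_id = ?
        System.out.println(updateSql("cvterm", "cvterm_id", dirty));
    }
}
```
<br />
Only the name column appears in the generated statement because the definition is unchanged; a real persistence layer would also skip the UPDATE entirely when nothing is dirty.<br />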
<br />
=====Hibernate Database Connectivity=====<br />
<br />
* Configure Hibernate in hibernate.cfg.xml<br />
* Define a Data Source<br />
** We use a simple, single JDBC connection to Chado<br />
** Can be configured to use a connection pool or data source accessible by the Java Naming and Directory Interface (JNDI).<br />
** Define a connection “dialect”<br />
*** org.hibernate.dialect.PostgreSQLDialect<br />
* Describe the relationship between Java objects and database tables<br />
** Use XML to describe where to store POJO property data in the database<br />
* Create a new Hibernate Session based on the configuration<br />
* Begin a transaction to start performing work<br />
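<br />
As a rough illustration of the settings listed above, the sketch below collects them as Hibernate property keys in a java.util.Properties object; the same keys appear as property elements in hibernate.cfg.xml, and such an object could be handed to Hibernate's Configuration.setProperties. The host, database name and user name are placeholders, and the class name ChadoHibernateSettings is an assumption made for illustration.<br />
<br />
```java
import java.util.Properties;

/** Sketch: the connection settings above expressed as Hibernate property keys. */
class ChadoHibernateSettings {

    /** Builds the settings; host, database name and user are placeholders. */
    static Properties chadoSettings() {
        Properties p = new Properties();
        p.setProperty("hibernate.connection.driver_class", "org.postgresql.Driver");
        p.setProperty("hibernate.connection.url", "jdbc:postgresql://localhost:5432/chado");
        p.setProperty("hibernate.connection.username", "chado_user");
        p.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQLDialect");
        // Lets getCurrentSession() bind one session to the current thread
        p.setProperty("hibernate.current_session_context_class", "thread");
        return p;
    }

    public static void main(String[] args) {
        Properties p = chadoSettings();
        for (String key : p.stringPropertyNames()) {
            System.out.println(key + " = " + p.getProperty(key));
        }
    }
}
```
<br />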
<br />
=====POJO and HBM Example file - CV=====<br />
<br />
<java><br />
public class CV {<br />
<br />
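private int cv_id;<br />
private String name;<br />
private String definition;<br />
<br />
// Getters and setters for each property, e.g. getCv_id(), getName(), setName(String)<br />
// ....<br />
<br />
// Takes Object (not CV) so that it overrides Object.equals<br />
@Override<br />
public boolean equals(Object other) {<br />
....<br />
}<br />
<br />
@Override<br />
public int hashCode() {<br />
...<br />
}<br />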
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CV" table="cv"><br />
<br />
<id name="cv_id" column="cv_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cv_cv_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<property name="name" column="name" type="java.lang.String" not-null="true"/><br />
<br />
<property name="definition" column="definition" type="java.lang.String"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====HBM Example CVTerm=====<br />
<br />
<java><br />
public class CVTerm {<br />
<br />
private int cvterm_id;<br />
<br />
private CV cv;<br />
<br />
private String name;<br />
<br />
private String definition;<br />
<br />
private DBXref dbxref;<br />
<br />
private int is_obsolete;<br />
<br />
private int is_relationshiptype;<br />
<br />
.....<br />
<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CVTerm" table="cvterm"><br />
<br />
<id name="cvterm_id" column="cvterm_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cvterm_cvterm_id_seq</param><br />
<br />
</generator><br />
<br />
</id><br />
<br />
<many-to-one name="cv" class="org.vectorbase.chadoAPI.chadoObjects.CV" column="cv_id" <br />
not-null="true" cascade="save-update"/><br />
<br />
<property name="name" not-null="true" type="java.lang.String"/><br />
<br />
<property name="definition"/><br />
<br />
<one-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" cascade="all"/><br />
<br />
<property name="is_obsolete"/><br />
<br />
<property name="is_relationshiptype"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====Hibernate Object Retrieve=====<br />
<br />
One can use Java, Hibernate Query Language (HQL), or SQL; this example uses HQL.<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.chadoObjects.CVTerm;<br />
import org.vectorbase.chadoAPI.chadoObjects.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class, 1);<br />
<br />
// Or load a CVTerm by name using HQL (reuses the variable declared above)<br />
cvt = (CVTerm) session.createQuery("from CVTerm where name=?").setString(0, "name").uniqueResult();<br />
<br />
// Print out the name of the cvterm<br />
System.out.println(cvt.getName());<br />
<br />
// Get the cv that the cvterm is associated with<br />
// Hibernate doesn’t return the cv_id - it returns a CV Object.<br />
CV cv = cvt.getCv();<br />
<br />
// Print out the cv’s name<br />
System.out.println(cv.getName());<br />
</java><br />
<br />
=====Hibernate Object Update=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.chadoObjects.CVTerm;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Change cvt’s name<br />
cvt.setName("New CVTerm name");<br />
<br />
// Save!<br />
// Generated SQL updates “Dirty” properties (name, in this case)<br />
session.save(cvt);<br />
<br />
// Commit data to database (commit is called on the transaction, not the session)<br />
session.getTransaction().commit();<br />
</java><br />
<br />
=====Hibernate Save=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.chadoObjects.CVTerm;<br />
import org.vectorbase.chadoAPI.chadoObjects.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
// Build a session factory first, get the session and begin a transaction (not shown)<br />
<br />
// Make a new CV<br />
CV new_cv = new CV();<br />
new_cv.setName("New CV");<br />
new_cv.setDefinition("New CV Def");<br />
<br />
// Make a new cvterm for that cv<br />
CVTerm new_cvterm = new CVTerm();<br />
new_cvterm.setName("New CVTerm Name");<br />
// ..... save dbxref etc......<br />
<br />
// Add that CVTerm to our new CV<br />
new_cv.addCVTerm(new_cvterm);<br />
<br />
// Save the new data...<br />
// Hibernate recognizes that it has to first save new_cv, then save new_cvterm.<br />
session.save(new_cvterm);<br />
<br />
session.getTransaction().commit();<br />
<br />
// You can see the new id’s assigned by the database<br />
System.out.println(new_cv.getCv_id());<br />
System.out.println(new_cvterm.getCvterm_id());<br />
</java><br />
<br />
=====Inheritance=====<br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.Feature" table="feature" discriminator-value="not null"><br />
<br />
<id name="feature_id" column="feature_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">feature_feature_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<discriminator column="type_id" type="integer" insert="false"/><br />
<br />
<many-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" <br />
column="dbxref_id" cascade="all"/><br />
<br />
<many-to-one name="organism" class="org.vectorbase.chadoAPI.chadoObjects.Organism" <br />
column="organism_id" not-null="true" cascade="save-update"/><br />
<br />
<property name="name"/><br />
.....<br />
<br />
</class><br />
</hibernate-mapping><br />
<br />
<hibernate-mapping> <br />
<br />
<subclass name="org.vectorbase.chadoAPI.chadoFeatures.Gene" <br />
extends="org.vectorbase.chadoAPI.chadoObjects.Feature" discriminator-value="767"><br />
<br />
</subclass><br />
</hibernate-mapping><br />
</xml><br />
The subclass mapping lets one write custom methods for specific subclasses, such as Gene.<br />
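<br />
What the discriminator achieves can be sketched in plain Java: the value in the type_id column decides which class is instantiated for a row, so that type-specific methods become available. This is a conceptual sketch under assumptions, not Hibernate's implementation; the class DiscriminatorSketch, the method fromRow and the type_id 42 are made up for illustration, while 767 is the value used in the Gene mapping above.<br />
<br />
```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Conceptual sketch of single-table inheritance with a discriminator column. */
class DiscriminatorSketch {

    static class Feature {
        String uniquename;

        String describe() {
            return "feature " + uniquename;
        }
    }

    /** Subclass with type-specific behavior, selected by its discriminator value. */
    static class Gene extends Feature {
        @Override
        String describe() {
            return "gene " + uniquename;
        }
    }

    // type_id value -> constructor; 767 is the discriminator value from the Gene mapping
    private static final Map<Integer, Supplier<Feature>> BY_TYPE_ID = new HashMap<>();
    static {
        BY_TYPE_ID.put(767, Gene::new);
    }

    /** Instantiates the class registered for the row's type_id, else a plain Feature. */
    static Feature fromRow(int typeId, String uniquename) {
        Feature f = BY_TYPE_ID.getOrDefault(typeId, Feature::new).get();
        f.uniquename = uniquename;
        return f;
    }

    public static void main(String[] args) {
        System.out.println(fromRow(767, "xfile").describe());          // prints: gene xfile
        System.out.println(fromRow(42, "fake_chromosome").describe()); // prints: feature fake_chromosome
    }
}
```
<br />
Note that a cvterm_id such as 767 is specific to one database instance, so a mapping like this has to match the cvterm table of the Chado installation it runs against.<br />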
<br />
=====ChadoAPI=====<br />
<br />
* POJO Mappings<br />
** CV, CVTerm, DB, DBXref, Feature, FeatureCVTerm, FeatureDBXref, FeatureLoc, FeatureProp, FeatureRelationship, FeatureSynonym, Organism, Pub, Synonym<br />
* Extended Features<br />
** Chromosome, Gene, Transcript, Exon, Protein<br />
* Constants<br />
** CVTerms, FeatureFeatureRelationships, Ontologies<br />
* Special<br />
** ChadoAdaptor<br />
<br />
=====Problem 1 - GMOD Example=====<br />
<br />
<java><br />
// Set up our session and begin transaction<br />
Session session = HibernateUtil.getSessionFactory().getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Make a Chado adaptor and load up some utility objects<br />
ChadoAdaptor ca = new ChadoAdaptor();<br />
Chromosome c = ca.fetchChromosomeByUniqueName("fake_chromosome");<br />
Pub null_pub = ca.fetchPubByPubID(1);<br />
Organism agambiae = ca.fetchOrganismByScientificName("Anopheles","gambiae");<br />
<br />
// Begin GMOD Demo Code<br />
<br />
// Make our new gene;<br />
Gene xfile = new Gene();<br />
xfile.setOrganism(agambiae);<br />
xfile.setUniquename("xfile");<br />
xfile.setDescription("A test gene for GMOD meeting");<br />
<br />
/* Set the location of our gene. No need to set coordinates because they'll be updated<br />
* based on the exon boundaries. <br />
*/<br />
FeatureLoc xfile_loc = new FeatureLoc();<br />
xfile_loc.setSrcfeature(c);<br />
xfile_loc.setStrand(1);<br />
xfile.setFeatureLoc(xfile_loc);<br />
<br />
// Add synonyms to xfile<br />
xfile.createNewFeatureSynonym("mulder", null_pub, CVTerms.EXACT_SYNONYM);<br />
xfile.createNewFeatureSynonym("scully", null_pub, CVTerms.EXACT_SYNONYM);<br />
</java><br />
<br />
=====Problem 2 - GMOD Example=====<br />
<br />
<java><br />
// Create a new transcript for our gene.<br />
Transcript t = xfile.createGeneTranscript("xfile-RA");<br />
<br />
// Create some exons for that transcript.<br />
t.createTranscriptExon("xfile:1", 13691, 13767);<br />
t.createTranscriptExon("xfile:2", 14687, 14720);<br />
<br />
// Save our new gene<br />
session.save(xfile);<br />
System.out.println("xfile feature_id is " + xfile.getFeature_id());<br />
<br />
// Fetch our saved gene from the database<br />
Gene xfile_r = ca.fetchGeneByUniqueName("xfile");<br />
System.out.println("symbol: " + xfile_r.getUniquename());<br />
System.out.print("synonyms: ");<br />
for (FeatureSynonym fs : xfile_r.getFeatureSynonyms()){<br />
<br />
System.out.print(fs.getSynonym().getName() + " ");<br />
}<br />
<br />
System.out.println("description: " + xfile_r.getDescription());<br />
System.out.println("type: " + xfile_r.getType().getName());<br />
<br />
for (Transcript tx : xfile_r.fetchAllTranscripts()){<br />
for (Exon e : tx.fetchAllExons()){<br />
System.out.println(e.getUniquename() + " Start:\t" + e.getFeatureLoc().getFmin());<br />
System.out.println(e.getUniquename() + " End:\t" + e.getFeatureLoc().getFmax());<br />
System.out.println("\tSrcFeatureID: " + e.getFeatureLoc().getSrcfeature().getFeature_id());<br />
}<br />
System.out.println(">" + tx.getUniquename());<br />
System.out.println(tx.generateTranscriptSequenceFromExons().toUpperCase());<br />
}<br />
</java><br />
<br />
=====Problems 3, 4, & 5 - GMOD Update & Delete=====<br />
<br />
<java><br />
// Let's update our name...<br />
xfile_r.setUniquename("x-file");<br />
<br />
session.save(xfile_r);<br />
<br />
// Not part of the ChadoAdaptor utility object, but a good example of HQL<br />
List<Gene> genes = (List<Gene>)session.createQuery("from Gene where uniquename like ?").setString(0,"x-%").list();<br />
<br />
for (Gene g : genes){<br />
<br />
System.out.println(g.getFeature_id() + <br />
"\t" + g.getUniquename() + <br />
"\t" + g.getOrganism().getGenus() +<br />
" " + g.getOrganism().getSpecies());<br />
}<br />
<br />
// Deleting... hmm...<br />
Gene delete_me = ca.fetchGeneByUniqueName("x-ray");<br />
session.delete(delete_me);<br />
<br />
// All Finished<br />
session.getTransaction().commit();<br />
</java><br />
<br />
<br />
<br />
=====What Hibernate Does Well=====<br />
<br />
* Hibernate can be configured to perform specialized functions<br />
** For example, it has its own notion of a cascade<br />
* Flexible with respect to language<br />
** Java, Hibernate Query Language, or SQL<br />
* Any JDBC driver<br />
<br />
=====Acknowledgements=====<br />
<br />
* VectorBase People<br />
** Frank Collins, EO Stinson, Ryan Butler<br />
* GMOD<br />
* NIAID<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
This Wiki section is an edited version of [[Media:PSU.pdf|Chinmay's presentation]].<br />
<br />
=====GeneDB=====<br />
<br />
* GeneDB is the organism data and annotation database for the Pathogen Sequencing Unit (PSU) at the Sanger Institute, UK<br />
* Contains 37 organisms, which is expected to grow to 62<br />
* Currently migrating to the Chado schema<br />
* Java API with two engines: Hibernate and iBatis<br />
** Two teams, Artemis and GeneDB, took different approaches<br />
<br />
=====Technical - Connections=====<br />
<br />
Connections are configured in the Spring configuration file<br />
<xml><br />
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"><br />
<property name="driverClassName" value="org.postgresql.Driver" /><br />
<property name="url" value="jdbc:postgresql://holly.sanger.ac.uk:5432/chado" /><br />
<property name="username" value="DELIBERATELY_BOGUS_NAME"/><br />
<property name="password" value="WIBBLE" /><br />
</bean><br />
</xml><br />
* Uses a connection pool<br />
* Connection to the database is specified graphically, so the iBatis configuration file has variables for the location:<br />
<xml><br />
<property name="JDBC.Driver" value="org.postgresql.Driver"/><br />
<br />
<property name="JDBC.ConnectionURL" value="jdbc:postgresql://${chado}"/><br />
<br />
<property name="JDBC.Username" value="${username}"/><br />
<br />
<property name="JDBC.Password" value="${password}"/><br />
</xml><br />
<br />
* The user provides the database location, username and password<br />
* The user then selects what to open in Artemis from a scrollable list of features with residues (organisms are in separate Postgres schemas)<br />
<br />
=====Technical - Code Generation=====<br />
<br />
* The shared interface and hibernate implementation were originally generated<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use it behind the scenes)<br />
<br />
=====Technical - Transactions=====<br />
<br />
* Transactions are fully supported<br />
<br />
=====Problems 1, 2, & 3=====<br />
<br />
Creating a gene<br />
<java><br />
genes[0] = new Feature(ORG, GENE, "xfile", false, false, now, now);<br />
<br />
genes[0].setSeqLen(1029); <br />
sequenceDao.persist(genes[0]);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, genes[0], 13691, false, 14720, false, (short)1, 0, 0 ,0);<br />
<br />
sequenceDao.persist(loc);<br />
<br />
addFeatureProp(genes[0], "description", "A test gene for GMOD meeting");<br />
<br />
addSynonymsToFeature(genes[0], "mulder", "scully");<br />
<br />
createExon("exon1", genes[0], 13691, 13767, now, 0);<br />
<br />
createExon("exon2", genes[0], 14687, 14720, now, 1);<br />
</java><br />
<br />
Retrieve a gene<br />
<java><br />
Feature f = sequenceDao.getFeatureByUniqueName("xfile");<br />
displayGene(f);<br />
</java><br />
<br />
Update a gene<br />
<java><br />
genes[0].setUniqueName("x-file");<br />
<br />
sequenceDao.merge(genes[0]);<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<java><br />
private Feature createExon(String name, Feature gene, int min, int max, Timestamp now, int rank) {<br />
<br />
Feature exon = new Feature(ORG, EXON, name, false, false, now, now);<br />
exon.setSeqLen(max-min);<br />
sequenceDao.persist(exon);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, exon, min, false, max, false, <br />
(short)1, 0, 0 ,0);<br />
sequenceDao.persist(loc);<br />
<br />
return exon;<br />
<br />
}<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<xml><br />
<st:section name="Naming" id="gene_naming" collapsed="false" collapsible="false"<br />
hideIfEmpty="true"><br />
<dl><br />
<dt><b>symbol:</b></dt><br />
<dd>${feature.uniqueName}</dd><br />
</dl><br />
<db:synonym name="synonym" var="name" collection="${feature.featureSynonyms}"><br />
<br /><b>Synonym:</b> <db:list-string collection="${name}" /><br />
</db:synonym><br />
<dt><b>Type:</b></dt><br />
<dd>${feature.cvTerm.name}</dd><br />
<br />
<st:section name="Exons" collapsed="false" collapsible="true" hideIfEmpty="true"><br />
<display:table name="exons" uid="tmp" pagesize="30" class="simple" cellspacing="0"<br />
cellpadding="4"><br />
<display:column property="uniqueName" title="Exon"/><br />
<display:column property="featureLocsForSrcFeatureId.fmin" title="Start"/><br />
<display:column property="featureLocsForSrcFeatureId.fmax" title="end"/><br />
</display:table><br />
</st:section><br />
<br />
<st:section name="cds" collapsible="true"><br />
<b>${feature.residues}</b><br />
</st:section><br />
</xml><br />
<br />
Specialized functionality, such as a cascading delete, is handled by the database</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-01-26T19:52:40Z<p>165.124.152.78: /* Modware Features */</p>
<hr />
<div>==Middleware for Chado databases==<br />
<br />
===Authors===<br />
<br />
* Jeff Bowes<br />
* Robert Bruggner<br />
* Scott Cain<br />
* Josh Goodman<br />
* Eric Just<br />
* Sohel Merchant<br />
* Brian O'Connor<br />
* Brian Osborne<br />
* Chinmay Patel<br />
* Pinglei Zhou<br />
<br />
===Middleware Evaluation January 2007===<br />
<br />
A group of some 50 GMOD developers met at the annual meeting to discuss middleware. This one-day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
===Introduction===<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation was to select or recommend certain packages over others, specifically within the GMOD context. The assumption is that making such recommendations will focus the developers' effort on a smaller number of packages. It is also assumed that such a focus will lead to greater support for and use of the recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
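<br />
The exon coordinates in this report are consistent with 1-based, inclusive positions: 13691 to 13767 and 14687 to 14720 give 77 + 34 = 111 bases, which is the length of the sequence shown. A minimal sketch of deriving such a sequence from exon coordinates follows. It handles the forward strand only and uses a short toy sequence rather than the fake chromosome; the class ExonSpliceSketch and the method splice are assumptions made for illustration.<br />
<br />
```java
/** Sketch: derive a spliced sequence from exon coordinates (1-based, inclusive, forward strand). */
class ExonSpliceSketch {

    /** Concatenates genomic[start..end] for each {start, end} pair, in the order given. */
    static String splice(String genomic, int[][] exons) {
        StringBuilder spliced = new StringBuilder();
        for (int[] exon : exons) {
            int start = exon[0];
            int end = exon[1];
            // Convert 1-based inclusive coordinates to Java's 0-based, end-exclusive range
            spliced.append(genomic, start - 1, end);
        }
        return spliced.toString();
    }

    public static void main(String[] args) {
        String genomic = "AAATGGCCCTTTTAAGG"; // toy sequence, not the fake chromosome
        int[][] exons = { {3, 8}, {12, 15} };
        System.out.println(splice(genomic, exons)); // prints: ATGGCCTTAA
    }
}
```
<br />
A feature on the reverse strand would additionally need its exons reverse-complemented, which this sketch does not attempt.<br />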
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
===Conclusions===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. BioSQL), the Chado schema is popular and, given its complexity, serves as a good test schema for this exercise. The primary focus in the talks was on functionality from the perspective of writing code and extending the software; less attention was given to performance. Each presenter focused on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman).<br />
<br />
====Problem Assignments====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages differ in many ways in how the problems were solved; please see the presentations themselves for these details. <br />
<br />
The Perl approaches used only the Perl language whereas the Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or disadvantage.<br />
<br />
====Java Middleware====<br />
<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
=====Abstraction=====<br />
<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
<br />
====Perl Middleware====<br />
<br />
<br />
=====Abstraction=====<br />
<br />
Modware provides a higher level of abstraction than Chado::AutoDBI.<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
The DasI interface is well documented: it has about a dozen methods and three classes, all documented.<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real-world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* An RDBMS has fixed types, but OO languages can have more complicated user-defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements.<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
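<br />
To make the contrast concrete: with a DAO the developer writes each SQL statement by hand, whereas an ORM layer can derive it from the class definition. The following is a minimal conceptual sketch of that derivation using Java reflection, not the mechanism of any particular ORM. The names OrmSqlSketch, Cvterm and insertSql are assumptions made for illustration, and the primary key cvterm_id is left out on the assumption that the database assigns it.<br />
<br />
```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Conceptual sketch of how an ORM layer can derive SQL from a class definition. */
class OrmSqlSketch {

    /** A plain class with one attribute per (non-key) column of the cvterm table. */
    static class Cvterm {
        int cv_id;
        String name;
        String definition;
        int dbxref_id;
    }

    /** Builds a parameterized INSERT from the declared field names (sorted for stable output). */
    static String insertSql(Class<?> type, String table) {
        List<String> columns = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!f.isSynthetic()) {
                columns.add(f.getName());
            }
        }
        Collections.sort(columns);
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // prints: INSERT INTO cvterm (cv_id, dbxref_id, definition, name) VALUES (?, ?, ?, ?)
        System.out.println(insertSql(Cvterm.class, "cvterm"));
    }
}
```
<br />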
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
$term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
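## Update ($term is assumed to be another object holding the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update();   # write the changed columns to the database<br />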
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for a controlled data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
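<br />
The difference from Hibernate's table-to-object mapping can be sketched in plain Java: here the developer writes the SQL, and the mapping layer only copies the columns of each result row onto an object. This is a conceptual sketch, not iBATIS code; the class ResultMapSketch, the query text and the sample row are assumptions made for illustration.<br />
<br />
```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Conceptual sketch of mapping one SQL result row onto a Java object. */
class ResultMapSketch {

    static class Gene {
        int featureId;
        String uniqueName;
    }

    // The developer writes the SQL; the mapper only copies columns to properties.
    static final String SELECT_GENE =
            "SELECT feature_id, uniquename FROM feature WHERE uniquename = ?";

    /** Copies the named columns of a result row into a new Gene. */
    static Gene mapRow(Map<String, Object> row) {
        Gene g = new Gene();
        g.featureId = (Integer) row.get("feature_id");
        g.uniqueName = (String) row.get("uniquename");
        return g;
    }

    public static void main(String[] args) {
        // Stand-in for a row returned by running SELECT_GENE against Chado
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("feature_id", 1323);
        row.put("uniquename", "x-file");
        Gene g = mapRow(row);
        System.out.println(g.featureId + "\t" + g.uniqueName);
    }
}
```
<br />
Because the query is ordinary SQL, it can be tuned for an existing schema without changing the Java class, which is one reason this style suits a schema that already exists.<br />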
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but ran into performance problems during bulk operations, even when the test did nothing more than call print() statements.<br />
Hibernate offers many caching parameters, but the underlying problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching, which makes recursive or cyclical operations easier to handle. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* Retrieving certain types of data may require a long path<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* Structure is not stored in memory; data is flushed out as it is processed<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
This Wiki section is an edited version of [[Media:XORT.pdf|Josh and Pinglei's presentation]].<br />
<br />
<br />
<br />
=====Introduction=====<br />
<br />
* An XML-database mapping system for data exchange between DB and XML-driven application<br />
* XORT can handle typical XML, it's not Chado-specific<br />
* Developed/Supported by Pinglei Zhou at FlyBase Harvard, 0.007 version now.<br />
* Used at all FlyBase sites<br />
** Harvard has extensive library of Perl modules for generating ChadoXML<br />
* Written in Perl<br />
* Required perl modules:<br />
** XML::Parser::PerlSAX<br />
** Unicode::String<br />
** XML::DOM<br />
** DBI<br />
<br />
=====Chado XML=====<br />
<br />
* Is ChadoXML necessary? No, but it may help you.<br />
* ChadoXML assists with incremental updates, if you want to avoid flush-and-reload.<br />
* While updates can be achieved with other middleware (for example, Perl Class::DBI or Java Hibernate), ChadoXML additionally provides a way to archive your transactions.<br />
* It provides bulk update and download, which other methods either lack or handle inefficiently<br />
<br />
=====Components=====<br />
<br />
* Database & Schema<br />
* ChadoXML Specification<br />
* DumpSpec<br />
** DumpSpec files are simple XML files that tell XORT what to do<br />
** DumpSpec files are ''language independent'', being XML<br />
** It's fairly easy for those who know the schema to read these files and understand what the operation is<br />
<br />
=====Highlights of Chado XML Specification=====<br />
<br />
* Unique representation of a specific database schema<br />
* Does away with internal primary key values<br />
* Static vs. Operational<br />
* Encoding for non-ASCII characters<br />
* Macro mechanism (object reference)<br />
<br />
=====Putting it together: New FlyBase dataflow Part 1=====<br />
<br />
There are three FlyBase sites, and most curation is done at Harvard and<br />
Cambridge. Proforma is the curation format at Cambridge and Harvard, but<br />
Harvard also curates with Apollo and ChadoXML.<br />
<br />
Once in Chado, the reporting instance, there's a denormalization step<br />
in moving data to a read-only database. Once in the read-only database there are<br />
dumps, for reporting purposes, using XORT to create ChadoXML. Once<br />
ChadoXML is created, XSLT version 2 is used to create HTML and GFF. The HTML<br />
is for human-readable reports; the GFF is for GBrowse and for various power<br />
users.<br />
<br />
1.a. Proforma (FlyBase Cambridge) is converted to ChadoXML<br />
<br />
1.b. ChadoXML is created by Apollo (Harvard)<br />
<br />
1.c. ChadoXML is created by Java SEAN (Harvard)<br />
<br />
2. All ChadoXML is loaded into Chado by XORT<br />
<br />
=====Putting it together: New FlyBase dataflow Part 2=====<br />
<br />
3. Chado (Harvard) is denormalized and loaded into Chado (Indiana)<br />
<br />
4. ChadoXML is created from Chado using XORT<br />
<br />
5.a. GFF and FASTA are created from ChadoXML<br />
<br />
5.b. HTML is created from Chado XML<br />
<br />
=====Data & Report Generation=====<br />
<br />
* Content of all output files is controlled by XML dumpspecs.<br />
** Dumpspecs are language independent.<br />
** Easily readable (with knowledge of Chado structure).<br />
* All XML transformation steps are done with XSLT v2.<br />
** Saxon XSLT (http://saxon.sourceforge.net/)<br />
** ChadoXML is split into individual chunks before XSLT processing to accommodate large file sizes.<br />
** Extremely fast. We can process all data for ~60,000 Drosophila genes in under 30 minutes.<br />
<br />
=====Hibernate & XORT=====<br />
<br />
* Hibernate didn't scale well when dealing with 5,000+ features in bulk.<br />
** The test was simply calling <code>print()</code> statements<br />
* Performance tweaks for Hibernate can be quite complicated to setup for bulk operations.<br />
* XORT is currently handling ~6 million features in production with only minor performance problems.<br />
* XORT is much more language independent.<br />
<br />
=====Support for complex transactions using XORT=====<br />
<br />
For example:<br />
<br />
* Find all records linked to a record using dumpspec<br />
* Merge gene x into y, each with thousands of records attached<br />
<br />
Step 1. Dump all data using a simple dumpspec<br />
<xml><br />
<chado><br />
<feature dump="all"><br />
<uniquename test="eq">x</uniquename><br />
</feature><br />
</chado><br />
</xml><br />
Step 2. Delete feature x from the DB, with triggers to clean up orphan records if necessary<br />
<br />
Step 3. Edit the output xml, change uniquename x to y, then load the edited file back to DB<br />
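<br />
The edit in Step 3 can be sketched in plain Perl. This is a minimal sketch, not XORT code: the <code>rename_uniquename</code> helper is hypothetical, the sample document is a made-up fragment rather than real XORT output, and a regular expression is only safe on simple, well-formed input. For real dumps an XML parser would be the more robust choice.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical helper (not part of XORT): rewrite every
# <uniquename>OLD</uniquename> element to hold NEW instead.
# Returns the edited text and the number of replacements made.
sub rename_uniquename {
    my ($xml, $old, $new) = @_;
    my $count = ($xml =~ s{<uniquename>\s*\Q$old\E\s*</uniquename>}
                          {<uniquename>$new</uniquename>}g);
    return ($xml, $count || 0);
}

# Made-up fragment standing in for the dump produced in Step 1.
my $dump = <<'XML';
<chado>
  <feature>
    <uniquename>x</uniquename>
    <name>x</name>
  </feature>
</chado>
XML

my ($edited, $n) = rename_uniquename($dump, 'x', 'y');
print "replaced $n uniquename element(s)\n";
print $edited;
```
<br />
Only the <code>uniquename</code> element is touched; other elements that happen to contain the same text, such as <code>name</code>, are left alone.<br />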
<br />
=====CHIA (Chado Interface Application)=====<br />
<br />
A Java application that organizes SQL and XORT functionality for internal users, e.g.:<br />
<br />
* Dump chado-XML for gene regions for Apollo curation<br />
* Organize and execute “canned” SQL queries<br />
* Serve IDs for curators (in development)<br />
* Dynamically browse Chado without writing SQL statements<br />
<br />
CHIA is being designed to be extensible for adding new functionality as needed.<br />
<br />
<br />
=====Documentation=====<br />
<br />
* ''Using Chado to Store Genome Annotation Data''<br />
** Current Protocols in Bioinformatics (Baxevanis, A.D., and Davison, D.B., eds) 2, 9.6.1-9.6.28.<br />
* XORT specification docs<br />
* XORT draft (unpublished)<br />
* GMOD case demo procedure<br />
** All in the doc directory of XORT package, http://www.gmod.org<br />
<br />
=====Acknowledgements=====<br />
<br />
* William Gelbart<br />
* Chris Mungall<br />
* David Emmert <br />
* Mark Gibson<br />
* Stan Letovsky <br />
* Nomi Harris<br />
* Frank Smutniak <br />
* Suzanna Lewis<br />
* Peili Zhang <br />
* Haiyan Zhang <br />
* Aubrey de Grey<br />
* Andy Schroeder <br />
* Don Gilbert<br />
* Susan Russo<br />
* Mark Zythovicz <br />
* Scott Cain<br />
* Lincoln Stein<br />
* Victor Strelets<br />
* Robert Wilson<br />
* Paul Leyland<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Use add_constructor to define a search for long names<br />
<br />
__PACKAGE__->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
This Wiki section is an edited version of [[Media:AutoDBI.pdf|Brian's presentation]].<br />
<br />
=====Relation to Turnkey=====<br />
<br />
Turnkey is a package that auto-generates Web sites given a relational<br />
schema, based on SQL::Translator<br />
<br />
* Turnkey authors: Allen Day, Scott Cain, Brian O'Connor<br />
* Turnkey and Chado::AutoDBI objects are essentially the same<br />
<br />
=====Technical Overview=====<br />
<br />
* Code Generation<br />
<br />
=====Project Overview=====<br />
<br />
Convert SQL Queries/Inserts/Deletes -> Object Calls<br />
<sql><br />
INSERT INTO feature (organism_id, name)<br />
VALUES (1, 'foo');<br />
</sql><br />
To:<br />
<perl><br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Database connection: use a base class<br />
* Set up the base object and connect, then create a ''table object'' to access the primary key. <br />
* Class::DBI can find and insert records in other tables, based on foreign keys.<br />
<br />
<perl><br />
use base qw(Class::DBI::Pg);<br />
<br />
my ($dsn, $name, $pass);<br />
$dsn = "dbi:Pg:host=localhost;dbname=chado;port=5432";<br />
$name = "postgres";<br />
$pass = "";<br />
<br />
Turnkey::Model::DBI->set_db('Main', $dsn, $name, $pass, {AutoCommit => 1});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
<br />
<perl><br />
package Turnkey::Model::Feature;<br />
use base 'Turnkey::Model::DBI';<br />
<br />
Turnkey::Model::Feature->set_up_table('feature');<br />
<br />
#<br />
# Primary key accessors<br />
#<br />
<br />
sub id { shift->feature_id }<br />
sub feature { shift->feature_id }<br />
</perl><br />
<br />
* data field accessors by Class::Accessor<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** has_a<br />
<br />
<perl><br />
#<br />
# has_a<br />
#<br />
Turnkey::Model::Feature->has_a( type_id => "Turnkey::Model::Cvterm" );<br />
sub cvterm { return shift->type_id; }<br />
</perl><br />
<br />
* Basic ORM Object: Feature<br />
** has_many<br />
<br />
<perl><br />
#<br />
# has_many<br />
#<br />
Turnkey::Model::Feature->has_many('feature_synonym_feature_id', <br />
'Turnkey::Model::Feature_Synonym' => 'feature_id');<br />
sub feature_synonyms { return shift->feature_synonym_feature_id; }<br />
<br />
Turnkey::Model::Feature->has_many('featureprop_feature_id', <br />
'Turnkey::Model::Featureprop' => 'feature_id');<br />
sub featureprops { return shift->featureprop_feature_id; }<br />
</perl><br />
<br />
* Can traverse tables, such as going from FEATURE to FEATUREPROP <br />
** Tell the base object that the ''table object'' has_a() or has_many() keys corresponding to a key in another ''table object'' <br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** skipping linker tables for has_many<br />
<br />
<perl><br />
# skip over feature_synonym table<br />
#<br />
# method 1<br />
#<br />
sub synonyms { my $self = shift; return map $_->synonym_id, $self->feature_synonyms; }<br />
#<br />
# method 2<br />
#<br />
Turnkey::Model::Feature->has_many( synonyms2 =><br />
['Turnkey::Model::Feature_Synonym' => 'synonym_id']);<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Transactions<br />
** Chado::AutoDBI supports transactions, and one can wrap the transaction in an eval()<br />
<perl><br />
sub do_transaction {<br />
my $class = shift;<br />
my ( $code ) = @_;<br />
# Turn off AutoCommit for this scope.<br />
# A commit will occur at the exit of this block automatically,<br />
# when the local AutoCommit goes out of scope.<br />
local $class->db_Main->{ AutoCommit };<br />
<br />
# Execute the required code inside the transaction.<br />
eval { $code->() };<br />
if ( $@ ) {<br />
my $commit_error = $@;<br />
eval { $class->dbi_rollback }; # might also die!<br />
die $commit_error;<br />
}<br />
}<br />
</perl><br />
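<br />
The same eval-and-rollback pattern can be tried without a database. The sketch below is not Class::DBI code: <code>FakeHandle</code> is a hypothetical stand-in that only records what was committed, and <code>run_in_transaction</code> is an illustrative name. Unlike the version above, it commits explicitly rather than relying on a localized AutoCommit setting.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical stand-in for a database handle: pending rows are kept
# in memory until commit() moves them to the "saved" list.
package FakeHandle;
sub new      { return bless { pending => [], saved => [] }, shift }
sub add_row  { my ($self, $row) = @_; push @{ $self->{pending} }, $row }
sub commit   { my $self = shift; push @{ $self->{saved} }, @{ $self->{pending} }; $self->{pending} = [] }
sub rollback { my $self = shift; $self->{pending} = [] }
sub saved    { return @{ $_[0]{saved} } }

package main;

# Run $code; commit if it succeeds, roll back and re-throw if it dies.
sub run_in_transaction {
    my ($dbh, $code) = @_;
    my $ok = eval { $code->(); 1 };
    if (!$ok) {
        my $error = $@ || 'unknown error';
        eval { $dbh->rollback };    # the rollback itself might die
        die $error;
    }
    $dbh->commit;
    return 1;
}

my $dbh = FakeHandle->new;

# First transaction succeeds and is committed.
run_in_transaction($dbh, sub { $dbh->add_row('gene x-ray') });

# Second transaction dies part-way through and is rolled back.
eval {
    run_in_transaction($dbh, sub {
        $dbh->add_row('gene x-men');
        die "constraint violated\n";
    });
};
print "second transaction failed: $@" if $@;
print "saved rows: ", join(', ', $dbh->saved), "\n";
```
<br />
Saving <code>$@</code> into a lexical before attempting the rollback matters, because the inner <code>eval</code> resets <code>$@</code>.<br />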
<br />
=====Technical Overview=====<br />
<br />
* Lazy Loading<br />
** One can either create objects automatically or explicitly dictate which fields are incorporated into the object<br />
<perl><br />
Turnkey::Model::Feature->columns( Primary => qw/feature_id/ );<br />
Turnkey::Model::Feature->columns( Essential => qw/name organism_id type_id/ );<br />
Turnkey::Model::Feature->columns( Others => qw/residues .../ );<br />
</perl><br />
<br />
Typically:<br />
<br />
<perl><br />
Turnkey::Model::Feature->set_up_table('feature');<br />
</perl><br />
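<br />
How column groups lead to lazy loading can be illustrated with a small stand-alone sketch. It does not use Class::DBI: <code>LazyRow</code>, the <code>%TABLE</code> hash standing in for the feature table, the <code>seqlen</code> value, and the fetch counter are all hypothetical. The assumption modelled here is that touching one column of a group loads that whole group and nothing else.<br />
<br />
```perl
use strict;
use warnings;

package LazyRow;

# Hypothetical in-memory "table" keyed by primary key.
my %TABLE = (
    7 => { name => 'xfile', organism_id => 1, type_id => 42,
           residues => 'ATGGCG', seqlen => 6 },
);
# Column groups, mirroring the Essential/Others split shown above.
my %GROUP = (
    Essential => [qw(name organism_id type_id)],
    Others    => [qw(residues seqlen)],
);
our $FETCHES = 0;    # counts simulated trips to the database

sub new {
    my ($class, $id) = @_;
    return bless { feature_id => $id }, $class;    # nothing fetched yet
}

# Return a column value, loading its whole group on first use.
sub get {
    my ($self, $column) = @_;
    return $self->{$column} if exists $self->{$column};

    # Find the group that holds this column.
    my ($group) = grep { my $g = $_; grep { $_ eq $column } @{ $GROUP{$g} } }
                  sort keys %GROUP;
    die "unknown column '$column'\n" unless defined $group;

    $FETCHES++;
    $self->{$_} = $TABLE{ $self->{feature_id} }{$_} for @{ $GROUP{$group} };
    return $self->{$column};
}

package main;

my $row = LazyRow->new(7);
print $row->get('name'), "\n";        # loads the Essential group
print $row->get('type_id'), "\n";     # served from the cache
print $row->get('residues'), "\n";    # loads the Others group
print "fetches: $LazyRow::FETCHES\n";
```
<br />
Keeping large columns such as residues out of the essential group means they are only read when something actually asks for them.<br />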
<br />
=====Problem 1=====<br />
<br />
* Create Feature & Add Description<br />
<br />
<perl><br />
# now create mRNA feature<br />
<br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
<br />
# create description<br />
<br />
my $featureprop = Turnkey::Model::Featureprop->find_or_create({<br />
value => 'A test gene for GMOD meeting',<br />
feature_id => $feature,<br />
type_id => $note_cvterm,<br />
});<br />
</perl><br />
<br />
=====Problem 2=====<br />
<br />
* Retrieve a Feature via Searching<br />
** Search using strings or identifiers; a search returns an iterator object<br />
<br />
<perl><br />
# objects for global use<br />
<br />
# the organism for our new feature<br />
my $organism = Turnkey::Model::Organism->search(abbreviation => "S.cerevisiae")->next;<br />
<br />
# the cvterm for a "Note"<br />
my $note_cvterm = Turnkey::Model::Cvterm->retrieve(2);<br />
<br />
# searching name by wildcard<br />
<br />
my @results = Turnkey::Model::Feature->search_like(name => 'x-%');<br />
</perl><br />
<br />
=====Problems 3, 4, & 5=====<br />
<br />
* Update a Feature<br />
<br />
<perl><br />
# update the xfile gene name<br />
<br />
$feature->name("x-file");<br />
$feature->update();<br />
</perl><br />
<br />
* Delete a Feature<br />
<br />
<perl><br />
# now delete the x-file feature<br />
<br />
$feature->delete();<br />
</perl><br />
<br />
=====Things Chado::AutoDBI does well=====<br />
<br />
* Easy to use<br />
* Easy to port<br />
* Use with other DBs<br />
** Both Oracle and Postgres used currently<br />
* Autogenerated via Turnkey<br />
* find_or_create method<br />
* Performance is not as bad as you might guess<br />
** Due to Lazy loading<br />
** Even whole genome operations are feasible<br />
<br />
Note that speed is relative: poorly written SQL can perform badly too, and in such cases the Chado::AutoDBI approach may well be faster.<br />
<br />
<br />
=====For More Information=====<br />
<br />
* Class::DBI<br />
** http://www.class-dbi.com<br />
** http://search.cpan.org<br />
<br />
* Turnkey<br />
** http://turnkey.sf.net<br />
<br />
* Biopackages<br />
** http://biopackages.net<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Users: DictyBase<br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Some DictyBase-specific code used there<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods<br />
** Abstract factory for various feature types<br />
* Lazy: information is only retrieved when you ask for it, but cached for speedy retrieval the next time it is required<br />
* Uses Bioperl and its objects<br />
** Common methods such as name(), primary_id(), external_ids() <br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene which isn't the same as ...<br />
* Any feature type not explicitly supported in the Modware::Feature class is blessed as a Modware::Feature::GENERIC class, which has a start/stop coordinate on a genomic sequence feature (no structure like a transcript with exons)<br />
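<br />
The abstract-factory idea with a generic fallback can be sketched in a few lines. The class names below (<code>Demo::Feature</code> and its subclasses) are hypothetical and this is not Modware's actual implementation; it only shows how a single constructor can hand back a type-specific object and fall back to a generic one.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical class names; this is not Modware's actual implementation.
package Demo::Feature;

my %CLASS_FOR = (
    gene       => 'Demo::Feature::GENE',
    mrna       => 'Demo::Feature::MRNA',
    chromosome => 'Demo::Feature::CHROMOSOME',
);

# Factory constructor: choose the subclass from the -type argument and
# fall back to GENERIC for any type without a dedicated class.
sub new {
    my ($class, %args) = @_;
    my $type   = $args{-type} // '';
    my $target = $CLASS_FOR{ lc $type } || 'Demo::Feature::GENERIC';
    return bless { type => $type, name => $args{-name} }, $target;
}
sub name { return $_[0]{name} }
sub type { return $_[0]{type} }

# Each subclass inherits the common methods from Demo::Feature.
package Demo::Feature::GENE;       our @ISA = ('Demo::Feature');
package Demo::Feature::MRNA;       our @ISA = ('Demo::Feature');
package Demo::Feature::CHROMOSOME; our @ISA = ('Demo::Feature');
package Demo::Feature::GENERIC;    our @ISA = ('Demo::Feature');

package main;

for my $type (qw(gene mRNA tRNA)) {
    my $feature = Demo::Feature->new(-type => $type, -name => "demo_$type");
    printf "%-5s => %s\n", $type, ref $feature;
}
```
<br />
Callers always go through the parent constructor, so adding support for a new feature type only means adding a subclass and one entry in the lookup table.<br />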
<br />
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
my $seq_io = new Bio::SeqIO(<br />
-file => "../data/fake_chromosome.txt",<br />
-format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
-type => 'chromosome',<br />
-bioperl => $seq,<br />
-description => "This is a test",<br />
-name => 'Fake',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: <br />
Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
-type => 'gene',<br />
-name => 'xfile',<br />
-description => 'A test gene for GMOD meeting',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1: exon_2: <br />
start: 13691 start: 14687<br />
end: 13767 end: 14720<br />
strand: 1 strand: 1<br />
srcFeature_id: Id of genomic sample srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 13691,<br />
-end => 13767,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 14687,<br />
-end => 14720,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
-type => 'mRNA',<br />
-bioperl => $bioperl_mrna,<br />
-source => 'GMOD 2007 Demo',<br />
-reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report; this package<br />
uses Modware and BioPerl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Gene is-a Feature.<br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass? More flexibility the way shown here<br />
<br />
<perl><br />
package GMODWriter; <br />
sub Write_gene_report {<br />
my ($self, $gene) = @_;<br />
my $symbol = $gene->name();<br />
<br />
my @synonyms = @{ $gene->synonyms() };<br />
my $syn_string = join ",", @synonyms;<br />
my $description = $gene->description();<br />
my $type = $gene->type();<br />
# get features associated with the gene that are of type 'mRNA'<br />
my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
# use bioperl method to get exons from mRNA<br />
my @exons = $mrna->bioperl->exons_ordered();<br />
# Modware will return a nice fasta file for you.<br />
my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
# Now print the actual report<br />
print "symbol: $symbol\n";<br />
print "synonyms: $syn_string\n";<br />
print "description: $description\n";<br />
print "type: $type\n";<br />
<br />
my $count = 0;<br />
foreach my $exon (@exons ) {<br />
$count++;<br />
print "exon${count} start: ".$exon->start()."\n";<br />
<br />
print "exon${count} end: ".$exon->end()."\n";<br />
<br />
}<br />
print "$fasta";<br />
}<br />
. . .<br />
</perl><br />
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the name symbol to x-file and retrieve the changed record. Regenerate gene report<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval{<br />
<br />
# get xfile gene<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
# change the name<br />
$xfile_gene->name( 'x-file' );<br />
# write changes to database<br />
$xfile_gene->update();<br />
<br />
# we can use the original object if we want, but instead<br />
# we refetch from the database to 'prove' the name has been changed<br />
my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
# use our GMODWriter package to write report for x-file<br />
GMODWriter->Write_gene_report( $xfile_gene2 );<br />
<br />
};<br />
if ($@){<br />
warn $@;<br />
new Modware::DBH->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
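<br />
A search term such as 'x-*' has to become a SQL pattern at some point. The presentation does not say how Modware does this, so the following is only a sketch of one way to do it: <code>wildcard_to_like</code> is a hypothetical helper, and it assumes the database uses backslash as the LIKE escape character, which is the PostgreSQL default.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical helper: convert a shell-style pattern ('*' = any run of
# characters, '?' = exactly one character) into a SQL LIKE pattern.
# Characters that are special to LIKE are escaped with a backslash first.
sub wildcard_to_like {
    my ($pattern) = @_;
    $pattern =~ s/([\\%_])/\\$1/g;    # escape \, % and _ so they match literally
    $pattern =~ tr/*?/%_/;            # translate the wildcards
    return $pattern;
}

for my $query ('x-*', 'x-???', '100%_done*') {
    printf "%-12s -> %s\n", $query, wildcard_to_like($query);
}
```
<br />
The converted pattern should still be passed to the database as a bound parameter rather than interpolated into the SQL string.<br />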
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<br />
<perl><br />
sub Write_search_results {<br />
my ($self, $itr) = @_;<br />
# loop through iterator<br />
while (my $gene = $itr->next) {<br />
# print the requested information<br />
print $gene->feature_id . "\t" . $gene->name .<br />
"\t" . $gene->organism_name . "\n";<br />
}<br />
}<br />
</perl><br />
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
# set is_deleted = 1, this will 'hide' the gene from searches,<br />
# also sets the is_available to 0, the gene is no longer visible<br />
# to a search.<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* Sample scripts demoed here are available<br />
** sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI will automatically adjust to the change, depending on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Specify the type of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O’Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects via Perl DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
This Wiki section is an edited version of [[Media:DasI_middleware.pdf|Scott's presentation]].<br />
<br />
=====Create the database=====<br />
<br />
$ perl Makefile.PL<br />
$ make<br />
$ sudo make install<br />
$ make load_schema<br />
$ make prepdb # now with Xenopus!<br />
$ make ontologies # load rel, SO, featureprop<br />
<br />
=====Problem 1 - Loading Data=====<br />
<br />
Create some GFF from the specifications:<br />
<br />
fake_chromosome example chromosome 1 15017 . . . ID=fake_chromosome;Name=fake_chromosome<br />
fake_chromosome example gene 13691 14720 . + . ID=xfile;Name=xfile;Alias=mulder,scully;Note=A test gene for GMOD meeting<br />
fake_chromosome example mRNA 13691 14720 . + . ID=xfile_mRNA;Parent=xfile<br />
fake_chromosome example exon 13691 13767 . + . Parent=xfile_mRNA<br />
fake_chromosome example exon 14687 14720 . + . Parent=xfile_mRNA<br />
fake_chromosome example gene 12648 13136 . + . ID=x-men<br />
<br />
Gene inserted as GFF using a standard Bioperl bulk loader:<br />
<br />
<code>$ gmod_bulk_load_gff3.pl -g sample.gff</code><br />
<br />
''...lots of output...''<br />
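<br />
To make the structure of those GFF lines concrete, here is a minimal sketch of splitting one GFF3 line into its nine columns and its attribute pairs. It is not the bulk loader's code: <code>parse_gff3_line</code> is a hypothetical helper, it assumes tab-separated columns as GFF3 requires, and it ignores URL-escaped characters in attribute values.<br />
<br />
```perl
use strict;
use warnings;

# Split one GFF3 line into its nine tab-separated columns and break the
# attribute column into a hash. Every attribute value is stored as an
# array reference, so Alias=mulder,scully and ID=xfile are handled alike.
sub parse_gff3_line {
    my ($line) = @_;
    chomp $line;
    my @col = split /\t/, $line;
    return unless @col == 9;          # not a feature line

    my %attr;
    for my $pair (split /;/, $col[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value;
        $attr{$key} = [ split /,/, $value ];
    }
    return {
        seqid => $col[0], source => $col[1], type   => $col[2],
        start => $col[3], end    => $col[4], strand => $col[6],
        attributes => \%attr,
    };
}

# The xfile gene line from above, rebuilt with explicit tabs.
my $line = join "\t", 'fake_chromosome', 'example', 'gene', 13691, 14720,
    '.', '+', '.', 'ID=xfile;Name=xfile;Alias=mulder,scully';
my $f = parse_gff3_line($line);
printf "%s %s %d..%d aliases: %s\n",
    $f->{type}, $f->{attributes}{ID}[0], $f->{start}, $f->{end},
    join(', ', @{ $f->{attributes}{Alias} });
```
<br />
The Parent attribute is what ties each exon line to its mRNA and each mRNA to its gene, so a loader has to resolve IDs before it can build the feature relationships.<br />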
<br />
=====Adaptor Components=====<br />
<br />
* Bio::DB::Das::Chado<br />
** Database connection object<br />
* Bio::DB::Das::Chado::Segment<br />
** Object for any range of DNA<br />
* Bio::DB::Das::Chado::Segment::Feature<br />
<br />
=====Use Bio::DB::Das::Chado=====<br />
<br />
<perl><br />
use Bio::DB::Das::Chado;<br />
<br />
my $chado = Bio::DB::Das::Chado->new(<br />
-dsn => "dbi:Pg:dbname=test",<br />
-user=> "scott",<br />
-pass=> "" ) || die "no new chado";<br />
<br />
my $gene_name = 'xfile';<br />
<br />
my ($gene_fo) = $chado->get_features_by_name($gene_name);<br />
</perl><br />
<br />
=====Problem 2 - Use Some Accessors=====<br />
<br />
<perl><br />
print "symbol: " . $gene_fo->display_name."\n";<br />
print "synonyms: " . join(', ',$gene_fo->synonyms)."\n";<br />
print "description: " . $gene_fo->notes."\n";<br />
print "type: " . $gene_fo->type."\n";<br />
<br />
my ($mRNA) = $gene_fo->sub_SeqFeature();<br />
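 my @exons = $mRNA->sub_SeqFeature();<br />
 my $exon_count = 0;  # running exon number used in the output labels<br />
 my $cds_seq = '';    # concatenated exon DNA<br />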
<br />
for my $exon (@exons) {<br />
next unless ($exon->type->method eq 'exon');<br />
$exon_count++;<br />
print "exon$exon_count start: " . $exon->start."\n";<br />
print "exon$exon_count end: " . $exon->end. "\n";<br />
$cds_seq .= $exon->seq->seq; # the first seq call returns a Bio::Seq object, the second gets the DNA string from Bio::Seq<br />
} <br />
</perl><br />
<br />
=====Bulk Output=====<br />
<br />
<perl><br />
my $gene_name = 'x-*';<br />
<br />
my @genes = $chado->get_features_by_name(<br />
-name => $gene_name,<br />
-class=> 'gene' );<br />
<br />
for my $gene (@genes) {<br />
print join("\t",<br />
$gene->feature_id,<br />
$gene->display_name,<br />
$gene->organism),"\n";<br />
}<br />
</perl><br />
<br />
Or see your report in GBrowse<br />
<br />
=====Advantages=====<br />
<br />
* Comes 'for free' with GBrowse<br />
** GBrowse will run with any DasI-compatible interface<br />
* Uses 'familiar' BioPerl idioms, very similar to widely used Bio::DB::GFF (though with fewer methods)<br />
<br />
<br />
=====Conclusion=====<br />
<br />
* Not suitable as a 'general' middleware layer<br />
** May be suitable for some applications, particularly if they are similar to GBrowse or other uses of Bio::DB::GFF<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
Jeff Bowes, Xenbase, University of Calgary. This Wiki section is an edited version of [[Media:iBatis.pdf|Jeff's presentation]].<br />
<br />
=====ibatis=====<br />
<br />
* iBatis<br />
** Light-weight framework<br />
** Still based on SQL but eliminates the repetitive drudgery of JDBC<br />
** You can tune a query by re-writing the SQL in XML & the API does not change.<br />
* iBatis does not create your database in memory as objects<br />
* Shallow learning curve<br />
* Manually create a Java class and SQL map to describe higher-level objects<br />
** Example: ''Gene''<br />
* Support for inheritance<br />
** Inheritance in result maps, allows fair amount of re-use. <br />
* Supports different transaction schemes<br />
** For example, JDBC, Java Transaction API<br />
<br />
=====Abator=====<br />
<br />
* Generates iBatis CRUD objects by introspecting database tables<br />
* Abator creates ''SQL in XML'' files (SQL Map files) and Java classes <br />
** Within these files is a Result Map section.<br />
* Abator config files are simple: they set the connection parameters and state where the files are.<br />
* In the SQL Map files you can specify how to find parent ids, such as feature_id. <br />
<br />
=====Abator Example=====<br />
<xml><br />
<abatorConfiguration><br />
<abatorContext> <!-- TODO: Add Database Connection Information --><br />
<jdbcConnection driverClass="COM.ibm.db2.jdbc.app.DB2Driver"<br />
connectionURL="jdbc:db2:XBDV05"<br />
userId="db2inst1"<br />
         password="*******"><br />
<classPathEntry location="/Program Files/IBM/SQLLIB/java/db2java.zip" /><br />
</jdbcConnection><br />
<br />
<javaModelGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.model"<br />
targetProject="gene" /><br />
<sqlMapGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.sql"<br />
targetProject="gene" /><br />
<daoGenerator type="IBATIS"<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.dao"<br />
targetProject="gene" /><br />
 </abatorContext><br />
</abatorConfiguration><br />
</xml><br />
<br />
=====Abator Example=====<br />
<xml><br />
<table schema="db2inst1" tableName="synonym"><br />
  <generatedKey column="synonym_id" sqlStatement="VALUES PREVVAL FOR synonym_seq" identity="true" /><br />
  <columnOverride column="CREATED_BY" jdbcType="INTEGER" /><br />
  <columnOverride column="MODIFIED_BY" jdbcType="INTEGER" /><br />
</table><br />
</xml><br />
<br />
=====Abator=====<br />
<br />
Works as:<br />
<br />
* Eclipse plug-in<br />
* ANT<br />
* Standalone<br />
<br />
=====DAO Methods=====<br />
<br />
* Insert (Feature)<br />
* Update (Feature)<br />
* DeletebyKey (FeatureKey)<br />
* SelectbyKey (FeatureKey)<br />
* SelectbyExample (FeatureExample)<br />
* DeletebyExample (FeatureExample)<br />
<br />
=====Insert=====<br />
<xml><br />
<insert id="abatorgenerated_insert" parameterClass=<br />
"org.gmod.architecture.framwork.bakeoff.abator.model.FeatureWithBLOBs"><br />
insert into db2inst1.feature<br />
(DBXREF_ID, ORGANISM_ID, NAME, UNIQUENAME,<br />
RESIDUES, SEQLEN, MD5CHECKSUM, TYPE_ID, IS_ANALYSIS,<br />
IS_OBSOLETE, CREATED_BY)<br />
values (#dbxrefId:INTEGER#, #organismId:INTEGER#, #name:VARCHAR#,<br />
#uniquename:VARCHAR#, #residues:CLOB#, #seqlen:INTEGER#,<br />
#md5checksum:CHAR#, #typeId:INTEGER#,<br />
#isAnalysis:SMALLINT#, #isObsolete:SMALLINT#,<br />
#createdBy:INTEGER#)<br />
<br />
<selectKey resultClass="java.lang.Integer" keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</insert><br />
</xml><br />
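<br />
The <code>#property:JDBCTYPE#</code> tokens in the statement above are inline parameters: each one is bound to a property of the parameter class. The sketch below only illustrates the idea, rewriting such a statement into a JDBC-style string with <code>?</code> placeholders plus an ordered list of property names. <code>InlineParameterSketch</code> is a hypothetical class written for this page, not the parser that iBatis itself uses.<br />

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of inline-parameter rewriting.
class InlineParameterSketch {

    // Matches #property# or #property:JDBCTYPE#
    private static final Pattern PARAM = Pattern.compile("#(\\w+)(?::(\\w+))?#");

    final String sql;               // statement with ? placeholders
    final List<String> properties;  // bean properties, in binding order

    InlineParameterSketch(String mapped) {
        List<String> props = new ArrayList<String>();
        Matcher m = PARAM.matcher(mapped);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            props.add(m.group(1));        // remember which property feeds this slot
            m.appendReplacement(out, "?");
        }
        m.appendTail(out);
        this.sql = out.toString();
        this.properties = props;
    }

    public static void main(String[] args) {
        InlineParameterSketch p = new InlineParameterSketch(
            "insert into feature (NAME, SEQLEN) values (#name:VARCHAR#, #seqlen:INTEGER#)");
        System.out.println(p.sql);
        System.out.println(p.properties);
    }
}
```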
<br />
=====Insert=====<br />
<xml><br />
<selectKey resultClass="java.lang.Integer"<br />
keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</xml><br />
<br />
=====Problem 1 - Insert=====<br />
<br />
<java><br />
try {<br />
sqlMap.startTransaction();<br />
pGene.id =featureDAO.insert(pGene.getFeatureWithBLOBs());<br />
featurepropDAO.insert(pGene.getPropertyDescription());<br />
pGene.featurelocId = featurelocDAO.insert(pGene<br />
.getFeaturelocWithBLOBS());<br />
pGene = insertExons(pGene);<br />
insertSynonyms(pGene);<br />
sqlMap.commitTransaction();<br />
} catch (Exception e) {<br />
System.out.println(e);<br />
throw (e);<br />
} finally {<br />
sqlMap.endTransaction();<br />
}<br />
</java><br />
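<br />
The try/finally shape matters because ending a transaction that was never committed is expected to roll the work back. The toy class below mimics only that contract in plain Java so the control flow can be run without a database. <code>ToyTransaction</code> and its log are assumptions made for illustration and are not part of the iBatis API.<br />

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a transaction manager; records what happened, in order.
class ToyTransaction {
    final List<String> log = new ArrayList<String>();
    private boolean active;
    private boolean committed;

    void start() { active = true; committed = false; log.add("start"); }

    void commit() { committed = true; log.add("commit"); }

    // Always called from finally: rolls back unless commit() was reached.
    void end() {
        if (active && !committed) {
            log.add("rollback");
        }
        active = false;
        log.add("end");
    }

    // Runs a unit of work inside the start / commit / end pattern.
    static List<String> run(Runnable work) {
        ToyTransaction tx = new ToyTransaction();
        try {
            tx.start();
            work.run();
            tx.commit();
        } catch (RuntimeException e) {
            tx.log.add("error: " + e.getMessage());
        } finally {
            tx.end();
        }
        return tx.log;
    }

    public static void main(String[] args) {
        System.out.println(run(() -> { }));
        System.out.println(run(() -> { throw new IllegalStateException("boom"); }));
    }
}
```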
<br />
=====Transactions=====<br />
<br />
* SQLMap<br />
* JDBC<br />
* JTA - Java Transaction API<br />
** 2-Phase commit<br />
* Hibernate<br />
* External (Customized)<br />
<br />
=====Retrieval=====<br />
<br />
symbol: xfile<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: start: 13691 end: 13767 <br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: start: 14687 end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2 - Master Detail Reports=====<br />
<br />
Account for cycles or recursion in Master Detail Report. <br />
<br />
<xml><br />
<resultMap id="SelectGeneResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Gene" groupBy="id"><br />
<result column="FEATURE_ID" property="id" jdbcType="INTEGER"/><br />
<result column="GENE_NAME" property="name" jdbcType="VARCHAR" /><br />
    <result column="DESCRIPTION" property="description"<br />
jdbcType="VARCHAR" /><br />
<result column="TYPE_ID" property="typeId" jdbcType="INTEGER" /><br />
<result property="exons" resultMap = "gene.SelectExonResults"/><br />
</resultMap><br />
<br />
<resultMap id="SelectExonResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Exon"><br />
<result column="EXON_ID" property="id" jdbcType="INTEGER"/><br />
<result column="EXON_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="EXON_RESIDUES" property="residues" jdbcType="CLOB" /><br />
<result column="STRAND" property="strand" jdbcType="INTEGER" /><br />
<result column="FMIN" property="fmin" jdbcType="INTEGER" /><br />
<result column="FMAX" property="fmax" jdbcType="INTEGER" /><br />
<result column="SRCFEATURE_ID" property="sourceFeatureId"<br />
jdbcType="INTEGER" /><br />
</resultMap><br />
</xml><br />
<br />
=====Master Detail Report=====<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
6129482 x-files gene 14687 14720<br />
<br />
Becomes:<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
14687 14720<br />
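<br />
The <code>groupBy="id"</code> attribute on the gene result map is what collapses the repeated gene columns of the flat join into one gene holding a list of exons. A minimal plain-Java sketch of that grouping step follows; the <code>Row</code> and <code>GeneReport</code> types are hypothetical stand-ins for the JDBC rows and the mapped <code>Gene</code> class.<br />

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of master-detail grouping.
class MasterDetailSketch {

    // One flat row of the join: gene columns repeat for every exon.
    static class Row {
        final int geneId; final String symbol; final int fmin; final int fmax;
        Row(int geneId, String symbol, int fmin, int fmax) {
            this.geneId = geneId; this.symbol = symbol; this.fmin = fmin; this.fmax = fmax;
        }
    }

    // Master record with its detail rows (exon coordinates).
    static class GeneReport {
        final int id; final String symbol;
        final List<int[]> exons = new ArrayList<int[]>();
        GeneReport(int id, String symbol) { this.id = id; this.symbol = symbol; }
    }

    // Group rows by gene id, keeping first-seen order.
    static List<GeneReport> group(List<Row> rows) {
        Map<Integer, GeneReport> byId = new LinkedHashMap<Integer, GeneReport>();
        for (Row r : rows) {
            GeneReport g = byId.get(r.geneId);
            if (g == null) {
                g = new GeneReport(r.geneId, r.symbol);
                byId.put(r.geneId, g);
            }
            g.exons.add(new int[] { r.fmin, r.fmax });
        }
        return new ArrayList<GeneReport>(byId.values());
    }
}
```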
<br />
=====Dynamic Queries=====<br />
<br />
* Gene Name (Description)<br />
** Feature, Featureprop<br />
* Symbol<br />
** Feature<br />
* Feature Synonyms<br />
** Feature, Feature_Synonym, Synonym<br />
* Ortholog Synonyms<br />
** Feature, Feature_relationship, Feature, Feature Synonyms<br />
<br />
=====Dynamic Queries=====<br />
<br />
FROM<br />
CAT_X_GENE_V gc<br />
<isEqual<br />
         prepend="," property="searchSymbol"<br />
compareValue="true"><br />
GENE_SYMBOLS s<br />
</isEqual><br />
<br />
<isEqual prepend=","<br />
property="searchNcbi" <br />
compareValue="true"><br />
NCBI_GI n<br />
</isEqual><br />
<br />
=====Dynamic Queries=====<br />
<br />
<dynamic prepend="WHERE"><br />
   <isEqual prepend="AND" property="searchNameOnly"<br />
compareValue="true"><br />
<iterate property="searchTokens" conjunction="AND" <br />
open=" (" close=") "><br />
LOWER(VARCHAR(gc.longname)) LIKE <br />
LOWER(CAST(#searchTokens[]:VARCHAR# AS VARCHAR(512)))<br />
</iterate><br />
   </isEqual><br />
 </dynamic><br />
<br />
Iterate is very useful for multiple search terms. <br />
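<br />
In effect, the iterate element repeats its body once per element of <code>searchTokens</code>, joins the copies with the <code>conjunction</code>, and wraps the result in <code>open</code> and <code>close</code>. The sketch below reproduces that expansion in plain Java for illustration. <code>IterateSketch.expand</code> is a hypothetical helper, and returning an empty string for an empty token list is an assumption of this sketch.<br />

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of how an iterate block expands.
class IterateSketch {

    // Repeat one predicate per token; each token is bound through a "?" placeholder.
    static String expand(String predicate, int tokenCount,
                         String open, String close, String conjunction) {
        if (tokenCount == 0) {
            return ""; // assumed behaviour: nothing to iterate over, emit no clause
        }
        List<String> parts = new ArrayList<String>();
        for (int i = 0; i < tokenCount; i++) {
            parts.add(predicate);
        }
        return open + String.join(" " + conjunction + " ", parts) + close;
    }

    public static void main(String[] args) {
        System.out.println("WHERE" + expand("LOWER(gc.longname) LIKE ?", 2, " (", ") ", "AND"));
    }
}
```

Binding the tokens as parameters, rather than pasting them into the SQL text, keeps user-supplied search terms out of the statement itself.<br />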
<br />
=====Miscellaneous Features=====<br />
<br />
* Supports various data sources<br />
** Simple JDBC<br />
** DBCP – Apache Connection Pooling<br />
** JNDI – Java Naming and Directory Interface<br />
* Very flexible<br />
* Local caching of results<br />
** Lazy loading<br />
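<br />
Lazy loading with local caching means a value is fetched the first time it is requested and reused afterwards. A small plain-Java sketch of the pattern is given below. The <code>LazySketch</code> class is hypothetical and unrelated to the way iBatis caches are actually configured.<br />

```java
import java.util.function.Supplier;

// Hypothetical illustration of load-on-first-use with a cached result.
class LazySketch<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;
    int loads; // counts how often the loader actually ran

    LazySketch(Supplier<T> loader) { this.loader = loader; }

    // Load on first access, then serve the cached value. Not thread-safe.
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loads++;
        }
        return value;
    }

    public static void main(String[] args) {
        LazySketch<String> residues = new LazySketch<String>(() -> "ACGT");
        residues.get();
        residues.get();
        System.out.println("loader ran " + residues.loads + " time(s)");
    }
}
```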
<br />
=====Support=====<br />
<br />
* In GMOD used by<br />
** Xenbase, Artemis at Sanger<br />
* Many other users<br />
** e.g. MySpace.com<br />
* Top level Apache Project<br />
** www.ibatis.apache.org<br />
* Active community<br />
<br />
<br />
=====What iBatis Does Well=====<br />
<br />
* Does not hide SQL<br />
** No new query language to learn<br />
* Separates and groups SQL<br />
* Simple!!<br />
** Light wrapper - No real tweaks<br />
* Does the job well<br />
* Excellent support for Master-Detail<br />
* Dynamically generated queries <br />
** You can structure conditions around clauses in SQL<br />
** One XML statement can represent many variations on a query<br />
<br />
=====Acknowledgements=====<br />
<br />
GMOD<br />
* Eric Just<br />
* Everyone else<br />
<br />
iBatis Developers<br />
* Kevin Snyder<br />
* Chris Jarabek<br />
* Ross Gibb<br />
<br />
PI<br />
* Peter Vize<br />
<br />
Financial Support<br />
* Alberta Heritage Foundation for Medical Research<br />
* Alberta Network for Proteomics Innovation<br />
* University of Calgary, Faculty of Science<br />
* University of Calgary Dept. of Computer Science<br />
* NICHD<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
Chado API via Java & Hibernate, Robert Bruggner, VectorBase.org. This Wiki section is an edited version of [[Media:HibernateChadoAPI.pdf|Robert's presentation]].<br />
<br />
=====Overview=====<br />
<br />
* Background<br />
* Quick Hibernate Overview<br />
* Hibernate Connectivity and O/R Mapping Example<br />
* GMOD Demo<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
=====Background=====<br />
<br />
* VectorBase<br />
** A bioinformatic resource center for invertebrate vectors of human pathogens<br />
* Responsible for storage and display of multiple organisms’ genomes<br />
** Anopheles gambiae, Aedes aegypti, Ixodes scapularis, Culex pipiens and so on....<br />
* Want to store data for many organisms: Chado is a natural choice<br />
* Ensembl Genome Browser already used for ''A. gambiae''<br />
** Wrote Ensembl API Database adaptor for Chado... Not maintainable.<br />
* Use Both Databases<br />
** Transfer genomic data from Ensembl to Chado<br />
** Search Engine and Indexer using Lucene<br />
** Run DAS<br />
** Export data via ChadoXML and GFF3<br />
* Need API for Database I/O<br />
<br />
=====Hibernate Background=====<br />
<br />
* They say: “A powerful, high performance object/relational persistence and query service.”<br />
* Automates the persistence of plain old Java objects (POJO)<br />
** User maps their POJO properties to database tables via XML (HBM File).<br />
** There are Hibernate tools that generate HBMs<br />
*** Configurable in the sense that one can create get and set methods that map one-to-one to fields.<br />
* Persist a specific object by storing it in the database.<br />
* Intelligent Database I/O <br />
** Smart detection of ''Dirty Properties'' when performing Save / Update / Delete.<br />
** Cascadable Save / Update / Delete for complex objects.<br />
* Everything's done within the scope of a transaction.<br />
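<br />
Dirty-property detection can be pictured as comparing an object's current state with a snapshot taken when it was loaded, then updating only the columns that differ. The plain-Java sketch below shows that comparison. The <code>DirtyCheckSketch</code> class and its map-based state are assumptions for illustration and say nothing about Hibernate's internals.<br />

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of snapshot-based dirty checking.
class DirtyCheckSketch {

    // Return the names of properties whose value changed since the snapshot.
    static Set<String> dirty(Map<String, Object> snapshot, Map<String, Object> current) {
        Set<String> changed = new TreeSet<String>();
        for (Map.Entry<String, Object> e : current.entrySet()) {
            // Objects.equals copes with null property values on either side
            if (!Objects.equals(snapshot.get(e.getKey()), e.getValue())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }
}
```

Only the property names returned by <code>dirty</code> would need to appear in a generated UPDATE statement.<br />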
<br />
=====Hibernate Database Connectivity=====<br />
<br />
* Configure Hibernate in hibernate.cfg.xml<br />
* Define a Data Source<br />
** We use a simple, single JDBC connection to Chado<br />
** Can be configured to use a connection pool or data source accessible by the Java Naming and Directory Interface (JNDI).<br />
** Define a connection “dialect”<br />
*** org.hibernate.dialect.PostgreSQLDialect<br />
* Describe the relationship between Java objects and database tables<br />
** Use XML to describe where to store POJO property data in the database<br />
* Create a new Hibernate Session based on the configuration<br />
* Begin a transaction to start performing work<br />
<br />
=====POJO and HBM Example file - CV=====<br />
<br />
<java><br />
public class CV {<br />
<br />
    private int cv_id;<br />
private String name;<br />
private String definition;<br />
<br />
public property gettersandsetters() {<br />
....<br />
}<br />
<br />
public boolean equals(CV comparaCV) {<br />
....<br />
}<br />
public int hashCode(){<br />
...<br />
}<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CV" table="cv"><br />
<br />
<id name="cv_id" column="cv_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cv_cv_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
        <property name="name" column="name" type="java.lang.String" not-null="true"/><br />
<br />
        <property name="definition" column="definition" type="java.lang.String"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====HBM Example CVTerm=====<br />
<br />
<java><br />
public class CVTerm {<br />
<br />
private int cvterm_id;<br />
<br />
private CV cv;<br />
<br />
private String name;<br />
<br />
private String definition;<br />
<br />
private DBXref dbxref;<br />
<br />
private int is_obsolete;<br />
<br />
private int is_relationshiptype;<br />
<br />
.....<br />
<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CVTerm" table="cvterm"><br />
<br />
<id name="cvterm_id" column="cvterm_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cvterm_cvterm_id_seq</param><br />
<br />
</generator><br />
<br />
</id><br />
<br />
<many-to-one name="cv" class="org.vectorbase.chadoAPI.chadoObjects.CV" column="cv_id" <br />
not-null="true" cascade="save-update"/><br />
<br />
<property name="name" not-null="true" type="java.lang.String"/><br />
<br />
<property name="definition"/><br />
<br />
<one-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" cascade="all"/><br />
<br />
<property name="is_obsolete"/><br />
<br />
<property name="is_relationshiptype"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====Hibernate Object Retrieve=====<br />
<br />
One can use Java, Hibernate Query Language (HQL), or SQL; this example loads by ID and then by HQL.<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
 // Load a CVTerm using HQL (reuses the cvt variable declared above)<br />
 cvt = (CVTerm) session.createQuery("from CVTerm where name=?").setString(0,"name").uniqueResult();<br />
<br />
// Print out the name of the cvterm<br />
System.out.println(cvt.getName());<br />
<br />
// Get the cv that the cvterm is associated with<br />
// Hibernate doesn’t return the cv_id - it returns a CV Object.<br />
CV cv = cvt.getCv();<br />
<br />
// Print out the cv’s name<br />
System.out.println(cv.getName());<br />
</java><br />
<br />
=====Hibernate Object Update=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Change cvt’s name<br />
 cvt.setName("New CVTerm name");<br />
<br />
// Save!<br />
// Generated SQL updates “Dirty” properties (name, in this case)<br />
session.save(cvt);<br />
<br />
// Commit data to database<br />
 session.getTransaction().commit();<br />
</java><br />
<br />
=====Hibernate Save=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
 // Build a session factory first, get a session and begin a transaction (not shown)<br />
<br />
// Make a new CV<br />
CV new_cv = new CV();<br />
 new_cv.setName("New CV");<br />
 new_cv.setDefinition("New CV Def");<br />
<br />
// Make a new cvterm for that cv<br />
CVTerm new_cvterm = new CVTerm();<br />
 new_cvterm.setName("New CVTerm Name");<br />
// ..... save dbxref etc......<br />
<br />
// Add that CVTerm to our new CV<br />
new_cv.addCVTerm(new_cvterm);<br />
<br />
// Save the new data...<br />
// Hibernate recognizes that it has to first save new_cv, then save new_cvterm.<br />
session.save(new_cvterm);<br />
<br />
 session.getTransaction().commit();<br />
<br />
// You can see the new id’s assigned by the database<br />
System.out.println(new_cv.getCv_id());<br />
System.out.println(new_cvterm.getCvterm_id());<br />
</java><br />
<br />
=====Inheritance=====<br />
<xml><br />
<hibernate-mapping><br />
<br />
    <class name="org.vectorbase.chadoAPI.chadoObjects.Feature" table="feature"<br />
           discriminator-value="not null"><br />
<br />
<id name="feature_id" column="feature_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">feature_feature_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<discriminator column="type_id" type="integer" insert="false"/><br />
<br />
<many-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" <br />
column="dbxref_id" cascade="all"/><br />
<br />
<many-to-one name="organism" class="org.vectorbase.chadoAPI.chadoObjects.Organism" <br />
column="organism_id" not-null="true" cascade="save-update"/><br />
<br />
<property name="name"/><br />
.....<br />
<br />
    </class><br />
</hibernate-mapping><br />
<br />
<hibernate-mapping><br />
<br />
<subclass name="org.vectorbase.chadoAPI.chadoFeatures.Gene" <br />
extends="org.vectorbase.chadoAPI.chadoObjects.Feature" discriminator-value="767"><br />
<br />
</subclass><br />
</hibernate-mapping><br />
</xml><br />
Write custom methods for specific sub-classes<br />
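<br />
The discriminator tells Hibernate which subclass to instantiate for a row, here based on <code>type_id</code>. The sketch below imitates that dispatch with a hand-written lookup. The classes are hypothetical stand-ins, and 767 is simply the <code>type_id</code> used in the mapping above; the value for ''gene'' is assigned when the ontologies are loaded, so it will usually differ between Chado installations.<br />

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of discriminator-based subclass selection.
class DiscriminatorSketch {

    static class Feature { String kind() { return "feature"; } }

    static class Gene extends Feature {
        @Override String kind() { return "gene"; }
    }

    // discriminator value (type_id) -> constructor of the mapped subclass
    private static final Map<Integer, Supplier<Feature>> BY_TYPE =
        new HashMap<Integer, Supplier<Feature>>();
    static {
        BY_TYPE.put(767, Gene::new);
    }

    // Unmapped type_ids fall back to the base class,
    // in the spirit of discriminator-value="not null".
    static Feature instantiate(int typeId) {
        Supplier<Feature> ctor = BY_TYPE.get(typeId);
        return ctor != null ? ctor.get() : new Feature();
    }

    public static void main(String[] args) {
        System.out.println(instantiate(767).kind());
        System.out.println(instantiate(1).kind());
    }
}
```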
<br />
=====ChadoAPI=====<br />
<br />
* POJO Mappings<br />
** CV, CVTerm, DB, DBXref, Feature, FeatureCVTerm, FeatureDBXref, FeatureLoc, FeatureProp, FeatureRelationship, FeatureSynonym, Organism, Pub, Synonym<br />
* Extended Features<br />
** Chromosome, Gene, Transcript, Exon, Protein<br />
* Constants<br />
** CVTerms, FeatureFeatureRelationships, Ontologies<br />
* Special<br />
** ChadoAdaptor<br />
<br />
=====Problem 1 - GMOD Example=====<br />
<br />
<java><br />
// Set up our session and begin transaction<br />
Session session = HibernateUtil.getSessionFactory().getCurrentSession();<br />
session.beginTransaction();<br />
<br />
 // Make a Chado adaptor and load up some utility objects<br />
ChadoAdaptor ca = new ChadoAdaptor();<br />
Chromosome c = ca.fetchChromosomeByUniqueName("fake_chromosome");<br />
Pub null_pub = ca.fetchPubByPubID(1);<br />
Organism agambiae = ca.fetchOrganismByScientificName("Anopheles","gambiae");<br />
<br />
// Begin GMOD Demo Code<br />
<br />
// Make our new gene;<br />
Gene xfile = new Gene();<br />
xfile.setOrganism(agambiae);<br />
xfile.setUniquename("xfile");<br />
xfile.setDescription("A test gene for GMOD meeting");<br />
<br />
/* Set the location of our gene. No need to set coordinates because they'll be updated<br />
* based on the exon boundaries. <br />
*/<br />
FeatureLoc xfile_loc = new FeatureLoc();<br />
xfile_loc.setSrcfeature(c);<br />
xfile_loc.setStrand(1);<br />
xfile.setFeatureLoc(xfile_loc);<br />
<br />
// Add synonyms to xfile<br />
xfile.createNewFeatureSynonym("mulder", null_pub, CVTerms.EXACT_SYNONYM);<br />
xfile.createNewFeatureSynonym("scully", null_pub, CVTerms.EXACT_SYNONYM);<br />
</java><br />
<br />
=====Problem 2 - GMOD Example=====<br />
<br />
<java><br />
// Create a new transcript for our gene.<br />
Transcript t = xfile.createGeneTranscript("xfile-RA");<br />
<br />
// Create some exons for that transcript.<br />
t.createTranscriptExon("xfile:1", 13691, 13767);<br />
t.createTranscriptExon("xfile:2", 14687, 14720);<br />
<br />
// Save our new gene<br />
session.save(xfile);<br />
System.out.println("xfile feature_id is " + xfile.getFeature_id());<br />
<br />
// Fetch our saved gene from the database<br />
Gene xfile_r = ca.fetchGeneByUniqueName("xfile");<br />
System.out.println("symbol: " + xfile_r.getUniquename());<br />
System.out.print("synonyms: ");<br />
for (FeatureSynonym fs : xfile_r.getFeatureSynonyms()){<br />
<br />
System.out.print(fs.getSynonym().getName() + " ");<br />
}<br />
<br />
System.out.println("description: " + xfile_r.getDescription());<br />
System.out.println("type: " + xfile_r.getType().getName());<br />
<br />
for (Transcript tx : xfile_r.fetchAllTranscripts()){<br />
for (Exon e : tx.fetchAllExons()){<br />
System.out.println(e.getUniquename() + " Start:\t" + e.getFeatureLoc().getFmin());<br />
System.out.println(e.getUniquename() + " End:\t" + e.getFeatureLoc().getFmax());<br />
System.out.println("\tSrcFeatureID: " + e.getFeatureLoc().getSrcfeature().getFeature_id());<br />
}<br />
System.out.println(">" + tx.getUniquename());<br />
System.out.println(tx.generateTranscriptSequenceFromExons().toUpperCase());<br />
}<br />
</java><br />
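<br />
<code>generateTranscriptSequenceFromExons()</code> belongs to the VectorBase API and its implementation is not shown in the presentation. As a hedged sketch of what such a method has to do for a forward-strand transcript, the code below concatenates exon substrings taken from the source sequence. It assumes interbase coordinates (<code>fmin</code> zero-based, <code>fmax</code> exclusive), and <code>SpliceSketch</code> is a hypothetical name.<br />

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of building a transcript from exon coordinates.
class SpliceSketch {

    // Concatenate exon sequences in ascending fmin order (forward strand only).
    // Each exon is {fmin, fmax} in interbase coordinates.
    static String splice(String chromosome, List<int[]> exons) {
        List<int[]> sorted = new ArrayList<int[]>(exons);
        Collections.sort(sorted, new Comparator<int[]>() {
            public int compare(int[] a, int[] b) {
                return Integer.compare(a[0], b[0]);
            }
        });
        StringBuilder transcript = new StringBuilder();
        for (int[] exon : sorted) {
            // append(CharSequence, start, end) copies chromosome[start, end)
            transcript.append(chromosome, exon[0], exon[1]);
        }
        return transcript.toString();
    }
}
```

A reverse-strand transcript would additionally need the result reverse-complemented, which this sketch leaves out.<br />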
<br />
=====Problems 3, 4, & 5 - GMOD Update & Delete=====<br />
<br />
<java><br />
 // Let's update our name...<br />
xfile_r.setUniquename("x-file");<br />
<br />
session.save(xfile_r);<br />
<br />
// Not part of the ChadoAdaptor utility object, but a good example of HQL<br />
 List<Gene> genes = (List<Gene>)session.createQuery("from Gene where uniquename like ?").setString(0,"x-%").list();<br />
<br />
for (Gene g : genes){<br />
<br />
System.out.println(g.getFeature_id() + <br />
"\t" + g.getUniquename() + <br />
"\t" + g.getOrganism().getGenus() +<br />
" " + g.getOrganism().getSpecies());<br />
}<br />
<br />
// Deleting... hmm...<br />
Gene delete_me = ca.fetchGeneByUniqueName("x-ray");<br />
session.delete(delete_me);<br />
<br />
// All Finished<br />
session.getTransaction().commit();<br />
</java><br />
<br />
<br />
<br />
=====What Hibernate Does Well=====<br />
<br />
* Hibernate can be configured to perform specialized functions<br />
** For example, it has its own notion of a cascade<br />
* Flexible with respect to language<br />
** Java, Hibernate Query Language, or SQL<br />
* Any JDBC driver<br />
<br />
=====Acknowledgements=====<br />
<br />
* VectorBase People<br />
** Frank Collins, EO Stinson, Ryan Butler<br />
* GMOD<br />
* NIAID<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
This Wiki section is an edited version of [[Media:PSU.pdf|Chinmay's presentation]].<br />
<br />
=====GeneDB=====<br />
<br />
* GeneDB is the organism data and annotation database for the Pathogen Sequencing Unit (PSU) at the Sanger Institute, UK<br />
* Contains 37 organisms, which is expected to grow to 62<br />
* Currently migrating to chado schema<br />
* Java API with two engines: Hibernate and iBatis<br />
** Two teams, Artemis and GeneDB, took different approaches<br />
<br />
=====Technical - Connections=====<br />
<br />
Connections are configured in the Spring configuration file<br />
<xml><br />
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"><br />
<property name="driverClassName" value="org.postgresql.Driver" /><br />
<property name="url" value="jdbc:postgresql://holly.sanger.ac.uk:5432/chado" /><br />
<property name="username" value="DELIBERATELY_BOGUS_NAME"/><br />
<property name="password" value="WIBBLE" /><br />
</bean><br />
</xml><br />
* Uses a connection pool<br />
* Connection to the database is specified graphically, so the iBatis configuration file has variables for the location:<br />
<xml><br />
<property name="JDBC.Driver" value="org.postgresql.Driver"/><br />
<br />
<property name="JDBC.ConnectionURL" value="jdbc:postgresql://${chado}"/><br />
<br />
<property name="JDBC.Username" value="${username}"/><br />
<br />
<property name="JDBC.Password" value="${password}"/><br />
</xml><br />
<br />
* The user provides the database location, username and password<br />
* The user then selects what to open in Artemis from a scrollable list of features that have residues (organisms are held in separate Postgres schemas)<br />
<br />
=====Technical - Code Generation=====<br />
<br />
* The shared interface and Hibernate implementation were originally generated<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use it behind the scenes)<br />
<br />
=====Technical - Transactions=====<br />
<br />
* Transactions are fully supported<br />
<br />
=====Problems 1, 2, & 3=====<br />
<br />
Creating a gene<br />
<java><br />
genes[0] = new Feature(ORG, GENE, "xfile", false, false, now, now);<br />
<br />
genes[0].setSeqLen(1029); <br />
sequenceDao.persist(genes[0]);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, genes[0], 13691, false, 14720, false, (short)1, 0, 0 ,0);<br />
<br />
sequenceDao.persist(loc);<br />
<br />
addFeatureProp(genes[0], "description", "A test gene for GMOD meeting");<br />
<br />
addSynonymsToFeature(genes[0], "mulder", "scully");<br />
<br />
createExon("exon1", genes[0], 13691, 13767, now, 0);<br />
<br />
createExon("exon2", genes[0], 14687, 14720, now, 1);<br />
</java><br />
<br />
Retrieve a gene<br />
<java><br />
Feature f = sequenceDao.getFeatureByUniqueName("xfile");<br />
displayGene(f);<br />
</java><br />
<br />
Update a gene<br />
<java><br />
genes[0].setUniqueName("x-file");<br />
<br />
sequenceDao.merge(genes[0]);<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<java><br />
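// Creates an exon feature and locates it on the source feature; the gene and rank arguments are not used in this excerpt.<br />
private Feature createExon(String name, Feature gene, int min, int max, Timestamp now, int rank) {<br />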
<br />
Feature exon = new Feature(ORG, EXON, name, false, false, now, now);<br />
exon.setSeqLen(max-min);<br />
sequenceDao.persist(exon);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, exon, min, false, max, false, <br />
(short)1, 0, 0 ,0);<br />
sequenceDao.persist(loc);<br />
<br />
return exon;<br />
<br />
}<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<xml><br />
<st:section name="Naming" id="gene_naming" collapsed="false" collapsible="false"<br />
hideIfEmpty="true"><br />
<dl><br />
<dt><b>symbol:</b></dt><br />
<dd>${feature.uniqueName}</dd><br />
</dl><br />
<db:synonym name="synonym" var="name" collection="${feature.featureSynonyms}"><br />
<br /><b>Synonym:</b> <db:list-string collection="${name}" /><br />
</db:synonym><br />
<dt><b>Type:</b></dt><br />
<dd>${feature.cvTerm.name}</dd><br />
<br />
<st:section name="Exons" collapsed="false" collapsible="true" hideIfEmpty="true"><br />
<display:table name="exons" uid="tmp" pagesize="30" class="simple" cellspacing="0"<br />
cellpadding="4"><br />
<display:column property="uniqueName" title="Exon"/><br />
<display:column property="featureLocsForSrcFeatureId.fmin" title="Start"/><br />
<display:column property="featureLocsForSrcFeatureId.fmax" title="end"/><br />
</display:table><br />
</st:section><br />
<br />
<st:section name="cds" collapsible="true"><br />
<b>${feature.residues}</b><br />
</st:section><br />
</xml><br />
<br />
Specialized functionality like a cascading delete is handled by the database</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-01-26T19:51:37Z<p>165.124.152.78: /* Modware Features */</p>
<hr />
<div>==Middleware for Chado databases==<br />
<br />
===Authors===<br />
<br />
* Jeff Bowes<br />
* Robert Bruggner<br />
* Scott Cain<br />
* Josh Goodman<br />
* Eric Just<br />
* Sohel Merchant<br />
* Brian O'Connor<br />
* Brian Osborne<br />
* Chinmay Patel<br />
* Pinglei Zhou<br />
<br />
===Middleware Evaluation January 2007===<br />
<br />
A group of some 50 GMOD developers met at the annual meeting to discuss middleware. This one-day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
===Introduction===<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
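<br />
Chado's <code>featureloc</code> table stores interbase (zero-based, half-open) coordinates in <code>fmin</code> and <code>fmax</code>. Assuming the ''exact'' coordinates given here are 1-based and inclusive, which is an interpretation rather than something the problem states, the conversion can be sketched as follows; the class and method names are illustrative only.<br />
<br />
```java
/** Illustrative helper (hypothetical name) for 1-based inclusive -> interbase conversion. */
final class InterbaseCoords {
    /** A 1-based inclusive start becomes fmin by subtracting one. */
    static int toFmin(int start) {
        return start - 1;
    }

    /** A 1-based inclusive end is numerically equal to the interbase fmax. */
    static int toFmax(int end) {
        return end;
    }

    /** In interbase coordinates the length is simply fmax - fmin. */
    static int length(int fmin, int fmax) {
        return fmax - fmin;
    }

    public static void main(String[] args) {
        // exon_1 of the xfile gene: start 13691, end 13767
        int fmin = toFmin(13691);
        int fmax = toFmax(13767);
        System.out.println(fmin + ".." + fmax + " length " + length(fmin, fmax));
        // prints "13690..13767 length 77"
    }
}
```
<br />
With this convention the sequence length never needs a "plus one" correction, which is one reason interbase coordinates are convenient for storage.<br />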
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
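<br />
The two exons of '''xfile''' span 77 and 34 bases, which add up to the 111 residues shown in the report. A minimal sketch of deriving such a sequence by concatenating exon slices, assuming forward-strand exons in 1-based inclusive coordinates and a hypothetical <code>CdsBuilder</code> helper:<br />
<br />
```java
/** Hypothetical helper that splices exon slices out of a genomic sequence. */
final class CdsBuilder {
    /**
     * Concatenate forward-strand exons, given as {start, end} pairs in
     * 1-based inclusive coordinates, in the order supplied.
     */
    static String splice(String genome, int[][] exons) {
        StringBuilder cds = new StringBuilder();
        for (int[] exon : exons) {
            // substring takes a 0-based start and an exclusive end
            cds.append(genome.substring(exon[0] - 1, exon[1]));
        }
        return cds.toString();
    }

    public static void main(String[] args) {
        // Toy sequence, not the fake chromosome used at the meeting.
        String genome = "AACCGGTTAACC";
        int[][] exons = { {2, 4}, {7, 9} };
        System.out.println(splice(genome, exons)); // prints "ACCTTA"
    }
}
```
<br />
Reverse-strand features would additionally need the exon order reversed and the result reverse-complemented; that is left out of the sketch.<br />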
<br />
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
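<br />
Problems 4 and 5 amount to a prefix search, a delete by identifier, and a repeat of the search. The sketch below shows that sequence against an in-memory map rather than a Chado database; <code>GeneIndex</code> is a hypothetical class and the tab-separated report format is an assumption.<br />
<br />
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory index of genes for one organism. */
final class GeneIndex {
    // feature_id -> symbol, kept sorted by id so the report order is stable
    private final Map<Integer, String> genes = new TreeMap<>();
    private final String organism;

    GeneIndex(String organism) {
        this.organism = organism;
    }

    void add(int featureId, String symbol) {
        genes.put(featureId, symbol);
    }

    /** Remove a gene by id; returns false when the id is unknown. */
    boolean delete(int featureId) {
        return genes.remove(featureId) != null;
    }

    /** One tab-separated report line per gene whose symbol starts with the prefix. */
    List<String> report(String prefix) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, String> e : genes.entrySet()) {
            if (e.getValue().startsWith(prefix)) {
                lines.add(e.getKey() + "\t" + e.getValue() + "\t" + organism);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        GeneIndex index = new GeneIndex("Xenopus laevis");
        index.add(1323, "x-file");
        index.add(1324, "x-men");
        index.add(1325, "x-ray");
        index.delete(1325);
        index.report("x-").forEach(System.out::println);
    }
}
```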
<br />
===Conclusions===<br />
<br />
The one-day meeting heard presentations from developers using both Perl and Java middleware, and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. BioSQL), the Chado schema is popular and, given its complexity, serves as a good test schema for this exercise. The primary focus in the talks was on functionality from the perspective of writing code and extending the software; less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman).<br />
<br />
====Problem Assignments====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. The packages clearly differ in how they solved the problems; please see the presentations themselves for these details.<br />
<br />
The Perl approaches used only the Perl language whereas the Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or disadvantage.<br />
<br />
====Java Middleware====<br />
<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
=====Abstraction=====<br />
<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
<br />
====Perl Middleware====<br />
<br />
<br />
=====Abstraction=====<br />
<br />
Modware provides a higher level of abstraction than Chado::AutoDBI.<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
The DasI interface is well documented: it has about a dozen methods and three classes, all of them documented.<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* Real-world objects represented in a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user-defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements<br />
* Object Relational Mapping (ORM), as described by Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
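<br />
As a toy illustration of the mapping step that an ORM automates, the sketch below copies column values into same-named fields by reflection, so no per-table code is written. It is a sketch only: <code>RowMapper</code> is a hypothetical name, and real ORM layers such as Class::DBI or Hibernate also generate the SQL, manage connections, and handle relationships.<br />
<br />
```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reflection-based mapper from a result row to a plain object. */
final class RowMapper {
    /** Plain object with one field per column of the cvterm table. */
    static final class Cvterm {
        int cvterm_id;
        int cv_id;
        String name;
        String definition;
        int dbxref_id;
    }

    /** Copy each column value into the field that has the same name. */
    static <T> T fromRow(Class<T> type, Map<String, Object> row) {
        try {
            T obj = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> col : row.entrySet()) {
                Field field = type.getDeclaredField(col.getKey());
                field.setAccessible(true);
                field.set(obj, col.getValue());
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("row does not match " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("cvterm_id", 2);
        row.put("cv_id", 1);
        row.put("name", "DUMMY TERM");
        Cvterm term = fromRow(Cvterm.class, row);
        System.out.println(term.cvterm_id + " " + term.name); // prints "2 DUMMY TERM"
    }
}
```
<br />
A hand-written DAO would instead contain one explicit assignment per column for every table, which is the manual coding the ORM approach removes.<br />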
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
my $term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve (by primary key)<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update ($term is assumed to be another object holding the new values)<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition() );<br />
$term_dbobj->update(); # write the changed columns back to the database<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for a controlled data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* ORM provides painless roundtrip of data between the application and database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: Flybase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate but encountered performance issues in bulk operations, even when the code did nothing more than call simple print() statements.<br />
There are many caching parameters available in Hibernate, but the problem is that Chado is recursive or cyclical.<br />
XORT does some simple, automatic caching, which makes recursive or cyclical operations easier to handle. Chado users will encounter this issue routinely in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* It may take a long path to retrieve certain types of data<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not stored in memory; data is flushed out as processing goes along<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
This Wiki section is an edited version of [[Media:XORT.pdf|Josh and Pinglei's presentation]].<br />
<br />
<br />
<br />
=====Introduction=====<br />
<br />
* An XML-database mapping system for data exchange between DB and XML-driven application<br />
* XORT can handle typical XML, it's not Chado-specific<br />
* Developed and supported by Pinglei Zhou at FlyBase Harvard; currently at version 0.007.<br />
* Used at all FlyBase sites<br />
** Harvard has extensive library of Perl modules for generating ChadoXML<br />
* Written in Perl<br />
* Required perl modules:<br />
** XML::Parser::PerlSAX<br />
** Unicode::String<br />
** XML::DOM<br />
** DBI<br />
<br />
=====Chado XML=====<br />
<br />
* Is ChadoXML necessary? No, but it may help you.<br />
* ChadoXML assists with incremental updates, if you want to avoid flush-and-reload.<br />
* While updates can be achieved by other middleware (for example, Perl Class::DBI or Java Hibernate), ChadoXML provides an additional feature: a way to archive your transactions.<br />
* It provides bulk update and download, which other methods lack or perform inefficiently<br />
<br />
=====Components=====<br />
<br />
* Database & Schema<br />
* ChadoXML Specification<br />
* DumpSpec<br />
** DumpSpec files are simple XML files that tell XORT what to do<br />
** DumpSpec files are ''language independent'', being XML<br />
** It's fairly easy for those who know the schema to read these files and understand what the operation is<br />
<br />
=====Highlights of Chado XML Specification=====<br />
<br />
* Unique representation of a specific database schema<br />
* Does away with internal primary key values<br />
* Static vs. Operational<br />
* Encoding for non-ASCII characters<br />
* Macro mechanism (object reference)<br />
<br />
=====Putting it together: New FlyBase dataflow Part 1=====<br />
<br />
There are three FlyBase sites, and most curation is done at Harvard and<br />
Cambridge. Proforma is the curation format at Cambridge and Harvard, but<br />
Harvard also curates with Apollo and ChadoXML.<br />
<br />
Once data is in the reporting instance of Chado, a denormalization step<br />
moves it to a read-only database. From the read-only database, XORT<br />
produces ChadoXML dumps for reporting purposes. XSLT version 2 then<br />
transforms the ChadoXML into HTML and GFF. HTML is for human-readable<br />
reports, GFF for GBrowse and for various power users.<br />
<br />
1.a. Proforma (FlyBase Cambridge) is converted to ChadoXML<br />
<br />
1.b. ChadoXML is created by Apollo (Harvard)<br />
<br />
1.c. ChadoXML is created by Java SEAN (Harvard)<br />
<br />
2. All ChadoXML is loaded into Chado by XORT<br />
<br />
=====Putting it together: New FlyBase dataflow Part 2=====<br />
<br />
3. Chado (Harvard) is denormalized and loaded into Chado (Indiana)<br />
<br />
4. ChadoXML is created from Chado using XORT<br />
<br />
5.a. GFF and FASTA are created from ChadoXML<br />
<br />
5.b. HTML is created from ChadoXML<br />
<br />
=====Data & Report Generation=====<br />
<br />
* Content of all output files is controlled by XML dumpspecs.<br />
** Dumpspecs are language independent.<br />
** Easily readable (with knowledge of Chado structure).<br />
* All XML transformation steps are done with XSLT v2.<br />
** Saxon XSLT (http://saxon.sourceforge.net/)<br />
** ChadoXML is split into individual chunks before XSLT processing to accommodate large file sizes.<br />
** Extremely fast. We can process all data for ~60,000 Drosophila genes in under 30 minutes.<br />
<br />
=====Hibernate & XORT=====<br />
<br />
* Hibernate didn't scale well when dealing with 5,000+ features in bulk.<br />
** The test was simply calling <code>print()</code> statements<br />
* Performance tweaks for Hibernate can be quite complicated to set up for bulk operations.<br />
* XORT is currently handling ~6 million features in production with only minor performance problems.<br />
* XORT is much more language independent.<br />
<br />
=====Support for complex transactions using XORT=====<br />
<br />
For example:<br />
<br />
* Find all records linked to a record using dumpspec<br />
* Merge gene x into y, each with thousands of records attached<br />
<br />
Step 1. Dump all data using a simple dumpspec:<br />
<xml><br />
<chado><br />
<feature dump="all"><br />
<uniquename test="eq">x</uniquename><br />
</feature><br />
</chado><br />
</xml><br />
Step 2. Delete feature x from the database, with triggers to clean up orphan records if necessary<br />
<br />
Step 3. Edit the output XML, changing uniquename x to y, then load the edited file back into the database<br />
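<br />
The edit in Step 3 can be scripted rather than done by hand. The sketch below, which uses only the standard Java XML APIs, rewrites every <code>uniquename</code> element whose text is exactly the old name. <code>RenameUniquename</code> is a hypothetical helper, not part of XORT, and a real dump may contain nested features that need more selective handling.<br />
<br />
```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper that renames a feature inside dumped XML. */
final class RenameUniquename {
    /** Replace the text of every uniquename element that equals 'from' with 'to'. */
    static String rename(String xml, String from, String to) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList names = doc.getElementsByTagName("uniquename");
            for (int i = 0; i < names.getLength(); i++) {
                if (from.equals(names.item(i).getTextContent().trim())) {
                    names.item(i).setTextContent(to);
                }
            }
            // Serialize the modified tree back to a string without an XML declaration.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not rewrite XML", e);
        }
    }

    public static void main(String[] args) {
        String dumped = "<chado><feature><uniquename>x</uniquename></feature></chado>";
        System.out.println(rename(dumped, "x", "y"));
    }
}
```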
<br />
=====CHIA (Chado Interface Application)=====<br />
<br />
A Java application that organizes SQL and XORT functionality for internal users, e.g.:<br />
<br />
* Dump chado-XML for gene regions for Apollo curation<br />
* Organize and execute “canned” SQL queries<br />
* Serve IDs for curators (in development)<br />
* Dynamically browse Chado without writing SQL statements<br />
<br />
CHIA is being designed to be extensible for adding new functionality as needed.<br />
<br />
<br />
=====Documentation=====<br />
<br />
* ''Using Chado to Store Genome Annotation Data''<br />
** Current Protocols in Bioinformatics (Baxevanis, A.D., and Davison, D.B., eds) 2, 9.6.1-9.6.28.<br />
* XORT specification docs<br />
* XORT draft (unpublished)<br />
* GMOD case demo procedure<br />
** All in the doc directory of XORT package, http://www.gmod.org<br />
<br />
=====Acknowledgements=====<br />
<br />
* William Gelbart<br />
* Chris Mungall<br />
* David Emmert <br />
* Mark Gibson<br />
* Stan Letovsky <br />
* Nomi Harris<br />
* Frank Smutniak <br />
* Suzanna Lewis<br />
* Peili Zhang <br />
* Haiyan Zhang <br />
* Aubrey de Grey<br />
* Andy Schroeder <br />
* Don Gilbert<br />
* Susan Russo<br />
* Mark Zythovicz <br />
* Scott Cain<br />
* Lincoln Stein<br />
* Victor Strelets<br />
* Robert Wilson<br />
* Paul Leyland<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add the add_constructor for looking for name lengths<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
This Wiki section is an edited version of [[Media:AutoDBI.pdf|Brian's presentation]].<br />
<br />
=====Relation to Turnkey=====<br />
<br />
Turnkey is a package that auto-generates Web sites given a relational<br />
schema, based on SQL::Translator<br />
<br />
* Turnkey authors: Allen Day, Scott Cain, Brian O'Connor<br />
* Turnkey and Chado::AutoDBI objects are essentially the same<br />
<br />
=====Technical Overview=====<br />
<br />
* Code Generation<br />
<br />
=====Project Overview=====<br />
<br />
Convert SQL Queries/Inserts/Deletes -> Object Calls<br />
<sql><br />
INSERT INTO feature (organism_id, name)<br />
VALUES (1, 'foo');<br />
</sql><br />
To:<br />
<perl><br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Database connection: use a base class<br />
* Set up the base object and connect, then create a ''table object'' to access the primary key.<br />
* Class::DBI can find and insert records in other tables, based on foreign keys.<br />
<br />
<perl><br />
package Turnkey::Model::DBI;<br />
use base qw(Class::DBI::Pg);<br />
<br />
my ($dsn, $name, $pass);<br />
$dsn = "dbi:Pg:host=localhost;dbname=chado;port=5432";<br />
$name = "postgres";<br />
$pass = "";<br />
<br />
Turnkey::Model::DBI->set_db('Main', $dsn, $name, $pass, {AutoCommit => 1});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
<br />
<perl><br />
package Turnkey::Model::Feature;<br />
use base 'Turnkey::Model::DBI';<br />
<br />
Turnkey::Model::Feature->set_up_table('feature');<br />
<br />
#<br />
# Primary key accessors<br />
#<br />
<br />
sub id { shift->feature_id }<br />
sub feature { shift->feature_id }<br />
</perl><br />
<br />
* data field accessors by Class::Accessor<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** has_a<br />
<br />
<perl><br />
#<br />
# has_a<br />
#<br />
Turnkey::Model::Feature->has_a( type_id => "Turnkey::Model::Cvterm" );<br />
sub cvterm { return shift->type_id; }<br />
</perl><br />
<br />
* Basic ORM Object: Feature<br />
** has_many<br />
<br />
<perl><br />
#<br />
# has_many<br />
#<br />
Turnkey::Model::Feature->has_many('feature_synonym_feature_id', <br />
'Turnkey::Model::Feature_Synonym' => 'feature_id');<br />
sub feature_synonyms { return shift->feature_synonym_feature_id; }<br />
<br />
Turnkey::Model::Feature->has_many('featureprop_feature_id', <br />
'Turnkey::Model::Featureprop' => 'feature_id');<br />
sub featureprops { return shift->featureprop_feature_id; }<br />
</perl><br />
<br />
* Can traverse tables, such as going from FEATURE to FEATUREPROP <br />
** Tell the base object that the ''table object'' has_a() or has_many() keys corresponding to some key in another ''table object''<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** skipping linker tables for has_many<br />
<br />
<perl><br />
# skip over feature_synonym table<br />
#<br />
# method 1<br />
#<br />
sub synonyms { my $self = shift; return map $_->synonym_id, $self->feature_synonyms; }<br />
#<br />
# method 2<br />
#<br />
Turnkey::Model::Feature->has_many( synonyms2 =><br />
['Turnkey::Model::Feature_Synonym' => 'synonym_id']);<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Transactions<br />
** Chado::AutoDBI supports transactions, and one can wrap the transaction in an eval()<br />
<perl><br />
sub do_transaction {<br />
my $class = shift;<br />
my ( $code ) = @_;<br />
# Turn off AutoCommit for this scope.<br />
# A commit will occur at the exit of this block automatically,<br />
# when the local AutoCommit goes out of scope.<br />
local $class->db_Main->{ AutoCommit };<br />
<br />
# Execute the required code inside the transaction.<br />
eval { $code->() };<br />
if ( $@ ) {<br />
my $commit_error = $@;<br />
eval { $class->dbi_rollback }; # might also die!<br />
die $commit_error;<br />
}<br />
}<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Lazy Loading<br />
** One can either do automated creation of objects or explicitly dictate which fields are incorporated into object<br />
<perl><br />
Turnkey::Model::Feature->columns( Primary => qw/feature_id/ );<br />
Turnkey::Model::Feature->columns( Essential => qw/name organism_id type_id/ );<br />
Turnkey::Model::Feature->columns( Others => qw/residues .../ );<br />
</perl><br />
<br />
Typically:<br />
<br />
<perl><br />
Turnkey::Model::Feature->set_up_table('feature');<br />
</perl><br />
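<br />
The idea behind lazy loading can be sketched independently of Class::DBI: cheap columns are loaded with the object, while an expensive one such as <code>residues</code> is fetched on first access and cached afterwards. The example uses Java and a hypothetical <code>LazyFeature</code> class, with a <code>Supplier</code> standing in for the database query.<br />
<br />
```java
import java.util.function.Supplier;

/** Hypothetical feature object whose residues are fetched only on demand. */
final class LazyFeature {
    private final String name;               // loaded when the object is created
    private final Supplier<String> loader;   // stand-in for the database query
    private String residues;                 // cached after the first fetch
    private boolean loaded;
    private int fetches;

    LazyFeature(String name, Supplier<String> loader) {
        this.name = name;
        this.loader = loader;
    }

    String name() {
        return name;
    }

    /** Fetch the residues on first access, then serve them from the cache. */
    String residues() {
        if (!loaded) {
            residues = loader.get();
            loaded = true;
            fetches++;
        }
        return residues;
    }

    /** Number of times the loader was actually invoked. */
    int fetches() {
        return fetches;
    }

    public static void main(String[] args) {
        LazyFeature feature = new LazyFeature("xfile", () -> "ATGGCGTTAG");
        feature.residues();
        feature.residues();
        System.out.println(feature.fetches()); // prints "1"
    }
}
```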
<br />
=====Problem 1=====<br />
<br />
* Create Feature & Add Description<br />
<br />
<perl><br />
# now create mRNA feature<br />
<br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
<br />
# create description<br />
<br />
my $featureprop = Turnkey::Model::Featureprop->find_or_create({<br />
value => 'A test gene for GMOD meeting',<br />
feature_id => $feature,<br />
type_id => $note_cvterm,<br />
});<br />
</perl><br />
<br />
=====Problem 2=====<br />
<br />
* Retrieve a Feature via Searching<br />
** Search using strings or identifiers, a search will return an iterator object<br />
<br />
<perl><br />
# objects for global use<br />
<br />
# the organism for our new feature<br />
my $organism = Turnkey::Model::Organism->search(abbreviation => "S.cerevisiae")->next;<br />
<br />
# the cvterm for a "Note"<br />
my $note_cvterm = Turnkey::Model::Cvterm->retrieve(2);<br />
<br />
# searching name by wildcard<br />
<br />
my @results = Turnkey::Model::Feature->search_like(name => 'x-%');<br />
</perl><br />
<br />
=====Problems 3, 4, & 5=====<br />
<br />
* Update a Feature<br />
<br />
<perl><br />
# update the xfile gene name<br />
<br />
$feature->name("x-file");<br />
$feature->update();<br />
</perl><br />
<br />
* Delete a Feature<br />
<br />
<perl><br />
# now delete the x-file feature<br />
<br />
$feature->delete();<br />
</perl><br />
<br />
=====Things Chado::AutoDBI does well=====<br />
<br />
* Easy to use<br />
* Easy to port<br />
* Use with other DBs<br />
** Both Oracle and Postgres used currently<br />
* Autogenerated via Turnkey<br />
* find_or_create method<br />
* Performance is not as bad as you might guess<br />
** Due to Lazy loading<br />
** Even whole genome operations are feasible<br />
<br />
Note that speed is relative: badly written SQL can also perform poorly, and in such cases the Chado::AutoDBI approach will be speedier.<br />
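<br />
The <code>find_or_create</code> idea listed above is: look for a row matching the given values and insert one only if none exists. A minimal Java sketch of that behaviour, assuming an in-memory map in place of the feature table (the class and method names are illustrative and are not part of Class::DBI or Chado::AutoDBI):<br />

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative in-memory "table" keyed by uniquename; not Class::DBI itself. */
final class FindOrCreateSketch {
    private final Map<String, Integer> idByUniquename = new LinkedHashMap<String, Integer>();
    private int nextId = 1;

    /** Returns the existing id for this uniquename, or inserts a row and returns the new id. */
    int findOrCreate(String uniquename) {
        Integer existing = idByUniquename.get(uniquename);
        if (existing != null) {
            return existing;          // found: no second row is created
        }
        int id = nextId++;            // not found: create the row
        idByUniquename.put(uniquename, id);
        return id;
    }

    /** Number of rows currently stored. */
    int rowCount() {
        return idByUniquename.size();
    }
}
```

Calling <code>findOrCreate("xfile")</code> twice returns the same id and leaves a single row, which is why a loading script written this way can be re-run safely.<br />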
<br />
<br />
=====For More Information=====<br />
<br />
* Class::DBI<br />
** http://www.class-dbi.com<br />
** http://search.cpan.org<br />
<br />
* Turnkey<br />
** http://turnkey.sf.net<br />
<br />
* Biopackages<br />
** http://biopackages.net<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Users: DictyBase<br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Some DictyBase-specific code used there<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods<br />
** Abstract factory for various feature types<br />
* Lazy: information is only retrieved when you ask for it<br />
* Uses Bioperl and its objects<br />
** Common methods such as name(), primary_id(), external_ids() <br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene which isn't the same as ...<br />
* Any feature type not explicitly supported in Modware::Feature class is blessed <br />
as a Modware::Feature::GENERIC class which has a start/stop coordinate on a genomic <br />
sequence feature (no structure like a transcript with exons)<br />
<br />
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
my $seq_io = new Bio::SeqIO(<br />
-file => "../data/fake_chromosome.txt",<br />
-format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
-type => 'chromosome',<br />
-bioperl => $seq,<br />
-description => "This is a test",<br />
-name => 'Fake',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
-type => 'gene',<br />
-name => 'xfile',<br />
-description => 'A test gene for GMOD meeting',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 13691,<br />
-end => 13767,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 14687,<br />
-end => 14720,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
-type => 'mRNA',<br />
-bioperl => $bioperl_mrna,<br />
-source => 'GMOD 2007 Demo',<br />
-reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report. This package<br />
uses Modware and BioPerl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Gene is-a Feature.<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass? More flexibility the way shown here<br />
<br />
<perl><br />
package GMODWriter; <br />
<br />
sub Write_gene_report {<br />
    my ($self, $gene) = @_;<br />
    my $symbol = $gene->name();<br />
<br />
    my @synonyms = @{ $gene->synonyms() };<br />
    # join with ", " to match the requested report format<br />
    my $syn_string = join ", ", @synonyms;<br />
    my $description = $gene->description();<br />
    my $type = $gene->type();<br />
    # get features associated with the gene that are of type 'mRNA'<br />
    my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
    # use bioperl method to get exons from mRNA<br />
    my @exons = $mrna->bioperl->exons_ordered();<br />
    # Modware will return a nice fasta file for you.<br />
    my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
    # Now print the actual report<br />
    print "symbol: $symbol\n";<br />
    print "synonyms: $syn_string\n";<br />
    print "description: $description\n";<br />
    print "type: $type\n";<br />
<br />
    my $count = 0;<br />
    foreach my $exon (@exons) {<br />
        $count++;<br />
        print "exon${count} start: ".$exon->start()."\n";<br />
        print "exon${count} end: ".$exon->end()."\n";<br />
    }<br />
    print $fasta;<br />
}<br />
. . .<br />
</perl><br />
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the name symbol to x-file and retrieve the changed record. Regenerate the gene report.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval{<br />
<br />
# get xfile gene<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
# change the name<br />
$xfile_gene->name( 'x-file' );<br />
# write changes to database<br />
$xfile_gene->update();<br />
<br />
# we can use the original object if we want, but instead<br />
# we refetch from the database to 'prove' the name has been changed<br />
my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
# use our GMODWriter package to write report for x-file<br />
GMODWriter->Write_gene_report( $xfile_gene2 );<br />
<br />
};<br />
if ($@){<br />
warn $@;<br />
new Modware::DBH->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
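<br />
The Class::DBI example on this page searches with the SQL wildcard <code>'x-%'</code>, while the Modware call above takes the shell-style pattern <code>'x-*'</code>. How Modware maps one to the other is not shown here; the following Java sketch is only an assumption of what such a translation could look like, with backslash used as the <code>LIKE</code> escape character (the class and method names are hypothetical):<br />

```java
/** Hypothetical helper: turns a shell-style pattern into a SQL LIKE pattern. */
final class WildcardSketch {
    static String globToLike(String glob) {
        StringBuilder like = new StringBuilder();
        for (char c : glob.toCharArray()) {
            switch (c) {
                case '*':
                    like.append('%');               // '*' matches any run of characters
                    break;
                case '%':
                case '_':
                case '\\':
                    like.append('\\').append(c);    // escape characters LIKE treats specially
                    break;
                default:
                    like.append(c);
            }
        }
        return like.toString();
    }
}
```

Escaping <code>_</code> matters for gene symbols: without it, a name containing an underscore would match any single character at that position.<br />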
<br />
<br />
<br />
<perl><br />
sub Write_search_results {<br />
my ($self, $itr) = @_;<br />
# loop through iterator<br />
while (my $gene = $itr->next) {<br />
# print the requested information<br />
print $gene->feature_id . "\t" . $gene->name .<br />
"\t" . $gene->organism_name . "\n";<br />
}<br />
}<br />
</perl><br />
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
# set is_deleted = 1; this also sets is_available to 0,<br />
# so the gene is hidden from searches<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* Sample scripts demoed here are available<br />
** sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI will adjust to the change automatically, depending on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Specify the type of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O’Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects via Perl DBI<br />
* Transaction support: N/A (read only adapter)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, has never been tested with a Das server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
This Wiki section is an edited version of [[Media:DasI_middleware.pdf|Scott's presentation]].<br />
<br />
=====Create the database=====<br />
<br />
$ perl Makefile.PL<br />
$ make<br />
$ sudo make install<br />
$ make load_schema<br />
$ make prepdb # now with Xenopus!<br />
$ make ontologies # load rel, SO, featureprop<br />
<br />
=====Problem 1 - Loading Data=====<br />
<br />
Create some GFF from the specifications:<br />
<br />
fake_chromosome example chromosome 1 15017 . . . ID=fake_chromosome;Name=fake_chromosome<br />
fake_chromosome example gene 13691 14720 . + . ID=xfile;Name=xfile;Alias=mulder,scully;Note=A test gene for GMOD meeting<br />
fake_chromosome example mRNA 13691 14720 . + . ID=xfile_mRNA;Parent=xfile<br />
fake_chromosome example exon 13691 13767 . + . Parent=xfile_mRNA<br />
fake_chromosome example exon 14687 14720 . + . Parent=xfile_mRNA<br />
fake_chromosome example gene 12648 13136 . + . ID=x-men<br />
<br />
Gene inserted as GFF using a standard Bioperl bulk loader:<br />
<br />
<code>$ gmod_bulk_load_gff3.pl -g sample.gff</code><br />
<br />
''...lots of output...''<br />
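<br />
Column 9 of each GFF3 line above carries semicolon-separated <code>tag=value</code> attributes, and a tag such as <code>Alias</code> may hold several comma-separated values. A minimal Java sketch of splitting that column, assuming no URL-escaped characters need decoding (the class and method names are illustrative and not part of any GMOD loader):<br />

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for the GFF3 attributes column (column 9). */
final class Gff3AttributesSketch {
    /** Splits "ID=xfile;Name=xfile" style text into an ordered tag -> value map. */
    static Map<String, String> parse(String column9) {
        Map<String, String> attrs = new LinkedHashMap<String, String>();
        for (String pair : column9.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed entries
            }
            attrs.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return attrs;
    }

    /** Returns the comma-separated Alias values, or an empty array if there are none. */
    static String[] aliases(Map<String, String> attrs) {
        String alias = attrs.get("Alias");
        return alias == null ? new String[0] : alias.split(",");
    }
}
```

Applied to the xfile gene line, this yields the two synonyms <code>mulder</code> and <code>scully</code> and the description held in <code>Note</code>.<br />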
<br />
=====Adaptor Components=====<br />
<br />
* Bio::DB::Das::Chado<br />
** Database connection object<br />
* Bio::DB::Das::Chado::Segment<br />
** Object for any range of DNA<br />
* Bio::DB::Das::Chado::Segment::Feature<br />
<br />
=====Use Bio::DB::Das::Chado=====<br />
<br />
<perl><br />
use Bio::DB::Das::Chado;<br />
<br />
my $chado = Bio::DB::Das::Chado->new(<br />
-dsn => "dbi:Pg:dbname=test",<br />
-user=> "scott",<br />
-pass=> "" ) || die "no new chado";<br />
<br />
my $gene_name = 'xfile';<br />
<br />
my ($gene_fo) = $chado->get_features_by_name($gene_name);<br />
</perl><br />
<br />
=====Problem 2 - Use Some Accessors=====<br />
<br />
<perl><br />
print "symbol: " . $gene_fo->display_name."\n";<br />
print "synonyms: " . join(', ',$gene_fo->synonyms)."\n";<br />
print "description: " . $gene_fo->notes."\n";<br />
print "type: " . $gene_fo->type."\n";<br />
<br />
my ($mRNA) = $gene_fo->sub_SeqFeature();<br />
my @exons = $mRNA->sub_SeqFeature();<br />
<br />
my $exon_count = 0;   # running exon number for the report<br />
my $cds_seq    = '';  # concatenated exon sequence<br />
<br />
for my $exon (@exons) {<br />
    next unless ($exon->type->method eq 'exon');<br />
    $exon_count++;<br />
    print "exon$exon_count start: " . $exon->start."\n";<br />
    print "exon$exon_count end: " . $exon->end. "\n";<br />
    # the first seq call returns a Bio::Seq object, the second gets the DNA string from Bio::Seq<br />
    $cds_seq .= $exon->seq->seq;<br />
}<br />
</perl><br />
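<br />
The loop above builds the CDS by concatenating exon sequences in order. The same idea can be sketched in Java against a plain chromosome string, assuming forward-strand exons with 1-based inclusive coordinates as in the GFF (the class and method names are illustrative only). For xfile the two exons are 77 and 34 bases long, which matches the 111-base CDS in the requested report.<br />

```java
/** Illustrative splice of forward-strand exons out of a reference sequence. */
final class CdsSketch {
    /**
     * @param chromosome reference sequence
     * @param exons      {start, end} pairs, 1-based and inclusive, in transcript order
     */
    static String splice(String chromosome, int[][] exons) {
        StringBuilder cds = new StringBuilder();
        for (int[] exon : exons) {
            // convert 1-based inclusive [start, end] to Java's 0-based half-open range
            cds.append(chromosome, exon[0] - 1, exon[1]);
        }
        return cds.toString();
    }
}
```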
<br />
=====Bulk Output=====<br />
<br />
<perl><br />
my $gene_name = 'x-*';<br />
<br />
my @genes = $chado->get_features_by_name(<br />
-name => $gene_name,<br />
-class=> 'gene' );<br />
<br />
for my $gene (@genes) {<br />
print join("\t",<br />
$gene->feature_id,<br />
$gene->display_name,<br />
$gene->organism),"\n";<br />
}<br />
</perl><br />
<br />
Or see your report in GBrowse<br />
<br />
=====Advantages=====<br />
<br />
* Comes 'for free' with GBrowse<br />
** GBrowse will run with any DasI-compatible interface<br />
* Uses 'familiar' BioPerl idioms, very similar to widely used Bio::DB::GFF (though with fewer methods)<br />
<br />
<br />
=====Conclusion=====<br />
<br />
* Not suitable as a 'general' middleware layer<br />
** May be suitable for some applications, particularly if they are similar to GBrowse or other uses of Bio::DB::GFF<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
Jeff Bowes, Xenbase, University of Calgary. This Wiki section is an edited version of [[Media:iBatis.pdf|Jeff's presentation]].<br />
<br />
=====iBatis=====<br />
<br />
* iBatis<br />
** Light-weight framework<br />
** Still based on SQL but eliminates the repetitive drudgery of JDBC<br />
** You can tune a query by rewriting the SQL in XML; the API does not change.<br />
* iBatis does not create your database in memory as objects<br />
* Shallow learning curve<br />
* Manually create a Java class and SQL map to describe higher-level objects<br />
** Example: ''Gene''<br />
* Support for inheritance<br />
** Inheritance in result maps, allows fair amount of re-use. <br />
* Supports different transaction schemes<br />
** For example, JDBC, Java Transaction API<br />
<br />
=====Abator=====<br />
<br />
* Generates iBatis CRUD objects by introspecting database tables<br />
* Abator creates ''SQL in XML'' files (SQL Map files) and Java classes <br />
** Within these files is a Result Map section.<br />
* Abator config files are simple: they set connection parameters and specify where the files are located.<br />
* In the SQL Map files you can specify how to find parent ids, such as feature_id. <br />
<br />
=====Abator Example=====<br />
<xml><br />
<abatorConfiguration><br />
<abatorContext> <!-- TODO: Add Database Connection Information --><br />
<jdbcConnection driverClass="COM.ibm.db2.jdbc.app.DB2Driver"<br />
connectionURL="jdbc:db2:XBDV05"<br />
userId="db2inst1"<br />
password="*******"><br />
<classPathEntry location="/Program Files/IBM/SQLLIB/java/db2java.zip" /><br />
</jdbcConnection><br />
<br />
<javaModelGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.model"<br />
targetProject="gene" /><br />
<sqlMapGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.sql"<br />
targetProject="gene" /><br />
<daoGenerator type="IBATIS"<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.dao"<br />
targetProject="gene" /><br />
</abatorContext><br />
</abatorConfiguration><br />
</xml><br />
<br />
=====Abator Example=====<br />
<nowiki><br />
<table schema="db2inst1" tableName="synonym"></nowiki><br />
<generatedKey column="synonym_id" sqlStatement="VALUES PREVVAL FOR<br />
synonym_seq" identity="true" /><br />
<columnOverride column="CREATED_BY" jdbcType="INTEGER" /><br />
<columnOverride column="MODIFIED_BY" jdbcType="INTEGER" /><br />
<nowiki></table></nowiki><br />
<br />
=====Abator=====<br />
<br />
Works as:<br />
<br />
* Eclipse plug-in<br />
* ANT<br />
* Standalone<br />
<br />
=====DAO Methods=====<br />
<br />
* Insert (Feature)<br />
* Update (Feature)<br />
* DeletebyKey (FeatureKey)<br />
* SelectbyKey (FeatureKey)<br />
* SelectbyExample (FeatureExample)<br />
* DeletebyExample (FeatureExample)<br />
<br />
=====Insert=====<br />
<xml><br />
<insert id="abatorgenerated_insert" parameterClass=<br />
"org.gmod.architecture.framwork.bakeoff.abator.model.FeatureWithBLOBs"><br />
insert into db2inst1.feature<br />
(DBXREF_ID, ORGANISM_ID, NAME, UNIQUENAME,<br />
RESIDUES, SEQLEN, MD5CHECKSUM, TYPE_ID, IS_ANALYSIS,<br />
IS_OBSOLETE, CREATED_BY)<br />
values (#dbxrefId:INTEGER#, #organismId:INTEGER#, #name:VARCHAR#,<br />
#uniquename:VARCHAR#, #residues:CLOB#, #seqlen:INTEGER#,<br />
#md5checksum:CHAR#, #typeId:INTEGER#,<br />
#isAnalysis:SMALLINT#, #isObsolete:SMALLINT#,<br />
#createdBy:INTEGER#)<br />
<br />
<selectKey resultClass="java.lang.Integer" keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</insert><br />
</xml><br />
<br />
=====Insert=====<br />
<xml><br />
<selectKey resultClass="java.lang.Integer"<br />
keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</xml><br />
<br />
=====Problem 1 - Insert=====<br />
<br />
<java><br />
try {<br />
sqlMap.startTransaction();<br />
pGene.id =featureDAO.insert(pGene.getFeatureWithBLOBs());<br />
featurepropDAO.insert(pGene.getPropertyDescription());<br />
pGene.featurelocId = featurelocDAO.insert(pGene<br />
.getFeaturelocWithBLOBS());<br />
pGene = insertExons(pGene);<br />
insertSynonyms(pGene);<br />
sqlMap.commitTransaction();<br />
} catch (Exception e) {<br />
System.out.println(e);<br />
throw (e);<br />
} finally {<br />
sqlMap.endTransaction();<br />
}<br />
</java><br />
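<br />
The start/commit/end pattern above relies on <code>endTransaction()</code> discarding any work that was not committed, which is how the iBatis documentation describes it. A minimal Java sketch of that control flow, using a hypothetical in-memory <code>FakeSqlMap</code> stand-in rather than the real iBatis client:<br />

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a SQL map client; it only mimics the transaction calls. */
final class FakeSqlMap {
    private final List<String> pending = new ArrayList<String>();
    private final List<String> committed = new ArrayList<String>();
    private boolean inTransaction = false;

    void startTransaction() {
        pending.clear();
        inTransaction = true;
    }

    void insert(String row) {
        if (!inTransaction) {
            throw new IllegalStateException("no transaction started");
        }
        pending.add(row);
    }

    void commitTransaction() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Discards anything not yet committed, so it is safe to call in finally. */
    void endTransaction() {
        pending.clear();
        inTransaction = false;
    }

    List<String> committedRows() {
        return new ArrayList<String>(committed);
    }
}

final class TransactionSketch {
    /** Inserts all rows as one unit; a failure part-way through leaves nothing behind. */
    static boolean insertAll(FakeSqlMap sqlMap, List<String> rows) {
        try {
            sqlMap.startTransaction();
            for (String row : rows) {
                if (row == null) {
                    throw new IllegalArgumentException("null row");
                }
                sqlMap.insert(row);
            }
            sqlMap.commitTransaction();
            return true;
        } catch (RuntimeException e) {
            return false; // commit was skipped, so endTransaction() rolls back
        } finally {
            sqlMap.endTransaction();
        }
    }
}
```

This is why the gene, its description, its location, its exons and its synonyms are inserted inside one <code>try</code> block: either all of them are stored or none are.<br />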
<br />
=====Transactions=====<br />
<br />
* SQLMap<br />
* JDBC<br />
* JTA - Java Transaction API<br />
** 2-Phase commit<br />
* Hibernate<br />
* External (Customized)<br />
<br />
=====Retrieval=====<br />
<br />
symbol: xfile<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: start: 13691 end: 13767 <br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: start: 14687 end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2 - Master Detail Reports=====<br />
<br />
Account for cycles or recursion in Master Detail Report. <br />
<br />
<xml><br />
<resultMap id="SelectGeneResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Gene" groupBy="id"><br />
<result column="FEATURE_ID" property="id" jdbcType="INTEGER"/><br />
<result column="GENE_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="DESCRIPTION" property="description"<br />
jdbcType="VARCHAR" /><br />
<result column="TYPE_ID" property="typeId" jdbcType="INTEGER" /><br />
<result property="exons" resultMap = "gene.SelectExonResults"/><br />
</resultMap><br />
<br />
<resultMap id="SelectExonResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Exon"><br />
<result column="EXON_ID" property="id" jdbcType="INTEGER"/><br />
<result column="EXON_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="EXON_RESIDUES" property="residues" jdbcType="CLOB" /><br />
<result column="STRAND" property="strand" jdbcType="INTEGER" /><br />
<result column="FMIN" property="fmin" jdbcType="INTEGER" /><br />
<result column="FMAX" property="fmax" jdbcType="INTEGER" /><br />
<result column="SRCFEATURE_ID" property="sourceFeatureId"<br />
jdbcType="INTEGER" /><br />
</resultMap><br />
</xml><br />
<br />
=====Master Detail Report=====<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
6129482 x-files gene 14687 14720<br />
<br />
Becomes:<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
14687 14720<br />
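<br />
The collapse shown above, where repeated gene columns are merged and each row contributes one exon, is what a <code>groupBy</code> result map does. A minimal Java sketch of the same grouping over flat rows, with all class names invented for illustration (this is not the code iBatis runs internally):<br />

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One flat row of the joined gene/exon query. */
final class GeneRow {
    final int geneId;
    final String symbol;
    final int fmin;
    final int fmax;

    GeneRow(int geneId, String symbol, int fmin, int fmax) {
        this.geneId = geneId;
        this.symbol = symbol;
        this.fmin = fmin;
        this.fmax = fmax;
    }
}

/** Master object holding its detail rows. */
final class GeneWithExons {
    final int geneId;
    final String symbol;
    final List<int[]> exons = new ArrayList<int[]>(); // each entry is {fmin, fmax}

    GeneWithExons(int geneId, String symbol) {
        this.geneId = geneId;
        this.symbol = symbol;
    }
}

final class MasterDetailSketch {
    /** Groups rows by gene id, keeping first-seen order of genes and exons. */
    static List<GeneWithExons> group(List<GeneRow> rows) {
        Map<Integer, GeneWithExons> byId = new LinkedHashMap<Integer, GeneWithExons>();
        for (GeneRow row : rows) {
            GeneWithExons gene = byId.get(row.geneId);
            if (gene == null) {
                gene = new GeneWithExons(row.geneId, row.symbol);
                byId.put(row.geneId, gene);
            }
            gene.exons.add(new int[] { row.fmin, row.fmax });
        }
        return new ArrayList<GeneWithExons>(byId.values());
    }
}
```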
<br />
=====Dynamic Queries=====<br />
<br />
* Gene Name (Description)<br />
** Feature, Featureprop<br />
* Symbol<br />
** Feature<br />
* Feature Synonyms<br />
** Feature, Feature_Synonym, Synonym<br />
* Ortholog Synonyms<br />
** Feature, Feature_relationship, Feature, Feature Synonyms<br />
<br />
=====Dynamic Queries=====<br />
<br />
FROM<br />
CAT_X_GENE_V gc<br />
<isEqual prepend=","<br />
property="searchSymbol"<br />
compareValue="true"><br />
GENE_SYMBOLS s<br />
</isEqual><br />
<br />
<isEqual prepend=","<br />
property="searchNcbi" <br />
compareValue="true"><br />
NCBI_GI n<br />
</isEqual><br />
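<br />
Each <code>isEqual</code> element above adds a table to the <code>FROM</code> clause only when the matching search flag is true. Written out by hand in Java, the equivalent logic would be roughly the following sketch (the class and method names are illustrative; the table names are taken from the fragment above):<br />

```java
/** Illustrative hand-written equivalent of the conditional FROM clause. */
final class DynamicFromSketch {
    static String fromClause(boolean searchSymbol, boolean searchNcbi) {
        StringBuilder from = new StringBuilder("FROM CAT_X_GENE_V gc");
        if (searchSymbol) {
            from.append(", GENE_SYMBOLS s"); // only joined when symbols are searched
        }
        if (searchNcbi) {
            from.append(", NCBI_GI n");      // only joined when NCBI ids are searched
        }
        return from.toString();
    }
}
```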
<br />
=====Dynamic Queries=====<br />
<br />
<dynamic prepend="WHERE"><br />
<isEqual prepend="AND" property="searchNameOnly"<br />
compareValue="true"><br />
<iterate property="searchTokens" conjunction="AND" <br />
open=" (" close=") "><br />
LOWER(VARCHAR(gc.longname)) LIKE <br />
LOWER(CAST(#searchTokens[]:VARCHAR# AS VARCHAR(512)))<br />
</iterate><br />
</isEqual><br />
<br />
<code>iterate</code> is very useful for multiple search terms <br />
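<br />
The <code>iterate</code> element emits one condition per search token and joins them with the given conjunction. A minimal Java sketch of the SQL this produces, simplified to a plain <code>LOWER(...) LIKE LOWER(?)</code> test and using bind parameters (the class and method names are hypothetical, and the exact text iBatis generates may differ):<br />

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative builder for a WHERE clause with one LIKE test per search token. */
final class DynamicWhereSketch {
    /** SQL text plus the values to bind to its ? placeholders, in order. */
    static final class Query {
        final String sql;
        final List<String> params;

        Query(String sql, List<String> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    static Query nameFilter(boolean searchNameOnly, List<String> searchTokens) {
        StringBuilder sql = new StringBuilder();
        List<String> params = new ArrayList<String>();
        if (searchNameOnly && !searchTokens.isEmpty()) {
            sql.append("WHERE (");
            for (int i = 0; i < searchTokens.size(); i++) {
                if (i > 0) {
                    sql.append(" AND "); // the iterate conjunction
                }
                sql.append("LOWER(gc.longname) LIKE LOWER(?)");
                params.add(searchTokens.get(i));
            }
            sql.append(")");
        }
        return new Query(sql.toString(), params);
    }
}
```

With two tokens the clause contains two <code>LIKE</code> tests joined by <code>AND</code>; with the flag off, no <code>WHERE</code> clause is produced at all.<br />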
<br />
=====Miscellaneous Features=====<br />
<br />
* Supports various data sources<br />
** Simple JDBC<br />
** DBCP – Apache Connection Pooling<br />
** JNDI – Java Naming and Directory Interface<br />
* Very flexible<br />
* Local caching of results<br />
** Lazy loading<br />
<br />
=====Support=====<br />
<br />
* In GMOD used by<br />
** Xenbase, Artemis at Sanger<br />
* Many other users<br />
** e.g. MySpace.com<br />
* Top level Apache Project<br />
** www.ibatis.apache.org<br />
* Active community<br />
<br />
<br />
=====What iBatis Does Well=====<br />
<br />
* Does not hide SQL<br />
** No new query language to learn<br />
* Separates and groups SQL<br />
* Simple!!<br />
** Light wrapper - No real tweaks<br />
* Does the job well<br />
* Excellent support for Master-Detail<br />
* Dynamically generated queries <br />
** You can structure conditions around clauses in SQL<br />
** One XML statement can represent many variations on a query<br />
<br />
=====Acknowledgements=====<br />
<br />
GMOD<br />
* Eric Just<br />
* Everyone else<br />
<br />
iBatis Developers<br />
* Kevin Snyder<br />
* Chris Jarabek<br />
* Ross Gibb<br />
<br />
PI<br />
* Peter Vize<br />
<br />
Financial Support<br />
* Alberta Heritage Foundation for Medical Research<br />
* Alberta Network for Proteomics Innovation<br />
* University of Calgary, Faculty of Science<br />
* University of Calgary Dept. of Computer Science<br />
* NICHD<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issue around Completeness <br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
Chado API via Java & Hibernate, Robert Bruggner, VectorBase.org. This Wiki section is an edited version of [[Media:HibernateChadoAPI.pdf|Robert's presentation]].<br />
<br />
=====Overview=====<br />
<br />
* Background<br />
* Quick Hibernate Overview<br />
* Hibernate Connectivity and O/R Mapping Example<br />
* GMOD Demo<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
=====Background=====<br />
<br />
* VectorBase<br />
** A bioinformatic resource center for invertebrate vectors of human pathogens<br />
* Responsible for storage and display of multiple organisms’ genomes<br />
** Anopheles gambiae, Aedes aegypti, Ixodes scapularis, Culex pipiens and so on....<br />
* Want to store data for many organisms, so Chado is a natural choice<br />
* Ensembl Genome Browser already used for ''A. gambiae''<br />
** Wrote Ensembl API Database adaptor for Chado... Not maintainable.<br />
* Use Both Databases<br />
** Transfer genomic data from Ensembl to Chado<br />
** Search Engine and Indexer using Lucene<br />
** Run DAS<br />
** Export data via ChadoXML and GFF3<br />
* Need API for Database I/O<br />
<br />
=====Hibernate Background=====<br />
<br />
* They say: “A powerful, high performance object/relational persistence and query service.”<br />
* Automates the persistence of plain old Java objects (POJO)<br />
** User maps their POJO properties to database tables via XML (HBM File).<br />
** There are Hibernate tools that generate HBMs<br />
*** Configurable in the sense that one can create get & set methods that map one-to-one to fields.<br />
* Persist a specific object by storing it in the database.<br />
* Intelligent Database I/O <br />
** Smart detection of ''Dirty Properties'' when performing Save / Update / Delete.<br />
** Cascadable Save / Update / Delete for complex objects.<br />
* Everything's done within the scope of a transaction.<br />
<br />
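The idea behind detecting dirty properties can be sketched as a comparison between a snapshot taken at load time and the current property values. This is a conceptual illustration only and not Hibernate's actual implementation; the class and method names are hypothetical.<br />

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: find which properties changed since an object was loaded.
final class DirtyCheckSketch {

    // Return the names of properties whose current value differs from the snapshot.
    static Set<String> dirtyProperties(Map<String, Object> snapshot, Map<String, Object> current) {
        Set<String> dirty = new TreeSet<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            if (!Objects.equals(snapshot.get(entry.getKey()), entry.getValue())) {
                dirty.add(entry.getKey());
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("name", "Old CVTerm name");
        snapshot.put("definition", "A definition");

        Map<String, Object> current = new LinkedHashMap<>(snapshot);
        current.put("name", "New CVTerm name");

        // Only the changed property would need to appear in an UPDATE statement.
        System.out.println(dirtyProperties(snapshot, current));
    }
}
```

<br />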
=====Hibernate Database Connectivity=====<br />
<br />
* Configure Hibernate in hibernate.cfg.xml<br />
* Define a Data Source<br />
** We use a simple, single JDBC connection to Chado<br />
** Can be configured to use a connection pool or data source accessible by the Java Naming and Directory Interface (JNDI).<br />
** Define a connection “dialect”<br />
*** org.hibernate.dialect.PostgreSQLDialect<br />
* Describe the relationship between Java objects and database tables<br />
** Use XML to describe where to store POJO property data in the database<br />
* Create a new Hibernate Session based on the configuration<br />
* Begin a transaction to start performing work<br />
<br />
=====POJO and HBM Example file - CV=====<br />
<br />
<java><br />
public class CV {<br />
<br />
private int cv_id;<br />
private String name;<br />
private String definition;<br />
<br />
// Getters and setters for each property go here<br />
....<br />
<br />
@Override<br />
public boolean equals(Object other) {<br />
....<br />
}<br />
<br />
@Override<br />
public int hashCode() {<br />
...<br />
}<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CV" table="cv"><br />
<br />
<id name="cv_id" column="cv_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cv_cv_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<property name="name" column="name" type="java.lang.String" not-null="true"/><br />
<br />
<property name="definition" column="definition" type="java.lang.String"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====HBM Example CVTerm=====<br />
<br />
<java><br />
public class CVTerm {<br />
<br />
private int cvterm_id;<br />
<br />
private CV cv;<br />
<br />
private String name;<br />
<br />
private String definition;<br />
<br />
private DBXref dbxref;<br />
<br />
private int is_obsolete;<br />
<br />
private int is_relationshiptype;<br />
<br />
.....<br />
<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CVTerm" table="cvterm"><br />
<br />
<id name="cvterm_id" column="cvterm_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cvterm_cvterm_id_seq</param><br />
<br />
</generator><br />
<br />
</id><br />
<br />
<many-to-one name="cv" class="org.vectorbase.chadoAPI.chadoObjects.CV" column="cv_id" <br />
not-null="true" cascade="save-update"/><br />
<br />
<property name="name" not-null="true" type="java.lang.String"/><br />
<br />
<property name="definition"/><br />
<br />
<one-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" cascade="all"/><br />
<br />
<property name="is_obsolete"/><br />
<br />
<property name="is_relationshiptype"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====Hibernate Object Retrieve=====<br />
<br />
One can use Java, Hibernate Query Language (HQL), or SQL; this example uses HQL.<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class, 1);<br />
<br />
// Alternatively, load a CVTerm using HQL<br />
CVTerm cvtByName = (CVTerm) session.createQuery("from CVTerm where name=?").setString(0, "name").uniqueResult();<br />
<br />
// Print out the name of the cvterm<br />
System.out.println(cvt.getName());<br />
<br />
// Get the cv that the cvterm is associated with<br />
// Hibernate doesn’t return the cv_id - it returns a CV Object.<br />
CV cv = cvt.getCv();<br />
<br />
// Print out the cv’s name<br />
System.out.println(cv.getName());<br />
</java><br />
<br />
=====Hibernate Object Update=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Change cvt’s name<br />
cvt.setName("New CVTerm name");<br />
<br />
// Save!<br />
// Generated SQL updates “Dirty” properties (name, in this case)<br />
session.save(cvt);<br />
<br />
// Commit data to database<br />
session.getTransaction().commit();<br />
</java><br />
<br />
=====Hibernate Save=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
// Build a session factory first and begin a transaction (not shown)<br />
<br />
// Make a new CV<br />
CV new_cv = new CV();<br />
new_cv.setName("New CV");<br />
new_cv.setDefinition("New CV Def");<br />
<br />
// Make a new cvterm for that cv<br />
CVTerm new_cvterm = new CVTerm();<br />
new_cvterm.setName("New CVTerm Name");<br />
// ..... save dbxref etc......<br />
<br />
// Add that CVTerm to our new CV<br />
new_cv.addCVTerm(new_cvterm);<br />
<br />
// Save the new data...<br />
// Hibernate recognizes that it has to first save new_cv, then save new_cvterm.<br />
session.save(new_cvterm);<br />
<br />
session.getTransaction().commit();<br />
<br />
// You can see the new IDs assigned by the database<br />
System.out.println(new_cv.getCv_id());<br />
System.out.println(new_cvterm.getCvterm_id());<br />
</java><br />
<br />
=====Inheritance=====<br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.Feature" table="feature" discriminator-value="not null"><br />
<br />
<id name="feature_id" column="feature_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">feature_feature_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<discriminator column="type_id" type="integer" insert="false"/><br />
<br />
<many-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" <br />
column="dbxref_id" cascade="all"/><br />
<br />
<many-to-one name="organism" class="org.vectorbase.chadoAPI.chadoObjects.Organism" <br />
column="organism_id" not-null="true" cascade="save-update"/><br />
<br />
<property name="name"/><br />
.....<br />
</class><br />
</hibernate-mapping><br />
<br />
<hibernate-mapping> <br />
<br />
<subclass name="org.vectorbase.chadoAPI.chadoFeatures.Gene" <br />
extends="org.vectorbase.chadoAPI.chadoObjects.Feature" discriminator-value="767"><br />
<br />
</subclass><br />
</hibernate-mapping><br />
</xml><br />
Write custom methods for specific sub-classes<br />
<br />
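The discriminator mapping above selects a Java subclass from the value of type_id. The following sketch shows the same idea in plain Java, using the value 767 from the mapping. It is not Hibernate code, and the DiscriminatorSketch class and its nested types are hypothetical.<br />

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: choose a subclass from a discriminator value (type_id).
final class DiscriminatorSketch {

    static class Feature {
        String describe() {
            return "generic feature";
        }
    }

    // A subclass is the natural home for type-specific custom methods.
    static class Gene extends Feature {
        @Override
        String describe() {
            return "gene";
        }
    }

    private static final Map<Integer, Supplier<Feature>> BY_TYPE_ID = new HashMap<>();

    static {
        // 767 is the discriminator value used for Gene in the mapping above.
        BY_TYPE_ID.put(767, Gene::new);
    }

    // Fall back to the base class for any type_id without a registered subclass.
    static Feature instantiate(int typeId) {
        return BY_TYPE_ID.getOrDefault(typeId, Feature::new).get();
    }

    public static void main(String[] args) {
        System.out.println(instantiate(767).describe());
        System.out.println(instantiate(1).describe());
    }
}
```

<br />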
=====ChadoAPI=====<br />
<br />
* POJO Mappings<br />
** CV, CVTerm, DB, DBXref, Feature, FeatureCVTerm, FeatureDBXref, FeatureLoc, FeatureProp, FeatureRelationship, FeatureSynonym, Organism, Pub, Synonym<br />
* Extended Features<br />
** Chromosome, Gene, Transcript, Exon, Protein<br />
* Constants<br />
** CVTerms, FeatureFeatureRelationships, Ontologies<br />
* Special<br />
** ChadoAdaptor<br />
<br />
=====Problem 1 - GMOD Example=====<br />
<br />
<java><br />
// Set up our session and begin transaction<br />
Session session = HibernateUtil.getSessionFactory().getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Make a chado adaptor and load up some utility objects<br />
ChadoAdaptor ca = new ChadoAdaptor();<br />
Chromosome c = ca.fetchChromosomeByUniqueName("fake_chromosome");<br />
Pub null_pub = ca.fetchPubByPubID(1);<br />
Organism agambiae = ca.fetchOrganismByScientificName("Anopheles","gambiae");<br />
<br />
// Begin GMOD Demo Code<br />
<br />
// Make our new gene<br />
Gene xfile = new Gene();<br />
xfile.setOrganism(agambiae);<br />
xfile.setUniquename("xfile");<br />
xfile.setDescription("A test gene for GMOD meeting");<br />
<br />
/* Set the location of our gene. No need to set coordinates because they'll be updated<br />
* based on the exon boundaries. <br />
*/<br />
FeatureLoc xfile_loc = new FeatureLoc();<br />
xfile_loc.setSrcfeature(c);<br />
xfile_loc.setStrand(1);<br />
xfile.setFeatureLoc(xfile_loc);<br />
<br />
// Add synonyms to xfile<br />
xfile.createNewFeatureSynonym("mulder", null_pub, CVTerms.EXACT_SYNONYM);<br />
xfile.createNewFeatureSynonym("scully", null_pub, CVTerms.EXACT_SYNONYM);<br />
</java><br />
<br />
=====Problem 2 - GMOD Example=====<br />
<br />
<java><br />
// Create a new transcript for our gene.<br />
Transcript t = xfile.createGeneTranscript("xfile-RA");<br />
<br />
// Create some exons for that transcript.<br />
t.createTranscriptExon("xfile:1", 13691, 13767);<br />
t.createTranscriptExon("xfile:2", 14687, 14720);<br />
<br />
// Save our new gene<br />
session.save(xfile);<br />
System.out.println("xfile feature_id is " + xfile.getFeature_id());<br />
<br />
// Fetch our saved gene from the database<br />
Gene xfile_r = ca.fetchGeneByUniqueName("xfile");<br />
System.out.println("symbol: " + xfile_r.getUniquename());<br />
System.out.print("synonyms: ");<br />
for (FeatureSynonym fs : xfile_r.getFeatureSynonyms()){<br />
<br />
System.out.print(fs.getSynonym().getName() + " ");<br />
}<br />
<br />
System.out.println("description: " + xfile_r.getDescription());<br />
System.out.println("type: " + xfile_r.getType().getName());<br />
<br />
for (Transcript tx : xfile_r.fetchAllTranscripts()){<br />
for (Exon e : tx.fetchAllExons()){<br />
System.out.println(e.getUniquename() + " Start:\t" + e.getFeatureLoc().getFmin());<br />
System.out.println(e.getUniquename() + " End:\t" + e.getFeatureLoc().getFmax());<br />
System.out.println("\tSrcFeatureID: " + e.getFeatureLoc().getSrcfeature().getFeature_id());<br />
}<br />
System.out.println(">" + tx.getUniquename());<br />
System.out.println(tx.generateTranscriptSequenceFromExons().toUpperCase());<br />
}<br />
</java><br />
<br />
=====Problems 3, 4, & 5 - GMOD Update & Delete=====<br />
<br />
<java><br />
// Let's update our name...<br />
xfile_r.setUniquename("x-file");<br />
<br />
session.save(xfile_r);<br />
<br />
// Not part of the ChadoAdaptor utility object, but a good example of HQL<br />
List<Gene> genes = (List<Gene>)session.createQuery("from Gene where uniquename like ?").setString(0,"x-%").list();<br />
<br />
for (Gene g : genes){<br />
<br />
System.out.println(g.getFeature_id() + <br />
"\t" + g.getUniquename() + <br />
"\t" + g.getOrganism().getGenus() +<br />
" " + g.getOrganism().getSpecies());<br />
}<br />
<br />
// Deleting... hmm...<br />
Gene delete_me = ca.fetchGeneByUniqueName("x-ray");<br />
session.delete(delete_me);<br />
<br />
// All Finished<br />
session.getTransaction().commit();<br />
</java><br />
<br />
<br />
<br />
=====What Hibernate Does Well=====<br />
<br />
* Hibernate can be configured to perform specialized functions<br />
** For example, it has its own notion of a cascade<br />
* Flexible with respect to language<br />
** Java, Hibernate Query Language, or SQL<br />
* Any JDBC driver<br />
<br />
=====Acknowledgements=====<br />
<br />
* VectorBase People<br />
** Frank Collins, EO Stinson, Ryan Butler<br />
* GMOD<br />
* NIAID<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
This Wiki section is an edited version of [[Media:PSU.pdf|Chinmay's presentation]].<br />
<br />
=====GeneDB=====<br />
<br />
* GeneDB is the organism data and annotation database for the Pathogen Sequencing Unit (PSU) at the Sanger Institute, UK<br />
* Contains 37 organisms, which is expected to grow to 62<br />
* Currently migrating to chado schema<br />
* Java API with two engines: Hibernate and iBatis<br />
** Two teams, Artemis and GeneDB, took different approaches<br />
<br />
=====Technical - Connections=====<br />
<br />
Connections are configured in the Spring configuration file<br />
<xml><br />
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"><br />
<property name="driverClassName" value="org.postgresql.Driver" /><br />
<property name="url" value="jdbc:postgresql://holly.sanger.ac.uk:5432/chado" /><br />
<property name="username" value="DELIBERATELY_BOGUS_NAME"/><br />
<property name="password" value="WIBBLE" /><br />
</bean><br />
</xml><br />
* Uses a connection pool<br />
* Connection to the database is specified graphically, so the iBatis configuration file has variables for the location:<br />
<xml><br />
<property name="JDBC.Driver" value="org.postgresql.Driver"/><br />
<br />
<property name="JDBC.ConnectionURL" value="jdbc:postgresql://${chado}"/><br />
<br />
<property name="JDBC.Username" value="${username}"/><br />
<br />
<property name="JDBC.Password" value="${password}"/><br />
</xml><br />
<br />
* Provide the database location, username and password<br />
* Select what to open in Artemis from a scrollable list of features with residues (organisms are in separate Postgres schemas)<br />
<br />
=====Technical - Code Generation=====<br />
<br />
* The shared interface and hibernate implementation were originally generated<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use it behind the scenes)<br />
<br />
=====Technical - Transactions=====<br />
<br />
* Transactions are fully supported<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use it behind the scenes)<br />
<br />
=====Problems 1, 2, & 3=====<br />
<br />
Creating a gene<br />
<java><br />
genes[0] = new Feature(ORG, GENE, "xfile", false, false, now, now);<br />
<br />
genes[0].setSeqLen(1029); <br />
sequenceDao.persist(genes[0]);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, genes[0], 13691, false, 14720, false, (short)1, 0, 0 ,0);<br />
<br />
sequenceDao.persist(loc);<br />
<br />
addFeatureProp(genes[0], "description", "A test gene for GMOD meeting");<br />
<br />
addSynonymsToFeature(genes[0], "mulder", "scully");<br />
<br />
createExon("exon1", genes[0], 13691, 13767, now, 0);<br />
<br />
createExon("exon2", genes[0], 14687, 14720, now, 1);<br />
</java><br />
<br />
Retrieve a gene<br />
<java><br />
Feature f = sequenceDao.getFeatureByUniqueName("xfile");<br />
displayGene(f);<br />
</java><br />
<br />
Update a gene<br />
<java><br />
genes[0].setUniqueName("x-file");<br />
<br />
sequenceDao.merge(genes[0]);<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<java><br />
private Feature createExon(String name, Feature gene, int min, int max, Timestamp now, int rank) {<br />
<br />
Feature exon = new Feature(ORG, EXON, name, false, false, now, now);<br />
exon.setSeqLen(max-min);<br />
sequenceDao.persist(exon);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, exon, min, false, max, false, <br />
(short)1, 0, 0 ,0);<br />
sequenceDao.persist(loc);<br />
<br />
return exon;<br />
<br />
}<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<xml><br />
<st:section name="Naming" id="gene_naming" collapsed="false" collapsible="false"<br />
hideIfEmpty="true"><br />
<dl><br />
<dt><b>symbol:</b></dt><br />
<dd>${feature.uniqueName}</dd><br />
</dl><br />
<db:synonym name="synonym" var="name" collection="${feature.featureSynonyms}"><br />
<br /><b>Synonym:</b> <db:list-string collection="${name}" /><br />
</db:synonym><br />
<dt><b>Type:</b></dt><br />
<dd>${feature.cvTerm.name}</dd><br />
<br />
<st:section name="Exons" collapsed="false" collapsible="true" hideIfEmpty="true"><br />
<display:table name="exons" uid="tmp" pagesize="30" class="simple" cellspacing="0"<br />
cellpadding="4"><br />
<display:column property="uniqueName" title="Exon"/><br />
<display:column property="featureLocsForSrcFeatureId.fmin" title="Start"/><br />
<display:column property="featureLocsForSrcFeatureId.fmax" title="end"/><br />
</display:table><br />
</st:section><br />
<br />
<st:section name="cds" collapsible="true"><br />
<b>${feature.residues}</b><br />
</st:section><br />
</xml><br />
<br />
Specialized functionality such as cascading deletes is handled by the database</div>165.124.152.78http://gmod.org/wiki/GMOD_MiddlewareGMOD Middleware2007-01-26T19:51:01Z<p>165.124.152.78: /* Modware Features */</p>
<hr />
<div>==Middleware for Chado databases==<br />
<br />
===Authors===<br />
<br />
* Jeff Bowes<br />
* Robert Bruggner<br />
* Scott Cain<br />
* Josh Goodman<br />
* Eric Just<br />
* Sohel Merchant<br />
* Brian O'Connor<br />
* Brian Osborne<br />
* Chinmay Patel<br />
* Pinglei Zhou<br />
<br />
===Middleware Evaluation January 2007===<br />
<br />
A group of some 50 GMOD developers met at the annual meeting to discuss middleware. This one day meeting had the following general goals:<br />
<br />
* To educate GMOD programmers on methods and practices for Middleware<br />
* To facilitate discussion on the best methods<br />
* To guide GMOD to a uniform Middleware layer<br />
* To generate this central reference document for Middleware projects, including:<br />
** Platform information<br />
** Strengths & weaknesses of different Middleware packages<br />
** Specific examples of how one would use a given middleware package<br />
<br />
===Introduction===<br />
<br />
One of the key characteristics of the GMOD software project is the variety of approaches and components that it supports. This applies to applications, database schemas, as well as to middleware, a software ''layer'' that mediates the exchange of data between the applications and the databases. Despite this diversity certain applications and schemas have emerged as key supported components in GMOD, such as the GBrowse application and the Chado schema, to name just two. However, a consensus view has not emerged with respect to middleware, and there are certainly a number of different middleware packages that have been used in the GMOD world, coming from within this world and from the larger world of open source.<br />
<br />
In late 2006 the GMOD developers took note of the large number of middleware packages in use and elected to embark on a short-term study to evaluate and compare these packages. The primary motivation here was to select or recommend certain packages over others specifically within the GMOD context. The assumption is that making such recommendations will serve to focus the developers' effort on a smaller number of packages. Clearly it's also assumed that such a focus will inevitably lead to greater support for and use of those recommended packages, and that all of GMOD will benefit.<br />
<br />
Another purpose of this study is to educate GMOD programmers on best practices concerning the use and development of middleware. It's expected that common agreement on these practices will lead to the development of more effective software as well as the best use of the software in practice. Finally, this study should generate a central reference document on these different middleware packages used in GMOD. This reference will contain platform- and language-specific information as well as descriptions of the strengths and weaknesses of the packages that can be used by GMOD developers when considering middleware.<br />
<br />
====General Evaluation Criteria====<br />
<br />
The GMOD developers proposed that each presenter provide some basic information about each middleware package, both general and technical. In addition each middleware application was asked to address a set of sample problems, shown below. These example problems are thought to typify some of the common functions that the scientist may need when working with their own database. It was understood that not all software would be able to handle all aspects of the sample problems and this demonstration was not intended to be ''live''.<br />
<br />
=====Problem 1=====<br />
<br />
Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned <code>feature_id</code> for each inserted gene.<br />
<br />
Note:<br />
<br />
* The coordinates are given in exact coordinates.<br />
* Use the <code>organism_id</code> for your organism<br />
* Store description in the chado table <code>featureprop</code><br />
* A sequence in fasta format (see [[FakeChromosome|Fake Chromosome]]) should be loaded as genomic sequence, either chromosome or contig -- this will be used as a <code>srcfeature</code> in <code>featureloc</code><br />
<br />
Gene descriptions:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
symbol: x-ray<br />
synonyms: none<br />
exon: <br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2=====<br />
<br />
Retrieve and print the following report for gene '''xfile''' (the coding sequence and exon coordinates are derived from the associated mRNA feature). The results should resemble the following:<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds <br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
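Deriving the coding sequence from the exon coordinates can be sketched as follows. The two xfile exons span 77 and 34 bases, which is consistent with the 111-base sequence in the report if the coordinates are read as 1-based and inclusive; the sketch makes that assumption and handles the forward strand only. The SpliceSketch class, its method, and the toy sequence are hypothetical.<br />

```java
// Hypothetical sketch: build a spliced sequence from exon coordinates.
final class SpliceSketch {

    // Concatenate exon subsequences in the order given.
    // Assumes 1-based, inclusive coordinates on the forward strand.
    static String splice(String genomic, int[][] exons) {
        StringBuilder cds = new StringBuilder();
        for (int[] exon : exons) {
            // Convert the 1-based inclusive start to a 0-based offset; the end stays as is.
            cds.append(genomic, exon[0] - 1, exon[1]);
        }
        return cds.toString();
    }

    public static void main(String[] args) {
        // Toy sequence standing in for the fake chromosome.
        String genomic = "AACCGGTTAACC";
        int[][] exons = {{1, 4}, {9, 12}};
        System.out.println(splice(genomic, exons));
    }
}
```

<br />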
=====Problem 3=====<br />
<br />
Update the gene '''xfile''': change the symbol to '''x-file''' and retrieve the changed record. Regenerate the report from Problem 2. The results should resemble the following:<br />
<br />
symbol: x-file<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>x-file cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
=====Problem 4=====<br />
<br />
Search for all genes with symbols starting with ''x-''. With the results produce the following simple result list<br />
(organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
=====Problem 5=====<br />
<br />
Delete the gene '''x-ray''' using the <code>geneId</code>. Run the search and report in Problem 4 again to show the delete has taken<br />
place, with a result resembling the following:<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
===Conclusions===<br />
<br />
The one day meeting heard presentations from developers using both Perl and Java middleware and a number of satisfactory solutions were described. The focus in all cases was some sort of system that connected to the Chado relational database. Although other databases are encountered in the GMOD world (e.g. BioSQL) the Chado schema is popular and serves as a good test schema for this exercise given its complexity. The primary focus in the talks was on functionality from the perspective of writing code and extending the software and less attention was given to performance. Each presenter focussed on their own middleware and few side-by-side comparisons were made (for one comparison please see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]] by Josh Goodman).<br />
<br />
====Problem Assignments====<br />
<br />
All presenters paid attention to the assigned problems and all packages could perform the required operations, except for the GBrowse (DasI) Adaptor, which is read-only software. There are many differences between packages in how the problems were solved; please see the presentations themselves for these details. <br />
<br />
The Perl approaches used only the Perl language whereas the Java packages all used Java plus XML, to some degree. In addition iBatis exposes SQL to the developer and it was argued that this could be viewed either as an advantage (allows tuning of underlying SQL) or disadvantage.<br />
<br />
====Java Middleware====<br />
<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
=====Abstraction=====<br />
<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
<br />
====Perl Middleware====<br />
<br />
<br />
=====Abstraction=====<br />
<br />
Modware provides a higher level of abstraction than Chado::AutoDBI.<br />
<br />
=====Performance=====<br />
<br />
No pairwise comparisons.<br />
<br />
=====Configuration & Autoconfiguration=====<br />
<br />
<br />
=====Documentation=====<br />
<br />
The DasI interface is well documented: it consists of about a dozen methods and three classes, all of which are documented.<br />
<br />
===Object-Relational Mapping Principles===<br />
<br />
====Presentation by Sohel Merchant====<br />
<br />
Sohel Merchant, Bioinformatics Software Engineer at dictyBase, Center for Genetic Medicine, Northwestern University, Chicago. This Wiki section is an edited version of [[Media:Object_Relational_Mapping_Layer.pdf|Sohel's presentation]].<br />
<br />
=====Outline=====<br />
<br />
* The Problem<br />
* Solutions<br />
* ORM<br />
* Perl – Class::DBI<br />
* Summary<br />
<br />
=====The Problem=====<br />
<br />
* Developers need to perform Create, Retrieve, Update, Delete (aka CRUD) operations on data inside an application.<br />
* The real world objects represented using a programming language need to be stored in databases<br />
* Using relational databases to store object-oriented data leads to a semantic gap<br />
* RDBMS have fixed types, but OO can have more complicated user defined types.<br />
<br />
=====Solutions=====<br />
<br />
* Data Access Object (DAO)<br />
** Developer writes a class which contains one attribute for each field in the table<br />
** Methods for CRUD typically contain JDBC/DBI code with the necessary SQL statements.<br />
* Object Relational Mapping (ORM), Wikipedia:<br />
** “ORM is a programming technique that links databases to object-oriented language concepts, creating (in effect) a virtual object database.”<br />
** Developer needs to configure the ORM<br />
** Less manual coding<br />
** CRUD methods are automatically generated by the ORM layer<br />
<br />
=====ORM=====<br />
<br />
ORM solutions<br />
<br />
* Perl<br />
** Class::DBI<br />
* Java<br />
** EJB<br />
** Hibernate<br />
** JDO<br />
** iBatis<br />
<br />
=====Perl - Class::DBI=====<br />
<br />
* Provides a simple interface for wrapping Perl classes around database tables<br />
* Tables are mapped directly to objects<br />
* Table column names are mapped to get/set methods<br />
* Can be used with transactions<br />
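<br />
The column-to-method mapping can be illustrated without a database. The following is a minimal sketch, not Class::DBI's actual implementation: the package name <code>Sketch::Cvterm</code> is hypothetical and the objects live only in memory. It generates one combined get/set method per column of the CVTERM table used as the example in this presentation.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical stand-in for a table-backed class; this is not Class::DBI.
package Sketch::Cvterm {
    # Column names of the CVTERM table.
    my @columns = qw(cvterm_id cv_id name definition dbxref_id);

    # Generate one combined get/set method per column.
    for my $col (@columns) {
        no strict 'refs';
        *{ __PACKAGE__ . "::$col" } = sub {
            my $self = shift;
            $self->{$col} = shift if @_;    # called with an argument: set
            return $self->{$col};           # always return the current value
        };
    }

    sub new {
        my ( $class, %args ) = @_;
        return bless {%args}, $class;
    }
}

my $term = Sketch::Cvterm->new( name => 'DUMMY TERM', cv_id => 1 );
$term->definition('a placeholder term');             # setter
print $term->name, ': ', $term->definition, "\n";    # getters
```
<br />
Calling a method with an argument sets the column; calling it without one returns the current value. Class::DBI additionally ties those values to rows in the database.<br />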
<br />
=====Class::DBI=====<br />
<br />
Defining a class in Class::DBI to represent a table:<br />
<br />
CVTERM<br />
cvterm_id<br />
cv_id<br />
name<br />
definition<br />
dbxref_id<br />
<br />
Corresponding code:<br />
<br />
<perl><br />
package Chado::Cvterm;<br />
use base 'Chado::DBI';<br />
Chado::Cvterm->set_up_table('Cvterm');<br />
</perl><br />
<br />
=====Class::DBI - CRUD=====<br />
<br />
<perl><br />
## Create<br />
$term_dbobj = Chado::Cvterm->create({<br />
name => "DUMMY TERM",<br />
cv_id => 1,<br />
dbxref_id => 125<br />
});<br />
## Retrieve<br />
$term_dbobj = Chado::Cvterm->retrieve(2);<br />
<br />
## Update: change the fields in memory, then write them to the database<br />
$term_dbobj->name( $term->name() );<br />
$term_dbobj->definition( $term->definition );<br />
$term_dbobj->update();<br />
<br />
## Delete<br />
$term_dbobj->delete();<br />
</perl><br />
<br />
=====Java - Hibernate=====<br />
<br />
* Hibernate maps Java Objects directly to database tables<br />
* Scalable<br />
* Works well for a controlled data model<br />
<br />
=====Java - iBatis=====<br />
<br />
* iBATIS maps Java Objects to the results of SQL Queries<br />
* XML definitions for queries<br />
* Queries and managing Maps<br />
* Transactions<br />
* Good fit for existing database schema<br />
<br />
=====Summary=====<br />
<br />
* An ORM provides a painless round trip of data between the application and the database.<br />
* Reduces the amount of SQL code and allows a programmatic style interface to the RDBMS<br />
* Choice of ORM solution depends on the type of project<br />
* FlyBase examined iBatis and Hibernate; both use XML configuration files<br />
** Hibernate is better if you're building schema from scratch<br />
** Both auto-configure given a schema. <br />
** Both have strengths and weaknesses.<br />
* Is Hibernate better when you're in the process of designing a schema?<br />
** Hibernate can assist you in making a ''Hibernate-compatible'' schema.<br />
<br />
===XORT===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707<br />
* Language: Perl<br />
* Authors: Pinglei Zhou<br />
* Users: FlyBase<br />
* Support: Pinglei Zhou<br />
* Third party code: Uses XML::Parser::PerlSAX, Unicode::String, XML::DOM, and DBI from Perl (CPAN)<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
=====Comparing Hibernate & XORT=====<br />
<br />
FlyBase tried Hibernate, but encountered performance issues during bulk operations, even when the test did nothing more than call simple print() statements.<br />
There are many caching parameters available in Hibernate, but the underlying problem is that Chado is recursive or cyclical. <br />
XORT does some simple, automatic caching, and handles recursive or cyclical operations more easily. Chado users routinely encounter this issue in common operations such as merging genes.<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
====Limitations====<br />
<br />
* Database schemas need to follow certain rules<br />
** All must have internal int primary key<br />
** All must have unique key(s)<br />
* Retrieving certain types of data may require traversing a long path<br />
** Example: gene->allele->genotype->phenotype via feature_relationship<br />
* The structure is not held in memory; data is flushed out as it is processed<br />
<br />
====Presentation by Pinglei Zhou and Josh Goodman====<br />
<br />
This Wiki section is an edited version of [[Media:XORT.pdf|Josh and Pinglei's presentation]].<br />
<br />
<br />
<br />
=====Introduction=====<br />
<br />
* An XML-database mapping system for data exchange between a database and XML-driven applications<br />
* XORT can handle typical XML; it is not Chado-specific<br />
* Developed and supported by Pinglei Zhou at FlyBase Harvard; currently at version 0.007.<br />
* Used at all FlyBase sites<br />
** Harvard has extensive library of Perl modules for generating ChadoXML<br />
* Written in Perl<br />
* Required Perl modules:<br />
** XML::Parser::PerlSAX<br />
** Unicode::String<br />
** XML::DOM<br />
** DBI<br />
<br />
=====Chado XML=====<br />
<br />
* Is ChadoXML necessary? No, but it may help you.<br />
* ChadoXML assists with incremental updates, if you want to avoid flush-and-reload.<br />
* While updates can be achieved with other middleware (for example, Perl Class::DBI or Java Hibernate), ChadoXML additionally provides a way to archive your transactions.<br />
* It provides bulk update and download, which other methods either lack or handle inefficiently<br />
<br />
=====Components=====<br />
<br />
* Database & Schema<br />
* ChadoXML Specification<br />
* DumpSpec<br />
** DumpSpec files are simple XML files that tell XORT what to do<br />
** DumpSpec files are ''language independent'', being XML<br />
** It's fairly easy for those who know the schema to read these files and understand what the operation is<br />
<br />
=====Highlights of Chado XML Specification=====<br />
<br />
* Unique representation of a specific database schema<br />
* Does away with internal primary key values<br />
* Static vs. Operational<br />
* Encoding for non-ASCII characters<br />
* Macro mechanism (object reference)<br />
<br />
=====Putting it together: New FlyBase dataflow Part 1=====<br />
<br />
There are three FlyBase sites, and most curation is done at Harvard and<br />
Cambridge. Proforma is the curation format at Cambridge and Harvard, but<br />
Harvard also curates with Apollo and ChadoXML.<br />
<br />
Once the data is in Chado, there is a denormalization step in moving it<br />
to a read-only database used for reporting. From the read-only database,<br />
XORT dumps ChadoXML for reporting purposes. XSLT version 2 is then used<br />
to transform the ChadoXML into HTML and GFF: HTML for human-readable<br />
reports, GFF for GBrowse and for various power users.<br />
<br />
1.a. Proforma (FlyBase Cambridge) is converted to ChadoXML<br />
<br />
1.b. ChadoXML is created by Apollo (Harvard)<br />
<br />
1.c. ChadoXML is created by Java SEAN (Harvard)<br />
<br />
2. All ChadoXML is loaded into Chado by XORT<br />
<br />
=====Putting it together: New FlyBase dataflow Part 2=====<br />
<br />
3. Chado (Harvard) is denormalized and loaded into Chado (Indiana)<br />
<br />
4. ChadoXML is created from Chado using XORT<br />
<br />
5.a. GFF and FASTA are created from ChadoXML<br />
<br />
5.b. HTML is created from ChadoXML<br />
<br />
=====Data & Report Generation=====<br />
<br />
* Content of all output files is controlled by XML dumpspecs.<br />
** Dumpspecs are language independent.<br />
** Easily readable (with knowledge of Chado structure).<br />
* All XML transformation steps are done with XSLT v2.<br />
** Saxon XSLT (http://saxon.sourceforge.net/)<br />
** ChadoXML is split into individual chunks before XSLT processing to accommodate large file sizes.<br />
** Extremely fast. We can process all data for ~60,000 Drosophila genes in under 30 minutes.<br />
<br />
=====Hibernate & XORT=====<br />
<br />
* Hibernate didn't scale well when dealing with 5,000+ features in bulk.<br />
** The test was simply calling <code>print()</code> statements<br />
* Performance tweaks for Hibernate can be quite complicated to set up for bulk operations.<br />
* XORT is currently handling ~6 million features in production with only minor performance problems.<br />
* XORT is much more language independent.<br />
<br />
=====Support for complex transactions using XORT=====<br />
<br />
For example:<br />
<br />
* Find all records linked to a record using dumpspec<br />
* Merge gene x into y, each with thousands of records attached<br />
<br />
Step 1. Dump all data using a simple dumpspec<br />
<xml><br />
<chado><br />
<feature dump="all"><br />
<uniquename test="eq">x</uniquename><br />
</feature><br />
</chado><br />
</xml><br />
Step 2. Delete feature x from the database, with triggers to clean up orphan records if necessary<br />
<br />
Step 3. Edit the output XML, changing uniquename x to y, then load the edited file back into the database<br />
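<br />
Step 3 can be scripted. The following is a sketch only: the dumped document is a hypothetical, heavily trimmed stand-in for real XORT output, and <code>rename_uniquename</code> is a made-up helper. For real dumps an XML parser would be a safer choice than a regular expression.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical, heavily trimmed dump for gene "x"; a real dump has many more elements.
my $dumped = <<'XML';
<chado>
  <feature>
    <uniquename>x</uniquename>
    <featureprop>
      <value>note attached to x</value>
    </featureprop>
  </feature>
</chado>
XML

# Rewrite only <uniquename> elements whose entire content is the old name.
sub rename_uniquename {
    my ( $xml, $old, $new ) = @_;
    $xml =~ s{<uniquename>\Q$old\E</uniquename>}{<uniquename>$new</uniquename>}g;
    return $xml;
}

my $edited = rename_uniquename( $dumped, 'x', 'y' );
print $edited;
```
<br />
Anchoring the match on the whole element keeps other text that merely contains the old name, such as property values, untouched.<br />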
<br />
=====CHIA (Chado Interface Application)=====<br />
<br />
A Java application that organizes SQL and XORT functionality for internal users, e.g.:<br />
<br />
* Dump chado-XML for gene regions for Apollo curation<br />
* Organize and execute “canned” SQL queries<br />
* Serve IDs for curators (in development)<br />
* Browse Chado dynamically without writing SQL statements<br />
<br />
CHIA is being designed to be extensible for adding new functionality as needed.<br />
<br />
<br />
=====Documentation=====<br />
<br />
* ''Using Chado to Store Genome Annotation Data''<br />
** Current Protocols in Bioinformatics (Baxevanis, A.D., and Davison, D.B., eds) 2, 9.6.1-9.6.28.<br />
* XORT specification docs<br />
* XORT draft (unpublished)<br />
* GMOD case demo procedure<br />
** All in the doc directory of XORT package, http://www.gmod.org<br />
<br />
=====Acknowledgements=====<br />
<br />
* William Gelbart <br />
* Chris Mungall<br />
* David Emmert <br />
* Mark Gibson<br />
* Stan Letovsky <br />
* Nomi Harris<br />
* Frank Smutniak <br />
* Suzanna Lewis<br />
* Peili Zhang <br />
* Haiyan Zhang <br />
* Aubrey de Grey<br />
* Andy Schroeder <br />
* Don Gilbert<br />
* Susan Russo<br />
* Mark Zytkovicz <br />
* Scott Cain<br />
* Lincoln Stein<br />
* Victor Strelets<br />
* Robert Wilson<br />
* Paul Leyland<br />
<br />
===Chado::AutoDBI===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/projects/gmod/<br />
* Language: Perl<br />
* Authors: Allen Day, Scott Cain, Brian O'Connor, & others<br />
* Users: <br />
* Support:<br />
* Third party code: Based on Class::DBI by Michael Schwern & Tony Bowden<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Performance<br />
** Can one read thousands of objects into memory? You could do this but it's not suited to bulk operations<br />
* Joins & complex queries<br />
<br />
<perl><br />
# Add the add_constructor for looking for name lengths<br />
<br />
__PACKAGE__ ->add_constructor(long_names => qq{ length(name) > 15 });<br />
<br />
# Custom SQL<br />
<br />
__PACKAGE__->set_sql(xfiles => qq{<br />
SELECT FEATURE_ID<br />
FROM FEATURE<br />
WHERE NAME = 'xfiles' });<br />
</perl><br />
<br />
====Presentation by Brian O'Connor====<br />
<br />
This Wiki section is an edited version of [[Media:AutoDBI.pdf|Brian's presentation]].<br />
<br />
=====Relation to Turnkey=====<br />
<br />
Turnkey is a package that auto-generates Web sites given a relational<br />
schema, based on SQL::Translator<br />
<br />
* Turnkey authors: Allen Day, Scott Cain, Brian O'Connor<br />
* Turnkey and Chado::AutoDBI objects are essentially the same<br />
<br />
=====Technical Overview=====<br />
<br />
* Code Generation<br />
<br />
=====Project Overview=====<br />
<br />
Convert SQL Queries/Inserts/Deletes -> Object Calls<br />
<sql><br />
INSERT INTO feature (organism_id, name)<br />
VALUES (1, 'foo');<br />
</sql><br />
To:<br />
<perl><br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Database connection: use a base class<br />
* Set up base object and connect, then create a ''table object'' to access primary key. <br />
* Class::DBI can find and insert records into other tables, based on foreign keys.<br />
<br />
<perl><br />
package Turnkey::Model::DBI;    # base class shared by all table classes<br />
use base qw(Class::DBI::Pg);<br />
<br />
my ($dsn, $name, $pass);<br />
$dsn = "dbi:Pg:host=localhost;dbname=chado;port=5432";<br />
$name = "postgres";<br />
$pass = "";<br />
<br />
Turnkey::Model::DBI->set_db('Main', $dsn, $name, $pass, {AutoCommit => 1});<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
<br />
<perl><br />
package Turnkey::Model::Feature;<br />
use base 'Turnkey::Model::DBI';<br />
<br />
Turnkey::Model::Feature->set_up_table('feature');<br />
<br />
#<br />
# Primary key accessors<br />
#<br />
<br />
sub id { shift->feature_id }<br />
sub feature { shift->feature_id }<br />
</perl><br />
<br />
* data field accessors by Class::Accessor<br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** has_a<br />
<br />
<perl><br />
#<br />
# has_a<br />
#<br />
Turnkey::Model::Feature->has_a( type_id => "Turnkey::Model::Cvterm" );<br />
sub cvterm { return shift->type_id; }<br />
</perl><br />
<br />
* Basic ORM Object: Feature<br />
** has_many<br />
<br />
<perl><br />
#<br />
# has_many<br />
#<br />
Turnkey::Model::Feature->has_many('feature_synonym_feature_id', <br />
'Turnkey::Model::Feature_Synonym' => 'feature_id');<br />
sub feature_synonyms { return shift->feature_synonym_feature_id; }<br />
<br />
Turnkey::Model::Feature->has_many('featureprop_feature_id', <br />
'Turnkey::Model::Featureprop' => 'feature_id');<br />
sub featureprops { return shift->featureprop_feature_id; }<br />
</perl><br />
<br />
* Can traverse tables, such as going from FEATURE to FEATUREPROP <br />
** Tell the base object that the ''table object'' has_a() or has_many() keys corresponding to a key in another ''table object'' <br />
<br />
=====Technical Overview=====<br />
<br />
* Basic ORM Object: Feature<br />
** skipping linker tables for has_many<br />
<br />
<perl><br />
# skip over feature_synonym table<br />
#<br />
# method 1<br />
#<br />
sub synonyms { my $self = shift; return map $_->synonym_id, $self->feature_synonyms; }<br />
#<br />
# method 2<br />
#<br />
Turnkey::Model::Feature->has_many( synonyms2 =><br />
['Turnkey::Model::Feature_Synonym' => 'synonym_id']);<br />
</perl><br />
<br />
=====Technical Overview=====<br />
<br />
* Transactions<br />
** Chado::AutoDBI supports transactions, and one can wrap the transaction in an eval()<br />
<perl><br />
sub do_transaction {<br />
my $class = shift;<br />
my ( $code ) = @_;<br />
# Turn off AutoCommit for this scope.<br />
# A commit will occur at the exit of this block automatically,<br />
# when the local AutoCommit goes out of scope.<br />
local $class->db_Main->{ AutoCommit };<br />
<br />
# Execute the required code inside the transaction.<br />
eval { $code->() };<br />
if ( $@ ) {<br />
my $commit_error = $@;<br />
eval { $class->dbi_rollback }; # might also die!<br />
die $commit_error;<br />
}<br />
}<br />
</perl><br />
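<br />
The same commit-or-rollback pattern can be tried without a database. This sketch assumes a hypothetical in-memory handle, <code>Sketch::Handle</code>, with <code>insert</code>, <code>commit</code> and <code>rollback</code> methods; it is not the Class::DBI API, and unlike the code above it commits explicitly instead of relying on AutoCommit being restored.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a database handle.
package Sketch::Handle {
    sub new {
        my $class = shift;
        return bless { committed => [], pending => [] }, $class;
    }
    sub insert { my ( $self, $row ) = @_; push @{ $self->{pending} }, $row; return }
    sub commit {
        my $self = shift;
        push @{ $self->{committed} }, @{ $self->{pending} };
        $self->{pending} = [];
        return;
    }
    sub rollback { my $self = shift; $self->{pending} = []; return }
    sub rows     { return @{ $_[0]{committed} } }
}

# Run $code as one unit: commit if it returns, roll back and re-throw if it dies.
sub do_transaction {
    my ( $dbh, $code ) = @_;
    my $ok = eval { $code->($dbh); 1 };
    if ($ok) {
        $dbh->commit;
        return 1;
    }
    my $error = $@ || 'unknown error';
    $dbh->rollback;
    die $error;
}

my $dbh = Sketch::Handle->new;
do_transaction( $dbh, sub { $_[0]->insert('x-ray'); $_[0]->insert('x-men') } );

# The second transaction fails part-way through, so 'xfile' is never committed.
eval { do_transaction( $dbh, sub { $_[0]->insert('xfile'); die "load failed\n" } ) };
print "error: $@" if $@;
print join( ', ', $dbh->rows ), "\n";
```
<br />
Only the rows from the first transaction survive; the failed one leaves nothing behind, which is the property that matters when a load touches several Chado tables at once.<br />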
<br />
=====Technical Overview=====<br />
<br />
* Lazy Loading<br />
** One can either do automated creation of objects or explicitly dictate which fields are incorporated into object<br />
<perl><br />
Turnkey::Model::Feature->columns( Primary => qw/feature_id/ );<br />
Turnkey::Model::Feature->columns( Essential => qw/name organism_id type_id/ );<br />
Turnkey::Model::Feature->columns( Others => qw/residues .../ );<br />
</perl><br />
<br />
Typically:<br />
<br />
<perl><br />
Turnkey::Model::Feature->set_up_table('feature');<br />
</perl><br />
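<br />
What lazy loading buys can be sketched with a counter standing in for database round trips. Everything here is hypothetical (the <code>Sketch::LazyFeature</code> package, the in-memory table, the choice of essential columns); it mimics the idea of column groups and is not how Class::DBI is implemented.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical in-memory "feature" table with a single row.
my %feature_table = (
    7 => {
        feature_id => 7, name => 'xfile', organism_id => 1, type_id => 42,
        residues   => 'ATGGCGTTAG',
    },
);
my $fetches = 0;    # counts simulated trips to the database

package Sketch::LazyFeature {
    my @essential = qw(feature_id name organism_id type_id);

    # Load only the essential columns when the object is built.
    sub retrieve {
        my ( $class, $id ) = @_;
        $fetches++;
        my %row;
        @row{@essential} = @{ $feature_table{$id} }{@essential};
        return bless \%row, $class;
    }
    sub name { return $_[0]{name} }

    # Fetch the bulky column on first use, then serve it from the object.
    sub residues {
        my $self = shift;
        unless ( exists $self->{residues} ) {
            $fetches++;
            $self->{residues} = $feature_table{ $self->{feature_id} }{residues};
        }
        return $self->{residues};
    }
}

my $feature = Sketch::LazyFeature->retrieve(7);
print $feature->name, ": $fetches fetch so far\n";

$feature->residues for 1 .. 3;    # only the first call touches the table
print "after three reads of residues: $fetches fetches\n";
```
<br />
The sequence column is the expensive one in a feature table, so deferring it keeps simple name lookups cheap.<br />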
<br />
=====Problem 1=====<br />
<br />
* Create Feature & Add Description<br />
<br />
<perl><br />
# now create mRNA feature<br />
<br />
my $feature = Turnkey::Model::Feature->find_or_create({<br />
organism_id => $organism,<br />
name => 'xfile', uniquename => 'xfile',<br />
type_id => $mrna_cvterm,<br />
is_analysis => 'f', is_obsolete => 'f'<br />
});<br />
<br />
# create description<br />
<br />
my $featureprop = Turnkey::Model::Featureprop->find_or_create({<br />
value => 'A test gene for GMOD meeting',<br />
feature_id => $feature,<br />
type_id => $note_cvterm,<br />
});<br />
</perl><br />
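<br />
Both calls above rely on <code>find_or_create</code>, which returns an existing matching row if there is one and inserts a new row otherwise. The following is a minimal in-memory sketch of that behaviour, assuming a plain array in place of the table; Class::DBI's own method takes a hash reference and queries the database.<br />
<br />
```perl
use strict;
use warnings;

my @rows;           # hypothetical in-memory table
my $next_id = 1;    # hypothetical primary key sequence

# Return the first row matching every given column, or insert a new one.
sub find_or_create {
    my (%want) = @_;
    for my $row (@rows) {
        my $mismatches = grep { ( $row->{$_} // '' ) ne $want{$_} } keys %want;
        return $row unless $mismatches;
    }
    my $row = { %want, feature_id => $next_id++ };
    push @rows, $row;
    return $row;
}

my $first  = find_or_create( name => 'xfile', uniquename => 'xfile' );
my $second = find_or_create( name => 'xfile', uniquename => 'xfile' );

print "rows stored: ", scalar(@rows), "\n";
print "same row returned\n" if $first == $second;
```
<br />
This is why the demo script can be run repeatedly without piling up duplicate features.<br />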
<br />
=====Problem 2=====<br />
<br />
* Retrieve a Feature via Searching<br />
** Search using strings or identifiers; in scalar context a search returns an iterator object, in list context the matching objects<br />
<br />
<perl><br />
# objects for global use<br />
<br />
# the organism for our new feature<br />
my $organism = Turnkey::Model::Organism->search(abbreviation => "S.cerevisiae")->next;<br />
<br />
# the cvterm for a "Note"<br />
my $note_cvterm = Turnkey::Model::Cvterm->retrieve(2);<br />
<br />
# searching name by wildcard<br />
<br />
my @results = Turnkey::Model::Feature->search_like(name => 'x-%');<br />
</perl><br />
<br />
=====Problems 3, 4, & 5=====<br />
<br />
* Update a Feature<br />
<br />
<perl><br />
# update the xfile gene name<br />
<br />
$feature->name("x-file");<br />
$feature->update();<br />
</perl><br />
<br />
* Delete a Feature<br />
<br />
<perl><br />
# now delete the x-file feature<br />
<br />
$feature->delete();<br />
</perl><br />
<br />
=====Things Chado::AutoDBI does well=====<br />
<br />
* Easy to use<br />
* Easy to port<br />
* Use with other DBs<br />
** Both Oracle and Postgres used currently<br />
* Autogenerated via Turnkey<br />
* find_or_create method<br />
* Performance is not as bad as you might guess<br />
** Due to Lazy loading<br />
** Even whole genome operations are feasible<br />
<br />
Note that speed is relative: hand-written SQL can also perform badly if the wrong query is used, in which case the Chado::AutoDBI approach may well be faster.<br />
<br />
<br />
=====For More Information=====<br />
<br />
* Class::DBI<br />
** http://www.class-dbi.com<br />
** http://search.cpan.org<br />
<br />
* Turnkey<br />
** http://turnkey.sf.net<br />
<br />
* Biopackages<br />
** http://biopackages.net<br />
<br />
===Modware===<br />
<br />
====Background====<br />
<br />
* Source: http://gmod-ware.sourceforge.net<br />
* Language: Perl<br />
* Authors: Sohel Merchant, Eric Just<br />
* Users: DictyBase<br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not ''cover'' all of Chado<br />
* Not enough users to get quality feedback yet<br />
* Performance (?)<br />
* Language dependent<br />
<br />
====Presentation by Eric Just====<br />
<br />
Eric Just, Senior Bioinformatics Scientist, dictyBase: http://dictybase.org<br />
Center for Genetic Medicine, Northwestern University. This is an edited version of [[Media:Modware.pdf|Eric's presentation]].<br />
<br />
=====Why Modware Was Developed=====<br />
<br />
* Each feature type requires different behavior<br />
* Want to leave schema semantics out of application<br />
* Want to leverage work done in BioPerl<br />
* Re-use code developed for common use cases<br />
* DictyBase is using a superset of Modware<br />
** Some DictyBase-specific code used there<br />
<br />
=====What is in the Feature Table?=====<br />
<br />
The core of Chado<br />
<br />
* Chromosome<br />
* Contig<br />
* Gene<br />
* mRNA<br />
* Exon<br />
* Lots of other things - See Sequence Ontology!<br />
<br />
=====Modware Features=====<br />
<br />
* Multiple Feature classes<br />
** CHROMOSOME, GENE, MRNA, CONTIG<br />
* Each class provides type specific methods<br />
* Logic such as building exon structure of mRNA features is encapsulated<br />
* Parent class Modware::Feature<br />
** Provides common methods<br />
** Abstract factory for various feature types<br />
* Lazy : information is only retrieved when you ask for it<br />
* Uses Bioperl and its objects<br />
** Common methods such as name(), primary_id(), external_ids() <br />
* Subclasses provide type-specific methods<br />
** For example, Chromosome isn't the same as Gene, which isn't the same as ...<br />
* Any feature type not explicitly supported in the Modware::Feature class is blessed as a Modware::Feature::GENERIC class, which has start/stop coordinates on a genomic sequence feature (no structure like a transcript with exons)<br />
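<br />
The factory behaviour described above can be sketched as follows. The <code>Sketch::Feature</code> class names and the lookup table are hypothetical; Modware's real constructor does considerably more (database access, BioPerl objects). The sketch only shows how one constructor can return type-specific subclasses and fall back to a generic class.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical class names; Modware's real classes and constructor differ.
package Sketch::Feature {
    my %class_for = (
        gene       => 'Sketch::Feature::GENE',
        mrna       => 'Sketch::Feature::MRNA',
        chromosome => 'Sketch::Feature::CHROMOSOME',
    );

    # Factory: bless into the type-specific subclass, or GENERIC if none exists.
    sub new {
        my ( $class, %args ) = @_;
        my $target = $class_for{ lc $args{-type} } || 'Sketch::Feature::GENERIC';
        return bless {%args}, $target;
    }

    sub name { return $_[0]{-name} }    # common method shared by all subclasses
}
package Sketch::Feature::GENE {
    our @ISA = ('Sketch::Feature');
    sub synonyms { return [] }          # type-specific method, genes only
}
package Sketch::Feature::MRNA       { our @ISA = ('Sketch::Feature') }
package Sketch::Feature::CHROMOSOME { our @ISA = ('Sketch::Feature') }
package Sketch::Feature::GENERIC    { our @ISA = ('Sketch::Feature') }

for my $type (qw(gene mRNA tRNA)) {
    my $feature = Sketch::Feature->new( -type => $type, -name => 'demo' );
    print "$type => ", ref($feature), "\n";
}
```
<br />
Callers always go through the parent constructor, so adding support for a new feature type means adding a subclass and a table entry, not changing application code.<br />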
<br />
=====Architectural Overview=====<br />
<br />
* Object-oriented Perl interface to Chado<br />
* Built on top of Chado::AutoDBI<br />
* Connection handled by GMOD<br />
* Database transactions supported<br />
* BioPerl used to represent and manipulate sequence and feature structure<br />
* ‘Lazy’ evaluation<br />
<br />
=====Create and Insert Chromosome=====<br />
<br />
<perl><br />
my $seq_io = new Bio::SeqIO(<br />
-file => "../data/fake_chromosome.txt",<br />
-format => 'fasta'<br />
);<br />
<br />
# Bio::SeqIO will return a Bio::Seq object which<br />
# Modware uses as its representation<br />
my $seq = $seq_io->next_seq();<br />
<br />
my $reference_feature = new Modware::Feature(<br />
-type => 'chromosome',<br />
-bioperl => $seq,<br />
-description => "This is a test",<br />
-name => 'Fake',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
# Inserts chromosome into database<br />
$reference_feature->insert();<br />
</perl><br />
<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
1) Enter the information about the following three novel genes, including the associated mRNA structures, into your database. Print the assigned feature_id for each inserted gene.<br />
<br />
Gene Feature<br />
symbol: x-ray<br />
synonyms: none<br />
mRNA Feature<br />
exon:<br />
start: 1703<br />
end: 1900<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
Gene Feature<br />
symbol: x-men<br />
synonyms: wolverine<br />
mRNA Feature<br />
exon_1: <br />
start: 12648<br />
end: 13136<br />
strand: 1<br />
srcFeature_id: <br />
Id of genomic sample<br />
<br />
Gene Feature<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 1 - Create and Insert a Gene=====<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
…<br />
<br />
<perl><br />
my $gene_feature = new Modware::Feature(<br />
-type => 'gene',<br />
-name => 'xfile',<br />
-description => 'A test gene for GMOD meeting',<br />
-source => 'GMOD 2007 Demo'<br />
);<br />
<br />
$gene_feature->add_synonym( 'mulder' );<br />
$gene_feature->add_synonym( 'scully' );<br />
<br />
# inserts object into database<br />
$gene_feature->insert();<br />
print 'Inserted gene with feature_id:'.$gene_feature->feature_id()."\n";<br />
</perl><br />
<br />
=====Problem 1 - Create mRNA BioPerl Object=====<br />
<br />
exon_1: <br />
start: 13691<br />
end: 13767<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: <br />
start: 14687<br />
end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
<perl><br />
# First, create exon features (using Bioperl)<br />
my $exon_1 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 13691,<br />
-end => 13767,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
my $exon_2 = new Bio::SeqFeature::Gene::Exon (<br />
-start => 14687,<br />
-end => 14720,<br />
-strand => 1,<br />
-is_coding => 1<br />
);<br />
<br />
# Next, create transcript feature to 'hold' exons (using Bioperl)<br />
my $bioperl_mrna = new Bio::SeqFeature::Gene::Transcript();<br />
<br />
# Add exons to transcript (using Bioperl)<br />
$bioperl_mrna->add_exon( $exon_1 );<br />
$bioperl_mrna->add_exon( $exon_2 );<br />
</perl><br />
<br />
=====Problem 1 - Create and Insert mRNA=====<br />
<br />
The BioPerl object holds the location information, but now we want to create a Modware object and link it to the gene as well as locate it on the chromosome.<br />
<br />
<perl><br />
# Now create Modware Feature to 'hold' bioperl object<br />
my $mrna_feature = new Modware::Feature(<br />
-type => 'mRNA',<br />
-bioperl => $bioperl_mrna,<br />
-source => 'GMOD 2007 Demo',<br />
-reference_feature => $reference_feature<br />
);<br />
<br />
# Associate mRNA to gene (required for insertion)<br />
$mrna_feature->gene( $gene_feature );<br />
<br />
# inserts object into database<br />
$mrna_feature->insert();<br />
</perl><br />
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
symbol: xfile<br />
synonyms: mulder, scully<br />
description: A test gene for GMOD meeting<br />
type: gene<br />
exon1 start: 13691<br />
exon1 end: 13767<br />
exon2 start: 14687<br />
exon2 end: 14720<br />
>xfile cds<br />
ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT<br />
GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA<br />
<br />
Create a new package, GMODWriter, to write the report. This package<br />
uses Modware and BioPerl methods.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use GMODWriter;<br />
<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
GMODWriter->Write_gene_report( $xfile_gene );<br />
</perl><br />
<br />
* What's the difference between Modware::Gene and Modware::Feature? Gene is-a Feature.<br />
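<br />
As a quick consistency check on the report, the exon coordinates can be compared with the CDS. Assuming the coordinates are 1-based and inclusive, as in BioPerl, the two exons should account for exactly the number of bases in the CDS.<br />
<br />
```perl
use strict;
use warnings;

# Exon coordinates and CDS copied from the report above.
my @exons = ( [ 13691, 13767 ], [ 14687, 14720 ] );
my $cds   = 'ATGGCGTTAGTATTCATGGTTACTGGTTTCGCTACTGATATCACCCAGCGTGTAGGCTGT'
          . 'GGAATCGAACACTGGTATTGTATAAATGTTTGTGAATACACTGAGAAATAA';

# With inclusive coordinates, an exon spans end - start + 1 bases.
my $exon_total = 0;
$exon_total += $_->[1] - $_->[0] + 1 for @exons;

print "exon bases: $exon_total\n";
print "CDS bases:  ", length($cds), "\n";
```
<br />
Both numbers come out as 111: 77 bases for exon 1 plus 34 for exon 2, or 37 codons including the final TAA stop codon.<br />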
<br />
=====Problem 2 - Writing the Report=====<br />
<br />
2) Retrieve and print the following report for gene xfile<br />
<br />
* The mRNA object contains the Bioperl object<br />
** Why not just subclass? Holding the BioPerl object, as shown here, gives more flexibility<br />
<br />
<perl><br />
package GMODWriter; <br />
sub Write_gene_report {<br />
my ($self, $gene) = @_;<br />
my $symbol = $gene->name();<br />
<br />
my @synonyms = @{ $gene->synonyms() };<br />
my $syn_string = join ", ", @synonyms;<br />
my $description = $gene->description();<br />
my $type = $gene->type();<br />
# get features associated with the gene that are of type 'mRNA'<br />
my ($mrna) = grep { $_->type() eq 'mRNA' } @{ $gene->features() };<br />
# use bioperl method to get exons from mRNA<br />
my @exons = $mrna->bioperl->exons_ordered();<br />
# Modware will return a nice fasta file for you.<br />
my $fasta = $mrna->sequence( -type => 'cds', -format => 'fasta' );<br />
# Now print the actual report<br />
print "symbol: $symbol\n";<br />
print "synonyms: $syn_string\n";<br />
print "description: $description\n";<br />
print "type: $type\n";<br />
<br />
my $count = 0;<br />
foreach my $exon (@exons ) {<br />
$count++;<br />
print "exon${count} start: ".$exon->start()."\n";<br />
<br />
print "exon${count} end: ".$exon->end()."\n";<br />
<br />
}<br />
print "$fasta";<br />
}<br />
. . .<br />
</perl><br />
<br />
=====Problem 3 - Updating a Gene Name=====<br />
<br />
3) Update the gene xfile: change the gene symbol to x-file and retrieve the changed record. Regenerate the gene report.<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
eval{<br />
<br />
# get xfile gene<br />
my $xfile_gene = new Modware::Gene( -name => 'xfile' );<br />
<br />
# change the name<br />
$xfile_gene->name( 'x-file' );<br />
# write changes to database<br />
$xfile_gene->update();<br />
<br />
# we can use the original object if we want, but instead<br />
# we refetch from the database to 'prove' the name has been changed<br />
my $xfile_gene2 = new Modware::Gene( -name => 'x-file' );<br />
# use our GMODWriter package to write report for x-file<br />
GMODWriter->Write_gene_report( $xfile_gene2 );<br />
<br />
};<br />
if ($@){<br />
warn $@;<br />
Modware::DBH->new->rollback();<br />
}<br />
</perl><br />
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
4) Search for all genes with symbols starting with "x-*". With the results produce the following simple result list (organism will vary):<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
1325 x-ray Xenopus laevis<br />
<br />
<perl><br />
use Modware::Gene;<br />
use Modware::DBH;<br />
use GMODWriter;<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
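<br />
The <code>*</code> in the search string is a user-facing wildcard, whereas SQL pattern matching uses <code>%</code> (as in the Chado::AutoDBI <code>search_like</code> example). How Modware translates one into the other is not shown in the presentation; the helper below, with the hypothetical name <code>wildcard_to_like</code>, is one way such a translation could be written, assuming backslash is the <code>LIKE</code> escape character as it is by default in PostgreSQL.<br />
<br />
```perl
use strict;
use warnings;

# Translate a user-facing '*' wildcard into a SQL LIKE pattern,
# escaping characters that LIKE itself treats as special.
sub wildcard_to_like {
    my ($pattern) = @_;
    $pattern =~ s/([\\%_])/\\$1/g;    # literal \, % and _ must be escaped
    $pattern =~ s/\*/%/g;             # '*' matches any run of characters
    return $pattern;
}

print wildcard_to_like('x-*'), "\n";
```
<br />
The resulting pattern should be passed to the database as a bound parameter, never interpolated into the SQL string.<br />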
<br />
<br />
=====Problem 4 - Search and Display Results=====<br />
<br />
<perl><br />
sub Write_search_results {<br />
my ($self, $itr) = @_;<br />
# loop through iterator<br />
while (my $gene = $itr->next) {<br />
# print the requested information<br />
print $gene->feature_id . "\t" . $gene->name .<br />
"\t" . $gene->organism_name . "\n";<br />
}<br />
}<br />
</perl><br />
<br />
=====Problem 5 - Delete a Gene=====<br />
<br />
5) Delete the gene x-ray. Run the search and report again.<br />
<br />
1323 x-file Xenopus laevis<br />
1324 x-men Xenopus laevis<br />
<br />
<perl><br />
# get the xray gene<br />
my $xray = new Modware::Gene( -name => 'x-ray' );<br />
<br />
# set is_deleted = 1; this also sets is_available to 0,<br />
# so the gene is 'hidden' and no longer visible to a search.<br />
<br />
$xray->is_deleted(1);<br />
<br />
# write change to database<br />
$xray->update();<br />
<br />
# find genes starting with 'x-'<br />
my $results = Modware::Search::Gene->Search_by_name( 'x-*' );<br />
<br />
# write the search results<br />
GMODWriter->Write_search_results( $results )<br />
</perl><br />
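<br />
The deletion above is a "soft" delete: the row is flagged, not removed. The following is a minimal in-memory sketch of the idea, using a hypothetical array of hashes in place of the feature table and a hypothetical <code>search_by_prefix</code> helper in place of Modware's search.<br />
<br />
```perl
use strict;
use warnings;

# Hypothetical in-memory gene table; is_deleted marks rows hidden from searches.
my @genes = (
    { feature_id => 1323, name => 'x-file', is_deleted => 0 },
    { feature_id => 1324, name => 'x-men',  is_deleted => 0 },
    { feature_id => 1325, name => 'x-ray',  is_deleted => 0 },
);

# Search by name prefix, skipping anything flagged as deleted.
sub search_by_prefix {
    my ($prefix) = @_;
    return grep { !$_->{is_deleted} && index( $_->{name}, $prefix ) == 0 } @genes;
}

# "Delete" x-ray by setting the flag; the row itself stays in the table.
$_->{is_deleted} = 1 for grep { $_->{name} eq 'x-ray' } @genes;

print "$_->{feature_id}\t$_->{name}\n" for search_by_prefix('x-');
print scalar(@genes), " rows still stored\n";
```
<br />
Because the row is kept, the deletion can be reversed by clearing the flag, and anything that refers to the feature remains valid.<br />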
<br />
<br />
=====Other Modware Highlights=====<br />
<br />
* Easy to write applications with Modware<br />
* Extensible<br />
* Available through Sourceforge<br />
** http://gmod-ware.sourceforge.net<br />
* Easy to install<br />
* Large unit test coverage<br />
* Current release 0.2-RC1<br />
** Works with GMOD’s latest release<br />
* The sample scripts demoed here are available<br />
** sample_scripts directory<br />
<br />
=====Other Nice Things About Modware=====<br />
<br />
<br />
* Bioperl-style documentation <br />
** http://gmod-ware.sourceforge.net/doc/<br />
** POD for all methods<br />
* If Chado changes then...<br />
** Manually change Modware or ... <br />
** AutoDBI may adjust to the change automatically, depending on the change<br />
* Can set multiple connections through AutoDBI's <code>set_connection</code><br />
<br />
=====Coming Attractions=====<br />
<br />
* Support for changing genomic sequence<br />
* ncRNAs<br />
* UTRs<br />
* Ontology modules<br />
* Phenotype Annotations<br />
* Getting a new database handle returns the existing one<br />
** Thinking about configuring modules to set what database handle can be used<br />
* Pass an argument ''type'' to the Gene's feature() method<br />
* Specify the type of synonym being inserted?<br />
** Possible: trade-off between simplicity and functionality<br />
<br />
* Send us your ideas!<br />
<br />
<br />
=====Discussion=====<br />
<br />
* How hard is it to extend Modware?<br />
** Not known absolutely, but generally thought to be not difficult<br />
<br />
=====Acknowledgments=====<br />
<br />
* Rex Chisholm, PhD<br />
* Warren Kibbe, PhD <br />
* Scott Cain<br />
* Brian O’Connor<br />
* Sohel Merchant<br />
* Petra Fey<br />
* Pascale Gaudet<br />
* Karen Pilcher<br />
<br />
* BioPerl<br />
* GMOD<br />
* SGD<br />
<br />
===GBrowse (DasI) Adaptor===<br />
<br />
====Background====<br />
<br />
* Source: http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=34513&release_id=433523<br />
* Language: Perl<br />
* Authors: Scott Cain<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity: Connects via Perl DBI<br />
* Transaction support: N/A (read-only adaptor)<br />
* Code generation: N/A<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Read-only<br />
* Not generic middleware, but may be useful if you use Chado and GBrowse<br />
* Incomplete implementation of Bio::DasI; just enough to make GBrowse work<br />
* Also, despite the name, it has never been tested with a DAS server.<br />
<br />
====Presentation by Scott Cain====<br />
<br />
This Wiki section is an edited version of [[Media:DasI_middleware.pdf|Scott's presentation]].<br />
<br />
=====Create the database=====<br />
<br />
$ perl Makefile.PL<br />
$ make<br />
$ sudo make install<br />
$ make load_schema<br />
$ make prepdb # now with Xenopus!<br />
$ make ontologies # load rel, SO, featureprop<br />
<br />
=====Problem 1 - Loading Data=====<br />
<br />
Create some GFF from the specifications:<br />
<br />
fake_chromosome example chromosome 1 15017 . . . ID=fake_chromosome;Name=fake_chromosome<br />
fake_chromosome example gene 13691 14720 . + . ID=xfile;Name=xfile;Alias=mulder,scully;Note=A test gene for GMOD meeting<br />
fake_chromosome example mRNA 13691 14720 . + . ID=xfile_mRNA;Parent=xfile<br />
fake_chromosome example exon 13691 13767 . + . Parent=xfile_mRNA<br />
fake_chromosome example exon 14687 14720 . + . Parent=xfile_mRNA<br />
fake_chromosome example gene 12648 13136 . + . ID=x-men<br />
<br />
Gene inserted as GFF using a standard Bioperl bulk loader:<br />
<br />
<code>$ gmod_bulk_load_gff3.pl -g sample.gff</code><br />
<br />
''...lots of output...''<br />
<br />
=====Adaptor Components=====<br />
<br />
* Bio::DB::Das::Chado<br />
** Database connection object<br />
* Bio::DB::Das::Chado::Segment<br />
** Object for any range of DNA<br />
* Bio::DB::Das::Chado::Segment::Feature<br />
<br />
=====Use Bio::DB::Das::Chado=====<br />
<br />
<perl><br />
use Bio::DB::Das::Chado;<br />
<br />
my $chado = Bio::DB::Das::Chado->new(<br />
-dsn => "dbi:Pg:dbname=test",<br />
-user=> "scott",<br />
-pass=> "" ) || die "no new chado";<br />
<br />
my $gene_name = 'xfile';<br />
<br />
my ($gene_fo) = $chado->get_features_by_name($gene_name);<br />
</perl><br />
<br />
=====Problem 2 - Use Some Accessors=====<br />
<br />
<perl><br />
print "symbol: " . $gene_fo->display_name."\n";<br />
print "synonyms: " . join(', ',$gene_fo->synonyms)."\n";<br />
print "description: " . $gene_fo->notes."\n";<br />
print "type: " . $gene_fo->type."\n";<br />
<br />
my ($mRNA) = $gene_fo->sub_SeqFeature();<br />
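my @exons = $mRNA->sub_SeqFeature();<br />
<br />
# counter and accumulator used in the loop below<br />
my $exon_count = 0;<br />
my $cds_seq = '';<br />
<br />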
for my $exon (@exons) {<br />
next unless ($exon->type->method eq 'exon');<br />
$exon_count++;<br />
print "exon$exon_count start: " . $exon->start."\n";<br />
print "exon$exon_count end: " . $exon->end. "\n";<br />
$cds_seq .= $exon->seq->seq; # the first seq call returns a Bio::Seq object, the second gets the DNA string from Bio::Seq<br />
} <br />
</perl><br />
<br />
=====Bulk Output=====<br />
<br />
<perl><br />
my $gene_name = 'x-*';<br />
<br />
my @genes = $chado->get_features_by_name(<br />
-name => $gene_name,<br />
-class=> 'gene' );<br />
<br />
for my $gene (@genes) {<br />
print join("\t",<br />
$gene->feature_id,<br />
$gene->display_name,<br />
$gene->organism),"\n";<br />
}<br />
</perl><br />
<br />
Or see your report in GBrowse<br />
<br />
=====Advantages=====<br />
<br />
* Comes 'for free' with GBrowse<br />
** GBrowse will run with any DasI-compatible interface<br />
* Uses 'familiar' BioPerl idioms, very similar to widely used Bio::DB::GFF (though with fewer methods)<br />
<br />
<br />
=====Conclusion=====<br />
<br />
* Not suitable as a 'general' middleware layer<br />
** May be suitable for some applications, particularly if they are similar to GBrowse or other uses of Bio::DB::GFF<br />
<br />
===iBatis and Abator===<br />
<br />
====Background====<br />
<br />
* Source: http://ibatis.apache.org/<br />
* Language: Java<br />
* Authors: Apache group<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Special topics====<br />
<br />
* Demonstrations of what your software does well<br />
<br />
====Limitations====<br />
<br />
* Does not hide SQL<br />
* Does not create a whole object model of the database in memory<br />
* Not as widely used as Hibernate<br />
* No Perl version<br />
<br />
====Presentation by Jeff Bowes====<br />
<br />
Jeff Bowes, Xenbase, University of Calgary. This Wiki section is an edited version of [[Media:iBatis.pdf|Jeff's presentation]].<br />
<br />
=====iBatis=====<br />
<br />
* iBatis<br />
** Light-weight framework<br />
** Still based on SQL but eliminates the repetitive drudgery of JDBC<br />
** You can tune a query by rewriting the SQL in the XML map, and the API does not change.<br />
* iBatis does not create your database in memory as objects<br />
* Shallow learning curve<br />
* Manually create a Java class and SQL map to describe higher-level objects<br />
** Example: ''Gene''<br />
* Support for inheritance<br />
** Inheritance in result maps, allows fair amount of re-use. <br />
* Supports different transaction schemes<br />
** For example, JDBC, Java Transaction API<br />
<br />
=====Abator=====<br />
<br />
* Generates iBatis CRUD objects by introspecting database tables<br />
* Abator creates ''SQL in XML'' files (SQL Map files) and Java classes <br />
** Within these files is a Result Map section.<br />
* Abator config files are simple: they set connection parameters and say where the generated files go.<br />
* In the SQL Map files you can specify how to find parent ids, such as feature_id. <br />
<br />
=====Abator Example=====<br />
<xml><br />
<abatorConfiguration><br />
<abatorContext> <!-- TODO: Add Database Connection Information --><br />
<jdbcConnection driverClass="COM.ibm.db2.jdbc.app.DB2Driver"<br />
connectionURL="jdbc:db2:XBDV05"<br />
userId="db2inst1"<br />
password="*******"><br />
<classPathEntry location="/Program Files/IBM/SQLLIB/java/db2java.zip" /><br />
</jdbcConnection><br />
<br />
<javaModelGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.model"<br />
targetProject="gene" /><br />
<sqlMapGenerator<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.sql"<br />
targetProject="gene" /><br />
<daoGenerator type="IBATIS"<br />
targetPackage="org.gmod.architecture.framwork.bakeoff.abator.dao"<br />
targetProject="gene" /><br />
</abatorContext><br />
</abatorConfiguration><br />
</xml><br />
<br />
=====Abator Example=====<br />
<xml><br />
<table schema="db2inst1" tableName="synonym"><br />
<generatedKey column="synonym_id" sqlStatement="VALUES PREVVAL FOR synonym_seq" identity="true" /><br />
<columnOverride column="CREATED_BY" jdbcType="INTEGER" /><br />
<columnOverride column="MODIFIED_BY" jdbcType="INTEGER" /><br />
</table><br />
</xml><br />
<br />
=====Abator=====<br />
<br />
Works as:<br />
<br />
* Eclipse plug-in<br />
* ANT<br />
* Standalone<br />
<br />
=====DAO Methods=====<br />
<br />
* Insert (Feature)<br />
* Update (Feature)<br />
* DeletebyKey (FeatureKey)<br />
* SelectbyKey (FeatureKey)<br />
* SelectbyExample (FeatureExample)<br />
* DeletebyExample (FeatureExample)<br />
<br />
=====Insert=====<br />
<xml><br />
<insert id="abatorgenerated_insert" parameterClass=<br />
"org.gmod.architecture.framwork.bakeoff.abator.model.FeatureWithBLOBs"><br />
insert into db2inst1.feature<br />
(DBXREF_ID, ORGANISM_ID, NAME, UNIQUENAME,<br />
RESIDUES, SEQLEN, MD5CHECKSUM, TYPE_ID, IS_ANALYSIS,<br />
IS_OBSOLETE, CREATED_BY)<br />
values (#dbxrefId:INTEGER#, #organismId:INTEGER#, #name:VARCHAR#,<br />
#uniquename:VARCHAR#, #residues:CLOB#, #seqlen:INTEGER#,<br />
#md5checksum:CHAR#, #typeId:INTEGER#,<br />
#isAnalysis:SMALLINT#, #isObsolete:SMALLINT#,<br />
#createdBy:INTEGER#)<br />
<br />
<selectKey resultClass="java.lang.Integer" keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</insert><br />
</xml><br />
<br />
=====Insert=====<br />
<xml><br />
<selectKey resultClass="java.lang.Integer"<br />
keyProperty="featureId"><br />
VALUES PREVVAL FOR feature_seq<br />
</selectKey><br />
</xml><br />
<br />
=====Problem 1 - Insert=====<br />
<br />
<java><br />
try {<br />
sqlMap.startTransaction();<br />
pGene.id = featureDAO.insert(pGene.getFeatureWithBLOBs());<br />
featurepropDAO.insert(pGene.getPropertyDescription());<br />
pGene.featurelocId = featurelocDAO.insert(pGene.getFeaturelocWithBLOBS());<br />
pGene = insertExons(pGene);<br />
insertSynonyms(pGene);<br />
sqlMap.commitTransaction();<br />
} catch (Exception e) {<br />
System.out.println(e);<br />
throw (e);<br />
} finally {<br />
sqlMap.endTransaction();<br />
}<br />
</java><br />
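<br />
In the pattern above, work is kept only if commitTransaction() is reached; endTransaction() in the finally block then discards anything left uncommitted. A minimal, self-contained sketch of that control flow follows. <code>ToyTransactionManager</code> is a hypothetical stand-in written for illustration and is not the iBatis SqlMapClient; it defers the commit to endTransaction() purely to keep the example short.<br />

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a transaction manager; this is not the iBatis API.
class ToyTransactionManager {
    private final List<String> committed = new ArrayList<>();
    private List<String> pending = null;
    private boolean commitRequested = false;

    void startTransaction() {
        pending = new ArrayList<>();
        commitRequested = false;
    }

    void insert(String row) {
        if (pending == null) {
            throw new IllegalStateException("no transaction in progress");
        }
        pending.add(row);
    }

    void commitTransaction() {
        commitRequested = true;
    }

    // Keeps pending rows only if a commit was requested; otherwise drops them.
    void endTransaction() {
        if (commitRequested && pending != null) {
            committed.addAll(pending);
        }
        pending = null;
        commitRequested = false;
    }

    List<String> committedRows() {
        return new ArrayList<>(committed);
    }

    // Inserts all rows in one transaction; a null row simulates a failing insert.
    static boolean insertAll(ToyTransactionManager tm, String... rows) {
        try {
            tm.startTransaction();
            for (String row : rows) {
                if (row == null) {
                    throw new IllegalArgumentException("bad row");
                }
                tm.insert(row);
            }
            tm.commitTransaction();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        } finally {
            tm.endTransaction(); // always runs, committed or not
        }
    }

    public static void main(String[] args) {
        ToyTransactionManager tm = new ToyTransactionManager();
        System.out.println(insertAll(tm, "gene", "exon_1", "exon_2"));
        System.out.println(insertAll(tm, "synonym", null));
        System.out.println(tm.committedRows());
    }
}
```

The second call fails part-way through, so its "synonym" row never reaches the committed list.<br />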
<br />
=====Transactions=====<br />
<br />
* SQLMap<br />
* JDBC<br />
* JTA - Java Transaction API<br />
** 2-Phase commit<br />
* Hibernate<br />
* External (Customized)<br />
<br />
=====Retrieval=====<br />
<br />
symbol: xfile<br />
description: A test gene for GMOD meeting<br />
mRNA Feature<br />
exon_1: start: 13691 end: 13767 <br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
exon_2: start: 14687 end: 14720<br />
strand: 1<br />
srcFeature_id: Id of genomic sample<br />
<br />
=====Problem 2 - Master Detail Reports=====<br />
<br />
Account for cycles or recursion in Master Detail Report. <br />
<br />
<xml><br />
<resultMap id="SelectGeneResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Gene" groupBy="id"><br />
<result column="FEATURE_ID" property="id" jdbcType="INTEGER"/><br />
<result column="GENE_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="DESCRIPTION" property="description"<br />
jdbcType="VARCHAR" /><br />
<result column="TYPE_ID" property="typeId" jdbcType="INTEGER" /><br />
<result property="exons" resultMap = "gene.SelectExonResults"/><br />
</resultMap><br />
<br />
<resultMap id="SelectExonResults"<br />
class="org.gmod.architecture.framwork.bakeoff.Exon"><br />
<result column="EXON_ID" property="id" jdbcType="INTEGER"/><br />
<result column="EXON_NAME" property="name" jdbcType="VARCHAR" /><br />
<result column="EXON_RESIDUES" property="residues" jdbcType="CLOB" /><br />
<result column="STRAND" property="strand" jdbcType="INTEGER" /><br />
<result column="FMIN" property="fmin" jdbcType="INTEGER" /><br />
<result column="FMAX" property="fmax" jdbcType="INTEGER" /><br />
<result column="SRCFEATURE_ID" property="sourceFeatureId"<br />
jdbcType="INTEGER" /><br />
</resultMap><br />
</xml><br />
<br />
=====Master Detail Report=====<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
6129482 x-files gene 14687 14720<br />
<br />
Becomes:<br />
<br />
gene_id Symbol Type Fmin Fmax<br />
6129482 x-files gene 13691 13767<br />
14687 14720<br />
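<br />
The collapse from two flat rows to one gene with two exon rows is what groupBy="id" in the result map asks for. A minimal sketch of that grouping step in plain Java follows; <code>MasterDetailSketch</code>, <code>Row</code> and <code>Gene</code> are hypothetical names chosen for the example and are not the classes used in the bakeoff code.<br />

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of master-detail grouping; this is not iBatis code.
class MasterDetailSketch {

    // One flat row from a gene-to-exon join.
    static final class Row {
        final int geneId;
        final String symbol;
        final int fmin;
        final int fmax;

        Row(int geneId, String symbol, int fmin, int fmax) {
            this.geneId = geneId;
            this.symbol = symbol;
            this.fmin = fmin;
            this.fmax = fmax;
        }
    }

    // The master object with its list of detail records.
    static final class Gene {
        final int id;
        final String symbol;
        final List<int[]> exons = new ArrayList<>();

        Gene(int id, String symbol) {
            this.id = id;
            this.symbol = symbol;
        }
    }

    // Collapses rows that share a gene id into one Gene, keeping first-seen order.
    static List<Gene> group(List<Row> rows) {
        Map<Integer, Gene> byId = new LinkedHashMap<>();
        for (Row r : rows) {
            Gene g = byId.get(r.geneId);
            if (g == null) {
                g = new Gene(r.geneId, r.symbol);
                byId.put(r.geneId, g);
            }
            g.exons.add(new int[] { r.fmin, r.fmax });
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(6129482, "x-files", 13691, 13767));
        rows.add(new Row(6129482, "x-files", 14687, 14720));
        for (Gene g : group(rows)) {
            System.out.println(g.id + " " + g.symbol + " exons=" + g.exons.size());
        }
    }
}
```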
<br />
=====Dynamic Queries=====<br />
<br />
* Gene Name (Description)<br />
** Feature, Featureprop<br />
* Symbol<br />
** Feature<br />
* Feature Synonyms<br />
** Feature, Feature_Synonym, Synonym<br />
* Ortholog Synonyms<br />
** Feature, Feature_relationship, Feature, Feature Synonyms<br />
<br />
=====Dynamic Queries=====<br />
<br />
FROM<br />
CAT_X_GENE_V gc<br />
<isEqual<br />
prepend="," property="searchSymbol"<br />
compareValue="true"><br />
GENE_SYMBOLS s<br />
</isEqual><br />
<br />
<isEqual prepend=","<br />
property="searchNcbi" <br />
compareValue="true"><br />
NCBI_GI n<br />
</isEqual><br />
<br />
=====Dynamic Queries=====<br />
<br />
<dynamic prepend="WHERE"><br />
<isEqual prepend="AND" property="searchNameOnly"<br />
compareValue="true"><br />
<iterate property="searchTokens" conjunction="AND" <br />
open=" (" close=") "><br />
LOWER(VARCHAR(gc.longname)) LIKE <br />
LOWER(CAST(#searchTokens[]:VARCHAR# AS VARCHAR(512)))<br />
</iterate><br />
</isEqual><br />
<br />
The iterate tag is very useful for handling multiple search terms. <br />
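<br />
To make the effect of the dynamic and iterate tags concrete, the following sketch builds the same kind of WHERE clause by hand in plain Java. It is an illustration only: <code>DynamicWhereSketch</code> and <code>buildWhere</code> are hypothetical names, the column gc.longname is taken from the fragment above, and real iBatis also binds the token values, which this sketch leaves as ? placeholders.<br />

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a dynamically built WHERE clause; not iBatis code.
class DynamicWhereSketch {

    // Emits one LIKE condition per search token, joined with AND.
    // Returns an empty string when the name search is switched off or has no tokens.
    static String buildWhere(boolean searchNameOnly, List<String> searchTokens) {
        if (!searchNameOnly || searchTokens.isEmpty()) {
            return "";
        }
        List<String> conditions = new ArrayList<>();
        for (int i = 0; i < searchTokens.size(); i++) {
            conditions.add("LOWER(gc.longname) LIKE LOWER(?)");
        }
        return "WHERE (" + String.join(" AND ", conditions) + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildWhere(true, Arrays.asList("%x%", "%file%")));
        System.out.println(buildWhere(false, Arrays.asList("%x%")).isEmpty());
    }
}
```

One statement therefore covers searches with any number of terms, which is the point of writing the condition once inside the iterate tag.<br />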
<br />
=====Miscellaneous Features=====<br />
<br />
* Supports various data sources<br />
** Simple JDBC<br />
** DBCP – Apache Connection Pooling<br />
** JNDI – Java Naming and Directory Interface<br />
* Very flexible<br />
* Local caching of results<br />
** Lazy loading<br />
<br />
=====Support=====<br />
<br />
* In GMOD used by<br />
** Xenbase, Artemis at Sanger<br />
* Many other users<br />
** e.g. MySpace.com<br />
* Top level Apache Project<br />
** http://ibatis.apache.org/<br />
* Active community<br />
<br />
<br />
=====What iBatis Does Well=====<br />
<br />
* Does not hide SQL<br />
** No new query language to learn<br />
* Separates and groups SQL<br />
* Simple!!<br />
** Light wrapper - No real tweaks<br />
* Does the job well<br />
* Excellent support for Master-Detail<br />
* Dynamically generated queries <br />
** You can structure conditions around clauses in SQL<br />
** One XML statement can represent many variations on a query<br />
<br />
=====Acknowledgements=====<br />
<br />
GMOD<br />
* Eric Just<br />
* Everyone else<br />
<br />
iBatis Developers<br />
* Kevin Snyder<br />
* Chris Jarabek<br />
* Ross Gibb<br />
<br />
PI<br />
* Peter Vize<br />
<br />
Financial Support<br />
* Alberta Heritage Foundation for Medical Research<br />
* Alberta Network for Proteomics Innovation<br />
* University of Calgary, Faculty of Science<br />
* University of Calgary Dept. of Computer Science<br />
* NICHD<br />
<br />
===Hibernate===<br />
<br />
====Background====<br />
<br />
* Source: http://www.hibernate.org<br />
* Language: Java<br />
* Authors: JBoss group<br />
* Users: VectorBase<br />
* Support: JBoss group<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Issues around completeness<br />
* Exception Handling<br />
* Performance Tuning<br />
<br />
====Presentation by Robert Bruggner====<br />
<br />
Chado API via Java & Hibernate, Robert Bruggner, VectorBase.org. This Wiki section is an edited version of [[Media:HibernateChadoAPI.pdf|Robert's presentation]].<br />
<br />
=====Overview=====<br />
<br />
* Background<br />
* Quick Hibernate Overview<br />
* Hibernate Connectivity and O/R Mapping Example<br />
* GMOD Demo<br />
<br />
Also see [[Comparison_of_XORT_and_Hibernate_for_Chado_reporting|Comparison of XORT and Hibernate for Chado Reporting]].<br />
<br />
=====Background=====<br />
<br />
* VectorBase<br />
** A bioinformatic resource center for invertebrate vectors of human pathogens<br />
* Responsible for storage and display of multiple organisms’ genomes<br />
** Anopheles gambiae, Aedes aegypti, Ixodes scapularis, Culex pipiens and so on....<br />
* Want to store data for many organisms: Chado is a natural choice<br />
* Ensembl Genome Browser already used for ''A. gambiae''<br />
** Wrote Ensembl API Database adaptor for Chado... Not maintainable.<br />
* Use Both Databases<br />
** Transfer genomic data from Ensembl to Chado<br />
** Search Engine and Indexer using Lucene<br />
** Run DAS<br />
** Export data via ChadoXML and GFF3<br />
* Need API for Database I/O<br />
<br />
=====Hibernate Background=====<br />
<br />
* They say: “A powerful, high performance object/relational persistence and query service.”<br />
* Automates the persistence of plain old Java objects (POJO)<br />
** User maps their POJO properties to database tables via XML (HBM File).<br />
** There are Hibernate tools that generate HBMs<br />
*** Configurable in the sense that one can generate get and set methods that map one-to-one to fields.<br />
* Persist a specific object by storing it in the database.<br />
* Intelligent Database I/O <br />
** Smart detection of ''Dirty Properties'' when performing Save / Update / Delete.<br />
** Cascadable Save / Update / Delete for complex objects.<br />
* Everything's done within the scope of a transaction.<br />
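<br />
The idea behind dirty-property detection can be sketched as a comparison between a snapshot taken at load time and the current property values. The code below is a simplified illustration under that assumption, not Hibernate's implementation; <code>DirtyCheckSketch</code>, <code>dirtyProperties</code> and <code>updateSql</code> are hypothetical names.<br />

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration of dirty checking; this is not Hibernate's internal code.
class DirtyCheckSketch {

    // Returns the names of properties whose value differs from the loaded snapshot.
    static List<String> dirtyProperties(Map<String, Object> snapshot, Map<String, Object> current) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            if (!Objects.equals(snapshot.get(entry.getKey()), entry.getValue())) {
                dirty.add(entry.getKey());
            }
        }
        return dirty;
    }

    // Builds an UPDATE that touches only the dirty columns, or null if nothing changed.
    static String updateSql(String table, String idColumn, List<String> dirty) {
        if (dirty.isEmpty()) {
            return null;
        }
        List<String> assignments = new ArrayList<>();
        for (String column : dirty) {
            assignments.add(column + " = ?");
        }
        return "UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE " + idColumn + " = ?";
    }

    public static void main(String[] args) {
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("name", "old name");
        snapshot.put("definition", "a definition");

        Map<String, Object> current = new LinkedHashMap<>(snapshot);
        current.put("name", "New CVTerm name");

        System.out.println(updateSql("cvterm", "cvterm_id", dirtyProperties(snapshot, current)));
    }
}
```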
<br />
=====Hibernate Database Connectivity=====<br />
<br />
* Configure Hibernate in hibernate.cfg.xml<br />
* Define a Data Source<br />
** We use a simple, single JDBC connection to Chado<br />
** Can be configured to use a connection pool or data source accessible by the Java Naming and Directory Interface (JNDI).<br />
** Define a connection “dialect”<br />
*** For example, org.hibernate.dialect.PostgreSQLDialect<br />
* Describe the relationship between Java objects and database tables<br />
** Use XML to describe where to store POJO property data in the database<br />
* Create a new Hibernate Session based on the configuration<br />
* Begin a transaction to start performing work<br />
<br />
=====POJO and HBM Example file - CV=====<br />
<br />
<java><br />
public class CV {<br />
<br />
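private int cv_id;<br />
private String name;<br />
private String definition;<br />
<br />
// Getters and setters for each property go here<br />
....<br />
<br />
// Override equals(Object) and hashCode() together so instances compare consistently<br />
@Override<br />
public boolean equals(Object other) {<br />
....<br />
}<br />
@Override<br />
public int hashCode(){<br />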
...<br />
}<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CV" table="cv"><br />
<br />
<id name="cv_id" column="cv_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cv_cv_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<property name="name" column="name" type="java.lang.String" not-null="true"/><br />
<br />
<property name="definition" column="definition" type="java.lang.String"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====HBM Example CVTerm=====<br />
<br />
<java><br />
public class CVTerm {<br />
<br />
private int cvterm_id;<br />
<br />
private CV cv;<br />
<br />
private String name;<br />
<br />
private String definition;<br />
<br />
private DBXref dbxref;<br />
<br />
private int is_obsolete;<br />
<br />
private int is_relationshiptype;<br />
<br />
.....<br />
<br />
}<br />
</java><br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.CVTerm" table="cvterm"><br />
<br />
<id name="cvterm_id" column="cvterm_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">cvterm_cvterm_id_seq</param><br />
<br />
</generator><br />
<br />
</id><br />
<br />
<many-to-one name="cv" class="org.vectorbase.chadoAPI.chadoObjects.CV" column="cv_id" <br />
not-null="true" cascade="save-update"/><br />
<br />
<property name="name" not-null="true" type="java.lang.String"/><br />
<br />
<property name="definition"/><br />
<br />
<one-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" cascade="all"/><br />
<br />
<property name="is_obsolete"/><br />
<br />
<property name="is_relationshiptype"/><br />
<br />
</class><br />
</hibernate-mapping><br />
</xml><br />
<br />
=====Hibernate Object Retrieve=====<br />
<br />
One can use Java, Hibernate Query Language (HQL), or SQL. This example loads an object by ID and then with HQL.<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Alternatively, load a CVTerm using HQL (reusing the cvt variable)<br />
cvt = (CVTerm) session.createQuery("from CVTerm where name=?").setString(0,"name").uniqueResult();<br />
<br />
// Print out the name of the cvterm<br />
System.out.println(cvt.getName());<br />
<br />
// Get the cv that the cvterm is associated with<br />
// Hibernate doesn’t return the cv_id - it returns a CV Object.<br />
CV cv = cvt.getCv();<br />
<br />
// Print out the cv’s name<br />
System.out.println(cv.getName());<br />
</java><br />
<br />
=====Hibernate Object Update=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
<br />
// Build a session factory first (not shown)<br />
<br />
// Get the session based on the configuration and begin transaction<br />
Session session = HibernateSessionFactory.getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Load a CVTerm by its ID<br />
CVTerm cvt = (CVTerm) session.get(CVTerm.class,1);<br />
<br />
// Change cvt’s name<br />
cvt.setName("New CVTerm name");<br />
<br />
// Save!<br />
// Generated SQL updates “Dirty” properties (name, in this case)<br />
session.save(cvt);<br />
<br />
// Commit data to database (commit is a method of the transaction, not the session)<br />
session.getTransaction().commit();<br />
</java><br />
<br />
=====Hibernate Save=====<br />
<br />
<java><br />
import org.hibernate.Session;<br />
import org.vectorbase.chadoAPI.CVTerm;<br />
import org.vectorbase.chadoAPI.CV;<br />
<br />
// Load the configuration from hibernate.cfg.xml<br />
// Build a session factory first and begin a transaction (not shown)<br />
<br />
// Make a new CV<br />
CV new_cv = new CV();<br />
new_cv.setName("New CV");<br />
new_cv.setDefinition("New CV Def");<br />
<br />
// Make a new cvterm for that cv<br />
CVTerm new_cvterm = new CVTerm();<br />
new_cvterm.setName("New CVTerm Name");<br />
// ..... save dbxref etc......<br />
<br />
// Add that CVTerm to our new CV<br />
new_cv.addCVTerm(new_cvterm);<br />
<br />
// Save the new data...<br />
// Hibernate recognizes that it has to first save new_cv, then save new_cvterm.<br />
session.save(new_cvterm);<br />
<br />
// Commit through the transaction, not the session<br />
session.getTransaction().commit();<br />
<br />
// You can see the new IDs assigned by the database<br />
System.out.println(new_cv.getCv_id());<br />
System.out.println(new_cvterm.getCvterm_id());<br />
</java><br />
<br />
=====Inheritance=====<br />
<xml><br />
<hibernate-mapping><br />
<br />
<class name="org.vectorbase.chadoAPI.chadoObjects.Feature" table="feature" discriminator-value="not null"><br />
<br />
<id name="feature_id" column="feature_id" unsaved-value="undefined"><br />
<br />
<generator class="sequence"><br />
<br />
<param name="sequence">feature_feature_id_seq</param><br />
<br />
</generator><br />
<br />
</id> <br />
<br />
<discriminator column="type_id" type="integer" insert="false"/><br />
<br />
<many-to-one name="dbxref" class="org.vectorbase.chadoAPI.chadoObjects.DBXref" <br />
column="dbxref_id" cascade="all"/><br />
<br />
<many-to-one name="organism" class="org.vectorbase.chadoAPI.chadoObjects.Organism" <br />
column="organism_id" not-null="true" cascade="save-update"/><br />
<br />
<property name="name"/><br />
.....<br />
</class><br />
</hibernate-mapping><br />
<br />
<hibernate-mapping> <br />
<br />
<subclass name="org.vectorbase.chadoAPI.chadoFeatures.Gene" <br />
extends="org.vectorbase.chadoAPI.chadoObjects.Feature" discriminator-value="767"><br />
<br />
</subclass><br />
</hibernate-mapping><br />
</xml><br />
Write custom methods for specific sub-classes<br />
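<br />
The discriminator mapping above tells Hibernate which Java class to instantiate for a row based on its type_id. A minimal sketch of that decision in plain Java follows. <code>DiscriminatorSketch</code> and its nested classes are hypothetical; the value 767 is copied from the mapping above and will differ between Chado databases, since it is just the cvterm_id of the gene term.<br />

```java
// Hypothetical illustration of discriminator-based subclassing; not Hibernate code.
class DiscriminatorSketch {

    static class Feature {
        final String uniquename;

        Feature(String uniquename) {
            this.uniquename = uniquename;
        }

        String describe() {
            return "feature " + uniquename;
        }
    }

    // Subclass with type-specific behaviour, as suggested by the mapping above.
    static class Gene extends Feature {
        Gene(String uniquename) {
            super(uniquename);
        }

        @Override
        String describe() {
            return "gene " + uniquename;
        }
    }

    // Assumed value taken from the example mapping; it is database-specific.
    static final int GENE_TYPE_ID = 767;

    // Chooses the class to instantiate from the row's type_id.
    static Feature instantiate(int typeId, String uniquename) {
        if (typeId == GENE_TYPE_ID) {
            return new Gene(uniquename);
        }
        return new Feature(uniquename);
    }

    public static void main(String[] args) {
        System.out.println(instantiate(767, "xfile").describe());
        System.out.println(instantiate(1, "fake_chromosome").describe());
    }
}
```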
<br />
=====ChadoAPI=====<br />
<br />
* POJO Mappings<br />
** CV, CVTerm, DB, DBXref, Feature, FeatureCVTerm, FeatureDBXref, FeatureLoc, FeatureProp, FeatureRelationship, FeatureSynonym, Organism, Pub, Synonym<br />
* Extended Features<br />
** Chromosome, Gene, Transcript, Exon, Protein<br />
* Constants<br />
** CVTerms, FeatureFeatureRelationships, Ontologies<br />
* Special<br />
** ChadoAdaptor<br />
<br />
=====Problem 1 - GMOD Example=====<br />
<br />
<java><br />
// Set up our session and begin transaction<br />
Session session = HibernateUtil.getSessionFactory().getCurrentSession();<br />
session.beginTransaction();<br />
<br />
// Make a Chado adaptor and load up some utility objects<br />
ChadoAdaptor ca = new ChadoAdaptor();<br />
Chromosome c = ca.fetchChromosomeByUniqueName("fake_chromosome");<br />
Pub null_pub = ca.fetchPubByPubID(1);<br />
Organism agambiae = ca.fetchOrganismByScientificName("Anopheles","gambiae");<br />
<br />
// Begin GMOD Demo Code<br />
<br />
// Make our new gene;<br />
Gene xfile = new Gene();<br />
xfile.setOrganism(agambiae);<br />
xfile.setUniquename("xfile");<br />
xfile.setDescription("A test gene for GMOD meeting");<br />
<br />
/* Set the location of our gene. No need to set coordinates because they'll be updated<br />
* based on the exon boundaries. <br />
*/<br />
FeatureLoc xfile_loc = new FeatureLoc();<br />
xfile_loc.setSrcfeature(c);<br />
xfile_loc.setStrand(1);<br />
xfile.setFeatureLoc(xfile_loc);<br />
<br />
// Add synonyms to xfile<br />
xfile.createNewFeatureSynonym("mulder", null_pub, CVTerms.EXACT_SYNONYM);<br />
xfile.createNewFeatureSynonym("scully", null_pub, CVTerms.EXACT_SYNONYM);<br />
</java><br />
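<br />
The comment in the code above says the gene's coordinates are derived from its exon boundaries. How ChadoAPI does this is not shown in the presentation; the sketch below only illustrates the arithmetic, with <code>ExonBoundsSketch</code> and <code>geneBounds</code> as hypothetical names and the exon coordinates taken from the xfile example.<br />

```java
// Hypothetical illustration of deriving a gene span from its exons; not ChadoAPI code.
class ExonBoundsSketch {

    // Each exon is a {start, end} pair; the gene spans the smallest start to the largest end.
    static int[] geneBounds(int[][] exons) {
        if (exons.length == 0) {
            throw new IllegalArgumentException("a gene needs at least one exon");
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int[] exon : exons) {
            min = Math.min(min, exon[0]);
            max = Math.max(max, exon[1]);
        }
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        int[][] exons = { { 13691, 13767 }, { 14687, 14720 } };
        int[] bounds = geneBounds(exons);
        System.out.println(bounds[0] + ".." + bounds[1]);
    }
}
```

For the two xfile exons this gives 13691 to 14720, matching the gene line in the GFF example earlier on this page.<br />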
<br />
=====Problem 2 - GMOD Example=====<br />
<br />
<java><br />
// Create a new transcript for our gene.<br />
Transcript t = xfile.createGeneTranscript("xfile-RA");<br />
<br />
// Create some exons for that transcript.<br />
t.createTranscriptExon("xfile:1", 13691, 13767);<br />
t.createTranscriptExon("xfile:2", 14687, 14720);<br />
<br />
// Save our new gene<br />
session.save(xfile);<br />
System.out.println("xfile feature_id is " + xfile.getFeature_id());<br />
<br />
// Fetch our saved gene from the database<br />
Gene xfile_r = ca.fetchGeneByUniqueName("xfile");<br />
System.out.println("symbol: " + xfile_r.getUniquename());<br />
System.out.print("synonyms: ");<br />
for (FeatureSynonym fs : xfile_r.getFeatureSynonyms()){<br />
<br />
System.out.print(fs.getSynonym().getName() + " ");<br />
}<br />
<br />
System.out.println("description: " + xfile_r.getDescription());<br />
System.out.println("type: " + xfile_r.getType().getName());<br />
<br />
for (Transcript tx : xfile_r.fetchAllTranscripts()){<br />
for (Exon e : tx.fetchAllExons()){<br />
System.out.println(e.getUniquename() + " Start:\t" + e.getFeatureLoc().getFmin());<br />
System.out.println(e.getUniquename() + " End:\t" + e.getFeatureLoc().getFmax());<br />
System.out.println("\tSrcFeatureID: " + e.getFeatureLoc().getSrcfeature().getFeature_id());<br />
}<br />
System.out.println(">" + tx.getUniquename());<br />
System.out.println(tx.generateTranscriptSequenceFromExons().toUpperCase());<br />
}<br />
</java><br />
<br />
=====Problems 3, 4, & 5 - GMOD Update & Delete=====<br />
<br />
<java><br />
// Let's update our name...<br />
xfile_r.setUniquename("x-file");<br />
<br />
session.save(xfile_r);<br />
<br />
// Not part of the ChadoAdaptor utility object, but a good example of HQL<br />
List<Gene> genes = (List<Gene>)session.createQuery("from Gene where uniquename like ?").setString(0, "x-%").list();<br />
<br />
for (Gene g : genes){<br />
<br />
System.out.println(g.getFeature_id() + <br />
"\t" + g.getUniquename() + <br />
"\t" + g.getOrganism().getGenus() +<br />
" " + g.getOrganism().getSpecies());<br />
}<br />
<br />
// Deleting... hmm...<br />
Gene delete_me = ca.fetchGeneByUniqueName("x-ray");<br />
session.delete(delete_me);<br />
<br />
// All Finished<br />
session.getTransaction().commit();<br />
</java><br />
<br />
<br />
<br />
=====What Hibernate Does Well=====<br />
<br />
* Hibernate can be configured to perform specialized functions<br />
** For example, it has its own notion of a cascade<br />
* Flexible with respect to language<br />
** Java, Hibernate Query Language, or SQL<br />
* Any JDBC driver<br />
<br />
=====Acknowledgements=====<br />
<br />
* VectorBase People<br />
** Frank Collins, EO Stinson, Ryan Butler<br />
* GMOD<br />
* NIAID<br />
<br />
===PSU Chado Interface===<br />
<br />
====Background====<br />
<br />
* Source:<br />
* Language: Java<br />
* Authors: Chinmay Patel, Adrian Tivey<br />
* Users: <br />
* Support:<br />
* Third party code:<br />
<br />
====Technical Overview====<br />
<br />
* Database connectivity:<br />
* Transaction support:<br />
* Code generation:<br />
<br />
====Limitations====<br />
<br />
* Hibernate Save: no equivalent to a ''find or save'' method.<br />
* Not great for bulk data retrieval.<br />
* Hibernate works best when applied to databases designed with objects in mind (''Object Oriented Databases''). <br />
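Since there is no built-in ''find or save'', callers usually have to write one themselves. The following is a minimal sketch of such a helper, written against plain functional interfaces so that it stands alone; FindOrSave and its parameters are hypothetical and do not come from the PSU code. In a real application the finder and saver would wrap DAO calls.

```java
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper: look an object up by key, and create and save it only if absent.
final class FindOrSave {
    static <K, T> T findOrSave(K key,
                               Function<K, Optional<T>> finder,
                               Function<K, T> factory,
                               Consumer<T> saver) {
        Optional<T> existing = finder.apply(key);
        if (existing.isPresent()) {
            return existing.get();   // found: nothing is created or saved
        }
        T created = factory.apply(key);
        saver.accept(created);       // not found: persist the new object
        return created;
    }
}
```

The lookup and the save are two separate steps here, so against a shared database this pattern still relies on a unique constraint (or a surrounding transaction) to guard against two clients inserting the same key.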
<br />
<br />
====Presentation by Chinmay Patel====<br />
<br />
This Wiki section is an edited version of [[Media:PSU.pdf|Chinmay's presentation]].<br />
<br />
=====GeneDB=====<br />
<br />
* GeneDB is the organism data and annotation database for the Pathogen Sequencing Unit (PSU) at the Sanger Institute, UK<br />
* Contains 37 organisms, which is expected to grow to 62<br />
* Currently migrating to chado schema<br />
* Java API with two engines: Hibernate and iBatis<br />
** Two teams, Artemis and GeneDB, took different approaches<br />
<br />
=====Technical - Connections=====<br />
<br />
Connections are configured in the Spring configuration file<br />
<xml><br />
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource"><br />
<property name="driverClassName" value="org.postgresql.Driver" /><br />
<property name="url" value="jdbc:postgresql://holly.sanger.ac.uk:5432/chado" /><br />
<property name="username" value="DELIBERATELY_BOGUS_NAME"/><br />
<property name="password" value="WIBBLE" /><br />
</bean><br />
</xml><br />
* Uses a connection pool<br />
* Connection to the database is specified graphically, so the iBatis configuration file has variables for the location:<br />
<xml><br />
<property name="JDBC.Driver" value="org.postgresql.Driver"/><br />
<br />
<property name="JDBC.ConnectionURL" value="jdbc:postgresql://${chado}"/><br />
<br />
<property name="JDBC.Username" value="${username}"/><br />
<br />
<property name="JDBC.Password" value="${password}"/><br />
</xml><br />
<br />
* provide database location, username & password<br />
* select from a scrollable list of features with residues (organisms are in separate Postgres schemas) which one to open in Artemis<br />
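The ${chado}, ${username} and ${password} variables in the iBatis configuration are filled in from the values the user supplies. The sketch below shows one way such placeholder substitution can be done in plain Java. It is an illustration only, not the iBatis implementation; PlaceholderSketch is a hypothetical name, and the host and database in the usage note are made up.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that expands ${name} placeholders from a map of values.
final class PlaceholderSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    static String expand(String template, Map<String, String> values) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for ${" + m.group(1) + "}");
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, expanding "jdbc:postgresql://${chado}" with chado set to "localhost:5432/chado" yields "jdbc:postgresql://localhost:5432/chado".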
<br />
=====Technical - Code Generation=====<br />
<br />
* The shared interface and hibernate implementation were originally generated<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use them behind the scenes)<br />
<br />
=====Technical - Transactions=====<br />
<br />
* Transactions are fully supported<br />
* There’s no explicit code generation (although the Spring and Hibernate runtimes may use them behind the scenes)<br />
<br />
=====Problems 1, 2, & 3=====<br />
<br />
Creating a gene<br />
<java><br />
genes[0] = new Feature(ORG, GENE, "xfile", false, false, now, now);<br />
<br />
genes[0].setSeqLen(1029); <br />
sequenceDao.persist(genes[0]);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, genes[0], 13691, false, 14720, false, (short)1, 0, 0 ,0);<br />
<br />
sequenceDao.persist(loc);<br />
<br />
addFeatureProp(genes[0], "description", "A test gene for GMOD meeting");<br />
<br />
addSynonymsToFeature(genes[0], "mulder", "scully");<br />
<br />
createExon("exon1", genes[0], 13691, 13767, now, 0);<br />
<br />
createExon("exon2", genes[0], 14687, 14720, now, 1);<br />
</java><br />
<br />
Retrieve a gene<br />
<java><br />
Feature f = sequenceDao.getFeatureByUniqueName("xfile");<br />
displayGene(f);<br />
</java><br />
<br />
Update a gene<br />
<java><br />
genes[0].setUniqueName("x-file");<br />
<br />
sequenceDao.merge(genes[0]);<br />
</java><br />
<br />
=====Demo – Sample Problem=====<br />
<br />
<java><br />
private Feature createExon(String name, Feature gene, int min, int max, Timestamp now, int rank) {<br />
<br />
Feature exon = new Feature(ORG, EXON, name, false, false, now, now);<br />
exon.setSeqLen(max-min);<br />
sequenceDao.persist(exon);<br />
<br />
FeatureLoc loc = new FeatureLoc(SOURCE_FEATURE, exon, min, false, max, false, <br />
(short)1, 0, 0 ,0);<br />
sequenceDao.persist(loc);<br />
<br />
return exon;<br />
<br />
}<br />
</java><br />
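The call exon.setSeqLen(max-min) above is consistent with Chado's interbase coordinates, where fmin and fmax count the boundaries between bases and the length is simply fmax - fmin (the gene above, 13691 to 14720, has length 1029). The sketch below spells out that arithmetic and the conversion from 1-based inclusive coordinates. InterbaseSketch is a hypothetical name; whether the demo's input coordinates were already interbase is not stated in the presentation.

```java
// Hypothetical helper illustrating interbase coordinate arithmetic.
final class InterbaseSketch {
    // Length of a feature given interbase boundaries.
    static int length(int fmin, int fmax) {
        if (fmax < fmin) {
            throw new IllegalArgumentException("fmax must not be less than fmin");
        }
        return fmax - fmin;
    }

    // Convert 1-based inclusive [start, end] to interbase {fmin, fmax}.
    static int[] fromOneBasedInclusive(int start, int end) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("invalid 1-based range");
        }
        return new int[] { start - 1, end };
    }
}
```

With this convention a 1-based range of 1 to 10 becomes fmin 0 and fmax 10, and its length is 10.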
<br />
=====Demo – Sample Problem=====<br />
<br />
<xml><br />
<st:section name="Naming" id="gene_naming" collapsed="false" collapsible="false"<br />
hideIfEmpty="true"><br />
<dl><br />
<dt><b>symbol:</b></dt><br />
<dd>${feature.uniqueName}</dd><br />
</dl><br />
<db:synonym name="synonym" var="name" collection="${feature.featureSynonyms}"><br />
<br /><b>Synonym:</b> <db:list-string collection="${name}" /><br />
</db:synonym><br />
<dt><b>Type:</b></dt><br />
<dd>${feature.cvTerm.name}</dd><br />
<br />
<st:section name="Exons" collapsed="false" collapsible="true" hideIfEmpty="true"><br />
<display:table name="exons" uid="tmp" pagesize="30" class="simple" cellspacing="0"<br />
cellpadding="4"><br />
<display:column property="uniqueName" title="Exon"/><br />
<display:column property="featureLocsForSrcFeatureId.fmin" title="Start"/><br />
<display:column property="featureLocsForSrcFeatureId.fmax" title="end"/><br />
</display:table><br />
</st:section><br />
<br />
<st:section name="cds" collapsible="true"><br />
<b>${feature.residues}</b><br />
</st:section><br />
</xml><br />
<br />
Specialized functionality like a cascading delete is handled by the database</div>165.124.152.78http://gmod.org/wiki/GBrowseGBrowse2007-01-25T19:18:14Z<p>165.124.152.78: </p>
<hr />
<div><br />
<br />
The Generic Genome Browser (which is also occasionally and incorrectly referred to as 'gbrowser') is a combination of database and interactive web page for manipulating and displaying annotations on genomes. The web browser has the following features.<br />
<br />
* Simultaneous bird's eye and detailed views of the genome.<br />
* Scroll, zoom, center.<br />
* Attach arbitrary URLs to any annotation.<br />
* Order and appearance of tracks are customizable by<br /> administrator and end-user.<br />
* Search by annotation ID, name, or comment.<br />
* Supports third party annotation using<br />[http://www.sanger.ac.uk/software/GFF GFF] formats.<br />
* Settings persist across sessions.<br />
* DNA and GFF dumps.<br />
<br />
See [[a demo here]]<br />
<br />
== Tutorial ==<br />
<br />
Scan through [[the tutorial that comes with the package]] to get a quick overview of its features.<br />
<br />
== Downloading and Installing ==<br />
<br />
Download the source from the [http://sourceforge.net/project/showfiles.php?group_id=27707 SourceForge download page], or see the [[instructions]] for information on installing from a binary distribution (most suitable on MS Windows platforms).<br />
<br />
=== Getting the Latest &amp; Greatest Version by CVS ===<br />
<br />
There are many new features in the current development version which have not been released yet. To get the latest and greatest version, please use anonymous CVS. The recommended branch to use is "gbrowse-sessions" which is more stable than HEAD. Here is the recipe:<br />
<br />
<br />
% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod login<br />
CVS password: &lt;hit return&gt;<br />
% cvs -d :pserver:anonymous@gmod.cvs.sourceforge.net:/cvsroot/gmod co -r gbrowse-session Generic-Genome-Browser<br />
<br />
Once you have successfully checked out the Generic-Genome-Browser distribution, you can simply perform a "cvs update" inside the directory to get recent changes.<br />
<br />
You can also browse the GBrowse CVS [http://gmod.cvs.sourceforge.net/gmod/Generic-Genome-Browser/ here.]<br />
<br />
== Demo ==<br />
<br />
Select an organism and desired region; then press "Browse." This will take you to a live demo where you can browse the genomes of worm, yeast and fly. Mouse will be added when the assembly is published.<br />
<br />
{|<br />
! class="searchtitle" colspan="3" | Browsable Genomes<br />
!<br />
|- class="searchbody"<br />
! C. elegans (worm)<br />
! D. melanogaster (fly)<br />
|- class="searchbody"<br />
!<br />
Chromosome I<br />Chromosome II<br />Chromosome III<br />Chromosome IV<br />Chromosome X<br /> <br /><br /><br /><br />
!<br />
Arm 2L<br />Arm 2R<br />Arm 3L<br />Arm 3R<br />4<br />X<br />U<br /> <br /><br /><br /><br />
|- class="searchtitle"<br />
! S. cerevisiae (yeast)<br />
! H. sapiens (human)<br />
|- class="searchbody"<br />
!<br />
Chromosome I<br />Chromosome II<br />Chromosome III<br />Chromosome IV<br />Chromosome V<br />Chromosome VI<br />Chromosome VII<br />Chromosome VIII<br />Chromosome IX<br />Chromosome XI<br />Chromosome XII<br />Chromosome XIII<br />Chromosome XIV<br />Chromosome XV<br />Chromosome XVI<br />Mitochondrium<br /> <br /><br /><br /><br />
!<br />
Chromosome 1<br />Chromosome 2<br />Chromosome 3<br />Chromosome 4<br />Chromosome 5<br />Chromosome 6<br />Chromosome 7<br />Chromosome 8<br />Chromosome 9<br />Chromosome 10<br />Chromosome 11<br />Chromosome 12<br />Chromosome 13<br />Chromosome 14<br />Chromosome 15<br />Chromosome 16<br />Chromosome 17<br />Chromosome 18<br />Chromosome 19<br />Chromosome 20<br />Chromosome 21<br />Chromosome 22<br />Chromosome X<br />Chromosome Y<br /> <br /><br /><br /><br />
|}<br />
<br />
== About the Database ==<br />
<br />
GBrowse has a flexible adaptor system for running off various types of database. Standard adaptors include:<br />
<br />
* Flat file adaptors (in-memory, indexed) -- put your annotations in a directory and go!<br />
* Relational database adaptors -- Chado, Bio::DB::GFF, BioSQL<br />
* Network adaptors -- read annotations from GenBank, UCSC or Ensembl<br />
<br />
== Getting the Software ==<br />
<br />
This is Open Source software, which is available for your own genome annotation projects. To get it, go to the [http://sourceforge.net/project/showfiles.php?group_id=27707 SourceForge download page.] Please report bugs to the SourceForge [http://sourceforge.net/tracker/?func=add&group_id=27707&atid=391291 bug tracker]. Please send questions to the [https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse GBrowse mailing list].<br />
<br />
<br />
<br />
* [[Adding an outgoing link to a text on the feature detail page]]<br />
* [[Gbrowse installation]]<br />
* [[GFF3 Stuff]]<br />
* [[Human GFF file HOWTO]]<br />
* [[Simple synteny viewer in GBrowse]]</div>165.124.152.78http://gmod.org/wiki/XORTXORT2007-01-25T18:44:09Z<p>165.124.152.78: New page: <nowiki> Chado-XML ========= Chado-XML is a direct mapping of the Chado relational schema into XML. Currently the only tool for performing this mapping is XML::XORT, which can du...</p>
<hr />
<div><br />
<br />
<nowiki><br />
Chado-XML<br />
=========<br />
<br />
Chado-XML is a direct mapping of the Chado relational schema into<br />
XML. Currently the only tool for performing this mapping is XML::XORT,<br />
which can dump or save Chado-XML to and from a chado db.<br />
<br />
Contents:<br />
---<br />
chado-xml<br />
README<br />
xsl/ -- useful transforms<br />
dtd/ -- DTDs/XSDs defining the xml model<br />
examples/ -- example XML files<br />
doc/ -- documentation<br />
---<br />
<br />
Documentation in asciidoc - can use asciidoc.py to convert to HTML,<br />
PDF, RTF etc<br />
<br />
See:<br />
<br />
SourceForge: http://sourceforge.net/projects/asciidoc/<br />
Main website: http://www.methods.co.nz/asciidoc/<br />
<br />
<br />
<br />
Macros<br />
------<br />
<br />
The basic chado-xml expansion can be extremely verbose - this is<br />
because chado-xml uses the unique keys from the chado db, yet it does<br />
not use database-internal foreign keys.<br />
<br />
Macros can be used to capture repeated nodes in the xml and give them<br />
XML IDs that are valid within a particular document.<br />
<br />
<br />
<br />
<br />
See also<br />
--------<br />
<br />
gmod/XML-XORT<br />
</nowiki><br />
<br />
<br />
<br />
* [[Chado-xml doc]]<br />
* [[XORT Usage]]<br />
<br />
<br />
<br />
<br />
Chris suggested the inclusion of the chado-xml docs from the schema--I did that.<br />
<br />
<br />
<br />
<br />
I should also mention that this README and the child page with chado-xml documentation are dynamically pulled in and included from the SourceForge CVS. While this means they will always be up to date with respect to what is in CVS, it also adds a point of failure if SF's CVS server is down.<br />
<br />
<br />
</div>165.124.152.78http://gmod.org/wiki/SynBrowseSynBrowse2007-01-25T18:44:03Z<p>165.124.152.78: New page: ; '''Description'''<br /> : SynBrowse (Synteny Browser) is a generic sequence comparison tool for visualizing genome alignments both within and between species. It is intended to help sc...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: SynBrowse (Synteny Browser) is a generic sequence comparison tool for visualizing genome alignments both within and between species. It is intended to help scientists study and analyze synteny, homologous genes and other conserved elements between sequences. This software is useful in studying genome duplication and evolution. It can also aid in identifying uncharacterized genes, putative regulatory elements and novel structural features of study species by comparing to a well annotated reference sequence, thus enabling genome curators to refine and edit annotations of species that have incomplete genome annotations.<br />
<br />
; '''Demo'''<br /><br />
: Please see http://www.synbrowse.org.<br />
<br />
; '''Requirement'''<br /><br />
: GBrowse 1.62 or higher.<br />
<br />
; '''Downloads'''<br /><br />
: The source code and installation documentation of SynBrowse as well as the associated tools can be found at [http://www.synbrowse.org/download.html http://www.synbrowse.org/download].<br />
<br />
; '''Citation'''<br /><br />
: Pan, X., Stein, L. and Brendel, V. 2005. SynBrowse: a Synteny Browser for Comparative Sequence Analysis. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/17/3461 Bioinformatics 21: 3461-3468].<br />
<br />
; '''Contact'''<br /><br />
: If you have questions, comments and suggestions about SynBrowse, please contact [[Xiaokang Pan]].</div>165.124.152.78http://gmod.org/wiki/SybilSybil2007-01-25T18:44:01Z<p>165.124.152.78: New page: Sybil is a multi-organism synteny viewer written by developers at TIGR. This page is a placeholder for more content, but I wanted to at least get a few links up: * [http://sybil.source...</p>
<hr />
<div><br />
<br />
Sybil is a multi-organism synteny viewer written by developers at TIGR.<br />
<br />
This page is a placeholder for more content, but I wanted to at least get a few links up:<br />
<br />
* [http://sybil.sourceforge.net/index.html Sybil sourceforge site]<br />
* [http://sybil.sourceforge.net/demos.html Sybil Demo]</div>165.124.152.78http://gmod.org/wiki/GMOD_ComponentsGMOD Components2007-01-25T18:43:56Z<p>165.124.152.78: New page: GMOD is a loose federation of software applications (components) aimed at providing functionality that is needed by all model organism databases. The applications are linked together by ...</p>
<hr />
<div><br />
<br />
GMOD is a loose federation of software applications (components) aimed at providing functionality that is needed by all model organism databases. The applications are linked together by their use of a common database schema known as Chado.<br />
<br />
This diagram represents a model organism database (MOD) and its typical components. The "top ten" types of functionality are represented in the top row of the diagram as a set of interfaces. The "Vis" suffix at the end of each interface name is an abbreviation of "Visualization", and should be interpreted literally: an alignment visualization interface should not merely allow users to passively view pre-existing alignments, but should also allow user interaction to create and visualize alignments of their own.<br />
<br />
<center>[[Image:roadmap25.jpg]]</center><br />
<br />
[http://www.gmod.org/?q=node/99 Key to diagram]<br />
<br />
GMOD components fulfilling the requirements of an interface are linked to the interface, as well as to the Chado schema modules if they are known to interact with Chado. The requirements of an interface, as well as the components that implement a given interface are described in the documents linked at the end of this document.<br />
<br />
----<br />
<br />
While this book (hierarchical set of web pages) is largely intended to replace the software matrix that was on the old GMOD website, I have received some requests for the old [[software matrix]], so here you go.<br />
<br />
<br />
<br />
* [[Common Documents]]<br />
* [[Common Templates]]<br />
* [[Community Visualization]]<br />
* [[Comparative Genomics Visualization]]<br />
* [[Database tools]]<br />
* [[Gene Expression Visualization]]<br />
* [[Genome Visualization &amp; Editing]]<br />
* [[GMODWeb]]<br />
* [[Literature Visualization]]<br />
* [[Molecular Pathway Visualization]]<br />
* [[Ontology Visualization]]<br />
* [[Phenotype Visualization]]<br />
* [[Sequence Alignment]]<br />
* [[Sequence Alignment Visualization]]<br />
* [[Strain/Library Visualization]]<br />
* [[Utilities]]<br />
* [[Workflow Management]]<br />
<br />
<br />
<br />
{| id="attachments"<br />
! Attachment<br />
! Size<br />
|- class="dark"<br />
|<br />
[http://www.gmod.org/files/roadmap.png roadmap.png]<br />
| 45.53 KB<br />
|- class="light"<br />
|<br />
[http://www.gmod.org/files/roadmap25.jpg roadmap25.jpg]<br />
| 14.09 KB<br />
|- class="dark"<br />
|<br />
[http://www.gmod.org/files/GMOD_components_roadmap.zuml GMOD_components_roadmap.zuml]<br />
| 28.25 KB<br />
|}<br />
<br />
<br />
<br />
<br />
probably a good idea to turn this back on once the old content has all been ported over to the new site. i just added a new node on workflow/ergatis, and it would be difficult to know this had been added if not posted to the front b/c it is deep in the components book.<br />
<br />
<br />
<br />
<br />
I think he said something to the effect that the front page looked too crowded. Most new posts will be noted on the front page: blog posts will go in their block, comments will go in their block. What will not go in is new 'book' pages, though it will still go into an RSS feed. We could make a block with the site's RSS feed, which would show recent posts on the side, without having to show content. If someone wanted to highlight a new book page, they can blog about it too. (That is very unlikely to happen in real life, though, I expect.)<br />
<br />
<br />
<br />
<br />
RSS sounds like a reasonable compromise.<br />
<br />
<br />
<br />
<br />
Moving things off the front page seems to have messed with the RSS feed, so that solution won't work. Here's what I did instead: I added a module called 'front_page' that lets you configure a custom front page, and then created a node that has just sticky nodes in it, and now requests from outside drupal for www.gmod.org will get redirected to that node. This is the 'clean' homepage. But any clicks on links from inside drupal to www.gmod.org will go to the default homepage with recent nodes. This is the 'working' homepage. I also reset blog entries and book nodes to display on the front page by default.<br />
<br />
<br />
<br />
<br />
I thought I had it all figured out, but it doesn't seem to work yet. At the moment, all requests for www.gmod.org, regardless of where they originate, go to what is supposed to be just the 'initial' homepage (node 86). I'll sort it out eventually.<br />
<br />
<br />
<br />
<br />
I just had to write a little php.<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
looks like we need to turn off anonymous comments.<br />
<br />
<br />
<br />
<br />
Hey, where did the UML diagram go? I really liked having the mini-version on the frontpage, it's a good way to see what GMOD currently is and what is being or needs to be developed. If the diagram needs to be resized, recolored, cropped, etc let me know.<br />
<br />
<br />
<br />
<br />
It was huge. I think you had it set at 600 pixels, so I put it after the break. It is a great picture, but if you shrink it more it becomes completely non-informative, and if you make it bigger it is much too big to have on a front page. I would say even at 600 pixels it was too distorted to be informative (I couldn't read any of the text).<br />
<br />
<br />
<br />
<br />
guess it just seemed mini on my 1920x1280 24" widescreen. shall we make a smaller version?<br />
<br />
<br />
</div>165.124.152.78http://gmod.org/wiki/PublicationsPublications2007-01-25T18:43:50Z<p>165.124.152.78: New page: * Donald G. Gilbert. DroSpeGe: rapid access database for new Drosophila species genomes.<br /> Nucleic Acids Res. 2007 35(Database issue):D480-D485; [http://dx.doi.org/10.1093/nar/gkl997...</p>
<hr />
<div><br />
<br />
* Donald G. Gilbert. DroSpeGe: rapid access database for new Drosophila species genomes.<br /> Nucleic Acids Res. 2007 35(Database issue):D480-D485; [http://dx.doi.org/10.1093/nar/gkl997 doi:10.1093/nar/gkl997]<br />
* [http://nar.oxfordjournals.org/cgi/content/full/gkl777?ijkey=Gl3GumB1en1BMhS&keytype=ref Olivier Arnaiz, Scott Cain, Jean Cohen and Linda Sperling]<br /> ParameciumDB: a community resource that integrates the Paramecium tetraurelia genome sequence with genetic data.<br /> Nucleic Acids Research 2006 Nov<br />
* John K. Colbourne , Vasanth R. Singan and Don G. Gilbert<br /> wFleaBase: the Daphnia genome database.<br /> BMC Bioinformatics 2005, 6:45 [http://dx.doi.org/10.1186/1471-2105-6-45 doi:10.1186/1471-2105-6-45]<br />
* [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12582244&dopt=Abstract Pan X, Liu H, Clarke J, Jones J, Bevan M, Stein L. ]<br /> ATIDB: Arabidopsis thaliana insertion database.<br /> Nucleic Acids Res. 2003 Feb 15;31(4):1245-51.<br /> PMID: 12582244<br />
* [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12368253&dopt=Abstract Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S. ]<br /> The generic genome browser: a building block for a model organism system database.<br /> Genome Res. 2002 Oct;12(10):1599-610.<br /> PMID: 12368253<br />
* [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12537571&dopt=Abstract Lewis SE, Searle SM, Harris N, Gibson M, Iyer V, Richter J, Wiel C, Bayraktaroglu L, Birney E, Crosby MA, Kaminker JS, Matthews BB, Prochnik SE, Smith CD, Tupy JL, Rubin GM, Misra S, Mungall CJ, Clamp ME. ]<br /> Apollo: a sequence annotation editor.<br /> Genome Biol. 2002;3(12):RESEARCH0082-2.<br /> PMID: 12537571</div>165.124.152.78http://gmod.org/wiki/TextpressoTextpresso2007-01-25T18:43:48Z<p>165.124.152.78: New page: ; '''Description'''<br /> : Textpresso is a text mining system for scientific literature whose<br /> capabilities go far beyond that of a simple keyword search engine. The two<br /> key ...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: Textpresso is a text mining system for scientific literature whose<br /> capabilities go far beyond that of a simple keyword search engine. The two<br /> key elements are the collection of the full text of scientific articles split<br /> into individual sentences, and the implementation of semantic categories, for<br /> which a database of articles and individual sentences can be searched. The<br /> source of the full text articles are PDFs, and additional bibliographical<br /> information that is obtained from other citation databases can be processed<br /> as well.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: Please visit the live main site at<br />[http://www.textpresso.org www.textpresso.org] for examples and<br /> screenshots.<br />
<br />
; '''Requirements'''<br /><br />
: The package is designed for Linux operating systems and is tested to run on Intel x86 based hardware. The minimum required disk space is around 6GB per 1000 full text papers; half of it is used by the publicly (via WWW) accessible database, while the other half is needed for database preparation and maintenance. (If necessary, the latter can be reduced.) Software for a world wide web server such as Apache needs to be installed, and an Internet connection should exist. Furthermore, standard Perl 5.6.1 or higher should be present, along with the most common Perl packages. The installation script requires a bash shell. The Textpresso system requires the modules XML::Checker::Parser, XML::DOM::Parser and XML::XQL::DOM, which usually come with a Linux distribution. If a standard Perl package is missing, it can be downloaded and installed from http://www.cpan.org. There are two non-standard Perl modules required, Mailer::Mail (in MailTools-1.58) and PDF::Create (in PDF-Create). They too can be downloaded from http://www.cpan.org. If a model organism database is used and based on ACeDB (http://www.acedb.org), the Perl module AcePerl is required. Textpresso uses two software packages: XPDF (http://www.foolabs.com/xpdf/) is distributed under the GNU general public license and provides the pdftotext converter. The other package contains a part-of-speech tagger developed by Eric Brill (http://research.microsoft.com/~brill/). It is distributed free of charge under a license of the Massachusetts Institute of Technology and the University of Pennsylvania. If you want to recompile either of the packages, you additionally need a C compiler, such as gcc (GNU project).<br />
<br />
This package has been tested with the Linux RedHat 9.0 distribution<br /> (http://www.redhat.com) and Debian Linux 3.1 (http://www.debian.org) . Both<br /> work with a 2.4.20 kernel or higher.<br />
<br />
; '''Documentation'''<br /><br />
: Installation instructions can be found in the tarred and gzipped package file, in the document TextpressoManual.pdf.<br />
<br />
A user guide is available<br />[http://www.textpresso.org/doc/userguide/doc-con.html#top <br /> online].<br />
<br />
; '''Contact'''<br /><br />
: Hans-Michael Muller, mueller (at) caltech.edu<br />
<br />
; '''Downloads'''<br /><br />
: [http://www.textpresso.org/textpresso/downloads.html <br /> http://www.textpresso.org/textpresso/downloads.html]<br />
<br />
<br />
<br />
<br />
Can I download anywhere?<br />
<br />
<br />
<br />
<br />
Hi,<br />
<br />
Sorry for the delay; I've been on vacation. The Textpresso download link seems to be working now. It must have been a transient problem.<br />
<br />
Scott<br />
<br />
<br />
</div>165.124.152.78http://gmod.org/wiki/Java_TreeViewJava TreeView2007-01-25T18:43:45Z<p>165.124.152.78: New page: ; '''Description'''<br /> : Java TreeView, a portable Java application that has been tested on Linux, Windows, MacOS9 and MacOSX. Java TreeView is a visualization for various clustering ...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: Java TreeView is a portable Java application that has been tested on Linux, Windows, MacOS9 and MacOSX. It visualizes the output of various clustering algorithms. Microarray data are commonly clustered, for example with hierarchical clustering, and Java TreeView provides an easy-to-use interface for exploring clustered results involving large sets of experiments or genes. Java TreeView is not specific to microarray data and has been used to display the results of clustering sequence motifs.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: [http://jtreeview.sourceforge.net/docs/overview.html Overview] and [http://jtreeview.sourceforge.net/examples/index.html examples].<br />
<br />
; '''Requirements'''<br /><br />
: N/A<br />
<br />
; '''Documentation'''<br /><br />
: [http://jtreeview.sourceforge.net/ Docs]<br />
<br />
; '''Contact'''<br /><br />
: [[jtreeview-users@lists.sourceforge.net]].<br />
<br />
; '''Downloads'''<br /><br />
: [http://sourceforge.net/project/showfiles.php?group_id=84593 From Sourceforge]</div>165.124.152.78http://gmod.org/wiki/CaryoscopeCaryoscope2007-01-25T18:43:43Z<p>165.124.152.78: New page: ; '''Description'''<br /> : Caryoscope is a reusable Java UI component -- and a set of parsing utilities;<br /> command line tools and an application GUI -- for viewing gene expression d...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: Caryoscope is a reusable Java UI component, together with a set of parsing utilities, command-line tools and an application GUI, for viewing gene expression data in a whole-genome context.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: [http://caryoscope.stanford.edu/screenshots.html Screenshots]<br />
<br />
; '''Requirements'''<br /><br />
: [http://caryoscope.stanford.edu/dependencies.html Dependencies page]<br />
<br />
; '''Documentation'''<br /><br />
: [http://caryoscope.stanford.edu/documentation.html Documentation page]<br />
<br />
; '''Contact'''<br /><br />
: [[Gavin Sherlock]].<br />
<br />
; '''Downloads'''<br /><br />
: [http://caryoscope.stanford.edu/getting.html Download page]</div>165.124.152.78http://gmod.org/wiki/PubFetchPubFetch2007-01-25T18:43:40Z<p>165.124.152.78: New page: : ; '''Description'''<br /><br /> PubFetch is part of the [[web-based literature curation toolset and functions as the interface between the literature curation tools and the online lite...</p>
<hr />
<div><br />
<br />
:<br />
; '''Description'''<br /><br /> PubFetch is part of the web-based literature curation toolset and functions as the interface between the literature curation tools and the online literature databases, such as PubMed. The aim of PubFetch is to provide a generic way of searching and retrieving literature data from online literature data sources so that downstream applications don't have to deal with the idiosyncrasies of the individual literature databases. Initially PubFetch will act as the interface between PubSearch and the [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed PubMed] and [http://www.nal.usda.gov/ag98/ Agricola] databases used by [http://rgd.mcw.edu/ RGD] and [http://www.arabidopsis.org/ TAIR]. A standard API and data format will be created to provide database queries and return results; popular existing formats and protocols will be used or supported wherever possible.<br /><br />
<br />
:<br />
<br />
{| width="510" cellpadding="2" align="center"<br />
|<br />
[[of pubfetch]]]<br />
|-<br />
|<br />
'''Figure 1 - Overview diagram of PubFetch showing how the PubFetch module will provide a generic literature access interface to PubMed and Agricola which could be expanded to other literature sources as desired.'''<br />
|}<br />
<br />
; '''Plan of Action'''<br />
: The codebase will initially be developed in Perl by adapting [[exising RGD perl modules|existing RGD Perl modules]] designed to retrieve data from PubMed in a standard XML format. This code will be reviewed and adapted to create the main PubFetch module and the appropriate database interface modules. [[Figure 2]] is a schematic diagram of the existing RGD literature download modules.<br />
[[Image:Existing_PubMed_flow.jpg]]<br />
'''Figure 2 - Current RGD literature download process, showing the Perl modules used to interact with PubMed, create XML data and load it into RGD'''<br />
<br />
: The fundamental actions required of PubFetch are as follows:<br /><br />
<br />
# Search the literature database (LitDb) for articles matching certain query criteria (e.g. keywords, date, author). This will most likely entail passing the search criteria to PubFetch and retrieving a set of accession numbers (e.g. PubMed IDs, PMIDs) for the matching references.<br />
# Retrieve the text information from the LitDb corresponding to a supplied accession number (e.g. "bring me the PubMed entry for PMID 12345").<br />
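
The two actions above can be sketched as a small adaptor interface. This is only an illustration under stated assumptions: PubFetch itself is planned in Perl, and the names <code>LitDb</code>, <code>Record</code>, <code>search</code> and <code>fetch</code> are hypothetical, not part of any published PubFetch API. The toy backend matches keywords as plain substrings in memory and never contacts PubMed or Agricola.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One literature record; the field names are illustrative only."""
    accession: str  # e.g. a PubMed ID
    title: str
    abstract: str


class LitDb(ABC):
    """Hypothetical adaptor interface for a single literature database."""

    @abstractmethod
    def search(self, keywords: list[str]) -> list[str]:
        """Action 1: return accession numbers of records matching all keywords."""

    @abstractmethod
    def fetch(self, accession: str) -> Record:
        """Action 2: return the record for one accession number."""


class InMemoryLitDb(LitDb):
    """Toy backend standing in for a real literature source (no network access)."""

    def __init__(self, records: list[Record]) -> None:
        self._by_accession = {r.accession: r for r in records}

    def search(self, keywords: list[str]) -> list[str]:
        wanted = [k.lower() for k in keywords]
        hits = []
        for record in self._by_accession.values():
            # Case-insensitive substring match over title and abstract.
            text = f"{record.title} {record.abstract}".lower()
            if all(k in text for k in wanted):
                hits.append(record.accession)
        return sorted(hits)

    def fetch(self, accession: str) -> Record:
        # Raises KeyError for an unknown accession.
        return self._by_accession[accession]
```

A caller that depends only on <code>LitDb</code> would not need to change if a second backend were added later, which is the separation the description above aims for.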
<br />
<br /><br />
<br />
; '''PubFetch as a BioMOBY webservice'''<br />
: To provide generic access to PubFetch, we intend to make the core functionality available as a web service, following the [http://www.biomoby.org/ BioMOBY] service model. The two actions described above will be implemented as two classes of web services: the first takes keywords and returns PubMed IDs (or other LitDb accessions), and the second takes LitDb accessions and returns the text information in a simple, standardized XML format. We will endeavour to provide the data in existing formats (raw data from the LitDb, a BioPerl-compatible format, etc.) in addition to a simple XML format that is not dependent on other codebases.<br />
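
As a rough illustration of what "a simple XML format that is not dependent on other codebases" could look like, the sketch below serializes one record with only the Python standard library. The element and attribute names (<code>record</code>, <code>source</code>, <code>accession</code>, <code>title</code>, <code>abstract</code>) are placeholders invented for this example; the source does not define the PubFetch XML format.

```python
import xml.etree.ElementTree as ET


def record_to_xml(accession: str, source: str, title: str, abstract: str) -> str:
    """Serialize one literature record to a small XML document.

    Element and attribute names are placeholders chosen for this sketch.
    """
    root = ET.Element("record", {"source": source, "accession": accession})
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "abstract").text = abstract
    # encoding="unicode" makes tostring() return a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


def xml_to_accession(xml_text: str) -> str:
    """Read the accession back, as a downstream tool might."""
    return ET.fromstring(xml_text).attrib["accession"]
```

Because the serializer escapes markup characters in titles and abstracts, a consumer can parse the result with any standard XML parser.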
; '''Downloads'''<br /><br />
: These will ultimately be from [http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gmod/pubfetch/ SourceForge]. Perl code, use case diagrams, etc. will be available shortly.</div>165.124.152.78http://gmod.org/wiki/LuceGeneLuceGene2007-01-25T18:43:38Z<p>165.124.152.78: New page: ; '''Description'''<br /> : This is an open-source document/object search and retrieval system specially tuned for bioinformatics text databases and documents. LuceGene is similar in co...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
:<br />
<br />
This is an open-source document/object search and retrieval system specially tuned for bioinformatics text databases and documents. LuceGene is similar in concept to the widely used, commercially successful, bioinformatics program SRS (Sequence Retrieval System).<br />
<br />
It is built with the [http://jakarta.apache.org/lucene/ open-source Lucene package].<br />
<br />
It includes common text search features: Boolean operators, phrases, word stemming, fuzzy and field-range searches, and relevance ranking. It supports many kinds of data field structure. Lucene is comparable to web-indexing systems such as Excite, AltaVista, and Google.<br />
<br />
LuceGene adds these bio-data methods to Lucene:<br />
<br />
* Indexing adaptors for formats such as XML, PDF Documents, Biosequences, Spreadsheets, HTML, and others.<br />
* Configurations for bio-data include UniProt/Swiss-Prot, Fasta and GenBank sequences, BIND protein interactions, NCBI Gene Expression Omnibus, BLAST output tables, Medline.<br />
* Support for batch-list look-ups and searches is included, useful for data miners.<br />
* Web applications offer paged search results, batch downloads, search refinement and search-linking among data libraries.<br />
* Web Services support for data mining is included with a SOAP interface.<br />
* Output support includes field selection and formats such as Spreadsheet, XML, HTML via XSLT, and others.<br />
LuceGene is speedy with big data sets: Searching the UniProt library of 1.7 million sequences with LuceGene is a close equivalent to SRS in speed and content.<br />
Gene Annotation object search and retrieval with LuceGene is 10x to 20x faster than using a Postgres Chado database.<br />
LuceGene has been tested and works well with millions of documents from genome sequence, annotation and literature databases.<br />
; '''Demo &amp; Screenshots'''<br />
* [[Demo Screenshots]]<br />
* Demonstration server is available at [http://eugenes.org/demolucegene/ http://eugenes.org/demolucegene/]<br />
* FlyBase Search preview http://preview.flybase.net/lucegene/<br />
* euGenes genome search http://eugenes.org/lucegene/<br />
* Daphnia/wFleaBase search http://wfleabase.org/search/<br />
; '''Requirements'''<br />
* LuceGene requires Java versions 1.4 or later to compile and run.<br />
* A Java/JSP web server like [http://jakarta.apache.org/tomcat/ Jakarta Tomcat] is used for the web application.<br />
Jakarta Lucene software is included with this package, as are other required java libraries.<br />
; '''Documentation'''<br />
* [[LuceGene Readme]]<br />
* [[INSTALL.txt]] for demo webapp use<br />
* [[Indexing methods overview]]<br />
* Talk slides on Argos/LuceGene, Sept 2003:<br />[http://www.gmod.org/argos/gmod-argos-sep03.ppt PowerPoint FIX THIS LINK]<br />[http://www.gmod.org/argos/gmod-argos-sep03.pdf PDF FIX THIS LINK]<br />
; '''Downloads'''<br />
Current distribution files are at [http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=120452 SourceForge] and http://eugenes.org/gmod/lucegene/<br />
* [http://prdownloads.sourceforge.net/gmod/lucegene.war lucegene.war]<nowiki>: web application archive </nowiki><br />
* lucegene-*-src.jar : sources, documents, configurations<br />
* [http://eugenes.org/gmod/lucegene/dist/ sample data] for lucegene.war as lucegene_demo*.zip<br />
; '''Contact'''<br />
: email: lucegene AT eugenes.org<br /> Current developers: Don Gilbert, Paul Poole, and others<br />
<br />
<br />
<br />
<br />
<br />
Please note that [http://www.ebi.ac.uk/inc/help/search_help.html EBI's new search-everything "EB-eye"] is based on Lucene, as is the GMOD LuceGene project (http://www.gmod.org/lucegene), for the same reasons I would guess: it is fast, and works easily and well on huge, complex bio-data sets.<br />
<br />
Others are noticing that Chado-database user searches, whether for<br /> genome maps, reports, or other complex data, can be quite slow. Chado<br /> is a good management database, but lacks efficiency for web access to<br /> support many customers. Lucene has the ability to search genome<br /> reports, the range of bio-data (XML, sequence records, interaction<br /> data sets), GBrowse map data, etc.<br />
<br />
There is also a GBrowse-Lucene adaptor as part of the LuceGene<br /> project software (which works like the Mysql adaptor),<br /> that I use all the time in preference to Mysql.<br />
<br />
The GMOD/Turnkey web interface now has a Lucene search to avoid slow ChadoDB queries (albeit via an older c-lucene port; I find that Java Lucene can be run well from Perl (GBrowse)).<br />
<br />
...........<br />
<br />
EMBL-EBI News Dec 2006: Better, faster, easier – EMBL-EBI launches its<br /> new website with powerful search engine<br />
<br />
Behind this new web interface lies the "EB-eye", a powerful<br /> search engine allowing instant searches of all the<br /> EBI's databases from a single query.<br />
<br />
What is the EB-eye Search?<br /> The system is developed on top of the Apache Lucene project framework, which is an open-source, high-performance, full-featured text search engine library written entirely in Java. It uses this technology to index EBI databases in various formats (e.g. flatfiles, XML dumps, OBO format, etc.) and provides very fast access to the EBI's data resources. The system allows the user to search globally across all EBI databases or individually in selected resources by using an Advanced search.<br /> .......<br />
<br />
<br />
</div>165.124.152.78http://gmod.org/wiki/JavaSEANJavaSEAN2007-01-25T18:43:35Z<p>165.124.152.78: New page: ; '''Description'''<br /> : The javaSEAN tool is a standalone java application which was developed<br /> to facilitate integration and curation of sequence-level data from the<br /> nucl...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: The javaSEAN tool is a standalone Java application developed to facilitate the integration and curation of sequence-level data from the nucleotide sequence databases and the Drosophila literature; it also provides a means of presenting these data graphically. Using javaSEAN, curators create Annotated Reference Gene Sequence (ARGS) records, which tether various types of mappable data to a genomic sequence, including gene structure, mutation sites, polymorphic sites, insertion sites, regulatory elements, and rescue fragments.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: [[screenshot]]<br />
<br />
; '''Requirements'''<br /><br />
: [[Requirements]]<br />
<br />
; '''Documentation'''<br /><br />
: [[Description]]<br />[[Architecture Diagram]]<br />[[Architecture Text]]<br />[[Setup]]<br />[[Sample Mapfile]]<br />
<br />
; '''Contact'''<br /><br />
: [[smutniak@morgan.harvard.edu]]<br />
<br />
; '''Downloads'''<br /><br />
: [http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gmod/javaSean/ from cvs]<br />
<br />
<br />
<br />
{| id="attachments"<br />
! Attachment<br />
! Size<br />
|- class="dark"<br />
|<br />
[http://www.gmod.org/files/Architecture.gif Architecture.gif]<br />
| 29.02 KB<br />
|- class="light"<br />
|<br />
[http://www.gmod.org/files/Architecture.txt Architecture.txt]<br />
| 10.15 KB<br />
|- class="dark"<br />
|<br />
[http://www.gmod.org/files/Description.txt Description.txt]<br />
| 36.3 KB<br />
|- class="light"<br />
|<br />
[http://www.gmod.org/files/javaSeanScreen.jpg javaSeanScreen.jpg]<br />
| 199.19 KB<br />
|- class="dark"<br />
|<br />
[http://www.gmod.org/files/mapfile-sample.rpt mapfile-sample.rpt]<br />
| 819 bytes<br />
|- class="light"<br />
|<br />
[http://www.gmod.org/files/Requirements.txt Requirements.txt]<br />
| 3.11 KB<br />
|- class="dark"<br />
|<br />
[http://www.gmod.org/files/Setup.txt Setup.txt]<br />
| 3.81 KB<br />
|}</div>165.124.152.78http://gmod.org/wiki/Insertional_Mutagenesis_Database_(IMDB)Insertional Mutagenesis Database (IMDB)2007-01-25T18:43:33Z<p>165.124.152.78: New page: NOTE: this project is no longer supported ; '''Description'''<br /> : IMDB stores<br /> the sequenced insertion sites for model organisms, provides users with visualization tools to sea...</p>
<hr />
<div><br />
<br />
NOTE: this project is no longer supported<br />
<br />
; '''Description'''<br /><br />
: IMDB stores the sequenced insertion sites for model organisms, provides users with visualization tools to search for insertions in genes of interest and to track the progress of multi-project efforts to find insertions in each gene of a species, and facilitates the study of insertion site distributions on a genome. It is suitable for high-throughput insertional mutagenesis projects, such as those mediated by transposons, T-DNAs and other knockout techniques. This project was developed from ATIDB, the Arabidopsis thaliana insertion database.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: See [http://atidb.cshl.org/ http://atidb.cshl.org] for a database of Arabidopsis insertional mutants.<br />
<br />
; '''Requirements'''<br /><br />
: MySQL, GBrowse and Perl.<br />
<br />
; '''Documentation'''<br /><br />
: [http://sourceforge.net/project/shownotes.php?release_id=157273 Release notes]<br />
<br />
; '''Contact'''<br /><br />
: [[addresshere]].<br />
<br />
; '''Downloads'''<br /><br />
: [http://sourceforge.net/project/showfiles.php?group_id=27707 From SourceForge]</div>165.124.152.78http://gmod.org/wiki/Genome_Directory_System_(GDS)Genome Directory System (GDS)2007-01-25T18:43:27Z<p>165.124.152.78: New page: This is a place holder for GDS. The authors of this project have indicated that "This project is on hold until more funding and development time can be found."</p>
<hr />
<div><br />
<br />
This is a place holder for GDS. The authors of this project have indicated that "This project is on hold until more funding and development time can be found."</div>165.124.152.78http://gmod.org/wiki/CitrinaCitrina2007-01-25T18:43:23Z<p>165.124.152.78: New page: ; '''Description''' : Citrina (sih-TREE-nuh) is a database management tool that automates the mirroring and processing of databases that are distributed via ftp servers. It is built aro...</p>
<hr />
<div><br />
<br />
; '''Description'''<br />
<br />
: Citrina (sih-TREE-nuh) is a database management tool that automates the mirroring and processing of databases that are distributed via FTP servers. It is built around the Ant Java build tool, which makes it very flexible and portable. Citrina itself provides only the basic mirroring functionality, but it can easily be extended to do other tasks. For example, with Citrina you could mirror the UniProt database to your local system, generate FASTA files, create the BLAST databases, and run BLAST on a set of proteins you are interested in. It can also be used to transfer Chado SQL dumps between organism sites and automatically populate the PostgreSQL database via Ant's SQL tasks. Ant can also execute external scripts, so Citrina can take advantage of any processing tools that you have already developed.<br />
<br />
; '''Requirements'''<br /><br />
: Java 1.4.x (http://java.sun.com/j2se/)<br /> Ant 1.6.x ([http://ant.apache.org http://ant.apache.org/])<br /> Wget 1.6 or higher (http://www.gnu.org/software/wget/wget.html)<br /> GNU Tar 1.13 or higher (http://www.gnu.org/software/tar/tar.html)<br /> Gzip 1.3.3 or higher (http://www.gzip.org/)<br /> Bzip2 0.9.0 or higher (http://sources.redhat.com/bzip2/)<br />
<br />
Tar, Gzip, and Bzip2 are only needed if you need to extract files that use those compression formats.<br />
<br />
Citrina has only been tested on Red Hat Linux 9, Solaris 2.8, and SuSE 9.1. Other Unix systems, including Mac OS X, should also work, but they have not been tested. Citrina will not run on Windows because it relies on symbolic links.<br />
<br />
; '''Documentation'''<br /><br />
: [[FAQ]]<br />[[Quick Start Guide]]<br />[[Citrina User Guide]]<br />
<br />
; '''Downloads'''<br /><br />
: [http://prdownloads.sourceforge.net/gmod/citrina-0.5.1.tar.gz?download citrina-0.5.1.tar.gz]<br />[http://prdownloads.sourceforge.net/gmod/citrina-0.5.1-src.tar.gz?download citrina-0.5.1-src.tar.gz]<br />
<br />
; '''Mailing List'''<br /><br />
: [http://sourceforge.net/mailarchive/forum.php?forum=gmod-citrina Citrina discussion list]<br />
<br />
; '''Contact'''<br /><br />
: Josh Goodman (jogoodma AT bio DOT indiana DOT edu )<br />
<br />
<br />
<br />
* [[Citrina FAQ]]<br />
* [[Citrina Quickstart]]<br />
<br />
<br />
<br />
{| id="attachments"<br />
! Attachment<br />
! Size<br />
|- class="dark"<br />
|<br />
[http://www.gmod.org/files/userguide.pdf userguide.pdf]<br />
| 71.98 KB<br />
|}</div>165.124.152.78http://gmod.org/wiki/ArgosArgos2007-01-25T18:43:21Z<p>165.124.152.78: New page: ; '''Description'''<br /> : Argos, a.k.a. Flybase-NG, a.k.a. biodb, is designed to provide automatic<br /> replication, installation and updates of genome and organism databases<br /> a...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
:<br />
<br />
Argos, a.k.a. FlyBase-NG, a.k.a. biodb, is designed to provide automatic replication, installation and updates of genome and organism databases and information servers, including FlyBase and euGenes. It should not be too difficult to add other organism/genome services to this replication structure.<br />
<br />
Its main value is a collection of pre-tested and implemented<br /> common database/information service tools needed for organism database<br /> systems, which can be automatically distributed and updated to any<br /> computer.<br />
<br />
The replication includes scripts, configurations, data, and Unix binaries<br /> for all needed programs except Perl, Java and rsync. Rsync is<br /> used as the primary distribution program.<br />
<br />
This server distribution system is still in development. It will be possible to use it for automated updates of mirrored servers. Uses of an automated server distribution system include local load distribution (the Apache backhand module is included for this), world-wide mirror sites for rapid local access, and institution or company mirror servers for local projects and data mining. An automatable mirroring system differs from providing software and data downloads by FTP in that packages of data and software are kept up to date without human intervention. Package management systems such as RPM and pacman are well-developed tools, but they don't quite meet the needs of this bio-database distribution.<br />
<br />
The basic system structure is:<br />
<br />
<br />
 common/
 java/ ; perl/   -- language packages
 servers/        -- major programs (blast, dbms, internet servers)
 systems/        -- operating system binaries of programs, packages
 docs/           -- general documents
 logs/           -- server logs
 template/       -- template information system structure
 flybase/        -- implemented genome information system structures
 eugenes/
 daphnia/
<br />
This design separates common infrastructure from project-specific parts. Projects may contain any needed software along with data, web docs, database files, etc. A symbolic link folder named common in each project is used to access the common software structure.<br />
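
A minimal sketch of this layout, assuming that the <code>java/</code>, <code>perl/</code>, <code>servers/</code> and <code>systems/</code> folders sit under <code>common/</code> and that each project reaches them through a relative symbolic link named <code>common</code>. The function name <code>make_layout</code> is made up for illustration and is not part of the Argos installer; the sketch needs a file system that supports symbolic links.

```python
import os
import tempfile
from pathlib import Path

# Subfolders of common/ taken from the listing; nesting them under common/
# is an assumption made for this sketch.
COMMON_PARTS = ["java", "perl", "servers", "systems"]


def make_layout(root: Path, projects: list) -> None:
    """Create a toy Argos-style tree with one shared common/ folder."""
    for part in COMMON_PARTS:
        (root / "common" / part).mkdir(parents=True, exist_ok=True)
    for project in projects:
        project_dir = root / project
        project_dir.mkdir(parents=True, exist_ok=True)
        link = project_dir / "common"
        if not (link.is_symlink() or link.exists()):
            # A relative link keeps working when the whole tree is mirrored
            # to a different root folder.
            link.symlink_to(Path("..") / "common", target_is_directory=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_layout(Path(tmp), ["flybase", "eugenes"])
        # Each project sees the shared software through its own common/ link.
        print(sorted(os.listdir(Path(tmp) / "flybase" / "common")))
```

Using a relative link rather than an absolute path is a design choice of the sketch: it means a replicated copy under another root still resolves to its own <code>common/</code> folder.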
<br />
Per-package installations and updates are available, so that users can choose which packages to install. This includes logic to update infrastructure software from different source sites. Rsync is the primary distribution and update tool because it provides the file-system-aware updating methods that are needed; FTP, HTTP and other transports are possible.<br />
<br />
An evaluation of RPM, pacman, cluster backup/mirror tools and grid packaging tools found that none was quite right, so a 'quick hack' Perl installation program has been built.<br />
<br />
; '''Developer notes'''<br /><br />
: Current developers are Don Gilbert, Nihar Sheth and Victor<br /> Strelets for FlyBase-NG and euGenes uses. We hope others will<br /> try it and join us in using and developing it. Email us at<br /> argos@eugenes.org or flybase-ng@flybase.net<br />
<br />
Contents in cvs.gmod.sourceforge.net:/cvsroot/gmod/argos/ for this project are<br /> installation and configuration files. CVS is not designed for storage<br /> and distribution of bulk data, program binaries, and the many package<br /> installations included in Argos repositories.<br />
<br />
For this, argos/install/packages.conf configurations point to source servers<br /> for fetching ready-to-use packages, similar to the distribution system<br /> used by Globus.org for grid computing packages that are distributed to<br /> multiple grid node computers.<br />
<br />
Also it is presumed that each implemented service will maintain software<br /> and documents separately from Argos, as the open-source software<br /> collected into the Argos commons are separately maintained, but installed<br /> for use with Argos.<br />
<br />
GMOD developers can add new package sets to the<br /> argos/install/packages.conf which point to rsync servers for<br /> the packages.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: Genome information systems running in Argos are at<br />http://flybase.net/flybase-ng/<br /><br /><br /> A slide set outlines Argos/FlyBase-NG here: flybase-ng-may03.<br />[[ppt]]<br />[[html]]<br /><br /><br /> These are overviews of FlyBase's server system structures:<br />[[last generation]]<br /> -- [[next generation]]<br />
<br />
; '''Requirements'''<br /><br />
: A current Unix computer, with several free Gigabytes of disk space, depending<br /> on which system packages are to be installed. The following software needs<br /> to be pre-installed on the system. Argos includes all other packages needed<br /> for its operation, drawn from common open-source software tools and packages used for<br /> bioinformatics databases and information systems.<br />
These packages need to be preinstalled, and commonly are:<br />
:* Perl v5.6 or later - http://www.perl.com/<br />
:* Java v1.3 or later - http://java.sun.com/<br />
:* rsync v2.5 or later - http://rsync.samba.org/<br />
The Argos system will replicate updates to compiled programs for these<br /> operating systems, obviating need for any human-attended compiling and<br /> installation. Unix systems that have binary package support are:<br />
:* Apple MacOSX (v10.2 build)<br />
:* Intel Linux (kernel 2.4 build)<br />
:* Sun Solaris (v8 build)<br />
In this alpha 0.3 (June 2003) release,<br /> installation of common Argos packages uses about 200 MB of disk.<br /> Installation of a full FlyBase service uses about 2.5 GB of disk.<br /> Installation of a full euGenes service uses about 4 GB of disk.<br />
; '''Documentation'''<br /><br />
:<br />
<br />
'''Quick start:'''<br /> Fetch<br />[http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gmod/argos/install/installng.pl?rev=HEAD <br /> argos/install/installng.pl]<br /> and run from a command line ('perl installng.pl').<br />
<br />
==== Summary of steps to install an Argos server system ====<br />
<br />
# Fetch the install script from a command line with<br /><br /><code>rsync rsync://flybase.net/biodb/install/installng.pl .</code><br /> (or use web link on this page)<br />
# Run <code>perl installng.pl</code><br /><br /> for summary help.<br />
# Run <code>perl installng.pl -root=/usr/local/biodb -install</code><br /><br /> to create root folder and fetch the installation package<br /> (location for -root= is your choice; change below steps to match)<br />
# Edit <code>/usr/local/biodb/install/install.conf.local</code><br /><br /> to set configuration. Change package set, paths and ports<br /> as desired in this install.conf.<br />
# Run <code>/usr/local/biodb/install/installng.pl -install </code><br /><br /> to add the full set of packages. Packages selected from<br /> packages.conf will be copied from servers.<br />
# Run <code>/usr/local/biodb/install/run-apache</code><br /><br /> to start servers<br />
# Run <code>/usr/local/biodb/install/installng.pl -update </code><br /><br /> to update server periodically.<br />
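
The steps above can be collected into a small dry-run helper that substitutes a different root folder into each command, which is the manual adjustment step 3 asks for. This is a sketch only: <code>Step</code> and <code>install_plan</code> are hypothetical names, the helper prints commands rather than executing them, and it uses only the commands and the <code>-root=</code>, <code>-install</code> and <code>-update</code> options that appear in the steps themselves.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    description: str
    command: str          # shell command, or the file to edit for a manual step
    manual: bool = False


def install_plan(root: str = "/usr/local/biodb") -> List[Step]:
    """Return the installation steps, in order, for the chosen root folder."""
    install_dir = f"{root}/install"
    return [
        Step("Fetch the install script",
             "rsync rsync://flybase.net/biodb/install/installng.pl ."),
        Step("Show summary help", "perl installng.pl"),
        Step("Create the root folder and fetch the installation package",
             f"perl installng.pl -root={root} -install"),
        Step("Set package set, paths and ports",
             f"{install_dir}/install.conf.local", manual=True),
        Step("Add the full set of packages", f"{install_dir}/installng.pl -install"),
        Step("Start servers", f"{install_dir}/run-apache"),
        Step("Update the server (repeat periodically)",
             f"{install_dir}/installng.pl -update"),
    ]


if __name__ == "__main__":
    for number, step in enumerate(install_plan(), start=1):
        verb = "edit" if step.manual else "run"
        print(f"{number}. {step.description}: {verb} {step.command}")
```

For example, <code>install_plan("/opt/biodb")</code> yields the same seven steps with every path rewritten under <code>/opt/biodb</code>.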
<br />
; '''Downloads'''<br /><br />
:<br />
<br />
* Argos-based servers:<br />http://flybase.net/flybase-ng/<br /> for FlyBase Next Generation,<br /> euGenes genome database,<br /> and other services in development.<br />
* Main package distribution: <u>rsync://flybase.net/biodb</u><br />
with project-specific packages distributed from<br /> other servers, as specified in the<br />[http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/gmod/argos/install/packages.conf?rev=HEAD <br /> argos/install/packages.conf<br />]</div>165.124.152.78http://gmod.org/wiki/Gene_Expression_VisualizationGene Expression Visualization2007-01-25T18:43:08Z<p>165.124.152.78: New page: Components for visualizing gene expression results; currently focused on microarray based experments. * Caryoscope * GeneXplorer * Insertional Mutagenesis Database (IMDB) ...</p>
<hr />
<div><br />
<br />
Components for visualizing gene expression results; currently focused on microarray-based experiments.<br />
<br />
<br />
<br />
* [[Caryoscope]]<br />
* [[GeneXplorer]]<br />
* [[Insertional Mutagenesis Database (IMDB)]]<br />
* [[Java TreeView]]</div>165.124.152.78http://gmod.org/wiki/Ontology_VisualizationOntology Visualization2007-01-25T18:42:45Z<p>165.124.152.78: </p>
<hr />
<div><br />
<br />
This is a stub.<br />
<br />
Other sites that might be of interest:<br />
<br />
* [http://song.sourceforge.net/ Sequence Ontology]<br />
* [http://www.geneontology.org/ Gene Ontology]<br />
* [http://obo.sourceforge.net/ Open Biomedical Ontologies (OBO)]<br />
<br />
<br />
<br />
* [[GO Graphic Viewer]]</div>165.124.152.78http://gmod.org/wiki/BioMartBioMart2007-01-25T18:42:34Z<p>165.124.152.78: New page: ; '''Description'''<br /> : BioMart is a robust, query-oriented data integration system, based on distributed data warehousing ideas. The system can be applied to a single or multiple da...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: BioMart is a robust, query-oriented data integration system based on distributed data warehousing ideas. The system can be applied to a single database or to multiple databases. It supports scalable, large-scale querying of individual databases as well as query chaining between them. All data sources in the system comply with the BioMart data model, a simple, query-optimised database schema. The system consists of a database schema specification, administration tools for deploying and configuring mart-spec databases, and data access software, which includes web and standalone interfaces. The BioMart suite is still under development.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: The BioMart web interface at EBI can be found [http://www.ebi.ac.uk/biomart/martview here]. For the Ensembl implementation, look [http://www.ensembl.org/Multi/martview here]. Screenshots of the standalone interfaces and admin tools can be found [http://www.ebi.ac.uk/biomart/system.html here].<br />
<br />
; '''Requirements'''<br /><br />
: The requirements depend on which part of the BioMart suite you want to install. For more details, look [http://www.ebi.ac.uk/biomart/install.html here].<br />
<br />
; '''Documentation'''<br /><br />
: The project documentation is still scanty but for pointers go to [http://www.ebi.ac.uk/biomart BioMart home]<br />
<br />
; '''Contact'''<br /><br />
: The contact details can be found [http://www.ebi.ac.uk/biomart/contact.html here].<br />
<br />
; '''Downloads'''<br /><br />
: The download/installation instructions can be found [http://www.ebi.ac.uk/biomart/install.html here]</div>165.124.152.78http://gmod.org/wiki/Restriction_Graphic_ViewerRestriction Graphic Viewer2007-01-25T18:42:32Z<p>165.124.152.78: New page: ; '''Description'''<br /> : The Restriction Graphic Viewer is a simple web_based restriction analysis tool. It is a component of the Generic Model Organism Systems Database project (GMOD...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: The Restriction Graphic Viewer is a simple web-based restriction analysis tool. It is a component of the Generic Model Organism Systems Database project (GMOD.sourceforge.net).<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: [http://seq.yeastgenome.org/cgi-bin/GMOD/RestGraph.pl Demo at Stanford]<br />
<br />
; '''Requirements'''<br /><br />
: The Apache web server ([http://www.apache.org http://www.apache.org/])<br /><br />
: Perl 5.005 or higher.<br /><br />
: GD module.<br /><br />
: Bioperl Bio::Root::IO, Bio::PrimarySeq, Bio::Seq, and Bio::Tools::RestrictionEnzyme.<br />
<br />
; '''Documentation'''<br /><br />
: N/A<br />
<br />
; '''Contact'''<br /><br />
: [[Shuai Weng]].<br />
<br />
; '''Downloads'''<br /><br />
: [http://sourceforge.net/project/showfiles.php?group_id=27707 From SourceForge]</div>165.124.152.78http://gmod.org/wiki/PubSearchPubSearch2007-01-25T18:42:30Z<p>165.124.152.78: New page: ; '''Description'''<br /> : PubSearch is a web-based literature curation tool to allow curators to search and annotate genes to keywords from articles. It has a simple, MySQL database ba...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: PubSearch is a web-based literature curation tool that allows curators to search articles and annotate genes with keywords drawn from them. It has a simple MySQL database backend and uses a set of Java Servlets and JSPs for querying, modifying, and adding gene, gene-annotation, and literature information.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: See [http://tesuque.stanford.edu:9999/pubdemo this page] for a web-based demo of the production version. The user name is "demo" and the password is "demo".<br />
<br />
; '''Requirements'''<br /><br />
: A Java servlet engine satisfying the Servlet 2.3 and JSP 1.2 specs; Tomcat 4.0 is an example of a supporting servlet engine.<br />
<br />
; '''Documentation'''<br /><br />
: [http://pubsearch.org/releases/install_guide.pdf Install docs]<br /><br />
: Online documentation [http://tesuque.stanford.edu/pubsearch.org/pubsearchFrame.html is available]<nowiki>; please submit bug reports to the SourceForge </nowiki>[http://sourceforge.net/tracker/?func=add&group_id=27707&atid=511475 bug tracker].<br />
<br />
; '''Contact'''<br /><br />
: Danny Yoo ([[dyoo@acoma.stanford.edu]]).<br />
<br />
; '''Downloads'''<br />
<br />
: [http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=100560&release_id=238168 From sourceforge]</div>165.124.152.78http://gmod.org/wiki/Pathway_ToolsPathway Tools2007-01-25T18:42:25Z<p>165.124.152.78: New page: ; '''Description'''<br /> : The Pathway Tools software is a bioinformatics software system for pathway analysis of genomes, and for creating Pathway/Genome Databases (PGDBs). A pathway/g...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: The Pathway Tools software is a bioinformatics software system for pathway analysis of genomes and for creating Pathway/Genome Databases (PGDBs). A PGDB, such as [http://biocyc.org/ecoli/ EcoCyc], is a bioinformatics database that integrates genomic data with detailed functional annotations of the genome, such as descriptions of metabolic and signaling pathways. A PGDB is a type of model-organism database.<br />
<br />
Pathway Tools supports extensive functionality including prediction, interactive editing, querying, and visualization of metabolic pathways and related datatypes including reactions, metabolites, and enzymes. It also includes query, visualization, and editing support for operons, genes, proteins, and chromosomes.<br />
<br />
Pathway Tools can be coupled with other genome browsers to add support for pathways to an existing MOD.<br />
<br />
Pathway Tools was developed by [http://www.ai.sri.com/pkarp/ Peter D. Karp] and coworkers at the [http://bioinformatics.ai.sri.com/ Bioinformatics Research Group] at SRI International.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: See [http://biocyc.org/samples.shtml Samples]<br />
<br />
; '''Requirements'''<br /><br />
: Sun workstation, Windows 2000 or XP, or Linux.<br />
<br />
; '''Documentation'''<br /><br />
: See the [http://bioinformatics.ai.sri.com/ptools/ Pathway Tools Information Site] for links.<br />
<br />
; '''Contact'''<br /><br />
: [[ptools-support@ai.sri.com]].<br />
<br />
; '''Downloads'''<br /><br />
: [http://biocyc.org/download.shtml biocyc.org]</div>165.124.152.78http://gmod.org/wiki/Org.bdgpOrg.bdgp2007-01-25T18:42:17Z<p>165.124.152.78: New page: ; '''Description'''<br /> : The org.bdgp toolkit is a library of useful Java classes shared by several pieces of gmod software. ; '''Requirements'''<br /> : JDK 1.2 or higher. ; '''Con...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: The org.bdgp toolkit is a library of useful Java classes shared by several pieces of GMOD software.<br />
<br />
; '''Requirements'''<br /><br />
: JDK 1.2 or higher.<br />
<br />
; '''Contact'''<br /><br />
: [[addresshere]].<br />
<br />
; '''Downloads'''<br /><br />
: [http://sourceforge.net/project/showfiles.php?group_id=27707 From SourceForge.]</div>165.124.152.78http://gmod.org/wiki/GO_Graphic_ViewerGO Graphic Viewer2007-01-25T18:42:15Z<p>165.124.152.78: New page: ; '''Description'''<br /> : The GO Graphic Viewer module (Bio::GMOD::GO::View) generates a graphic that displays the parent and child relationships of a selected GO term. It also provide...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
: The GO Graphic Viewer module (Bio::GMOD::GO::View) generates a graphic that displays the parent and child relationships of a selected GO term. It also visualizes the results from the GO::TermFinder Perl module created by the Stanford Microarray Database (SMD). This module is useful when analyzing experimental or computational results that produce a set of gene products that may have a common function or process. The distribution also includes two examples of its use in web-based user interfaces (goView.pl and goTermFinder.pl). It is a component of the Generic Model Organism Systems Database project (GMOD.sourceforge.net).<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
: [http://seq.yeastgenome.org/cgi-bin/GMOD/goTermFinder.pl Demo at Stanford]<br />
<br />
; '''Requirements'''<br /><br />
: The Apache web server ([http://www.apache.org http://www.apache.org/])<br /><br />
: Perl 5.005 or higher.<br /><br />
: GD and CGI modules.<br /><br />
: GraphViz<br /><br />
: GO::AnnotationProvider::AnnotationParser<br /><br />
: GO::OntologyProvider::OntologyParser<br /><br />
: GO::TermFinder<br /><br />
: Bioperl Bio::Root::IO<br />
<br />
; '''Documentation'''<br /><br />
: N/A<br />
<br />
; '''Contact'''<br /><br />
: [[Shuai Weng]]<br />
<br />
; '''Downloads'''<br /><br />
: [http://sourceforge.net/project/showfiles.php?group_id=27707 From SourceForge]</div>165.124.152.78http://gmod.org/wiki/GeneXplorerGeneXplorer2007-01-25T18:42:08Z<p>165.124.152.78: New page: ; '''Description'''<br /> : GeneXplorer is a web application that allows clustered microarray<br /> data to be browsed interactively via the web, and can be used either<br /> for resear...</p>
<hr />
<div><br />
<br />
; '''Description'''<br /><br />
:<br />
<br />
GeneXplorer is a web application that allows clustered microarray data to be browsed interactively via the web. It can be used either for research purposes or to provide web supplements that accompany microarray publications.<br />
<br />
; '''Demo &amp; Screenshots'''<br /><br />
:<br />
<br />
GeneXplorer has been used to provide several web supplements for papers arising from data in the Stanford Microarray Database. As an example, see:<br />
<br />
[http://microarray-pubs.stanford.edu/cgi-bin/gx?n=prostate1&rx=5 J Lapointe, C Li, JP Higgins, M van de Rijn, E Bair, K Montgomery, M Ferrari, L Egevad, W Rayford, U Bergerheim, et al: Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci U S A 2004, 101:811-6.]<br />
<br />
; '''Requirements'''<br /><br />
:<br />
<br />
GeneXplorer is written in Perl, and thus requires a system that is capable of running Perl.<br /> GeneXplorer also requires the following modules:<br />
<br />
# [http://search.cpan.org/dist/GD/ GD]<br />
# [http://search.cpan.org/dist/Getopt-Long/ Getopt::Long]<br />
<br />
In addition, GeneXplorer requires the C program correlations, which is included in the distribution. It must be compiled with an ANSI-compatible compiler, such as [http://gcc.gnu.org/ gcc].<br />
<br />
The clustered data files must be created using a clustering program that produces files in the [http://smd.stanford.edu/help/formats.shtml#cdt cdt format].<br /> Such software includes Mike Eisen's [http://rana.lbl.gov/EisenSoftware.htm Cluster] and [http://genetics.stanford.edu/~sherlock/cluster.html XCluster].<br />
<br />
; '''Documentation'''<br /><br />
: See the README file at the download site (see below)<br />
<br />
; '''Contact'''<br /><br />
: [[Gavin Sherlock]].<br />
<br />
; '''Downloads'''<br /><br />
: GeneXplorer is freely available under the MIT license from [http://search.cpan.org/dist/Microarray-GeneXplorer/ CPAN].</div>165.124.152.78http://gmod.org/wiki/GenoGridGenoGrid2007-01-25T18:41:51Z<p>165.124.152.78: New page: '''Genome analysis and annotation via Grid computing''' This subproject builds re-usable tools and workflows for genome analyses and annotation, using shared cyberinfrastructure (Grids ...</p>
<hr />
<div><br />
<br />
'''Genome analysis and annotation via Grid computing'''<br />
<br />
This subproject builds reusable tools and workflows for genome analysis and annotation, using shared cyberinfrastructure (Grids or clusters). It contains collections of scripts, documents and workflows for running existing genome analysis tools (BLAST, homology tools, predictors, comparative and phylogenetic analyses) on available cyberinfrastructure. One emphasis is on simplifying the use of grids and genome tools, so that new genome projects can readily take advantage of them.<br />
<br />
REPOSITORY<br />[http://gmod.cvs.sourceforge.net/gmod/genogrid/ <br /> http://gmod.cvs.sourceforge.net/gmod/genogrid/]</div>165.124.152.78http://gmod.org/wiki/Apollo-Chado_example_databaseApollo-Chado example database2007-01-25T18:41:46Z<p>165.124.152.78: New page: There is an example database with GFF3 source files, a log of the commands I ran to build the database, database dumps at a few steps of building the database, and an Apollo chado-adaptor....</p>
<hr />
<div>There is an example database with GFF3 source files, a log of the commands I ran to build the database, database dumps at a few steps of building the database, and an Apollo chado-adaptor.xml config file in the [http://gmod.cvs.sourceforge.net/gmod/schema/chado/modules/sequence/apollo-bridge/sample_db/ schema cvs]. A local checkout of that repository is available from [[this webserver]]. The README (command log) is here:<br />
<br />
<br />
The sample database that used to be in this directory has been moved to its<br />
own cvs repository in the gmod cvs, in a repository called 'sample_dbs'.<br />
<br />
Scott Cain<br />
1/9/07</div>165.124.152.78http://gmod.org/wiki/Comparison_of_XORT_and_Hibernate_for_Chado_reportingComparison of XORT and Hibernate for Chado reporting2007-01-25T18:41:07Z<p>165.124.152.78: New page: '''Comparison of XORT and [http://www.hibernate.org/ Hibernate ] for Chado reporting''' - written by Josh Goodman, FlyBase - Indiana University '''Introduction'''<br /> At [http://f...</p>
<hr />
<div><br />
<br />
'''Comparison of [[XORT]] and [http://www.hibernate.org/ Hibernate ] for Chado reporting''' - written by Josh Goodman, FlyBase - Indiana University<br />
<br />
'''Introduction'''<br /> At [http://flybase.org/ FlyBase] we are currently migrating all our existing data into Chado. To deal with data in this new format, we are revamping all of our report generation tools. The qualities we were looking for in a new reporting framework were a good balance of speed and flexibility, and a minimal amount of in-house code to write. The term "reporting" here refers to the presentation of Chado data to end users, e.g. a gene page, an allele page, etc.<br />
<br />
We evaluated two approaches to reporting Chado data: XORT and Hibernate. XORT was chosen because it is already used with Chado and provides a nice language-neutral interface for extracting your data from Chado into XML. Hibernate was chosen because it is one of the most mature and stable object-relational mapping tools available. It is very well documented, maintained by a large community, and can mostly be tweaked through its XML files rather than by modifying Java code. [http://ibatis.org/ iBATIS] was also investigated, but no formal tests were done with it; more on that later.<br />
<br />
In the end XORT proved to be the better choice, but only because it excelled in the areas that were most important to us. Other situations may differ, so please don't take this case study literally without carefully weighing your own needs and expectations. We hope that our experience proves useful in this respect.<br />
<br />
'''Tools Used'''<br />
<br />
* [http://hibernate.org/ Hibernate 3.0]<br />
* [http://sourceforge.net/project/showfiles.php?group_id=27707&package_id=148718 XORT 0.001]<br />
* [http://eclipse.org/ Eclipse Java IDE 3.1.1]<br />
* [http://www.jboss.com/products/jbosside/downloads JBOSS Eclipse plugin v1.5]<br />
* [http://www.postgresql.org/ Postgres 8.0]<br />
<br />
'''Setting up Hibernate'''<br /> To use Hibernate you need two things: the Hibernate XML mapping files and Java code that defines the objects to be populated. Hibernate mapping files are usually written by hand, but the JBOSS Eclipse IDE has a nice tool that will read your database schema and generate them for you. This is also the first place we encountered a small problem.<br />
<br />
The Chado schema has many indices that are not explicitly named, and on older versions of Postgres (7.x and maybe some early 8.0.x) unnamed indices are automatically given names of the form $1, $2, $3, etc. The problem is that Postgres doesn't check whether it has already used those names for other tables, so you end up with duplicate index names. The JBOSS plugin doesn't like this and dies when reading a schema with duplicate names. We dropped the schema, named the indices, and reimported to correct this problem. Newer versions of Postgres (8.0.x, 8.1) use unique names when creating indices that aren't explicitly named.<br />
<br />
Once this was fixed, the JBOSS Hibernate plugin expertly read our schema and generated all of our XML mapping files and the necessary Java code.<br />
<br />
'''Setting up XORT'''<br /> Setting up XORT is fairly simple if you've installed Perl modules before. The trickiest part is making sure that the ddl.properties file that describes your schema matches the actual schema in the database. Once that is done, all you need to do is write a dumpspec to dump the data you want.<br />
<br />
'''Results'''<br /> The test plan was fairly simple: start with a single table, then add linked tables one by one to see how each system scaled. The hub table we started with was the pub table, with ~130,000 records; it is fairly simple and we had a nice test data set available. For Hibernate we set up a query that fetched all the publication records and all their fields, and for XORT we set up a dumpspec that did the same. Since XORT also fetches the cvterm table by default, we modified the Hibernate query to fetch the same.<br />
<br />
For this simple test case we did 5 runs each; on average, Hibernate took 181 seconds and XORT took 372 seconds. Hibernate's advantage here is best explained by the caching strategy it uses for cvterm lookups. XORT executes a query against the cvterm table for each pub record it encounters, whereas Hibernate caches the hits and only queries the cvterm table when it finds an entry it hasn't cached.<br />
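
The difference between a per-record lookup and a cached lookup can be illustrated with a small, self-contained sketch. This is not how XORT or Hibernate is implemented; it only mimics the two access patterns against an in-memory SQLite database. The column names pub.type_id, cvterm.cvterm_id and cvterm.name follow Chado, but the sample rows and the function names (build_demo_db, dump_pubs) are made up for the illustration.<br />

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for Chado's pub and cvterm tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pub (pub_id INTEGER PRIMARY KEY, title TEXT, type_id INTEGER);
        INSERT INTO cvterm VALUES (1, 'paper');
        INSERT INTO cvterm VALUES (2, 'review');
        INSERT INTO pub VALUES (1, 'A', 1);
        INSERT INTO pub VALUES (2, 'B', 1);
        INSERT INTO pub VALUES (3, 'C', 2);
        INSERT INTO pub VALUES (4, 'D', 1);
        """
    )
    return conn


def dump_pubs(conn, use_cache):
    """Return (rows, number of queries issued against cvterm)."""
    cache = {}           # type_id -> cvterm name, used only when use_cache is True
    cvterm_queries = 0
    rows = []
    pubs = conn.execute(
        "SELECT pub_id, title, type_id FROM pub ORDER BY pub_id"
    ).fetchall()
    for pub_id, title, type_id in pubs:
        if use_cache and type_id in cache:
            type_name = cache[type_id]       # cache hit: no database round trip
        else:
            type_name = conn.execute(
                "SELECT name FROM cvterm WHERE cvterm_id = ?", (type_id,)
            ).fetchone()[0]
            cvterm_queries += 1
            if use_cache:
                cache[type_id] = type_name
        rows.append((pub_id, title, type_name))
    return rows, cvterm_queries


if __name__ == "__main__":
    conn = build_demo_db()
    print(dump_pubs(conn, use_cache=False)[1])  # one cvterm query per pub row
    print(dump_pubs(conn, use_cache=True)[1])   # one cvterm query per distinct type
```

With only a handful of distinct cvterm types across ~130,000 pub records, the cached variant issues far fewer queries, which is consistent with the timing difference reported above.<br />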
<br />
One problem we did have with Hibernate was with its session-based cache, which tried to keep a copy of each pub object as we scanned the entire pub table. To get around this we had to explicitly evict each pub object from the session cache after we were done processing it.<br />
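
The effect of removing objects from a session-level cache during a full table scan can be sketched with a toy cache. In Hibernate the corresponding call is Session.evict(); the SessionCache class and the scan function below are hypothetical stand-ins written only to show why the cache grows without eviction.<br />

```python
class SessionCache:
    """Toy first-level cache: keeps every loaded object until it is evicted."""

    def __init__(self):
        self._objects = {}

    def load(self, key, loader):
        # Load the object once and keep it for the lifetime of the session.
        if key not in self._objects:
            self._objects[key] = loader(key)
        return self._objects[key]

    def evict(self, key):
        # Drop the object so it no longer occupies memory in the session.
        self._objects.pop(key, None)

    def __len__(self):
        return len(self._objects)


def scan(keys, evict_after_use):
    """Scan all keys and return the peak number of objects held in the cache."""
    session = SessionCache()
    peak = 0
    for key in keys:
        pub = session.load(key, lambda k: {"pub_id": k})
        peak = max(peak, len(session))
        # ... process pub here ...
        if evict_after_use:
            session.evict(key)
    return peak


if __name__ == "__main__":
    print(scan(range(1000), evict_after_use=False))  # cache holds every object
    print(scan(range(1000), evict_after_use=True))   # cache holds one at a time
```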
<br />
Next we added the feature and organism tables, linked via the feature_pub table. This time Hibernate took 402 seconds and XORT took 546 seconds on average. Hibernate was still outperforming XORT, but not by much. The next table added was the pubauthor table; in this case Hibernate's performance advantage disappeared, with Hibernate taking 1800 seconds vs. XORT's 780 seconds. This huge change with such a simple table took us by surprise. We couldn't pinpoint a single cause, but we think a mix of Hibernate's table prefetching and cache performance caused most of it. By this point we had to start using a disk-based cache for some of the objects, which caused a lot of disk I/O. Several attempts to bring this time down by tweaking various Hibernate parameters failed, and further table additions got exponentially worse compared to XORT.<br />
<br />
Another possible cause of performance problems is that, by default, fetching an object populates all of its fields. Thus, if you simply want a list of the names and types of all features related to a particular publication, what you get back is a fully populated feature object with name, type, sequence, length, etc. Fetching these additional fields can add a lot of overhead to queries and caching.<br />
<br />
There are two options for getting around this field fetching problem. First, you can customize the XML mapping files to set, on a field-by-field basis, whether or not a field is retrieved by default. The problem is that, for our purposes, the optimal fetching strategy changes depending on the task we are carrying out. For example, when querying or dumping features for reporting we may want to get all fields by default, but only a subset of them when fetching features attributed to publications. We could create a set of mapping files per table for the different strategies, but this would make our application overly complex and hard to maintain in the long term.<br />
<br />
The second approach is to use what Hibernate calls projection queries. They amount to:<br />
<br />
select new Feature(f.name, f.type) from Feature f where f.uniquename = 'FBgn0000001'<br />
<br />
This approach requires additional POJO code, is much less flexible, and is essentially doing things the iBATIS way without the flexibility that iBATIS provides, so we saw little point in trying this method. On a side note, we did not evaluate iBATIS because it required a greater degree of direct Java code manipulation than Hibernate. We wanted a solution that all members of our dev team could edit and maintain, rather than have this responsibility sit with a few key people who know Java. iBATIS itself looked very capable and excelled at being less complicated in certain areas where Hibernate can make your head spin. Groups that aren't concerned about committing to maintaining Java code in the long term should definitely give it a look.<br />
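
The idea behind a projection query (fetch only the columns you need instead of a fully populated object) can be sketched independently of Hibernate. The example below uses an in-memory SQLite table whose column names loosely follow Chado's feature table. The FeatureSummary type, the function names, and the sample row, including the name 'demo-gene' and the type_id value, are invented for the illustration.<br />

```python
import sqlite3
from collections import namedtuple

# Lightweight result type holding only the projected columns (hypothetical).
FeatureSummary = namedtuple("FeatureSummary", ["name", "type_id"])


def build_feature_db():
    """Create a one-row stand-in for Chado's feature table with a large sequence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature ("
        "feature_id INTEGER PRIMARY KEY, name TEXT, uniquename TEXT, "
        "type_id INTEGER, residues TEXT, seqlen INTEGER)"
    )
    conn.execute(
        "INSERT INTO feature VALUES (1, 'demo-gene', 'FBgn0000001', 7, ?, 5000)",
        ("A" * 5000,),
    )
    return conn


def fetch_full(conn, uniquename):
    """Fetch every column, including the potentially large residues field."""
    return conn.execute(
        "SELECT * FROM feature WHERE uniquename = ?", (uniquename,)
    ).fetchone()


def fetch_summary(conn, uniquename):
    """Fetch only name and type_id: the equivalent of a projection."""
    row = conn.execute(
        "SELECT name, type_id FROM feature WHERE uniquename = ?", (uniquename,)
    ).fetchone()
    return FeatureSummary(*row)


if __name__ == "__main__":
    conn = build_feature_db()
    print(len(fetch_full(conn, "FBgn0000001")))   # number of columns fetched
    print(fetch_summary(conn, "FBgn0000001"))     # only the two projected fields
```

The trade-off matches the one described above: the narrow fetch avoids pulling the sequence, but each new combination of fields needs its own result type and query.<br />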
<br />
'''Conclusion'''<br /> In conclusion, we chose XORT over Hibernate because it provides a language-neutral interface and performs well compared to Hibernate when dealing with a realistic number of tables. Hibernate is geared more towards a fetch/modify/update workflow and towards working with objects on the scale of one to a few hundred at a time, not tens or hundreds of thousands. We often felt like we were going against the Hibernate grain by trying to set up this reporting system with a large number of objects. If you are working on applications that fit the fetch/modify/update model, Hibernate might be a good system to evaluate. It provides so much functionality out of the box, such as advanced caching, application-level transactions, and much more, that it is worth considering. Hibernate's query language (HQL) does take a small amount of time to get used to, but it is rich enough to provide almost as much flexibility as standard SQL. If you do find it limiting for some things, it is possible to place SQL in the mapping files to get around the limitations.<br />
<br />
XORT would benefit greatly from borrowing some of the strategies used by iBATIS and Hibernate, such as a caching layer to reduce the impact of redundant calls to tables. It also needs improved documentation, to lower the barrier for those who may not be familiar with Chado and how it is structured. A tutorial with a few use case scenarios that describes what each line does would be immensely helpful. Once you become familiar with Chado's structure, writing dumpspecs is fairly straightforward. Overall, these are minor shortcomings and we were pleased with XORT.</div>165.124.152.78http://gmod.org/wiki/Summary_of_Fall_2005_CHSL_MeetingSummary of Fall 2005 CHSL Meeting2007-01-25T18:41:03Z<p>165.124.152.78: New page: '''GMOD Architecture Work Group Discussion (Fall 2005 meeting)''' '''During the meeting it was suggested that we have a common<br /> database against which to test software. A first ste...</p>
<hr />
<div><br />
<br />
'''GMOD Architecture Work Group Discussion (Fall 2005 meeting)'''<br />
<br />
'''During the meeting it was suggested that we have a common database against which to test software. A first step would be to provide Postgres dumps of existing databases.'''<br />
<br />
Allen Day offered to provide a Postgres dump of the database for human, yeast and mouse.<br />
<br />
Dave Emmert from FlyBase also agreed to provide a Postgres dump.<br />
<br />
Kara Dolinski will provide yeast dumps.<br />
<br />
I agreed to make a wiki page which links to said dumps when they are up.<br />
<br />
'''The testing recommendations were reviewed.'''<br />
<br />
'''During a conversation about testing, interface testing was brought up.'''<br />
<br />
Brian O'Connor agreed to try out some tools for interface testing. Http::Unit and Apache::Test were suggested at the meeting. Eric Just recommended looking into Selenium.<br />
<br />
'''Someone suggested having resident experts on different testing frameworks to answer questions.'''<br />
<br />
Gavin Sherlock was suggested as the "Perl testing expert". Danny Yoo volunteered to be a contact for JUnit, Clover, log4j, PMD and TestNG (mentioned in the Java testing recommendation).<br />
<br />
'''On the topic of Logging and Debugging:'''<br />
<br />
Allen Day suggested using log4perl, a very nice framework for debugging.<br />
<br />
'''iBATIS vs. Hibernate'''<br />
<br />
Jeff Bowes agreed to do some use cases for iBATIS, and we have to find someone who will take responsibility for doing the Hibernate use cases. Adrian from GeneDB was mentioned (volunteered). If these two volunteers can agree on some common use cases and implement them, the rest of the group can then evaluate the technologies better.</div>165.124.152.78http://gmod.org/wiki/Software_Testing_RecommendationsSoftware Testing Recommendations2007-01-25T18:40:39Z<p>165.124.152.78: New page: This wiki is to provide an interaction space for generating a set of recommendations on software testing for the GMOD group. The documents should contain enough information so that after...</p>
<hr />
<div><br />
<br />
This wiki page provides an interaction space for generating a set of recommendations on software testing for the GMOD group. The documents should contain enough information that, after reading them, someone new to GMOD development will have a good idea of the recommended tools and practices for developing reliable software.<br />
<br />
General Information<br />
<br />
Links<br />
<br />
It won't hurt you to read the following up front:<br />http://en.wikipedia.org/wiki/Unit_testing<br />http://en.wikipedia.org/wiki/Integration_testing<br />http://en.wikipedia.org/wiki/Regression_testing<br />http://en.wikipedia.org/wiki/Extreme_programming<br />
<br />
Recommendations<br />
<br />
<br />
<br />
* [[Java Testing - by Jon Slenk]]<br />
* [[Perl Testing - by Gavin Sherlock]]<br />
* [[Questions on Testing - by Chris Mungall]]</div>165.124.152.78