GUS WDK Presentation
Overview
Genome databases typically have distinct layers
- batch data production
- data presentation
- curation
PlasmoDB, the Genome Resource for Plasmodium, the malaria parasite, is a warehouse that integrates a large number of data sources about Plasmodium. It does not do curation. The experience of PlasmoDB is that the batch data production back end and the web front end have very different middleware needs.
The PlasmoDB uses
- backend: the GUS Perl Object Layer
- frontend: the GUS Web Development Kit
The GUS Perl Object Layer
The GUS Perl Object Layer is an in-house developed O-R layer specific to GUS databases. It was presented at the caucus only briefly as an example backend layer.
- Written eight years ago (Mark Gibson/Brian Brunk)
- There were no Perl O-R tools available
- One table <-> one object (including linking tables)
- Columns become accessors: getScore(), setScore()
- Relationships modeled with get/setChild() and get/setParent()
- Conscious of “one-level” subclassing (GUS style)
- Cascading retrieve, update, delete
- Cascade can also follow parent links
- Default transaction scope: one object graph
- Or, you can manually control transations
- Simple object retrieval “language”
- Equal in power to a single table select
- Can also retrieve by navigation
- Code generator
- Recently evaluated Class::DBI
- Would have worked nicely, but we didn’t transition
- Nice subclassing model
The GUS WDK
The GUS WDK is a presentation layer toolkit (for any relational schema). It uses a M-V-C design. The focus of this caucus is O-R middleware, which corresponds to the WDK's Model.
The intention of the WDK Model's design is to allow non-programmers to design coarse grained objects that correspond to the entities presented to a user in a web UI, and to specify the queries that users can ask with respect to those entities.
The WDK's Model is configured in XML. The model author specifies
- record classes (eg Gene, SNP)
- queries against those classes
- site content
Here is a sample specification of an ArrayElement entity:
<wdkModel> <recordClass idPrefix="" name="ArrayElementRecordClass" type="Array Element"> <attributeQueryRef ref="ArrayElementAttributes.ProviderAndOrganism"> <columnAttribute name="provider" displayName="Provider"/> <columnAttribute name="genus_species" displayName="genus_species"/> <columnAttribute name="organism" displayName="Organism"/> <columnAttribute name="sequence" displayName="Sequence"/> </attributeQueryRef> <table name="GenomicLocations" displayName="Genomic Locations" queryRef="ArrayElementTables.GenomicLocations"> <columnAttribute name="source_id" internal="true"/> <linkAttribute name="genomicSequence" displayName="Genomic Sequence" visible="$$source_id$$"> <url> <![CDATA[showRecord.do?name=SequenceRecordClasses.SequenceRecordClass&primary_key=$$source_id$$]]> </url> </linkAttribute> <columnAttribute name="min_subject_start" displayName="Start"/> <columnAttribute name="max_subject_end" displayName="End"/> <columnAttribute name="gbrowse_start" internal="true"/> <columnAttribute name="gbrowse_end" internal="true"/> <columnAttribute name="max_subject_end" displayName="End"/> <columnAttribute name="is_reversed" displayName="Reversed"/> <linkAttribute name="gbrowseLink" displayName="Genome Browser" visible="view"> <url> <![CDATA[@CGI_URL@/gbrowse/plasmodb/?name=$$source_id$$:$$gbrowse_start$$-$$gbrowse_end$$;enable=AnnotatedGenes]]> </url> </linkAttribute> </table> <table name="Genes" displayName="Mapped Genes" queryRef="ArrayElementTables.Genes"> <columnAttribute name="source_id" internal="true"/> <linkAttribute name="gene" displayName="Gene" visible="$$source_id$$"> <url> <![CDATA[showRecord.do?name=GeneRecordClasses.GeneRecordClass&primary_key=$$source_id$$]]> </url> </linkAttribute> </table> </recordClass> <querySet name="ArrayElementAttributes"> <sqlQuery name="ProviderAndOrganism" isCacheable="false"> <paramRef ref="params.primaryKey"/> <column name="provider"/> <column name="genus_species"/> <column name="sequence"/> <column name="organism"/> <syntaxhighlight lang="sql"> <![CDATA[ SELECT ens.source_id, ed.name as provider, ens.sequence, tn.name as genus_species, SUBSTR(tn.name, 1, 1) || '. ' || SUBSTR(tn.name, INSTR(tn.name, ' ', 1, 1) +1) as organism FROM sres.ExternalDatabase ed, sres.ExternalDatabaseRelease edr, dots.ExternalNASequence ens, sres.TaxonName tn WHERE ens.source_id = '$$primaryKey$$' AND ens.external_database_release_id = edr.external_database_release_id AND edr.external_database_id = ed.external_database_id AND tn.taxon_id = ens.taxon_id AND tn.name_class = 'scientific name' ]]>
</sqlQuery>
</querySet>
<querySet name="ArrayElementTables">
<sqlQuery name="Genes" isCacheable="false"> <paramRef ref="params.primaryKey"/> <column name="source_id" />
<![CDATA[ SELECT t.source_id FROM dots.Similarity s, dots.Transcript t, dots.ExternalNaSequence ens, core.TableInfo ti1, core.TableInfo ti2 WHERE ens.source_id = '$$primaryKey$$' AND ens.na_sequence_id = s.query_id AND 'ExternalNASequence' = ti1.name AND ti1.table_id = s.query_table_id AND 'Transcript' = ti2.name AND ti2.table_id = s.subject_table_id AND t.na_feature_id = s.subject_id ]]>
</sqlQuery>
<sqlQuery name="GenomicLocations" isCacheable="false"> <paramRef ref="params.primaryKey"/> <column name="source_id" /> <column name="min_subject_start" /> <column name="max_subject_end" /> <column name="gbrowse_start" /> <column name="gbrowse_end" /> <column name="is_reversed" />
<![CDATA[ SELECT ens2.source_id, s.min_subject_start, s.max_subject_end, s.is_reversed, s.min_subject_start - 25000 AS gbrowse_start, s.min_subject_start + 25000 AS gbrowse_end FROM dots.Similarity s, dots.ExternalNaSequence ens1, core.TableInfo ti1, sres.ExternalDatabase ed2, sres.ExternalDatabaseRelease edr2, dots.ExternalNaSequence ens2, core.TableInfo ti2 WHERE ens1.source_id = '$$primaryKey$$' AND ens1.na_sequence_id = s.query_id AND 'ExternalNASequence' = ti1.name AND ti1.table_id = s.query_table_id AND ed2.name IN ('Sanger P. falciparum chromosomes', 'Jane Carlton P. yoelii chromosomes') AND ed2.external_database_id = edr2.external_database_id AND edr2.external_database_release_id = ens2.external_database_release_id AND ens2.na_sequence_id = s.subject_id AND 'ExternalNASequence' = ti2.name AND ti2.table_id = s.subject_table_id ]]>
</sqlQuery>
</querySet> </wdkModel>
</syntaxhighlight>