Latest revision as of 01:23, 10 December 2009

Hackathon wikidb components

  • middleware parts:
    • Chado to wiki:
      • MODware to select gene attributes by gene name and print genes as wiki-string; Eric
      • wikiloader to create the gene page, select the gene page/table template, and add the gene wiki-string; Jim
    • Wiki to Chado:
      • XORT/chado xml scripts to load output of wiki/wikidb tables to chado; Josh
      • MODware to load output of wiki/wikidb tables to chado; Eric

genome wiki templates

Loading from chado to wiki depends on templates prepared in the wiki, which handle gene page formatting and the tables of gene information within the pages. A genome project may design several templates, so we want to store the logic for populating them from the chado db inside each template. This will include metadata (not displayed) that middleware can use to format data for each wiki template. Example templates can be found on a wiki via Special:AllPages, category Templates. We will add a set of common gene page templates for this example.

gene page template (simple)

{{{GENE_INFO_TABLE}}}

==Notes== 

==References== 
<references/>
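
As a rough illustration of how a loader might use this page template, the sketch below fills the {{{GENE_INFO_TABLE}}} parameter by plain string substitution. This is an assumption about the mechanism, not a description of loadwiki.php; the function name fill_page_template and the placeholder table markup are hypothetical.

```python
# Page template text as shown above (the triple braces mark the slot to fill).
PAGE_TEMPLATE = (
    "{{{GENE_INFO_TABLE}}}\n"
    "\n"
    "==Notes==\n"
    "\n"
    "==References==\n"
    "<references/>\n"
)


def fill_page_template(page_template, table_markup):
    """Return the page text with the table slot replaced by table_markup.

    Hypothetical helper: plain substitution is assumed here; the real loader
    may let MediaWiki expand the template instead.
    """
    return page_template.replace("{{{GENE_INFO_TABLE}}}", table_markup)


if __name__ == "__main__":
    # "GENE TABLE MARKUP" stands in for whatever the table template produces.
    print(fill_page_template(PAGE_TEMPLATE, "GENE TABLE MARKUP"))
```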

gene info table template

 <headings>
 Gene name||gene_name
 Description||description
 Synonyms||synonyms
 </headings>
 <type>1</type>
 <heading_style>{{heading_style}}</heading_style>
 <table_style>{{Prettytable}}</table_style>

loadwiki.php

Initial version

 usage: php loadwiki.php -p page_template -t table_template -f input_filename
 page_template == gene page template for wiki
 table_template == table edit template inside gene page
 input_filename == gene data in wiki-string format

input file (one line per gene: the page name, a tab, then the wiki table columns separated by '||' delimiters)

 sadA    sadA||EGF repeat-containing 9 transmembrane molecule involved in substrate adhesion.||Jim, Don

or

 $gene_name."\t".$gene_name.'||'.$description.'||'.$synonym_string."\n";
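
The same line can be produced in other languages. Below is a minimal Python sketch of the expression above; the function name format_gene_line is hypothetical, and joining synonyms with ", " simply mirrors the sample line.

```python
def format_gene_line(gene_name, description, synonyms):
    """Build one loadwiki input line: page name, tab, then '||'-delimited columns."""
    synonym_string = ", ".join(synonyms)
    columns = [gene_name, description, synonym_string]
    return gene_name + "\t" + "||".join(columns) + "\n"


if __name__ == "__main__":
    line = format_gene_line(
        "sadA",
        "EGF repeat-containing 9 transmembrane molecule involved in substrate adhesion.",
        ["Jim", "Don"],
    )
    print(line, end="")
```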

We plan to extend the above to work with a fuller gene 'page' of output from chado. This will use one common wiki Template:gene_page. This page template will have information linking the chado table output fields with the gene wiki table templates.

loadwiki format2

This extends the above format so that each row of data names its own page template and table template, which lets one gene page carry many tables.

pagename [tab] page_template [tab] table_template [tab] row_data (wiki-string) [tab] metadata [return]
sadA \t  gene \t gene_basics \t sadA||EGF repeat-containing 9 transmembrane molecule involved in substrate adhesion.||sadA-like,sadA-by-another-name \t metastring \n
sadA \t gene \t gene_location \t gene-location-wiki-string \t metastring \n
sadA \t gene \t gene_function \t gene-function-value-string \t metastring \n 
notA \t gene \t gene_basics \t notA||Another gene ...
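
A reader for this exchange format could look like the sketch below. It assumes real tab characters between fields (the spaces around \t in the sample rows are taken to be for readability only) and treats the trailing metadata field as optional; the function name parse_format2 and the returned structure are hypothetical.

```python
def parse_format2(lines):
    """Group format2 rows by page name.

    Each row: pagename, page_template, table_template, row_data, [metadata],
    separated by tabs. row_data is split into columns on '||'.
    """
    pages = {}
    for raw in lines:
        raw = raw.rstrip("\n")
        if not raw.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in raw.split("\t")]
        if len(fields) < 4:
            raise ValueError("expected at least 4 tab-separated fields: %r" % raw)
        pagename, page_template, table_template, row_data = fields[:4]
        metadata = fields[4] if len(fields) > 4 else ""
        page = pages.setdefault(
            pagename, {"page_template": page_template, "tables": []}
        )
        page["tables"].append(
            {
                "table_template": table_template,
                "columns": row_data.split("||"),
                "metadata": metadata,
            }
        )
    return pages


if __name__ == "__main__":
    sample = [
        "sadA\tgene\tgene_basics\tsadA||A description.||sadA-like\tmetastring\n",
        "sadA\tgene\tgene_location\tgene-location-wiki-string\tmetastring\n",
    ]
    for name, page in parse_format2(sample).items():
        print(name, page["page_template"], len(page["tables"]))
```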

format notes

The page and table templates are stored in the wiki, and can be accessed via url to wiki/Special:Export/Template:page_template, or via other wiki php tools. For GMOD gene pages and tables, we would like to include a mapping of chado fields to/from wiki table fields. That way the wiki-string in the above exchange table can be generated, if need be, by inspection of the template pages.
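
To make the "generate by inspection" idea concrete, here is a small sketch that reads the headings block of a table template and builds the wiki-string from a record of field values. It assumes the template source lists one Label||field pair per line inside the headings element, and that the right-hand names (gene_name, description, synonyms) are the keys under which the chado values arrive; both are assumptions, as are the function names.

```python
import re

# Table template text, one "Label||field" pair per line (assumed layout).
TABLE_TEMPLATE = """<headings>
Gene name||gene_name
Description||description
Synonyms||synonyms
</headings>
<type>1</type>
"""


def parse_headings(template_text):
    """Return [(label, field), ...] from the <headings> block, in order."""
    match = re.search(r"<headings>(.*?)</headings>", template_text, re.DOTALL)
    if match is None:
        return []
    pairs = []
    for line in match.group(1).splitlines():
        line = line.strip()
        if not line:
            continue
        label, _, field = line.partition("||")
        pairs.append((label.strip(), field.strip()))
    return pairs


def build_wiki_string(template_text, record):
    """Join record values in template column order with '||'; missing fields become empty."""
    fields = [field for _, field in parse_headings(template_text)]
    return "||".join(str(record.get(field, "")) for field in fields)


if __name__ == "__main__":
    record = {"gene_name": "sadA", "description": "A description.", "synonyms": "sadA-like"}
    print(build_wiki_string(TABLE_TEMPLATE, record))
```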

Planned outcome

A simple example collects gene information from the Chado db and produces an intermediate wiki-text file (script 1). This is then loaded into the MediaWiki database with gene page templates (script 2). Community members edit the genes through the Table Edit mechanism as desired. The updated gene info is then dumped (from the mysql wikidb), converted to chado xml, and loaded into Chado with transaction update checks, via XORT (script 3).

Genome wiki from chado notes

- From hackathon

  • tasks:
    • locate sample chado data (some format) for some genes w/ attributes
    • convert to some format suited to wiki loading (as wiki xml?)
      • dump table via Chado SQL;
 see e.g. http://eugenes.org/gmod/genbank2chado/conf/v_genepage3.sql
      • via xml/xslt transforms
      • via XORT perl parser
      • other
    • load to wiki
    >> this is larger; loading into wikipedia db via wikipedia.xml
    • dump wiki table edit (mysql db)
    • convert to chado xml (? xml transforms)
      • flybase harvard has scripts for general bulk data to chado.xml
  • options:
    • use chado sql view/procedure to dump tables suited to wikibox_db ?
    • easier
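
The "dump table via Chado SQL" task above can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database as a stand-in for Chado (which runs on PostgreSQL), with heavily simplified feature, featureprop, synonym and feature_synonym tables. In particular, the text column featureprop.type replaces Chado's cvterm-based type_id, and the helper names are hypothetical. It is not the v_genepage3.sql view referenced above.

```python
import sqlite3


def make_demo_db():
    """Create a tiny, simplified stand-in for a few Chado tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE featureprop (feature_id INTEGER, type TEXT, value TEXT);
        CREATE TABLE synonym (synonym_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE feature_synonym (feature_id INTEGER, synonym_id INTEGER);
        INSERT INTO feature VALUES (1, 'sadA'), (2, 'notA');
        INSERT INTO featureprop VALUES (1, 'description', 'Involved in substrate adhesion.');
        INSERT INTO synonym VALUES (1, 'sadA-like'), (2, 'sadA-by-another-name');
        INSERT INTO feature_synonym VALUES (1, 1), (1, 2);
        """
    )
    return conn


def dump_gene_rows(conn):
    """Return 'name<TAB>name||description||synonyms' lines, one per gene."""
    query = """
        SELECT f.name, COALESCE(p.value, ''), s.name
        FROM feature f
        LEFT JOIN featureprop p
               ON p.feature_id = f.feature_id AND p.type = 'description'
        LEFT JOIN feature_synonym fs ON fs.feature_id = f.feature_id
        LEFT JOIN synonym s ON s.synonym_id = fs.synonym_id
        ORDER BY f.name, s.name
    """
    genes = {}
    for name, description, synonym in conn.execute(query):
        entry = genes.setdefault(name, {"description": description, "synonyms": []})
        if synonym is not None:
            entry["synonyms"].append(synonym)
    return [
        name + "\t" + "||".join([name, entry["description"], ",".join(entry["synonyms"])])
        for name, entry in genes.items()
    ]


if __name__ == "__main__":
    for row in dump_gene_rows(make_demo_db()):
        print(row)
```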