Difference between revisions of "April 2004 GMOD Meeting"

Generic Model Organism Database Construction Set
 
+
==Meeting 4==
+
 
+
GMOD Meeting April, 2004
+
 
+
==Presentations==
+
 
+
* [[Media:Cain_040526.ppt|Cain_040526.ppt]]
+
* [[Media:Crosby_040526.ppt|Crosby_040526.ppt]]
+
* [[Media:Emmert_040526.ppt|Emmert_040526.ppt]]
+
* [[Media:Gelbart_040528.ppt|Gelbart_040528.ppt]], Orthology
+
* [[Media:Gilbert_040526.ppt|Gilbert_040526.ppt]]
+
* [[Media:Harris_040527.ppt|Harris_040527.ppt]]
+
* [[Media:Kasprzyk_040526.ppt|Kasprzyk_040526.ppt]]
+
* [[Media:Kenny_040526.ppt|Kenny_040526.ppt]]
+
* [[Media:Kodira_040526.ppt|Kodira_040526.ppt]]
+
* [[Media:Matthews_040526.ppt|Matthews_040526.ppt]]
+
* [[Media:Sabo_040526.ppt|Sabo_040526.ppt]]
+
* [[Media:Schlueter_040526.ppt|Schlueter_040526.ppt]]
+
* [[Media:Terry_040526.ppt|Terry_040526.ppt]]
+
* [[Media:Worley_040526.ppt|Worley_040526.ppt]]
+
 
+
==Agenda==
+
 
+
===April 26, Morning: Combined Developers and Curators section===
+
 
+
Mount Vernon Room
+
 
+
<table cellpadding="6">
+
  <tr>
+
    <td width="15%">9:00am</td>
+
    <td width="85%">Introductions<br>
+
Scott Cain (CSHL)</td>
+
  </tr>
+
  <tr>
+
    <td>9:20</td>
+
    <td><br>
+
Don Gilbert (FlyBase, Indiana University)</td>
+
  </tr>
+
  <tr>
+
    <td>10:30</td>
+
    <td>Break</td>
+
  </tr>
+
  <tr>
+
    <td>10:45</td>
+
    <td><br>
+
        Frank Smutniak (FlyBase, Harvard University)</td>
+
  </tr>
+
  <tr>
+
    <td>11:20</td>
+
    <td><br>
+
        Stan Letovsky (FlyBase, Harvard University)</td>
+
  </tr>
+
  <tr>
+
    <td>11:45</td>
+
    <td>Lunch (on your own; there are many good restaurants, so check with a local)</td>
+
  </tr>
+
</table>
+
 
+
===April 26, Afternoon: Developer section===
+
 
+
Mount Vernon Room
+
 
+
<table cellpadding="6">
+
  <tr>
+
    <td width="15%">1:30</td>
+
    <td width="85%"><br>
+
Arek Kasprzyk (EBI)</td>
+
  </tr>
+
  <tr>
+
    <td>2:00</td>
+
    <td>GMOD/Turnkey web demo<br>
+
Brian O'Connor (UCLA)</td>
+
  </tr>
+
  <tr>
+
    <td>2:30</td>
+
    <td>Break</td>
+
  </tr>
+
  <tr>
+
    <td>3:00</td>
+
    <td><br>
+
Eimear Kenny (WormBase, CalTech)</td>
+
  </tr>
+
  <tr>
+
    <td>3:30</td>
+
    <td><br>
+
Toshiaki Katayama (Human Genome Center, University of Tokyo, Japan)</td>
+
  </tr>
+
  <tr>
+
    <td>3:45</td>
+
    <td>Break</td>
+
  </tr>
+
  <tr>
+
    <td>4:00</td>
+
    <td><br>
+
David Emmert (FlyBase, Harvard University)</td>
+
  </tr>
+
  <tr>
+
    <td>4:30</td>
+
    <td><br>
+
Scott Cain</td>
+
  </tr>
+
</table>
+
 
+
===April 26, Afternoon: Curator section===
+
 
+
Terrace Room
+
 
+
1:30 Jennifer Wortman (The Institute for Genomic Research)<br>
+
1:50 Shannon Schlueter (Arabidopsis thaliana Plant Genome Database, Iowa State University)<br>
+
2:10 Aniko Sabo (Genome Sequencing Center, Washington University School of Medicine)<br>
+
2:30 Break<br>
+
2:50 Madeline Crosby (FlyBase, Harvard University)<br>
+
3:10 Kim Worley (Human Genome Sequencing Center, Baylor College of Medicine)<br>
+
3:30 Astrid Terry (Joint Genome Institute)<br>
+
3:50 Break<br>
4:10 Chinnappa Kodira (Broad Institute)<br>
+
4:30 Michele Clamp (Broad Institute)<br>
+
4:50 Break<br>
+
5:00 Group discussion<br>
+
6:00 Dinner (on your own; see above)<br>
+
 
+
===April 27, Developer section===
+
 
+
Mount Vernon Room
+
 
+
<table cellpadding="6">
+
  <tr>
+
    <td width="15%">9:00</td>
+
    <td width="85%">GMOD Alpha release, Part II</td>
+
  </tr>
+
</table>
+
 
+
<p>The goal of this session is to install the gmod alpha release on
computers in order to test the installation procedure and to uncover
problems with the release.  There will almost certainly also be time for
"breakout" sessions in which smaller groups can discuss a variety of
topics.  Suggestions will be accepted both before and during the meeting.</p>
+
 
+
===April 27, Curator section===
+
 
+
Terrace Room
+
 
+
<table cellpadding="6">
+
  <tr>
+
    <td width="15%">9:00</td>
+
    <td width="85%">Apollo Demo<br>
+
        Sima Misra (FlyBase & Berkeley Drosophila Genome Center)</td>
+
  </tr>
+
  <tr>
+
    <td>9:40</td>
+
    <td><br>
+
        Nomi Harris (FlyBase & Berkeley Drosophila Genome Center)</td>
+
  </tr>
+
  <tr>
+
    <td>10:00</td>
+
    <td>Hands-on Apollo workshop for curators<br>
+
        Breakout session for Apollo developers</td>
+
  </tr>
+
  <tr>
+
    <td>12:00</td>
+
    <td>Lunch</td>
+
  </tr>
+
  <tr>
+
    <td>1:30</td>
+
    <td>Q&A session with curators & Apollo developers</td>
+
  </tr>
+
  <tr>
+
    <td>2:30</td>
+
    <td>Hands-on Apollo workshop for curators<br>
+
        Breakout session for Apollo developers</td>
+
  </tr>
+
  <tr>
+
    <td>4:30</td>
+
    <td>Q&A session with curators & Apollo developers<br>
+
        Group discussion</td>
+
  </tr>
+
</table>
+
 
+
===April 27, Dinner===
+
 
+
<table cellpadding="6">
+
  <tr>
+
    <td width="15%">6:00</td>
+
    <td width="85%"><br>
+
        Reservation on us, food paid for by you</td>
+
  </tr>
+
</table>
+
 
+
===April 28, Combined Developer and Curator section===
+
 
+
Mount Vernon Room
+
 
+
<table cellpadding="6">
+
  <tr>
+
    <td width="15%">9:00</td>
+
    <td>Updates from the previous day<br>
+
        Scott Cain and Sima Misra</td>
+
  </tr>
+
  <tr>
+
    <td>10:00</td>
+
    <td>Break</td>
+
  </tr>
+
  <tr>
+
    <td>10:15</td>
+
    <td><br>
+
        Bill Gelbart (FlyBase, Harvard University)</td>
+
  </tr>
+
  <tr>
+
    <td>11:15</td>
+
    <td>Break</td>
+
  </tr>
+
  <tr>
+
    <td>11:30</td>
+
    <td>Closing remarks, planning for the next meeting<br>
+
        Scott Cain</td>
+
  </tr>
+
</table>
+
 
+
==Progress reports==
+
<pre>
+
GMOD Project Progress Reports
+
April, 2004
+
-----------------------------
+
 
+
The past four months have seen the first two releases of gmod, which will
become the suite of model organism database software.  The first release,
version 0.001 (alpha), was released in January 2004.  The main goal of that
release was to establish a release procedure.  The release consisted of
chado, the database schema developed primarily by FlyBase developers at
Harvard and BDGP.  Additionally, there were a variety of tools for
installing and loading data into the database, developed primarily by
Allen Day at UCLA and Scott Cain at CSHL.  Finally, there was a compatible
version of the Generic Genome Browser with a chado database adaptor that
allows browsing of genome features directly from the database.
+
 
+
The second release, also an alpha release, consisted of the same components,
+
and was released in March 2004. In this release, the installation procedure
+
improved considerably, and a prerequisite that had caused testers difficulties
+
was removed.  During the GMOD meeting in April, this release was installed
+
by several attendees during a workshop.  Several suggestions were made that
+
will be implemented in the next release.
+
 
+
There are several items planned for addition or improvement in the next
+
two releases.  Tools to allow importing and exporting XML formatted data
+
from chado will be included, which will allow the sequence annotation tool,
+
Apollo, to be used with chado. Additionally, a template-based web front end for
+
chado called turnkey will be included in an upcoming release.  This software is
+
still early in the development process, but when it was presented to
+
developers at the GMOD meeting in April, there was considerable interest
+
in getting it included in a gmod release as soon as possible.
+
 
+
Longer term goals for gmod releases include adding pubsearch and pubfetch.
The process of porting these applications has begun and is expected to be
complete by the end of the year.  A tool for literature-based sequence
annotation, called JavaSEAN, is expected to be included in gmod in a similar
time frame.  Additionally, the Apollo developers plan to create a new
version of Apollo that will be able to read and write directly to the
database without using an XML intermediary, which will simplify the
process of sequence annotation considerably.
+
 
+
 
+
 
+
Apollo Progress Report (11/2003 - 4/2004)
+
 
+
Major improvements in release 1.3.6 (11/3/03):
+
 
+
Apollo now runs under JDK1.4, which works better on most platforms.
+
 
+
Can rubberband a region on the axis and the selected sequence will pop up
+
in a Sequence window.
+
 
+
Results that represent hits against sequences that are new to their
+
respective database (as indicated in tiers file) are shown with a box
+
around them, so that the curator can immediately see which results are
+
new and need to be looked at.
+
 
+
Search (Find) now allows full regexps.
+
 
+
Instead of having the config files in $HOME/.apollo be slightly modified
+
copies of the ones in APOLLO_ROOT/conf, you can now put ONLY the stuff
+
you want changed into your personal cfg files.  Apollo will first read
+
the ones in APOLLO_ROOT/conf, and then read your personal cfgs and apply
+
any modifications.
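
The layering described above (shared defaults first, personal settings
applied on top) can be sketched in Python.  This is only an illustration
of the idea, not Apollo's own code: the function name merge_config and
every setting name below are hypothetical.

```python
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: defaults first, then personal overrides on top."""
    merged = dict(defaults)    # copy so the shared defaults stay untouched
    merged.update(overrides)   # personal settings win where keys collide
    return merged


# Hypothetical settings, standing in for APOLLO_ROOT/conf and $HOME/.apollo.
root_conf = {"window_width": "1000", "autosave_minutes": "20", "font": "plain"}
personal_conf = {"autosave_minutes": "5"}

effective = merge_config(root_conf, personal_conf)
print(effective)
```

Only the one overridden key needs to appear in the personal settings;
everything else falls through from the defaults.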
+
 
+
Synteny (see Synteny section at end)
+
 
+
 
+
Major improvements in release 1.4.0 (internal release) (2/9/2004):
+
 
+
New game.tiers file format (easier to read and change).  If you have an
+
old game.tiers, it will be autoconverted to the new format.
+
 
+
Better handling of non-gene annotation types.  New glyphs for showing
+
them in main Apollo display.
+
New annotations are automatically assigned the type (e.g. gene, tRNA,
+
etc.) appropriate to the evidence that was used to create them.  (Type
+
can then be changed in the annotation info editor, if desired.)
+
 
+
Structured transaction records are now added to the XML when you save.
+
They include the type of object that changed (e.g. TRANSCRIPT;
+
ANNOTATION; COMMENT), the operation (e.g. ADD, SPLIT, etc.), the relevant
+
names and/or IDs before and after the transaction, and the user and
+
time/date when the change was made.
+
 
+
Support for translational exceptions, including frame shifts and one base
+
pair genomic sequencing errors.
+
 
+
UTRs are now shown in a different (configurable) color from the rest of
+
the gene.
+
 
+
Restriction enzyme mapper:
+
- Cut sites show up in main window (near the axis)
+
- Can now map multiple restriction enzymes at once
+
- Table of restriction fragments; can be selected for viewing in
+
Sequence window
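
As a rough sketch of what mapping several restriction enzymes at once
involves, the Python below scans a sequence for each enzyme's recognition
motif.  It is not Apollo's implementation.  The function name and the test
sequence are made up; the EcoRI (GAATTC) and BamHI (GGATCC) motifs are the
standard ones.  It reports the 0-based start of each recognition site
rather than the exact cut position within the site.

```python
def find_recognition_sites(sequence: str, enzymes: dict[str, str]) -> dict[str, list[int]]:
    """Map each enzyme name to the 0-based starts of its recognition motif."""
    sequence = sequence.upper()
    sites = {}
    for name, motif in enzymes.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start)
            # Resume one base later so overlapping sites are also found.
            start = sequence.find(motif, start + 1)
        sites[name] = positions
    return sites


enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}
sequence = "ttGAATTCaaaGGATCCccGAATTCg"  # invented test sequence
sites = find_recognition_sites(sequence, enzymes)
print(sites)
```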
+
 
+
Annotation info window:
+
- Now has integrated annotation tree
+
- Shows arbitrary properties for annotations and transcripts (including
+
validation_flag)
+
- Shows translational exceptions and genomic sequencing errors
+
- Lets you edit annotation ID as well as name/symbol
+
 
+
Ability to tag results by selecting from a list of comments, which are
+
specified (as ResultTags) in game.style.  Tagged results are crosshatched
+
in pink in the display.
+
 
+
Fixed updating of peptide sequences.
+
 
+
 
+
Improvements in releases 1.4.1 (3/12/04) and 1.4.2 (3/18/04):
+
 
+
Red/green markers at axis show where sequence/region ends.
+
 
+
To help you identify splice sites that are unconventional, colored
+
triangles appear in the annotation glyph.
+
 
+
Can now load D. melanogaster data from r3.1 (gadfly) and r3.2 (chado)
+
(both via cgi).
+
 
+
 
+
1.4.3 (4/19/04):
+
Users can now get the sequence of the entire segment they are viewing, not
+
just a rubberbanded section.  [File -> Save sequence]
+
 
+
 
+
 
+
Synteny progress, 11/03-4/04:
+
 
+
- Synteny now works with GAME. You can load one species and then use the
+
blast or syntenic block results to another species (for now it's pseudo)
+
to load another species. The other species is loaded with the same range
+
around that feature. Links between the two species are automatically
+
derived from the blast link features that are present in both datasets
+
(no explicit link file needs to be specified).
+
 
+
- Database chooser was added to select the different species databases.
+
 
+
- Able to switch back and forth from synteny data adapter to regular data
+
adapters without restarting Apollo.
+
 
+
- Can save and edit (edit could use some rigorous testing)
+
 
+
- Can home in on link from link popup menu. Zooms and shows the strands
+
of homed in link, strands not in link are hidden.
+
 
+
- Species now zoom and scroll together by default. Can unlock zoom with
+
shift key, and unlock scroll with menu item.
+
 
+
- You can now configure links between 2 curation sets that contain links to
+
each other. link_type, source and hit species are specified in the linked
+
type in the tiers file. This works with GAME and, in theory, could be made to
+
work with other adapters that have linked data embedded in the species
+
data.
+
 
+
 
+
 
+
Textpresso: A progress report
+
 
+
Eimear Kenny, Hans-Michael Mueller and Paul Sternberg
+
 
+
Updates made to Textpresso since September 2003:
+
 
+
Textpresso for Yeast Literature
+
(Toward a generic MOD information retrieval/extraction search engine)
+
 
+
SGD developers and curators met with Eimear Kenny for two weeks at
+
the beginning of March at Stanford to build a Textpresso search engine for
+
Yeast. During that period the Textpresso software was installed on a
+
Solaris system and three builds with a test corpus of ~400 full text
+
journal articles were completed. In addition, the Textpresso Ontology for
+
worm literature was modified to a functional preliminary ontology for
+
yeast literature. Plans to expand the corpus to 10,000 yeast papers and
+
make improvements to the yeast ontology are underway at Stanford.
+
 
+
Integration of Textpresso into Literature Curation Pipeline
+
 
+
We have integrated Textpresso into the WormBase curation pipeline
to expedite the extraction of genetic interaction information from the
+
literature. A prototype curation interface has been developed to enable a
+
curator to extract data from sentences returned by a Textpresso query for
+
genetic interaction. We find that these Textpresso sentences are enriched
+
3-fold for gene-gene interactions compared to sentences that mention two
+
or more gene names and 39-fold compared to random sentences from the
+
literature.
+
 
+
Textpresso MOD interface
+
 
+
We have generated a WormBase-like interface for Textpresso to integrate
the Textpresso information retrieval engine into the WormBase web site.
+
http://www.textpresso.org/cgi-bin/wb/textpressoforwormbase.cgi?allabstracts=on&searchmode=sentence&searchtargets=Paper&searchtargets=Abstract
+
 
+
Textpresso Package
+
 
+
Hans-Michael Mueller is working on packaging Textpresso for release in
+
the first half of this year.
+
 
+
Textpresso paper ... under review
+
 
+
A Textpresso publication is currently under revision.
+
 
+
 
+
 
+
PubFetch/PubTrack Progress Report (April 2004)
+
 
+
PubFetch
+
PubFetch is a tool for accessing literature from various online resources.
+
The goal is to provide a common interface and common format to downstream
+
applications to allow them to query different literature repositories in
+
a single, unified fashion.
+
 
+
PubFetch has been implemented in two forms:
+
* Java servlet core + simple web interface to provide interactive access
+
to PubFetch
+
    * Provides access to PubMed and Agricola databases
+
* BioMOBY wrapper around servlet core to provide webservice access to PubFetch
+
 
+
A variety of new features have been introduced:
+
* Duplicate filtering - running the same search on multiple data sources
results in some duplication of articles. The duplicate filter detects
these articles and returns a non-redundant set of data. Database Ids from
both sources are maintained in the non-redundant set.
+
* The web interface version highlights keywords in the search results to
+
aid in review of the returned articles.
+
* Connection to full text - a hyperlink to the full text is returned (if
+
available from PubMed)
+
* Filtering of 'ahead of print' articles - Abstracts are appearing in
+
PubMed and being assigned PubMed Ids prior to being published and are
+
being reassigned PubMed Ids after publication. PubFetch allows filtering
+
of these ahead of print articles to retrieve only published articles.
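
The duplicate filtering described in the list above can be sketched as
follows.  The report does not say how PubFetch decides that two records
are the same article, so matching on a normalized title plus publication
year is purely an assumption here, and the record layout, function names
and identifiers are all hypothetical.

```python
def normalize(title: str) -> str:
    """Lower-case a title and drop punctuation and spaces for comparison."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def merge_results(*result_sets):
    """Combine result sets into a non-redundant list, keeping every source Id."""
    merged = {}
    for records in result_sets:
        for rec in records:
            key = (normalize(rec["title"]), rec["year"])  # assumed match rule
            entry = merged.setdefault(
                key, {"title": rec["title"], "year": rec["year"], "ids": {}}
            )
            entry["ids"][rec["source"]] = rec["id"]
    return list(merged.values())


# Invented records standing in for PubMed and Agricola search results.
pubmed = [
    {"source": "PubMed", "id": "PM-1", "title": "Rice genome mapping.", "year": 2003},
    {"source": "PubMed", "id": "PM-2", "title": "A second article", "year": 2004},
]
agricola = [
    {"source": "Agricola", "id": "AG-9", "title": "Rice Genome Mapping", "year": 2003},
]

articles = merge_results(pubmed, agricola)
print(articles)
```

An article found in both sources appears once, carrying the Id from each.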
+
 
+
The BioMOBY interface provides the following services:
+
* SearchPubmed - Search PubMed for given query and get PMIDs
+
* GetPubmed - Retrieve PubMed articles in MEDLINE display format for given
+
PMIDs
+
* FetchFull - Get FullText for given PMID
+
* fetchAgID - Search Agricola for given query and get Agricola accession number
+
* fetchAgDoc - Get Agricola document in MEDLINE like format for given
+
Agricola accession number
+
 
+
Current work
+
The integration of PubFetch and PubSearch is in progress; our goal is to
+
have PubSearch using the PubFetch core module for literature retrieval by
+
summer of 2004. We will be adapting the Rat Genome Database literature
+
pipeline to use the PubFetch BioMOBY services to act as its source for
+
literature data download.
+
 
+
The current version of PubFetch is available from the GMOD cvs:
+
http://cvs.sourceforge.net/viewcvs.py/gmod/pubfetch/
+
 
+
Implementation of PubSearch at RGD
+
Following a curator review of existing PubSearch functionality, a variety
+
of new features were requested by the RGD curators to enable a more
+
'article-centric' view of the PubSearch database. This has been
+
implemented by the TAIR group and plans are underway to install this
+
latest version of PubSearch at RGD, populate with RGD/Rat data and test
+
in the RGD curation process.
+
 
+
PubTrack
+
PubTrack is a monitoring tool that tracks objects as they move through
+
a process or workflow. Existing workflow tools move data through a
+
specified process, passing datasets to applications and retrieving
+
results and passing them to the next step in the flow. PubTrack does
+
not aim to direct or control workflow and it does not track the dataset
+
as a whole; instead, it works at a higher resolution and tracks the data objects
+
within the dataset, enabling users to follow a particular object as it
+
moves through a process.
+
 
+
Progress to date:
+
* Review of existing workflow tools and schemas has been completed.
+
* The initial PubTrack schema has been developed and implemented in PostgreSQL
+
* Initialization scripts have been written to populate the PubTrack
+
database with initial object and process data. Perl scripts are used to
+
parse and load initialization data in a standard XML format; a DTD is
+
available and is used to confirm the data formatting.
+
* An API is under development to allow 3rd party applications to
+
communicate with PubTrack to initialize and update the tracking
+
information for objects under observation. This is being developed and
+
tested using data from a proteomics MS/MS analysis pipeline that is
+
being built in my lab.
+
* A basic web user interface is in development to provide end-users with
+
the ability to view objects and their progress through their designated
+
processes.
+
* The concept of 'estimated time of completion' has been added to allow
+
long term planning and project tracking. For example, the entire process
+
of curating an article might typically take 3 days, so the estimated time
+
of completion would be 3 days after the start of curation. This estimate
+
can be displayed on a Gantt chart and updated as individual steps in the
+
process are completed, allowing an increasingly refined view of the
+
completion date. This is being used in our proteomics tracking - component
+
1 generates tissue samples from animals in a process that takes up to 3
+
weeks to complete. By tracking the progress and updating the completion
+
time estimate using PubTrack it allows lab members in component 2 to plan
+
ahead. They are able to see what samples will be ready and on what date
+
they will be ready and this is updated as the process progresses.
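
One way the 'estimated time of completion' could be recomputed as steps
finish is sketched below.  This is an assumption about the approach, not
PubTrack's actual schema or API: the step names, the typical durations and
the function name are all invented for illustration.  The estimate is the
current time plus the typical durations of the steps still outstanding.

```python
from datetime import datetime, timedelta


def estimate_completion(steps, completed, now):
    """Return now plus the typical durations of all unfinished steps.

    steps: list of (name, typical_duration) pairs in workflow order.
    completed: set of names of steps that have already finished.
    """
    remaining = sum(
        (duration for name, duration in steps if name not in completed),
        timedelta(),
    )
    return now + remaining


# Hypothetical three-step curation process that typically takes 3 days.
steps = [
    ("download", timedelta(days=1)),
    ("screen", timedelta(days=1)),
    ("curate", timedelta(days=1)),
]
start = datetime(2004, 4, 26, 9, 0)

initial = estimate_completion(steps, set(), start)
# The first step finished early, after only half a day.
revised = estimate_completion(steps, {"download"}, start + timedelta(hours=12))
print(initial, revised)
```

The initial estimate is 3 days after the start; finishing a step early
pulls the revised estimate forward.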
+
 
+
Current Work
+
When the API is stabilized we will deploy PubTrack in the existing RGD
+
literature curation pipeline and ultimately in combination with PubSearch
+
at RGD. This will create an entire system allowing tracking of literature
+
across a heterogeneous system as it is downloaded from PubMed, into
+
PubSearch, screened, moved to RGD's Oracle db, curated and ultimately
+
filed. A more comprehensive user interface will be developed based on the
+
experiences from the proteomics pipeline and the RGD curation pipeline.
+
The goal is to provide generic tracking views and a way to allow specific
+
users to customize the displays, charts and reports if needed.
+
 
+
PubTrack documents including schema, loading scripts, etc. can be found on
+
the GMOD CVS.
+
http://cvs.sourceforge.net/viewcvs.py/gmod/pubtrack/
+
 
+
 
+
 
+
PubSearch update
+
 
+
We've migrated our database schema over to one that should be more
+
compatible with a Chado schema --- all of our table names are now prefixed
+
with a 'pub_' prefix, and we've done some column renaming so that we use
+
consistent names throughout the system.
+
 
+
Our production server has also been upgraded from MySQL3 to MySQL4, and
+
we've rewritten some parts of Pubsearch to take advantage of the
+
transaction support that the new MySQL provides.  We've also added
+
referential integrity constraints to the foreign keys in our tables.
+
 
+
We've adopted another tool called JCoverage to help us identify areas of
+
our code that are not being exercised by our unit tests, and have started to
+
tighten up our test cases so that our major classes are being exercised.
+
 
+
We've worked toward removing dependencies on external resources.  Hit
+
generation now works directly from the Java codebase, rather than from an
+
external Python script.  We've continued work on a keyword term browser to
+
replace the highly munged version of AmiGO that we are running locally.
+
 
+
 
+
 
+
GBrowse Project
+
 
+
Coordinator: Lincoln Stein
+
Major Developers: Scott Cain
                  Aaron Mackey
                  Toshiaki Katayama
                  Vsevolod Ilyushchenko
                  Marc Logghe
                  Sheldon McKay
                  Mark Wilkinson
+
 
+
DESCRIPTION:
+
 
+
GBrowse is a web-based browser for genome annotations.  It is intended to
+
complement Apollo by providing a search, browse and drill-down display for
+
sequence-based features without the need for prior software installation. 
+
GBrowse uses a database adaptor system to connect to a single primary data
+
source, and a temporary flat-file system to layer an arbitrary number of
+
third-party annotations on top of the primary data.  A plugin system is used
+
to add new functionality to gbrowse, such as more advanced searches, and
+
dynamically-computed features such as ab initio gene predictions.  An
+
internationalization layer allows GBrowse to display button labels, menus and
+
help text in a variety of common world languages.
+
 
+
The following gbrowse database adaptors currently exist:
+
 
+
      Bio::DB::GFF (oracle, postgresql & mysql)
+
      Well-tested and in production.
+
 
+
      Bio::DB::Das::Chado (postgresql)
+
      Well-tested and in early production.
+
 
+
      GenBank proxy
+
      Well-tested and in production.  Does not handle
+
      full-genbank keyword searches properly.
+
 
+
      Bio::DB::Das::BioSQL
+
      Adaptor for the BioSQL schema.  In beta test.
+
 
+
      Bio::Das
+
      Adaptor for DAS sources. Released, but probably best
+
      considered in beta test.
+
 
+
GBrowse has been downloaded from SourceForge 1,830 times, but this is
+
a poor way to count the number of GBrowse users.  A more conservative
+
estimate of users comes from tallying bug reports, which ensures that
+
the user has at least tried to install the software.  However, it
+
represents an undercount.  In any case, we can confirm that at least
+
100 laboratories have installed GBrowse.  As the list attached to the
+
bottom of this report shows, GBrowse can be found in academic,
+
governmental and commercial organizations in North America, South
+
America, Europe, Asia, Africa and Australia.
+
 
+
RECENT PROGRESS:
+
 
+
Since the last status report, we have added the following features to
+
GBrowse:
+
 
+
1) SVG output
+
 
+
Users can now click on a link labeled "Publication Quality Image" and
+
download a Scalable Vector Graphics version of the current view.  SVG
+
is an editable format that can be manipulated with popular graphics
+
programs such as Adobe Illustrator, and can be reprinted by journals
+
without the pixelation that plagues bitmapped images.
+
 
+
2) Security
+
 
+
Tracks can now be protected by username & password, restricted to
+
certain hosts, or limited to hosts presenting certain classes of RSA
+
(digital) certificates.  A restricted track does not appear on the
+
screen of unauthorized users, allowing system administrators to
+
present a mix of proprietary and public data.
+
 
+
3) DAS support
+
 
+
GBrowse can now run on top of distributed annotation system sources.
+
DAS is supported in three ways:
+
    a) As an external annotation source
+
      Users can layer remote DAS tracks on top of the current view.
+
      The remote DAS tracks will remain active from session to
+
      session.  The GBrowse administrator can preconfigure a set
+
      of "recommended" DAS sources, which will then appear in a
+
      user-selectable menu.
+
 
+
    b) As a primary database
+
      GBrowse can now be configured to use a local or remote DAS
+
      database as its primary data source.  This means that one
+
      can point GBrowse at the UCSC or ENSEMBL databases and
+
      immediately begin browsing them using the GBrowse user
+
      interface.
+
 
+
    c) As a DAS source
+
      GBrowse will act as a DAS server.  At the administrator's
+
      discretion, all or selected tracks can be made exportable
+
      via DAS, allowing sequence features to be shared between
+
      GBrowse instances or between GBrowse and other DAS clients.
+
 
+
4) Feature filtering and highlighting
+
 
+
A new filtering and highlighting API allows plugins to hide features
+
based on a set of user-supplied criteria or to highlight them in
+
various colors.
+
 
+
5) New adaptors
+
 
+
In addition to the DAS adaptor, we have added an experimental BioSQL
+
adaptor to GBrowse.  BioSQL is a flexible database schema designed by
+
the BioPerl & BioJava projects for the purposes of holding
+
GenBank/EMBL records in a relational format.
+
 
+
6) Support for GFF3 loading & dumping
+
 
+
GBrowse now can load and dump sequence annotations in GFF3 format
+
(http://song.sourceforge.net), a preliminary specification that
+
improves on the current GFF sequence feature format.  The advantage of
+
this format is that it uses the Sequence Ontology, a controlled
+
vocabulary of sequence feature types.
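
For readers unfamiliar with the format, the sketch below parses one GFF3
feature line in Python.  It follows the nine-column layout of the current
GFF3 specification, which may differ in detail from the preliminary
version mentioned above, and it is not GBrowse's loader.  The function
name and the example record are made up.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line: str) -> dict:
    """Split one GFF3 feature line into its nine named columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    feature = dict(zip(GFF3_COLUMNS, fields))
    feature["start"] = int(feature["start"])
    feature["end"] = int(feature["end"])
    # Column 9 holds tag=value pairs separated by ';'; values may be lists.
    attributes = {}
    for pair in feature["attributes"].split(";"):
        if pair:
            tag, _, value = pair.partition("=")
            attributes[unquote(tag)] = [unquote(v) for v in value.split(",")]
    feature["attributes"] = attributes
    return feature


# Made-up example record; "gene" is a Sequence Ontology term.
example = "chr2L\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demo_gene"
feature = parse_gff3_line(example)
print(feature["type"], feature["start"], feature["end"], feature["attributes"])
```

The third column is where the Sequence Ontology vocabulary comes in: its
value is expected to be a recognized feature type such as "gene".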
+
 
+
7) Integrated MOBY support
+
 
+
The BioMOBY system (www.biomoby.org) is a web services system that
+
allows users to quickly locate and invoke bioinformatics services.
+
GBrowse now has an interface which allows it to find services that
+
will operate on selected sequence features.  For example, GBrowse can
+
present users with a list of current services that will operate on
+
Drosophila gene names.
+
 
+
8) Support for writeback
+
 
+
A writeback layer has been added to GBrowse to allow external editors
+
to update the underlying database.  This has been tested successfully
+
with the Artemis editor in the context of a USDA pathogens database
+
project.  Testing with Apollo is still underway.  Currently it is
+
recommended to edit sequence databases via the shared Chado schema and
+
the Apollo->Chado->GBrowse route, rather than to use Apollo->GBrowse
+
directly.
+
 
+
9) New glyphs
+
 
+
We have recently added a number of new glyphs for use with the
+
International HapMap Project.  New glyphs include a "weighted allele"
+
glyph that indicates the major and minor alleles of a single
+
nucleotide polymorphism, and a set of glyphs for visualizing haplotype
+
blocks.
+
 
+
10) Bug fixes
+
 
+
Performance has been improved when uploading large third-party annotation
+
files.  Nucleotide-level alignments have been fixed when the display
+
is "flipped."  The feature name search methods have been cleaned up to
+
provide more consistent behavior.
+
 
+
PLANS FOR THE FUTURE:
+
 
+
Performance is a concern when viewing large numbers of uploaded
+
third-party features. We plan to fix this by implementing an indexed
+
flat file cache for uploaded features.
+
 
+
The user interface needs to be improved in some respects.  One useful
+
idea is to place an icon to the left of each track to indicate whether
+
it is in an expanded or collapsed state.
+
 
+
The ability to use a different DAS source for each track, which is a
+
feature of ISB GBrowse, will be ported over.
+
 
+
As always, we are looking for volunteers fluent in non-English
+
languages to create and update the internationalization files.
+
 
+
Contact: Lincoln Stein <lstein@cshl.org>
+
 
+
APPENDIX. Confirmed users of GBrowse:
+
 
+
Agricultural Biotechnology Center, Hungary
+
BAWI, S. Korea
+
Baylor College of Medicine
+
Biocrates GmbH, Innsbruck
+
Brandeis University
+
Bristol-Myers Squibb
+
British Columbia Centre for Disease Control
+
CIRAD, France
+
CSIRO, Australia
+
Cambridge University (multiple labs)
+
Center for Genomics & Bioinformatics, Stockholm
+
Centre de Génétique Moléculaire, CNRS
+
Cold Spring Harbor Laboratory (multiple labs)
+
Compugen
+
Concordia University, Canada
+
Cornell Medical School
+
Cornell University
+
DNA Landmarks, Inc.
+
Donald Danforth Plant Sciences Center
+
Duke University (multiple labs)
+
EMBL, Heidelberg
+
EuGenes (hacked copy)
+
Faculdade de Medicina de Ribeirão Preto, São Paulo
+
FlyBase
+
Foundation for Research and Technology, Crete
+
Fundação Hemocentro, São Paulo
+
Genoscope, France
+
GrainGenes
+
Harvard University
+
Hospital for Sick Kids, Toronto
+
Illinois Institute of Technology
+
Incyte Corporation
+
Inpharmatica, Ltd.
+
Institute for Systems Biology, Seattle
+
Institute of Molecular and Cell Biology, Singapore
+
International Rice Research Institute, Philippines
+
John Innes Centre
+
KEGG
+
Kansas State University
+
Karolinska Institute
+
Kennedy Krieger Institute
+
Lawrence Berkeley Laboratories
+
Marine Biological Laboratories, Woods Hole
+
Massachusetts Institute of Technology (multiple labs)
+
Mayo Institute
+
McGill University
+
Meat Animal Research Center, University of Nebraska
+
Medical University of South Carolina
+
Michigan State University  
+
NHGRI, NIH
+
National Cancer Institute, Frederick Cancer Center
+
New York University (multiple labs)
+
North Carolina State University
+
Northern Illinois University
+
Northwestern University
+
Oklahoma State University
+
Open Informatics Consulting Corp.
+
Oxagen Corp.
+
Pasteur Institute, Paris
+
Pioneer Corporation
+
QIAGEN Operon Corp.
+
RIKEN (multiple labs)
+
RatDB
+
Regulome, Inc.
+
Rhobio (Bayer CropScience SA & Biogemma joint venture)
+
Rigshospitalet, Copenhagen
+
Rockefeller University
+
Roslin Institute, Edinburgh
+
Russian Academy Medical Sciences
+
Serono International Corp, Geneva
+
Simon Fraser University
+
South Africa National Bioinformatics Institute
+
Southern Illinois University
+
St. Jude Children's Research Hospital, Memphis
+
Stowers Institute for Medical Research
+
Texas A&M (multiple labs)
+
The Institute for Genomic Research
+
Tulane University
+
University of California, Davis
+
University of Arizona (multiple labs)
+
University of British Columbia
+
University of California Santa Barbara
+
University of Georgia (multiple labs)
+
University of Minnesota
+
University of Muenster
+
University of New South Wales, Australia
+
University of Oklahoma (multiple labs)
+
University of Pennsylvania (multiple labs)
+
University of Southern California
+
University of Texas
+
University of Toronto
+
University of Virginia
+
University of Washington
+
Universität Giessen
+
Université de Liège, Belgium
+
Wageningen Universiteit & Researchcentrum, Netherlands
+
Washington University at St. Louis (multiple labs)
+
WormBase
+
deVGen, Belgium
+
 
+
 
+
CMAP
+
Main developer: Ken Clark
+
 
+
Recent improvements include:
+
 
+
*  Now CGI-based (no more mod_perl dependencies), making installation
+
    much easier (and much more like GBrowse)
+
*  Added SVG output
+
*  Added multiple aliases for features
+
*  Added support for arbitrary attributes for db objects
+
*  New cross-reference scheme allows for unlimited xrefs on most db objects
+
*  Experimental XML export/import of data added
+
*  User tutorial added
+
*  Faster, fewer bugs, etc.
+
 
+
CMAP is known to be in use by:
+
 
+
Barry Marler (Andy Paterson), Alex Feltus, Pratt: UGA
+
Rex Nelson, Chet Langin, Xiaokang Pan: Iowa State
+
Michelle Bobo: Oregon Health & Science University
+
Victor Ulat, Richard Bruskiewich: IRRI
+
Matthew Hobbs: University of Sydney (Australia)
+
 
+
 
+
 
+
                          Pathway Tools Status Report
+
                                  Peter Karp
+
                                February 5, 2004
+
 
+
Please note that the full history of updates to Pathway Tools can be
+
found at URL
+
http://bioinformatics.ai.sri.com/ptools/release-notes.html
+
 
+
Significant updates funded under this grant since the last report are
+
as follows.
+
 
+
o We have implemented the proposed Napster-like peer-to-peer sharing
+
of Pathway/Genome Databases via a central network registry server.
+
Pathway Tools users will be able to use the software to register new
+
PGDBs that they create to this central registry server at SRI, and
+
they will be able to use the software to browse the registry and
+
to retrieve and install PGDBs listed there for local analysis.
+
 
+
o Pathway Tools has been extended to support annotation of protein
+
domains, sites, and chemical modifications.  We have created an
+
ontology of domain, site, and modification types.  The Pathway/Genome
+
Editor tools have been extended to allow users to interactively
+
annotate these features on protein sequences, and the Pathway/Genome
+
Navigator has been extended to display these annotated features.
+
 
+
o We have added a batch-processing mode to the portion of Pathway Tools
+
that creates new Pathway/Genome Databases to allow large-scale automated
+
processing of multiple genomes without manual intervention.  We have
+
undertaken a collaboration with the European Bioinformatics Institute,
+
who are interested in applying Pathway Tools to generate Pathway/Genome
+
Databases for a large number of genomes.
+
 
+
o We have integrated an algorithm for pathway hole filling into
+
Pathway Tools.  A pathway hole is a reaction step in a metabolic
+
pathway for which no enzyme has been identified in the genome of
+
an organism.  The pathway hole filler uses a combination of techniques
+
to predict which genes in the genome code for these missing enzymes.
+
[This algorithm was developed under separate funding.]
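
The definition of a pathway hole above can be sketched in a few lines of
Python. This is only an illustration of hole detection, not the Pathway
Tools hole filler itself: the function name, the reaction identifiers, and
the gene names are all hypothetical, and the prediction step that proposes
candidate genes for each hole is not shown.

```python
from typing import Dict, List


def find_pathway_holes(pathway_reactions: List[str],
                       enzyme_genes: Dict[str, List[str]]) -> List[str]:
    """Return the reaction steps that have no enzyme-coding gene assigned."""
    # A reaction is a hole if it is missing from the annotation
    # or is mapped to an empty list of genes.
    return [rxn for rxn in pathway_reactions if not enzyme_genes.get(rxn)]


# Hypothetical pathway of three reaction steps.
pathway = ["RXN-1", "RXN-2", "RXN-3"]
# Hypothetical genome annotation: no gene has been identified for RXN-2.
annotation = {"RXN-1": ["geneA"], "RXN-3": ["geneB", "geneC"]}

holes = find_pathway_holes(pathway, annotation)
print(holes)  # ['RXN-2']
```

A real hole filler would take each reported hole and search the genome for
genes likely to code for the missing enzyme.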
+
 
+
o We have completely re-designed the menus of the desktop version
+
of Pathway/Genome Navigator to be more consistent with other
+
graphical interfaces, more intuitive to the user, and to provide
+
more screen area for the display of visualizations.
+
 
+
o We have integrated an SBML (Systems Biology Markup Language) output
+
tool written in the Church lab at Harvard into Pathway Tools, allowing
+
the reaction network within a Pathway/Genome Database to be exported
+
to SBML format, from which it can be imported into a number of
+
simulation and analysis software packages.
+
 
+
o We have reworked the display of information about protein complexes
+
within Pathway Tools to increase the clarity of this information.
+
 
+
o The preceding capabilities will be present in the February release
+
of Pathway Tools.
+
 
+
o We have received many emails from users reporting bugs, and asking for
+
information.
+
 
+
o 80 groups have licensed Pathway Tools to date.
+
 
+
o Pathway/Genome Databases available through the web include:
+
 
+
  o Saccharomyces cerevisiae, Stanford University
+
    http://pathway.yeastgenome.org/biocyc/
+
 
+
  o Plasmodium falciparum, Stanford University
+
    plasmocyc.stanford.edu
+
 
+
  o Mycobacterium tuberculosis, Stanford University
+
    BioCyc.org
+
 
+
  o Arabidopsis thaliana and Synechocystis, Carnegie Institution of Washington
+
    Arabidopsis.org:1555
+
 
+
  o Methanococcus jannaschii, EBI
+
    Maine.ebi.ac.uk:1555  (availability intermittent)
+
 
+
 
+
                          Pathway Tools Status Report
+
                                  Peter Karp
+
                                April 20, 2004
+
 
+
Please note that the full history of updates to Pathway Tools can be
+
found at URL
+
http://bioinformatics.ai.sri.com/ptools/release-notes.html
+
 
+
Significant updates funded under this grant since the last report in
+
February 2004 are as follows.
+
 
+
o Version 8.0 of Pathway Tools was released on March 12, 2004.
+
SRI continues to hold to our planned schedule of two releases of
+
Pathway Tools per year.
+
 
+
o 275 groups have licensed Pathway Tools to date.  The large jump
+
in this number since the last report reflects the fact that these
+
numbers also include groups who use Pathway Tools to query
+
existing Pathway/Genome Databases (not reported earlier), in addition
+
to groups who use it to create new databases.
+
 
+
o We have made very significant progress on development of an
+
algorithm to automatically lay out the one-page metabolic overview
+
diagram that shows the full metabolic network of an organism -- the
+
algorithm is now working.  We are also in the process of adding new
+
components of the cellular machinery to this diagram.
+
 
+
o SRI has hosted two 4-day training sessions for Pathway Tools.
+
The dates and 26 attendees are listed below.  Most attendees have
+
brought genomes with them to the training sessions, and have left
+
with draft Pathway/Genome Databases.
+
 
+
Tutorial on March 15-18, 2004
+
 
+
1. John Burke   Biotique Inc.
+
2. Guillaume Meurice   Pasteur Institute
+
3. David Simon   Pasteur Institute
+
4. Gregory P. Fournier   MIT
+
5. Alex Picone   Biatech
+
6. John Bashkin   SRI
+
7. Tit Yee Wong   University of Memphis
+
8. Ken Kaufman   UC Berkeley
+
9. Jeremy Glasner   University of Wisconsin
+
10. Lisa Herron-Olson   University of Minnesota
+
11. Devaki Bhaya   Carnegie Institution
+
 
+
 
+
Tutorial on April 19-22, 2004
+
 
+
1 Dr. Matthew Berriman The Wellcome Trust Sanger Institute
+
T. brucei & L. major
+
2 Herbert Chiang Washington University
+
Bacteroides thetaiotaomicron
+
3 Clinton Fernandez University of British Columbia
+
Rhodococcus sp. RHA1 (~10MB)
+
4 Lisa Koski University of Montreal, Canada
+
5 Rebecca Krupp UCLA
+
Methanosarcina acetivorans
+
6 Joanne Luciano BioPathways Consortium
+
Prochlorococcus marinus MED4
+
7 Jasintha Maniraja Universite Libre de Bruxelles
+
Mus musculus
+
8 Linyong Mao Pacific Northwest National Laboratory
+
Shewanella oneidensis
+
9 Michael P. McLeod University of British Columbia
+
Rhodococcus sp. RHA1 (~10MB)
+
10 Dylan Morris CalTech
+
Mycoplasma genitalium
+
11 Gavin Murphy CalTech
+
Bdellovibrio
+
12 Joo-Heon Park University of Texas-Houston Medical School
+
Treponema pallidum
+
13 Liviu Popescu Cornell University, Computer Science
+
Saccharomyces cerevisiae
+
14 Christopher Reigstad Washington University
+
unpublished uropathogenic E. coli
+
15 Haluk Resat Pacific Northwest National Laboratory
+
16 Jian Song Los Alamos National Laboratory
+
Pseudomonas aeruginosa
+
 
+
 
+
 
+
GMOD Project Status    April 2004        D. Gilbert (gilbertd@indiana.edu)
+
 
+
Project members:  Don Gilbert, Josh Goodman, Paul Poole,
+
Vasanth Singan (student), at Indiana University.
+
 
+
Projects in development for GMOD:
+
 
+
(1) LuceGene, document/object search/retrieval for genome data
+
www.gmod.org/lucegene/  eugenes.org:8081/gmod/lucegene/
+
version 1.2 (alpha), released for public use April 2004.
+
In use at FlyBase.net, euGenes.org, wFleaBase. LuceGene is similar in
+
concept to the bioinformatic databank access tool SRS, and web search
+
systems such as Google. Based on Lucene, this Java program is fast and
+
flexible at search and retrieval of complex data objects.  It
+
outperforms the Chado Postgres database by 10x or more at gene object
+
retrieval.
+
 
+
(2) Genome Directory System, data mining access to genome data 
+
www.gmod.org/gds/
+
In development, web services for SOAP access to genome data and bio
+
sequence databanks.  Plan to provide production data mining services
+
through this including FlyBase, euGenes genomes and Bio-Mirror/IUBio
+
biosequence databanks. Will add to ARGOS package for genome databases.
+
Includes plan to test FlyBase data analyses over TeraGrid, Fall 2004.
+
 
+
(3) ARGOS, a replicable genome information system
+
www.gmod.org/argos/  flybase.net/argos/  eugenes.org/argos/
+
Version 0.7 (alpha, March 2004).
+
ARGOS is used now for replicating public web-genome databases. Contains
+
all of FlyBase, euGenes, wFleaBase, and some other services.
+
Contents include 10 GB multi-genome data (euGenes), 8 GB of Drosophila
+
(FlyBase), and 500 MB of common software (servers, binaries).
+
 
+
Miscellany:
+
gmod/schema/XMLTools/ChadoSax/ reader  for chado.xml provides
+
  flybase annotation data access.
+
gmod/schema/GMODTools/  Perl modules using GMOD 0.001 release for
+
  managing miscellaneous sequences (EST, GSS, etc.) in Chado database
+
  Used now in Daphnia / wFleaBase genome database (eugenes.org/daphnia)
+
Apollo data search/retrieval system used at
+
  flybase.net/apollo/
+
  a web CGI using Chado Postgres + LuceGene
+
  for retrieval of GAME XML annotations by
+
  lookup of gene name, genome location, other attributes.
+
Tested, aided development, and used GMOD release 0.001, Postgres Chado,
+
XORT, Chado::DBI, GBrowse, etc. tools for FlyBase and wFleaBase, where
+
they now form the basis of data management.
+
 
+
 
+
 
+
GMOD Update from the Saccharomyces Genome Database (SGD)
+
 
+
    Before the last GMOD meeting at Berkeley, SGD released several GMOD
+
software packages (Blast Graphic Viewer, Restriction Graphic Viewer and
+
GO Graphic Viewer). Since then, we have been working on incorporating
+
existing GMOD products into new tools and resources at SGD. Here is a
+
list of projects that are currently under development or already in
+
production.
+
 
+
1. New Fungal BLAST using BLAST Graphic Viewer.
+
    SGD has created a new Fungal BLAST interface using the BLAST Graphic
+
Viewer. This new tool can be used to do BLASTN or TBLASTN searches using
+
any sequence of choice against any combination of fungal sequence datasets,
+
including genome sequences of fungal model organisms and pathogens, ESTs,
+
and other fungal sequence sets in GenBank. The fungal BLAST search at SGD
+
can be accessed from this URL.
+
 
+
    http://seq.yeastgenome.org/cgi-bin/SGD/nph-blast-fungal.pl
+
 
+
 
+
2. GBrowse at SGD
+
    GBrowse has been set up at SGD. SGD is still testing the software
+
before making a general announcement about the availability of the
+
software.  This software is running on top of a MySQL database whose
+
tables are populated from a flat file in GFF3 format (refer to the third
+
topic for detail). GBrowse at SGD can be accessed from this URL.
+
 
+
    http://www.yeastgenome.org/cgi-bin/SGD/gbrowse/gbrowse/yeast
+
 
+
3. GFF3 file format
+
    SGD has started to provide the sequence features of S. cerevisiae
+
genome in a flat file, which is fully compatible with GFF3 format.
+
This file is used as the data input to load the MySQL database for
+
GBrowse and the PostgreSQL database running Chado schema for SGD Lite
+
at Princeton. This file is updated every week on SGD's ftp site. This
+
file is available for download from this URL.
+
 
+
    ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/SGDGFF3.gff
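
As a rough illustration of what a GFF3 flat file contains, the Python
sketch below parses one feature line into its nine tab-separated columns.
It is a minimal sketch, not the loader used for GBrowse or Chado: the
function and class names are hypothetical, and the sample line is made up
for illustration rather than copied from the SGD file.

```python
from typing import Dict, List, NamedTuple, Optional
from urllib.parse import unquote


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, List[str]]


def parse_gff3_line(line: str) -> Optional[Feature]:
    """Parse one GFF3 line; return None for blank lines, comments, directives."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols
    attributes: Dict[str, List[str]] = {}
    if attr_text != ".":
        # Column 9 holds tag=value pairs separated by ';'; multiple values
        # for one tag are separated by ',' and are percent-encoded.
        for pair in attr_text.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return Feature(
        seqid=unquote(seqid),
        source=source,
        type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attributes,
    )


# Illustrative sample line (values are assumptions, not taken from SGDGFF3.gff).
SAMPLE = "chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W;Note=Dubious%20open%20reading%20frame"
feature = parse_gff3_line(SAMPLE)
print(feature.type, feature.start, feature.end, feature.attributes["ID"])
```

Lines beginning with "#" (comments and directives such as the version
header) are skipped, so the same function can be applied to every line of
a file in turn.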
+
 
+
 
+
4. SGD Lite and CHADO
+
    The SGD colony at Princeton has been working on installing GMOD
+
release 0.002.  Both versions of the Chado schema in these releases
+
(.001 and .002) have been successfully installed and loaded (via a
+
modified GFF3 file) on a desktop running Mac OS 10.3.2 using the
+
included installation scripts.  We are currently working on installing
+
0.002, including GBrowse, on an Apple Xserve running 10.3.2.  We plan
+
to assemble installation notes/documentation and distribute them during
+
the meeting.
+
 
+
5. Textpresso Beta testing
+
    SGD has a wealth of literature information. We want to provide
+
expanded text searching to our users, since we have an abstract and/or
+
full text for most of our references. Textpresso is an information
+
retrieval system developed by WormBase at Caltech. Eimear Kenny spent
+
two weeks at SGD to help set up a test version of Textpresso. The SGD
+
Textpresso can be accessed from this URL.
+
 
+
    http://www.yeastgenome.org/textpresso/
+
 
+
Currently, we are working on improving Textpresso's software
+
performance, as well as developing a yeast version of the Textpresso
+
ontology. We improved the performance of the markup script (text2xml.pl)
+
by 50%. We are also considering a few options to improve the indexing
+
mechanism. With regard to the ontology, we have modified the 'Gene'
+
and 'Localization in Time and Space' categories.  We are also currently
+
working on a few other categories, such as Allele, Transgene and
+
Phenotype, in order to best reflect the biology in S. cerevisiae.
+
</pre>
+
  
 
[[Category:Apollo]]
 

Revision as of 01:38, 25 May 2008
