Apollo-Chado Integration at BovineBase: Bugs and Suggestions

This was written by Justin Reese in preparation for Hackathon 2007.

In preparation for the Bovine Annotation effort, we set up a Chado database containing annotation evidence and let annotators connect via Apollo to do their annotations (we haven’t gotten Apollo->Chado writebacks working yet, but we’d like to eventually).

We thought feedback from the Apollo users (annotators) and developers (the ones who set up our Chado db) might help GMOD developers improve Apollo/Chado interoperability. So, below are some bug reports and suggestions that we compiled from annotators and developers involved in the Bovine Genome annotation effort. I will be fleshing this out in the next 12-24 hours, hopefully before the hackathon starts hacking, but feel free to contact me if something isn’t clear.

Bugs

A few ideas for future improvements

  1. Move as much Apollo configuration as possible out of conf files like chado-adapter.xml, and instead query the user or the database, e.g.:
    • Allow users to enter URLs, IDs and passwords for Chado databases like they would in a web browser, rather than having them specified in chado-adapter.xml.
    • Have Apollo retrieve “track” information from Chado’s ‘analysis’ table, rather than specifying it in chado-adapter.xml (searchHitPrograms, genePredictionPrograms, etc.).

    Our annotators aren’t particularly good at installing conf files* and are spread out all over the world, so we can’t really do it for them. Having things like track names and URLs hard-coded in conf files forces us to distribute new conf files to our annotators whenever we change something, and to hope they install them correctly. This hasn’t always gone smoothly. Ideally, whenever possible, we would just change our Chado database (add a track or change our URL, for example), and Apollo would automatically get hip by querying the Chado database or the user.

    *No offense, if any of you annotators are reading this.

  2. Simplify track naming schemes in Apollo conf files - the names of the tracks are a little complex and hard to understand for the uninitiated developer, and it’s not always clear which one to use. For example, during my first foray, I naively tried loading repeatmasker results under searchHitPrograms, not realizing that searchHitPrograms are always alignments between the reference sequence and a second sequence. I’m not sure I can suggest an intelligent improvement, but would it be possible to construct tracks like you do in GBrowse (using aggregators and the names of the things I would like to aggregate, like gene/transcript/CDS), or to have Apollo construct them automatically using some SQL magic (query for a parent, query for its children, query for the children’s children, etc.)? Just a thought; this is probably asking a lot.
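
Both suggestions above amount to letting the database describe itself, so here is a rough sketch of what the queries might look like. It is not how Apollo works today, just an illustration. To keep it runnable anywhere it uses Python's built-in sqlite3 module with cut-down stand-ins for Chado's analysis, feature and feature_relationship tables; a real Chado instance runs on PostgreSQL, stores feature types as cvterm ids rather than plain text, and also records a relationship type, all of which are ignored here. The function names (list_tracks, feature_tree, demo_connection) and the sample rows are made up for the example.

```python
import sqlite3

# Simplified stand-ins for three Chado tables. Assumption: only the columns
# needed for this sketch are kept, and feature types are stored as text.
SCHEMA = """
CREATE TABLE analysis (
    analysis_id INTEGER PRIMARY KEY,
    name        TEXT,
    program     TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT,
    type       TEXT NOT NULL
);
CREATE TABLE feature_relationship (
    subject_id INTEGER NOT NULL,  -- the child feature
    object_id  INTEGER NOT NULL   -- the parent feature
);
"""


def list_tracks(conn):
    """Suggestion 1: discover candidate tracks from the analysis table."""
    rows = conn.execute("SELECT DISTINCT program FROM analysis ORDER BY program")
    return [row[0] for row in rows]


def feature_tree(conn, root_id):
    """Suggestion 2: walk parent -> children -> grandchildren in one query.

    Returns (depth, type, name) tuples for the root feature and every
    feature below it, using a recursive common table expression.
    """
    sql = """
    WITH RECURSIVE tree(feature_id, depth) AS (
        SELECT ?, 0
        UNION ALL
        SELECT fr.subject_id, tree.depth + 1
        FROM feature_relationship AS fr
        JOIN tree ON fr.object_id = tree.feature_id
    )
    SELECT tree.depth, f.type, f.name
    FROM tree
    JOIN feature AS f ON f.feature_id = tree.feature_id
    ORDER BY tree.depth, f.feature_id
    """
    return conn.execute(sql, (root_id,)).fetchall()


def demo_connection():
    """Build an in-memory database holding a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO analysis (name, program) VALUES (?, ?)",
        [
            ("repeats", "repeatmasker"),
            ("protein alignments", "blastx"),
            ("more protein alignments", "blastx"),
            ("gene predictions", "genscan"),
        ],
    )
    conn.executemany(
        "INSERT INTO feature (feature_id, name, type) VALUES (?, ?, ?)",
        [
            (1, "geneA", "gene"),
            (2, "geneA-RA", "mRNA"),
            (3, "geneA-RA:exon1", "exon"),
            (4, "geneA-RA:cds", "CDS"),
        ],
    )
    conn.executemany(
        "INSERT INTO feature_relationship (subject_id, object_id) VALUES (?, ?)",
        [(2, 1), (3, 2), (4, 2)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(list_tracks(conn))
    for depth, ftype, name in feature_tree(conn, 1):
        print("  " * depth + ftype + " " + name)
```

With something along these lines, adding a new analysis to the database would be enough for it to show up as a track, and nobody would have to re-install a conf file. Deciding which programs are alignments and which are gene predictions would still need some extra information that this sketch does not attempt to model.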
