
Web-apollo-meeting-2011-4-4

From GMOD
Revision as of 21:11, 4 April 2011 by Suzi (Talk | contribs)

  • Chris questions
    • I've put together some thoughts and questions about the project from the perspective of the groups looking to use WebApollo for their own community annotation projects. I know that a lot of these won't really have an answer until the project is further along, and will differ from one project to another. I think a lot of these questions go along with the same questions that we were talking about during the hackathon.
    1. Is there anything we need to do to prepare our data to facilitate use by WebApollo? Would it be better to pull from databases or use preprocessed flatfiles? What would the trade-offs be in terms of disk space and server resource usage?
    2. What additional requirements does WebApollo have on the server side on top of the requirements JBrowse has?
    3. How do the system resources compare to Apollo classic in terms of memory and bandwidth usage?
    4. Will the system scale well to annotation projects with very large datasets/very large chromosomes?
    5. JBrowse is fast at loading large regions, but I don't know how it compares with Apollo (classic or WebApollo). Are there any estimates for the resources needed to serve a really large annotation project (one to two hundred annotators at peak load)? Will there be issues in running multiple annotation projects simultaneously from one server?
    6. What sort of memory or bandwidth overhead will there be for loading a multi-megabase sequence with a dozen evidence tracks? Will there also be issues on the client side?
    7. On projects where the genome is not as well polished, there are many unplaced scaffolds (ChrUn), on the order of thousands. Will there be an option to type in the chromosome name and position in addition to, or instead of, a drop-down box?
      • Need a generic way
    8. We need to evaluate how NCBI and UCSC utilize unplaced scaffolds. Some groups concatenate all the unplaced scaffolds into one sequence, which may make annotation problematic. In the past, we had split all the ChrUn scaffolds into separate sequences, but this may be a problem if we are to keep compatibility with UCSC.
      • Might need to embed the Georgetown splitter software into the retrieval software. Hard to make this work across different genome projects. Perhaps an initial configuration phase to gather the coordinates (from NCBI or whoever has the AGP file) will be needed first.
    9. What system requirements will there be (if any) for the end users?
      • Hardware:
        • Memory requirements (scaffold vs chromosome scale regions)
        • CPU?
      • Software:
        • OS/browser/other requirements?
  • UI
    • Server is now returning CDS features, but JBrowse doesn't yet handle these and is still using separate UTR features, so it's still a hack.
  • Jay
    • Not much to report this week, but did talk for a couple of hours last week and figured out a lot of details.
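
For question 8 under Chris's questions, the idea of gathering coordinates from an AGP file in an initial configuration phase could look roughly like the sketch below. It assumes the standard tab-delimited AGP layout (object, object start, object end, part number, component type, then component ID, start, end and orientation for non-gap rows, with 1-based inclusive coordinates). The names `parse_agp`, `to_scaffold` and `EXAMPLE_AGP`, and the scaffold names in the example data, are hypothetical and are not part of WebApollo, JBrowse or the Georgetown splitter.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Component(NamedTuple):
    obj: str          # concatenated object, e.g. "ChrUn"
    obj_beg: int      # 1-based inclusive start on the object
    obj_end: int      # 1-based inclusive end on the object
    comp_id: str      # original scaffold name
    comp_beg: int     # 1-based inclusive start on the scaffold
    comp_end: int     # 1-based inclusive end on the scaffold
    orientation: str  # "+", "-", or an unknown-orientation code


def parse_agp(lines: Iterable[str]) -> List[Component]:
    """Collect the non-gap rows of an AGP file."""
    components = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        f = line.split("\t")
        if f[4] in ("N", "U"):
            continue  # gap rows carry no component coordinates
        components.append(
            Component(f[0], int(f[1]), int(f[2]), f[5], int(f[6]), int(f[7]), f[8])
        )
    return components


def to_scaffold(
    components: List[Component], obj: str, pos: int
) -> Optional[Tuple[str, int]]:
    """Map a position on the concatenated object back to (scaffold, position).

    Returns None when the position falls in a gap or outside the object.
    """
    for c in components:
        if c.obj == obj and c.obj_beg <= pos <= c.obj_end:
            offset = pos - c.obj_beg
            if c.orientation == "-":
                # reverse-oriented components count down from their end
                return c.comp_id, c.comp_end - offset
            return c.comp_id, c.comp_beg + offset
    return None


# Made-up example: two scaffolds joined by a 100 bp gap.
EXAMPLE_AGP = "\n".join([
    "\t".join(["ChrUn", "1", "1000", "1", "W", "scaffold_1", "1", "1000", "+"]),
    "\t".join(["ChrUn", "1001", "1100", "2", "N", "100", "scaffold", "yes", "paired-ends"]),
    "\t".join(["ChrUn", "1101", "1600", "3", "W", "scaffold_2", "1", "500", "-"]),
])

if __name__ == "__main__":
    comps = parse_agp(EXAMPLE_AGP.splitlines())
    print(to_scaffold(comps, "ChrUn", 1101))
```

A linear scan is fine for a sketch; with thousands of unplaced scaffolds, a sorted list with binary search on `obj_beg` would likely be a better fit.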