Web-apollo-meeting-2011-4-4

From GMOD
Revision as of 21:55, 4 April 2011 by Elee (Talk | contribs)

  • Chris questions
    • I've put together some thoughts and questions about the project from the perspective of the groups looking to use WebApollo for their own community annotation projects. I know that a lot of these won't really have an answer until the project is further along, and will differ from one project to another. I think a lot of these questions go along with the same questions that we were talking about during the hackathon.
    1. Is there anything we need to do to prepare our data to facilitate use by WebApollo? Would it be better to pull from databases or use preprocessed flatfiles? What would the trade-offs be in terms of disk space and server resource usage?
    2. What additional requirements does WebApollo have on the server side on top of the requirements JBrowse has?
    3. How do the system resources compare to Apollo classic in terms of memory and bandwidth usage?
    4. Will the system scale well to annotation projects with very large datasets/very large chromosomes?
    5. JBrowse is fast on loading large regions but I don't know how it compares with Apollo (classic or WebApollo). Are there any estimates for the resources needed to serve a really large annotation project (one to two hundred annotators at peak load)? Will there be issues in running multiple annotation projects simultaneously from one server?
    6. What sort of memory or bandwidth overhead will there be for loading a multi-megabase sequence with a dozen evidence tracks? Will there also be issues on the client side?
    7. On projects where the genome is not as well polished, there are many unplaced scaffolds (ChrUn), on the order of thousands. Will there be an option to type in the chromosome name and position in addition to or instead of a drop-down box?
      • Need a generic way
    8. We need to evaluate how NCBI and UCSC utilize unplaced scaffolds. Some groups concatenate all the unplaced scaffolds into one sequence, which may make annotation problematic. In the past, we split all the ChrUn scaffolds into separate sequences, but this may be a problem if we are to keep compatibility with UCSC.
      • Might need to embed the Georgetown splitter software into the retrieval software. Hard to make this work across different genome projects. Perhaps an initial configuration phase to gather the coordinates (NCBI or whoever has AGP file) will be needed first.
    9. What system requirements will there be (if any) for the end users?
      • Hardware:
        • Memory requirements (scaffold vs chromosome scale regions)
        • CPU?
      • Software:
        • OS/browser/other requirements?
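On question 8: if the unplaced scaffolds are served as one concatenated ChrUn sequence, positions could be translated back to the individual scaffolds from an AGP file. The following is a minimal Python sketch, assuming the standard tab-separated AGP column layout. The names `parse_agp` and `to_scaffold` and the sample rows are hypothetical and are not part of WebApollo or the Georgetown splitter.

```python
from bisect import bisect_right
from typing import NamedTuple


class Component(NamedTuple):
    obj_start: int   # 1-based, inclusive, on the concatenated object (e.g. ChrUn)
    obj_end: int
    comp_id: str     # scaffold name
    comp_start: int  # 1-based, inclusive, on the scaffold
    comp_end: int
    strand: str


def parse_agp(lines):
    """Collect the non-gap components of each object from AGP lines."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if cols[4] in ("N", "U"):  # gap rows carry no component
            continue
        comp = Component(int(cols[1]), int(cols[2]), cols[5],
                         int(cols[6]), int(cols[7]), cols[8])
        table.setdefault(cols[0], []).append(comp)
    for comps in table.values():
        comps.sort()  # sorted by obj_start
    return table


def to_scaffold(table, obj, pos):
    """Map a 1-based position on obj to (scaffold, position), or None in a gap."""
    comps = table.get(obj, [])
    starts = [c.obj_start for c in comps]
    i = bisect_right(starts, pos) - 1
    if i < 0 or pos > comps[i].obj_end:
        return None
    c = comps[i]
    offset = pos - c.obj_start
    if c.strand == "-":
        return c.comp_id, c.comp_end - offset
    return c.comp_id, c.comp_start + offset


# Hypothetical sample data: two scaffolds joined by a 100 bp gap.
SAMPLE_AGP = [
    "ChrUn\t1\t1000\t1\tW\tscaffold_1\t1\t1000\t+",
    "ChrUn\t1001\t1100\t2\tN\t100\tcontig\tno",
    "ChrUn\t1101\t1600\t3\tW\tscaffold_2\t1\t500\t-",
]
table = parse_agp(SAMPLE_AGP)
print(to_scaffold(table, "ChrUn", 500))
print(to_scaffold(table, "ChrUn", 1050))
```

Building the table once in an initial configuration phase, as suggested above, would keep each lookup down to a binary search.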
  • UI
    • The server is now returning CDS features, but JBrowse doesn't yet handle these and is still using separate UTR features. JSONUtils.createJBrowseFeature() parses the CDS feature returned from the server and, if the PROCESS_CDS flag is set to true, creates UTR/CDS JBrowse features. This, however, breaks the selection model for the annotation track. Until we change JBrowse's data model, perhaps we can change the selection behavior to select all features that are adjacent to the one selected, so that features 1-5, 6-8, and 9-12 would all be selected when any one of them is selected?
    • Server will return flags in JSON for exons that have boundaries that are non-canonical. This keeps to the model of having the server deal with all the biological issues. The UI can then use this flag to display non-canonical splice sites.
    • Added code to communicate with the server for splitting exons and making introns (two separate operations).
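The proposed "select all adjacent features" behavior could work roughly as follows. This is only a sketch in Python rather than in the JBrowse JavaScript code, and it assumes features are given as inclusive (start, end) pairs sorted by start, so that two features are adjacent when one ends at n and the next starts at n + 1. The function name `select_adjacent` is hypothetical.

```python
def select_adjacent(features, selected_index):
    """Return the contiguous run of features touching the selected one.

    features: list of (start, end) pairs, inclusive coordinates, sorted by start.
    """
    lo = hi = selected_index
    # Extend to the left while the previous feature ends right before this one.
    while lo > 0 and features[lo - 1][1] + 1 == features[lo][0]:
        lo -= 1
    # Extend to the right while the next feature starts right after this one.
    while hi < len(features) - 1 and features[hi][1] + 1 == features[hi + 1][0]:
        hi += 1
    return features[lo:hi + 1]


features = [(1, 5), (6, 8), (9, 12), (20, 30)]
print(select_adjacent(features, 1))
```

With half-open coordinates the adjacency test would instead be that one feature's end equals the next feature's start.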
  • Server
    • Handles lazy loading and caching of genomic sequence. Makes use of the same chunks used by JBrowse, so doesn't require any extra pre-processing.
    • Added code for the "Make intron" operation. As discussed, it will default to finding the closest acceptor and donor sites from the selected position (1 bp), but the minimum length of the automatically calculated intron will be configurable. Note that this is only the default behavior, and the curator can always manually drag an end to override the exon boundary.
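One possible reading of the "Make intron" default is sketched below in Python. This is not the server's code. It assumes forward-strand sequence, 0-based half-open coordinates, and canonical GT donor and AG acceptor dinucleotides, and it handles the minimum length by skipping acceptors that would make the intron too short. The function name `make_intron` and the `min_len` parameter are hypothetical.

```python
def make_intron(seq, pos, min_len=10):
    """Return (start, end) of an intron around pos, or None if none is found.

    The intron runs from the nearest GT lying at or left of pos to the nearest
    AG at or right of pos that gives a length of at least min_len.
    """
    seq = seq.upper()
    # Closest donor: last "GT" that lies entirely within seq[0:pos + 1].
    donor = seq.rfind("GT", 0, pos + 1)
    if donor == -1:
        return None
    # Closest acceptor: first "AG" at or after pos that does not overlap the
    # donor and that ends at least min_len bases after the donor start.
    search_from = max(pos, donor + 2, donor + min_len - 2)
    acceptor = seq.find("AG", search_from)
    if acceptor == -1:
        return None
    return donor, acceptor + 2


seq = "ATGGTAAACCCTTTAGGCC"
print(make_intron(seq, 9, min_len=4))
```

The curator's manual override of an exon boundary, mentioned above, is outside the scope of this sketch.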
  • Jay
    • Not much to report this week, but did talk for a couple of hours last week and figured out a lot of details.