GSOC Project Ideas 2016

Revision as of 22:42, 18 February 2016 by Dannon


There are plenty of challenging and interesting project ideas this year. They span a broad set of skills, technologies and domains, such as GUIs, database integration and algorithms.

Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see come from students who go this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.

Each project idea below is described using the following template:

  • Project Idea Name
    • Brief explanation: Brief description of the idea, including any relevant links, etc.
    • Expected results: describe the outcome of the project idea.
    • Knowledge prerequisites: programming language(s) to be used, plus any other particular computer science skills needed.
    • Skill level: Basic, Medium or Advanced.
    • Mentors: name + contact details of the lead mentor, name + contact details of backup mentor.

Here is a list of the proposed project ideas for 2016:

  • Project Idea 1: Biological Graph Visualization
    • Brief explanation: Tripal is an open-source suite of Drupal modules that allows a scientific research community to more easily set up and manage a data repository for genomic, genetic and related biological data. It provides data pages, data mining tools and visualizations. Tripal is used by, or in development for, over 25 different genome database websites, and is developed by an international group. A Tripal module currently exists for importing, searching and visualizing graph data that model the "network" of interactions among the various components of a biological system. However, the module is incomplete and its visualizations need improvement. The goal of this project is to complete the remaining work on this module so that it can be shared with others.
    • Expected results: Once completed, the Drupal module will be freely available for Tripal-based sites to use, providing graph visualizations of complex biological systems.
    • Knowledge prerequisites: PHP, Drupal, JavaScript, SQL.
    • Skill level: Medium
    • Mentors: Stephen Ficklin
  • Project Idea 2: GitHub-based revision control of synthetic chromosomes
    • Brief explanation: JBrowse is a robust open-source genome visualization tool built around JavaScript and HTML5. It has gained wide acceptance among biologists and bioinformaticians, with thousands of active installations worldwide in genomic research. This project deals with a module that enables the management of synthetic DNA sequences in JBrowse. Synthetic biologists design DNA sequences that differ from their analogous sequences in natural organisms. As with code, the differences can be incremental or radical, and can be visualized using “diff”-like tools. And, also as with code, good revision control is fundamentally important. We propose to build a module that enables biologists to manage synthetic sequences. This will contribute toward a broader effort to visualize the results of synthetic biology experiments in the computational design phase, after synthesis (via DNA re-sequencing), and to verify gene expression under various conditions (via RNA sequencing). We propose to store revisions of synthetic chromosome sequences in git repositories (primarily on GitHub). The project would include the development of plugin components on both the server and client sides. The new extensions would provide a means of detecting branch and tag updates in the GitHub repositories, and of selecting and retrieving synthetic sequences from GitHub. The backend part of the plugin would provide a means to manage multiple synthetic sequences and manipulate the associated JBrowse-based datasets. This is a challenging and exciting project at the interface of computational and synthetic biology. You’ll have lots of guidance developing cool science tools that will have a real impact on the scientific community.
    • Expected results: Your module will be widely used by a new breed of synthetic biologists.
    • Knowledge prerequisites: Candidates should have solid experience with JavaScript, HTML5 and CSS3. Experience with REST, Node.js, Dojo, jQuery or the GitHub API would be a plus (you’ll be learning it all). If you think biology is cool and would like to learn a lot more about it, that’s a plus too.
    • Skill level: Medium
    • Mentors: Eric Yao, Lead JBrowse Developer
  • Project Idea 3: Lightweight chat plugin for the JBrowse genome browser
    • Brief explanation: Increasingly, genome scientists collaborate remotely from multiple sites: at genome centers in academic institutions, from biotech companies, from clinical labs and (increasingly, with the advent of portable genome sequencing) from field sites. This project idea is to develop a lightweight messaging/chat plugin for JBrowse using OAuth2 and the Faye pub/sub framework. Users will be able to see who else is currently browsing the genome (provided that they have set themselves as visible), see where they are browsing, and send and receive messages. A possible extension is to post comments on the genome. The general idea here is to make genomes (and their constituent objects, e.g. gene annotations) into “social objects”. This is in keeping with our vision of JBrowse as not just a tool for genomics, but for social genomics. The thousands of existing JBrowse instances, which could readily incorporate the plugin, offer the possibility of quick and deep adoption by the genomics community.
    • Expected results: This module will enable a new way for bioengineers to share and socialize genomic information.
    • Knowledge prerequisites: JavaScript/HTML5/CSS3. Node.js and Dojo a plus. Digging science, a plus.
    • Skill level: Medium
    • Mentors: Ian Holmes, Principal Investigator and founder of JBrowse
  • Project Idea 4: Linking Galaxy with Google Drive
    • Brief explanation: The Galaxy application implements the notion of an Object Store: a pluggable file management interface that acts as a layer between Galaxy and any user dataset. This Object Store interface allows datasets to be ‘physically’ disconnected from a particular Galaxy instance while the application can still access and interact with them. This opens the door to offering various storage media on which the data are actually stored. Ultimately, this allows users to associate self-provisioned external storage resources with their Galaxy accounts and move beyond the quotas or limitations imposed by any given Galaxy instance. Thus far, an abstract hierarchical store, Amazon S3, iRODS, and various local disk object stores have been implemented. However, the Object Store is currently an application-wide setting rather than a per-user setting, so users cannot specify their own back-end storage medium. Additionally, linking with Google Drive is highly desirable, as it would allow users to leverage the Google Drive for Education program.
    • Expected results: Implement a Galaxy Object Store for Google Drive, and allow per-user specification of a back-end data store.
    • Knowledge prerequisites: Python programming (required). Familiarity with Galaxy and/or object store APIs.
    • Skill level: Medium
    • Mentors: Enis Afgan
  • Project Idea 5: Work with the Dockstore Team and the GA4GH to Enable Cross Docker Repository Sharing
    • Brief explanation: The Dockstore project seeks to create a site where researchers can encapsulate their tools in Docker, a flexible and popular virtualization technology, and describe the tools using the Common Workflow Language and/or the Workflow Description Language. The benefit is a programmatic way to create, share, and run bioinformatics tools. On its own this is cool, since it allows scientists to make their tools portable from cloud to cloud, something we saw as key in petabyte-scale projects like PCAWG where the data simply can't be moved around. But an equally important goal is to create a community standard with the GA4GH, so that many sites like Dockstore can be created that all share a common API. This project will focus on working with the GA4GH community, a huge collaboration among over 300 groups and companies worldwide, to create and implement API standards, ensure Dockstore supports them, and facilitate cross-indexing of tools across all sites that support the standard, so that tools can be shared as seamlessly as possible.
    • Expected results: The API is "approved" by the GA4GH as an official standard, and Dockstore and other Docker repositories support the standard to facilitate the exchange of tools.
    • Knowledge prerequisites: Dockstore is written in Java and uses AngularJS, experience with the former is important and the latter is nice to have. Ability to work with diverse people from a variety of organizations and companies required.
    • Skill level: Basic to Medium
    • Mentors: Brian O'Connor (OICR & UCSC), Denis Yuen (OICR), and the GA4GH Containers and Workflows interest group (!forum/ga4gh-dwg-containers-workflows)
  • Project Idea 6: Galaxy Pages Overhaul
    • Brief explanation: Galaxy Pages are a way of communicating Galaxy analyses so that other researchers can easily view, reproduce, or extend an analysis. To build Pages, researchers use a WYSIWYG editor to create HTML pages that may contain embedded Galaxy objects such as histories, datasets, workflows, and visualizations. Pages are a powerful concept but are underutilized, and we believe a substantial overhaul could increase their accessibility and usage. The current HTML-based Pages have a number of usability issues. The first step would be to address these and update the embedded WYMeditor to its latest stable version. The embedded HTML approach works well for users who are not technically savvy, but advanced users would prefer alternatives such as Markdown or IPython Notebooks; extending the framework to allow these is one possibility for the project. Alternatively, extending Pages with new features for collaborative editing would also make them much more powerful.
    • Expected results: Improve Galaxy Pages by addressing existing bugs and switching to a pluggable back-end that supports Markdown.
    • Knowledge prerequisites: Python and JavaScript experience would be useful.
    • Skill level: Medium
    • Mentors: Dannon Baker and other Galaxy core developers

  • Project Idea 7: Galaxy Kubernetes Integration
    • Brief explanation: Galaxy supports running jobs in Docker containers on a single node. However, the size of biological datasets and the complexity of the questions being asked are constantly increasing, leading to ever more complex analyses, so a single container running on a single node will become an increasingly problematic limitation. Kubernetes is an exciting project that provides facilities for coordinating many containers. Extending Galaxy and/or Pulsar, Galaxy's remote job submission application, to interface with Kubernetes would potentially allow Galaxy to run the more complicated multiple-node, multiple-container analysis steps that future large-scale biological data analysis will require.
    • Expected results: Implement the ability to annotate Galaxy tools with Kubernetes pods and to orchestrate these jobs via Kubernetes, either in Galaxy directly or via Pulsar. Develop an example pod.
    • Knowledge prerequisites: Python programming, with experience in cloud and/or cluster computing, and with containers.
    • Skill level: Medium
    • Mentors: Dannon Baker and other Galaxy core developers