GSoC

From GMOD
Revision as of 21:58, 6 February 2015 by Robin.haw


Google Summer of Code 2015 @ Genome Informatics

Google Summer of Code is a global program that offers student developers stipends to write code for various open source software projects. We work with many open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 8,500 successful student participants from 101 countries and over 8,300 mentors from over 109 countries worldwide to produce over 50 million lines of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all. (Excerpt from the Google Summer of Code website)

Since 2011, the Genome Informatics group has served as an "umbrella organization" for a variety of bioinformatics projects, including GMOD and its software projects -- GBrowse, JBrowse, etc.; Galaxy; PortEco; Reactome; SeqWare; WormBase; and others. More information about this year's participating bioinformatics groups can be found here.

To learn more about this year's event and how GSoC works, please refer to the GSoC FAQ.


Mailing lists, IRC, and other ways to get in touch

  • Email: help@gmod.org and robin.haw@oicr.on.ca -- find out more about GSoC, a specific project, or your potential mentor(s).
  • Discussion mailing lists: Genome Informatics Google Groups - ask about our projects; join the community!
  • IRC channel: #genomeinformatics on Freenode.
  • Mentors can email both Robin and Scott to get more information about the program and get signed up.


Project Ideas

There are plenty of challenging and interesting project ideas this year. They span a broad set of skills, technologies and domains, such as GUIs, database integration and algorithms.

Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see come from students who take this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.

Preparing for GSoC 2015

Right now it is off-season for GSoC; we won't know whether Genome Informatics has been accepted as a GSoC 2015 mentoring organization until March 2nd. The timeline for GSoC 2015 has now been posted here. Nevertheless, now is a perfect time for students to talk to mentors about project ideas. If you are interested in mentoring, please check the Mentors section below and contact the organization admin.

Students

More information about writing your application will be available closer to the start of the student application period.

Mentors

We encourage mentors and mentoring organizations to think about new projects year-round! If you'd like help with your ideas page or your separate mentoring org application, please feel free to contact the organization admins. Links to advice about mentoring and other resources are available.

2014 Project Ideas

Reactome: Visualising Large Diagrams

Reactome is a free, open-source, curated and peer-reviewed database of biomolecular pathways with about 12,000 distinct visitors per month. The Reactome Pathway Diagram viewer was initially developed as a GSoC project and has become part of the Reactome Pathway Browser (http://www.reactome.org/PathwayBrowser/). The widget works well for diagrams of the current size, but larger diagrams will need to be included in the future, so the current implementation needs to be improved using a different approach.

  • Languages and skills: Java, GWT, HTML5 Canvas, Data visualisation
  • Idea: Henning Hermjakob <hhe@ebi.ac.uk>, Antonio Fabregat <fabregat@ebi.ac.uk>
  • Mentor(s): Antonio Fabregat Mundo <fabregat@ebi.ac.uk>, Robin Haw <robin.haw@oicr.on.ca>

Description: The current pathway diagram widget works well for the pathways in Reactome, but diagrams with a large number of entities, such as large biomolecular disease maps, slow the widget down unacceptably. A different approach is needed to draw larger pathways on the canvas. Techniques used in game development can help here: for example, a quadtree would filter the objects to be drawn in each canvas iteration (depending on the zoom level and the frame being viewed) and would also speed up hover detection while the user moves the mouse over the diagram. Another useful improvement would be a multi-layer approach that uses several canvases to represent different layers of information. This would make exporting the view as an image somewhat more complicated, but it is a good use case to address at the end of the internship.
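As a rough illustration of the quadtree idea, here is a minimal region-quadtree sketch in Python (the real widget is Java/GWT, so this is not the Reactome code). It assumes every diagram object has an axis-aligned bounding box; `Rect`, `QuadTree`, the node capacity of 8 and the depth limit of 8 are hypothetical choices. Each object is stored in the smallest node that fully contains it, so a single `query` call can return either the objects inside the current viewport (what to draw) or the objects under the mouse (hover detection).

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in diagram coordinates (x, y = top-left corner)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, other: "Rect") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)

    def intersects(self, other: "Rect") -> bool:
        return not (other.x > self.x + self.w or other.x + other.w < self.x
                    or other.y > self.y + self.h or other.y + other.h < self.y)


class QuadTree:
    """Region quadtree: each item lives in the smallest node that fully contains it."""

    def __init__(self, bounds: Rect, capacity: int = 8, depth: int = 0, max_depth: int = 8):
        self.bounds = bounds
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.items: List[Tuple[Rect, Any]] = []
        self.children: Optional[List["QuadTree"]] = None

    def _child_for(self, rect: Rect) -> Optional["QuadTree"]:
        # Return the child quadrant that fully contains rect, if any.
        for child in self.children or []:
            if child.bounds.contains(rect):
                return child
        return None

    def _split(self) -> None:
        b = self.bounds
        hw, hh = b.w / 2, b.h / 2
        self.children = [QuadTree(Rect(b.x + dx, b.y + dy, hw, hh), self.capacity,
                                  self.depth + 1, self.max_depth)
                         for dx in (0, hw) for dy in (0, hh)]
        # Push items down; items straddling a quadrant boundary stay at this node.
        remaining = []
        for rect, obj in self.items:
            child = self._child_for(rect)
            if child is None:
                remaining.append((rect, obj))
            else:
                child.insert(rect, obj)
        self.items = remaining

    def insert(self, rect: Rect, obj: Any) -> None:
        child = self._child_for(rect)
        if child is not None:
            child.insert(rect, obj)
            return
        self.items.append((rect, obj))
        if (self.children is None and len(self.items) > self.capacity
                and self.depth < self.max_depth):
            self._split()

    def query(self, area: Rect) -> List[Any]:
        """Objects whose boxes intersect area (a viewport, or a zero-size mouse point)."""
        found: List[Any] = []
        if not self.bounds.intersects(area):
            return found  # prune whole subtrees outside the area
        found.extend(obj for rect, obj in self.items if rect.intersects(area))
        for child in self.children or []:
            found.extend(child.query(area))
        return found
```

In a render loop, each canvas iteration would call `query` with the visible frame's rectangle, and a mouse-move handler would call it with a zero-size `Rect` at the cursor position, so neither has to scan every object in a large diagram.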

Reactome: Pathway Comparison Widget

The Pathway Comparison viewer was initially developed as a GSoC project and has become part of the Reactome Analysis Tool (http://www.reactome.org). The aim is to make the pathway comparison widget interactive, so that users can navigate the results by clicking nodes and edges and by changing the zoom level.

  • Languages and skills: Java, GWT, HTML5 Canvas, Data visualisation, BioJS
  • Idea: Henning Hermjakob <hhe@ebi.ac.uk>, Antonio Fabregat <fabregat@ebi.ac.uk>
  • Mentor(s): Antonio Fabregat Mundo <fabregat@ebi.ac.uk>, Robin Haw <robin.haw@oicr.on.ca>

Description: The current widget represents pathways as circles; each circle's size is determined by the number of proteins in the pathway, and its colour by the mean expression level. The width of the line connecting two nodes is determined by how similar the two pathways are in terms of the proteins they contain (see Figure below).

[Figure: Reactome Network Summary]
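To make the visual encoding concrete, the sketch below computes node and edge attributes for such a summary. It is an assumption-laden Python illustration, not the Reactome implementation: the Jaccard index stands in for "similarity in terms of contained proteins", the radius grows with the square root of the protein count so that circle area tracks the count, and `summarize`, `radius_scale` and `max_edge_width` are hypothetical names.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared proteins divided by all proteins in either pathway (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(pathways: Dict[str, Set[str]], expression: Dict[str, float],
              radius_scale: float = 2.0, max_edge_width: float = 10.0):
    nodes = {}
    for name, proteins in pathways.items():
        values = [expression[p] for p in proteins if p in expression]
        nodes[name] = {
            # Circle area grows linearly with protein count (radius ~ sqrt(count)).
            "radius": radius_scale * math.sqrt(len(proteins)),
            # Mean expression of the measured proteins drives the fill colour.
            "mean_expression": mean(values) if values else None,
        }
    edges = []
    for a, b in combinations(sorted(pathways), 2):
        similarity = jaccard(pathways[a], pathways[b])
        if similarity > 0:  # only link pathways that share at least one protein
            edges.append({"source": a, "target": b, "similarity": similarity,
                          "width": max_edge_width * similarity})
    return nodes, edges
```

Keeping the similarity in each edge record (not just the drawn width) would also give a click pop-up the numbers it needs to summarize a link.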


For the new version we would like a slightly different look and feel (studying different options is required), and adding interactivity will be one of the main requirements. A basic approach would be to show a pop-up when the user clicks a node or a link between nodes: for a node, the pop-up would summarize the proteins it contains; for a link, it would summarize the similarity between the two pathways. A more advanced improvement could allow the user to move nodes across the canvas, to show or hide nodes (showing only the links between visible nodes), and to zoom in order to show data at a different granularity depending on the zoom level. The last requirement is to include the widget in the EBI BioJS registry.

SeqWare

  • Languages and skills: Java, Bash/Linux, AWS, Google Cloud, Ansible, Vagrant, HBase/NoSQL, MapReduce and associated Hadoop technologies
  • Mentor(s): Brian O'Connor <boconnor@oicr.on.ca>, Denis Yuen <denis.yuen@oicr.on.ca>

There are quite a few projects that I would like to see happen for SeqWare and it would be great to get a student to help on these:

  • add hybrid workflow support to SeqWare Pipeline so users can write workflows that combine Hadoop tools (Pig, Hive, MapReduce, etc.) with traditional command-line tools
  • push forward the design of our Vagrant-based multi-cloud cluster provisioning technology stack. This includes incorporating provisioning technologies such as Ansible.
  • leverage Elastic MapReduce on Amazon Web Services (AWS) as an environment to run SeqWare
  • leverage Google Cloud: add support for spinning up SeqWare clusters in this environment and for interacting with its bucket store
  • work with the Galaxy tool and finish the compatibility layer that allows SeqWare workflows to run on or interact with Galaxy
  • write an AngularJS-based web application on top of our HBase variant/read NoSQL database, and write proof-of-concept analytical plugins that use machine learning and other advanced techniques to analyze data stored in this scalable backend


InterMine

  • Languages and skills: Java, JavaScript, Python
  • Mentor(s): InterMine team members

Some brief ideas for InterMine projects:

  • InterMine and the Semantic Web - make InterMine more semantic.
  • Building biological tools, e.g., a synteny viewer
  • Mobile (Android/iOS) apps
  • Data importer/Mine builder: an application to build a mine from a set of standard files and web-services.

Tripal Pedigree Viewer

  • Languages and skills: PHP, HTML5 and JavaScript
  • Mentor(s): Lacey-Anne Sanderson <lacey.sanderson@usask.ca>

Description: Development of an interactive, collapsible pedigree diagram to be displayed on Tripal Germplasm pages. Each node of the diagram needs to contain the name of the stock with a link to its page, and each edge needs to be labelled with the relationship type (e.g., "maternal parent of"). All of the data is already stored in a PHP tree class with traversal methods, so we are looking for a student to use these traversal methods to generate the markup their application needs, and to draw the pedigree using languages and libraries of their choosing. Here is an example showing the desired collapsibility; however, names should appear within the node circles (rather than beside them, as in the example), and the connector lines (edges) should be labelled.

Background: Tripal is a Drupal module that implements display and management of biological data within a Drupal site. Drupal is a PHP-based, database-driven content management system used to develop websites (from blogs to e-commerce sites, and now organism community sites such as KnowPulse: Legume Breeding & Genomics, Citrus Genome and many more). See our website for more Tripal sites as well as additional information. The Tripal Germplasm module provides the ability to display and manage plant/animal breeding programs. Currently the pedigree is displayed in the community-standard textual format (e.g., ParentA//ParentB1/ParentB2, which means that the offspring of ParentB1 and ParentB2 was crossed with ParentA to produce the current germplasm). Although this format is descriptive and common in the community, a graphical diagram showing these relationships would be much more intuitive; this is the motivation behind the project.
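The textual format above can be turned into the tree a diagram needs by splitting on the highest-level cross first. This Python sketch is only an illustration (Tripal's data already lives in a PHP tree class): it assumes the longest run of slashes marks the most recent cross, handles only binary crosses written with slashes (not Purdy-style forms such as `/3/`), and labels every edge with a generic "parent of", whereas a real diagram would use the relationship types Tripal stores (e.g. "maternal parent of"). `Germplasm`, `parse_pedigree` and `edges` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class Germplasm:
    name: str
    parents: Tuple["Germplasm", ...] = ()


def parse_pedigree(text: str, name: str = "current germplasm") -> Germplasm:
    """Parse slash-notation pedigrees such as 'ParentA//ParentB1/ParentB2'.

    The longest run of slashes marks the most recent cross, so we split on it
    first and recurse into each side.
    """
    text = text.strip()
    runs = re.findall(r"/+", text)
    if not runs:
        return Germplasm(text)  # a named line with no recorded cross
    level = max(len(run) for run in runs)
    # Split only on runs of exactly `level` slashes (not part of a longer run).
    sides = re.split(r"(?<!/)/{%d}(?!/)" % level, text)
    if len(sides) != 2:
        raise ValueError(f"expected exactly one top-level cross in {text!r}")
    parents = tuple(parse_pedigree(side, name=f"({side.strip()})") for side in sides)
    return Germplasm(name, parents)


def edges(node: Germplasm) -> Iterator[Tuple[str, str, str]]:
    """Yield labelled (parent, relationship, child) edges for drawing the diagram."""
    for parent in node.parents:
        yield (parent.name, "parent of", node.name)
        yield from edges(parent)
```

For `ParentA//ParentB1/ParentB2` this yields the current germplasm with two parents, `ParentA` and an intermediate `(ParentB1/ParentB2)` node whose own parents are `ParentB1` and `ParentB2`; collapsing a node in the diagram would then simply hide that node's subtree.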


JBrowse: REST daemon for Chado

Implement a self-contained server in the language of your choice (such as Python/WSGI, Perl/Plack, node.js, or Java/Jetty) to serve feature data and name completions out of a GMOD Chado database schema according to the JBrowse 1 REST API, enabling an instance of JBrowse 1 to run directly atop a Chado database. Possible addition: implement another daemon in Perl/Plack that does the same for a GBrowse 2 installation.

  • Skills: server-side language of student's choice
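A minimal Python/WSGI sketch of the core request is below. It answers the feature query of the JBrowse 1 REST API (`GET .../features/<refseq>?start=...&end=...`, returning `{"features": [...]}`) from a SQLite stand-in for a tiny subset of Chado's `feature` and `featureloc` tables. Assumptions: real Chado stores feature types via `type_id` and `cvterm`, which are flattened to a text column here; the stats and name-completion endpoints are omitted; `FEATURES_SQL`, `demo_connection`, `make_app` and `serve` are hypothetical. Chado's `fmin`/`fmax` are interbase coordinates, which is also what JBrowse expects, so they are passed through unchanged.

```python
import json
import sqlite3
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Overlap test on interbase coordinates: fmin < end AND fmax > start.
FEATURES_SQL = """
    SELECT f.uniquename, f.name, f.type, fl.fmin, fl.fmax, fl.strand
    FROM featureloc fl
    JOIN feature f   ON f.feature_id = fl.feature_id
    JOIN feature src ON src.feature_id = fl.srcfeature_id
    WHERE src.name = ? AND fl.fmin < ? AND fl.fmax > ?
    ORDER BY fl.fmin
"""


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for a tiny subset of Chado (type flattened to text here)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT,
                              uniquename TEXT, type TEXT);
        CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                                 fmin INTEGER, fmax INTEGER, strand INTEGER);
        INSERT INTO feature VALUES (1, 'chr1', 'chr1', 'chromosome'),
                                   (2, 'gene1', 'gene1', 'gene'),
                                   (3, 'gene2', 'gene2', 'gene');
        INSERT INTO featureloc VALUES (2, 1, 100, 500, 1), (3, 1, 1000, 2000, -1);
    """)
    return conn


def make_app(conn: sqlite3.Connection):
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith("/features/"):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        refseq = path[len("/features/"):]
        query = parse_qs(environ.get("QUERY_STRING", ""))
        try:
            start, end = int(query["start"][0]), int(query["end"][0])
        except (KeyError, ValueError):
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"start and end must be integers"]
        features = [
            {"uniqueID": uid, "name": name, "type": ftype,
             "start": fmin, "end": fmax, "strand": strand}
            for uid, name, ftype, fmin, fmax, strand
            in conn.execute(FEATURES_SQL, (refseq, end, start))
        ]
        body = json.dumps({"features": features}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


def serve(port: int = 8080) -> None:
    """Serve the demo data with the standard-library WSGI server (blocks forever)."""
    with make_server("", port, make_app(demo_connection())) as httpd:
        httpd.serve_forever()
```

Calling `serve()` starts the standard-library development server; a JBrowse 1 track using the REST feature store (`JBrowse/Store/SeqFeature/REST`) with its `baseUrl` pointed at this server should then be able to request features from it. A production daemon would instead connect to PostgreSQL (the usual Chado backend) and add the remaining endpoints.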


JBrowse "regions of interest" lists

Add functionality to JBrowse 2 to manage per-user lists of "regions of interest", storing the lists with the JavaScript localStorage API. Allow a user to "apply" a regions list to a view in JBrowse 2 so that the view shows only the user's regions, without the intervening space between them.

  • Skills: advanced JavaScript
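The "apply a regions list" behaviour is essentially a coordinate mapping: the regions are laid end to end, and every position in the view maps back to a (refseq, position) pair. A minimal Python sketch of that mapping follows; the actual feature would be JavaScript in JBrowse 2, with the list serialized (for example as JSON) into `localStorage`, which is not shown. Merging overlapping regions and sorting them by refseq name are simplifying assumptions made here, and `RegionsView`, `to_genome` and `to_view` are hypothetical names.

```python
from bisect import bisect_right
from typing import Iterable, List, Optional, Tuple

Region = Tuple[str, int, int]  # (refseq, start, end), interbase coordinates


class RegionsView:
    """Map between a 'collapsed' view of concatenated regions and genome coordinates."""

    def __init__(self, regions: Iterable[Region]):
        merged: List[Region] = []
        for ref, start, end in sorted(regions):
            if merged and merged[-1][0] == ref and start <= merged[-1][2]:
                # Overlapping or touching region on the same refseq: extend it.
                merged[-1] = (ref, merged[-1][1], max(merged[-1][2], end))
            else:
                merged.append((ref, start, end))
        self.regions = merged
        self.offsets: List[int] = []  # view coordinate at which each region begins
        total = 0
        for _, start, end in merged:
            self.offsets.append(total)
            total += end - start
        self.length = total

    def to_genome(self, view_pos: int) -> Tuple[str, int]:
        if not 0 <= view_pos < self.length:
            raise ValueError(f"view position {view_pos} outside 0..{self.length - 1}")
        i = bisect_right(self.offsets, view_pos) - 1  # last region starting at or before view_pos
        ref, start, _ = self.regions[i]
        return ref, start + (view_pos - self.offsets[i])

    def to_view(self, ref: str, pos: int) -> Optional[int]:
        """View coordinate of a genome position, or None if it lies outside every region."""
        for (r, start, end), offset in zip(self.regions, self.offsets):
            if r == ref and start <= pos < end:
                return offset + (pos - start)
        return None
```

Tracks would use `to_view` to place features and `to_genome` to translate clicks or rulers back to real coordinates; features falling between regions simply are not drawn.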


Drupal-based GMOD Tool Information Tool

  • Skills: PHP / Drupal, HTML, JavaScript
  • Mentors: Lacey-Anne Sanderson, Amelia Ireland

Description to be added soon.


GMOD Virtual Server Configurator

  • Skills: CGI-capable language of your choice (e.g., Perl, PHP, JavaScript), HTML, JavaScript
  • Mentors: Scott Cain
  • Idea: Amelia Ireland, Scott Cain

GMOD virtual servers are preconfigured sets of GMOD components that give users a quick, easy way to set up a bioinformatics resource for their data. Each tool has configuration options that are currently set by editing plain-text files. The goal is to create a user-friendly configuration client that lets users customise components without having to open a text editor.
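One possible core for such a client, sketched in Python under the assumption that a component's settings are INI-style `[section]` / `key = value` text, is to read the file into a section/option dictionary (from which form fields can be generated) and write the user's edits back. Real GMOD configuration formats have their own syntax details, and Python's `configparser` drops comments when it writes a file, so a production tool would need a format-preserving approach. `load_settings`, `apply_edits` and the sample section are hypothetical.

```python
import configparser
import io
from typing import Dict


def _parser() -> configparser.ConfigParser:
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    return parser


def load_settings(text: str) -> Dict[str, Dict[str, str]]:
    """Turn an INI-style config into {section: {option: value}} for building form fields."""
    parser = _parser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def apply_edits(text: str, edits: Dict[str, Dict[str, str]]) -> str:
    """Apply form submissions to the config, refusing unknown sections or options."""
    parser = _parser()
    parser.read_string(text)
    for section, values in edits.items():
        for option, value in values.items():
            if not parser.has_option(section, option):
                raise KeyError(f"unknown setting {section}.{option}")
            parser.set(section, option, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Rejecting unknown options keeps a web form from silently writing settings the component does not understand.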


Galaxy CloudMan

  • Languages and skills: Python, JavaScript, Backbone, Mako, Bash/Linux, AWS
  • Idea: Enis Afgan (afgane AT gmail DOT com)
  • Mentor(s): Enis Afgan (afgane AT gmail DOT com), Dannon Baker (dannon DOT baker AT gmail DOT com)

Galaxy CloudMan (http://usecloudman.org) is a cloud manager that orchestrates all the steps required to provision and manage a set of cloud resources that deliver a functional compute cluster in the cloud. A deployed instance of CloudMan comes preconfigured with the Galaxy application, dozens of bioinformatics tools and gigabytes of genome reference data. The application is used around the world to launch hundreds of clusters per month. The following are suggested student improvements that would help the project grow further (each would be a separate project):

  • A new web interface, exposing key application functionality and focusing on scalability and accessibility
  • An automated process for deploying/replicating Galaxy on the Cloud across all AWS regions
  • Advanced cluster autoscaling (responsive, based on individual cluster’s workload, taking advantage of different cloud instance types)
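For the autoscaling idea, a workload-driven policy could start from two small decisions: how many workers the current job load calls for, and which instance type is cheapest for the jobs waiting. The Python sketch below shows one simple version; it is not CloudMan's existing logic, and the function names, the jobs-per-worker model and any instance type names and prices are hypothetical. A real policy would also need cooldown periods so the cluster does not thrash between sizes.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    cores: int
    hourly_cost: float


def desired_workers(queued_jobs: int, running_jobs: int, slots_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Workers needed to give every queued or running job a slot, within cluster limits."""
    demand = queued_jobs + running_jobs
    needed = math.ceil(demand / slots_per_worker)
    return max(min_workers, min(max_workers, needed))


def cheapest_fit(types: Sequence[InstanceType], cores_needed: int) -> Optional[InstanceType]:
    """Cheapest instance type with enough cores for the largest pending job, if any."""
    candidates = [t for t in types if t.cores >= cores_needed]
    return min(candidates, key=lambda t: t.hourly_cost) if candidates else None
```

Comparing `desired_workers(...)` with the current worker count gives the number of nodes to add or remove on each polling cycle.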

Galaxy Charts and Open Requests

  • Languages and skills: Python, JavaScript, Bash/Linux
  • Idea: Aysam Guerler (aysam.guerler AT gmail.com)
  • Mentor(s): Sam Guerler (aysam DOT guerler AT mail.com)

Ideas:

  • Improving Galaxy Charts, e.g., by adding new visualizations or options to customize visualizations. This is a well-contained project: the student is unlikely to break existing code and does not have to understand Galaxy's inner layers, but would still be able to make a major contribution.


dictyBase: Integration of HTML5 based live content editor

  • Languages and skills: HTML5, JavaScript (AngularJS) and CSS (Bootstrap/Pure framework) markup
  • Idea: Siddhartha Basu (siddhartha DASH basu AT northwestern DOT edu)
  • Mentor(s): Siddhartha Basu (siddhartha DASH basu AT northwestern DOT edu), Petra Fey (pfey AT northwestern DOT edu)

Idea

dictyBase has many static HTML pages (for example, the front page) that are handcrafted and maintained by manual editing on the server side. The pages are content-heavy, but because they are edited by hand it is very difficult to add new content, integrate third-party widgets (such as a Twitter feed) or edit collaboratively. The proposal is to integrate one of the Raptor, Mercury or Bootstrap X-editable client-side HTML5 editors so that content can be edited directly in the browser. Content will be sent to and retrieved from a RESTful backend. The project is expected to be split into the following parts:

  • Generate Bootstrap-based (optionally Pure-based) markup for the core page structure, including headers/footers and the parts of pages that are not editable.
  • Identify the content blocks and integrate one of the editors (student's choice) to make them editable.
  • Use the AngularJS framework (Restangular preferred) to save the edited content to a RESTful backend. The RESTful backend (written in Go), along with its HTTP resource specification, will be made available to the student as a deployable binary.
  • Integrate image inclusion; AngularJS-based options such as angular file upload could be explored.
  • Make the editor available only to authorized users by integrating the frontend with our RESTful authentication backend.
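The save step boils down to one authenticated HTTP request per edited content block. Since the backend's HTTP resource specification is to be supplied by the dictyBase team, the URL layout, JSON body and bearer-token header below are placeholders, and the real client would be AngularJS/Restangular in the browser; this Python sketch only shows the shape of the exchange using the standard library's `urllib.request.Request`. `build_save_request` is a hypothetical name.

```python
import json
import urllib.request


def build_save_request(base_url: str, page: str, block_id: str, html: str,
                       token: str) -> urllib.request.Request:
    """PUT one edited content block as JSON; URL layout and auth scheme are hypothetical."""
    url = f"{base_url.rstrip('/')}/pages/{page}/blocks/{block_id}"
    body = json.dumps({"id": block_id, "content": html}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json",
                 # Placeholder for whatever the authentication backend issues.
                 "Authorization": f"Bearer {token}"},
    )
```

Sending it would be `urllib.request.urlopen(request)`; building the request separately keeps the sketch testable without a server.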


WormBase: data visualization

WormBase (www.wormbase.org) is a central data repository supporting the nematode research community.

  • Languages and skills: JavaScript, HTML5, a JS graphics library of your choice (e.g., D3), some Perl
  • Mentor(s): Abigail Cabunoc <abigail.cabunoc@oicr.on.ca>

There are several areas where data visualization on the WormBase website could be improved. Here are a couple of requests we've received from the community, but we are open to other ideas: