GSoC

From GMOD
Revision as of 18:25, 13 February 2014 by Robin.haw (Talk | contribs)


Welcome to the Genome Informatics Google Summer of Code

From the Google Summer of Code website:

Google Summer of Code is a global program that offers student developers stipends to write code for various open source software projects. We work with many open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 7,500 successful student participants from 97 countries and over 7,000 mentors from over 100 countries worldwide to produce over 50 million lines of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all.

Google Summer of Code has several goals:

  1. Create and release open source code for the benefit of all
  2. Inspire young developers to begin participating in open source development
  3. Help open source projects identify and bring in new developers and committers
  4. Provide students the opportunity to do work related to their academic pursuits (think "flip bits, not burgers")
  5. Give students more exposure to real-world software development scenarios (e.g., distributed development, software licensing questions, mailing-list etiquette)


Genome Informatics GSoC

For the past few years, a group of related bioinformatics projects has participated in Google Summer of Code under the umbrella of Genome Informatics. This includes GMOD and its software projects (GBrowse, JBrowse, etc.); Galaxy; PortEco; Reactome; SeqWare; WormBase; and others.

How GSoC Works

From the GSoC FAQ:

  1. Open source projects who'd like to participate in Google Summer of Code in 2014 should choose at least two organization administrators to represent them.
  2. Organization administrators will submit the mentoring organization’s proposal for participation online.
  3. Google will notify the organization administrators of acceptance, and an account for the accepted organizations will be created on the Google Summer of Code 2014 site.
  4. Students submit project proposals online to work with particular mentoring organizations.
  5. Mentoring organizations rank student proposals and perform any other due diligence on their potential students; student proposals are matched with a mentor.
  6. Google allocates a particular number of student slots to each organization.
  7. Mentoring organizations make their final decision on which students to accept into the program.
  8. Students are notified of acceptance.
  9. Students begin learning more about their mentoring organization and its community before coding work starts.
  10. Students begin coding work at the official start of the program, provided they've interacted well with their community up until the program start date.
  11. Mentors and students provide mid-term progress evaluations.
  12. Mentors provide a final evaluation of student progress at close of program; students submit a final review of their mentor and the program.
  13. Students upload completed code to Google Summer of Code site.

The organization administrators for the Genome Informatics group are Robin Haw of Reactome and Amelia Ireland of GMOD.


Contact Us

Students: How to apply

We would like to know who you are and how you think. Incorporate the following into your application:

  • Your information
    • Name, email, and website (optional)
  • Brief background: education and relevant work experience
  • Your programming interests and strengths
    • What are your languages of choice?
    • Any prior experience with open source development?
  • Your interest and background in biology or bioinformatics
    • Any prior exposure to biology or bioinformatics?
  • Your ideas for a project (an original idea or one expanded from our Ideas Page)
    • Provide as much detail as possible
    • Strong applicants include an implementation plan and timeline (hint!)
    • Refer to and link to other projects or products that illustrate your ideas
    • Identify possible hurdles and questions that will require more research/planning
  • What can you bring to the team?


Resources

Guides

For Students

For Mentors


Project Ideas

These projects include a broad set of skills, technologies and domains, such as GUIs, database integration and algorithms.

Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see come from students who go this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.

If you have any difficulty using the wiki, please email your project proposal to help@gmod.org and we will add it for you.

See the list of project proposals from 2013 for ideas from last year's GSoC.


Advice from Google on suitable project ideas

The following information comes from the GSoC manual on what makes a good GSoC project:


There are many ways to define a good GSoC project—probably as many ways as there are student-mentor pairings. Here are just a few:

Low-hanging fruit: These projects require minimal familiarity with the codebase and basic technical knowledge. They are relatively short, with clear goals.

Risky/Exploratory: These projects push the scope boundaries of your development effort. They might require expertise in an area not covered by your current development team. They might take advantage of a new technology. There is a reasonable chance that the project might be less successful, but the potential rewards make it worth the attempt.

Fun/Peripheral: These projects might not be related to the current core development focus, but create new innovations and new perspective for your project.

Core development: These projects derive from the ongoing work from the core of your development team. The list of features and bugs is never-ending, and help is always welcome.

Infrastructure/Automation: These projects are the code that your organization uses to get its development work done; for example, projects that improve the automation of releases, regression tests and automated builds. This is a category in which a GSoC student can be really helpful, doing work that the development team has been putting off while they focus on core development.


source: GSoC manual


Based on the Genome Informatics GSoC experience in 2013, prospective students are interested in "new" technologies and languages, such as iOS and Android apps, and fancy, flashy, web-based projects.


Project idea format

Example of Idea

Brief description of the idea, including any relevant links, etc.

  • Languages and skills: programming language(s) to be used, plus any other particular computer science skills needed
  • Idea: name + contact details of the person(s) who thought up the idea
  • Mentor(s): name + contact details of the proposed mentor(s)

2014 Project Ideas

Visualising Large Diagrams

Reactome is a free, open-source, curated and peer-reviewed database of biomolecular pathways with about 12,000 distinct visitors per month. The Reactome Pathway Diagram viewer was initially developed as a GSoC project and has become part of the Reactome Pathway Browser (http://www.reactome.org/PathwayBrowser). The widget works well for diagrams of the current size, but larger diagrams will need to be included in the future, so the current implementation must be improved using a different approach.

Languages and skills: Java, GWT, HTML5 Canvas, Data visualisation
Idea: Henning Hermjakob <hhe@ebi.ac.uk>, Antonio Fabregat <fabregat@ebi.ac.uk>
Mentor(s): Antonio Fabregat Mundo <fabregat@ebi.ac.uk>
Description: The current pathway diagram widget works fine for the pathways in Reactome, but diagrams with a large number of entities, for example large biomolecular disease maps, slow the widget down unacceptably. A different approach is needed to draw larger pathways in the canvas. Techniques used in game development can help here: for example, a quadtree would reduce the number of objects drawn in each canvas iteration (depending on the zoom level and the targeted frame) and would also speed up hover detection as the user moves the mouse over the diagram. Another useful improvement would be a multi-layer approach that uses several canvases to represent different layers of information. In that case, exporting the view as an image becomes a little more complicated, but it is a good use case to take into account at the end of the internship.
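
The quadtree idea in the description can be illustrated with a minimal sketch. This is not the Reactome widget's code: the class names (QuadTree, Rect, Item), the CAPACITY and MAX_DEPTH constants, and the choice to keep boundary-straddling items in the parent node are all assumptions made for illustration. It shows how a viewport query skips whole subtrees that fall outside the visible frame, and how the same query with a zero-size rectangle can serve as hover detection.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal quadtree sketch for culling diagram items to a viewport. */
class QuadTree {
    /** Axis-aligned rectangle: top-left corner (x, y) plus width and height. */
    static final class Rect {
        final double x, y, w, h;
        Rect(double x, double y, double w, double h) {
            this.x = x; this.y = y; this.w = w; this.h = h;
        }
        boolean intersects(Rect o) {
            return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
        }
        boolean contains(Rect o) {
            return o.x >= x && o.y >= y && o.x + o.w <= x + w && o.y + o.h <= y + h;
        }
    }

    /** Hypothetical diagram item: an identifier plus its bounding box. */
    static final class Item {
        final String id;
        final Rect bounds;
        Item(String id, Rect bounds) { this.id = id; this.bounds = bounds; }
    }

    private static final int CAPACITY = 4;  // items held before a node splits (assumed)
    private static final int MAX_DEPTH = 8; // stops unbounded subdivision (assumed)

    private final Rect area;
    private final int depth;
    private final List<Item> items = new ArrayList<>();
    private QuadTree[] children; // null until this node splits

    QuadTree(Rect area) { this(area, 0); }

    private QuadTree(Rect area, int depth) { this.area = area; this.depth = depth; }

    /** Inserts an item; returns false if its bounds do not fit inside this node. */
    boolean insert(Item item) {
        if (!area.contains(item.bounds)) return false;
        if (children == null) {
            if (items.size() < CAPACITY || depth >= MAX_DEPTH) {
                items.add(item);
                return true;
            }
            split();
        }
        for (QuadTree child : children) {
            if (child.insert(item)) return true;
        }
        items.add(item); // straddles a child boundary, so it stays at this level
        return true;
    }

    /** Creates four equal quadrants and pushes existing items down where possible. */
    private void split() {
        double hw = area.w / 2, hh = area.h / 2;
        children = new QuadTree[] {
            new QuadTree(new Rect(area.x, area.y, hw, hh), depth + 1),
            new QuadTree(new Rect(area.x + hw, area.y, hw, hh), depth + 1),
            new QuadTree(new Rect(area.x, area.y + hh, hw, hh), depth + 1),
            new QuadTree(new Rect(area.x + hw, area.y + hh, hw, hh), depth + 1)
        };
        List<Item> old = new ArrayList<>(items);
        items.clear();
        for (Item it : old) insert(it);
    }

    /** Collects items intersecting the range, skipping subtrees outside it. */
    void query(Rect range, List<Item> out) {
        if (!area.intersects(range)) return;
        for (Item it : items) {
            if (it.bounds.intersects(range)) out.add(it);
        }
        if (children != null) {
            for (QuadTree child : children) child.query(range, out);
        }
    }

    /** Hover detection: items whose bounds strictly contain the point. */
    List<Item> hitTest(double px, double py) {
        List<Item> hits = new ArrayList<>();
        query(new Rect(px, py, 0, 0), hits);
        return hits;
    }
}
```

In a drawing loop, the viewport rectangle for the current zoom level and pan offset would be passed to query, and only the returned items would be painted; a mouse-move handler would call hitTest with the cursor position translated into diagram coordinates.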

SeqWare

Languages and skills: Java, Bash/Linux, AWS, Google Cloud, Ansible, Vagrant, HBase/NOSQL, MapReduce+associated Hadoop technologies
Mentor(s): Brian O'Connor <boconnor@oicr.on.ca>, Denis Yuen <denis.yuen@oicr.on.ca>

There are quite a few projects that I would like to see happen for SeqWare, and it would be great to get a student to help with these:

  • add hybrid workflow support to SeqWare Pipeline so users can write workflows that include support for Hadoop tools (Pig, Hive, M/R, etc) and traditional command line tools
  • push forward the design of our multi-cloud cluster provisioning technology stack based on Vagrant. This includes incorporating provisioning technologies like Ansible.
  • leverage Elastic Map Reduce on Amazon's AWS as an environment to run SeqWare
  • leverage the Google cloud by adding support for spinning up SeqWare clusters in this environment and for interacting with their bucket store
  • work with the Galaxy tool and finish the compatibility layer that allows SeqWare workflows to run/interact with Galaxy
  • write an AngularJS-based web application on top of our HBase variant/read NOSQL database, and write proof-of-concept analytical plugins that use machine learning and other advanced techniques to analyze data stored in this scalable backend