Difference between revisions of "GSoC"

From GMOD
m (Adding Chado owltools loader and AmiGO2 proposals)
m (Google Summer of Code 2019 @ Open Genome Informatics)
(124 intermediate revisions by 11 users not shown)
[[Image:soc-logo-google-blue.jpg|right|480px|link=http://www.google-melange.com/gsoc/homepage/google/gsoc2013]]
+
[[File:GoogleSummer_2016logo.jpg|373px|right|link=GSoC]]
  
== Welcome to the Genome Informatics Google Summer of Code ==
+
== Google Summer of Code 2019 @ Open Genome Informatics ==
''“Google Summer of Code is a global program that offers post-secondary student developers ages 18 and older stipends to write code for various open source software projects. We have worked with open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 7000 successful student participants and over 3000 mentors from over 180 countries worldwide, all for the love of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all.”''<br>
+
  
GSoC has several goals:
+
'''[https://summerofcode.withgoogle.com/ Google Summer of Code]''' is a global program that offers student developers stipends to write code for various open source software projects. We work with many open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 14,000 successful student participants from 118 countries, 651 open source organizations, and over 35 million lines of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all. (''Excerpt from the [https://summerofcode.withgoogle.com/ Google Summer of Code website]'')
  
*get more open source code created and released for the benefit of all
+
Since 2011, the Open Genome Informatics group has served as an "umbrella organization" to a variety of bioinformatics projects, including [[Main Page|GMOD]] and its software projects -- [[JBrowse]], [[Apollo]], [[Chado]], [[Galaxy]] etc.; [http://www.informatics.jax.org/ Mouse Genome Informatics]; [https://oicr.on.ca/research-portfolio/ OICR]; [http://www.reactome.org Reactome]; [http://www.wormbase.org WormBase]; and [https://bioconda.github.io/ Bioconda].
*inspire young developers to begin participating in open source development
+
*help open source projects identify and bring in new developers and committers
+
*provide students the opportunity to do work related to their academic pursuits during the summer
+
*give students more exposure to real-world software development scenarios.
+
  
[http://www.google-melange.com/gsoc/homepage/google/gsoc2013 Google Summer of Code (GSoC)]
+
'''More information about this year's participating bioinformatics groups can be found [[GSOC_Groups | here]].'''
  
== Member Projects ==
+
To learn more about this year's event and how GSoC works, please refer to the [https://developers.google.com/open-source/gsoc/faq FAQ].
The Genome Informatics group is organizing the joint efforts of Galaxy, GBrowse, GMOD, JBrowse, PortEco, Reactome, SeqWare, and WormBase (see below). This is a great opportunity for students to contribute to the work of any of eight established bioinformatics projects.<br>
+
  
;'''[http://galaxy.psu.edu Galaxy]''': An open, web-based platform for accessible, reproducible, and transparent computational biomedical research. The public Galaxy service makes analysis tools, genomic data, tutorial demonstrations, persistent workspaces, and publication services available to any scientist that has access to the Internet. Local Galaxy servers can be set up by downloading the Galaxy application and customizing it to meet particular needs. Galaxy is implemented in Python. Links: [http://galaxy.psu.edu/ Website].
+
==Mailing lists, IRC, and other ways to get in touch  ==
  
;'''[http://www.gmod.org Generic Model Organism Database (GMOD)]''' : An open source project to develop a complete set of software for creating and administering a model organism database. Components of this project include genome visualization and editing tools, literature curation tools, a robust database schema, biological ontology tools, and a set of standard operating procedures. This project is a collaboration of several database projects, including WormBase, FlyBase, Mouse Genome Informatics, Gramene, the Rat Genome Database, TAIR, EcoCyc, and the Saccharomyces Genome Database. Links: [http://www.gmod.org Website], [http://blog.gmod.org GMOD Blog]
+
*Email: [mailto:robin.haw@oicr.on.ca robin.haw@oicr.on.ca] '''and''' [mailto:help@gmod.org help@gmod.org] -- find out more about GSoC, a specific project, or your potential mentor(s).
 
+
;'''[http://gmod.org/wiki/GBrowse GBrowse]''': The Generic Genome Browser (GBrowse) is a web application for searching and displaying annotations on genomes. GBrowse was designed from the bottom up for portability, extensibility, and modularity. It relies on no proprietary software, but only readily available open source software such as MySQL and the BioPerl libraries. GBrowse is implemented in Perl. Link: [http://gmod.org/wiki/GBrowse Website].
+
 
+
;'''[http://jbrowse.org JBrowse]''': JBrowse is being developed as the successor to GBrowse.  It is a modern, fast genome browser implemented almost entirely in JavaScript, with some server-side formatting code in Perl.  Link: [http://jbrowse.org Website].
+
 
+
;'''[http://porteco.org PortEco]''': The PortEco project unifies web access to information and tools about the biology of ''E. coli'', its bacteriophages, plasmids, and mobile genetic elements. PortEco partners include [http://ecocyc.org EcoCyc], [http://ecoliwiki.org EcoliWiki], the [http://expression.porteco.org Stanford Microarray Database], and the [http://pantherdb.org PANTHER] protein families database.  PortEco is responsible for maintaining the [http://geneontology.org Gene Ontology] annotation of ''E. coli'' genes.
+
 
+
;'''[http://www.reactome.org Reactome]''' : A manually curated database of core pathways and reactions in human biology that functions as a data mining resource and electronic textbook. The Reactome data model describes diverse processes in the human system, including the pathways of intermediary metabolism, regulatory pathways, signal transduction, and high-level processes, such as the cell cycle. Reactome software uses only freely available (and often open source) components and has been created with cross-platform compatibility and wide usability in mind. Data is stored in a MySQL database, the website is implemented in Perl, and the data entry tool is written in Java. The Reactome team is composed of individuals who are both biologists and programmers at the Ontario Institute for Cancer Research, New York University Langone Medical Center, Cold Spring Harbor Laboratory, and The European Bioinformatics Institute. Links: [http://www.reactome.org Website], [http://wiki.reactome.org ReactomeWiki].
+
 
+
;'''[http://seqware.github.com SeqWare]''': SeqWare is a multi-faceted project that includes a developer-friendly workflow development and execution engine (SeqWare Pipeline) along with a NoSQL variant database (SeqWare Query Engine).  The system is used by OICR to automate the analysis of a large percentage of the NGS samples processed by the institute.  It's our intention to share these workflows with the community, and we're using the SeqWare workflow format and reference VM on Amazon to do this.  For the GSoC we're interested in integrating the project with Galaxy to use that terrific application as a frontend.  We are also interested in exposing the Query Engine HBase variant database through a nice REST API and web app for interacting with this highly scalable variant storage and analysis system.  The SeqWare team is based at OICR and includes developers at UNC and other locations.  Links: [http://seqware.github.com Website].
+
 
+
;'''[http://www.wormbase.org WormBase]''' : An online bioinformatics database of the biology and genome of the model organism Caenorhabditis elegans and related nematodes. It is used by the C. elegans research community both as an information resource and as a mode to publish and distribute their results. The database is constantly updated and new versions are released on a monthly basis. WormBase is a collaboration among the Wellcome Trust Sanger Institute, Ontario Institute for Cancer Research, Washington University in St. Louis, and the California Institute of Technology. Links: [http://www.wormbase.org Website].
+
 
+
== Contact Us ==
+
*Email: robin.haw[AT]oicr.on.ca - contact me to find out more about a project or your potential mentor(s).
+
 
*Discussion mailing lists: [http://groups.google.com/group/genome-informatics Genome Informatics Google Groups] - ask about our projects; join the community!
 
 
*IRC channel: #genomeinformatics on Freenode.
 
 +
* Students and Mentors can email both [[User:Robin.haw|Robin]] and [[User:Scott|Scott]] to get more information about the program.
  
== How to apply ==
+
== [[GSOC_Project_Ideas_2019 | Project Ideas]] ==
 
+
We would like to know who you are and how you think. Incorporate the following into your application:
+
 
+
*Your information
+
**Name, email, and website (optional)
+
*Brief background: education and relevant work experience
+
*Your programming interests and strengths
+
**What are your languages of choice?
+
**Any prior experience with open source development?
+
**Your interest and background in biology or bioinformatics
+
**Any prior exposure to biology or bioinformatics?
+
*Your ideas for a project (an original idea or one expanded from our Ideas Page)
+
**Provide as much detail as possible
+
**Strong applicants include an implementation plan and timeline (hint!)
+
**Refer to and link to other projects or products that illustrate your ideas
+
**Identify possible hurdles and questions that will require more research/planning
+
*What can you bring to the team?
+
 
+
== Resources ==
+
*[http://www.google-melange.com/gsoc/homepage/google/gsoc2013 GSoC Main Site]
+
*[http://www.google-melange.com/gsoc/events/google/gsoc2013 Events and Timeline]
+
*[http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2013/about_page About GSoC]
+
*[http://www.google-melange.com/gsoc/document/show/gsoc_program/google/gsoc2013/help_page FAQ]
+
*[http://google-opensource.blogspot.com/ GSoC Blog]
+
 
+
===Guides===
+
*[http://en.flossmanuals.net/melange/ GSoC User Guide]
+
*[http://en.flossmanuals.net/GSoCStudentGuide/ GSoC Student Guide]
+
*[http://en.flossmanuals.net/GSoCMentoring/ GSoC Mentoring Guide]
+
 
+
=== For Students ===
+
*[http://en.flossmanuals.net/GSoCStudentGuide/ GSoC Student Guide]
+
*[http://groups.google.com/group/google-summer-of-code-students-list GSoC Students-Only List]
+
 
+
=== For Mentors ===
+
*[http://en.flossmanuals.net/GSoCMentoring/ GSoC Mentoring Guide]
+
*[http://groups.google.com/group/google-summer-of-code-mentors-list GSoC Mentors-Only List]
+
 
+
== Project Ideas ==
+
 
+
These projects include a broad set of skills, technologies and domains, such as GUIs, database integration and algorithms. You are also encouraged to propose your own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, then you should definitely apply!
+
 
+
=== Reactome Smartphone Application ===
+
 
+
Reactome has a new [http://reactomews.oicr.on.ca:8080/ReactomeRESTfulAPI/ReactomeRESTFulAPI.html RESTful interface], which has the ability to expose pathway data in Reactome as XML and JSON. We would like to develop a smartphone application for Reactome so that it runs on a variety of platforms (iOS and Android in the first instance). The application will consume the data available via the [http://reactomews.oicr.on.ca:8080/ReactomeRESTfulAPI/ReactomeRESTFulAPI.html RESTful interface] to render its views and perform its functions.
+
 
+
* Language and Skills: HTML/CSS/JavaScript, and familiarity with AJAX/JSON and popular JavaScript libraries (e.g. jQuery)
+
* Idea by: Guanming Wu
+
* Potential Mentors: Guanming Wu
+
 
+
 
+
=== Reactome Big Picture Visualization ===
+
 
+
In this project, the successful applicant will apply modern technologies for visualizing large data sets to Reactome's collection of pathways.  Good visualizations already exist for single pathways and are being deployed on the [http://www.reactome.org Reactome website], but we lack a view that will succinctly summarize the entirety of our pathways in an informative and intuitive way.  Two new visualization technologies will be explored as part of this project: a [http://en.wikipedia.org/wiki/Tag_cloud word cloud] and a [http://en.wikipedia.org/wiki/Voronoi_diagram Voronoi map].  These will be embedded into Reactome's existing web infrastructure, which is based on the [https://developers.google.com/web-toolkit/ Google Web Toolkit].
+
 
+
* Language and Skills: Java/GWT/CSS
+
* Idea by: Lincoln Stein, Henning Hermjakob
+
* Potential Mentors: David Croft
+
 
+
 
+
=== Reactome Search ===
+
 
+
The current [http://www.reactome.org Reactome website] provides a search facility, but this is based on Perl CGI and is not easily extensible to new requirements.  We are seeking a student to implement a new search facility in Java, using a [http://lucene.apache.org/ Lucene-based] engine.  The results of the search should be meaningfully ranked, which means that domain information would need to be incorporated into intelligent ranking algorithms.  Optionally, the candidate may also wish to implement a web client for displaying the output of the search process.
+
 
+
* Language and Skills: Java/Lucene
+
* Idea by: Robin Haw
+
* Potential Mentors: David Croft
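
One way to read "domain information incorporated into ranking" is to boost a plain text-match score according to the kind of record that matched. The sketch below illustrates that idea in plain Python rather than Lucene; the record types, the boost weights, and the <code>rank</code> function are all hypothetical and are not taken from the Reactome code base.

```python
# Hypothetical boosts: a hit on a pathway name counts for more than a hit on a protein.
TYPE_BOOST = {"pathway": 2.0, "reaction": 1.5, "protein": 1.0}


def rank(query, records):
    """Return record names ordered by (term matches x type boost), best first.

    Each record is a dict with a "name" and a "type" key (an assumed shape).
    """
    terms = query.lower().split()
    scored = []
    for record in records:
        words = record["name"].lower().split()
        matches = sum(words.count(term) for term in terms)
        if matches:
            # Unknown record types get a neutral boost of 1.0.
            score = matches * TYPE_BOOST.get(record["type"], 1.0)
            scored.append((score, record["name"]))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


records = [
    {"name": "Apoptosis regulator BAX", "type": "protein"},
    {"name": "Apoptosis", "type": "pathway"},
    {"name": "Cell cycle", "type": "pathway"},
]
print(rank("apoptosis", records))
```

In a Lucene-based engine the same effect would more likely be achieved with query-time or index-time boosts, but the principle of combining a text score with a domain weight is the same.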
+
 
+
 
+
=== JBrowse trackhub ===
+
 
+
[[JBrowse]] is a fast, modern genome browser written primarily in JavaScript.  We would like to see a server-side application (the trackhub) implemented that would accept uploads of track data and serve it up to requesting JBrowse instances, as well as allowing searches of the track meta data.  The JBrowse trackhub instances would federate with each other and register with a central server at jbrowse.org.
+
 
+
* Language and Skills: Groovy or Scala
+
* Idea by: [[User:RobertBuels|Rob Buels]]
+
* Potential mentors: [[User:RobertBuels|Rob Buels]]
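
To make the three core trackhub operations concrete (upload, serve, and metadata search), here is a minimal in-memory sketch in Python. It is an illustration only: the <code>TrackHub</code> class, its method names, and the metadata keys are hypothetical, and a real implementation would add persistent storage, an HTTP layer, and the federation and registration steps described above.

```python
class TrackHub:
    """Toy in-memory registry of tracks and their metadata (hypothetical API)."""

    def __init__(self):
        self._tracks = {}

    def upload(self, track_id, data, metadata):
        """Store the raw track data together with a copy of its metadata."""
        self._tracks[track_id] = {"data": data, "metadata": dict(metadata)}

    def fetch(self, track_id):
        """Return the stored data for a track, as a JBrowse instance would request it."""
        return self._tracks[track_id]["data"]

    def search(self, **criteria):
        """Return the sorted ids of tracks whose metadata matches every criterion exactly."""
        return sorted(
            track_id
            for track_id, track in self._tracks.items()
            if all(track["metadata"].get(key) == value for key, value in criteria.items())
        )


hub = TrackHub()
hub.upload("genes", b"gff3 bytes", {"organism": "C. elegans", "format": "gff3"})
hub.upload("reads", b"bam bytes", {"organism": "C. elegans", "format": "bam"})
print(hub.search(organism="C. elegans"))
print(hub.search(format="bam"))
```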
+
 
+
 
+
=== Galaxy ===
+
 
+
A couple of rough ideas:
+
 
+
[http://wiki.galaxyproject.org/EnisAfgan Enis Afgan] of the [[Galaxy]] Project would be the mentor for the following ideas.  However, Enis is incommunicado through mid-April.
+
 
+
* Work on fully integrating HTCondor &rarr; [http://wiki.galaxyproject.org/CloudMan CloudMan] &rarr; [[Galaxy]] and demonstrating how to run HTCondor jobs from Galaxy across multiple CloudMan clusters.
+
* Implement a DRMAA interface on top of CloudMan so it can provision clusters/nodes that act as isolated job runners (a much less well-defined problem). This is all in support of highly flexible and federated job execution.
+
 
+
Other ideas, with some member of the [http://wiki.galaxyproject.org/GalaxyTeam Galaxy Team] as mentor:
+
 
+
* Interactive visualizations for metagenomics (could build on the work done for GSoC last year)
+
* [http://irods.org/ iRODS] integration with Galaxy's ObjectStore abstract storage interface.
+
* Annotation pipelines of any sort
+
* a Galaxy shell (example commands: login as a user, cd to a history, get/put datasets, run tools/workflows, etc.) implemented in JavaScript (then can run locally via node or use in a Web browser);
+
* an integrated genome browser using Circster (the circular visualization platform embedded in Galaxy) and 1 or more Trackster (the track browser in Galaxy) views simultaneously;
+
* fusion gene/chromosomal rearrangement visualization using both Trackster and Circster;
+
* eQTL pipeline + visualization.
+
* Directory showing which tools are available on which [http://bit.ly/gxypublic Galaxy Public Servers].  Make it easy to search and discover this information.  See the [https://trello.com/c/FdF5h17c feature request].
+
 
+
 
+
=== GMOD wiki makeover ===
+
 
+
The main GMOD website is a wiki, which is good for collaborative editing, but it looks boring and lacks any "brand identity." We would like someone with a good eye for clean, stylish design to develop a new skin for the GMOD wiki. For ideas of what you can do with MediaWiki skinning, see some of the following sites: [http://fr.wikimini.org/wiki/Accueil Wiki Mini]; [http://ltrmenuplus.bilardi.net/ltrmenuplus/index.php/Main_Page ltrMenuPlus]; [http://strategywiki.org/wiki StrategyWiki]; [http://wiki.blender.org/ Blender Wiki]; [http://säsongsmat.nu/ Säsongsmat].
+
 
+
* Language and skills: HTML, CSS, PHP, possibly some JavaScript; graphic design
+
* Idea by: [[User:Girlwithglasses|Amelia Ireland]]
+
* Potential mentor: [[User:Girlwithglasses|Amelia Ireland]]
+
 
+
 
+
=== WormBase iOS App/Mobile web site ===
+
 
+
The new version of WormBase has a RESTful API. Our current site is built on this API (the HTML for each widget is fetched via an AJAX request). We'd like to develop either a mobile site or a native iOS app on top of this API.
+
* Language and Skills: familiarity with AJAX/JSON, and either: HTML/CSS/JS (mobile site) or iOS development (iOS app)
+
* Idea by: Abigail Cabunoc
+
* Potential Mentors: Abigail Cabunoc, Todd Harris
+
 
+
 
+
=== SeqWare ===
+
 
+
There are quite a few projects that I would like to see happen for SeqWare and it would be great to get a student to help on these:
+
 
+
* finish our Oozie workflow engine that allows jobs to be scheduled on a M/R cluster
+
* look at leveraging StarCluster/Cloudman/other tech to build SeqWare-capable clusters on the cloud (right now we just have single-node cluster launching which is only really helpful for human exomes)
+
* work with the Galaxy tool and finish the compatibility layer that allows SeqWare workflows to run/interact with Galaxy
+
* write a RESTful web API on our HBase variant database, write a proof of concept variant exploration and sharing web app using the REST API
+
 
+
 
+
=== AmiGO 2 Browser Integration ===
+
 
+
[http://amigo2.geneontology.org AmiGO 2] is the new [http://geneontology.org Gene Ontology] web interface to search, display, and download functional annotation data. Potential projects:
+
 
+
*Write an adaptor to run AmiGO 2 off a Chado database
+
*Embed AmiGO 2 into [[Tripal]] to provide users with an integrated toolset (see [http://banana-genome.cirad.fr/galaxy Galaxy], [http://banana-genome.cirad.fr/cgi-bin/gbrowse/musa_acuminata/ GBrowse], [http://banana-genome.cirad.fr/cmap CMap], and the other tools on the Banana Genome Hub Tripal website for examples of GMOD components integrated into Tripal)
+
 
+
Possible mentors: Chris Mungall and/or Seth Carbon.
+
 
+
 
+
=== Chado owltools loader ===
+
 
+
Ontologies are a vital component of the data stored in a Chado database, and with more ontologies shifting over to [http://www.w3.org/TR/owl2-overview/ OWL] representations, a new Chado loader that can load OWL files would be a valuable contribution to the GMOD project. There is already a set of open-source tools for dealing with OWL files, [http://code.google.com/p/owltools/ owltools], and the programmer could create a Chado loader that uses owltools or a Chado module for owltools. Using owltools to populate certain tables in Chado (e.g. cvtermpath) would be much faster than the current process.
+
  
Some more info: [http://generic-model-organism-system-database.450254.n5.nabble.com/Loading-OWL-ontologies-into-Chado-td5043394.html Loading OWL ontologies into Chado]
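
Chado's <code>cvtermpath</code> table stores the transitive closure of the relationships between ontology terms, which is why populating it is the expensive step mentioned above. As a rough illustration of what a loader has to compute, the sketch below derives closure rows from direct relationships with a breadth-first search. It is a simplification under stated assumptions: terms are plain strings rather than <code>cvterm_id</code> values, only one relationship type is handled, only the shortest distance per pair is kept, and the <code>transitive_closure</code> function is hypothetical rather than part of owltools or Chado.

```python
from collections import deque


def transitive_closure(edges):
    """Compute (subject, object, pathdistance) rows from direct relationships.

    `edges` is an iterable of (subject, object) pairs, e.g. direct is_a links.
    Only the shortest distance between each pair of terms is recorded.
    """
    parents = {}
    for subject, obj in edges:
        parents.setdefault(subject, set()).add(obj)
        parents.setdefault(obj, set())

    closure = set()
    for start in parents:
        # Breadth-first search upward from `start`, tracking distance to each ancestor.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            term = queue.popleft()
            for parent in parents[term]:
                if parent not in distance:
                    distance[parent] = distance[term] + 1
                    queue.append(parent)
        for ancestor, dist in distance.items():
            if ancestor != start:
                closure.add((start, ancestor, dist))
    return closure


direct = [
    ("nucleolus", "nucleus"),
    ("nucleus", "organelle"),
    ("organelle", "cell part"),
]
for row in sorted(transitive_closure(direct)):
    print(row)
```

Three direct relationships in a chain expand to six closure rows here, which hints at how quickly the table grows for a large ontology.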
+
'''Got an idea for a GSOC project? [[GSOC_Project_Ideas_2019 |Add it here]].'''  Ideas will be included in the proposal we send to GSOC, and great ideas make for a great proposal, so please add yours now.
 +
 +
These projects can use a broad set of skills, technologies, and domains, such as GUIs, database integration, and algorithms. Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply! '''Do not hesitate to propose your own project idea: some of the best applications we see are by students that go this route.''' As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.
  
(via Kim Rutherford/Chris Mungall)
 
  
 +
== Preparing for GSoC 2019 ==
 +
The organization application period for GSoC is under way, and we won't know whether Open Genome Informatics has been accepted as a GSoC 2019 mentoring organization until [https://developers.google.com/open-source/gsoc/timeline February 6th]. Nevertheless, this is a perfect time for students to talk to mentors about project ideas. If you are interested in mentoring, please check the Mentors section below, and contact the organization admin.
  
=== (your idea here)  ===
+
===Students===
 +
More information about [[GSOC_Applications_Guide | writing your application]] will be available closer to the start of the student application period.
  
Please feel very free to propose your own idea. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.
+
===Mentors===
'''Do not hesitate to propose your own project idea: some of the best applications we see are by students that go this route.'''
+
We encourage mentors and mentoring organizations to think about new projects year round! If you'd like help with your ideas page or your separate mentoring org application, please feel free to contact the organization admins. Links to [[GSOC_Mentoring_Guide | advice about mentoring and other resources]] are available.
  
 
[[Category:Galaxy]]
 
 
[[Category:JBrowse]]
 
 +
[[Category:MGI]]
 
[[Category:WormBase]]
 
 
[[Category:GSoC]]
 
 +
[[Category:Reactome]]
 +
[[Category:WebApollo]]

Revision as of 17:09, 18 December 2018


Google Summer of Code 2019 @ Open Genome Informatics

Google Summer of Code is a global program that offers student developers stipends to write code for various open source software projects. We work with many open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 14,000 successful student participants from 118 countries, 651 open source organizations, and over 35 million lines of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all. (Excerpt from the Google Summer of Code website)

Since 2011, the Open Genome Informatics group has served as an "umbrella organization" to a variety of bioinformatics projects, including GMOD and its software projects -- JBrowse, Apollo, Chado, Galaxy etc.; Mouse Genome Informatics; OICR; Reactome; WormBase; and Bioconda.

More information about this year's participating bioinformatics groups can be found here.

To learn more about this year's event and how GSoC works, please refer to the FAQ.

Mailing lists, IRC, and other ways to get in touch

Project Ideas

Got an idea for a GSOC project? Add it here. Ideas will be included in the proposal we send to GSOC, and great ideas make for a great proposal, so please add yours now.

These projects can use a broad set of skills, technologies, and domains, such as GUIs, database integration, and algorithms. Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see are by students that go this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.


Preparing for GSoC 2019

The organization application period for GSoC is under way, and we won't know whether Open Genome Informatics has been accepted as a GSoC 2019 mentoring organization until February 6th. Nevertheless, this is a perfect time for students to talk to mentors about project ideas. If you are interested in mentoring, please check the Mentors section below, and contact the organization admin.

Students

More information about writing your application will be available closer to the start of the student application period.

Mentors

We encourage mentors and mentoring organizations to think about new projects year round! If you'd like help with your ideas page or your separate mentoring org application, please feel free to contact the organization admins. Links to advice about mentoring and other resources are available.