Difference between revisions of "GSoC"

From GMOD
(Improved search and navigation for ontology-based wikis)
m (Google Summer of Code 2019 @ Open Genome Informatics)
(159 intermediate revisions by 15 users not shown)
Line 1: Line 1:
[[Image:GSoC2012_300x200.png|right|frame|link=http://code.google.com/soc]]
+
[[File:GoogleSummer_2016logo.jpg|373px|right|link=GSoC]]
  
== Welcome to the Genome Informatics Google Summer of Code ==
+
== Google Summer of Code 2019 @ Open Genome Informatics ==
''“Google Summer of Code (GSoC) is a global program that offers student developers stipends to write code for various open source software projects. We have worked with several open source, free software, and technology-related groups to identify and fund several projects over a three month period. Since its inception in 2005, the program has brought together over 4,500 students and more than 4,000 mentors & co-mentors from over 85 countries worldwide, all for the love of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all.”''<br>
+
  
GSoC has several goals:
+
'''[https://summerofcode.withgoogle.com/ Google Summer of Code]''' is a global program that offers student developers stipends to write code for various open source software projects. We work with many open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 14,000 successful student participants from 118 countries, 651 open source organizations, and over 35 million lines of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all. (''Excerpt from the [https://summerofcode.withgoogle.com/ Google Summer of Code website]'')
  
*get more open source code created and released for the benefit of all
+
Since 2011, the Open Genome Informatics group has served as an "umbrella organization" to a variety of bioinformatics projects, including [[Main Page|GMOD]] and its software projects -- [[JBrowse]], [[Apollo]], [[Chado]], [[Galaxy]] etc.; [http://www.informatics.jax.org/ Mouse Genome Informatics]; [https://oicr.on.ca/research-portfolio/ OICR]; [http://www.reactome.org Reactome]; [http://www.wormbase.org WormBase]; and [https://bioconda.github.io/ Bioconda].
*inspire young developers to begin participating in open source development
+
*help open source projects identify and bring in new developers and committers
+
*provide students the opportunity to do work related to their academic pursuits during the summer
+
*give students more exposure to real-world software development scenarios.
+
  
[http://socghop.appspot.com Google Summer of Code (GSoC)]
+
'''More information about this year's participating bioinformatics groups can be found [[GSOC_Groups | here]].'''
  
== Member Projects ==
+
To learn more about this year's event and how GSoC works, please refer to the [https://developers.google.com/open-source/gsoc/faq FAQ].
The Genome Informatics group is organizing the joint efforts of Galaxy, GBrowse, GMOD, JBrowse, Reactome, and WormBase (see below). This is a great opportunity for students to contribute to the work of any of six established bioinformatics projects.<br>
+
  
;'''[http://galaxy.psu.edu Galaxy]''': An open, web-based platform for accessible, reproducible, and transparent computational biomedical research. The public Galaxy service makes analysis tools, genomic data, tutorial demonstrations, persistent workspaces, and publication services available to any scientist that has access to the Internet. Local Galaxy servers can be set up by downloading the Galaxy application and customizing it to meet particular needs. Galaxy is implemented in Python. Links: [http://galaxy.psu.edu/ Website].
+
==Mailing lists, IRC, and other ways to get in touch  ==
  
;'''[http://gmod.org/wiki/GBrowse GBrowse]''': The Generic Genome Browser (GBrowse) is a web application for searching and displaying annotations on genomes. GBrowse was designed from the bottom up for portability, extensibility, and modularity. It relies on no proprietary software, but only readily available open source software such as MySQL and the BioPerl libraries. GBrowse is implemented in Perl. Link: [http://gmod.org/wiki/GBrowse Website].
+
*Email: [mailto:robin.haw@oicr.on.ca robin.haw@oicr.on.ca] '''and''' [mailto:help@gmod.org help@gmod.org] -- find out more about GSoC, a specific project, or your potential mentor(s).
 
+
;'''[http://jbrowse.org JBrowse]''': JBrowse is being developed as the successor to GBrowse.  It is a modern, fast genome browser implemented almost entirely in JavaScript, with some server-side formatting code in Perl.  Link: [http://jbrowse.org Website].
+
 
+
;'''[http://www.gmod.org Generic Model Organism Database (GMOD)]''' : An open source project to develop a complete set of software for creating and administering a model organism database. Components of this project include genome visualization and editing tools, literature curation tools, a robust database schema, biological ontology tools, and a set of standard operating procedures. This project is a collaboration of several database projects, including WormBase, FlyBase, Mouse Genome Informatics, Gramene, the Rat Genome Database, TAIR, EcoCyc, and the Saccharomyces Genome Database. Links: [http://www.gmod.org Website], [http://blog.gmod.org GMOD Blog]
+
 
+
;'''[http://www.reactome.org Reactome]''' : A manually curated database of core pathways and reactions in human biology that functions as a data mining resource and electronic textbook. The Reactome data model describes diverse processes in the human system, including the pathways of intermediary metabolism, regulatory pathways, signal transduction, and high-level processes, such as the cell cycle. Reactome software uses only freely available (and often open source) components and has been created with cross-platform compatibility and wide usability in mind. Data is stored in a MySQL database, the website is implemented in Perl, and the data entry tool is written in Java. The Reactome team is composed of individuals who are both biologists and programmers at the Ontario Institute for Cancer Research, New York University Langone Medical Center, Cold Spring Harbor Laboratory, and The European Bioinformatics Institute. Links: [http://www.reactome.org Website], [http://wiki.reactome.org ReactomeWiki].
+
 
+
;'''[http://www.wormbase.org WormBase]''' : An online bioinformatics database of the biology and genome of the model organism Caenorhabditis elegans and related nematodes. It is used by the C. elegans research community both as an information resource and as a mode to publish and distribute their results. The database is constantly updated and new versions are released on a monthly basis. WormBase is a collaboration among the Wellcome Trust Sanger Institute, Ontario Institute for Cancer Research, Washington University in St. Louis, and the California Institute of Technology. Links: [http://www.wormbase.org Website].
+
 
+
;'''[http://porteco.org PortEco]''': The PortEco project unifies web access to information and tools about the biology of E. coli, its bacteriophages, plasmids, and mobile genetic elements. PortEco partners include [http://ecocyc.org EcoCyc], [http://ecoliwiki.org EcoliWiki], the [http://expression.porteco.org Stanford Microarray Database], and the [http://pantherdb.org PANTHER] protein families database.  PortEco is responsible for maintaining the [http://geneontology.org Gene Ontology] annotation of ''E. coli'' genes.
+
 
+
== Contact Us ==
+
*Email: robin.haw[AT]oicr.on.ca - contact me to find out more about a project or your potential mentor(s).
+
 
*Discussion mailing lists: [http://groups.google.com/group/genome-informatics Genome Informatics Google Groups] - ask about our projects; join the community!
 
*IRC channel: #genomeinformatics on Freenode.
 
 +
* Students and Mentors can email both [[User:Robin.haw|Robin]] and [[User:Scott|Scott]] to get more information about the program.
  
== How to apply ==
+
== [[GSOC_Project_Ideas_2019 | Project Ideas]] ==
 
+
We would like to know who you are and how you think. Incorporate the following into your application:
+
 
+
*Your information
+
**Name, email, and website (optional)
+
*Brief background: education and relevant work experience
+
*Your programming interests and strengths
+
**What are your languages of choice?
+
**Any prior experience with open source development?
+
**Your interest and background in biology or bioinformatics
+
**Any prior exposure to biology or bioinformatics?
+
*Your ideas for a project (an original idea or one expanded from our Ideas Page)
+
**Provide as much detail as possible
+
**Strong applicants include an implementation plan and timeline (hint!)
+
**Refer to and link to other projects or products that illustrate your ideas
+
**Identify possible hurdles and questions that will require more research/planning
+
*What can you bring to the team?
+
 
+
== Guidelines and Advice for Student Applicants ==
+
 
+
*[http://www.booki.cc/gsocstudentguide GSoC Student Guide]
+
*[http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/faqs#applying Application Guidelines]
+
*[http://code.google.com/p/google-summer-of-code/wiki/AdviceforStudents Advice for Student Applicants]
+
 
+
== Resources ==
+
*[http://socghop.appspot.com GSoC Main Site]
+
*[http://code.google.com/p/google-summer-of-code/wiki/GsocFlyers#2011_Flyers This year's flyer]
+
*[http://code.google.com/p/google-summer-of-code/wiki/ProgramPresentations GSoC Presentations]
+
*[http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2011/faqs Program FAQ]
+
*[http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2011/faqs#timeline Program Timeline]
+
*[http://groups.google.com/group/google-summer-of-code-discuss GSoC Discussion List]
+
*[http://groups.google.com/group/google-summer-of-code-announce Announcement List]
+
*[http://www.youtube.com/user/googOSPOstudntprgrms GSoC YouTube Channel]
+
*[http://google-opensource.blogspot.com/ GSoC Blog]
+
 
+
=== For Students ===
+
*[http://www.google-melange.com/document/show/gsoc_program/google/gsoc2012/faqs#student_apply How to apply]
+
*[http://en.flossmanuals.net/GSoCStudentGuide/ GSoC Student Guide]
+
*[http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs Frequently Asked Questions (FAQ)]
+
*[http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#payments When do I get paid?!]
+
*[http://code.google.com/p/google-summer-of-code/wiki/AdviceforStudents Advice for Student Applicants]
+
*[http://groups.google.com/group/google-summer-of-code-students-list GSoC Students-Only List]
+
 
+
=== For Mentors ===
+
*[http://socghop.appspot.com/ GSoC Main Site]
+
*[http://en.flossmanuals.net/gsocmentoring/ Mentoring: The Book]
+
*[http://code.google.com/p/google-summer-of-code/wiki/AdviceforMentors Advice for Mentors]
+
*[http://groups.google.com/group/google-summer-of-code-mentors-list GSoC Mentors-Only List]
+
 
+
== Project Ideas ==
+
 
+
These projects include a broad set of skills, technologies and domains, such as GUIs, database integration and algorithms. You are also encouraged to propose your own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, then you should definitely apply!
+
 
+
=== Reactome Pathway Summary Visualization ===
+
 
+
The classic [http://www.reactome.org Reactome] website provided a view called the "Sky", which gave a visual summary of all of the pathways in the [http://www.reactome.org Reactome] database.  Unfortunately, this overview was lost in the migration to the new, [http://code.google.com/webtoolkit/ GWT]-based website.
+
 
+
This project would produce a replacement for the old "Sky".  In particular, it would show expression and species comparison information in a multi-pathway context.  The project would be [http://code.google.com/webtoolkit/ GWT]-based, giving a student experience with an increasingly popular website construction environment.
+
 
+
* Language and Skills: Java, GWT
+
* Idea by: David Croft
+
* Potential Mentors: David Croft
+
 
+
We strongly recommend that you set up a local Reactome installation on your own computer before starting the project.  Take a look at the [[SupplementaryInformation|Reactome Supplementary Information]] for instructions on doing this.  Click [[PathwaySummaryVisualizationSpec|here]] for a loose specification of what we want to do.
+
 
+
=== Reactome Smartphone Application ===
+
 
+
Reactome has a new [http://www.reactome.org:8080/ReactomeRESTfulAPI/ReactomeRESTFulAPI.html RESTful interface], which has the ability to expose pathway data in Reactome as XML and JSON. We would like to develop a smartphone application for Reactome so that it runs on a variety of platforms (iOS and Android in the first instance). The application will consume the data available via the [http://www.reactome.org:8080/ReactomeRESTfulAPI/ReactomeRESTFulAPI.html RESTful interface] to render its views and perform its functions.
+
 
+
* Language and Skills: HTML/CSS/JavaScript, plus familiarity with AJAX/JSON and popular JavaScript libraries (e.g. jQuery)
+
* Idea by: Guanming Wu
+
* Potential Mentors: Guanming Wu
+
 
+
=== Build an interactive phylogenetic tree visualization framework for [[Galaxy]] ===
+
 
+
[[Galaxy]] is a web-based data integration and analysis framework for biological researchers.  We have a strong need for an interactive phylogenetic tree visualization component inside Galaxy.
+
 
+
* Language and Skills:
+
** Interactive web visualization (HTML5/CSS/JavaScript/JQuery).  Python would be a plus.
+
** Familiarity with phylogeny, visual analytics
+
* Idea by: Anton Nekrutenko
+
* Potential Mentors: Anton Nekrutenko
+
 
+
=== Add additional analysis tools and polishing to Mimosa ===
+
 
+
Mimosa is a nascent GMOD project to create a powerful, easy-to-use tool for calculating and displaying sequence alignments on the web with a variety of tools.  It is implemented in Perl and JavaScript using Catalyst, DBIx::Class, ExtJS, and Chado.  Currently, it only has support for running `blastall` (i.e. first-generation NCBI BLAST).  A student is needed to add support for running more analysis tools, and for developing a more polished user interface and installation.
+
 
+
* Skills needed:
+
** Perl (Catalyst, DBIx::Class)
+
** JavaScript (ExtJS)
+
** HTML and CSS
+
* Possible mentors:
+
** [[User:Scott|Scott Cain]]
+
** Jonathan Leto
+
** Robert Buels
+
 
+
=== Speed up Chado GFF3 loading ===
+
 
+
[[Chado]] is the organism-agnostic database schema for GMOD which is capable of storing multiple data types.  The data type that nearly everyone who uses Chado stores in it is sequence features (like chromosomes, genes and exons).  There is currently a loader for the most common flat file format for sequence features, [[GFF3]], that works.  However, it is quite slow compared to GFF3 loaders for other databases, like Bio::DB::SeqFeature::Store.  A student could undertake profiling of the existing application and develop strategies for speeding up the GFF3 loading.  Possible improvements could include code optimization as well as methodological changes like loading the GFF3 into a staging database before loading into Chado.
+
 
+
* Skills needed:
+
** Perl (DBI, BioPerl)
+
** PostgreSQL
+
* Possible mentors:
+
** [[User:Scott|Scott Cain]]
+
 
+
<br clear=all>
+
 
+
=== Visualizing Whole Genomes with GBrowse 2===
+
[[File:ideograms_combined.png|250px|thumb|Figure 1. The whole human genome, rendered as chromosome ideograms.  Top: chromosome banding patterns. Bottom: a gene-density heat map applied to the same chromosomes.]]
+
 
+
Most genome browsers tend to focus on small regions of the genome.  In the era of whole-genome experiments, the ability to view quantitative data on a genome-wide scale would be helpful in its own right as well as providing an entry point into more targeted browsing with conventional genome browsers.  In the context of GMOD, there was an application, GBrowse_karyotype, that was able to render whole genomes as chromosome ideograms and even map features to the chromosomes.  While the original application was deprecated with the transition to GBrowse 2, much of the original karyotype functionality was retained.  A worthwhile project would be to use this functionality as a development framework for a stand-alone whole-genome viewer that would replicate and extend the functionality of GBrowse_karyotype.  Some examples of scientific use cases that could benefit from such a viewer include whole-genome copy number variation studies, read-coverage plots for next-generation sequencing-based transcriptome profiling, etc.
+
 
+
* Skills needed:
+
** Object-oriented Perl; Perl-CGI
+
** Interactive web visualization (HTML5/CSS/JavaScript) would be a plus.
+
** Familiarity with visual analytics
+
 
+
* Proposed by:[[User:mckays|Sheldon McKay]]
+
 
+
* Possible mentors:
+
** <span class=pops>[[User:mckays|Sheldon McKay]]</span>
+
 
+
<br clear=all>
+
 
+
=== Visualizing Comparative Genomics with GBrowse 2===
+
[[Image:GBrowse_syn.png|right|thumb|500px|GBrowse_syn, as implemented at WormBase]]
+
As more genomes are sequenced, the need for better comparative genome browsers grows in proportion.  [[GBrowse_syn]], the comparative synteny browser, has been ported to GBrowse 2 but is not fully integrated and lacks much of the rich user interface offered by GBrowse.  It would benefit from different visualization paradigms, such as whole-genome views, dot-plots, etc.  This project would involve fully integrating this application into the GBrowse code base, as well as adding new capabilities such as a richer, more configurable user interface, "on the fly" multiple sequence alignment generation, and alternate views.  This project has the potential to greatly enhance comparative genomics viewing with GBrowse.
+
 
+
* Skills needed:
+
** Object-oriented Perl; Perl-CGI
+
** Interactive web visualization (HTML5/CSS/JavaScript) would be a plus.
+
** Familiarity with visual analytics
+
 
+
* Proposed by:[[User:mckays|Sheldon McKay]]
+
 
+
* Possible mentors:
+
** <span class=pops>[[User:mckays|Sheldon McKay]]</span>
+
  
<br clear=all>
+
'''Got an idea for a GSoC project? [[GSOC_Project_Ideas_2019 |Add it here]].'''  Ideas will be included in the proposal we send to GSoC, and great ideas make for a great proposal, so please add yours now.
 +
 +
These projects can use a broad set of skills, technologies, and domains, such as GUIs, database integration, and algorithms. Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply! '''Do not hesitate to propose your own project idea: some of the best applications we see are by students who go this route.''' As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.
  
=== PortEco single sign-on system ===
 
The PortEco project integrates activities at several web projects hosted at different locations in different URL domains. This project would produce a user authentication and authorization system that would allow users to work seamlessly across the different PortEco components.  This would include integrating the login systems of GMOD components like GBrowse with other common open-source systems (MediaWiki and phpBB) and building login systems for other components.
 
* Skills needed:
 
** PHP, Perl, and Java web programming
 
* Proposed by [[User:JimHu|Jim Hu]]
 
* Possible mentors
 
** [[User:JimHu|Jim Hu]]
 
  
=== Improved search and navigation for ontology-based wikis ===
+
== Preparing for GSoC 2019 ==
[http://ecoliwiki.net EcoliWiki], the [http://gowiki.tamu.edu Gene Ontology Normal Usage Tracking System (GONUTS)], and the [http://microbialphenotypes.org Ontology of Microbial Phenotypes wiki] all use [http://mediawiki.org MediaWiki's] built-in search tools. Better search systems are needed that take advantage of the Directed Acyclic Graph nature of MediaWiki's Category system, which is used to represent ontologies. This project would build an advanced search interface that combines Ajax autosuggest and Boolean operators to search for pages in combinations of categories.
+
The organization application period for GSoC is currently under way; we won't know whether Open Genome Informatics has been accepted as a GSoC 2019 mentoring organization until [https://developers.google.com/open-source/gsoc/timeline February 6th]. Nevertheless, this is a perfect time for students to talk to mentors about project ideas. If you are interested in mentoring, please check the Mentors section below, and contact the organization admin.
* Skills needed:
+
** PHP, MediaWiki, Ajax
+
* Proposed by [[User:JimHu|Jim Hu]]
+
* Possible mentors
+
** [[User:JimHu|Jim Hu]]
+
  
=== (your idea here)  ===
+
===Students===
 +
More information about [[GSOC_Applications_Guide | writing your application]] will be available closer to the start of the student application period.
  
Please feel very free to propose your own idea. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.
+
===Mentors===
 +
We encourage mentors and mentoring organizations to think about new projects year-round! If you'd like help with your ideas page or your separate mentoring org application, please feel free to contact the organization admins. Links to [[GSOC_Mentoring_Guide | advice about mentoring and other resources]] are available.
  
'''Do not hesitate to propose your own project idea: some of the best applications we see are by students that go this route.'''
+
[[Category:Galaxy]]
 +
[[Category:JBrowse]]
 +
[[Category:MGI]]
 +
[[Category:WormBase]]
 +
[[Category:GSoC]]
 +
[[Category:Reactome]]
 +
[[Category:WebApollo]]

Revision as of 17:09, 18 December 2018


Google Summer of Code 2019 @ Open Genome Informatics

Google Summer of Code is a global program that offers student developers stipends to write code for various open source software projects. We work with many open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 14,000 successful student participants from 118 countries, 651 open source organizations, and over 35 million lines of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all. (Excerpt from the Google Summer of Code website)

Since 2011, the Open Genome Informatics group has served as an "umbrella organization" to a variety of bioinformatics projects, including GMOD and its software projects -- JBrowse, Apollo, Chado, Galaxy etc.; Mouse Genome Informatics; OICR; Reactome; WormBase; and Bioconda.

More information about this year's participating bioinformatics groups can be found here.

To learn more about this year's event and how GSoC works, please refer to the FAQ.

Mailing lists, IRC, and other ways to get in touch

Project Ideas

Got an idea for a GSoC project? Add it here. Ideas will be included in the proposal we send to GSoC, and great ideas make for a great proposal, so please add yours now.

These projects can use a broad set of skills, technologies, and domains, such as GUIs, database integration, and algorithms. Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and have an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see are by students who go this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open source programmers.


Preparing for GSoC 2019

The organization application period for GSoC is currently under way; we won't know whether Open Genome Informatics has been accepted as a GSoC 2019 mentoring organization until February 6th. Nevertheless, this is a perfect time for students to talk to mentors about project ideas. If you are interested in mentoring, please check the Mentors section below, and contact the organization admin.

Students

More information about writing your application will be available closer to the start of the student application period.

Mentors

We encourage mentors and mentoring organizations to think about new projects year-round! If you'd like help with your ideas page or your separate mentoring org application, please feel free to contact the organization admins. Links to advice about mentoring and other resources are available.