GSOC Project Ideas 2021

From GMOD
Revision as of 18:37, 1 April 2021 by Clements


Got an idea for GSOC 2021?

Then please post it. You can:

  1. Add it here, by directly editing this page. Just copy, paste and update the template below. This requires that you have or create a GMOD.org login.

Projects can use a broad set of skills, technologies, and domains, such as GUIs, database integration and algorithms.

Students are also encouraged to propose their own ideas related to our projects. If you have strong computer skills and an interest in biology or bioinformatics, you should definitely apply! Do not hesitate to propose your own project idea: some of the best applications we see come from students who take this route. As long as it is relevant to one of our projects, we will give it serious consideration. Creativity and self-motivation are great traits for open-source programmers.


Proposed project ideas for 2021

JBrowse 2 Plugins for Additional Synteny Formats

    • Brief explanation: Write a new JBrowse 2 plugin to support MSPCrunch or MUMmer data input.
    • Expected results: a new JBrowse 2 plugin that adds support for one of the data formats listed above
    • Project Home Page URL: JBrowse.org
    • Knowledge prerequisites: JavaScript
    • Skill level: Medium
    • Mentors: JBrowse development team

Interactive viewer for systems-biology variant interpretation (UI)

    • Brief explanation: Develop an interactive directed acyclic graph (DAG) that links user-specified variants to genes, cell-type expression, disease associations/known cancer mutations, and known drug targets.
    • Expected results: A website, powered by Cytoscape.js, that shows input variants as nodes linked to different levels of systems organization.
    • Project Home Page URL: Reactome.org
    • Knowledge prerequisites: JavaScript
    • Skill level: Medium
    • Mentors: Shraddha Pai
    • Project Description
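
One way to picture the data behind such a viewer: the minimal sketch below (an illustration, not a design decision) assembles variant-to-gene-to-annotation links into the `{nodes, edges}` element structure that Cytoscape.js accepts, where each element is wrapped in a `data` object with an `id`, and edges also carry `source` and `target`. The layer names and all identifiers are hypothetical placeholders. Because the project calls for a DAG, the sketch also checks acyclicity with Python's standard-library `graphlib` (Python 3.9+).

```python
import json
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical set of organizational layers, from variant up to drug target.
LAYERS = {"variant", "gene", "cell_type", "disease", "drug_target"}


def build_elements(links):
    """Convert (source, source_layer, target, target_layer) tuples into
    Cytoscape.js-style nodes/edges and confirm the graph is acyclic."""
    nodes = {}
    edges = []
    parents = {}  # node id -> ids of its parents, used for the cycle check
    for src, src_layer, dst, dst_layer in links:
        for node_id, layer in ((src, src_layer), (dst, dst_layer)):
            if layer not in LAYERS:
                raise ValueError(f"unknown layer: {layer}")
            nodes.setdefault(node_id, {"data": {"id": node_id, "layer": layer}})
            parents.setdefault(node_id, set())
        edges.append({"data": {"id": f"{src}->{dst}", "source": src, "target": dst}})
        parents[dst].add(src)
    # static_order() raises graphlib.CycleError (a ValueError) if the
    # links do not form a DAG.
    order = list(TopologicalSorter(parents).static_order())
    return {"nodes": list(nodes.values()), "edges": edges}, order


if __name__ == "__main__":
    # Placeholder identifiers, not real variants or annotations.
    links = [
        ("variant_1", "variant", "GENE_A", "gene"),
        ("GENE_A", "gene", "cell_type_X", "cell_type"),
        ("GENE_A", "gene", "disease_Y", "disease"),
        ("GENE_A", "gene", "drug_Z", "drug_target"),
    ]
    elements, order = build_elements(links)
    print(json.dumps(elements, indent=2))
    print("topological order:", order)
```

The returned `elements` object could be serialized as JSON by a server and passed to Cytoscape.js on the client; the topological order is a convenient input for a layered layout.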

Interactive viewer for systems-biology variant interpretation (Server-side)

    • Brief explanation: Create a server-side database and application for systems-level annotation of variants/genes (e.g. selected single-cell marker datasets, known disease associations, drug targets) that connects to the interactive UI.
    • Expected results: A website that allows users to visualize systems-level variant/gene annotation, with interactive links out to the data sources.
    • Project Home Page URL: Reactome.org
    • Knowledge prerequisites: Experience with document-oriented databases (e.g. MongoDB) and GraphQL
    • Skill level: Medium
    • Mentors: Shraddha Pai
    • Project Description

Style Guides for Biological Information Portal (WormBase / Alliance of Genome Resources)

    • Brief explanation: The Alliance of Genome Resources was founded to unify access to research knowledge across different model organism systems (such as worms, flies, and mice). It provides ways for published research knowledge to be categorized, aggregated, and searched. The Alliance was founded by members that each specialize in a specific model organism system, and each member has its own existing website and user base (more detail on these members and their sites is available at https://www.alliancegenome.org/). At the Alliance, we aim to support the existing use cases of the member sites while furthering usability and consistency. To achieve those goals, we need style guides that can be applied to the development of the Alliance website.
    • Expected results: Design prototypes and guidelines resulting from several iterations of the design lifecycle.
    • Project Home Page URL: https://www.alliancegenome.org/.
    • Project paper reference and URL: The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases
    • Knowledge prerequisites: Design or HCI. Knowledge of biology is preferred.
    • Skill level: Advanced.
    • Mentors: Sibyl Gao (sibyl@wormbase.org).

Bioinformatics with Jupyter Notebooks (WormBase)

    • Brief explanation: WormBase is an information portal for curated biological research knowledge. In addition to the website, we offer programmatic access to the data through a REST API and downloadable files. This programmatic access is intended to support bioinformatics work. We believe working examples would augment the existing documentation and make it easier for bioinformaticians to access WormBase data programmatically.
    • Expected results: A series of Jupyter Notebooks that demonstrate how WormBase data can be used in bioinformatics.
    • Project Home Page URL: https://wormbase.org.
    • Project paper reference and URL: WormBase: a modern Model Organism Information Resource
    • Knowledge prerequisites: Python or R, and bioinformatics knowledge.
    • Skill level: Advanced.
    • Mentors: Sibyl Gao (sibyl@wormbase.org).

Authentication and Authorization Service (Alliance)

    • Brief explanation: The Alliance of Genome Resources has several services that require authentication and authorization. As the number of services increases, we would like a single service to maintain authentication and authorization across the project.
    • Expected results: A service that uses JSON Web Tokens (JWTs), refresh tokens, and OAuth2 for authentication, and GitHub-managed JSON files for authorization.
    • Project Home Page URL: https://www.alliancegenome.org/.
    • Project paper reference and URL: The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases
    • Knowledge prerequisites: Python, JavaScript, Microservices.
    • Skill level: Medium.
    • Mentors: Adam Wright (adam.wright@wormbase.org).

Use Galaxy to run Reactome analysis and processes on proteomics data (Reactome)

  • Brief explanation: Reactome is a free, open-source, curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modelling, systems biology and education. Galaxy is an open, web-based platform for data-intensive biomedical research, which allows users to perform, reproduce, and share complete analyses.
  • Expected results: There are two potential sub-projects: 1) adding Reactome as a data resource in Galaxy, so that Galaxy users can use Reactome reaction and pathway annotation data; and 2) running Reactome's identifier mapping and over-representation analysis workflows in Galaxy. See Reactome GitHub.
  • Project Home Page URL: if there is one.
  • Project paper reference and URL: reactome.org, galaxyproject.org, ProteoRE (Proteomics Research Environment)
  • Knowledge prerequisites: Galaxy, Java, web services.
  • Skill level: Medium.
  • Mentors: Robin Haw (robin.haw[AT]oicr.on.ca) and Joel Weiser (joel.weiser[AT]oicr.on.ca).
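
Over-representation analysis is commonly framed as a one-sided hypergeometric test: with N background identifiers, K of them annotated to a pathway, and n submitted identifiers of which k fall in that pathway, the p-value is the probability of seeing k or more hits by chance, P(X ≥ k) = Σ_{i≥k} C(K,i)·C(N−K,n−i) / C(N,n). The standard-library sketch below computes that tail probability and a Benjamini-Hochberg correction across pathways. It is a generic illustration of the statistic, not Reactome's own analysis service, and the pathway counts in the usage example are made up.

```python
from math import comb


def ora_pvalue(background, pathway_size, submitted, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    background:   N, identifiers in the background set
    pathway_size: K, background identifiers annotated to the pathway
    submitted:    n, identifiers in the user's list
    hits:         k, submitted identifiers that fall in the pathway
    """
    N, K, n, k = background, pathway_size, submitted, hits
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    # Exact integer arithmetic; comb() returns 0 when the lower index exceeds the upper.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts for three hypothetical pathways: (K, k).
    N, n = 10_000, 200
    pathways = {"pathway_A": (50, 8), "pathway_B": (120, 4), "pathway_C": (30, 1)}
    raw = {name: ora_pvalue(N, K, n, k) for name, (K, k) in pathways.items()}
    fdr = benjamini_hochberg(list(raw.values()))
    for (name, p), q in zip(raw.items(), fdr):
        print(f"{name}: p={p:.3g}, FDR={q:.3g}")
```

In a Galaxy wrapper, the identifier-mapping step would supply N, K, n, and k for each pathway; the statistics themselves stay this small.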

GraphDB API (Reactome)

  • Brief explanation: Reactome uses both a relational database (MySQL) and a graph database (Neo4j). There is an existing API that uses the relational database, and many Reactome components use this API. To make it easier to transition these components to using the graph database, a new API with equivalent functionality needs to be created.
  • Expected results: A new Java API that interacts with the graph database, with functionality such that it could be used as a drop-in replacement for the relational database API.
  • Project Home Page URL: reactome.org.
  • Project paper reference and URL:
  • Knowledge prerequisites: Java, MySQL. Neo4j would be good, but not necessary.
  • Skill level: Advanced.
  • Mentors: Solomon Shorser (solomon.shorser[AT]oicr.on.ca)

Centralized dashboard or metrics system (Reactome)

  • Brief explanation: Reactome tracks statistics on its quarterly release data through a mix of manual and automated processes. This project would fully automate and consolidate the measurement of release metrics such as the number of pathways, reactions, distinct proteins (with and without UniProt isoforms), complexes, small molecules, drugs/therapeutics, literature references, etc., for human (curated) and non-human (electronically inferred) species, stratified by normal and disease biology. A centralized dashboard would help the team discuss metrics externally and support community outreach.
  • Expected results: A program that produces a standardized, visually appealing report of statistics for a Reactome release database.
  • Project Home Page URL: reactome.org.
  • Knowledge prerequisites: Java, MySQL and/or Neo4j, creating visuals for statistical data (preferred but not required)
  • Skill level: Medium.
  • Mentors: Robin Haw (robin.haw[AT]oicr.on.ca) and Joel Weiser (joel.weiser[AT]oicr.on.ca)
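
One possible way to structure the stratified counts described above is to tally each release's entities by type, species group (human curated vs. non-human inferred), and normal vs. disease context, then diff consecutive releases for the dashboard. This sketch assumes a hypothetical flat record shape `(entity_type, is_human, is_disease)`; a real implementation would populate these records from the release database (MySQL or Neo4j) rather than from in-memory tuples.

```python
from collections import Counter


def stratify(records):
    """Count entities by (type, species group, biological context)."""
    counts = Counter()
    for entity_type, is_human, is_disease in records:
        species = "human (curated)" if is_human else "non-human (inferred)"
        context = "disease" if is_disease else "normal"
        counts[(entity_type, species, context)] += 1
    return counts


def release_delta(previous, current):
    """Per-stratum change between two releases (negative means a decrease)."""
    keys = sorted(set(previous) | set(current))
    return {key: current.get(key, 0) - previous.get(key, 0) for key in keys}


if __name__ == "__main__":
    # Toy records, not real Reactome content.
    release_a = [("Pathway", True, False), ("Reaction", True, True)]
    release_b = release_a + [("Pathway", True, False), ("Pathway", False, False)]
    for key, change in release_delta(stratify(release_a), stratify(release_b)).items():
        print(key, f"{change:+d}")
```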

Community access portal to Reactome Archive (Reactome)

  • Brief explanation: Reactome generates new pathway and other annotation data on a quarterly basis. With each new release, the preceding data set is archived to an AWS S3 bucket. As part of our data sharing policy, we would like to develop a web interface that allows users to request specific versions of archived data and download them.
  • Expected results: A web interface that lets users request data and download it via a shareable link that expires either after a set time or once the data has been downloaded.
  • Project Home Page URL: reactome.org.
  • Knowledge prerequisites: Java, AWS, Joomla
  • Skill level: Medium.
  • Mentors: Robin Haw (robin.haw[AT]oicr.on.ca) and Solomon Shorser (solomon.shorser[AT]oicr.on.ca)
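
A minimal sketch of the link logic, assuming the portal issues its own HMAC-signed tokens and records which ones have been redeemed. Amazon S3 presigned URLs already provide time-based expiry, but they can be reused until they expire, which is why this sketch tracks redemption itself to support the "expires after download" option. The secret, the archive key format, the `example.org` download URL, and the in-memory redemption set are all hypothetical stand-ins.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

SECRET = b"replace-with-a-real-secret"  # hypothetical; keep out of source control
_redeemed = set()  # hypothetical in-memory store; use a database in practice


def _sign(payload):
    digest = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_link_token(archive_key, ttl_seconds, now=None):
    """Token granting access to archive_key until now + ttl_seconds."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    payload = f"{archive_key}|{expires}"
    return f"{payload}|{_sign(payload)}"


def redeem(token, now=None):
    """Validate a token once; return the archive key it grants access to."""
    try:
        archive_key, expires, signature = token.rsplit("|", 2)
        expires_at = int(expires)
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{archive_key}|{expires}")
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        raise ValueError("invalid signature")
    if (time.time() if now is None else now) >= expires_at:
        raise ValueError("link expired")
    if token in _redeemed:
        raise ValueError("link already used")
    _redeemed.add(token)  # single use: later attempts fail
    return archive_key


if __name__ == "__main__":
    token = make_link_token("release-75/database.sql.gz", ttl_seconds=24 * 3600)
    print("https://example.org/archive/download?token=" + quote(token, safe=""))
```

After a successful `redeem`, the server could stream the object from S3 itself or hand back a short-lived S3 presigned URL, so the single-use guarantee is enforced by the portal rather than by S3.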

Bioinformatics Visualization Library - Mapping Reactome Pathway Hierarchy to Low Dimension Manifolds (Reactome)

  • Brief explanation: Build an interactive, dynamic D3- or WebGL-based dimensionality reduction visualization library that lets users explore and tinker with Reactome Pathway Hierarchy network features. Demonstrate its usability and features in a Jupyter Notebook.
  • Expected results: A bundled JavaScript visualization library of t-SNE, UMAP, and PCA plots, designed for Python programmers in the bioinformatics community, with a focus on Reactome Pathway Hierarchy features, and a Jupyter Notebook demonstrating its usability and features.
  • Project Home Page URL: reactome.org.
  • Knowledge prerequisites: Python3, D3 or WebGL, JavaScript
  • Skill level: Medium.
  • Mentors: Nasim Sanati (nasim[AT]plenary.org) and Solomon Shorser (solomon.shorser[AT]oicr.on.ca)

Bioinformatics Resource - Data Pipeline for Systematic Feature Extraction of Major Single-Cell Resources Utilizable for Downstream Analysis (Reactome)

  • Brief explanation: Build a Dockerized, modular data pipeline for systematic feature extraction from single-cell data in both human and mouse tissues. This project will focus on the Descartes data resource [1], with 121 human and 61 mouse organogenesis tissues. This data resource has already been preprocessed with Monocle 3 [2]. For feature extraction, we will leverage Reactome's curated data and use libraries such as scanpy [3], scVelo [4], pySCENIC [5], and ssGSEA. All steps, including EDA, data quality assessment, feature extraction, and the pathway activity workflow, will be demonstrated in a Jupyter Notebook.
  • Expected results: A Dockerized Python 3 client with an automated pipeline that gathers and processes single-cell features in each tissue type. The resulting resource, with validated pathway activity for each type, will be usable by the bioinformatics community for downstream pathway- or network-based analysis. A Jupyter Notebook demonstrating the feature extraction pipeline and pathway activity.
  • Project Home Page URL: reactome.org.
  • Knowledge prerequisites: Python3, single-cell bioinformatics processes, feature extraction, Docker
  • Skill level: Medium.
  • Mentors: Nasim Sanati (nasim[AT]plenary.org) and Solomon Shorser (solomon.shorser[AT]oicr.on.ca)

Bioinformatics Data Science - Reactome Pathway Embeddings (Reactome)

  • Brief explanation: Use established methods and simulated data to find pathway embeddings in the human Reactome pathway hierarchy graph. These pathway representations may be used in downstream analysis as features of biological processes that are (or are not) involved in a disease or other biological context.
  • Expected results: A Python client that can compute graph embeddings of the human Reactome pathway hierarchy in different contexts and compare the differences and/or similarities between embedding results against validated examples.
  • Project Home Page URL: reactome.org.
  • Knowledge prerequisites: Python3, Keras/Tensorflow, Machine Learning & Feature Extraction, Biological Networks
  • Skill level: Medium/Advanced.
  • Mentors: Nasim Sanati (nasim[AT]plenary.org)

Datatypes Help in Galaxy (Galaxy)

  • Brief explanation: Create infrastructure for providing datatype help in Galaxy. This includes expanding datatype definitions and updating the Galaxy user interface to take advantage of them.
  • Expected results: Datatype format and semantics help would be widely available when using Galaxy, including in tools that consume and produce particular datatypes, as well as server-wide help describing supported datatypes.
  • Project Home Page URL: https://galaxyproject.org/
  • Project paper reference and URL: Jalili, V., Afgan, E., Gu, Q., Clements, D., Blankenberg, D., Goecks, J., Taylor, J., & Nekrutenko, A. (2020). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update. Nucleic Acids Research, 48(W1), W395–W402. https://doi.org/10.1093/nar/gkaa434
  • Knowledge prerequisites: Python and JavaScript. Will use Vue.js components in front end.
  • Skill level: Basic
  • Mentors: Björn Grüning, University of Freiburg (bjoern.gruening[at]gmail.com); Dave Clements, Johns Hopkins University (clements[at]galaxyproject.org); Galaxy Support Working Group, global.

Provide users with better quota information (Galaxy)

  • Brief explanation: Publish each server's quotas in a standard way; provide users with more information about what analyses and datasets are consuming their quota allocation.
  • Expected results: Users will know immediately what a server's quotas are, and what items are contributing most to consuming their quota. Users will have a clear idea of what they can expect, and what they can do to increase their available resources.
  • Project Home Page URL: https://galaxyproject.org/
  • Project paper reference and URL: Jalili, V., Afgan, E., Gu, Q., Clements, D., Blankenberg, D., Goecks, J., Taylor, J., & Nekrutenko, A. (2020). The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update. Nucleic Acids Research, 48(W1), W395–W402. https://doi.org/10.1093/nar/gkaa434
  • Knowledge prerequisites: Python and JavaScript. Will use Vue.js components in front end.
  • Skill level: Medium
  • Mentors: Björn Grüning, University of Freiburg (bjoern.gruening[at]gmail.com); Dave Clements, Johns Hopkins University (clements[at]galaxyproject.org); Galaxy Support Working Group, global.
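
As an illustration of the per-user breakdown this project describes, the sketch below ranks a user's datasets by size and reports how much of the quota each one consumes. The dataset names, sizes, and quota value are hypothetical, and the code does not use Galaxy's API or data model; in Galaxy the figures would come from the server's quota configuration and the user's histories.

```python
def human_size(num_bytes):
    """Format a byte count using binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def quota_report(datasets, quota_bytes, top=3):
    """Summarize quota usage and list the largest (name, size_in_bytes) items."""
    used = sum(size for _, size in datasets)
    lines = [
        f"Using {human_size(used)} of {human_size(quota_bytes)} "
        f"({100 * used / quota_bytes:.1f}%)"
    ]
    largest = sorted(datasets, key=lambda item: item[1], reverse=True)[:top]
    for name, size in largest:
        share = 100 * size / used if used else 0.0
        lines.append(f"  {name}: {human_size(size)} ({share:.1f}% of usage)")
    return lines


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Hypothetical datasets and quota.
    datasets = [
        ("sample1.fastq.gz", 3 * GiB),
        ("aligned.bam", 5 * GiB),
        ("counts.tsv", 2 * 1024 ** 2),
    ]
    print("\n".join(quota_report(datasets, quota_bytes=50 * GiB)))
```

Showing each item's share of total usage, not just its absolute size, points users directly at what they could delete or archive to free space.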

Creating learning paths within the Galaxy Training Network

  • Brief explanation: Implement and display learning paths in the Galaxy Training Material infrastructure to show learners, especially newcomers, which tutorial they should take first or which sequence of tutorials to follow to become knowledgeable about a particular topic.
  • Expected results: An easy way to configure and update learning paths, and learning paths that are easy for website users to understand and navigate.
  • Project Home Page URL: https://training.galaxyproject.org/
  • Project paper reference and URL: Serrano-Solano, B., Erxleben, A., Gallardo-Alba, C., Rasche, H., Hiltemann, S., Föll, M., Fahrner, M., Dunning, M. J., Schulz, M., Scholtz, B., Clements, D., Nekrutenko, A., Batut, B., & Grüning, B. (2020). Fostering Accessible Online Education Using Galaxy as an e-learning Platform. Preprints. https://doi.org/10.20944/preprints202009.0457.v2
  • Knowledge prerequisites: Python and JavaScript.
  • Skill level: Medium
  • Mentors: Bérénice Batut, University of Freiburg (berenice DOT batut[at] gmail.com). Galaxy Outreach & Training Working Group, global.
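
At its core, a learning path is an ordering of tutorials that respects their prerequisites. The sketch below derives such an ordering with Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+). The tutorial identifiers and the prerequisite map are hypothetical, and the Galaxy Training Network's actual configuration format may look quite different.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical prerequisite map: tutorial -> tutorials to complete first.
PREREQUISITES = {
    "galaxy-intro": [],
    "quality-control": ["galaxy-intro"],
    "mapping": ["quality-control"],
    "rna-seq-counts": ["mapping"],
    "history-management": ["galaxy-intro"],
}


def learning_path(goal, prerequisites=PREREQUISITES):
    """List the tutorials needed to reach `goal`, prerequisites first."""
    needed, stack = set(), [goal]
    while stack:  # collect the goal plus all transitive prerequisites
        tutorial = stack.pop()
        if tutorial not in needed:
            needed.add(tutorial)
            stack.extend(prerequisites.get(tutorial, []))
    graph = {t: prerequisites.get(t, []) for t in needed}
    # static_order() yields each tutorial after all of its prerequisites
    # and raises graphlib.CycleError if the prerequisites are circular.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(learning_path("rna-seq-counts")))
```

Storing only direct prerequisites and computing the path on demand keeps the configuration easy to update: adding a tutorial means declaring its immediate prerequisites, and every path through it adjusts automatically.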

Template: Project Idea Name (Project Name/Lab Name)

  • Brief explanation: Brief description of the idea, including any relevant links, etc.
  • Expected results: describe the outcome of the project idea.
  • Project Home Page URL: if there is one.
  • Project paper reference and URL: Is there a paper about the project this effort will be a part of?
  • Knowledge prerequisites: programming language(s) to be used, plus any other particular computer science skills needed.
  • Skill level: Basic, Medium or Advanced.
  • Mentors: name + contact details of the lead mentor, name + contact details of 1 or 2 backup mentors.