School of Medicine, University of California, San Diego, La Jolla, CA, USA
The Broad Institute of MIT and Harvard, Cambridge, MA, USA
Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
Interactive analysis notebook environments promise to streamline genomics research through interleaving text, multimedia, and executable code into unified, sharable, reproducible “research narratives.” However, current notebook systems require programming knowledge, limiting their wider adoption by the research community. We have developed the GenePattern Notebook environment (http://www.genepattern-notebook.org), to our knowledge the first system to integrate the dynamic capabilities of notebook systems with an investigator-focused, easy-to-use interface that provides access to hundreds of genomic tools without the need to write code.
The ongoing explosion of “omics” datasets and the promise of scientific discovery arising from their analysis have given rise to software systems that aim to provide easy access to advanced methods for nonprogramming scientists. These “bioinformatics tool aggregation portals,” e.g., Galaxy, also provide for the creation and encapsulation of analytic workflows, transparent access to scalable compute resources, and removal of software installation and implementation concerns from the scientific user.
Alternatively, analysis notebook environments, inspired by the “literate programming” philosophy, integrate the exposition of a scientific project with the associated code. They aim to create an “executable document” that ideally serves as a complete description of a research project and that can also be run to reproduce the author's results. Examples include SWEAVE.
Each of these two types of system brings significant value to its targeted user base yet has limitations that prevent wider adoption. Notebook environments model their interface around the annotation of sections of code, and therefore assume that the user is fluent in a programming language such as Python or R. Bioinformatics tool aggregation portals successfully remove the requirement for coding expertise but to date have had limited ability to incorporate the variety of rich text and media formats required to represent the full scientific narrative surrounding each analysis step.
We have developed GenePattern Notebook (Figure 1), an environment that integrates the capabilities of both types of system, allowing users to incorporate encapsulated analysis tools, complete with their user-friendly interface, from a bioinformatics aggregation portal into an interactive analysis notebook. The environment is based on two long-standing software projects: the GenePattern platform for integrative genomics and the Jupyter Notebook environment for interactive computing.
GenePattern (www.genepattern.org), first released in 2004, consists of a repository of hundreds of bioinformatics analysis and visualization methods (“modules”), as well as utilities for data formatting, preprocessing, and other auxiliary functions that provide important “glue” between analysis steps. The user interface is point and click with no programming required. The public GenePattern server, hosted at www.genepattern.org since 2008, has over 40,000 registered users and runs 2,000–5,000 analysis jobs per week. Additional public servers are available at Indiana University (gp.indiana.edu/gp) and the Garvan Institute (pwbc.garvan.org.au/gp). The software has also been downloaded for local installation by over 17,000 bioinformatics core facilities, research laboratories, and individual scientists.
The Jupyter Notebook environment (www.jupyter.org) provides a laboratory notebook metaphor in which researchers build a step-by-step scientific narrative out of “cells” that interleaves code, formatted text, mathematical formulae, plots, and multimedia. The resulting notebooks can be shared, edited, executed, and published as complete encapsulations of in silico research.
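The cell structure described above can be illustrated with a minimal sketch that assembles a two-cell notebook document by hand. It assumes the version 4 notebook JSON layout (a `cells` list plus `metadata`, `nbformat`, and `nbformat_minor` fields); the helper names `make_cell` and `make_notebook` and the cell contents are hypothetical and chosen only for illustration. In practice a notebook is normally created through the Jupyter interface rather than written this way.

```python
import json


def make_cell(cell_type, source):
    """Build one notebook cell as a plain dict (hypothetical helper)."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally record an execution counter and outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def make_notebook(cells):
    """Wrap a list of cells in the top-level notebook structure."""
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# A narrative cell followed by the code it describes (illustrative content).
notebook = make_notebook([
    make_cell("markdown", "## Step 1\nCompute the mean of three measurements."),
    make_cell("code", "values = [2.0, 4.0, 6.0]\nsum(values) / len(values)"),
])

# Serialized, this is the text that would be stored in an .ipynb file.
notebook_json = json.dumps(notebook, indent=1)
```

Because the text and the code travel together in one document, sharing the file shares both the narrative and the means to re-execute it.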
The GenePattern Notebook functionality takes the Jupyter Notebook interface one step further, adding analysis, login, and rich text input components that present the GenePattern interface to provide code-free analysis and visualization (Figure S1). All cell types interact seamlessly with existing Jupyter cell types. Within a Python code cell, programming users can easily reference analysis results from a previous GenePattern analysis cell, and in a GenePattern analysis cell, programmers can use Python variables as inputs.
We integrated GenePattern with Jupyter through the use of Jupyter's ipywidgets package, which provides a framework for the creation of new user interface objects within Jupyter Notebooks, and GenePattern's Web services interface, which exposes all of the functionality of GenePattern (e.g., searching for and obtaining module information or querying for the execution status of an analysis) to programmatic access. This combination is a design pattern that has general applicability to the class of Web service-based tools, and the Jupyter development team is incorporating our approach into the currently evolving design of the Jupyter interfaces for graphical input (Dr. Fernando Perez, personal communication, September 26, 2016).
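One part of this pattern, querying a Web service for the execution status of an analysis, can be sketched as a simple polling loop. This is an illustrative sketch only, not GenePattern's or Jupyter's actual code: the function `wait_for_job`, the `is_finished` field, and the stub service are all hypothetical, and the status lookup is passed in as a callable so the example runs without a server. A notebook widget built on this pattern would replace the stub with a real HTTP request and update its display on each poll.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    fetch_status: Callable[[], Dict],
    poll_seconds: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll a status source until it reports completion (hypothetical helper).

    fetch_status stands in for a Web service call that returns a dict with
    an assumed boolean field "is_finished".
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("is_finished"):
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job did not finish within {max_polls} polls")


def make_stub_service(polls_until_done: int) -> Callable[[], Dict]:
    """Return a fake status endpoint that finishes after a fixed number of calls."""
    calls = {"n": 0}

    def fetch_status() -> Dict:
        calls["n"] += 1
        return {"is_finished": calls["n"] >= polls_until_done, "polls": calls["n"]}

    return fetch_status


# The stub reports completion on its third call, so the loop returns then.
result = wait_for_job(make_stub_service(3), poll_seconds=0.0)
```

Injecting the status function keeps the waiting logic independent of any particular server, which is one reason the approach carries over to other Web service-based tools.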
To promote the development and dissemination of GenePattern Notebooks with minimal installation requirements, we have released an online GenePattern Notebook repository and workspace where researchers can collaboratively develop and publish notebook documents. It provides a complete Jupyter environment, connections to several GenePattern servers, and, for programmers, the common Python packages used in bioinformatics analysis (numpy, pandas, matplotlib, scikit, etc.). We seeded the repository with notebooks that provide commonly used machine-learning methods (clustering, classification, and prediction) as well as dimension reduction and differential expression analysis.
Those who wish to run the GenePattern Notebook environment on their own compute resources have two options. (1) Non-programmers can install the Kitematic Docker (kitematic.com) application and use it to run the GenePattern Notebook Docker image, available on the standard Docker Hub repository (hub.docker.com). This provides a complete, ready-to-run notebook environment with all dependencies preinstalled. (2) Programmers may install the GenePattern Notebook and its dependencies through the pip or conda package manager interfaces.
To our knowledge, GenePattern Notebook is the first integration of a bioinformatics tool aggregation portal with an analysis notebook environment. This approach benefits nonprogramming and programming investigators alike. For the nonprogrammer, GenePattern Notebook provides the user-friendly GenePattern genomic analysis capabilities within a publishable notebook format. For the programmer already using the Jupyter environment, it affords easy access to the entire GenePattern library of analysis and visualization modules, which can be supplemented with the investigator's own coded routines.
The GenePattern Notebook environment, along with an introductory demonstration video, documentation, and tutorials, is available at www.genepattern-notebook.org. The software is freely available under a BSD-style open source license.