GenePattern Notebook Integration Guide

A detailed guide for integrating GenePattern Notebook components into your shared Jupyter environment.

The GenePattern Notebook environment consists of a set of extensions for Jupyter and JupyterHub that make the interface more user-friendly and reduce the need to write and visibly execute code. This allows a notebook to encapsulate an entire scientific research narrative, driven by interactive widgets.

The following extensions and packages are major components of this environment:

Jupyter

  • jupyter-wysiwyg: A rich text editor for markdown cells.
  • nbtools: A framework for automatically generating user-friendly interactive widgets which encapsulate code execution and increase portability between notebooks.
  • genepattern-notebook: Provides users with the ability to embed and execute widgets for hundreds of popular biologic and bioinformatic tools, as well as general machine learning methods. Built using nbtools.
  • genepattern-python: Programmatic client used to call the GenePattern API. This package is used by genepattern-notebook to run and manage analyses.

JupyterHub

  • gpauthenticator: Allows users to authenticate with JupyterHub using their GenePattern credentials.
  • notebook-projects: A collection of configuration settings and templates that allow users to encapsulate dependencies, notebooks and supporting files within individual “projects.”
  • notebook-repository: A service which allows users to share notebooks, and their associated data files, with the public or with select collaborators. Depends on notebook-projects.

Using the GenePattern Notebook Docker images

Note: As of the time of this document’s writing (April 2021), JupyterLab support is still in beta, and both Docker images run classic Jupyter Notebook by default. In the coming months this will be changing, and JupyterLab will be launched by default.

The GenePattern Notebook environment is made available through two Docker images, one encapsulating Jupyter and the other encapsulating JupyterHub. Both include the associated extensions noted above. The Jupyter image also includes a curated list of Python packages which are useful in bioinformatics and data science.

Using these images directly is the easiest, albeit least flexible, way to integrate GenePattern Notebook. They are available from DockerHub using the following commands:

Jupyter Image

docker pull genepattern/genepattern-notebook:<release_number>

JupyterHub Image

docker pull genepattern/notebook-repository:<release_number>

Configuration

When running either image you will likely want to set some configuration. For the Jupyter image this can largely be done directly from the Docker command line, although for more explicit control, some users may want to mount and overwrite the image’s jupyter_notebook_config.py file. Below are a few likely configurations:

  • -p 8888:8888: This maps port 8888 inside the container (the port on which Jupyter runs by default) to port 8888 outside the container, allowing you to access its Jupyter server from the host machine.
  • -v ~:/home/jovyan: This mounts the host machine’s home directory as the home directory of the Jupyter user. To restrict the user to a particular directory, replace ~ with the path to that directory on the host.
  • -v jupyter_notebook_config.py:/data/jupyter_notebook_config.py: This allows you to override the included jupyter_notebook_config.py file with a different config file on the host machine. Give the host file as an absolute path, since Docker treats a bare name as a named volume rather than as a file to mount.
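
The options above can be combined into a single docker run invocation. The following is a minimal Python sketch that only assembles the argument list and does not start a container. The function name build_docker_run_command and its parameters are hypothetical, and the release tag is left as a placeholder to be filled in.

```python
import os
import shlex


def build_docker_run_command(release, host_home="~", host_config=None, host_port=8888):
    """Assemble a `docker run` argument list from the options described above."""
    command = ["docker", "run"]
    # Map the container's default Jupyter port (8888) to a port on the host.
    command += ["-p", f"{host_port}:8888"]
    # Mount a host directory as the Jupyter user's home directory.
    home = os.path.abspath(os.path.expanduser(host_home))
    command += ["-v", f"{home}:/home/jovyan"]
    if host_config is not None:
        # Use an absolute host path so Docker mounts the file itself
        # instead of creating a named volume.
        config = os.path.abspath(os.path.expanduser(host_config))
        command += ["-v", f"{config}:/data/jupyter_notebook_config.py"]
    command.append(f"genepattern/genepattern-notebook:{release}")
    return command


if __name__ == "__main__":
    # "<release_number>" is a placeholder; substitute a real tag before running.
    args = build_docker_run_command("<release_number>", host_config="jupyter_notebook_config.py")
    print(shlex.join(args))  # shlex.join requires Python 3.8+
```

Printing the command rather than executing it makes it easy to review the mounts before anything is started.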

Configuring the JupyterHub image is a bit more complex. You will almost certainly want to run it on a Docker network, mount directories and override the jupyterhub_config.py file. A script for launching the image and managing these configurations is available on GitHub alongside the image’s Dockerfile.

Installing GenePattern Notebook manually

A more flexible, albeit more manual, way to integrate GenePattern Notebook is to install the extensions directly into your Jupyter or JupyterHub environments.

All of the Jupyter extensions have been made available through the pip and conda package managers, and can be installed with a single command. Installing JupyterHub extensions, on the other hand, is a more “hands on” process, requiring pulling code from GitHub.

Python 3.6+ is required for all packages.

jupyter-wysiwyg

Note: At the moment this package only supports classic Jupyter Notebook. Compatibility with JupyterLab is coming in the future.

This package contains a rich text editor for markdown cells. To install jupyter-wysiwyg, run either of the following commands inside the same Python environment in which you installed Jupyter:

pip install jupyter-wysiwyg
conda install -c genepattern jupyter-wysiwyg

Upon installation, the extension should automatically register and enable itself with your Jupyter environment. However, if you’re using an older version of Jupyter that predates 5.3, this may not happen, in which case you will need to manually install and enable the extension using the following commands:

jupyter nbextension install --py jupyter_wysiwyg
jupyter nbextension enable --py jupyter_wysiwyg
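
Whether the manual step is needed depends only on the installed Jupyter Notebook version. A minimal sketch of that check follows, assuming the version is reported as a dotted string such as "5.2.2". The helper name needs_manual_enable is hypothetical.

```python
import re


def needs_manual_enable(notebook_version):
    """Return True if the given Jupyter Notebook version predates 5.3."""
    match = re.match(r"(\d+)(?:\.(\d+))?", notebook_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {notebook_version!r}")
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) < (5, 3)


if __name__ == "__main__":
    # importlib.metadata requires Python 3.8+; on older interpreters,
    # check the version with `jupyter notebook --version` instead.
    from importlib import metadata

    try:
        installed = metadata.version("notebook")
    except metadata.PackageNotFoundError:
        print("The notebook package is not installed in this environment.")
    else:
        print(installed, "manual enable needed:", needs_manual_enable(installed))
```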

nbtools

Note: JupyterLab support for this extension is in beta. The beta release can be installed from pip using the --pre flag.

This is a framework for automatically generating user-friendly interactive widgets. To install nbtools, run either of the following commands.

pip install nbtools
conda install -c genepattern nbtools

If your Jupyter install includes separate environments for Jupyter and for its various kernels, you may need to install nbtools in each.

If using an older (pre-5.3) version of Jupyter, you will also need to install and enable nbtools manually (see the instructions under jupyter-wysiwyg).

genepattern-notebook

Note: JupyterLab support for this extension is in beta. The beta release can be installed from pip using the --pre flag.

This extension provides users with the ability to embed and execute widgets for hundreds of popular biologic and bioinformatic tools. To install genepattern-notebook, run either of the following commands.

pip install genepattern-notebook
conda install -c genepattern genepattern-notebook

Some optional genepattern-notebook functionality depends on the pandas library. You may want to install it alongside the extension using one of the following commands:

pip install pandas
conda install pandas

Installing the genepattern-notebook extension also installs nbtools and genepattern-python, as they are both dependencies of genepattern-notebook.

If your Jupyter install includes separate environments for Jupyter and for its various kernels, you may need to install genepattern-notebook in each.
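
To see whether a particular environment already has the packages, a check like the one below can be run from that environment, for example in a notebook cell using the kernel in question. This is a sketch only. The import names nbtools, genepattern and gp are assumptions, since pip package names and importable module names can differ.

```python
import importlib.util
import sys

# Assumed import names for nbtools, genepattern-notebook, genepattern-python
# and the optional pandas dependency.
MODULES_TO_CHECK = ["nbtools", "genepattern", "gp", "pandas"]


def installed_modules(names):
    """Map each top-level module name to True if it can be found here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # sys.executable shows which Python environment is being inspected.
    print("Interpreter:", sys.executable)
    for name, found in installed_modules(MODULES_TO_CHECK).items():
        print(f"{name}: {'found' if found else 'missing'}")
```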

If using an older (pre-5.3) version of Jupyter, you will also need to install and enable genepattern-notebook manually (see the instructions under jupyter-wysiwyg).

genepattern-python

This is a programmatic client used to call the GenePattern API. To install it, run either of the following commands.

pip install genepattern-python
conda install -c genepattern genepattern-python

Some optional genepattern-python functionality depends on the pandas library. You may want to install it alongside this package (see the instructions under genepattern-notebook).

gpauthenticator

This is an authenticator plugin for JupyterHub that allows users to sign in using their GenePattern credentials. To enable it, you will need to pull the repository, add it to your Python path and then configure JupyterHub. To check out the repository, use the following command:

git clone https://github.com/genepattern/notebook-repository.git

This will create a notebook-repository directory with the contents of the git repository you just cloned. Inside this directory will be a subdirectory called gpauthenticator. Move this subdirectory to a directory on your Python path. You can find a list of valid directories on your path by running:

python -c "import sys; print(sys.path)"
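
The move can also be scripted. The sketch below copies a package directory into a chosen directory and then confirms that Python can find it. The helper names copy_package and is_importable are hypothetical, and the target directory is assumed to be one of the entries printed by the command above.

```python
import importlib
import importlib.util
import shutil
from pathlib import Path


def copy_package(source_dir, target_root):
    """Copy a package directory into target_root and return the new location."""
    source = Path(source_dir)
    destination = Path(target_root) / source.name
    # copytree raises FileExistsError if the destination already exists.
    shutil.copytree(source, destination)
    return destination


def is_importable(module_name):
    """Return True if a top-level module can be found on the current sys.path."""
    # Clear import caches so newly copied files are noticed.
    importlib.invalidate_caches()
    return importlib.util.find_spec(module_name) is not None
```

For example, copy_package("notebook-repository/gpauthenticator", target) followed by is_importable("gpauthenticator") should return True when target is on the Python path.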

To enable the authenticator plugin, you will also need to locate your jupyterhub_config.py and then add the following line:

c.JupyterHub.authenticator_class = 'gpauthenticator.GenePatternAuthenticator'

notebook-projects

This functionality consists of a set of configuration settings and templates that allow users to encapsulate dependencies, notebooks and supporting files within individual “projects.” To enable it, you will need to pull the repository, copy the templates and configure JupyterHub.

The templates and config are contained in the notebook-repository git repository. To pull it, see the instructions under gpauthenticator.

All of the templates and client-side assets used for this functionality are located in the theme subdirectory. Copy this subdirectory someplace permanent and accessible to JupyterHub; for example, to /data/theme.

Finally, locate your jupyterhub_config.py and add the following lines, changing any file paths as necessary:

# Template config
c.JupyterHub.logo_file = '/data/theme/images/gpnb.png'
c.JupyterHub.template_paths = ['/data/theme/templates']

# Named server config
c.JupyterHub.allow_named_servers = True
c.JupyterHub.default_url = '/home'
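# UserHandler is not defined by JupyterHub's config namespace; it must be imported
# earlier in this file (it is expected to come from the notebook-repository code)
c.JupyterHub.extra_handlers = [('user.json', UserHandler)]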
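
Because these settings point at files copied by hand, it can help to confirm the paths exist before restarting JupyterHub. The following is a small sketch of such a check. The helper name missing_paths is hypothetical, and the example paths are the ones used in the configuration above.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of paths (as strings) that do not exist on disk."""
    return [str(path) for path in paths if not Path(path).exists()]


if __name__ == "__main__":
    # Adjust these if the theme directory was copied somewhere else.
    expected = ["/data/theme/images/gpnb.png", "/data/theme/templates"]
    for path in missing_paths(expected):
        print("Missing:", path)
```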

You will also want to mount different directories for each user project. How you configure this depends on the spawner being employed by your JupyterHub install. An example assuming DockerSpawner is below:

c.DockerSpawner.name_template = "{prefix}-{username}-{servername}"
c.DockerSpawner.volumes = {
        '/data/users/{username}/{servername}': '/home/jovyan',
}
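
To illustrate what these templates produce, the sketch below fills in the placeholders with plain str.format. This is an approximation for illustration only: DockerSpawner performs its own substitution and may escape some characters. The prefix, username and server name values are hypothetical.

```python
NAME_TEMPLATE = "{prefix}-{username}-{servername}"
VOLUME_TEMPLATE = "/data/users/{username}/{servername}"


def expand(template, **fields):
    """Fill the {placeholders} in a template string with the given values."""
    return template.format(**fields)


if __name__ == "__main__":
    # Hypothetical values for one user's named server.
    fields = {"prefix": "jupyter", "username": "alice", "servername": "rnaseq"}
    print(expand(NAME_TEMPLATE, **fields))
    print(expand(VOLUME_TEMPLATE, **fields))
```

Each named server gets its own container name and its own host directory, which is what keeps one project's files separate from another's.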

notebook-repository

This is a JupyterHub-managed service which allows users to share notebooks, and their associated data files, with the public or with select collaborators. It uses JupyterHub’s services API and depends on the templates in notebook-projects. To enable it, you will need to pull the repository, copy the files and then configure JupyterHub.

This service is contained in the notebook-repository git repository. To pull it, see the instructions under gpauthenticator.

The notebook-repository service is written in Python and employs the tornado framework, the same technology stack as JupyterHub itself. It’s located in the projects subdirectory. Copy this subdirectory someplace permanent and accessible to JupyterHub; for example, to /data/projects.

Finally, locate your jupyterhub_config.py and add the following lines, changing any file paths as necessary. Note that the configuration below assumes the installation of notebook-projects and is in addition to the configuration shown in that section above:

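# This assignment replaces the extra_handlers line from the notebook-projects section.
# UserHandler and PreviewHandler must be imported earlier in this file.
c.JupyterHub.extra_handlers = [('user.json', UserHandler), ('preview', PreviewHandler)]
c.JupyterHub.services = [
    {
        'name': 'projects',
        'admin': True,
        'url': 'http://127.0.0.1:3000/',
        'cwd': '/data/projects',
        'environment': {
            # Assumes c.DockerSpawner.image_whitelist has already been set to a dict
            'IMAGE_WHITELIST': ','.join(c.DockerSpawner.image_whitelist.keys())
        },
        'command': ['python', 'start-projects.py', '--config=/data/projects_config.py']
    },
]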
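
The IMAGE_WHITELIST entry passes the whitelist's keys to the service as one comma-separated string. The sketch below shows that encoding and one way a process could split it back into a list. How the projects service actually reads the variable is not covered here, so decode_whitelist is an assumption, and the image names are hypothetical.

```python
def encode_whitelist(whitelist):
    """Join the whitelist's keys into one comma-separated string."""
    return ",".join(whitelist.keys())


def decode_whitelist(value):
    """Split a comma-separated string back into a list, skipping empty items."""
    return [name for name in value.split(",") if name]


if __name__ == "__main__":
    # Hypothetical whitelist; a real config would use c.DockerSpawner.image_whitelist.
    image_whitelist = {
        "Default": "example/notebook:latest",
        "Legacy": "example/notebook:old",
    }
    encoded = encode_whitelist(image_whitelist)
    print(encoded)
    print(decode_whitelist(encoded))
```

Note that this encoding cannot represent a key that itself contains a comma.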