Extending your JupyterHub setup

The helm chart used to install JupyterHub has a lot of options for you to tweak. This page lists some of the most common changes.

Applying configuration changes

The general method is:

  1. Make a change to the config.yaml

  2. Run a helm upgrade:

    helm upgrade <YOUR_RELEASE_NAME> https://github.com/jupyterhub/helm-chart/releases/download/v0.3/jupyterhub-v0.3.tgz -f config.yaml
    

    <YOUR_RELEASE_NAME> is the value you passed to --name when you installed JupyterHub with helm install. If you don’t remember it, you can probably find it by running helm list.

  3. Wait for the upgrade to finish, then run kubectl --namespace=<YOUR_NAMESPACE> get pod and make sure the hub and proxy pods are in the Ready state. Once they are, your configuration change has been applied!
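
If you would rather script step 3 than read the pod list by eye, the sketch below asks kubectl for the pods as JSON (kubectl get pod -o json) and reports whether each pod's Ready condition is "True". It assumes kubectl is on your PATH and already configured for your cluster, and that the chart's pods are named hub-<hash> and proxy-<hash>. The helper names get_pods, readiness and hub_and_proxy_ready are hypothetical, not part of the helm chart.

```python
import json
import subprocess


def get_pods(namespace: str) -> dict:
    """Return the parsed output of `kubectl get pod -o json` for a namespace."""
    result = subprocess.run(
        ["kubectl", "--namespace", namespace, "get", "pod", "-o", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if kubectl fails
    )
    return json.loads(result.stdout)


def readiness(pod_list: dict) -> dict:
    """Map each pod name to True if its Ready condition has status "True"."""
    status = {}
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        conditions = pod.get("status", {}).get("conditions", [])
        status[name] = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
    return status


def hub_and_proxy_ready(pod_list: dict) -> bool:
    """True once at least one hub/proxy pod exists and all of them are Ready."""
    # Assumption: the chart names its pods hub-<hash> and proxy-<hash>.
    relevant = {
        name: ready
        for name, ready in readiness(pod_list).items()
        if name.startswith(("hub-", "proxy-"))
    }
    return bool(relevant) and all(relevant.values())
```

For example, hub_and_proxy_ready(get_pods("<YOUR_NAMESPACE>")) returns True once both pods report Ready; you could call it in a loop with a short sleep while the upgrade rolls out.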

Using an existing image

It’s possible to base your users’ environment on a pre-existing Docker image. To do this, find an existing image (for example, on Docker Hub) and configure your installation to use it.

For example, UC Berkeley’s Data8 program publishes the image it uses on Docker Hub. To instruct JupyterHub to use this image, add the following to your config.yaml file:

singleuser:
  image:
    name: berkeleydsep/singleuser-data8
    tag: v0.1

You can then apply the change to the config as usual.

Setting memory and CPU guarantees / limits for your users

Each user on your JupyterHub gets a slice of memory and CPU to use. There are two ways to specify how much users get to use: resource guarantees and resource limits.

A resource guarantee means that all users will have at least this resource available at all times, but they may be given more resources if they’re available. For example, if users are guaranteed 1G of RAM, users can technically use more than 1G of RAM if these resources aren’t being used by other users.

A resource limit sets a hard limit on the resources available. In the example above, if there were a 1G memory limit, it would mean that users could use no more than 1G of RAM, no matter what other resources are being used on the machines.

By default, each user is guaranteed 1G of RAM. All users have at least 1G, but they can technically use more if it is available. You can easily change the amount of these resources, and whether they are a guarantee or a limit, by changing your config.yaml file. This is done with the following structure.

singleuser:
  memory:
    limit: 1G
    guarantee: 1G

This sets both the memory limit and the memory guarantee to 1G. Kubernetes will make sure that each user always has access to 1G of RAM, and attempts to use more will fail (usually your kernel will die). You can set the limit higher than the guarantee to let some users use more RAM for short periods when it is available (e.g. when running a single, short-lived function that consumes a lot of memory).
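
To make the guarantee/limit relationship concrete, here is a small sketch that parses values such as 1G and checks that a guarantee never exceeds its limit (Kubernetes rejects a pod whose memory request is larger than its limit). It assumes the suffixes K, M, G and T are powers of 1024, which is how JupyterHub's byte-size settings interpret them; parse_bytes and check_memory are hypothetical helpers, not chart features.

```python
# Assumption: 1024-based suffixes, as used by JupyterHub's byte-size settings.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_bytes(value: str) -> int:
    """Convert a size such as "512M" or "1G" into a number of bytes."""
    value = value.strip()
    if value and value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(value)  # a bare number is taken to mean bytes


def check_memory(guarantee: str, limit: str) -> None:
    """Raise ValueError if the guarantee is larger than the limit."""
    if parse_bytes(guarantee) > parse_bytes(limit):
        raise ValueError(
            f"guarantee {guarantee} exceeds limit {limit}; "
            "Kubernetes rejects pods whose memory request exceeds the limit"
        )
```

With the configuration above, check_memory("1G", "1G") passes, while check_memory("2G", "1G") raises before you ever run helm upgrade.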

Note

Remember to apply the changes after editing your config.yaml file!

Extending your software stack with s2i

s2i, also known as Source to Image, lets you quickly convert a GitHub repository into a Docker image that you can use as the base for your JupyterHub instance. Anything inside the GitHub repository will exist in a user’s environment when they join your JupyterHub. If you include a requirements.txt file at the root of the repository, s2i will pip install each of the listed packages into the Docker image it builds. Below we’ll cover how to use s2i to generate a Docker image and how to configure JupyterHub to use this image.

Note

For this section, you’ll need to install s2i and docker.

  1. Download s2i. On a Mac this is easily done with Homebrew. For Linux and Windows it takes a couple of quick commands, which you can find in the s2i installation instructions.

  2. Download and start Docker. You can download and install Docker from the Docker website. Once you’ve started Docker, it will show up as a tiny background application.

  3. Create (or find) a GitHub repository you want to use. This repo should have all the materials that you want your users to access. In addition, you can include a requirements.txt file that lists one package per line, written the same way you’d install it with pip install. You should also specify the versions explicitly so the image is fully reproducible. For example:

    numpy==1.12.1
    scipy==0.19.0
    matplotlib==2.0
    
  4. Use s2i to build your Docker image. s2i uses a template (a builder image) to know how to create the Docker image. We provide one, jupyterhub/singleuser-builder-venv-3.5:v0.1.5, which is used in the command below. Run this command:

    s2i build --exclude "" <git-repo-url> jupyterhub/singleuser-builder-venv-3.5:v0.1.5 gcr.io/<project-name>/<image-name>:<tag>


    In effect, this tells s2i: build <git-repo-url> into a Docker image using the template jupyterhub/singleuser-builder-venv-3.5:v0.1.5, and name the image gcr.io/<project-name>/<image-name>:<tag>. The --exclude "" option ensures that all files are included in the image (e.g. the .git directory).

Note

  • <project-name> should match your Google Cloud project ID.

  • Don’t use underscores in your image name; this is a known bug that will be fixed soon. Other than that, the name can be anything memorable.

  • The tag should be the first 6 characters of the SHA of the GitHub commit the image is built from.

  5. Push your newly built Docker image to the cloud. You can push it either to Docker Hub or to Google Container Registry. Here we’ll push to Google Container Registry:

    gcloud docker -- push gcr.io/<project-name>/<image-name>:<tag>

  6. Edit the JupyterHub configuration to use this image. Do this by adding these lines to the config.yaml file you originally created:

    singleuser:
      image:
        name: gcr.io/<project-name>/<image-name>
        tag: <tag>

  7. Tell helm to update JupyterHub to use this configuration, using the normal method for applying configuration changes.

  8. Restart your notebook server if you are already logged in. If you already have a running JupyterHub session, you’ll need to restart it (by stopping and starting your session from the control panel in the top right). New users won’t have to do this.

  9. Enjoy your new computing environment! You should now have a live computing environment built off of the Docker image we’ve created.

    Note

    The contents of your GitHub repository might not show up if you have enabled persistent storage. Disable persistent storage if you want them to show up!
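
Two details in the steps above are easy to get wrong by hand: leaving a package in requirements.txt unpinned, and assembling the image reference (no underscores in the image name, tag taken from the first 6 characters of the commit SHA). The sketch below checks both before you run s2i build. The function names unpinned_requirements and image_reference are hypothetical, the requirements check is deliberately simple (it only looks for ==), and the commit SHA is assumed to come from running git rev-parse HEAD in your repository.

```python
def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and "==" not in line:
            unpinned.append(line)
    return unpinned


def image_reference(project: str, image: str, commit_sha: str) -> str:
    """Build gcr.io/<project-name>/<image-name>:<tag> from a commit SHA."""
    if "_" in image:
        raise ValueError("image names must not contain underscores")
    if len(commit_sha) < 6:
        raise ValueError("commit SHA must have at least 6 characters")
    tag = commit_sha[:6]  # first 6 characters of the commit SHA
    return f"gcr.io/{project}/{image}:{tag}"
```

The string returned by image_reference is the last argument to s2i build and the argument to gcloud docker -- push; in config.yaml, the part before the final colon goes under singleuser.image.name and the part after it under singleuser.image.tag.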

Pre-populating $HOME directory with notebooks when using Persistent Volumes

By default, Persistent Volumes are used. If you would like to include the contents of the GitHub repository in the $HOME directory anyway (e.g. all of the *.ipynb files), edit config.yaml to include these lines:

singleuser:
  lifecycleHooks:
    postStart:
      exec:
        command: ["/bin/sh", "-c", "test -f $HOME/.copied || cp -Rf /srv/app/src/. $HOME/; touch $HOME/.copied"]

Note that this copies the contents of the directory to $HOME only once: the first time the user logs in. Later updates to the repository will not be reflected. Work is in progress to improve this.
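
The postStart command relies on a marker file: test -f $HOME/.copied succeeds once the marker exists, so cp is skipped, while touch runs either way because it follows a ;. The Python sketch below mirrors that logic on ordinary directories so you can try it out; it is only an illustration (the chart runs the shell command, not this function), and copy_once is a hypothetical name.

```python
import shutil
from pathlib import Path


def copy_once(src: Path, home: Path, marker_name: str = ".copied") -> bool:
    """Copy the contents of src into home unless the marker file exists.

    Returns True if a copy happened, False if it was skipped.
    """
    marker = home / marker_name
    if marker.exists():  # like `test -f $HOME/.copied`
        return False
    # Like `cp -Rf /srv/app/src/. $HOME/`: copy the contents, overwriting files.
    shutil.copytree(src, home, dirs_exist_ok=True)  # needs Python 3.8+
    # Like `touch $HOME/.copied`. Unlike the shell version, a failed copy
    # raises above, so the marker is only written after a successful copy.
    marker.touch()
    return True
```

Running it twice against the same home directory copies on the first call and does nothing on the second, even if the source has changed in between, which is exactly why later repository updates do not reach existing users.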

Authenticating with OAuth2

JupyterHub’s oauthenticator has support for enabling your users to authenticate via a third-party OAuth provider, including GitHub, Google, and CILogon.

Follow the service-specific instructions linked on the oauthenticator repository to generate your JupyterHub instance’s OAuth2 client ID and client secret. Then declare the values in the helm chart (config.yaml).

Here are example configurations for two common authentication services. Note that in each case, you need to obtain the authentication credentials before you can configure the helm chart for authentication.

Google

For more information, see the full example of Google OAuth2 in the next section.

auth:
  type: google
  google:
    clientId: "yourlongclientidstring.apps.googleusercontent.com"
    clientSecret: "adifferentlongstring"
    callbackUrl: "http://<your_jupyterhub_host>/hub/oauth_callback"
    hostedDomain: "youruniversity.edu"
    loginService: "Your University"

GitHub

auth:
  type: github
  github:
    clientId: "y0urg1thubc1ient1d"
    clientSecret: "an0ther1ongs3cretstr1ng"
    callbackUrl: "http://<your_jupyterhub_host>/hub/oauth_callback"

Full Example of Google OAuth2

If your institution is a G Suite customer that integrates with Google services such as Gmail, Calendar, and Drive, you can authenticate users to your JupyterHub using Google for authentication.

Note

Google requires that you specify a fully qualified domain name for your hub rather than an IP address.

  1. Log in to the Google API Console.

  2. Select a project > Create a project… and set ‘Project name’. This is a short term that is only displayed in the console. If you have already created a project you may skip this step.

  3. Type “Credentials” in the search field at the top and click the result to open the Credentials page.

  4. Click “Create credentials”, then “OAuth client ID”. Choose “Application type” > “Web application”.

  5. Enter a name for your JupyterHub instance. You can give it a descriptive name or set it to be the hub’s hostname.

  6. Set “Authorized JavaScript origins” to be your hub’s URL.

  7. Set “Authorized redirect URIs” to be your hub’s URL followed by “/hub/oauth_callback”. For example, http://example.com/hub/oauth_callback.

  8. When you click “Create”, the console will generate and display a Client ID and Client Secret. Save these values.

  9. Type “consent screen” in the search field at the top and click the result to open the OAuth consent screen. Here you will customize what your users see when they log in to your JupyterHub instance for the first time. Click Save when you are done.

  10. In your helm chart configuration (config.yaml), create a stanza that contains these OAuth fields:

auth:
  type: google
  google:
    clientId: "yourlongclientidstring.apps.googleusercontent.com"
    clientSecret: "adifferentlongstring"
    callbackUrl: "http://<your_jupyterhub_host>/hub/oauth_callback"
    hostedDomain: "youruniversity.edu"
    loginService: "Your University"

The ‘callbackUrl’ key is set to the authorized redirect URI you specified earlier. Set ‘hostedDomain’ to your institution’s domain name. The value of ‘loginService’ is a descriptive term for your institution that reminds your users which account they are using to log in.
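
Because Google rejects bare IP addresses and the callback URL must match the authorized redirect URI exactly, it can help to generate the stanza from a single hostname. The sketch below does that with the standard library only. google_auth_stanza and its parameters are hypothetical; the "contains a dot" test is a rough heuristic for a fully qualified domain name, and values are assumed not to contain double quotes.

```python
import ipaddress


def google_auth_stanza(host: str, client_id: str, client_secret: str,
                       hosted_domain: str, login_service: str,
                       scheme: str = "http") -> str:
    """Return the auth: block for config.yaml, refusing bare IP addresses."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # not an IP address, so treat it as a domain name
    else:
        raise ValueError("Google requires a fully qualified domain name, not an IP")
    if "." not in host:  # heuristic: e.g. "localhost" is not fully qualified
        raise ValueError("host does not look like a fully qualified domain name")
    # Must match the "Authorized redirect URI" entered in the Google console.
    callback = f"{scheme}://{host}/hub/oauth_callback"
    lines = [
        "auth:",
        "  type: google",
        "  google:",
        f'    clientId: "{client_id}"',
        f'    clientSecret: "{client_secret}"',
        f'    callbackUrl: "{callback}"',
        f'    hostedDomain: "{hosted_domain}"',
        f'    loginService: "{login_service}"',
    ]
    return "\n".join(lines) + "\n"
```

Deriving callbackUrl from the same hostname you type into the Google console keeps the two values from drifting apart, which is a common cause of redirect mismatch errors.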

Expanding and contracting the size of your cluster

You can easily scale your cluster up or down to meet usage demand, or to save cost when the cluster is not being used. Use the resize command and provide the new cluster size with the --size option:

gcloud container clusters resize \
             <YOUR-CLUSTER-NAME> \
             --size <NEW-SIZE> \
             --zone <YOUR-CLUSTER-ZONE>

To display the cluster’s name, zone, or current size, use the command gcloud container clusters list.
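
If you resize often (for example, scaling to zero nodes between events), a small wrapper can build the same gcloud command and run it. This sketch assumes the gcloud CLI is installed and authenticated; resize_command and resize_cluster are hypothetical names, and --quiet is gcloud's global flag for skipping the confirmation prompt.

```python
import subprocess


def resize_command(cluster: str, size: int, zone: str, quiet: bool = False) -> list:
    """Build the `gcloud container clusters resize` command shown above."""
    if size < 0:
        raise ValueError("cluster size cannot be negative")
    cmd = ["gcloud", "container", "clusters", "resize", cluster,
           "--size", str(size), "--zone", zone]
    if quiet:
        cmd.append("--quiet")  # gcloud global flag: skip the confirmation prompt
    return cmd


def resize_cluster(cluster: str, size: int, zone: str, quiet: bool = False) -> None:
    """Run the resize; raises CalledProcessError if gcloud fails."""
    subprocess.run(resize_command(cluster, size, zone, quiet), check=True)
```

Newer gcloud releases use --num-nodes in place of --size; check gcloud container clusters resize --help for the flag your version expects.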

Note

When organizing and running a workshop, resizing a cluster gives you a way to save cost and prepare JupyterHub before the event. For example:

  • One week before the workshop: You can create the cluster, set everything up, and then resize the cluster to zero nodes to save cost.

  • On the day of the workshop: You can scale the cluster up to a suitable size for the workshop. This workflow also helps you avoid scrambling on the workshop day to set up the cluster and JupyterHub.

  • After the workshop: The cluster can be deleted.