.. _extending-jupyterhub:

Extending your JupyterHub setup
===============================

The helm chart used to install JupyterHub has a lot of options for you to tweak.
This page lists some of the most common changes.

Applying configuration changes
------------------------------

The general method is:

1. Make a change to the ``config.yaml``
2. Run a helm upgrade:

   .. code-block:: bash

      helm upgrade <YOUR_RELEASE_NAME> https://github.com/jupyterhub/helm-chart/releases/download/v0.3/jupyterhub-v0.3.tgz -f config.yaml

   Where ``<YOUR_RELEASE_NAME>`` is the parameter you passed to ``--name`` when
   `installing jupyterhub `_ with ``helm install``. If you don't remember it,
   you can usually find it by running ``helm list``.
3. Wait for the upgrade to finish, and make sure that when you run
   ``kubectl --namespace=<YOUR_NAMESPACE> get pod`` the hub and proxy pods are
   in the ``Ready`` state. Your configuration change has been applied!

Using an existing image
-----------------------

It's possible to build your JupyterHub deployment off of a pre-existing Docker
image. To do this, you need to find an existing image somewhere (such as
Docker Hub), and configure your installation to use it.

For example, UC Berkeley's `Data8 Program `_ publishes the image they are using
on Docker Hub. To instruct JupyterHub to use this image, add this to your
``config.yaml`` file:

.. code-block:: yaml

   singleuser:
     image:
       name: berkeleydsep/singleuser-data8
       tag: v0.1

You can then `apply the change <#applying-configuration-changes>`_ to the
config as usual.

Setting memory and CPU guarantees / limits for your users
---------------------------------------------------------

Each user on your JupyterHub gets a slice of memory and CPU to use. There are
two ways to specify how much users get to use: resource *guarantees* and
resource *limits*.

A resource *guarantee* means that all users will have *at least* this resource
available at all times, but they may be given more resources if they're
available.
For example, if users are *guaranteed* 1G of RAM, users can technically
use more than 1G of RAM if these resources aren't being used by other users.

A resource *limit* sets a hard limit on the resources available. In the example
above, if there were a 1G memory limit, it would mean that users could use no
more than 1G of RAM, no matter what other resources are being used on the
machines.

By default, each user is *guaranteed* 1G of RAM. All users have *at least* 1G,
but they can technically use more if it is available. You can easily change the
amount of these resources, and whether they are a *guarantee* or a *limit*, by
changing your ``config.yaml`` file. This is done with the following structure.

.. code-block:: yaml

   singleuser:
     memory:
       limit: 1G
       guarantee: 1G

This sets a memory limit and guarantee of 1G. Kubernetes will make sure that
each user will always have access to 1G of RAM, and requests for more RAM will
fail (your kernel will usually die). You can set the limit to be higher than
the guarantee to allow some users to use larger amounts of RAM for a short time
(e.g. when running a single, short-lived function that consumes a lot of
memory).

.. note::

   Remember to `apply the changes <#applying-configuration-changes>`_ after
   changing your ``config.yaml`` file!

Extending your software stack with s2i
--------------------------------------

s2i, also known as `Source to Image `_, lets you quickly convert a GitHub
repository into a Docker image that we can use as a base for your JupyterHub
instance. Anything inside the GitHub repository will exist in a user’s
environment when they join your JupyterHub. If you include a
``requirements.txt`` file in the root level of the repository, s2i will
``pip install`` each of these packages into the Docker image to be built. Below
we’ll cover how to use s2i to generate a Docker image and how to configure
JupyterHub to build off of this image.

.. note::

   For this section, you’ll need to install s2i and docker.

1. **Download s2i.** This is easily done with homebrew on macOS. For Linux and
   Windows it entails a couple of quick commands that you can find in the
   links below:

   - On macOS: ``brew install s2i``
   - On Linux and Windows: `follow these instructions `_

2. **Download and start Docker.** You can do this by downloading and installing
   Docker at `this link `_. Once you’ve started Docker, it will show up as a
   tiny background application.

3. **Create (or find) a GitHub repository you want to use.** This repo should
   have all materials that you want your users to access. In addition you can
   include a ``requirements.txt`` file that has one package per line. These
   packages should be listed in the same way that you’d install them using
   ``pip install``. You should also specify the versions explicitly so the
   image is fully reproducible. E.g.:

   .. code-block:: text

      numpy==1.12.1
      scipy==0.19.0
      matplotlib==2.0

4. **Use s2i to build your Docker image.** ``s2i`` uses a template in order to
   know how to create the Docker image. We have provided one at the url in the
   commands below. Run this command::

      s2i build --exclude "" <git-repo-url> jupyterhub/singleuser-builder-venv-3.5:v0.1.5 gcr.io/<project-name>/<image-name>:<tag>

   This effectively says: s2i, build ``<git-repo-url>`` to a Docker image by
   using the template ``jupyterhub/singleuser-builder-venv-3.5:v0.1.5``, and
   call the image ``gcr.io/<project-name>/<image-name>:<tag>``. The
   ``--exclude ""`` ensures that all files are included in the container (e.g.
   the ``.git`` directory).

   .. note::

      - The project name should match your google cloud project's name.
      - Don’t use underscores in your image name. Other than this it can be
        anything memorable. This is a bug that will be fixed soon.
      - The tag should be the first 6 characters of the SHA in the GitHub
        commit for the image to build from.

5. **Push our newly-built Docker image to the cloud.** You can either push this
   to Docker Hub, or to the gcloud docker repository. Here we’ll push to the
   gcloud repository::

      gcloud docker -- push gcr.io/<project-name>/<image-name>:<tag>

6. **Edit the JupyterHub configuration to build from this image.** We do this
   by editing the ``config.yaml`` file that we originally created to include
   the jupyter hashes. Edit ``config.yaml`` by including these lines in it:

   .. code-block:: yaml

      singleuser:
        image:
          name: gcr.io/<project-name>/<image-name>
          tag: <tag>

7. **Tell helm to update JupyterHub to use this configuration.** Use the
   normal method to `apply the change <#applying-configuration-changes>`_ to
   the config.

8. **Restart your notebook if you are already logged in.** If you already have
   a running JupyterHub session, you’ll need to restart it (by stopping and
   starting your session from the control panel in the top right). New users
   won’t have to do this.

9. **Enjoy your new computing environment!** You should now have a live
   computing environment built off of the Docker image we’ve created.

.. note::

   The contents of your GitHub repository might not show up if you have enabled
   persistent storage. Disable persistent storage if you want them to show up!

Pre-populating ``$HOME`` directory with notebooks when using Persistent Volumes
--------------------------------------------------------------------------------

By default, Persistent Volumes are used, so if you would like to include the
contents of the GitHub repository in the ``$HOME`` directory (e.g. all of the
``*.ipynb`` files), then edit ``config.yaml`` to include these lines in it:

.. code-block:: yaml

   singleuser:
     lifecycleHooks:
       postStart:
         exec:
           command: ["/bin/sh", "-c", "test -f $HOME/.copied || cp -Rf /srv/app/src/. $HOME/; touch $HOME/.copied"]

Note that this will only copy the contents of the directory to ``$HOME``
*once*, the first time the user logs in. Further updates will not be
reflected. There is work in progress for making this better.

Authenticating with OAuth2
--------------------------

JupyterHub's `oauthenticator `_ has support for enabling your users to
authenticate via a third-party OAuth provider, including GitHub, Google, and
CILogon.
Follow the service-specific instructions linked on the
`oauthenticator repository `_ to generate your JupyterHub instance's OAuth2
client ID and client secret. Then declare the values in the helm chart
(``config.yaml``).

Here are example configurations for two common authentication services. Note
that in each case, you need to get the authentication credential information
before you can configure the helm chart for authentication.

**Google**

For more information see the full example of Google OAuth2 in the next section.

.. code-block:: yaml

   auth:
     type: google
     google:
       clientId: "yourlongclientidstring.apps.googleusercontent.com"
       clientSecret: "adifferentlongstring"
       callbackUrl: "http://<your_jupyterhub_host>/hub/oauth_callback"
       hostedDomain: "youruniversity.edu"
       loginService: "Your University"

**GitHub**

.. code-block:: yaml

   auth:
     type: github
     github:
       clientId: "y0urg1thubc1ient1d"
       clientSecret: "an0ther1ongs3cretstr1ng"
       callbackUrl: "http://<your_jupyterhub_host>/hub/oauth_callback"

Full Example of Google OAuth2
-----------------------------

If your institution is a `G Suite customer `_ that integrates with Google
services such as Gmail, Calendar, and Drive, you can authenticate users to your
JupyterHub using Google for authentication.

.. note::

   Google requires that you specify a fully qualified domain name for your hub
   rather than an IP address.

1. Log in to the `Google API Console `_.
2. Select a project > Create a project... and set 'Project name'. This is a
   short term that is only displayed in the console. If you have already
   created a project you may skip this step.
3. Type "Credentials" in the search field at the top and click to access the
   Credentials API.
4. Click "Create credentials", then "OAuth client ID". Choose
   "Application type" > "Web application".
5. Enter a name for your JupyterHub instance. You can give it a descriptive
   name or set it to be the hub's hostname.
6. Set "Authorized JavaScript origins" to be your hub's URL.
7. Set "Authorized redirect URIs" to be your hub's URL followed by
   "/hub/oauth_callback". For example,
   ``http://example.com/hub/oauth_callback``.
8. When you click "Create", the console will generate and display a Client ID
   and Client Secret. Save these values.
9. Type "consent screen" in the search field at the top and click to access the
   OAuth consent screen. Here you will customize what your users see when they
   log in to your JupyterHub instance for the first time. Click Save when you
   are done.
10. In your helm chart, create a stanza that contains these OAuth fields:

    .. code-block:: yaml

       auth:
         type: google
         google:
           clientId: "yourlongclientidstring.apps.googleusercontent.com"
           clientSecret: "adifferentlongstring"
           callbackUrl: "http://<your_jupyterhub_host>/hub/oauth_callback"
           hostedDomain: "youruniversity.edu"
           loginService: "Your University"

The ``callbackUrl`` key is set to the authorized redirect URI you specified
earlier. Set ``hostedDomain`` to your institution's domain name. The value of
``loginService`` is a descriptive term for your institution that reminds your
users which account they are using to log in.

Expanding and contracting the size of your cluster
--------------------------------------------------

You can easily scale up or down your cluster's size to meet usage demand or to
save cost when the cluster is not being used. Use the ``resize`` command and
provide a new cluster size as a command line option ``--size``:

.. code-block:: bash

   gcloud container clusters resize \
       <YOUR-CLUSTER-NAME> \
       --size <NEW-SIZE> \
       --zone <YOUR-CLUSTER-ZONE>

To display the cluster's name, zone, or current size, use the command
``gcloud container clusters list``.

.. note::

   When organizing and running a workshop, resizing a cluster gives you a way
   to save cost and prepare JupyterHub before the event. For example:

   - **One week before the workshop:** You can create the cluster, set
     everything up, and then resize the cluster to zero nodes to save cost.
   - **On the day of the workshop:** You can scale the cluster up to a suitable
     size for the workshop. This workflow also helps you avoid scrambling on
     the workshop day to set up the cluster and JupyterHub.
   - **After the workshop:** The cluster can be deleted.
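
The workshop workflow described above could be scripted roughly as follows.
This is a minimal sketch rather than a tested recipe: the cluster name
``my-cluster``, the zone ``us-central1-b``, the node count ``5``, and the
``resize_cluster`` helper with its ``DRY_RUN`` switch are all hypothetical and
introduced only for illustration. Only the ``gcloud container clusters resize``
command with ``--size`` and ``--zone`` is taken from this page.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical values: replace them with the name and zone reported by
   # `gcloud container clusters list`.
   CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"
   ZONE="${ZONE:-us-central1-b}"
   # With DRY_RUN=1 (the default in this sketch) commands are printed, not run.
   DRY_RUN="${DRY_RUN:-1}"

   # resize_cluster is a hypothetical helper, not part of gcloud.
   # It takes the desired number of nodes as its only argument.
   resize_cluster() {
     local size="$1"
     local cmd=(gcloud container clusters resize "$CLUSTER_NAME"
                --size "$size" --zone "$ZONE")
     if [ "$DRY_RUN" = "1" ]; then
       echo "${cmd[*]}"
     else
       "${cmd[@]}"
     fi
   }

   # One week before the workshop: park the cluster at zero nodes.
   resize_cluster 0

   # On the day of the workshop: scale up (5 is an arbitrary example size).
   resize_cluster 5

Setting ``DRY_RUN=0`` in the environment makes the sketch run the commands
instead of printing them, so it is worth reviewing the printed commands first.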