Demo: Notebook as a Service

Integration of JupyterHub (JHub) & Jupyter Enterprise Gateway (JEG)

LAB Objective:

Demonstrate the steps to realize Jupyter Notebook as a service using Jupyter Enterprise Gateway (JEG) and JupyterHub (JHub), separating the Jupyter server (the backend of JupyterLab) from the (computation) kernels.

LAB Overview:

  • Deploy JEG & JHub on a single-node MicroK8s cluster.
  • Simulate users accessing JupyterHub via their browsers. JupyterHub launches a server (JupyterLab backend) pod for each individual user. When a user connects to a kernel, the server acts as a proxy and spawns a kernel pod in a namespace separate from the server pod's. All of these pods are managed by the Kubernetes cluster. When a user shuts down a kernel, the kernel pod is destroyed; when a user shuts down the server, the server pod is destroyed. However, a PVC (bound to a PV backed by an NFS share storing all users' home directory data) persists after the server pod is deleted, so a user's home directory data outlives the server pod. The next time the user logs in and starts another server, the newly created server pod mounts the existing PVC and the user can continue working with his/her data.
  • In this demo, we showcase how Jupyter Enterprise Gateway works with the Python kernel and the Spark Python kernel (Spark on Kubernetes).

Diagram

LAB Steps:

  1. Set up an NFS server. Assume an NFS server (IP: 172.17.0.1) exports a share at the path /home/nfs_share (a minimal export sketch follows).
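
    A minimal export setup on an Ubuntu host might look like the following (a sketch, assuming the nfs-kernel-server package; tighten the export options for anything beyond a lab):

    sudo apt-get install -y nfs-kernel-server
    sudo mkdir -p /home/nfs_share
    # Export the share; the wide-open export options are an assumption for a lab setup.
    echo "/home/nfs_share *(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
    sudo exportfs -ra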

  2. Set up a single-node MicroK8s cluster on an Ubuntu machine (a setup sketch follows).
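
    A sketch of the cluster setup (the snap install and add-on commands are standard MicroK8s usage; providing the nfs-client storage class via the NFS subdir external provisioner is an assumption that matches the storage class referenced in step 4):

    sudo snap install microk8s --classic                  # single-node Kubernetes
    sudo microk8s enable dns                              # cluster DNS
    sudo snap alias microk8s.kubectl kubectl              # optional: use plain 'kubectl'
    # Assumption: the nfs-client storage class used in step 4 comes from the
    # NFS subdir external provisioner, pointed at the share from step 1.
    helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
    helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
      --set nfs.server=172.17.0.1 --set nfs.path=/home/nfs_share \
      --set storageClass.name=nfs-client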

  3. Create namespaces

    kubectl create namespace enterprise-gateway
    kubectl create namespace jupyterhub
    
  4. Create PV & PVCs

    Use the YAML file jhub_pvc.yaml to create:

    • PV nfs-pv : a mount of the NFS share at 172.17.0.1:/home/nfs_share/claim
    • PVC jhub-claim in namespace jupyterhub : bound to PV nfs-pv

    Use the YAML file kernelspecs_pvc.yaml to create PVC kernelspecs-pvc in namespace enterprise-gateway : a 20 MB NFS volume allocated from the nfs-client storage class to store kernelspecs. A sketch of both files follows.
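
    A sketch of what the two files might contain, based on the description above (capacities, access modes, and reclaim policy are assumptions; adjust them to your environment):

    cat > jhub_pvc.yaml <<'EOF'
    # PV backed by the NFS share; server pods mount per-user subdirectories of it
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-pv
    spec:
      capacity:
        storage: 10Gi                 # assumed size
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      nfs:
        server: 172.17.0.1
        path: /home/nfs_share/claim
    ---
    # PVC in the jupyterhub namespace, statically bound to nfs-pv
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: jhub-claim
      namespace: jupyterhub
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: ""            # disable dynamic provisioning; bind to nfs-pv
      volumeName: nfs-pv
      resources:
        requests:
          storage: 10Gi
    EOF

    cat > kernelspecs_pvc.yaml <<'EOF'
    # PVC for kernelspecs, dynamically provisioned from the nfs-client storage class
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: kernelspecs-pvc
      namespace: enterprise-gateway
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: nfs-client
      resources:
        requests:
          storage: 20Mi
    EOF

    kubectl apply -f jhub_pvc.yaml -f kernelspecs_pvc.yaml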

  5. Deploy JEG to namespace enterprise-gateway

    git clone https://github.com/jupyter-server/enterprise_gateway
    mkdir eg
    helm template --output-dir ./eg enterprise-gateway enterprise_gateway/etc/kubernetes/helm/enterprise-gateway -n enterprise-gateway -f jeg_customized_values.yaml
    kubectl apply -f ./eg/enterprise-gateway/templates/
    

    We use jeg_customized_values.yaml to customize JEG chart values.
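
    A quick sanity check after the apply (assumes kubectl points at the MicroK8s cluster); the enterprise-gateway service should expose port 8888:

    kubectl get pods,svc -n enterprise-gateway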

    Copy the kernelspecs, kernel-launcher scripts, and j2 templates to the NFS share that corresponds to PVC kernelspecs-pvc (a copy sketch follows the directory tree). After the copy, the NFS share file/directory structure looks like:

    .
    ├── python_kubernetes
    │   ├── kernel.json
    │   └── scripts
    │       ├── kernel-pod.yaml.j2
    │       └── launch_kubernetes.py
    └── spark_python_kubernetes
        ├── bin
        │   └── run.sh
        ├── kernel.json
        └── scripts
            ├── kernel-pod.yaml.j2
            └── launch_kubernetes.py
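
    A sketch of the copy step, assuming the kernelspecs-pvc share is mounted locally at /mnt/kernelspecs and that the cloned repo keeps its kernelspecs under etc/kernelspecs/ and the Kubernetes launcher under etc/kernel-launchers/kubernetes/scripts/ (both paths are assumptions and can differ between Enterprise Gateway releases):

    SRC=./enterprise_gateway/etc                 # cloned repo from the previous commands
    DEST=/mnt/kernelspecs                        # local mount of kernelspecs-pvc (assumed)
    for k in python_kubernetes spark_python_kubernetes; do
      mkdir -p "$DEST/$k/scripts"
      cp -r "$SRC/kernelspecs/$k/." "$DEST/$k/"                  # kernel.json and bundled resources
      cp "$SRC/kernel-launchers/kubernetes/scripts/launch_kubernetes.py" \
         "$SRC/kernel-launchers/kubernetes/scripts/kernel-pod.yaml.j2" \
         "$DEST/$k/scripts/"
    done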
    		
    
  6. Helm deploy JHub to namespace jupyterhub

    helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
    helm repo update
    helm install jhub jupyterhub/jupyterhub -f jhub_customized_values.yaml --version=2.0.0 -n jupyterhub 
    

    We use jhub_customized_values.yaml to customize the JHub chart values:

    • Set singleuser.storage.type to static to use static storage allocation.
    • Set singleuser.storage.static.pvcName to jhub-claim to allocate static storage for the Jupyter server (e.g. each user's home dir is mapped to a subdirectory named after the username).
    • Set singleuser.extraEnv.JUPYTER_GATEWAY_URL to point to JEG's endpoint (http://enterprise-gateway.enterprise-gateway:8888).
    • Set singleuser.cmd so that a shell first expands the 'KERNEL_PATH=' expression and then passes KERNEL_PATH as an environment variable to 'jupyterhub-singleuser' via the 'env' command. JUPYTERHUB_USER contains the username, and KERNEL_PATH stores the NFS share path to be mapped to the user's home dir in the kernel pod. A sketch of these overrides follows the list.
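
    A sketch of the corresponding overrides (the subPath template and the /home/nfs_share/claim prefix used for KERNEL_PATH are assumptions chosen to match the PV layout described earlier):

    cat > jhub_customized_values.yaml <<'EOF'
    singleuser:
      storage:
        type: static
        static:
          pvcName: jhub-claim
          subPath: "{username}"        # each user's home dir is a per-user subdirectory
      extraEnv:
        JUPYTER_GATEWAY_URL: "http://enterprise-gateway.enterprise-gateway:8888"
      cmd:
        - /bin/sh
        - -c
        # Let the shell expand JUPYTERHUB_USER first, then hand KERNEL_PATH to
        # jupyterhub-singleuser via 'env'. The /home/nfs_share/claim prefix is an
        # assumption and must match the PV's NFS path.
        - env KERNEL_PATH=/home/nfs_share/claim/$JUPYTERHUB_USER jupyterhub-singleuser
    EOF
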
  7. Build a custom Spark Python kernel image that supports S3A access to object storage

    The default Spark Python kernel images available in the repo are built from Spark with Hadoop 2.7, and S3A access does not work with them. We therefore build a custom container image, using a Dockerfile from the repo as a template, that installs Spark 3.2.3 with Hadoop 3.2. We add hadoop-aws-3.2.3.jar and aws-java-sdk-bundle-1.11.901.jar to this image; these two jars are required for S3A access and are compatible with Spark 3.2.3 on Hadoop 3.2.

    cd build
    docker build -t yangxh/kernel-spark-py:latest .
    

    We use the Dockerfile to build the custom kernel image.
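
    The image also needs to be reachable by the cluster's container runtime. Two common options are sketched here (the registry path is just the tag built above; the MicroK8s import is an alternative for a purely local setup):

    # Option 1: push to a registry the cluster can pull from
    docker push docker.io/yangxh/kernel-spark-py:latest

    # Option 2 (local-only): import the image into MicroK8s' containerd
    docker save docker.io/yangxh/kernel-spark-py:latest -o kernel-spark-py.tar
    microk8s ctr image import kernel-spark-py.tar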

  8. Customize the kernel.json files in the kernelspecs NFS share

    Customize the python_kubernetes/kernel.json and spark_python_kubernetes/kernel.json files in the kernelspecs NFS share to include the environment variables KERNEL_VOLUME_MOUNTS and KERNEL_VOLUMES. These variables are read by the python_kubernetes/scripts/launch_kubernetes.py script to render the kernel pod YAML file, which includes a mount of the NFS share at the path specified by the environment variable KERNEL_PATH. (When JHub launches a server for a user that connects to JEG, KERNEL_PATH is one of the environment variables passed to JEG. Because KERNEL_VOLUMES in python_kubernetes/kernel.json references the variable KERNEL_PATH, the kernel pod YAML file prepared by JEG will include a mount entry for the NFS share at the path specified by KERNEL_PATH.) As a result, a user's Jupyter server pod and kernel pod share a common NFS share mapped to their home directories ('/home/jovyan').

    In addition to the above customization, we add configuration for the S3 object storage endpoint URL, the authentication provider, and credentials to spark_python_kubernetes/kernel.json so that S3 object storage can be accessed from the Spark Python kernel. For illustration, here we specify AWS_SECRET_ACCESS_KEY=foo and AWS_ACCESS_KEY_ID=bar as environment variables and append --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider --conf spark.hadoop.fs.s3a.endpoint=object.storage.com to the SPARK_OPTS variable. We also configure the Spark driver and executor containers to use the custom container image docker.io/yangxh/kernel-spark-py:latest built in the previous step (see the illustrative snippet below).
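
    For illustration, the additions amount to the following settings (shown in shell form; the values are the placeholders from the text, and spark.kubernetes.container.image is the standard Spark-on-Kubernetes property assumed here for pointing the driver and executor containers at the custom image):

    # Illustrative shell form of the extra settings placed in spark_python_kubernetes/kernel.json
    export AWS_ACCESS_KEY_ID=bar
    export AWS_SECRET_ACCESS_KEY=foo
    SPARK_OPTS="${SPARK_OPTS} \
      --conf spark.kubernetes.container.image=docker.io/yangxh/kernel-spark-py:latest \
      --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider \
      --conf spark.hadoop.fs.s3a.endpoint=object.storage.com"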

Screenshots:

  • PVs

  • PVCs

  • Services

  • Before any user logs in (a total of 16 pods in the cluster), and after Alice exits her login session

  • After user Alice logs in and connects to a Python kernel (two more pods: one for the server and one for the kernel)

  • After user Alice logs in and connects to a Spark Python kernel (one server pod plus three kernel-related pods: one Spark driver and two executors)

  • After user Alice shuts down a kernel but stays logged in (one pod fewer: the kernel pod is gone and the server pod stays)

  • Helm releases
