
helm-chart's Issues

help wanted!

This issue is just to make it explicitly clear that we are looking for people to help develop and maintain these helm charts, docker images, and related deployment instructions.

#5 (continuous deployment) remains a central goal. We would like to be able to have changes to the chart / dockerfiles automatically built and, upon merging into master, deployed in a test environment. We then want a way to automatically deploy this to pangeo.pydata.org.

We also have a long-running problem with helm upgrade: we generally have to delete and re-install the whole release just to get changes applied.

It would be fantastic if someone who understands kubernetes, docker, and jupyterhub could just take this on and help us sort it out. None of the pangeo core team really has the expertise to get us out of this rut.

error updating pangeo.pydata.org

I just added a new notebook docker image in #20.

Now I am trying to deploy it.

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "pangeo" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈ 
$ helm upgrade jupyter pangeo/pangeo -f jupyter-config.yaml -f secret-config.yaml --version 0.1.0-c1651d3
2018/05/01 11:25:54 warning: cannot overwrite table with non table for extraConfig (map[])
2018/05/01 11:25:54 warning: cannot overwrite table with non table for extraConfig (map[])
Error: UPGRADE FAILED: Deployment.apps "proxy" is invalid: spec.template.metadata.labels:
Invalid value: map[string]string{"heritage":"Tiller", "hub.jupyter.org/network-access-hub":"true", "hub.jupyter.org/network-access-singleuser":"true", "name":"proxy", "release":"jupyter", "component":"proxy"}: `selector` does not match template `labels`

No idea what to make of this error.

404 on https://pangeo-data.github.io/helm-chart/

Why does this show a 404?
https://pangeo-data.github.io/helm-chart/

It appears to work if I do

$ helm repo add pangeo https://pangeo-data.github.io/helm-chart/
"pangeo" has been added to your repositories
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "pangeo" chart repository
...Successfully got an update from the "stable" chart repository

But it is confusing to me that I get a 404 when I browse to the helm-chart link.

cc @tjcrone

Error: InvalidImageName

Trying to re-deploy the cluster with the new chart and docker image, I am getting this error

Failed to apply default image tag "None:None": couldn't parse image reference "None:None": invalid reference format: repository name must be lowercase

Deploy script fails

I know I am making a mess here. But I can't understand how it is possible that the original deployment script ever actually worked.

With my new modifications (#12), the deploy now fails.

This is the error it gives:

Deploying application
Cloning into 'pangeo-e853181'...
Warning: Permanently added the RSA host key for IP address '192.30.253.113' to the list of known hosts.
Switched to a new branch 'gh-pages'
Branch 'gh-pages' set up to track remote branch 'gh-pages' from 'origin'.
Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.6.3/bin/chartpress", line 11, in <module>
    load_entry_point('chartpress==0.2.0.dev0', 'console_scripts', 'chartpress')()
  File "/home/travis/virtualenv/python3.6.3/lib/python3.6/site-packages/chartpress.py", line 292, in main
    extra_message=args.extra_message,
  File "/home/travis/virtualenv/python3.6.3/lib/python3.6/site-packages/chartpress.py", line 223, in publish_pages
    '--destination', td + '/',
  File "/opt/python/3.6.3/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/opt/python/3.6.3/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/opt/python/3.6.3/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/opt/python/3.6.3/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'helm': 'helm'
Script failed with status 1

problems with compatibility with zero2jupyterhub-k8s 0.7 chart version

In #63 I updated our chart to point to the latest stable release of the jupyterhub helm chart (0.7). However, when I tried to re-deploy pangeo.pydata.org with this chart, it didn't work. Specifically, the method for mounting the custom templates using extraVolumes / extraVolumeMounts didn't work, and as a result, the landing page was broken.

In pangeo-data/pangeo#389, I pin the hub image tag to v0.6, which seems to fix the problem. In the long term, we need to fix the underlying problem.

chartpress didn't upload the new tag to dockerhub

The most recent chartpress build succeeded in building a notebook docker image:
https://travis-ci.org/pangeo-data/helm-chart/jobs/391946771#L2019
It tagged it 9d7f08b but it doesn't appear to have uploaded to dockerhub: https://hub.docker.com/r/pangeo/notebook/tags/.

I expected this, since we don't have any docker credentials stored here. But I also expected an error.

Someone needs to figure out what is going on here. Does the fact that this tag was not uploaded mean that the current helm chart is not deployable?
https://pangeo-data.github.io/helm-chart/pangeo-v0.1.1-9d7f08b.tgz

(Also, how can I inspect the contents of the helm chart at that link? Do I really have to download it and unzip it, or is there an easier tool?)
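On the inspection question: a packaged chart is just a gzipped tarball, so its file listing can be viewed without installing anything; for the chart above you could pipe `curl -sL <chart-url>` into `tar -tz`. A small local sketch of the same idea:

```shell
# A Helm chart package is an ordinary .tgz; tar can list it directly.
# (Demonstrated on a locally built archive so no network is needed.)
mkdir -p pangeo/templates
printf 'name: pangeo\nversion: 0.1.1\n' > pangeo/Chart.yaml
tar -czf pangeo-demo.tgz pangeo
tar -tzf pangeo-demo.tgz
```

The listing shows every file the chart would install, including Chart.yaml and the templates directory.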

Use the same image for notebook and worker

I want to suggest that we remove the worker image and use the notebook image for both purposes.

This is what we are doing on our Pangeo deployment. I actually spent yesterday rewriting our image to use the notebook image from this repo as a base, so it will be easier for me to push changes upstream. Both the notebooks and the workers use this image. Although the workers end up with a larger image, startup time isn't affected, and we reduce disk usage by maintaining only one image.
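A minimal sketch of what this could look like in the chart's values.yaml (the key names here are illustrative, not necessarily the chart's actual schema):

```yaml
# Illustrative only: point both the singleuser server and the dask
# workers at the same image, so only one image needs to be built,
# tagged, and pulled onto nodes.
jupyterhub:
  singleuser:
    image:
      name: pangeo/notebook
      tag: latest
worker:            # hypothetical key for the dask worker template
  image: pangeo/notebook:latest
```

Because nodes already cache the notebook image for user pods, worker pods scheduled on the same nodes would start without an extra pull.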

examples directory is not downloading correctly

In #44, @martindurant implemented a new system for downloading example notebooks, using the following code:

if [ -z "$EXAMPLES_GIT_URL" ]; then
    export EXAMPLES_GIT_URL=https://github.com/pangeo-data/pangeo-example-notebooks
fi
if [ ! -d "examples" ]; then
    git clone $EXAMPLES_GIT_URL examples
fi
cd examples
git remote set-url origin $EXAMPLES_GIT_URL
git fetch origin
git reset --hard origin/master
git merge --strategy-option=theirs origin/master
if [ ! -f DONT_SAVE_ANYTHING_HERE.md ]; then
    echo "Files in this directory should be treated as read-only" > DONT_SAVE_ANYTHING_HERE.md
fi

It does not appear to be working. Here is the kubernetes log of a new user logging in to pangeo.pydata.org for the first time

$ kubectl --namespace=pangeo logs jupyter-buaamuxu
+ echo 'Copy files from pre-load directory into home'
Copy files from pre-load directory into home
+ cp --update -r -v /pre-home/. /home/jovyan
'/pre-home/./examples' -> '/home/jovyan/./examples'
'/pre-home/./config.yaml' -> '/home/jovyan/./config.yaml'
'/pre-home/./worker-template.yaml' -> '/home/jovyan/./worker-template.yaml'
+ '[' -z '' ']'
+ export EXAMPLES_GIT_URL=https://github.com/pangeo-data/pangeo-example-notebooks
+ EXAMPLES_GIT_URL=https://github.com/pangeo-data/pangeo-example-notebooks
+ '[' '!' -d examples ']'
+ cd examples
+ git remote set-url origin https://github.com/pangeo-data/pangeo-example-notebooks
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ git fetch origin
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ git reset --hard origin/master
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ git merge --strategy-option=theirs origin/master
fatal: not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ '[' '!' -f DONT_SAVE_ANYTHING_HERE.md ']'
+ echo 'Files in this directory should be treated as read-only'

As a result, no users are getting any example notebooks loaded.

This is a pretty serious problem, and I'm surprised it hasn't been caught until now.

I think the fix might be to set EXAMPLES_GIT_URL to https://github.com/pangeo-data/pangeo-example-notebooks.git
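For what it's worth, the trace above also suggests a second problem beyond the URL: the clone is skipped because an `examples` directory already exists (copied in from /pre-home), yet that copy is not a git repository, so every subsequent git command fails. A hedged sketch of a more defensive version (`sync_examples` is an illustrative name, not part of the current script):

```shell
# Sketch: test for examples/.git, not just the directory, before
# deciding the clone can be skipped. If a non-repo copy of examples
# is present (e.g. copied from /pre-home), replace it with a clone.
sync_examples() {
    local url="${EXAMPLES_GIT_URL:-https://github.com/pangeo-data/pangeo-example-notebooks}"
    if [ ! -d examples/.git ]; then
        rm -rf examples
        git clone "$url" examples
    fi
    git -C examples fetch origin
    git -C examples reset --hard origin/master
}
```

With this shape, the first login replaces the plain directory with a real clone, and later logins fast-forward it.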

factor worker-template.yaml out of notebook docker image

Currently the dask worker template file lives inside the notebook docker image. This makes it complicated to change the worker.

Imagine I want to spin up a new cluster in which the only difference is that the workers use a different VM with more memory. (I actually do want to do this right now.) The change itself is a single line of the worker-template.yaml file. But to push that change to the cluster, I have to build, tag, and push a new notebook docker image, create a new helm chart pointing to that image, and then re-deploy the cluster.

Instead, I would like to specify the worker-template.yaml information at the helm level. That way, to change the worker resources, I could just edit the helm chart values.yaml or some similar config file held by helm.
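A hedged sketch of the direction this could take (key names are illustrative): put the worker settings in values.yaml and let a chart template render them into the worker-template.yaml that gets mounted into the notebook pod, e.g. via a ConfigMap.

```yaml
# values.yaml (illustrative keys, not the chart's current schema)
worker:
  image: pangeo/worker:latest
  resources:
    limits:
      cpu: "2"
      memory: 8G
    requests:
      cpu: "1"
      memory: 4G
```

A hypothetical templates/worker-configmap.yaml would then template these values into the mounted file, so changing worker memory becomes a `helm upgrade` with new values rather than a full image rebuild.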

(This might also help us with auto building / tagging worker images by chartpress.)

Unfortunately, I don't understand helm nearly well enough to figure out how to make this work. So I have to cc the people who know more: @tjcrone, @yuvipanda, @choldgraf, @mrocklin.

Travis is broken

Travis seems to be broken 😞.

In an announcement in May, Travis said it would migrate all open source repos from travis-ci.org to travis-ci.com. Historically, org was for open source and com was for paid users. It has been a slow process, and some repos are still being moved over.

It looks like this repo has stopped building on org but hasn't started building on com. This is something we should potentially raise with Travis support.

Automate version bumps

It would be really helpful if we could set up a bot to watch the two dependencies in this project (z2jh and dask-gateway) and open PRs whenever a new version is available. @jacobtomlinson has been creating some awesome github actions (example) that do this sort of thing and I'm wondering if those can be repurposed for this use case.

Charts to watch:

Fields to replace:

dependencies:
  - name: jupyterhub
    version: "0.9.0-beta.4.n008.hb20ad22"
    repository: 'https://jupyterhub.github.io/helm-chart/'
    import-values:
      - child: rbac
        parent: rbac
  - name: dask-gateway
    version: "0.6.1"
    repository: 'https://dask.org/dask-gateway-helm-repo/'

@jacobtomlinson - any pointers to get us started?
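A minimal sketch of the chart-watching half, assuming the bot fetches each repository's index.yaml and compares against the version pinned above (`latest_version` is an illustrative helper; shown on an inline sample rather than a live fetch):

```python
import re

# Trimmed example of a Helm repository index.yaml. A real bot would
# fetch e.g. https://jupyterhub.github.io/helm-chart/index.yaml.
SAMPLE_INDEX = """
entries:
  jupyterhub:
  - version: 0.8.0
  - version: 0.9.0
  - version: 0.8.2
"""

def latest_version(index_text):
    """Return the highest version listed, by naive numeric comparison.
    A real bot should use a proper semver library instead."""
    versions = re.findall(r"version:\s*([\w.\-]+)", index_text)
    return max(versions, key=lambda v: [int(n) for n in re.findall(r"\d+", v)])
```

If the result differs from the version pinned in requirements.yaml, the action would rewrite that field and open a PR.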

Consider continuous deployment for this Repo

This is awesome work, @jacobtomlinson! I'm also excited to see chartpress being used :)

What do you think of continuous deployment for this repo? It took the pressure off me and other ops-y people in both the mybinder.org and the Berkeley JupyterHub deployments. See http://mybinder-sre.readthedocs.io/en/latest/deployment/how.html for how deployments to mybinder.org work. https://github.com/jupyterhub/mybinder.org-deploy/graphs/contributors shows that people of various skillsets can deploy with confidence.

This will also allow us (The JupyterHub community) to document good practices around continuous deployment, security management in public repos, and growing a community around your deployment.

Possible sequence of steps to doing CD:

  • Move the docker image contents here
  • Use git-crypt to store deployment secrets. MyBinder.org uses it to good effect!
  • Set up automatic build and push on merge for docker image
  • Create a 'staging' environment that mimics the production environment
  • Create a Google Cloud Service Account that has just enough rights to do a helm upgrade
  • Create two branches - 'staging' and 'prod'. Merging to 'staging' will deploy to the staging environment. Deployer then verifies staging environment works as intended, and merges PR to 'prod' environment after.

Open questions

  • Who has merge (and hence deploy) rights?
  • Whose responsibility is it to fix stuff when things break?
  • How do we incorporate other deployments (on AWS? etc) to this CD pipeline?

pangeo.pydata.org did not scale down

A lot of people logged on to pangeo.pydata.org after my talk at JupyterCon. The cluster size went up to ~450. But it never scaled down:
[screenshot: cluster size over time]

I have brought it back down manually now. But I just wanted to document this so we can figure out how to avoid it in the future.

This cost a lot of credits!

[Proposal] Add dask-gateway to this chart

I think we're getting to the place where we can start thinking about how to include dask-gateway in this project. Just to quickly summarize some of the motivation for this proposal:

  • Our current dask-kubernetes RBAC gives all users on the hub permission to ["get", "list", "watch", "create", "delete"] any pods/services on the cluster. We've long known this is not a long-term solution for letting users control their dask resources.
  • We don't have clean ways to limit users' resource usage.

Dask-gateway gives us the ability to solve both of these problems (and a few more). So what is our path toward adoption of the dask-gateway chart? Is it too soon? Is it missing features we think we should wait for? Would this change require us to disable the RBAC solution or is there a use case for keeping both solutions here?

cc @jcrist, @jacobtomlinson, @TomAugspurger, @yuvipanda

Failed to start up a notebook server

Hi, I deployed Pangeo on Kubernetes (version 1.17.3) via helm and I can log in to the JupyterHub via my browser. However, when I click the button "Start My Server", it fails. It seems a pod named jupyter-xxx with the notebook image is created every time I start a server, but the jupyter-xxx pod never succeeds in running and stays in Pending status, like:
[screenshot: pod status]
After 300s, the spawn fails and the pod is deleted:
[screenshot: spawn failure]
I noticed some errors in the logs of the user-scheduler, something like 'Failed to list *v1beta1.StatefulSet: the server could not find the requested resource':
[screenshot: user-scheduler logs]
Is this the reason why the server fails to start up? Is the version of my Kubernetes too new?

Typo in Dask RBAC RoleBinding

The apiGroup is at the wrong level in the Dask RoleBinding, resulting in the following error from Helm:

Error: error validating "": error validating data: [ValidationError(RoleBinding): unknown field "apiGroup" in io.k8s.api.rbac.v1beta1.RoleBinding, ValidationError(RoleBinding.roleRef): missing required field "apiGroup" in io.k8s.api.rbac.v1beta1.RoleRef]

The apiGroup key just needs to be indented by two more spaces so that it lands inside roleRef...
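For reference, a sketch of the corrected shape (the metadata and subject names here are illustrative; the point is that apiGroup belongs inside roleRef):

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: dask            # illustrative
subjects:
  - kind: ServiceAccount
    name: daskkubernetes  # illustrative
roleRef:
  kind: Role
  name: daskkubernetes    # illustrative
  apiGroup: rbac.authorization.k8s.io   # required field of RoleRef
```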

Can we move hub/extraConfig/customPodHook to helm chart?

Is there a reason we don't have the following block of code in this helm chart?

hub:
  extraConfig:
    customPodHook: |
      from kubernetes import client
      def modify_pod_hook(spawner, pod):
          pod.spec.containers[0].security_context = client.V1SecurityContext(
              privileged=True,
              capabilities=client.V1Capabilities(
                  add=['SYS_ADMIN']
              )
          )
          return pod
      c.KubeSpawner.modify_pod_hook = modify_pod_hook

consistent versioning of helm chart

Right now we are at v0.1.1 and have been for a while. Yet we continue to update the helm chart using date- and/or hash-based suffixes...

What is the accepted best practice for helm chart versioning? How do we determine when to increment the version number?

upgrading to v0.8 of the JupyterHub Helm Chart

JupyterHub is about to release v0.8 of the JupyterHub Helm Chart. This release has a number of improvements, including optimizations for autoscaling (see #58). I think we should plan to make a proper release of this chart once their release is complete.

@jacobtomlinson - is this something you at the met office are interested in? Would you want to lead this effort?

Are there other things we should think about prior to upgrading the chart?

inconsistent notebook and worker images on pangeo.pydata.org

As noted by @mrocklin in pydata/xarray#2234, we currently have mismatches between package versions on the notebook and worker docker images. This causes lots of things to break.

Just opening an issue to remind us of this state and that we need to fix it. I guess we need an updated worker image, or a better solution to worker environments.

client.get_versions(check=True)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-4a95fbecd8f8> in <module>()
----> 1 client.get_versions(check=True)

/opt/conda/lib/python3.6/site-packages/distributed/client.py in get_versions(self, check)
   3064                 raise ValueError("Mismatched versions found\n"
   3065                                  "\n"
-> 3066                                  "%s" % ('\n\n'.join(errs)))
   3067 
   3068         return result

ValueError: Mismatched versions found

bokeh
+-------------------------+---------+
|                         | version |
+-------------------------+---------+
| client                  | 0.12.16 |
| tcp://10.20.209.8:40004 | 0.12.7  |
+-------------------------+---------+

cloudpickle
+-------------------------+---------+
|                         | version |
+-------------------------+---------+
| client                  | 0.5.3   |
| tcp://10.20.209.8:40004 | 0.4.0   |
+-------------------------+---------+

numpy
+-------------------------+---------+
|                         | version |
+-------------------------+---------+
| client                  | 1.14.5  |
| tcp://10.20.209.8:40004 | 1.14.3  |
+-------------------------+---------+

pandas
+-------------------------+---------+
|                         | version |
+-------------------------+---------+
| client                  | 0.23.1  |
| tcp://10.20.209.8:40004 | 0.20.3  |
+-------------------------+---------+

toolz
+-------------------------+---------+
|                         | version |
+-------------------------+---------+
| client                  | 0.9.0   |
| tcp://10.20.209.8:40004 | 0.8.2   |
+-------------------------+---------+

tornado
+-------------------------+---------+
|                         | version |
+-------------------------+---------+
| client                  | 5.0.2   |
| tcp://10.20.209.8:40004 | 4.5.2   |
+-------------------------+---------+

PV for Deploying Pangeo on the Cloud

Hello, I am following the "Deploying Pangeo on the Cloud" guide. I have a lab Kubernetes cluster on AWS (not AKE), so I started from "Step Three: Configure Kubernetes". All went well; I only edited the loadBalancerIP in jupyter_config.yaml and added GitHub OAuth to secret_config.yaml.

kubectl --namespace=pangeo get pod
NAME                     READY   STATUS    RESTARTS   AGE
hub-76dcc8c697-n7dp5     0/1     Pending   0          53m
proxy-7746576cb7-rzd8n   1/1     Running   0          53m

pod has unbound immediate PersistentVolumeClaims
no persistent volumes available for this claim and no storage class is set

Not sure what's required?
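Both events point at the cluster having no default StorageClass, so the hub's PersistentVolumeClaim can never bind. One hedged fix for a hand-built AWS cluster is to create a default class backed by the in-tree EBS provisioner (a sketch; adjust the volume type to taste):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
  annotations:
    # Marks this class as the cluster default, so PVCs that name no
    # storageClassName (like the hub's) can bind against it.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```

After applying this, deleting the Pending hub pod (or re-creating its PVC) should let the claim bind and the hub schedule.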

Thanks

Restricting user's resources

Currently any user on our JupyterHub+Dask deployments can launch as many pods as they like. This is troublesome because it opens us up to excessive costs and because other users can't easily get on. What is the right way to do this?

A few approaches have come up before:

  1. Place each user in a separate namespace. Some users, notably the UK Met Office, say this is unpleasant because Pangeo is only one of many services running on their Kubernetes cluster and they'd like to keep it all within a single namespace if possible. This probably reflects a larger concern about polluting namespaces.
  2. Make a separate service that manages everything. Users don't talk to Kubernetes, they talk to that thing, which talks to Kubernetes for them. This is doable, but perhaps larger in scope than we'd like to tackle near-term.
  3. @yuvipanda mentioned that there might be user-based resource quotas in some corner of the expansive Kubernetes world. If he has time it would be good to get links from him.

Also cc @jacobtomlinson @dsludwig
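For the first approach, Kubernetes already ships namespace-scoped ResourceQuota objects, so a per-user namespace could carry something like the following (the limits and namespace name are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: user-quota
  namespace: pangeo-some-user   # hypothetical per-user namespace
spec:
  hard:
    pods: "20"             # caps how many dask pods one user can run
    requests.cpu: "40"
    requests.memory: 160Gi
```

This caps cost per user but inherits all the namespace-sprawl downsides noted above.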

use conda-forge packages over pip in Dockerfiles?

I just tried to build the pangeo-worker container and ran into version conflicts with pyasn1. These conflicts went away when I moved urllib3 from the pip-installed packages to the conda-installed packages.

I thought the best practice was to use conda-forge packages when available, and to fall back to pip only when no conda package exists or when we need development versions from git.

If that is true should we move all conda-installable packages to the conda install list?

Fix tagged deployments

Currently I have the CI configured to only run chartpress --publish-chart on tagged commits.

However it appears tagged commits do not have the TRAVIS_COMMIT_RANGE env var set which causes the deploy to fail.

This needs to be fixed, but I would also be keen to have a conversation about how we want to manage releases and versioning. I'll raise a separate issue for that.

Upgrade dask-gateway to 0.6.1

Dask-Gateway version 0.6.1 has been released, which includes several new features Pangeo may be interested in:

  • Adaptive scaling (e.g. cluster.adapt())
  • Automatic shutdown of idle clusters
  • Several performance and resiliency bugfixes

This release also fixes a bug in the helm chart that prevents smooth upgrades between versions (see dask/dask-gateway#150 for more information). As such, upgrading the chart will require manually deleting the previous dask-gateway deployment and installing the new version. From 0.6.0 on this issue has been resolved and upgrades should be smoother.

refactor notebook Dockerfile to use environment.yaml

It would be nice to have a standalone default pangeo_environment.yaml. From the Dockerfile, we could then do

conda env update -n base -f pangeo_environment.yaml

Does anyone see any problem with this approach before I try it?
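A hedged sketch of what the environment file could contain (the package list is illustrative, not the image's actual contents); note that installing from an environment file goes through `conda env update`/`conda env create` rather than `conda install`:

```yaml
# pangeo_environment.yaml (illustrative contents)
name: base
channels:
  - conda-forge
dependencies:
  - python=3.6
  - dask
  - distributed
  - xarray
  - jupyterlab
```

In the Dockerfile this would be a single step, e.g. `RUN conda env update -n base -f pangeo_environment.yaml`, which also lets users reproduce the image environment locally from the same file.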

CI Testing of this chart

In the spirit of improving testing of Pangeo's cloud infrastructure, @alando46 has opened pangeo-data/pangeo#544 (thanks!). One of the topics discussed is adding a miniconda-based CI test suite to this repo. I'm not sure exactly how this would work (soliciting input from @yuvipanda and @jacobtomlinson), but I am aware of two examples that currently have a similar pattern:

How should we proceed and who can help with this effort?

Pangeo failing all Helm versions > v0.1.1-e5fa7c4

Hello, I installed Pangeo 0.1.1-86665a6 via the cloud deploy process successfully. I have been testing 4_upgrade_helm.sh, which works up to v0.1.1-e5fa7c4. Any version after this fails to deploy Pangeo. Early versions complete the upgrade successfully, but launching the server pod fails with a "can't find singleuser" script error. Later versions fail to deploy completely, as the hub pod cannot start with a similar error about an OS environment variable for singleuser.

Thanks

server/user pod error:
Error: failed to start container "notebook": Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"start-singleuser.sh\": executable file not found in $PATH": unknown Back-off restarting failed container

HUB Error:
[E 2019-05-16 07:08:55.884 JupyterHub app:1623]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/jupyterhub/app.py", line 1620, in launch_instance_async
    yield self.initialize(argv)
  File "/usr/lib/python3.6/types.py", line 204, in __next__
    return next(self.__wrapped)
  File "/usr/local/lib/python3.6/dist-packages/jupyterhub/app.py", line 1358, in initialize
    self.load_config_file(self.config_file)
  File "<decorator-gen-5>", line 2, in load_config_file
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 598, in load_config_file
    raise_config_file_errors=self.raise_config_file_errors,
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 562, in _load_config_files
    config = loader.load_config()
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/loader.py", line 457, in load_config
    self._read_file_as_dict()
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/loader.py", line 489, in _read_file_as_dict
    py3compat.execfile(conf_filename, namespace)
  File "/usr/local/lib/python3.6/dist-packages/ipython_genutils/py3compat.py", line 198, in execfile
    exec(compiler(f.read(), fname, 'exec'), glob, loc)
  File "/srv/jupyterhub_config.py", line 46, in <module>
    c.KubeSpawner.singleuser_image_spec = os.environ['SINGLEUSER_IMAGE']
  File "/usr/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'SINGLEUSER_IMAGE'

Archive and stop updating pangeo helm chart?

In the interest of consolidating pangeo infrastructure and configuration, should we archive this repo and update pangeo-cloud-federation to no longer use the pangeo helm chart? This question has come up before on calls. I haven't looked through all existing issues, but if I understand correctly, once we fully embrace dask-gateway and drop dask-kubernetes we can rely entirely on upstream helm charts.

Pangeo-binder does not require the pangeo helm chart (https://github.com/pangeo-data/pangeo-binder/tree/staging/pangeo-binder). I've always found it confusing that the default config for the persistent hubs is in two places (https://github.com/pangeo-data/helm-chart/blob/master/pangeo/values.yaml and https://github.com/pangeo-data/pangeo-cloud-federation/blob/staging/pangeo-deploy/values.yaml).

Pros:

  • one less repository to monitor and keep in sync with upstream jupyterhub changes

Cons:

  • some effort to modify config values in pangeo-cloud-federation

@jhamman, @TomAugspurger, @rabernat, @tjcrone, @consideRatio, @yuvipanda, please chime in in case I'm overlooking something related to the helm chart history or configuration needs.

jupyterlab extension doesn't load in latest docker image

In our latest docker image, jupyterlab doesn't work. We discovered this after merging #47 and deploying the latest helm chart.

The key difference is that these lines are present in the working image (85dc5c9) but missing from the more recent image (2bd2369):

[I 2018-07-18 11:26:08.763 SingleUserNotebookApp extension:53] JupyterLab beta preview extension loaded from /opt/conda/lib/python3.6/site-packages/jupyterlab
[I 2018-07-18 11:26:08.763 SingleUserNotebookApp extension:54] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 2018-07-18 11:26:10.660 SingleUserNotebookApp handlers:73] [nb_anacondacloud] enabled
[I 2018-07-18 11:26:10.664 SingleUserNotebookApp handlers:292] [nb_conda] enabled
[I 2018-07-18 11:26:10.708 SingleUserNotebookApp __init__:35] ✓ nbpresent HTML export ENABLED
[W 2018-07-18 11:26:10.709 SingleUserNotebookApp __init__:43] ✗ nbpresent PDF export DISABLED: No module named 'nbbrowserpdf'

I am having serious deja-vu. We dealt with a very similar issue in pangeo-data/pangeo#261 (comment). There are a couple of different issues mixed together there, but ultimately we fixed the issue by pinning jupyterlab_launcher=0.10.5.

What is frustrating here is that we didn't change anything jupyter-related in the latest docker image, yet it has broken again in the same way.

cc @yuvipanda, @mrocklin, @jhamman

Add real time monitoring

After some discussion in pangeo-data/pangeo#184 and on our monthly catch-up call yesterday, I think it would be useful to add some default real-time monitoring tools to this helm chart.

I see that there are likely to be two scenarios for people using this chart:

  1. People who want a turn-key data analysis platform in the cloud and create a kubernetes cluster on a cloud compute platform purely for running Pangeo.
  2. People who are already running a kubernetes cluster and want to install Pangeo on it.

My feeling is that the majority of Pangeo users will be in the first camp for now. Therefore it would make sense to include real-time monitoring tools such as Prometheus and Grafana, along with some Pangeo-specific default dashboard views.

To accommodate the second camp of users (which includes myself), this should be an optional config option, since we already have monitoring on our cluster. But as the majority will likely want it, the default should be that it is enabled.

This would enable people to see useful information about the cluster like so:

[screenshot: Grafana dashboard of cluster metrics]

Automate setup of dask-gateway jupyterhub service

We currently have a line in our chart that sets the URL of the dask-gateway service on the hub:

hub:
  services:
    dask-gateway:
      # This makes the gateway available at ${HUB_URL}/services/dask-gateway
      url: http://web-public-dev-staging-dask-gateway

Unfortunately, this only works for our dev-staging deployment. How can we automatically determine this URL? Any Helm/k8s magic that can make this possible?
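If this block lives in values.yaml, one caveat is that values files are not run through Helm's template engine, so the release-specific part has to be injected either from a chart template or at deploy time. A sketch of the template-side variant, assuming the gateway Service follows a web-public-&lt;release&gt;-dask-gateway naming pattern (an assumption extrapolated from the dev-staging value above):

```yaml
# Sketch only: inside a chart template (not values.yaml), the service
# URL can be derived from the release name rather than hard-coded.
hub:
  services:
    dask-gateway:
      url: http://web-public-{{ .Release.Name }}-dask-gateway
```

The deploy-time alternative would be passing the same string via `--set hub.services.dask-gateway.url=...` from the deployment script, which already knows the release name.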

cc @jcrist
