pangeo-data / pangeo-docker-images

Docker Images For Pangeo Jupyter Environment

Home Page: https://pangeo-docker-images.readthedocs.io

License: MIT License

Dockerfile 49.53% Shell 7.38% Python 33.86% Makefile 9.22%
docker jupyter python pangeo

pangeo-docker-images's Introduction

Pangeo Docker Images

The images defined in this repository capture reproducible computing environments used by Pangeo Cloud. They build on top of the Ubuntu operating system and include conda environments with a curated set of Python packages for geospatial analysis. While initially intended for Pangeo Cloud, they can be used outside of Pangeo infrastructure too!

More details can be found in our documentation.

Images are hosted on DockerHub and on Quay.io

Image              Description
base-image         Foundational Dockerfile for builds
base-notebook      Minimally functional image for Pangeo hubs
pangeo-notebook    base-notebook + core earth science analysis packages
pytorch-notebook   pangeo-notebook + GPU-enabled pytorch
ml-notebook        pangeo-notebook + GPU-enabled tensorflow2

Click on the image name in the table above for a current list of installed packages and versions

graph TD;
    base-image-->base-notebook;
    base-notebook-->pangeo-notebook;
    pangeo-notebook-->pytorch-notebook;
    pangeo-notebook-->ml-notebook;
    click base-image "https://hub.docker.com/r/pangeo/base-image" "Open this in a new tab" _blank
    click base-notebook "https://hub.docker.com/r/pangeo/base-notebook" "Open this in a new tab" _blank
    click pangeo-notebook "https://hub.docker.com/r/pangeo/pangeo-notebook" "Open this in a new tab" _blank
    click pytorch-notebook "https://hub.docker.com/r/pangeo/pytorch-notebook" "Open this in a new tab" _blank
    click ml-notebook "https://hub.docker.com/r/pangeo/ml-notebook" "Open this in a new tab" _blank

Using the image with Singularity on HPC systems

If you want to use this image on an HPC system (including a GPU system), we recommend using Singularity. Please see the Singularity guide.

Dask-gateway compatibility

The primary use of these Docker images is running on Pangeo Cloud deployments with dask-gateway. Generally, the dask-gateway library version built into the image must match the dask-gateway version deployed in the cloud environment. The following table keeps track of the first time a new dask-gateway version appears in a tagged image:

dask-gateway Image tag
0.9 2020.11.06
0.8 2020.07.28
0.7 2020.04.22
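
As a rough illustration of how the table can be read, the sketch below looks up which dask-gateway version a given image tag would carry. It assumes that tags are YYYY.MM.DD dates and that a version stays in place until the next row's tag; the function names and the handling of tags older than the table are assumptions, not part of this repository.

```python
from datetime import date

# First tagged image in which each dask-gateway version appears (from the table above).
FIRST_TAG_FOR_VERSION = {
    "0.9": "2020.11.06",
    "0.8": "2020.07.28",
    "0.7": "2020.04.22",
}


def parse_tag(tag: str) -> date:
    """Convert a calendar-version tag such as '2020.07.28' into a date."""
    year, month, day = (int(part) for part in tag.split("."))
    return date(year, month, day)


def dask_gateway_version_for(tag: str):
    """Return the newest dask-gateway version introduced on or before `tag`.

    Returns None when the tag predates every row of the table.
    """
    tag_date = parse_tag(tag)
    candidates = [
        (parse_tag(first_tag), version)
        for version, first_tag in FIRST_TAG_FOR_VERSION.items()
        if parse_tag(first_tag) <= tag_date
    ]
    if not candidates:
        return None
    # The candidate with the latest first-appearance date wins.
    return max(candidates)[1]


print(dask_gateway_version_for("2020.09.01"))
```

A deployment could compare the returned value with the dask-gateway version running on the cluster before choosing an image tag.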

Other notes

pangeo-docker-images's People

Contributors

actions-bot, actions-user, andersy005, bnjam, cisaacstern, consideratio, dcherian, dependabot[bot], github-actions[bot], hdrake, jbusecke, jhamman, marioherreroglez, maxrjones, ngam, paigem, pangeo-bot, rabernat, rsignell-usgs, scottyhq, sgibson91, shanicetbailey, sharkinsspatial, shunzi-work, thenaomig, tjcrone, tomaugspurger, weiji14, willirath, yuvipanda

pangeo-docker-images's Issues

making pangeo/base-notebook compatible with stock jupyterhub

We'd like to adapt https://github.com/pangeo-data/pangeo-docker-images for vanilla jupyterhub deployments for research and teaching, and need a little guidance on how to make pangeo/base-notebook:latest work with dockerspawner and jupyterhub/jupyterhub:master. Specifically, this works on localhost: https://github.com/phaustin/jupyterhub_basics/tree/master/examples/simple

i.e. docker-compose up brings up a jupyterhub on localhost:8000 and spawns new notebooks.

but swapping scipy-notebook for pangeo/base-notebook:

https://github.com/phaustin/jupyterhub_basics/compare/pangeo

and rerunning gives the following error:

# 500 : Internal Server Error

Error in Authenticator.pre_spawn_start: TypeError unsupported operand type(s) for +: 'NoneType' and 'list'

You can try restarting your server from the [home page](http://localhost:8000/hub/home).

Any idea what dockerspawner is looking for here?

Discussion: Integration with pangeo conda-forge metapackages

Based on various discussions and the pangeo call today there is some debate as to how to organize these docker images and sync up with corresponding pangeo metapackages. First, some background. Currently the metapackages specify the minimal set of packages necessary to run workflows on Pangeo clusters. Versions are pinned at the minor version level for flexibility in conda solutions and to avoid major version bumps being installed that can lead to incompatibilities. The goal is to have consistent versions of core packages across images.

For dask workers we have the pangeo-dask=2020.03.30 metapackage that pins:
https://github.com/conda-forge/pangeo-dask-feedstock/blob/master/recipe/meta.yaml

    - dask =2.13*
    - distributed =2.13*
    - dask-kubernetes =0.10*
    - dask-gateway =0.6*

Because we're running these images on jupyterlab we have a separate pangeo-notebook metapackage that pins both the dask metapackage and adds user-interface requirements:
https://github.com/conda-forge/pangeo-notebook-feedstock/blob/master/recipe/meta.yaml

    - pangeo-dask =2020.03.30
    - dask-labextension =1.1*
    - jupyter-server-proxy =1.3*
    - jupyterhub =1.1*
    - jupyterlab =1.2*
    - nbgitpuller =0.8*
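
To make the idea of a minor-level pin concrete, here is a small sketch that checks whether a version falls under a pin written like the ones in the lists above (for example =2.13*). It is a simplified reading of what such a pin means, not conda's own matching code, and the function name is made up for illustration.

```python
def matches_minor_pin(version: str, pin: str) -> bool:
    """Return True if `version` falls under a pin such as '=2.13*'."""
    # Strip the leading '=' and the trailing '*' (and a '.' before it, if any).
    prefix = pin.strip().lstrip("=").rstrip("*").rstrip(".")
    pinned_parts = prefix.split(".")
    version_parts = version.split(".")
    # Compare component by component so that '2.130.0' does not match '=2.13*'.
    return version_parts[: len(pinned_parts)] == pinned_parts


for candidate in ("2.13.0", "2.13.1", "2.14.0"):
    print(candidate, matches_minor_pin(candidate, "=2.13*"))
```

Under this reading, patch releases such as 2.13.1 remain installable while a bump to 2.14 is excluded, which is the flexibility described above.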

Note that we might want to release a new pangeo-notebook=2020.04.01 that still keeps the dask versions but bumps to jupyterlab=2.0*, so it's possible for these calendar date versions (calver) currently to be out of sync.

The metapackages allow people to easily create pangeo-compatible environments with conda, using either

  • conda create -c conda-forge pangeo-notebook
  • or conda create -c conda-forge pangeo-notebook=2020.03.30

It's very important to realize that if you run those commands today and again a month from now, you will not end up with exactly the same environment, because dependency packages could change their versions. Part of the intentional design of this repository is to create a lock file that does allow you to create exactly the same (linux) conda environment for any tagged image.

For example, we built a pangeo-notebook image on 2020.03.30 with the following environment.yml. Note that it uses an earlier metapackage; those dates don't have to match.
https://github.com/pangeo-data/pangeo-stacks-dev/blob/a33c9b65eeb1fbbe298a27fcba32b856aa39edaf/pangeo-notebook/environment.yml#L1-L8

To access the environment we created you have two options:

  1. docker run -it pangeo/pangeo-notebook:2020.03.30

But if you don't want to bother with docker and just need to install the conda environment (linux only), you can also run:

  2. conda create -n pangeo-notebook --file="https://raw.githubusercontent.com/pangeo-data/pangeo-stacks-dev/2020.03.30/pangeo-notebook/conda-linux-64.lock"

cc @jhamman , @TomAugspurger , @tjcrone , @rabernat , @rsignell-usgs , @ocefpaf
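
To show why a lock file reproduces an environment exactly while a metapackage does not, the sketch below reads a lock file and lists the exact version and build of each package. It assumes the lock file uses conda's explicit format (an @EXPLICIT marker followed by one package URL per line); the sample content and hashes are hypothetical, not copied from this repository.

```python
from urllib.parse import urlparse


def parse_explicit_lock(text: str) -> dict:
    """Map package name -> (version, build) for every URL line in an explicit lock file."""
    packages = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and the '@EXPLICIT' marker line.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        # urlparse drops the '#<hash>' fragment, leaving only the file path.
        filename = urlparse(line).path.rsplit("/", 1)[-1]
        for extension in (".tar.bz2", ".conda"):
            if filename.endswith(extension):
                filename = filename[: -len(extension)]
                break
        # Package files are named '<name>-<version>-<build>'; names may contain '-'.
        name, version, build = filename.rsplit("-", 2)
        packages[name] = (version, build)
    return packages


# Hypothetical lock file content, for illustration only.
SAMPLE_LOCK = """\
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/numpy-1.18.1-py38h8854b6b_1.tar.bz2#0123456789abcdef0123456789abcdef
https://conda.anaconda.org/conda-forge/noarch/dask-core-2.13.0-py_0.tar.bz2#fedcba9876543210fedcba9876543210
"""

for name, (version, build) in parse_explicit_lock(SAMPLE_LOCK).items():
    print(name, version, build)
```

Because every line names one specific package file, there is nothing left for the solver to decide when the environment is created later.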

Improve PR GitHub Actions for Locking environment

It is currently not obvious that, for PRs changing the environment.yml in an image subfolder, an admin for this repository needs to run the /condalock command in a comment.

For example, #91 didn't have any effect because the lockfile wasn't updated.

We should structure CI to ensure this happens before merging. The complication is that either a bot needs to run the environment lock step, or we need a separate workflow to ensure lockfiles have been modified before enabling the 'merge' button (probably easiest).

cc @TomAugspurger
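
The second option (checking that lockfiles were modified) could look roughly like the sketch below. It assumes the workflow can obtain the list of files changed by the PR, for instance from git diff --name-only, and that each image folder holds environment.yml next to conda-linux-64.lock; the function name is made up for illustration.

```python
from pathlib import PurePosixPath


def images_missing_lock_update(changed_files):
    """Return image folders whose environment.yml changed without a lockfile change."""
    env_dirs = set()
    lock_dirs = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        if path.name == "environment.yml":
            env_dirs.add(str(path.parent))
        elif path.name == "conda-linux-64.lock":
            lock_dirs.add(str(path.parent))
    return sorted(env_dirs - lock_dirs)


# A PR that edits the environment but forgets the lockfile would be flagged.
print(images_missing_lock_update(["pangeo-notebook/environment.yml", "README.md"]))
```

A check like this could fail the workflow whenever the returned list is not empty.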

CondaValueError: prefix already exists: /srv/conda

Are you ready for bug reports yet @scottyhq? 😄

For the repository at https://github.com/pangeo-gallery/default-binder, which gives this URL: https://binder.pangeo.io/v2/gh/pangeo-gallery/default-binder/master

Here's the build log

Waiting for build to start...
Picked Git content provider.
Cloning into '/tmp/repo2dockerfaxxwd15'...
HEAD is now at 997628f initial commit
Using DockerBuildPack builder
Step 1/1 : FROM pangeo/base-image:2020.03.27
# Executing 8 build triggers
 ---> Running in 2a56f0986488
Removing intermediate container 2a56f0986488
 ---> Running in 21db55ee8d58
Checking for 'apt.txt'...
Removing intermediate container 21db55ee8d58
 ---> Running in e6240ee36159
Removing intermediate container e6240ee36159
 ---> Running in 5be4bf039d3e
Checking for conda 'spec-file.txt' or 'environment.yml'...

CondaValueError: prefix already exists: /srv/conda

Removing intermediate container 5be4bf039d3e
The command '/bin/sh -c echo "Checking for conda 'spec-file.txt' or 'environment.yml'..."         ; [ -d binder ] && cd binder         ; if test -f "spec-file.txt" ; then         conda create --name pangeo --file spec-file.txt         ; elif test -f "environment.yml" ; then         conda env create -f environment.yml          ; else echo "No spec-file.txt or environment.yml! *creating default env*" ;         conda create --namepangeo pangeo-notebook=0.0.2         ; fi         && conda clean -yaf         && find ${CONDA_DIR} -follow -type f -name '*.a' -delete         && find ${CONDA_DIR} -follow -type f -name '*.pyc' -delete         && find ${CONDA_DIR} -follow -type f -name '*.js.map' -delete         && find${NB_PYTHON_PREFIX}/lib/python*/site-packages/bokeh/server/static -follow -type f -name '*.js' ! -name '*.min.js' -delete' returned a non-zero code: 1

New DockerHub Image retention policies will delete unused images after 6 months

https://www.docker.com/pricing/resource-consumption-updates

Starting November 2020, Images untouched for 6 months will be scrubbed.

On one hand, this isn't an issue because this repository stores the complete configuration needed to build any previously tagged image. But eventually someone might want to reproduce a study from a year ago and hit an 'image not found' error.

There are some options... 1) Pangeo has its own pro Docker account. 2) Start pushing copies of the image to GitHub Packages via an Action.

cc @jhamman @rabernat @salvis2 @TomAugspurger @tjcrone

Current Base does not work well with internal PyPi repos

It would be nice to have a way to pass a pip.conf file to the base image or on a new build for easy use of internal pypi repos. Currently I have had to modify the image in this way:


FROM ubuntu:18.04
# Master build file for pangeo images

# Run this section as root
# try to keep conda version in sync with repo2docker
# ========================
ENV CONDA_VERSION=4.8.2-1 \
    CONDA_ENV=notebook \
    NB_USER=jovyan \
    NB_UID=1000 \
    SHELL=/bin/bash \
    LANG=C.UTF-8  \
    LC_ALL=C.UTF-8 \
    CONDA_DIR=/srv/conda

ENV NB_PYTHON_PREFIX=${CONDA_DIR}/envs/${CONDA_ENV} \
    DASK_ROOT_CONFIG=${CONDA_DIR}/etc \
    HOME=/home/${NB_USER} \
    PATH=${CONDA_DIR}/bin:${PATH}

# Create jovyan user, permissions, add conda init to startup script
RUN echo "Creating ${NB_USER} user..." \
    && groupadd --gid ${NB_UID} ${NB_USER}  \
    && useradd --create-home --gid ${NB_UID} --no-log-init --uid ${NB_UID} ${NB_USER} \
    && echo ". ${CONDA_DIR}/etc/profile.d/conda.sh ; conda activate ${CONDA_ENV}" > /etc/profile.d/init_conda.sh \
    && chown -R ${NB_USER}:${NB_USER} /srv

# COPY chown available docker>17.09
# but env sub only works for docker>19.03 (kubernetes>1.17)
# https://github.com/moby/moby/issues/35018
#COPY --chown=${NB_USER}:${NB_USER} . ${HOME}
COPY --chown=jovyan:jovyan . /srv

# SEE: https://github.com/phusion/baseimage-docker/issues/58
# and https://github.com/phusion/baseimage-docker/issues/319
ARG DEBIAN_FRONTEND=noninteractive

RUN echo "Installing Apt-get packages..." \
    && apt-get update --fix-missing \
    && apt-get install -y apt-utils 2> /dev/null \
    && apt-get install -y wget \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

ARG PYPI_FILE
RUN echo "Creating pip.conf" &&\
    mkdir -p $HOME &&\
    mkdir -p $HOME/.config/ &&\
    mkdir -p $HOME/.config/pip/ &&\
    echo "${PYPI_FILE}" | awk '$1=$1' RS="\\n" > $HOME/.config/pip/pip.conf
# ========================

USER ${NB_USER}
WORKDIR ${HOME}

RUN echo "Installing Miniforge..." \
    && URL="https://github.com/conda-forge/miniforge/releases/download/${CONDA_VERSION}/Miniforge3-${CONDA_VERSION}-Linux-x86_64.sh" \
    && wget --quiet ${URL} -O miniconda.sh \
    && /bin/bash miniconda.sh -u -b -p ${CONDA_DIR} \
    && rm miniconda.sh \
    && conda clean -afy \
    && find ${CONDA_DIR} -follow -type f -name '*.a' -delete \
    && find ${CONDA_DIR} -follow -type f -name '*.pyc' -delete

RUN echo "Copying configuration files..." \
    && mv /srv/condarc.yml ${CONDA_DIR}/.condarc \
    && mv /srv/dask_config.yml ${CONDA_DIR}/etc/dask.yml

EXPOSE 8888
ENTRYPOINT ["/srv/start"]
#CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]

# Only run these if used as a base image
# ----------------------
ONBUILD USER root
# hardcode for now
ONBUILD COPY --chown=jovyan:jovyan . /home/jovyan

ONBUILD RUN echo "Checking for 'binder' or '.binder' subfolder" \
    ; if [ -d binder ] ; then \
    echo "Using 'binder/' build context" \
    ; elif [ -d .binder ] ; then \
    echo "Using '.binder/' build context" \
    ; else \
    echo "Using './' build context" \
    ; fi

ONBUILD ARG DEBIAN_FRONTEND=noninteractive
ONBUILD RUN echo "Checking for 'apt.txt'..." \
    ; [ -d binder ] && cd binder \
    ; [ -d .binder ] && cd .binder \
    ; if test -f "apt.txt" ; then \
    apt-get update --fix-missing \
    && xargs -a apt.txt apt-get install -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    ; fi

ONBUILD USER ${NB_USER}

# Create "notebook" conda environment and dask labextensions
ONBUILD RUN echo "Checking for 'conda-linux-64.lock' or 'environment.yml'..." \
    ; [ -d binder ] && cd binder \
    ; [ -d .binder ] && cd .binder \
    ; if test -f "conda-linux-64.lock" ; then \
    conda create --name ${CONDA_ENV} --file conda-linux-64.lock \
    ; elif test -f "environment.yml" ; then \
    conda env create --name ${CONDA_ENV} -f environment.yml  \
    ; else echo "No conda-linux-64.lock or environment.yml! *creating default env*" ; \
    conda create --name ${CONDA_ENV} pangeo-notebook \
    ; fi \
    && conda clean -yaf \
    && find ${CONDA_DIR} -follow -type f -name '*.a' -delete \
    && find ${CONDA_DIR} -follow -type f -name '*.pyc' -delete \
    && find ${CONDA_DIR} -follow -type f -name '*.js.map' -delete \
    && find ${NB_PYTHON_PREFIX}/lib/python*/site-packages/bokeh/server/static -follow -type f -name '*.js' ! -name '*.min.js' -delete

# Install pip packages
# remove cache https://github.com/pypa/pip/pull/6391 ?
ONBUILD RUN echo "Checking for pip 'requirements.txt'..." \
    ; [ -d binder ] && cd binder \
    ; [ -d .binder ] && cd .binder \
    ; if test -f "requirements.txt" ; then \
    ${NB_PYTHON_PREFIX}/bin/pip install --no-cache-dir -r requirements.txt \
    ; fi

# Run postBuild script within "pangeo" environment
ONBUILD RUN echo "Checking for 'postBuild'..." \
    ; [ -d binder ] && cd binder \
    ; [ -d .binder ] && cd .binder \
    ; if test -f "postBuild" ; then \
    export PATH=${NB_PYTHON_PREFIX}/bin:${PATH} \
    && chmod +x postBuild \
    && ./postBuild \
    && rm -rf /tmp/* \
    && rm -rf ${HOME}/.cache ${HOME}/.npm ${HOME}/.yarn \
    && rm -rf ${NB_PYTHON_PREFIX}/share/jupyter/lab/staging \
    && find ${CONDA_DIR} -follow -type f -name '*.a' -delete \
    && find ${CONDA_DIR} -follow -type f -name '*.pyc' -delete \
    && find ${CONDA_DIR} -follow -type f -name '*.js.map' -delete \
    ; fi

# Overwrite start entrypoint script if present
ONBUILD RUN echo "Checking for 'start'..." \
    ; [ -d binder ] && cd binder \
    ; [ -d .binder ] && cd .binder \
    ; if test -f "start" ; then \
    chmod +x start \
    && cp start /srv/start \
    ; fi
# ----------------------

Where the build is slightly different:


docker build -t us.gcr.io/climacell-research/climacell-base:latest --build-arg PYPI_FILE="`cat ~/.pip/pip.conf`" .

I am sure there is some room for improvement, but it would be nice to be able to use an internal pypi repo without a good amount of work.

nbgitpuller hangs while syncing git repo

Trigger rebuilding images from issues

Since we don't pin explicit versions of most packages in environment.yml, it would be nice to be able to retrigger solving and building images to pick up the latest compatible versions on any given day:
https://github.com/pangeo-data/pangeo-stacks-dev/blob/5687d73044ec8382a0b6792fe1229dcb2d63d008/pangeo-notebook/environment.yml#L1-L8

This issue tests and demonstrates a chatops dispatch command (/rebuild) in issues to accomplish a rebuild - instead of creating "dummy" commits as @rsignell-usgs was forced to do in #49.

The /rebuild command has to be the first line of a new comment and can only be run by repo admins. It will create a linked PR that rebuilds and tests images.
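
The two rules stated above (first line of the comment, admins only) can be sketched as a small predicate. This is only an illustration of the rules, not the workflow's actual implementation; the function name and the way admin names are supplied are assumptions.

```python
def is_rebuild_request(comment_body: str, author: str, admins) -> bool:
    """Return True when a comment should trigger a rebuild.

    The command must open the first line of the comment and the author must be an admin.
    """
    lines = comment_body.splitlines()
    if not lines:
        return False
    first_words = lines[0].split()
    return bool(first_words) and first_words[0] == "/rebuild" and author in admins


# Hypothetical admin account name, for illustration only.
ADMINS = {"admin-a"}
print(is_rebuild_request("/rebuild\nPicking up new releases.", "admin-a", ADMINS))
print(is_rebuild_request("Could someone run\n/rebuild", "admin-a", ADMINS))
```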

Permissions when running as another user

I encountered an issue with permissions on shared volumes, which eventually boils down to :

$ docker run -it --rm -p 8888:8888 -u 1002:1002 pangeo/base-notebook:latest ls -ltrn
total 40
-rwxr-xr-x 1 1000 1000   303 Mar  7 18:21 start
-rw-r--r-- 1 1000 1000  2993 Mar  7 18:21 packages.txt
-rw-r--r-- 1 1000 1000   166 Mar  7 18:21 environment.yml
-rw-r--r-- 1 1000 1000 21054 Mar  7 18:21 conda-linux-64.lock
-rw-r--r-- 1 1000 1000    30 Mar  7 18:21 Dockerfile

This prevents running the image with another user (1002 here) which is needed to satisfy permissions when accessing shared volumes owned by that (uid != 1000) user

From my understanding, this comes from moby/moby#7198

I am aware of two possible fixes :

I was wondering if someone encountered the same issue ?
If so, what would be the appropriate fix here ?
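
To spell out why uid 1002 cannot write to the files in the listing above, here is a sketch of the classic owner/group/other permission check. It is deliberately simplified (no root, supplementary groups, or ACLs) and the function name is made up for illustration.

```python
import stat


def can_write(mode: int, owner_uid: int, owner_gid: int, uid: int, gid: int) -> bool:
    """Apply the owner/group/other write check for a non-root user."""
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if gid == owner_gid:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# environment.yml in the listing is -rw-r--r-- and owned by 1000:1000.
print(can_write(0o644, 1000, 1000, uid=1000, gid=1000))
print(can_write(0o644, 1000, 1000, uid=1002, gid=1002))
```

With mode 644 only the owner has the write bit, so any other uid falls through to the "other" class and is refused.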

Add option to use conda packages.txt or conda.lock file in addition to environment.yml

The current setup only supports conda environment.yml files (https://github.com/pangeo-data/pangeo-stacks-dev/blob/master/pangeo-notebook/environment.yml), consistent with repo2docker (https://repo2docker.readthedocs.io/en/latest/config_files.html#environment-yml-install-a-python-environment)

It would be nice to also support alternative conda files for more flexible conda create commands instead of conda env create -f environment.yml. For example, it would be great to utilize:

  1. new conda.lock files pointed out by @ocefpaf
    https://github.com/mariusvniekerk/conda-lock#dockerfile-example

  2. conda packages.txt from conda list --export > packages.txt

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
anaconda-client=1.7.2=py37_0
asn1crypto=1.3.0=py37_0

These files require an extra step to generate but lead to more reproducible environments.
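
As a sketch of what a packages.txt file pins, the snippet below parses the sample shown above into (name, version, build) entries and reads the platform from the header comment. The function name is an assumption; the format is the name=version=build layout visible in the sample.

```python
def parse_conda_export(text: str):
    """Return (platform, specs) from the output of `conda list --export`."""
    platform = None
    specs = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("# platform:"):
            platform = line.split(":", 1)[1].strip()
        elif line and not line.startswith("#"):
            # Each entry is '<name>=<version>=<build>'.
            name, version, build = line.split("=")
            specs.append((name, version, build))
    return platform, specs


SAMPLE = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
anaconda-client=1.7.2=py37_0
asn1crypto=1.3.0=py37_0
"""

platform, specs = parse_conda_export(SAMPLE)
print(platform, specs)
```

The platform line is worth checking, since build strings are platform-specific and a file exported on osx-64 would presumably not be usable for a linux image.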

Review of Dockerfile cleanup commands

Currently we are running a lot of custom cleanup commands to save space on the images:
https://github.com/pangeo-data/pangeo-stacks-dev/blob/899e2e7bcc76354f1d4b584dc21742492cd81325/base-image/Dockerfile#L92-L96

I just ran into an error where one of the commands failed, possibly related to a change in layout with bokeh 2.0. cc @TomAugspurger . I'll go ahead and comment it out for now

bokeh-2.0.0          | 6.8 MB    | ########## | 100%
find: ‘/srv/conda/envs/pangeo/lib/python*/site-packages/bokeh/server/static’: No such file or directory
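
One way to make this kind of cleanup tolerant of layout changes is to treat a missing directory as "nothing to delete" rather than an error. The sketch below does this in Python for the non-minified .js cleanup; it is an alternative illustration, not the repository's Dockerfile logic, and the function name is made up.

```python
import glob
import os


def delete_unminified_js(pattern: str) -> int:
    """Delete '*.js' files that are not '*.min.js' under every directory matching `pattern`.

    Returns the number of files removed; a pattern that matches nothing is not an error.
    """
    removed = 0
    for directory in glob.glob(pattern):
        for root, _dirs, files in os.walk(directory, followlinks=True):
            for name in files:
                if name.endswith(".js") and not name.endswith(".min.js"):
                    os.remove(os.path.join(root, name))
                    removed += 1
    return removed


# With no matching directory, the call simply reports zero removals.
print(delete_unminified_js("/nonexistent/python*/site-packages/bokeh/server/static"))
```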

Package Discussion

What packages belong in a "default" pangeo metapackage? Currently pangeo-notebook has essentially dask + jupyterhub + jupyterlab. https://github.com/conda-forge/pangeo-notebook-feedstock/blob/master/recipe/meta.yaml. IMO, there's value in having a minimal metapackage.

There's also value in a "useful" metapackage that includes things like

  • xarray
  • s3fs / gcsfs / fsspec
  • intake
  • zarr
  • matplotlib?
  • other viz libraries?

In pangeo-stacks we called this pangeo-notebook: https://github.com/pangeo-data/pangeo-stacks/blob/a8cf6aefa36800301977390a785d06edac9b915e/pangeo-notebook/binder/environment.yml.

Also, what should we call this? Perhaps just pangeo?

cc @rabernat @scottyhq @jhamman

GitHub Actions Configuration

The GitHub Actions in the repository are fairly complex, mainly stemming from the fact that we want to add environment lock files from any PR. But PRs coming from forks only have read access by default, which is why we use our pangeo-bot user access token and /slash commands to have write access.

I recently learned a lot from this blog post https://securitylab.github.com/research/github-actions-preventing-pwn-requests/ on best practices for structuring this style of CI. Likely could make some modifications and improvements to how things are currently structured.

xesmf not importing required shapely library not found

Hi I'm running latest docker image using singularity on HPC trying to get xesmf working.

I've looked at #101 (comment) but this seems different
import xesmf

gives

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-5-7ec3550bcfad> in <module>
----> 1 import xesmf

~/.local/lib/python3.8/site-packages/xesmf/__init__.py in <module>
      1 # flake8: noqa
      2 
----> 3 from . import data, util
      4 from .frontend import Regridder, SpatialAverager
      5 

~/.local/lib/python3.8/site-packages/xesmf/util.py in <module>
      3 import numpy as np
      4 import xarray as xr
----> 5 from shapely.geometry import MultiPolygon, Polygon
      6 
      7 

~/.local/lib/python3.8/site-packages/shapely/geometry/__init__.py in <module>
      2 """
      3 
----> 4 from .base import CAP_STYLE, JOIN_STYLE
      5 from .geo import box, shape, asShape, mapping
      6 from .point import Point, asPoint

~/.local/lib/python3.8/site-packages/shapely/geometry/base.py in <module>
     17 
     18 from shapely.affinity import affine_transform
---> 19 from shapely.coords import CoordinateSequence
     20 from shapely.errors import WKBReadingError, WKTReadingError
     21 from shapely.geos import WKBWriter, WKTWriter

~/.local/lib/python3.8/site-packages/shapely/coords.py in <module>
      6 from ctypes import byref, c_double, c_uint
      7 
----> 8 from shapely.geos import lgeos
      9 from shapely.topology import Validating
     10 

~/.local/lib/python3.8/site-packages/shapely/geos.py in <module>
     90         'libc.musl-x86_64.so.1'
     91     ]
---> 92     free = load_dll('c', fallbacks=c_alt_paths).free
     93     free.argtypes = [c_void_p]
     94     free.restype = None

~/.local/lib/python3.8/site-packages/shapely/geos.py in load_dll(libname, fallbacks, mode)
     58     else:
     59         # No shared library was loaded. Raise OSError.
---> 60         raise OSError(
     61             "Could not find lib {} or load any of its variants {}.".format(
     62                 libname, fallbacks or []))

OSError: Could not find lib c or load any of its variants ['libc.musl-x86_64.so.1'].
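
The traceback shows xesmf and shapely being imported from ~/.local/lib/python3.8/site-packages, that is, from the user's home directory rather than from the image's conda environment. Whether that is the cause here is not established in this report, but the sketch below shows one way to check where a module would be loaded from. The function name is an assumption; the calls are standard-library ones.

```python
import importlib.util
import site


def loaded_from_user_site(module_name: str) -> bool:
    """Return True if `module_name` resolves to the per-user site-packages directory."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin:
        # Module not found, or it has no file location (e.g. a namespace package).
        return False
    return spec.origin.startswith(site.getusersitepackages())


# 'json' ships with Python, so it should not come from the user site directory.
print(site.getusersitepackages())
print(loaded_from_user_site("json"))
```

Running the check for shapely inside the container would show whether a copy installed with pip --user is shadowing the one in the image.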

ML image update

Was talking to @scottyhq about using the ML image over here and having pytorch preloaded. I know @rabernat has asked about this before (#179) .

We were wondering who all are using the ML image? and what might be the requirements they have? @nbren12 @jhamman
It seems like the usage for the ML image is low based on the pulls here: https://github.com/pangeo-data/pangeo-docker-images.

Since pytorch and tensorflow are two of the big candidates (and are perhaps usually used independently), @scottyhq suggested having a pangeo-pytorch and a pangeo-tensorflow.

Any other thoughts that people have?

old version of s3fs being installed (0.4.2)

Cross posting https://discourse.pangeo.io/t/slow-xarray-open-zarr-with-default-s3fs-0-4-2-fixed-in-latest-0-5-1/1073

Our unpinned environment specification included both boto3 and s3fs which apparently resolves to installing s3fs 0.4.2

cc @emiliom and @martindurant or @ocefpaf for package dependency insights?

I can try 1) pinning s3fs>0.5, or 2) removing boto3 from the requirements file (though I think it comes along with the awscli) and installing AWS CLI v2 as recommended here: https://docs.aws.amazon.com/cli/latest/userguide/welcome-versions.html

Typo in the README.md

Little typo in the following line:

How to launch jupyterlab locally with one of these images
docker run -it --rm -p 8888:8888 pangeo/base-notebook:latest jupyter lab --ip 0.0.0.0

Should be:
docker run -it --rm -p 8888:8888 pangeo/base-notebook:latest jupyterlab --ip 0.0.0.0

use miniforge to install conda base environment

Following @yuvipanda 's lead in jupyterhub/repo2docker#859

miniforge is a new community-led installer that uses conda-forge as the default channel, rather than defaults. So there is no mixing of the defaults and conda-forge channels by default, and it would likely reduce image size.

@TomAugspurger also looked into using just the conda standalone executable since we don't really need a (base) environment, but it is what people are used to locally, so I think it is worth keeping.
https://github.com/TomAugspurger/pangeo2docker/blob/f687f97dcf8f48d34033ea1903befd12d578f659/pangeo2docker/Dockerfile.tpl#L25

Add Jax to ML image?

Jax is becoming very important / useful to us. Would it be easy to add it to the ML image?

Add an image with R in it

In 2i2c-org/farallon-image#15, we added R to a base pangeo image so folks from Farallon can use it. Most of that is upstreamable - would there be interest in adding an R image here? It would provide the R kernel, RStudio, and probably an install.R onbuild system similar to what repo2docker offers.

dask labextension cluster launching broken in latest image

As reported by @cspencerjones on https://discourse.pangeo.io/t/cluster-unknown-address-scheme-error-when-injected-from-sidebar/720

When I inject a cluster from the sidebar and run I get the following error:
ValueError: unknown address scheme 'gateway' (known schemes: ['inproc', 'tcp', 'tls', 'ucx'])

(this is again on https://us-central1-b.gcp.pangeo.io/)
Thanks again!

This is configured here:

labextension:
  factory:
    module: dask_gateway
    class: GatewayCluster
    args: []
    kwargs: {}

xref: dask/dask-labextension#135
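
For readers unfamiliar with this style of configuration, the sketch below shows the general pattern of turning a module/class/args/kwargs entry into an object. It is not dask-labextension's actual code, and it uses a standard-library class as a stand-in so that it runs without dask_gateway installed.

```python
import importlib


def build_from_factory(factory: dict):
    """Import `factory['module']`, look up `factory['class']`, and instantiate it."""
    module = importlib.import_module(factory["module"])
    cls = getattr(module, factory["class"])
    return cls(*factory.get("args", []), **factory.get("kwargs", {}))


# Stand-in entry: collections.Counter plays the role of the cluster class here.
example = {"module": "collections", "class": "Counter", "args": ["aab"], "kwargs": {}}
print(build_from_factory(example))
```

With the configuration shown above, the same pattern would import dask_gateway and call GatewayCluster() with no arguments.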

update a package (not adding it)

Hello all,

Based on this issue, writing to zarr is not working. I could actually write a random file, but then it didn't work again, so clearly it's not a stable setup.

I did read these instructions about how to add a package. I am not sure how it works to update a version of the packages (the yml file does not have pinned versions, so I assume whatever version is currently in the image is because of the last time it was updated and all the dependencies were solved).

So how do I trigger an update of a package? Also - I would definitely test it first on staging, but right now it seems that writing to zarr is problematic.

Thanks! (and a question ;-) )

Thanks for the great project! I really liked the way you organized the docker images, especially the use of the onbuild in Dockerfiles and I'm totally stealing your great ideas to build oodles of bioinformatics-related images over here.

This isn't an issue, just a question, but I was wondering what does this line do in the start scripts?

#!/bin/bash -l

# ==== ONLY EDIT WITHIN THIS BLOCK =====

export PANGEO_ENV="ml-notebook"
if ! [[ -z "${PANGEO_SCRATCH_PREFIX}" ]] && ! [[ -z "${JUPYTERHUB_USER}" ]]; then
    export PANGEO_SCRATCH="${PANGEO_SCRATCH_PREFIX}/${JUPYTERHUB_USER}/"
fi

# ==== ONLY EDIT WITHIN THIS BLOCK =====

exec "$@"

Since all the notebook environments get named notebook?
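
For readers less familiar with bash, the conditional in the script above can be restated in Python: PANGEO_SCRATCH is only set when both PANGEO_SCRATCH_PREFIX and JUPYTERHUB_USER are non-empty. The function name and the example values are assumptions for illustration.

```python
def scratch_path(environ: dict):
    """Mirror the start script: build the scratch path only if both variables are set."""
    prefix = environ.get("PANGEO_SCRATCH_PREFIX", "")
    user = environ.get("JUPYTERHUB_USER", "")
    if prefix and user:
        return f"{prefix}/{user}/"
    return None


# Hypothetical values; a real deployment would supply its own prefix and user name.
print(scratch_path({"PANGEO_SCRATCH_PREFIX": "gs://example-scratch", "JUPYTERHUB_USER": "alice"}))
print(scratch_path({"JUPYTERHUB_USER": "alice"}))
```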

Include tzdata

Without tzdata, code like this fails

>>> import pendulum; pendulum.now()
  File "/opt/conda/envs/my_project/lib/python3.7/site-packages/pendulum/__init__.py", line 211, in now
    dt = _datetime.datetime.now(local_timezone())
  File "/opt/conda/envs/my_project/lib/python3.7/site-packages/pendulum/tz/__init__.py", line 60, in local_timezone
    return get_local_timezone()
  File "/opt/conda/envs/my_project/lib/python3.7/site-packages/pendulum/tz/local_timezone.py", line 35, in get_local_timezone
    tz = _get_system_timezone()
  File "/opt/conda/envs/my_project/lib/python3.7/site-packages/pendulum/tz/local_timezone.py", line 63, in _get_system_timezone
    return _get_unix_timezone()
  File "/opt/conda/envs/my_project/lib/python3.7/site-packages/pendulum/tz/local_timezone.py", line 242, in _get_unix_timezone
    raise RuntimeError("Unable to find any timezone configuration")
RuntimeError: Unable to find any timezone configuration

Workaround for now.

FROM pangeo/pangeo-notebook
USER root
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y && \
    apt-get install -y tzdata
USER ${NB_USER}

I'm not sure, but this seems OK to include in the base image. It doesn't seem to be too large

The following NEW packages will be installed:
  tzdata
0 upgraded, 1 newly installed, 0 to remove and 11 not upgraded.
Need to get 190 kB of archives.
After this operation, 3109 kB of additional disk space will be used.

discussion: conda-docker as an alternative architecture/build system

qhub (https://qhub.dev) mentions using https://github.com/conda-incubator/conda-docker. I remember coming across the project a while back when we started to use conda-lock in this repository. It seems like an interesting idea: if you can manipulate a base docker image as a .tar.gz and simply 'swap out' the conda environment, users on a jupyterhub could pretty easily create custom docker images for use in dask-gateway without needing familiarity with docker (or access to docker!).

Missing Bottleneck package on new gcp-uscentral1b

Hello,

I started playing with the new pangeo server.
I started running an old script and I got this error:

ModuleNotFoundError: No module named 'bottleneck'

my script uses bfill()

In fact, if I go on a new notebook and try

import bottleneck as bn
I get

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-d82a72844dfa> in <module>
----> 1 import bottleneck as bn

ModuleNotFoundError: No module named 'bottleneck'

I just thought to report it.
thanks so much!

CI Refactor

Ideally someone should be able to

  1. fork this repo
  2. edit environment.yml
  3. create a PR
  4. CI automatically adds a lockfile to the PR
  5. Images are built and tests are run

I think PR slash commands (https://github.com/peter-evans/slash-command-dispatch) are the way to go for this, maybe /conda-lock?

Ideally, if base-image is changed, all notebook images should also rebuild, but if a PR just changes pangeo-notebook, we should only rebuild pangeo-notebook. Currently everything is rebuilt as a single matrix job.
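
A selective rebuild could be driven by the image graph from the README. The sketch below walks that graph to find every image affected by a change. Note one assumption that goes beyond this issue's wording: with the README graph, a change to pangeo-notebook also marks the two images derived from it for rebuilding.

```python
# Parent -> children edges, copied from the image graph in the README.
CHILDREN = {
    "base-image": ["base-notebook"],
    "base-notebook": ["pangeo-notebook"],
    "pangeo-notebook": ["pytorch-notebook", "ml-notebook"],
    "pytorch-notebook": [],
    "ml-notebook": [],
}


def images_to_rebuild(changed_images):
    """Return the changed images plus everything built on top of them."""
    to_visit = list(changed_images)
    rebuild = set()
    while to_visit:
        image = to_visit.pop()
        if image in rebuild:
            continue
        rebuild.add(image)
        to_visit.extend(CHILDREN.get(image, []))
    return sorted(rebuild)


print(images_to_rebuild(["ml-notebook"]))
print(images_to_rebuild(["base-image"]))
```

The resulting list could feed the build matrix so that only the affected images are rebuilt.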

Installing aws cli version 2 and/or its dependency for the `help` subcommand

Using an image built FROM pangeo/pangeo-notebook:2021.05.15 I'm using the aws CLI.

  1. I wonder if you think we should install version 2 of the CLI (curl, unzip, 😱, move on filesystem, done, 🤢).
  2. I also wonder if you think we should install groff (an apt package). Apparently it is a dependency for running help sub-commands like aws s3 help, but groff isn't available by default in the pangeo-notebook image.

Example output with aws CLI v1 trying to use help sub-command

$ aws
Note: AWS CLI version 2, the latest major version of the AWS CLI, is now stable and recommended for general use. For more information, see the AWS CLI version 2 installation instructions at: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html

usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:

  aws help
  aws <command> help
  aws <command> <subcommand> help
aws: error: the following arguments are required: command



$ aws help

Could not find executable named "groff"
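
A small check like the one below could be run inside a derived image to see which command-line tools are missing from PATH. It is a sketch using only the standard library; the helper name `missing_executables` is made up for illustration.

```python
import shutil


def missing_executables(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# "aws" and "groff" are the two tools discussed above.
print(missing_executables(["aws", "groff"]))
```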

Remove dask-kubernetes config

@TomAugspurger , I believe we can now fully remove dask-kubernetes config from the images since this config is now taken care of in the dask-gateway helm chart, correct?

kubernetes:
  name: dask-{JUPYTERHUB_USER}-{uuid}
  worker-template:
    spec:
      serviceAccount: daskkubernetes
      restartPolicy: Never
      containers:
        - name: dask-worker
          image: ${JUPYTER_IMAGE_SPEC}
          args:
            - dask-worker
            - --nthreads
            - '2'
            - --no-dashboard
            - --memory-limit
            - 7GB
            - --death-timeout
            - '60'
          resources:
            limits:
              cpu: "1.75"
              memory: 7G
            requests:
              cpu: 1
              memory: 7G

Dask labextension does not work in latest pangeo-notebook image

This is a duplicate of pangeo-gallery/default-binder#21, but I'm posting it here for greater visibility.

In our latest pangeo-notebook images, pasting the dashboard link into the Dask labextension dialog has no effect. It does not recognize the link, and none of the buttons become active. I have tried this on several recent

image

Here is a binder link for our classic example notebook using the latest pangeo-gallery/default-binder, which points to the latest pangeo-notebook docker image (c8c0d72).

https://binder.pangeo.io/v2/gh/pangeo-gallery/default-binder/64d772a/?urlpath=git-pull?repo=https://github.com/pangeo-gallery/physical-oceanography%26amp%3Burlpath=lab/tree/physical-oceanography/01_sea-surface-height.ipynb%3Fautodecode

All of our JupyterHubs are running pangeo/pangeo-notebook:2020.12.08. So something since then has broken the labextension.

Inheriting from these images

Hello! I am interested in setting up a daskhub (following these instructions). IIUC, to configure user environments for this case, I would have to inherit from one of the images provided here. My question is: are there any special considerations for how to do that? Even more specifically: if I wrote a Dockerfile that has:

FROM pangeo/pangeo-notebook:2021.02.07

At the top and then installed a bunch of my desired dependencies, could I build that with repo2docker? Because that would be nifty 😄

As always, thanks for leading the way here!

Canonical pangeo binder?

I am following this closely, as it is quite related to my ongoing work on refactoring our gallery (status update here).

I am very enthusiastic about the idea of having a sort of standard canonical binder which we can use for many distinct example repos. I think this is what @scottyhq is developing in https://github.com/scottyhq/pangeodev-binder. The benefits of this approach are:

  • Much faster to iterate binder development if you don't have to keep rebuilding the environment
  • We could configure our binders to preload this image on all nodes, greatly speeding up launch time

The challenge is whether we can agree on what packages should be included.

Is https://github.com/scottyhq/pangeodev-binder basically ready? Can we move it to the pangeo org? I want to move forward with some stuff and it would be good to have this piece of the puzzle fixed.

nbgitpuller and pangeo binder problems with recent images

I am trying to debug a very obscure and annoying problem involving nbgitpuller and binder.pangeo.io.

Basically, if I make a binder using a recent pangeo-notebook and try to run it on binder.pangeo.io, nbgitpuller hangs

https://binder.pangeo.io/v2/gh/rabernat/pangeo-osn-demo/75509ae/?urlpath=git-pull?repo=https://github.com/rabernat/pangeo-osn-demo%26amp%3Bbranch=main

There is a javascript error

image

That image is using the same docker image base as the current staging.us-central1-b.gcp

FROM pangeo/pangeo-notebook:2021.03.27
RUN echo $(which mamba)
RUN mamba install -n notebook -c conda-forge rise ipytree

Dockerfile

However, it does work on staging:
https://staging.us-central1-b.gcp.pangeo.io/hub/user-redirect/git-pull?repo=https://github.com/rabernat/pangeo-osn-demo&branch=main

It also works on binder.pangeo.io if I roll back to an older image, e.g. pangeo/pangeo-notebook:2c94acd:

https://binder.pangeo.io/v2/gh/rabernat/pangeo-osn-demo/34e294b/?urlpath=git-pull?repo=https://github.com/rabernat/pangeo-osn-demo%26amp%3Bbranch=main

Dockerfile

So my best guess is that there is some specific incompatibility with binder.pangeo.io and some package in our recent images related to nbgitpuller, but I have no idea which packages are involved.

cc @yuvipanda and @choldgraf who helped me dig into this.

Latest not resolving to latest

I'm noticing that the latest tag on dockerhub does not resolve to the latest image. Is this automated or do I need to do something to make this work? Please let me know. Thank you very much for all your help Pangeo!!

Does onbuild work with these new images?

I would like to extend the pangeo-notebook image, as we used to do in the old system. I made the following repo:
https://github.com/rabernat/poseidon-bot/tree/binder
with the following Dockerfile

FROM pangeo/pangeo-notebook:9d0723d

plus an environment.yaml file. But it just ignores the environment.yaml file.

Is this "onbuild" capability no longer supported? If not, how do we recommend extending the images?

Binder:
https://binder.pangeo.io/v2/gh/rabernat/poseidon-bot/binder

2021.04.05 production release

I'm not totally sure when release tags should be manually added. But looking through some of the recent tags, it appears that whenever the pangeo-notebook conda meta-package version has been updated, a corresponding git tag has been added. Given this, I was wondering if aadc1aa, which bumped the pangeo-notebook conda package to 2021.04.05, should have a 2021.04.05 git tag as well?

Confusing tags on dockerhub

Looking at DockerHub's list of recent tags for the base-image, I'm noticing some concerning inconsistencies:

image

Despite #79 being merged last night, the 2020.06.10 tag doesn't appear in the tags list. Also, the latest and the master tags differ.

@scottyhq - ideas here? I'm at a loss.
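
One way to sanity-check what latest ought to point at is to pick the newest date-style tag out of a tag list. The sketch below assumes release tags follow the YYYY.MM.DD pattern seen in this repo and that other tags (latest, master, commit hashes) should be ignored; the function name `newest_date_tag` and the sample list are hypothetical, not real registry output.

```python
import re

# Release tags look like 2020.06.10; zero-padded dates sort correctly as strings.
DATE_TAG = re.compile(r"\d{4}\.\d{2}\.\d{2}")


def newest_date_tag(tags):
    """Return the most recent YYYY.MM.DD tag in `tags`, or None if there is none."""
    dated = [tag for tag in tags if DATE_TAG.fullmatch(tag)]
    return max(dated, default=None)


# Illustrative tag list only.
sample_tags = ["latest", "master", "2020.06.09", "2020.06.10", "2c94acd"]
print(newest_date_tag(sample_tags))
```

Comparing the image digest behind that tag with the digest behind latest would then show whether the two have drifted apart.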
