micromamba-docker

Micromamba for fast building of small conda-based containers.

Images available on Dockerhub at mambaorg/micromamba. Source code on GitHub at mamba-org/micromamba-docker.

"This is amazing. I switched CI for my projects to micromamba, and compared to using a miniconda docker image, this reduced build times more than 2x" -- A new micromamba-docker user

About the image

The micromamba image is currently derived from the debian:bullseye-slim image. Thus far, the image has been focused on supporting use of the bash shell. We plan to build from additional base images and support additional shells in the future (see road map).

Quick start

The micromamba image comes with an empty environment named 'base'. Usually you will install software into this 'base' environment.

Define your desired conda environment in a yaml file:

# Using an environment name other than "base" is not recommended!
# Read https://github.com/mamba-org/micromamba-docker#multiple-environments
# if you must use a different environment name.
name: base
channels:
  - conda-forge
dependencies:
  - pyopenssl=20.0.1
  - python=3.9.1
  - requests=2.25.1

Copy the yaml file to your docker image and pass it to micromamba

FROM mambaorg/micromamba:0.23.0
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yaml /tmp/env.yaml
RUN micromamba install -y -f /tmp/env.yaml && \
    micromamba clean --all --yes

Build your docker image

The 'base' conda environment is automatically activated when the image is running.

$ docker build --quiet --tag my_app .
sha256:b04d00cd5e7ab14f97217c24bc89f035db33a8d339bfb9857698d9390bc66cf8
$ docker run -it --rm my_app python --version
3.9.1

Running commands in Dockerfile within the conda environment

While the conda environment is automatically activated for docker run ... commands, it is not automatically activated during build. To RUN a command from a conda environment within a Dockerfile, as explained in detail in the next two subsections, you must:

Set ARG MAMBA_DOCKERFILE_ACTIVATE=1 to activate the conda environment
Use the 'shell' form of the RUN command

Activating a conda environment for RUN commands

No conda environment is automatically activated during the execution of RUN commands within a Dockerfile. To have an environment active during a RUN command, you must set ARG MAMBA_DOCKERFILE_ACTIVATE=1. For example:

FROM mambaorg/micromamba:0.23.0
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yaml /tmp/env.yaml
RUN micromamba install --yes --file /tmp/env.yaml && \
    micromamba clean --all --yes
ARG MAMBA_DOCKERFILE_ACTIVATE=1  # (otherwise python will not be found)
RUN python -c 'import uuid; print(uuid.uuid4())' > /tmp/my_uuid

Use the shell form of RUN with micromamba

The Dockerfile RUN command can be invoked either in the 'shell' form:

RUN python -c "import uuid; print(uuid.uuid4())"

or the 'exec' form:

RUN ["python", "-c", "import uuid; print(uuid.uuid4())"]  # DO NOT USE THIS FORM!

You must use the 'shell' form of RUN or the command will not execute in the context of a conda environment.

Activating a conda environment for ENTRYPOINT commands

The Dockerfile for building the mambaorg/micromamba image contains:

ENTRYPOINT ["/usr/local/bin/_entrypoint.sh"]

where _entrypoint.sh activates the conda environment for any programs run via CMD in a Dockerfile or using docker run mambaorg/micromamba my_command on the command line. If you were to make an image derived from mambaorg/micromamba with:

ENTRYPOINT ["my_command"]

then you will have removed the conda activation from the ENTRYPOINT and my_command will be executed outside of any conda environment.

If you would like an ENTRYPOINT command to be executed within an active conda environment, then add "/usr/local/bin/_entrypoint.sh" as the first element of the JSON array argument to ENTRYPOINT. For example, if you would like for your ENTRYPOINT command to run python from a conda environment, then you would do:

ENTRYPOINT ["/usr/local/bin/_entrypoint.sh", "python"]

Advanced Usages

Pass list of packages to install within a Dockerfile RUN command

FROM mambaorg/micromamba:0.23.0
RUN micromamba install --yes --name base --channel conda-forge \
      pyopenssl=20.0.1  \
      python=3.9.1 \
      requests=2.25.1 && \
    micromamba clean --all --yes

Using a lockfile

Pinning a package to a version string doesn't guarantee the exact same package file is retrieved each time. A lockfile utilizes package hashes to ensure package selection is reproducible. A lockfile can be generated using conda-lock or micromamba:

docker run -it --rm -v $(pwd):/mnt mambaorg/micromamba:0.23.0 \
   /bin/bash -c "micromamba install -y -f /mnt/env.yaml &&  \
                 micromamba env export --explicit  > /mnt/env.lock"

The lockfile can then be used to create a conda environment:

FROM mambaorg/micromamba:0.23.0
COPY --chown=$MAMBA_USER:$MAMBA_USER env.lock /tmp/env.lock
RUN micromamba install --name base --yes --file /tmp/env.lock && \
    micromamba clean --all --yes

When a lockfile is used to create an environment, the micromamba create .. command does not query the package channels or execute the solver. Therefore using a lockfile has the added benefit of reducing the time to create a conda environment.

Multiple environments

For most use cases you will only want a single conda environment within your derived image, but you can create multiple conda environments:

FROM mambaorg/micromamba:0.23.0
COPY --chown=$MAMBA_USER:$MAMBA_USER env1.yaml /tmp/env1.yaml
COPY --chown=$MAMBA_USER:$MAMBA_USER env2.yaml /tmp/env2.yaml
RUN micromamba create --yes --file /tmp/env1.yaml && \
    micromamba create --yes --file /tmp/env2.yaml && \
    micromamba clean --all --yes

You can then set the active environment by passing the ENV_NAME environment variable like:

docker run -e ENV_NAME=env2 my_multi_conda_image

Changing the user id or name

The default username is stored in the environment variable MAMBA_USER, and is currently mambauser. (Before 2022-01-13 it was micromamba, and before 2021-06-30 it was root.) Micromamba-docker can be run with any UID/GID by passing the docker run ... command the --user=UID:GID parameters. Running with --user=root is supported.

There are two supported methods for changing the default username to something other than mambauser:

If rebuilding this image from scratch, the default username mambauser can be adjusted by passing --build-arg MAMBA_USER=new-username to the docker build command. User id and group id can be adjusted similarly by passing --build-arg MAMBA_USER_ID=new-id --build-arg MAMBA_USER_GID=new-gid

When building an image FROM an existing micromamba image,

FROM mambaorg/micromamba:0.23.0
ARG NEW_MAMBA_USER=new-username
ARG NEW_MAMBA_USER_ID=1000
ARG NEW_MAMBA_USER_GID=1000
USER root
RUN usermod "--login=${NEW_MAMBA_USER}" "--home=/home/${NEW_MAMBA_USER}" \
        --move-home "-u ${NEW_MAMBA_USER_ID}" "${MAMBA_USER}" && \
    groupmod "--new-name=${NEW_MAMBA_USER}" "-g ${NEW_MAMBA_USER_GID}" "${MAMBA_USER}" && \
    # Update the expected value of MAMBA_USER for the _entrypoint.sh consistency check.
    echo "${NEW_MAMBA_USER}" > "/etc/arg_mamba_user" && \
    :
ENV MAMBA_USER=$NEW_MAMBA_USER
USER $MAMBA_USER

Disabling automatic activation

It is assumed that users will want their environment automatically activated whenever running this container. This behavior can be disabled by setting the environment variable MAMBA_SKIP_ACTIVATE=1.

For example, to open an interactive bash shell without activating the environment:

docker run --rm -it -e MAMBA_SKIP_ACTIVATE=1 mambaorg/micromamba bash

Details about automatic activation

At container runtime, activation occurs by default at two possible points:

The end of the ~/.bashrc file, which is loaded by interactive non-login Bash shells.
The ENTRYPOINT script, which is automatically prepended to docker run commands.

The activation in ~/.bashrc ensures that the environment is activated in interactive terminal sessions, even when switching between users.

The ENTRYPOINT script ensures that the environment is also activated for one-off commands when Docker is used non-interactively.

Setting MAMBA_SKIP_ACTIVATE=1 disables both of these automatic activation methods.

Minimizing final image size

Uwe Korn has a nice blog post on making small containers containing conda environments that is a good resource. He uses mamba instead of micromamba, but the general concepts still apply when using micromamba.

Support

Please open an issue if the micromamba docker image does not behave as you expect. For issues about the micromamba program, please use the mamba issue tracker.

Contributing

This project is a community effort and contributions are welcome. Best practice is to discuss proposed contributions on the issue tracker before you start writing code.

Development

Testing

The Bats testing framework is used to test the micromamba docker images and derived images. When cloning this repo you'll want to use git clone --recurse-submodules ..., which will bring in the git sub-modules for Bats. Nox is used to automate tests and must be installed separately. To execute the test suite on all base images, run nox in the top-level directory of the repo. To execute the test suite on a single base image, run nox --session "tests(base_image='debian:bullseye-slim')". If GNU parallel is available on the $PATH, then the test suite will be run in parallel using all logical CPU cores available.

Road map

The current road map for expanding the number of base images is as follows:

Add all releases of debian slim that have not yet reached LTS end of life (ie, bullseye, buster, stretch)
Add the non-slim debian image
Add other debian based distributions that have community interest (such as Ubuntu)
Add non-debian based distributions that have community interest

The build and test infrastructure will need to be altered to support additional base images such that automated test and build occur for all images produced.

Policies

Entrypoint script should not write to files in the home directory. On some container execution systems, the host home directory is automatically mounted and we don't want to mess up or pollute the home directory on the host system.

Parent container choice

As noted in the micromamba documentation, the official micromamba binaries require glibc. Therefore Alpine Linux does not work naively. To keep the image small, a Debian slim image is used as the parent. On going efforts to generate a fully statically linked micromamba binary are documented in mamba GitHub issue #572, but most conda packages also depend on glibc. Therefore using a statically linked micromamba would require either a method to install glibc (or an equivalent) from a conda package or conda packages that are statically linked against glibc.

qbit- / micromamba-docker Goto Github PK