Stateless Docker Machine Learning Experiments

Here we want to explore how we can use docker as a thin wrapper to run machine learning experiments in a stateless way. What do I mean by stateless? Let's look at a stateful solution first: We use some base image, e.g. nvcr.io/nvidia/pytorch:23.12-py3, and start an interactive container based on this image. We then attach a shell to this container and set up our system dependencies and project dependencies (e.g. pip requirements). To run our experiment, we execute some command such as python main.py --foo --bar. The results end up somewhere in the container and need to be copied to a volume-mapped directory on the host. We further modify something here and there in the container. Now the running container carries a certain state. Sure, we can detach from and re-attach to the container and hope that the server on which it runs is never rebooted, or that we never have to switch servers and set everything up again. In most of these cases, we end up losing our state, and the actual experiment (running python main.py ...) becomes harder to reproduce.

A stateless setup: In a stateless docker setup, we want docker to act as a virtual environment for everything that is not (a) our code, (b) our data, and (c) our results. That is, we want docker to define the operating system, the system dependencies, the Python version and environment, and the Python dependencies. In docker, this can be done by defining a custom image in a Dockerfile:

# Select the base image
FROM nvcr.io/nvidia/pytorch:23.12-py3

# Select the working directory
WORKDIR /app

# Setup image: install system dependencies etc.
# RUN apt-get update && apt-get install -y ...

# Install Python requirements
COPY ./requirements.txt ./requirements.txt
RUN pip install -r requirements.txt

You can follow these steps by cloning this repository:

git clone https://github.com/braun-steven/docker-stateless-ml.git
cd docker-stateless-ml

We start by building the docker image:

docker build -t tutorial .
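
To check that the image is now available locally, we can list it by name (the tag tutorial is the one we chose above):

docker image ls tutorial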

To test if everything works, we can now start a container using this image:

docker run --gpus all -it --rm tutorial python -c "print('Hello World from docker')"

Flags:

  • --gpus all: Give the container access to all GPUs on the host machine. Note that this flag requires the nvidia-container-runtime; a quick GPU check follows below.
  • -it: Run the container with an interactive terminal attached, so we see the output directly.
  • --rm: Remove the container once it stops, since we do not care about the container state.
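
As that GPU check, we can run nvidia-smi inside the container; assuming the nvidia-container-runtime mentioned above is installed, the GPUs of the host should be listed:

docker run --gpus all --rm tutorial nvidia-smi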

To make use of our project code, read the required data, and store our results, we need to mount volumes into the container. We can do this with the --volume flag. The following runs the repository's example script src/run.py, reading data from data/ and writing results to results/:

docker run --gpus all --rm \
    --volume "$(pwd)"/src:/app/src \
    --volume "$(pwd)"/data:/data \
    --volume "$(pwd)"/results:/results \
    tutorial \
    python /app/src/run.py

Output:

[...]  # truncated

PyTorch Version: 2.0.1+cu117
CUDA Available: True
CUDA Version: 11.7
CuDNN Version: 8500
CUDA Device Name: Tesla V100-SXM3-32GB-H
Number of CUDA Devices Available: 16
Current CUDA Device Index: 0
Reading /data/samples.csv
Writing /results/sums.csv
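
Since this command is rather long, it can be convenient to wrap it in a small shell script. A minimal sketch (the file name run.sh and the argument forwarding are our own convention, not part of the repository):

#!/usr/bin/env bash
# run.sh: hypothetical wrapper around the stateless docker run invocation
set -euo pipefail

docker run --gpus all --rm \
    --volume "$(pwd)"/src:/app/src \
    --volume "$(pwd)"/data:/data \
    --volume "$(pwd)"/results:/results \
    tutorial \
    "$@"

Running ./run.sh python /app/src/run.py then reproduces the call above.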

We can now inspect the saved result on the host (i.e., outside the docker container) in results/sums.csv:

$ cat results/sums.csv
3.0
12.0
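
If we ever need to poke around interactively, e.g. for debugging, we can start a throwaway shell with the same mounts; anything we change inside the container is discarded on exit, so the setup stays stateless:

docker run --gpus all -it --rm \
    --volume "$(pwd)"/src:/app/src \
    --volume "$(pwd)"/data:/data \
    --volume "$(pwd)"/results:/results \
    tutorial \
    bash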

With this, we have successfully used docker to thinly wrap our project in an environment defined in our Dockerfile. To reproduce our experiment, we only need to ensure that a different server has the project code, the data, and a build of the docker image (docker build -t tutorial .). Bonus points if we are able to synchronize or symlink the project, data, and result directories across servers via some shared storage.
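
One simple way to earn those bonus points is rsync over SSH. A sketch, assuming a second machine reachable as other-server with the same directory layout (the host name and remote path are placeholders):

# push the code, data, and build context to the other server
rsync -avz ./Dockerfile ./requirements.txt ./src ./data other-server:docker-stateless-ml/

# after running the experiment there, pull the results back
rsync -avz other-server:docker-stateless-ml/results/ ./results/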
