imagewolf's Introduction

ImageWolf - Fast Distribution of Docker Images on Clusters

ImageWolf is a PoC that provides a blazingly fast way to get Docker images loaded onto your cluster, allowing updates to be pushed out quicker.

ImageWolf works alongside existing registries such as the Docker Hub, Quay.io as well as self-hosted registries.

The PoC for ImageWolf uses the BitTorrent protocol to spread images around the cluster as they are pushed.

Video

  • Docker Swarm (asciicast)
  • Kubernetes (asciicast)

Getting Started

ImageWolf is currently alpha software and intended as a PoC - please don't run it in production!

Docker Swarm Mode

To start ImageWolf, run the following on your Swarm master:

docker network create -d overlay --attachable wolf
docker service create --network wolf --name imagewolf --mode global \
       --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock \
       containersol/imagewolf

The ImageWolf service is now running on all nodes in our cluster.

The next step is to link ImageWolf with a registry. Whenever an image is pushed to the registry, ImageWolf will immediately pull it and distribute it across all the nodes. To set up a private registry linked to ImageWolf:

# First find the id of the ImageWolf task running on this node
# This should work, but is a bit of a hack
TASK=$(docker ps -f name="imagewolf." --format '{{.ID}}')

# Configuration for the notification endpoint

export REGISTRY_NOTIFICATIONS_ENDPOINTS=$(cat <<EOF
    - name: imagewolf
      disabled: false
      url: http://${TASK}:8000/registryNotifications
EOF
)

# Start up a single instance of the registry
docker run -d --name registry-wolf --network wolf -p 5000:5000 -p 5001:5001 \
           -e REGISTRY_NOTIFICATIONS_ENDPOINTS \
           containersol/registry-wolf
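The task lookup above is described as a hack, and it can return nothing (ImageWolf not yet running on this node) or more than one ID. A minimal sketch of a guard that could run before the registry is started is shown below; the helper name require_single_id is hypothetical and not part of ImageWolf.

```shell
# Hypothetical helper (not part of ImageWolf): succeed only if the
# argument contains exactly one non-empty line, i.e. one container ID.
require_single_id() {
    ids="$1"
    # Count non-empty lines; grep exits non-zero on zero matches, hence "|| true".
    count=$(printf '%s\n' "$ids" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one ImageWolf container, found $count" >&2
        return 1
    fi
}

# Usage (requires Docker, so it is left as a comment):
#   TASK=$(docker ps -f name="imagewolf." --format '{{.ID}}')
#   require_single_id "$TASK" || exit 1
```

Failing early here avoids starting the registry with an empty or malformed notification URL.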

You can then push an image to the registry running on the local node:

docker tag redis localhost:5000/myimage
docker push localhost:5000/myimage

ImageWolf should immediately see the push and distribute the image to the other nodes. You can see what's going on by running docker service logs imagewolf.

We can now start another global service using this image:

# Use the image ID rather than the repository name to avoid problems with repo lookups
IMAGE_HASH=$(docker inspect --format '{{.Id}}' localhost:5000/myimage)
docker service create --name test-service --mode global "$IMAGE_HASH"

To monitor progress, either pass -d=false when creating the service or run docker service ps test-service. Note that a node will reject tasks until ImageWolf has finished loading the image onto it.
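Progress can also be summarised from a script. The sketch below counts how many tasks report a Running state, reading one state per line on stdin; the helper name count_running is hypothetical, and the usage line assumes the CurrentState placeholder of docker service ps.

```shell
# Hypothetical helper: read one task state per line on stdin and print
# how many of them start with "Running".
count_running() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    grep -c '^Running' || true
}

# Usage (requires a Swarm manager, so it is left as a comment):
#   docker service ps test-service --format '{{.CurrentState}}' | count_running
```

Comparing the result with the number of nodes gives a rough idea of how far the image has spread.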

Kubernetes

On Kubernetes, ImageWolf runs as a DaemonSet, which ensures that all (or some) nodes run a copy of a pod. As nodes are added to the cluster, pods are added to them. As nodes are removed from the cluster, those pods are garbage collected.

A headless Service then allows the ImageWolf pods to discover all of their peers.

To start ImageWolf, deploy the DaemonSet and its associated headless Service using the following command:

kubectl apply -f kubernetes.yaml

Testing:

# You may need to wait a few minutes before Kubernetes displays the service's public IP
export IMAGEWOLF_IP=$(kubectl get svc imagewolf --no-headers | awk '{print $3}')

# Simulate a Docker Hub webhook
curl "${IMAGEWOLF_IP}/hubNotifications" \
   -H 'Content-Type: application/json' \
   -d '{
      "push_data": {
         "tag": "latest"
      },
      "repository": {
         "repo_name": "redis"
      }
   }'

# Inspect the logs
kubectl logs -l app=imagewolf

# Check the stats
curl "${IMAGEWOLF_IP}/stats"
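To simulate pushes of other images without editing the JSON by hand, the request body can be generated. This is a minimal sketch: the helper name hub_payload is hypothetical, it emits only the two fields used in the curl example above, and it assumes the repository name and tag contain no quotes or backslashes (no JSON escaping is done).

```shell
# Hypothetical helper: print a minimal Docker Hub-style webhook body with
# the repository name ($1) and tag ($2, defaulting to "latest").
# Assumes neither value needs JSON escaping.
hub_payload() {
    repo="$1"
    tag="${2:-latest}"
    printf '{"push_data":{"tag":"%s"},"repository":{"repo_name":"%s"}}\n' "$tag" "$repo"
}

# Usage (assumes IMAGEWOLF_IP is set as above):
#   curl "${IMAGEWOLF_IP}/hubNotifications" \
#      -H 'Content-Type: application/json' \
#      -d "$(hub_payload redis latest)"
```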

Integration with Docker Hub

The Docker Hub has a webhooks feature which can be used to call a remote service when an image is pushed. When ImageWolf receives the callback, it will pull the image and distribute it across the cluster, which is significantly faster than all nodes pulling individually from the Docker Hub.

To use this feature, you will need to expose the ImageWolf service so that it is accessible to the Hub. This can be done by adding the flag -p 8000:8000 to the service create command. You can then add the URL or IP address of your server as a webhook, specifying hubNotifications as the path, e.g. http://mycluster.com/hubNotifications. If your cluster runs on an internal network, you can use a service such as ngrok to forward calls.

Stats

There are no hard numbers yet.

The real improvements are expected on large clusters, where multiple Docker engines pull images simultaneously. A ramped deployment may stop the "stampeding herd" from swamping the registry, but deployment times will still be longer: whenever a container is deployed to a node that lacks the image, a new pull takes place. With ImageWolf, the image will already be on the node and the container will start immediately.
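As a rough illustration of the registry-load argument (a simplified model, not a measurement), the sketch below compares how much data the registry has to serve when every node pulls directly with the case where the image is pulled once and shared between peers. The function name, the two mode names and the numbers in the usage comment are hypothetical; layer caching and partial pulls are ignored.

```shell
# Hypothetical model: registry egress in MB for a node count ($1),
# an image size in MB ($2) and a mode ($3).
#   direct: every node pulls the full image from the registry
#   seeded: the image is pulled once and peers share it among themselves
registry_egress_mb() {
    nodes="$1"; image_mb="$2"; mode="$3"
    case "$mode" in
        direct) echo $(( nodes * image_mb )) ;;
        seeded) echo "$image_mb" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Usage with made-up figures (50 nodes, 200 MB image):
#   registry_egress_mb 50 200 direct
#   registry_egress_mb 50 200 seeded
```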

Other Approaches

Using a global or distributed file system to back a Docker registry can also achieve many of the benefits of ImageWolf.

Multiarch

ImageWolf was tested on a Raspberry Pi cluster as well as on Google Cloud. You should find that the above instructions work identically on 32-bit ARM (armv7l) as well as x86_64 through the magic of multi-arch images.

Bugs & Improvements

ImageWolf is currently a PoC and there are a lot of rough edges:

  • Services have to be started using the Image ID to avoid repo pinning problems
  • No optimisations have been carried out
  • The internal use of the Docker CLI and sock is a bit hacky
  • If ImageWolf is still distributing the image when a service is created, nodes will attempt to pull from the registry at the same time as ImageWolf is pushing the image
  • Google Cloud Platform container registry webhooks (via Pub/Sub) are not yet supported

Assuming there is interest in ImageWolf, the next step will be to turn the hacked-together code into a coherent solution.

Feedback

This is a PoC. If it is useful or interesting to you, please get in touch and let us know.

imagewolf's People

Contributors

amouat, moredhel

imagewolf's Issues

Support image and node selection

It should be possible to define which images are pushed to nodes, e.g. all images with a "production" label. It would also be nice to push only to nodes with a given label, e.g. push this image just to the "web tier".

Image pinning problem

Currently there is a horrible hack to get nodes to use the local image where we explicitly use the digest. Need to find a better solution.

RSS feed support for clusters that don't allow inbound connections

Hi,

I have an idea. Not all k8s clusters allow incoming connections, which makes listening for registry events unusable. For those scenarios, there could be a service sitting next to the registry that listens for the registry events and generates an RSS feed from them. ImageWolf would subscribe to the feed and pull new images when they become available.

What do you think?

Docker load causes high CPU

The docker load calls cause high CPU and memory usage. I believe quayctl solves this problem by doing a docker pull from a temporary registry set up on the local node.

Pipe data through notary to docker load

If DOCKER_CONTENT_TRUST were set to 1, and notary were on the PATH (by building it into a child image of ImageWolf), I'd expect the nodes to verify the content through notary since docker load allegedly doesn't do that.

Support kubernetes

The current PoC only supports Swarm, but it should be easy to get working on k8s. The first step is figuring out how to get the list of peers.
