
nvidia / container-canary


A tool for testing and validating container requirements against versioned manifests

License: Apache License 2.0

Makefile 4.35% Go 95.65%
containers docker podman ci versioning utilities kubernetes automation

container-canary's Introduction

Container Canary


A little bird to validate your container images.

$ canary validate --file examples/awesome.yaml your/container:latest
Validating your/container:latest against awesome
 📦 Required packages are installed                  [passed]
 🤖 Expected services are running                    [passed]
 🎉 Your container is awesome                        [passed]
validation passed

Many modern compute platforms support bring-your-own-container models, where the user can provide container images with their custom software environment. However, platforms commonly have a set of requirements that the container must conform to, such as using a non-root user, having the home directory in a specific location, having certain packages installed, or running web applications on specific ports.

Container Canary is a tool for recording those requirements as a manifest that can be versioned and then validating containers against that manifest. This is particularly useful in CI environments to avoid regressions in containers.

Installation

You can find binaries and instructions on our releases page.

Example (Kubeflow)

The Kubeflow documentation has a list of requirements for container images that can be used in the Kubeflow Notebooks service.

That list looks like this:

  • expose an HTTP interface on port 8888:
    • kubeflow sets an environment variable NB_PREFIX at runtime with the URL path we expect the container to be listening under
    • kubeflow uses IFrames, so ensure your application sets Access-Control-Allow-Origin: * in HTTP response headers
  • run as a user called jovyan:
    • the home directory of jovyan should be /home/jovyan
    • the UID of jovyan should be 1000
  • start successfully with an empty PVC mounted at /home/jovyan:
    • kubeflow mounts a PVC at /home/jovyan to keep state across Pod restarts

With Container Canary, we can write this list as the following YAML spec.

# examples/kubeflow.yaml
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: kubeflow
description: Kubeflow notebooks
env:
  - name: NB_PREFIX
    value: /hub/jovyan/
ports:
  - port: 8888
    protocol: TCP
volumes:
  - mountPath: /home/jovyan
checks:
  - name: user
    description: 👩 User is jovyan
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "[ $(whoami) = jovyan ]"
  - name: uid
    description: 🆔 User ID is 1000
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "id | grep uid=1000"
  - name: home
    description: 🏠 Home directory is /home/jovyan
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "[ $HOME = /home/jovyan ]"
  - name: http
    description: 🌐 Exposes an HTTP interface on port 8888
    probe:
      httpGet:
        path: /
        port: 8888
      initialDelaySeconds: 10
  - name: NB_PREFIX
    description: 🧭 Correctly routes the NB_PREFIX
    probe:
      httpGet:
        path: /hub/jovyan/lab
        port: 8888
      initialDelaySeconds: 10
  - name: allow-origin-all
    description: "🔓 Sets 'Access-Control-Allow-Origin: *' header"
    probe:
      httpGet:
        path: /
        port: 8888
        responseHttpHeaders:
          - name: Access-Control-Allow-Origin
            value: "*"
      initialDelaySeconds: 10

The Canary Validator spec reuses parts of the Kubernetes configuration API, including probes. In Kubernetes, probes are used to check the health of a pod, but in Container Canary we use them to validate whether the container meets our specification.

We can then run our specification against any container image to see a pass/fail breakdown of requirements. Here we test one of the default images that ship with Kubeflow, which should pass.

$ canary validate --file examples/kubeflow.yaml public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-scipy:v1.5.0-rc.1
Validating public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-scipy:v1.5.0-rc.1 against kubeflow
 👩 User is jovyan                                   [passed]
 🆔 User ID is 1000                                  [passed]
 🏠 Home directory is /home/jovyan                   [passed]
 🌐 Exposes an HTTP interface on port 8888           [passed]
 🧭 Correctly routes the NB_PREFIX                   [passed]
 🔓 Sets 'Access-Control-Allow-Origin: *' header     [passed]
validation passed

For more examples, see the examples directory.

Validator reference

Validator manifests are YAML files that describe how to validate a container image. Check out the examples directory for real-world applications.

Metadata

Each manifest starts with some metadata.

# Manifest versioning
apiVersion: container-canary.nvidia.com/v1
kind: Validator

# Metadata
name: foo  # The name of the platform that this manifest validates for
description: Foo runs containers for you  # A description of that platform
documentation: https://example.com  # A link to the documentation that defines the container requirements in prose
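As a minimal Go sketch of how a tool might model this header, the example below rejects manifests whose `apiVersion` or `kind` differ from the values used throughout this README. The `Metadata` type, the `checkHeader` function, and the rule that `name` must be non-empty are hypothetical illustrations, not Container Canary's actual internals.

```go
package main

import (
	"errors"
	"fmt"
)

// Metadata mirrors the header fields shown above.
// Hypothetical sketch; not Container Canary's real data model.
type Metadata struct {
	APIVersion    string
	Kind          string
	Name          string
	Description   string
	Documentation string
}

// checkHeader rejects manifests whose versioning fields do not match the
// values used in this README. Requiring a non-empty name is an assumption.
func checkHeader(m Metadata) error {
	if m.APIVersion != "container-canary.nvidia.com/v1" {
		return fmt.Errorf("unsupported apiVersion %q", m.APIVersion)
	}
	if m.Kind != "Validator" {
		return fmt.Errorf("unsupported kind %q", m.Kind)
	}
	if m.Name == "" {
		return errors.New("name must not be empty")
	}
	return nil
}

func main() {
	m := Metadata{
		APIVersion:    "container-canary.nvidia.com/v1",
		Kind:          "Validator",
		Name:          "foo",
		Description:   "Foo runs containers for you",
		Documentation: "https://example.com",
	}
	fmt.Println(checkHeader(m)) // prints <nil>
}
```

Versioning the header this way lets a tool refuse manifests written for a schema it does not understand instead of silently misreading them.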

Runtime options

Next, you can set runtime configuration for the container you are validating. Set these to mimic the environment that the compute platform will create. When you validate a container, it is run locally using Docker.

Environment variables

A list of environment variables that should be set on the container.

env:
  - name: HELLO
    value: world
  - name: FOO
    value: bar

Ports

Ports that need to be exposed on the container. These need to be configured in order for Container Canary to perform connectivity tests.

ports:
  - port: 8888
    protocol: TCP

Volumes

Volumes to be mounted to the container. This is useful if the compute platform will always mount an empty volume to a specific location.

volumes:
  - mountPath: /home/jovyan

Command

You can specify a custom command to be run inside the container.

command:
 - foo
 - --bar=true

Checks

Checks are the tests that we want to run against the container to ensure it is compliant. Each check contains a probe. Probes are a superset of the Kubernetes probes API, so any valid Kubernetes probe can be used in a check.

checks:
  - name: mycheck  # Name of the check
    description: Ensuring a thing  # Description of what is being checked (will be used in output)
    probe:
      ...  # A probe to run
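To make the shape of a check concrete, here is a hypothetical Go model of the fields used in this reference. The type names and the `validate` method are assumptions for illustration; the "exactly one handler" rule is borrowed from the Kubernetes probe API, and whether Container Canary enforces it this way is not stated here.

```go
package main

import "fmt"

// Hypothetical types, loosely modelled on the YAML fields in this README.
type ExecAction struct{ Command []string }

type HTTPHeader struct{ Name, Value string }

type HTTPGetAction struct {
	Path                string
	Port                int
	HTTPHeaders         []HTTPHeader // headers sent with the request
	ResponseHTTPHeaders []HTTPHeader // headers expected in the response
}

type TCPSocketAction struct{ Port int }

// Probe carries one handler plus Kubernetes-style timing fields.
type Probe struct {
	Exec                *ExecAction
	HTTPGet             *HTTPGetAction
	TCPSocket           *TCPSocketAction
	InitialDelaySeconds int
	TimeoutSeconds      int
	PeriodSeconds       int
	SuccessThreshold    int
	FailureThreshold    int
}

type Check struct {
	Name        string
	Description string // used in the validation output
	Probe       Probe
}

// validate applies the Kubernetes rule that a probe sets exactly one handler.
func (p Probe) validate() error {
	handlers := 0
	if p.Exec != nil {
		handlers++
	}
	if p.HTTPGet != nil {
		handlers++
	}
	if p.TCPSocket != nil {
		handlers++
	}
	if handlers != 1 {
		return fmt.Errorf("probe must set exactly one handler, found %d", handlers)
	}
	return nil
}

func main() {
	c := Check{
		Name:        "mycheck",
		Description: "Ensuring a thing",
		Probe:       Probe{Exec: &ExecAction{Command: []string{"/bin/sh", "-c", "true"}}},
	}
	fmt.Println(c.Name, c.Probe.validate()) // prints "mycheck <nil>"
}
```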

Exec

An exec check runs a command inside the running container. If the command exits with 0, the check passes.

checks:
  - name: uid
    description: User ID is 1234
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "id | grep uid=1234"

HTTPGet

An HTTPGet check performs an HTTP GET request against your container. If the response code is <300 and the optional response headers match, the check passes.

checks:
  - name: http
    description: Exposes an HTTP interface on port 80
    probe:
      httpGet:
        path: /
        port: 80
        httpHeaders:  # Optional, headers to set in the request
          - name: Foo-Header
            value: "myheader"
        responseHttpHeaders:  # Optional, headers that you expect to see in the response
          - name: Access-Control-Allow-Origin
            value: "*"

TCPSocket

A TCP Socket check will ensure something is listening on a specific TCP port.

checks:
  - name: tcp
    description: Is listening via TCP on port 80
    probe:
      tcpSocket:
        port: 80

Delays, timeouts, periods and thresholds

Checks also support the same delays, timeouts, periods and thresholds that Kubernetes probes do.

checks:
  - name: uid
    description: User ID is 1234
    probe:
      exec:
        command: [...]
      initialDelaySeconds: 0  # Delay after starting the container before the check should be run
      timeoutSeconds: 30  # Overall timeout for the check
      successThreshold: 1  # Number of times the check must pass before moving on
      failureThreshold: 1  # Number of times the check is allowed to fail before giving up
      periodSeconds: 1  # Interval between runs if thresholds are >1

Contributing

Contributions are very welcome; be sure to review the contribution guidelines.

Maintaining

Maintenance steps can be found here.

License

Apache License Version 2.0, see LICENSE.


container-canary's Issues

Add optional conditions

It would be useful to mark some conditions as optional and add a flag to enable them if desired.

For example in the databricks example folks may want Python OR R but not both. They may also want to check for CUDA toolkit for use on GPU nodes, but not everyone will want to test for that.

It would be nice to add an optional setting that still runs these checks but doesn't fail validation if they fail, along with a CLI flag that makes them mandatory.

E.g.

# foo.yaml
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: foo
command:
  - /bin/sh
  - -c
  - "sleep 3600"
checks:
  - name: bash
    description: Has bash installed
    optional: true
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "which bash"
$ canary validate --file foo.yaml --check-optional "bash" somecontainer
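The proposed semantics can be stated as a small Go function: a failing required check fails validation, while a failing optional check only does so when it is named via the proposed `--check-optional` flag. This is a hypothetical sketch of the proposal; neither the `optional` field nor the flag is part of the tool as documented above, and `Result`/`validationPassed` are illustrative names.

```go
package main

import "fmt"

type Result struct {
	Name     string
	Optional bool
	Passed   bool
}

// validationPassed fails on any failing required check, and on failing
// optional checks only if they were enforced by name.
func validationPassed(results []Result, enforced map[string]bool) bool {
	for _, r := range results {
		if r.Passed {
			continue
		}
		if !r.Optional || enforced[r.Name] {
			return false
		}
	}
	return true
}

func main() {
	results := []Result{
		{Name: "uid", Passed: true},
		{Name: "bash", Optional: true, Passed: false},
	}
	fmt.Println(validationPassed(results, nil))                           // prints "true"
	fmt.Println(validationPassed(results, map[string]bool{"bash": true})) // prints "false"
}
```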

Implement gRPC check

The gRPC liveness check is in alpha in Kubernetes v1.23 and behind a feature gate. Once it is not behind a gate it needs to be supported here in order for the checks to continue to be a superset of the Kubernetes probes.

It could also be implemented sooner and placed in a similar alpha state.

Container fails to start if port already in use

If canary tries to expose a container port for testing and that port is already in use the container fails to start and canary fails to validate.

Works

# check-port.yaml
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: check-port
description: Check port
env: []
ports:
  - port: 80
    protocol: TCP
volumes: []
checks:
  - name: http
    description: Check port 80
    probe:
      httpGet:
        path: /
        port: 80
      failureThreshold: 30
$ canary validate --file check-port.yaml nginx
Validating nginx against check-port
 Check port 80                                      [passed]
validation passed

Reproducer

$ docker run -p 80:80 nginx  # Start a process that binds to port 80 in another terminal
$ canary validate --file /tmp/test.yaml nginx
Validating nginx against check-port
\ Starting container
Error: container failed to start after 10 seconds

The container also doesn't get cleaned up.

$ docker ps -a             
CONTAINER ID   IMAGE                                  COMMAND                  CREATED         STATUS         PORTS                               NAMES
e8d32b8f45aa   nginx                                  "/docker-entrypoint.…"   2 minutes ago   Created                                            canary-runner-d43137e8

--debug flag causes panic

$ canary version                                              
Container Canary
 Version:         v0.2.1
 Go Version:      go1.17.8
 Commit:          d97ec23
 OS/Arch:         linux/amd64
 Built:           2022-04-14T10:03:44Z

$ canary validate --file examples/awesome.yaml ubuntu --debug     
Validating ubuntu against awesome
Running container with command 'docker run -d --name canary-runner-f716bacd ubuntu sleep 30'
 📦 Required packages are installed                  [passed]
 🤖 Expected services are running                    [passed]
 🎉 Your container is awesome                        [passed]
validation passed
Caught panic:

runtime error: invalid memory address or nil pointer dereference

Restoring terminal...

goroutine 1 [running]:
runtime/debug.Stack()
        /opt/hostedtoolcache/go/1.17.8/x64/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /opt/hostedtoolcache/go/1.17.8/x64/src/runtime/debug/stack.go:16 +0x19
github.com/charmbracelet/bubbletea.(*Program).StartReturningModel.func3()
        /home/runner/go/pkg/mod/github.com/charmbracelet/[email protected]/tea.go:359 +0x95
panic({0xb3e660, 0x11f1150})
        /opt/hostedtoolcache/go/1.17.8/x64/src/runtime/panic.go:1047 +0x266
github.com/nvidia/container-canary/internal/validator.model.View({0xc000048360, {0xd116d8, 0xc00035a000}, 0x1, {0xc0001dc8c0, 0x3, 0x4}, 0x1, {{{0x11f6e60, 0x4, ...}, ...}, ...}, ...})
        /home/runner/work/container-canary/container-canary/internal/validator/validator.go:184 +0x185
github.com/charmbracelet/bubbletea.(*Program).StartReturningModel(0xc0001b0200)
        /home/runner/go/pkg/mod/github.com/charmbracelet/[email protected]/tea.go:549 +0x1438
github.com/nvidia/container-canary/internal/validator.Validate({0x7fff7bb98156, 0x6}, {0x7fff7bb98140, 0x15}, 0xc0000dbdd0, 0x1)
        /home/runner/work/container-canary/container-canary/internal/validator/validator.go:239 +0x545
github.com/nvidia/container-canary/cmd.glob..func1(0x11f91c0, {0xc00009ac40, 0x1, 0x4})
        /home/runner/work/container-canary/container-canary/cmd/validate.go:50 +0xd1
github.com/spf13/cobra.(*Command).execute(0x11f91c0, {0xc00009ac00, 0x4, 0x4})
        /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x60e
github.com/spf13/cobra.(*Command).ExecuteC(0x11f8f40)
        /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
        /home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
github.com/nvidia/container-canary/cmd.Execute()
        /home/runner/work/container-canary/container-canary/cmd/root.go:44 +0x25
main.main()
        /home/runner/work/container-canary/container-canary/main.go:23 +0x17
Error: program returned unknown model

`tcpSocket` doesn't actually test TCP ports inside container

Consider the following:

#!/bin/sh

set -ex

cat > phony-tcp.Dockerfile <<EOF
FROM ubuntu:22.04

# It succeeds even without the EXPOSE command
# EXPOSE 8080

CMD /bin/bash -c 'while true; do sleep 60; done'
EOF

cat > phony-tcp.yaml <<EOF
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: phony-tcp
description: phony-tcp checks
ports:
  - port: 8080
    protocol: tcp
checks:
  - name: tcp
    probe:
      tcpSocket:
        port: 8080
EOF

docker build -t phony-tcp -f phony-tcp.Dockerfile .

container-canary validate --file phony-tcp.yaml phony-tcp

The check succeeds even though the container is clearly not listening to port 8080, because container-canary is connecting to the Docker proxy, rather than the actual process inside the container.

Unfortunately, I'm not sure how to actually fix this. We may have to simply issue a warning for this particular check.
