containerd / cri

Moved to https://github.com/containerd/containerd/tree/master/pkg/cri . If you wish to submit issues/PRs, please submit to https://github.com/containerd/containerd

Home Page: https://github.com/containerd/containerd/tree/master/pkg/cri

License: Apache License 2.0

Go 92.44% Makefile 0.77% Shell 6.45% Dockerfile 0.13% PowerShell 0.21%
containerd container-runtime-interface kubernetes hacktoberfest

cri's Issues

Create a Makefile

We need a Makefile to manage the build, test, package and release process.

Create permanent network namespace for sandbox.

Currently, the network teardown logic assumes that once the network namespace is gone, we don't need to tear down the network again.

However, this doesn't work well with the current implementation of cri-containerd. Currently, cri-containerd creates a sandbox container for each sandbox and uses the network namespace of this container. The problem is that once the container dies, we are no longer able to find the network namespace through /proc/${PID}/ns/net.

So with the current logic, once the sandbox container dies by itself, we cannot find the network namespace and therefore never tear down the network for it. This causes a resource leak.

We need to maintain a permanent network namespace for each sandbox, which is similar to cri-o.
We should remove the permanent network namespace as soon as we successfully tear down the network, so as to avoid multiple teardowns of the same network in most cases.

/cc @xlgao-zju

Support Privileged.

Privileged is initially a Docker concept. Docker grants a group of permissions and mounts a group of directories into a privileged pod.

In Kubernetes, we should define our own notion of Privileged (kubernetes/kubernetes#44503). For now, we could do whatever docker is doing.

Things to do:

  1. Figure out what docker does for a Privileged container.
  2. Do the same in cri-containerd.

/cc @heartlock

Enable Travis CI

We should enable Travis CI to run the presubmit build and unit tests.

The parameters of InitCNI should be filled in reverse order in NewCRIContainerdService function

In InitCNI, the first parameter "pluginDir" is the path of the CNI config files.
"cniDirs" are the paths of the CNI binaries.

func NewCRIContainerdService(containerdEndpoint, rootDir, networkPluginBinDir, networkPluginConfDir,
	streamAddress, streamPort string) (CRIContainerdService, error) {
	......
	netPlugin, err := ocicni.InitCNI(networkPluginBinDir, networkPluginConfDir) // <-- here
	

func InitCNI(pluginDir string, cniDirs ...string) (CNIPlugin, error) {
	plugin := probeNetworkPluginsWithVendorCNIDirPrefix(pluginDir, cniDirs, "")
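
To make the mix-up concrete, here is a minimal sketch. `initCNI` and `cniDirs` are hypothetical stand-ins that only copy the parameter shape of the InitCNI signature quoted above, and the two paths are illustrative values, not taken from the repo. Because both arguments are plain strings, the swapped call compiles without complaint.

```go
package main

import "fmt"

// cniDirs is a hypothetical result type used only for this illustration;
// it is not part of ocicni.
type cniDirs struct {
	ConfDir string   // directory holding CNI network config files
	BinDirs []string // directories holding CNI plugin binaries
}

// initCNI is a stand-in with the same parameter shape as the InitCNI
// signature quoted in the issue: config dir first, then binary dirs.
func initCNI(pluginDir string, binDirs ...string) cniDirs {
	return cniDirs{ConfDir: pluginDir, BinDirs: binDirs}
}

func main() {
	// Illustrative paths only.
	networkPluginBinDir := "/opt/cni/bin"
	networkPluginConfDir := "/etc/cni/net.d"

	// Compiles, but the roles of the two directories are reversed.
	swapped := initCNI(networkPluginBinDir, networkPluginConfDir)
	// Config dir first, binary dir second.
	fixed := initCNI(networkPluginConfDir, networkPluginBinDir)

	fmt.Printf("swapped: %+v\n", swapped)
	fmt.Printf("fixed:   %+v\n", fixed)
}
```

Since the compiler cannot catch this, a unit test on the call site or distinct named types for the two directories would be one way to guard against it.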

Use `repo-infra` when it is ready for our use case.

https://github.com/kubernetes/repo-infra is a great tool.

For now, we have copied the boilerplate verification part into the repo, but removed the other Go code verification.

It would be better to consolidate this in the future and only use repo-infra when it is ready for our use case.

Several related issues:

  1. kubernetes/repo-infra#15
  2. kubernetes/repo-infra#12
  3. We don't need the build part now.
  4. Subtreeing the repo causes a DCO problem.
  • If we squash commits, @mikebrow found that he could not sign the squashed commit.
  • If we don't squash, the history will be quite spammy.

Add Container Metrics Support

According to CRI, the container runtime is responsible for providing container metrics, including:

  • container CPU usage.
  • container memory usage.
  • container writable layer size.
  • Image FS usage.

See here

Containerd should already provide all of this information:

  • container CPU/memory usage: Provided through Prometheus metrics endpoint.
  • container writable layer size: Provided by snapshotter.
  • Image FS usage:
    • May need to count size of each snapshot;
    • May need to consider content store.

We should support these container metrics.

/cc @kubernetes-incubator/maintainers-cri-containerd

CRI-Containerd Missing Pieces

cri-containerd already supports basic container/sandbox lifecycle and image management today. However, there are still many missing pieces. They are listed here with relative priorities:

Missing Features

  • [P0] Privileged support. (Issue: #29, PR: #51)
  • [P0] Stop container with image default stop signal. (Issue: #61, PR: #83)
  • [P1] Container logging. (PR: #56)
  • [P1] ExecSync. ExecSync is very useful for testing, so prioritize it.
  • [P1] Pull image authentication. (Issue: #58, PR: #88)
  • [P1] Sandbox /etc/resolv.conf. (Issue: #28, PR: #50)
  • [P1] Sandbox /etc/hosts. (PR: #60)
  • [P1] Sandbox /dev/shm. (PR: #67)
  • [P1] Device mapping. (PR: #51)
  • [P1] Set user/username. (PR: #146, #168)
  • [P1] ExecSync timeout. (PR: #137)
  • [P2] SELinux options/label. (PR: #157)
  • [P2] Container streaming.
    • Exec (PR: #115)
    • Attach
    • Portforward (PR: #130)
  • [P2] Support systemd cgroup. (PR: #290)
  • [P3] Image list filter.
  • [P2] CRI conformance. Stop running container when stopping sandbox, remove container when removing sandbox etc. (PR: #77)
  • [P2] OOM Event. Handle containerd event, and set container status exit reason correspondingly. (PR: #91)
  • [P2] AppArmor. (PR: #159)
  • [P2] Seccomp. (PR: #219)
  • [P2] Sysctl (PR: #119).
  • [P2] Other pod sandbox security context (user/selinux etc.). Figure out what this means to sandbox container.
  • [P2] Container metrics. (PR: #265)
  • [P2] Image filesystem metrics. (PR: #257)
  • [P2] Container manager. Add container manager to ensure containerd and cri-containerd are in runtime cgroup. (Issue: #181, PR: containerd/containerd#1443, #184)
  • [P2] Host port. (PR: #154)

Improvements

  • [P0] Switch to containerd api. This includes changing the implementation and updating unit tests (adding a mock client etc.). (Issue: #49)
  • [P1] Switch to new containerd client. (PR: #113)
  • [P1] Refactor metadata store. (PR: #66)
    • Add pure in-memory cache for image management;
    • Add in-memory wrapper for sandbox/container metadata store, because there are several things which don't need to be checkpointed.
    • Reconsider what we should store in container labels.
  • [P1] Create permanent network namespace. (Issue: #43, PR: #54)
  • [P2] Add unit test for image. With new containerd client, it's much easier to add unit test for image part. (Issue: #36)
  • [P3] Check image config and top level snapshot existence when list/Inspect image. (Issue: containerd/containerd#1514, #303)
  • [P1] Default sandbox container resource limit. (PR: #92)
  • [P3] Add truncindex for image and container id. (PR: #235)
  • [P2] Reliable containerd event handling. Requeue event on error. (#628)
  • [P2] Checkpoint versioning. (Metadata and status are all versioned now)
  • [P2] Handle recovery from cri-containerd and containerd restart.

Containerd Missing Features

  • [P0] V2 Schema 1 image support. (Issue: #35, containerd/containerd#851)
  • [P1] Image (content/snapshot) garbage collection.
  • [P1] Containerd version. Containerd doesn't report semver because of a bug.

Cleanup logs and errors.

We should clean up logs and errors:

  1. We should reconsider log levels; level 4 for list/status operations is too spammy;
  2. We should make sure we are using %+v for structs, %q for strings, and %v for errors;
  3. We should make sure necessary information is included in the log/error, e.g. container/sandbox/image id.
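
As a small illustration of point 2, the sketch below formats one message with the three verbs. `containerRef`, `describeFailure` and `errNoImage` are hypothetical names made up for this example and do not come from the cri-containerd code base.

```go
package main

import (
	"errors"
	"fmt"
)

// containerRef is a hypothetical struct used only for this illustration.
type containerRef struct {
	ID   string
	Name string
}

// errNoImage is a sample error for the demo.
var errNoImage = errors.New("no such image")

// describeFailure builds a message using %+v for the struct (field names
// are printed), %q for the string (quoted, so empty or odd values stay
// visible), and %v for the error.
func describeFailure(c containerRef, image string, err error) string {
	return fmt.Sprintf("failed to start container %+v with image %q: %v", c, image, err)
}

func main() {
	c := containerRef{ID: "abc123", Name: "web"}
	fmt.Println(describeFailure(c, "busybox:latest", errNoImage))
	// prints: failed to start container {ID:abc123 Name:web} with image "busybox:latest": no such image
}
```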

/cc @mikebrow

Cleanup the image management code to use containerd client

After #113, cri-containerd will be using the containerd client.

However, there is still some image-specific code that needs to be cleaned up, e.g. getImageInfo, localResolve etc.

We should add the required functions to the containerd client and clean those functions up.

/cc @kubernetes-incubator/maintainers-cri-containerd
@abhinandanpb

We need fake containerd services.

To better unit test the functions of cri-containerd, we need a group of fake containerd services (similar to the fake docker client).

Containerd now has 4 kinds of services:

  • content
  • execution
  • images
  • rootfs

We can add a fake execution service first, because the execution api is more stable, and add the others later when image management code is added and needs tests.
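
A rough sketch of what such a fake could look like follows. The `executionService` interface here is a hypothetical, heavily simplified stand-in, since the real containerd execution api has a different (gRPC) shape. The point is only the pattern: record every call and allow error injection per method, so tests can check both the options passed and the error-handling path.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a sample error used by the fake.
var errNotFound = errors.New("task not found")

// executionService is a hypothetical, simplified stand-in for the
// containerd execution api; the real service looks different.
type executionService interface {
	Start(id string) error
	Delete(id string) error
}

// fakeExecutionService records calls and can return injected errors.
// It is not safe for concurrent use; a real fake would need a mutex.
type fakeExecutionService struct {
	calls   []string         // ordered call log, e.g. "Start:c1"
	errs    map[string]error // method name -> error to return
	running map[string]bool  // ids that have been started
}

func newFakeExecutionService() *fakeExecutionService {
	return &fakeExecutionService{errs: map[string]error{}, running: map[string]bool{}}
}

// InjectError makes the named method fail with err until it is cleared.
func (f *fakeExecutionService) InjectError(method string, err error) {
	f.errs[method] = err
}

func (f *fakeExecutionService) Start(id string) error {
	f.calls = append(f.calls, "Start:"+id)
	if err := f.errs["Start"]; err != nil {
		return err
	}
	f.running[id] = true
	return nil
}

func (f *fakeExecutionService) Delete(id string) error {
	f.calls = append(f.calls, "Delete:"+id)
	if err := f.errs["Delete"]; err != nil {
		return err
	}
	if !f.running[id] {
		return errNotFound
	}
	delete(f.running, id)
	return nil
}

func main() {
	fake := newFakeExecutionService()
	var svc executionService = fake // code under test only sees the interface

	fmt.Println("start:", svc.Start("c1"))
	fake.InjectError("Delete", errNotFound)
	fmt.Println("delete:", svc.Delete("c1"))
	fmt.Println("calls:", fake.calls)
}
```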

/cc @kubernetes-incubator/maintainers-cri-containerd
@heartlock Are you interested in this? This is a good start to get familiar with containerd and the code base.

Use context in the right way.

Currently, we pass the grpc handler context to the containerd client directly.

Once the context is cancelled by the client, no function calls can be made to the containerd client.

The good thing is that we avoid proceeding when the user wants to cancel the request. The bad thing is that the cleanup functions in defer will not be able to run, either.

We note the problem here so that we can revisit it in the future.
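
One possible direction, sketched here under assumptions and not taken from the cri-containerd code: run the deferred cleanup with a fresh context derived from context.Background(), bounded by a timeout, so that cancellation of the request context does not block it. `runWithCleanup`, `cleanupTimeout` and the helpers are hypothetical names, and the 5 second bound is arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// cleanupTimeout is an assumed upper bound for cleanup work.
const cleanupTimeout = 5 * time.Second

// runWithCleanup runs work with the request context, then always runs
// cleanup with a separate context that the caller cannot cancel.
func runWithCleanup(ctx context.Context, work, cleanup func(context.Context) error) (workErr, cleanupErr error) {
	defer func() {
		cleanupCtx, cancel := context.WithTimeout(context.Background(), cleanupTimeout)
		defer cancel()
		cleanupErr = cleanup(cleanupCtx)
	}()
	workErr = work(ctx)
	return workErr, nil
}

// cancelledContext returns a context that is already cancelled, which
// simulates a client that gave up on the request.
func cancelledContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

// ctxErr stands in for a client call that fails once its context is done.
func ctxErr(ctx context.Context) error {
	return ctx.Err()
}

func main() {
	workErr, cleanupErr := runWithCleanup(cancelledContext(), ctxErr, ctxErr)
	fmt.Println("work error:   ", workErr)
	fmt.Println("cleanup error:", cleanupErr)
}
```

The trade-off is that a context built from context.Background() also drops any values carried by the request context. Since Go 1.21, context.WithoutCancel can keep those values while detaching cancellation.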

Add `--version` flag.

Add cri-containerd version.

We should version cri-containerd based on the repo tag and commit number. The version will be used:

  1. As part of the name when building a release package or any docker image in the future.
  2. As part of --version output.

Refer to kubernetes/node-problem-detector#71.
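
A common way to do this in Go is to stamp a package variable at link time. The sketch below assumes a `version` variable in package main and a hypothetical `run` helper; the exact output wording and the git describe invocation in the comment are illustrative choices, not the project's.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// version is meant to be overridden at build time, for example:
//
//	go build -ldflags "-X main.version=$(git describe --tags --always --dirty)"
//
// The default is used for builds that do not set it.
var version = "unknown"

// run parses args and returns the line the program would print.
func run(args []string) (string, error) {
	fs := flag.NewFlagSet("cri-containerd", flag.ContinueOnError)
	showVersion := fs.Bool("version", false, "print version and exit")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *showVersion {
		return "cri-containerd version " + version, nil
	}
	return "cri-containerd starting (version " + version + ")", nil
}

func main() {
	out, err := run(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println(out)
}
```

The same variable could then be reused when naming release packages or image tags.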

Support schema 1 manifest.

Currently, containerd only supports schema 2 manifests.

For cri-containerd, not supporting schema 1 means:

  1. Users who are still using schema 1 images cannot use the cri-containerd integration.
  2. Some system container images are still using schema 1 manifests, e.g. the pause image.

Regarding 1), it is fine for our alpha release (this quarter), but it should be fixed before beta (probably the end of the year).

For 2), we need to rebuild our schema 1 system container images.

Ref containerd/containerd#851.

Add support for annotations and labels

generateSandboxContainerSpec() currently filters out all annotations received through the CRI api instead of copying them to the runtime spec. Consider whether some of the annotations should be filtered or whether all should be copied over. For example:

annotations := config.GetAnnotations()
for key, value := range annotations {
	g.AddAnnotation(key, value)
}

Similarly, labels received over CRI are not being processed / handed over to containerd.

Consider the patterns that kubernetes enables or uses when connected to dockershim and CRI-O: should we differ here?

Integrate with kubernetes network plugin.

CRI expects the runtime to handle container networking.

As expected, containerd itself doesn't provide container networking. We should integrate cri-containerd with the kubernetes network plugin to initialize the network namespace.

Things to do:

  1. Add flags for network plugin settings and initialize network plugin in cri-containerd. (See https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_service.go#L197-L204)
  2. Implement the interfaces needed by the network plugin, including namespaceGetter and portMappingGetter. (See https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_service.go#L119-L137)
  3. Set up and tear down the network when running/stopping a pod sandbox. (See https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go#L128)

/cc @xlgao-zju

Get rid of nsenter and socat

Currently we are using nsenter for both the CNI plugin and port forwarding.

Ideally, we should be able to use NetNS.Do to run code inside a specific network namespace.

However, golang has an issue where it may spawn new threads from a locked thread, and those threads inherit the network namespace (golang/go#20676).

We should be able to get rid of all nsenter calls after this is fixed.

Error happened when run `make lint`

An error occurred when running make lint:

root@ubuntu:/home/vcap/go-project/src/github.com/kubernetes-incubator/cri-containerd# make lint
checking lint
for directory ./cmd ...
for directory ./cmd/cri-containerd ...
WARNING: deadline exceeded by linter interfacer on ./cmd/cri-containerd (try increasing --deadline)
WARNING: deadline exceeded by linter gosimple on ./cmd/cri-containerd (try increasing --deadline)
WARNING: deadline exceeded by linter structcheck on ./cmd/cri-containerd (try increasing --deadline)
Makefile:31: recipe for target 'lint' failed
make: *** [lint] Error 2

The check is so slow that it exceeds the deadline.
@mikebrow @Random-Liu

Add unit test back

Many of our unit tests rely on fake containerd services. Unit tests do give us some value:

  1. Make sure the options we pass to containerd are as expected;
  2. Make sure the workflow is as expected, especially for error handling.

However, containerd is still in alpha; its maintainers are still getting feedback and may change the api again and again, so keeping the fake services up to date is painful and a bit of a waste of effort.

We decided to temporarily remove the unit tests which use fake services, and rely on the integration tests (CRI validation test and, soon, node e2e test) for now.

We'll add the unit tests back after:

  1. Containerd api is beta or GA.
  2. cri-containerd starts to use containerd client.

The last commit before the unit tests were removed is cbd936b.

@mikebrow @yujuhong
/cc @stevvooe @crosbymichael

Developer Guide

It's too early to add a developer guide, but we should add one in the future when the build and development process is finalized.

Checkpoint and restart recovery

There are several restart recovery problems with the current cri-containerd:

  1. cri-containerd restart. Because cri-containerd maintains all internal state in memory, including the sandbox list, container list and image list, all state is lost once it restarts.
  2. containerd restart. When containerd restarts and reconnects, there may be a state mismatch between containerd and cri-containerd, e.g. a container dies while containerd is down.

To fix this, we should recover/reconcile state when cri-containerd starts or after containerd restarts and reconnects.

There are 3 kinds of internal state:

  1. Image list: Containerd has all the information we need; we just need to list images from containerd and recover the image list.
  2. Sandbox/container metadata: Most of the metadata is not provided by containerd, so we need to checkpoint it for restart recovery. However, because metadata is constant, we could save it in containerd container labels and let the containerd metadata store persist it for us.
  3. Container status: Container status is not persisted by containerd, so we need to persist it ourselves. And because it's constantly changing, we may not want to abuse containerd container labels to save it. So we need to maintain its checkpoint ourselves.
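
For the third kind of state, a self-maintained checkpoint could look roughly like the sketch below. Everything here is an assumption for illustration: the `containerStatus` fields, the `v1` version tag, the JSON encoding, and the file layout are made up, not the project's actual format. The write goes to a temporary file first and is then renamed, so a crash mid-write does not leave a truncated checkpoint behind.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// statusVersion is a hypothetical format tag for the checkpoint.
const statusVersion = "v1"

// containerStatus is a hypothetical, minimal status record.
type containerStatus struct {
	Version  string `json:"version"`
	State    string `json:"state"`
	ExitCode int32  `json:"exitCode"`
}

// saveStatus writes the status to a temp file in the same directory and
// renames it over path, so readers never see a partial file.
func saveStatus(path string, s containerStatus) error {
	s.Version = statusVersion
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".status-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadStatus reads a checkpoint and rejects unknown versions.
func loadStatus(path string) (containerStatus, error) {
	var s containerStatus
	data, err := os.ReadFile(path)
	if err != nil {
		return s, err
	}
	if err := json.Unmarshal(data, &s); err != nil {
		return s, err
	}
	if s.Version != statusVersion {
		return s, fmt.Errorf("unsupported checkpoint version %q", s.Version)
	}
	return s, nil
}

// demo saves and reloads a status in a throwaway directory.
func demo() (containerStatus, error) {
	dir, err := os.MkdirTemp("", "checkpoint-demo")
	if err != nil {
		return containerStatus{}, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "status")
	if err := saveStatus(path, containerStatus{State: "exited", ExitCode: 137}); err != nil {
		return containerStatus{}, err
	}
	return loadStatus(path)
}

func main() {
	s, err := demo()
	fmt.Printf("%+v %v\n", s, err)
}
```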

/cc @kubernetes-incubator/maintainers-cri-containerd

Maintain resolv.conf for pod.

Previously, Docker maintained a resolv.conf for containers sharing the same network namespace.

Now, when using containerd, we need to do this ourselves. We need to:

  1. Maintain a resolv.conf when creating/removing a sandbox if not using HostNetwork, and initialize resolv.conf with DNSOptions in PodSandboxConfig.
  2. Bind mount the resolv.conf into each container.

This logic is similar to cri-o. (See https://github.com/kubernetes-incubator/cri-o/blob/master/server/sandbox_run.go#L144-L159)
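
Generating the file content for step 1 might look like the sketch below. It assumes the sandbox DNS settings carry servers, search domains and options; `dnsConfig` is a hypothetical local type standing in for the CRI message, and `resolvConfContent` is a made-up helper name. Writing the file and the bind mount itself are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// dnsConfig is a hypothetical local stand-in for the DNS settings that
// arrive in the pod sandbox config.
type dnsConfig struct {
	Servers  []string
	Searches []string
	Options  []string
}

// resolvConfContent renders the settings in resolv.conf syntax: one
// "search" line, one "nameserver" line per server, one "options" line.
// Empty sections are skipped.
func resolvConfContent(c dnsConfig) string {
	var b strings.Builder
	if len(c.Searches) > 0 {
		b.WriteString("search " + strings.Join(c.Searches, " ") + "\n")
	}
	for _, s := range c.Servers {
		b.WriteString("nameserver " + s + "\n")
	}
	if len(c.Options) > 0 {
		b.WriteString("options " + strings.Join(c.Options, " ") + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(resolvConfContent(dnsConfig{
		Servers:  []string{"10.0.0.10"},
		Searches: []string{"default.svc.cluster.local", "svc.cluster.local"},
		Options:  []string{"ndots:5"},
	}))
}
```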

/cc @xlgao-zju

Disable Pid namespace sharing.

Kubernetes enabled Pid namespace sharing inside a pod with Docker 1.13+.

However, some customers reported that they could not run their containers, because the containers made assumptions about pid 1.

Until this issue is resolved, we cannot use pid namespace sharing in production. In cri-containerd, we should also disable it by default, and probably introduce a flag to enable it.

Rename `container` to `task`

The previous container concept in containerd has been renamed to task.
In the refactoring PR, I've changed most occurrences to task.

However, some are left; we should clean them up.

CRI Containerd Integration TODOs

Required Internal Services:

  • [P0] Metadata store. A metadata store is required to store metadata and system state #9.
    • [P0] In-memory metadata store #9.
    • [P1] On-disk metadata store. (file-based or db-based)
  • [P1] Container manager. A container manager keeps containerd running in runtime cgroup. (See dockershim container manager)

Container Runtime Interface Functions:

Use new execution api.

The execution api of containerd was just updated in containerd/containerd#894.

This is a good move for us: we could potentially checkpoint state-independent data into the container labels, like what we are doing for docker today.

However, we still need to checkpoint the state-dependent data, such as exit status, container state, and timestamps.

We should update the containerd version and use the new execution api. This basically involves:

  1. Update the containerd version to containerd/containerd#894.
  2. Update the current implementation to use the new execution api.
  3. Write a fake client for the new execution api and add unit tests.

Add unit test for image management code.

The image management code was merged in #21.

However, we don't have unit tests for it, because:

  1. The fake containerd service is not ready.
  2. Containerd is still changing its image-side api.

We should:

  1. Add as many unit tests as we can for now, e.g. the current ListImages and ImageStatus are purely based on the metadata store, so they are testable.
  2. Add unit tests with the fake containerd service when containerd stabilizes its api and #20 is finished.
