Issues from containerd's cri repository

Add a document about how to use crictl with cri-containerd.

crictl is a CLI that interacts with CRI directly for debugging purposes.

We should add a document that shows users and developers how to use crictl for debugging.

Create a Makefile

We need a Makefile to manage the build, test, package and release process.

Create permanent network namespace for sandbox.

Currently, the network teardown logic assumes that once the network namespace is gone, we don't need to tear down the network again.

However, this doesn't work well with the current implementation of cri-containerd. cri-containerd creates a sandbox container for each sandbox and uses the network namespace of this container. The problem is that once the container dies, we can no longer find the network namespace through /proc/${PID}/ns/net.

So with the current logic, once the sandbox container dies by itself, we cannot find the network namespace and therefore never tear down the network for it. This leaks resources.

We need to maintain a permanent network namespace for each sandbox, similar to cri-o.
We should remove the permanent network namespace as soon as we successfully tear down the network, so that in most cases we avoid tearing down the same network more than once.
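The "tear down once, then remove the namespace" rule could be sketched as follows. This is only an illustration under assumptions: teardownOnce and procNetNSPath are hypothetical helper names, and a regular temp file stands in for the bind-mounted namespace (a real permanent namespace would also need to be unmounted before removal).

```go
package main

import (
	"fmt"
	"os"
)

// procNetNSPath returns the namespace path derived from a process ID.
// This path stops resolving as soon as the process exits.
func procNetNSPath(pid int) string {
	return fmt.Sprintf("/proc/%d/ns/net", pid)
}

// teardownOnce (hypothetical helper) calls teardown only while the permanent
// namespace file at nsPath still exists, and removes the file after a
// successful teardown. It reports whether teardown actually ran.
func teardownOnce(nsPath string, teardown func(nsPath string) error) (bool, error) {
	if _, err := os.Stat(nsPath); err != nil {
		if os.IsNotExist(err) {
			return false, nil // Already torn down and removed.
		}
		return false, err
	}
	if err := teardown(nsPath); err != nil {
		return false, err // Keep the namespace so teardown can be retried.
	}
	if err := os.Remove(nsPath); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	// Assumption: a regular temp file stands in for the bind-mounted namespace.
	f, err := os.CreateTemp("", "netns-")
	if err != nil {
		panic(err)
	}
	f.Close()

	teardown := func(p string) error {
		fmt.Println("tearing down network for", p)
		return nil
	}
	for i := 0; i < 2; i++ {
		ran, err := teardownOnce(f.Name(), teardown)
		fmt.Println("teardown ran:", ran, "err:", err)
	}
	fmt.Println("per-process path:", procNetNSPath(os.Getpid()))
}
```

Because the existence of the permanent namespace is the only marker, a failed teardown leaves the namespace in place and can be retried later.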

/cc @xlgao-zju

Support Privileged.

Privileged is initially a Docker concept. Docker grants a group of permissions and mounts a group of directories into a privileged pod.

In Kubernetes, we should define our own Privileged kubernetes/kubernetes#44503. For now, we could do whatever docker is doing.

Things to do:

Figure out what docker does for a Privileged container.
Do the same in cri-containerd.

/cc @heartlock

boilerplate verification is needed.

The initial Makefile is merged in #7.

Currently, we already have lint and fmt verification. We need a boilerplate verification.

Ref https://github.com/kubernetes/repo-infra.

Enable Travis CI

We should enable Travis to run presubmit builds and unit tests.

Upgrade containerd api.

Containerd is still making changes to its API, e.g. containerd/containerd#1047, containerd/containerd#1062.

We should upgrade once more after they finalize the API.

Support UpdateContainerResources

A new CRI function UpdateContainerResources is added to update container resource constraints kubernetes/kubernetes#46105.

This is relatively lower priority for now, because Kubelet hasn't started using it yet.

However, since containerd already has a corresponding interface, it's easy to do. So let's support it. :)

@kubernetes-incubator/maintainers-cri-containerd

Stop container with image default stop signal

Currently, when stopping a container, we always send SIGTERM.

In the new image spec, a StopSignal field is added. After updating containerd, we should send the stop signal in the image config if it is specified.
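A minimal sketch of the selection logic, assuming the image config exposes the stop signal as a string such as "SIGQUIT" or "3". The function name stopSignal and the small signal table are illustrative and not taken from cri-containerd or containerd.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"syscall"
)

// knownSignals is a deliberately small, illustrative table.
var knownSignals = map[string]syscall.Signal{
	"HUP":  syscall.SIGHUP,
	"INT":  syscall.SIGINT,
	"QUIT": syscall.SIGQUIT,
	"KILL": syscall.SIGKILL,
	"TERM": syscall.SIGTERM,
}

// stopSignal returns the signal named in the image config, or SIGTERM when
// the image does not specify one. Names are accepted with or without the
// "SIG" prefix, and plain positive numbers are accepted as well.
func stopSignal(imageStopSignal string) (syscall.Signal, error) {
	s := strings.TrimSpace(imageStopSignal)
	if s == "" {
		return syscall.SIGTERM, nil
	}
	if n, err := strconv.Atoi(s); err == nil {
		if n <= 0 {
			return 0, fmt.Errorf("invalid signal number %d", n)
		}
		return syscall.Signal(n), nil
	}
	name := strings.TrimPrefix(strings.ToUpper(s), "SIG")
	if sig, ok := knownSignals[name]; ok {
		return sig, nil
	}
	return 0, fmt.Errorf("unknown stop signal %q", imageStopSignal)
}

func main() {
	for _, in := range []string{"", "SIGQUIT", "9", "bogus"} {
		sig, err := stopSignal(in)
		fmt.Printf("image stop signal %q -> %d, err: %v\n", in, int(sig), err)
	}
}
```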

The parameters of InitCNI should be filled in reverse order in NewCRIContainerdService function

In InitCNI, the first parameter "pluginDir" is the path of the CNI config files.
"cniDirs" is the path of the CNI binaries.

func NewCRIContainerdService(containerdEndpoint, rootDir, networkPluginBinDir, networkPluginConfDir,
	streamAddress, streamPort string) (CRIContainerdService, error) {
	......
	netPlugin, err := ocicni.InitCNI(networkPluginBinDir, networkPluginConfDir) // <-- here


func InitCNI(pluginDir string, cniDirs ...string) (CNIPlugin, error) {
	plugin := probeNetworkPluginsWithVendorCNIDirPrefix(pluginDir, cniDirs, "")

Should not use dot in namespace.

Currently we are using k8s.io.

However, containerd will require namespaces to be DNS labels (containerd/containerd#1059).

We should get rid of the dot on our side.
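To see why k8s.io fails such a check, here is a minimal sketch of an RFC 1123 style DNS label validator. The exact rule that containerd enforces may differ, and "k8s" is only a hypothetical dot-free replacement used for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// maxLabelLen is the maximum length of a single DNS label.
const maxLabelLen = 63

// dnsLabelRE matches lowercase alphanumerics and '-', starting and ending
// with an alphanumeric character. Dots are not allowed inside a label.
var dnsLabelRE = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isDNSLabel reports whether s is a valid single DNS label under the
// assumed RFC 1123 style rule.
func isDNSLabel(s string) bool {
	return len(s) <= maxLabelLen && dnsLabelRE.MatchString(s)
}

func main() {
	for _, ns := range []string{"k8s.io", "k8s"} {
		fmt.Printf("%q is a valid DNS label: %v\n", ns, isDNSLabel(ns))
	}
}
```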

Use `repo-infra` when it is ready for our use case.

https://github.com/kubernetes/repo-infra is a great tool.

For now we have copied the boilerplate verification part into the repo, but removed the other Go code verification.

It would be better to consolidate this in the future and only use repo-infra when it is ready for our use case.

Several related issues:

kubernetes/repo-infra#15
kubernetes/repo-infra#12
We don't need the build part now.
Subtreeing the repo causes a DCO problem.

If we squash commits, @mikebrow found that he could not sign the squashed commit.
If we don't squash, that will be quite spammy.

Add Container Metrics Support

According to CRI, the container runtime is responsible for providing container metrics, including:

container CPU usage.
container memory usage.
container writable layer size.
Image FS usage.

See here

Containerd should already provide all of this information:

container CPU/memory usage: Provided through Prometheus metrics endpoint.
container writable layer size: Provided by snapshotter.
Image FS usage:
- May need to count size of each snapshot;
- May need to consider content store.

We should support these container metrics.

/cc @kubernetes-incubator/maintainers-cri-containerd

CRI-Containerd Missing Pieces

cri-containerd already supports basic container/sandbox lifecycle and image management today. However, there are still many missing pieces. They are listed here with relative priorities:

Missing Features

Improvements

Containerd Missing Features

[P0] V2 Schema 1 image support. (Issue: #35, containerd/containerd#851)
[P1] Image (content/snapshot) garbage collection.
~~[P1] Containerd version. Containerd doesn't report semver because of a bug.~~

Clean up logs and errors.

We should clean up logs and errors:

We should reconsider log level, level 4 for list/status operations is too spammy;
We should make sure we are using %+v for struct, %q for string, and %v for error;
We should make sure necessary information is included in the log/error, e.g. container/sandbox/image id.
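As a small illustration of the verb convention above, the following sketch formats one error message that includes the container ID. The containerConfig type, the startError helper and the errTaskNotFound value are hypothetical names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// containerConfig is a hypothetical struct used only for this example.
type containerConfig struct {
	Image      string
	Privileged bool
}

// errTaskNotFound is a hypothetical error value used only for this example.
var errTaskNotFound = errors.New("task not found")

// startError builds a message using %q for the string ID, %+v for the
// struct, and %v for the error.
func startError(id string, cfg containerConfig, err error) string {
	return fmt.Sprintf("failed to start container %q with config %+v: %v", id, cfg, err)
}

func main() {
	fmt.Println(startError("abc123", containerConfig{Image: "busybox"}, errTaskNotFound))
	// Prints: failed to start container "abc123" with config {Image:busybox Privileged:false}: task not found
}
```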

/cc @mikebrow

Additional work needed on image apis

add config.json image/image.config struct to the resolver helper
remaining nit TODOs
add auth impl
add list filters
add name resolution code

Cleanup the image management code to use containerd client

After #113, cri-containerd will be using containerd client.

However, there is still some image-specific code that needs to be cleaned up, e.g. getImageInfo, localResolve etc.

We should add required functions into containerd client, and clean those functions up.

/cc @kubernetes-incubator/maintainers-cri-containerd
@abhinandanpb

We need fake containerd services.

To better unit test functions of cri-containerd, we need a group of fake containerd services (similar to the fake docker client).

Containerd now has 4 kinds of services:

content
execution
images
rootfs

We can add a fake execution service first, because the execution API is more stable, and add the others later when the image management code is added and needs tests.

/cc @kubernetes-incubator/maintainers-cri-containerd
@heartlock Are you interested in this? This is a good start to get familiar with containerd and the code base.

Use context in the right way.

Currently, we pass the grpc handler context to containerd client directly.

Once the context is cancelled by the client, no further function calls can be made to the containerd client.

The good thing is that we avoid proceeding when the user wants to cancel the context. The bad thing is that the cleanup functions in defer will not be able to run either.

Note the problem here, so that we could revisit this in the future.
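One possible direction, sketched here under assumptions rather than as a decided design: run the operation with the request context, but give deferred cleanup its own context with a timeout so it still runs after cancellation. The names runWithCleanup, demo and cleanupTimeout are hypothetical, and the 10 second value is arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// cleanupTimeout bounds how long deferred cleanup may take (assumed value).
const cleanupTimeout = 10 * time.Second

// runWithCleanup runs op with the request context. If op fails, cleanup runs
// with a fresh context that is not affected by cancellation of ctx.
func runWithCleanup(ctx context.Context, op, cleanup func(context.Context) error) (err error) {
	defer func() {
		if err == nil {
			return
		}
		cleanupCtx, cancel := context.WithTimeout(context.Background(), cleanupTimeout)
		defer cancel()
		if cerr := cleanup(cleanupCtx); cerr != nil {
			err = fmt.Errorf("%v (cleanup also failed: %v)", err, cerr)
		}
	}()
	return op(ctx)
}

// demo cancels the request context before the operation starts. It returns
// the operation error and whether cleanup saw a context that was still live.
func demo() (opErr error, cleanupCtxAlive bool) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	opErr = runWithCleanup(ctx,
		func(ctx context.Context) error { return ctx.Err() },
		func(ctx context.Context) error {
			cleanupCtxAlive = ctx.Err() == nil
			return nil
		})
	return opErr, cleanupCtxAlive
}

func main() {
	opErr, alive := demo()
	fmt.Println("operation error:", opErr)
	fmt.Println("is context.Canceled:", errors.Is(opErr, context.Canceled))
	fmt.Println("cleanup context alive:", alive)
}
```

The trade-off is that cleanup no longer honors the caller's deadline, which is why the sketch bounds it with its own timeout.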

Add `--version` flag.

Add cri-containerd version.

We should version cri-containerd based on the repo tag and commit number. The version will be used:

As part of the name when building a release package or any docker image in the future.
As part of --version output.

Refer to kubernetes/node-problem-detector#71.
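A common way to wire this up in Go, shown as a sketch: a package-level variable is overridden at build time through the linker's -X flag and printed by a --version flag. The variable name, the default value and the use of git describe to derive "tag plus commit" are assumptions, not the project's settled scheme.

```go
package main

import (
	"flag"
	"fmt"
)

// version is overridden at build time, for example:
//
//	go build -ldflags "-X main.version=$(git describe --tags --always --dirty)"
//
// The default value is used for builds that do not set it.
var version = "UNKNOWN"

// versionString returns the text printed for --version.
func versionString() string {
	return "cri-containerd " + version
}

func main() {
	showVersion := flag.Bool("version", false, "print version and exit")
	flag.Parse()
	if *showVersion {
		fmt.Println(versionString())
		return
	}
	fmt.Println("starting", versionString())
}
```

The same string can be reused in release package and image names by passing it to the packaging step of the Makefile.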

Use containerd client.

Containerd is adding a new client, which will make integration and our unit test much easier.

We should use it after it lands.

containerd/containerd#904

A practical way to create a Kubernetes cluster.

We need to design, implement and document a practical way to create a Kubernetes cluster using containerd as container runtime.

kubeadm seems to be the way to go, and https://www.projectatomic.io/blog/2017/06/using-kubeadm-with-cri-o/ is a very good reference.

We can start to run some cluster e2e tests after this is done.

/cc @kubernetes-incubator/maintainers-cri-containerd

Put containerd-shim into pod cgroup.

We should put containerd-shim into pod cgroup, so that we could charge the resource usage to pod instead.

Ref containerd/containerd#1032 and containerd/containerd#1134.

/cc @lanchongyizu

Update CRI version

The Kubernetes 1.7 code freeze has started, so we should update CRI to the newest version kubernetes/kubernetes#45614.

Support schema 1 manifest.

Currently, containerd only supports schema 2 manifest.

For cri-containerd, not supporting schema 1 means:

1) Users who are still using schema 1 images cannot use the cri-containerd integration.
2) Some system container images still use the schema 1 manifest, e.g. the pause image.

Regarding 1), it is fine for our alpha release (this quarter), but it should be fixed before beta (probably the end of the year).

For 2), we need to rebuild our schema 1 system container images.

Ref containerd/containerd#851.

Add support for annotations and labels

generateSandboxContainerSpec() currently filters out all annotations received through the CRI API instead of copying them to the runtime spec. Consider whether some of the annotations should be filtered or whether all should be copied over. For example:

annotations := config.GetAnnotations()
for key, value := range annotations {
	g.AddAnnotation(key, value)
}

Similarly, labels received over CRI are not being processed / handed over to containerd.

Consider the patterns enabled/being used for kubernetes when connected to dockershim and CRI-O, should we be unique here?

Integrate with kubernetes network plugin.

CRI expects the runtime to handle container networking.

As is expected, containerd itself doesn't provide container networking. We should integrate cri-containerd with kubernetes network plugin to initialize the network namespace.

Things to do:

Add flags for network plugin settings and initialize the network plugin in cri-containerd. (See https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_service.go#L197-L204)
Implement the interfaces needed by the network plugin, including namespaceGetter and portMappingGetter. (See https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_service.go#L119-L137)
Set up and tear down the network when running/stopping a pod sandbox. (See https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockershim/docker_sandbox.go#L128)

/cc @xlgao-zju

Change all Sirupsen to sirupsen

Ref

We should change vendored Sirupsen to sirupsen after dependencies are resolved.

Get rid of nsenter and socat

Currently we are using nsenter for both the CNI plugin and port forwarding.

Ideally, we should be able to use NetNS.Do to run code inside a specific network namespace.

However, Go has an issue: the runtime may spawn new threads from a locked thread, and those threads inherit its network namespace (golang/go#20676).

We should be able to get rid of all nsenter calls after this is fixed.

Pull Image Authentication

Containerd should already support pull image authentication. containerd/containerd#783

We should figure out whether it's enough for us and use it.

Error when running `make lint`

An error occurs when running make lint:

root@ubuntu:/home/vcap/go-project/src/github.com/kubernetes-incubator/cri-containerd# make lint
checking lint
for directory ./cmd ...
for directory ./cmd/cri-containerd ...
WARNING: deadline exceeded by linter interfacer on ./cmd/cri-containerd (try increasing --deadline)
WARNING: deadline exceeded by linter gosimple on ./cmd/cri-containerd (try increasing --deadline)
WARNING: deadline exceeded by linter structcheck on ./cmd/cri-containerd (try increasing --deadline)
Makefile:31: recipe for target 'lint' failed
make: *** [lint] Error 2

The check is so slow that it exceeds the deadline.
@mikebrow @Random-Liu

Update ocicni vendoring

Update ocicni vendoring to master once cri-o/ocicni#1 is merged.

Add unit test back

Many of our unit tests rely on fake containerd services. Unit tests do give us some value:

Make sure the options we pass to containerd are as expected;
Make sure the workflow is as expected especially for error handling.

However, containerd is still in alpha, is still collecting feedback, and may change its API again and again, so keeping the fake services up to date is painful and somewhat wasted effort.

We decided to temporarily remove the unit tests which use fake services, and rely on the integration tests (CRI validation test and, soon, node e2e test) for now.

We'll add the unit test back after:

Containerd api is beta or GA.
cri-containerd starts to use containerd client.

The commit before removing unit test is cbd936b.

@mikebrow @yujuhong
/cc @stevvooe @crosbymichael

Developer Guide

It's too early to add a developer guide, but we should add one in the future when the build and development process is finalized.

Checkpoint and restart recovery

There are several restart recovery problems with the current cri-containerd:

cri-containerd restart. Because cri-containerd maintains all internal state in memory, including the sandbox list, container list and image list, all state is lost once it restarts.
containerd restart. When containerd restarts and reconnects, there may be a state mismatch between containerd and cri-containerd, e.g. a container dies while containerd is down.

To fix this, we should recover/reconcile state during cri-containerd start or after containerd restart and reconnect.

There are 3 kinds of internal state:

Image list: Containerd has all the information we need. We just need to list images from containerd and recover the image list.
Sandbox/container metadata: Most of the metadata is not provided by containerd, so we need to checkpoint it for restart recovery. However, because the metadata is constant, we could save it into a containerd container label and let the containerd metadata store persist it for us.
Container status: Container status is not persisted by containerd, so we need to persist it ourselves. Because it changes constantly, we may not want to abuse containerd container labels to save it. So we need to maintain its checkpoint ourselves.
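For the container status case, a self-maintained checkpoint could look like the sketch below: the status is written to a temporary file and renamed into place, so a crash never leaves a half-written checkpoint. The containerStatus fields, the file layout and all function names are assumptions made for illustration, not the cri-containerd design.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// containerStatus holds illustrative state-dependent fields (assumed shape).
type containerStatus struct {
	ID       string `json:"id"`
	State    string `json:"state"`
	ExitCode int32  `json:"exitCode"`
}

// writeCheckpoint writes the status to a temp file in dir and then renames
// it to <ID>.json, so readers see either the old or the new checkpoint.
func writeCheckpoint(dir string, s containerStatus) error {
	data, err := json.Marshal(s)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(dir, s.ID+".tmp-")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // Fails harmlessly after a successful rename.
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, s.ID+".json"))
}

// readCheckpoint loads the checkpoint for the given container ID.
func readCheckpoint(dir, id string) (containerStatus, error) {
	var s containerStatus
	data, err := os.ReadFile(filepath.Join(dir, id+".json"))
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

// roundTrip writes and re-reads a status in a throwaway directory.
func roundTrip(s containerStatus) (containerStatus, error) {
	dir, err := os.MkdirTemp("", "checkpoint-")
	if err != nil {
		return containerStatus{}, err
	}
	defer os.RemoveAll(dir)
	if err := writeCheckpoint(dir, s); err != nil {
		return containerStatus{}, err
	}
	return readCheckpoint(dir, s.ID)
}

func main() {
	got, err := roundTrip(containerStatus{ID: "c1", State: "exited", ExitCode: 137})
	fmt.Printf("recovered %+v, err: %v\n", got, err)
}
```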

/cc @kubernetes-incubator/maintainers-cri-containerd

Support NoNewPrivileges

Kubernetes 1.8 and runc both support it, so we should update the CRI version and support it.

Check whether permanent network namespace is removed in SandboxRemove

Currently, we stop the sandbox container and remove the permanent network namespace when stopping a sandbox.

When removing a sandbox, we check whether the sandbox container is still running, but we should also check whether the network namespace has been properly removed.

Maintain resolv.conf for pod.

Previously, Docker maintained a resolv.conf for containers sharing the same network namespace.

Now, when using containerd, we need to do this ourselves. We need to:

Maintain a resolv.conf when creating/removing sandbox if not using HostNetwork. And initialize resolv.conf with DNSOptions in PodSandboxConfig.
Bind mount the resolv.conf into each container.

This logic is similar to cri-o. (See https://github.com/kubernetes-incubator/cri-o/blob/master/server/sandbox_run.go#L144-L159)
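Generating the file content could be as small as the sketch below. It assumes the DNS options in PodSandboxConfig carry a list of servers, search domains and resolver options. The dnsConfig type is a local stand-in for that message, and resolvConf is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// dnsConfig is a local stand-in for the DNS settings of a pod sandbox
// (assumed to carry servers, search domains and resolver options).
type dnsConfig struct {
	Servers  []string
	Searches []string
	Options  []string
}

// resolvConf renders the settings in resolv.conf syntax. Empty sections are
// omitted, so an empty config yields an empty file.
func resolvConf(c dnsConfig) string {
	var b strings.Builder
	if len(c.Searches) > 0 {
		b.WriteString("search " + strings.Join(c.Searches, " ") + "\n")
	}
	for _, s := range c.Servers {
		b.WriteString("nameserver " + s + "\n")
	}
	if len(c.Options) > 0 {
		b.WriteString("options " + strings.Join(c.Options, " ") + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(resolvConf(dnsConfig{
		Servers:  []string{"10.0.0.10"},
		Searches: []string{"default.svc.cluster.local", "svc.cluster.local"},
		Options:  []string{"ndots:5"},
	}))
}
```

The resulting file would be written once per sandbox and then bind mounted into each container of that sandbox.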

/cc @xlgao-zju

Update Test Document for Node E2E

We added make test-e2e-node in #145.

We should document that in the test document https://github.com/kubernetes-incubator/cri-containerd/blob/master/docs/testing.md.

We need a fake OS.

We need a fake OS for unit tests, similar to https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/container/testing/os.go.
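A rough sketch of the idea, not the kubelet's actual interface: production code depends on a small OS interface, and tests substitute a fake that records calls and can inject errors. The method set, the type names and the ensureSandboxDir caller are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// OS is the subset of filesystem operations the code under test needs
// (illustrative method set).
type OS interface {
	MkdirAll(path string, perm os.FileMode) error
	RemoveAll(path string) error
}

// RealOS forwards to the os package.
type RealOS struct{}

func (RealOS) MkdirAll(path string, perm os.FileMode) error { return os.MkdirAll(path, perm) }
func (RealOS) RemoveAll(path string) error                  { return os.RemoveAll(path) }

// FakeOS records calls and lets tests inject behavior through the Fn fields.
type FakeOS struct {
	mu          sync.Mutex
	Calls       []string
	MkdirAllFn  func(path string, perm os.FileMode) error
	RemoveAllFn func(path string) error
}

func (f *FakeOS) record(call string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.Calls = append(f.Calls, call)
}

func (f *FakeOS) MkdirAll(path string, perm os.FileMode) error {
	f.record("MkdirAll " + path)
	if f.MkdirAllFn != nil {
		return f.MkdirAllFn(path, perm)
	}
	return nil
}

func (f *FakeOS) RemoveAll(path string) error {
	f.record("RemoveAll " + path)
	if f.RemoveAllFn != nil {
		return f.RemoveAllFn(path)
	}
	return nil
}

// ensureSandboxDir is a hypothetical caller that only depends on OS.
func ensureSandboxDir(o OS, root, id string) error {
	return o.MkdirAll(filepath.Join(root, id), 0755)
}

func main() {
	fake := &FakeOS{}
	err := ensureSandboxDir(fake, "/var/lib/cri", "sb1")
	fmt.Println("err:", err, "calls:", fake.Calls)
}
```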

Disable Pid namespace sharing.

Kubernetes enabled Pid namespace sharing inside a pod with Docker 1.13+.

However, some customers reported that they could not run their container, because their containers made assumptions about the pid 1.

Until this issue is resolved, we cannot use pid namespace sharing in production. In cri-containerd, we should also disable it by default, and probably introduce a flag to enable it.

An example cni config.

We need an example CNI config in the user guide.

Should we share UTS namespace?

In the current implementation, we let containers inside a sandbox share the same UTS namespace. This is what we want in Kubernetes kubernetes/kubernetes#1615.

However, docker doesn't support UTS namespace sharing. We should figure out:

Why don't they share it?
Is there any problem with UTS sharing?

Rename `container` to `task`

The previous container concept in containerd has been renamed to task.
In the refactoring PR, I've changed most occurrences to task.

However, some are left, and we should clean them up.

CRI Containerd Integration TODOs

Required Internal Services:

[P0] Metadata store. A metadata store is required to store metadata and system state #9.
- [P0] In-memory metadata store #9.
- [P1] On-disk metadata store. (file-based or db-based)
[P1] Container manager. A container manager keeps containerd running in runtime cgroup. (See dockershim container manager)

Container Runtime Interface Functions:

However, we still need to do checkpoint for the state-dependent data, such as exit status, container state, timestamp.

We should update containerd version and use the new execution api. This basically involves:

Update containerd version to containerd/containerd#894
Update current implementation to use new execution api.
Write fake client for the new execution api and add unit test.

Add unit test for image management code.

The image management code was merged in #21.

However, we don't have unit test for it, because:

The fake containerd service is not ready.
Containerd is still changing their image side api.

We should:

Add as many unit tests as we can for now, e.g. the current ListImages and ImageStatus are purely based on the metadata store, so they are testable.
Add unit test with fake containerd service when containerd stabilizes their api and #20 is finished.
