spegel's Issues

Bootstrapping during outage

Spegel does not currently handle registry outages automatically when the registry that is down is the same one the Spegel image is pulled from. The mirror configuration is written by Spegel, and for that to happen the Spegel image has to be present on the node.

This becomes a catch-22 where scaling the cluster becomes impossible, because new nodes will not be able to pull images from the registry that is experiencing the outage.

We need a solution to protect against these types of situations.

Debug source of image

Knowing whether Spegel is working or not can be difficult, as image pulls fall back to the original registry. For debugging purposes it would be nice to be able to inspect, after an image has been pulled, whether it was fetched through Spegel or not. One option could be to annotate or label the image to indicate that it was fetched through Spegel.
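For illustration, a minimal sketch of how such a label could be set with the containerd Go client; the k8s.io namespace is the one Kubernetes uses, while the label key spegel.dev/mirrored is purely hypothetical.

import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

// markMirrored adds a label to an image in containerd's image store to record
// that it was fetched through Spegel (sketch only, the label key is made up).
func markMirrored(ctx context.Context, client *containerd.Client, ref string) error {
	ctx = namespaces.WithNamespace(ctx, "k8s.io")
	store := client.ImageService()
	img, err := store.Get(ctx, ref)
	if err != nil {
		return err
	}
	if img.Labels == nil {
		img.Labels = map[string]string{}
	}
	img.Labels["spegel.dev/mirrored"] = "true"
	_, err = store.Update(ctx, img, "labels.spegel.dev/mirrored")
	return err
}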

AWS Bottlerocket support

Good day!
Is it somehow possible to configure Spegel on AWS Bottlerocket nodes? If not, are you planning to add this support?
Thanks!

IP used in peer key

Currently the Pod IP is used in the peer key for routing. This means that every time the Spegel instance is restarted, the IP changes and so does the peer identifier. This seems wasteful, as the Pod is going to run on the same node with the same images advertised. We should explore the impact of the current solution and whether there are any benefits to using the Node IP as the peer key instead.
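If the Node IP were used instead, it could be exposed to the Pod through the downward API; a minimal sketch, assuming the router would read it from an environment variable named NODE_IP:

env:
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP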

CRI-O support

Currently Spegel only supports Containerd. CRI-O is another popular container runtime and should also be supported as an alternative backend. This has the added benefit of forcing the code architecture to not be centered around Containerd, enabling other runtimes to be implemented as well.

Question - Image Layer Locations

Let me know if there is a better way to ask this, but my team was wondering how images are stored on nodes in the cluster. If you pull an image to a node and it's not in the Spegel cache in the cluster, does that image plus all associated layers stay on that one node, or does Spegel try to distribute the layers throughout all nodes in the cluster?

We're wondering because our application often runs as one pod per node, and we wanted to know if all the layers of that pod's containers are kept on that node. We have ML compute pods and nodes coming up and down a lot, so if the images all stay on the same node we would have a lot of cold-start situations.

Thanks!

Unable to deploy to Kapsule with a release greater than 0.0.9

Hi team,
We are using Spegel on top of Scaleway Kapsule cluster. Behavior was great until we tried to upgrade from 0.0.9 to 0.0.11 or 0.0.12.
The upgrade failed, getting this error in registry container:
{"level":"error","ts":1694005879.5690522,"caller":"build/main.go:72","msg":"","error":"Containerd registry config path needs to be set for mirror configuration to take effect"

Since containerd settings can be managed by end users, any idea how to handle this? Any possibility to go back to the 0.0.9 behavior, which did not have this issue?
Thanks for your support.

Waiting for CNI driver delays Spegel startup - mirror not used

Hi,

Spegel looks great! But I'm on a test cluster and have noticed that it's not starting up quickly enough on a fresh node to get any benefit from it: by the time Spegel has started and registered the mirrors, the pods that I want to pull from Spegel have already started pulling from the default registry.

The startup process seems to be blocking on my CNI driver (Calico) starting up:

Warning  NetworkNotReady    16s (x12 over 38s)  kubelet             network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

This more or less happens in parallel to my pods starting up, so by the time Spegel is running my pods have pulled from the default registry and bypassed Spegel.

My workaround so far is to enable hostNetwork: true in the Spegel daemonset spec - this stops it having a dependency on Calico, and it starts more or less immediately (before my pods at least)

I assume not everyone uses Calico, so changing Spegel to use hostNetwork by default may not be desired? Could a flag be added to the Helm chart to enable this instead? Bear in mind that service.registry.port needs to be set to 30020 for this to work, but otherwise it does seem to work in some limited testing.
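For anyone wanting to reproduce the workaround, a rough sketch of the change to the rendered DaemonSet pod spec (standard Kubernetes fields; combine it with setting service.registry.port to 30020 as noted above):

spec:
  template:
    spec:
      hostNetwork: true
      # Needed so in-cluster DNS still resolves when running on the host network.
      dnsPolicy: ClusterFirstWithHostNet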

imagePull from private repo is slow second time

Pulling the image a second time from another node in the same cluster should be fast, but it took the same time as downloading from the remote repo, so it looks like caching didn't work.

Here are the logs from the Spegel pod for the xxx.yyy.io/app-image-k8:dev_123 image (repo and image details masked).

Any pointers on what the issue could be?

{"level":"error","ts":1696247240.4278097,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2/app-image-k8/manifests/dev_123","status":404,"method":"HEAD","latency":5.000973923,"ip":"10.14.130.153","handler":"mirror","error":"could not resolve mirror for key: xxx.yyy.io/app-image-k8:dev_123","stacktrace":"github.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}

{"level":"error","ts":1696247731.0288842,"caller":"registry/registry.go:211","msg":"mirror failed attempting next","error":"expected mirror to respond with 200 OK but received: 500 Internal Server Error","stacktrace":"github.com/xenitab/spegel/internal/registry.(*Registry).handleMirror.func2\n\t/build/internal/registry/registry.go:211\nnet/http/httputil.(*ReverseProxy).modifyResponse\n\t/usr/local/go/src/net/http/httputil/reverseproxy.go:324\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\t/usr/local/go/src/net/http/httputil/reverseproxy.go:490\ngithub.com/xenitab/spegel/internal/registry.(*Registry).handleMirror\n\t/build/internal/registry/registry.go:217\ngithub.com/xenitab/spegel/internal/registry.(*Registry).registryHandler\n\t/build/internal/registry/registry.go:137\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/xenitab/spegel/internal/registry.(*Registry).metricsHandler\n\t/build/internal/registry/registry.go:271\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.CustomRecoveryWithWriter.func1\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/recovery.go:102\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/slok/go-http-metrics/middleware/gin.Handler.func1.1\n\t/go/pkg/mod/github.com/slok/[email protected]/middleware/gin/gin.go:17\ngithub.com/slok/go-http-metrics/middleware.Middleware.Measure\n\t/go/pkg/mod/github.com/slok/[email protected]/middleware/middleware.go:117\ngithub.com/slok/go-http-metrics/middleware/gin.Handler.func1\n\t/go/pkg/mod/github.com/slok/[email protected]/middleware/gin/gin.go:16\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/xenitab/pkg/gin.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:28\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2936\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1995"}

Readiness probe based on peer population

A Spegel instance with an empty routing table is of little to no use, as it will never be able to discover any other peers. For this reason it would be good for the readiness probe to give a 200 response only when the routing table is not empty.
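A minimal sketch of such a probe handler, assuming a hypothetical PeerCount method on the router:

import "net/http"

// readyHandler reports ready only once at least one peer is present in the routing table.
func readyHandler(router interface{ PeerCount() int }) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if router.PeerCount() == 0 {
			http.Error(w, "routing table is empty", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}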

Upstream to containerd?

This looks like a great solution to a common problem! Do you think it would be worthwhile to aim for an upstream contribution to containerd?

Reduce host permissions

Currently Spegel requires access to the Containerd socket to function properly. This means that it is basically root on the node. In reality Spegel only needs read-only access to all of the layers and tags. The issue is that the containerd client is required to do tag resolution.

One option would be to switch over to using the CRI API. One drawback is that it currently does not have an event subscription service, which we use through the containerd client today.

If tags could be listed without the use of the containerd client, the blobs directory could be mounted read-only into the container and new layers could be detected with the help of a file watcher.
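A minimal sketch of the file-watcher idea using fsnotify, assuming the default containerd content store path on the host:

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

// watchBlobs logs every new blob written to the (read-only mounted) content store.
func watchBlobs() error {
	// Assumed default containerd content store layout.
	const blobDir = "/var/lib/containerd/io.containerd.content.v1.content/blobs/sha256"
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()
	if err := watcher.Add(blobDir); err != nil {
		return err
	}
	for {
		select {
		case event, ok := <-watcher.Events:
			if !ok {
				return nil
			}
			if event.Op&fsnotify.Create == fsnotify.Create {
				log.Printf("new blob detected: %s", event.Name)
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return nil
			}
			log.Printf("watch error: %v", err)
		}
	}
}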

Allow serving data from self

With the new feature enabling Spegel to fall back to mirroring from another node, we end up with an interesting situation. Currently every instance filters out itself when mirroring requests. This is not useful if the request is already coming from another node; in that situation we should actually allow proxying to the same instance.

One solution is to allow mirroring of all requests. The other is to add a client IP check or a header to the request to allow proxying to the same instance.
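A sketch of the header variant; the header name is made up for illustration and is not something Spegel currently sets:

import "net/http"

// Hypothetical header set on requests that have already been forwarded by another node.
const mirroredHeader = "X-Spegel-Mirrored"

// allowSelf returns true when the request already came from another node,
// in which case proxying back to this instance should be allowed.
func allowSelf(r *http.Request) bool {
	return r.Header.Get(mirroredHeader) != ""
}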

Spegel pod gets into CrashLoopBackOff with k3s

Hi there,

I have a k3s cluster where I want to use Spegel. I installed Spegel as described in the README using helm upgrade --create-namespace --namespace spegel --install --version v0.0.11 spegel oci://ghcr.io/xenitab/helm-charts/spegel

However, the pod gets into CrashLoopBackOff with the following error, when I check the logs:

Defaulted container "registry" out of: registry, configuration (init)
{"level":"error","ts":1695750332.7763488,"caller":"build/main.go:72","msg":"","error":"rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

How can I solve this problem? I would appreciate any help.

Thanks a lot!

imagePullPolicy Always gets cached manifest

Hello,

Thanks for this awesome project :)

We started to PoC spegel in our environments, but discovered that some of our devs were relying on the infamous combo

imagePullPolicy: Always
image: myimage:latest

It seems that in that case the pod still gets the old image.

I assume that is because the manifest is also cached and not pulled from the registry every time?

Can/could we optionally change the behavior and always request the manifests from the registry?

Not sure if that is the right way, but it would mean allowing a switch from:
capabilities = ["pull", "resolve"]
to
capabilities = ["pull"]
?
https://github.com/XenitAB/spegel/blob/v0.0.9/internal/oci/containerd.go#L278
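For reference, the corresponding hosts.toml difference would look roughly like this (the mirror address is illustrative):

# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"

[host."http://127.0.0.1:30020"]
  # Dropping "resolve" means tag-to-digest resolution always goes to the upstream registry,
  # while blobs and digest-addressed manifests can still be served from the mirror.
  capabilities = ["pull"]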

Thanks !

Add persistence

Hi there,

Spegel sounds like a great project and we are interested in using it to avoid Docker Hub's rate limiting on Azure AKS which comes by the end of September.
During regular operations, we should not run into limits. However, we fear that this may happen in case of a disaster recovery. E.g. the whole cluster burns down and everything needs to be set up from scratch asap.

This would be another great use case for Spegel. However, if I got it right, Spegel does not provide a persistence layer. Is it possible to add persistence to Spegel such that a recovery via k8s level backup tools like Velero would be possible?

I hope that this might be a quick win, simply by providing parts of the file system via a PV? Or is it all in-memory, with no easy way to persist the local images?

All the best! Continue the great work.
Niklas

Failed to walk image manifests

Hi! I just installed spegel to my cluster, and some of the pods in the daemonset get this cryptic error and crash loop

Defaulted container "registry" out of: registry, configuration (init)
{"level":"info","ts":1688689161.8643885,"logger":"p2p","caller":"routing/p2p.go:63","msg":"starting p2p router","id":"/ip4/:)/tcp/5001/p2p/12D3KooWD6pXCqqMcVeBXtoNQQJRX2fTKfj5Dc98WCs7yoTQLGcy"}
{"level":"info","ts":1688689161.8646646,"caller":"build/main.go:160","msg":"running registry","addr":":5000"}
I0707 00:19:21.865702       1 leaderelection.go:245] attempting to acquire leader lease spegel/spegel-leader-election...
{"level":"info","ts":1688689161.8717268,"caller":"state/state.go:44","msg":"running scheduled image state update"}
{"level":"error","ts":1688689162.1034093,"caller":"build/main.go:68","msg":"","error":"failed to update all images: failed to walk image manifests: content digest sha256:190d5c74da93b56c3fcd4e08603834c9b20a5149214a369a5705b37ed62c2c72: not found","stacktrace":"main.main\n\t/build/main.go:68\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

Any idea what this is about? Thanks!

Proxy Retry

Currently the mirror handler forwards the HTTP request to the first node it discovers. If the request to that node fails, Containerd will fall back to the original registry. A better solution would be to retry the request against a different node discovered later.
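A rough sketch of what retrying across resolved peers could look like; the names and signatures are illustrative, not Spegel's actual internals:

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// mirrorWithRetry tries each resolved peer in turn and writes the first successful response.
func mirrorWithRetry(w http.ResponseWriter, r *http.Request, peers []string, maxAttempts int) error {
	client := &http.Client{Timeout: 5 * time.Second}
	var lastErr error
	for i, peer := range peers {
		if i >= maxAttempts {
			break
		}
		req, err := http.NewRequestWithContext(r.Context(), r.Method, peer+r.URL.Path, nil)
		if err != nil {
			lastErr = err
			continue
		}
		resp, err := client.Do(req)
		if err != nil {
			lastErr = err
			continue
		}
		if resp.StatusCode != http.StatusOK {
			resp.Body.Close()
			lastErr = fmt.Errorf("peer %s responded with %s", peer, resp.Status)
			continue
		}
		defer resp.Body.Close()
		for k, vals := range resp.Header {
			for _, v := range vals {
				w.Header().Add(k, v)
			}
		}
		w.WriteHeader(resp.StatusCode)
		_, err = io.Copy(w, resp.Body)
		return err
	}
	return lastErr
}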

WASM Support

It is currently possible to run mixed Kubernetes clusters with nodes running WASM containers using krustlet. This is for example supported in AKS. The runwasi project has however made the process a lot simpler by enabling WASM workloads to run in containerd.

As there is already support for packaging WASM in an OCI format, support for WASM should be trivial. In the case of runwasi I think it would just work out of the box, as it would be Containerd doing the actual image pulling.

This feature is currently in the early days of development so it may take a while until Spegel supports WASM containers.

GKE Support

Currently it does not seem like there is an easy way to support GKE. When creating a cluster the Containerd config looks like the following.

version = 2
required_plugins = ["io.containerd.grpc.v1.cri"]
# Kubernetes doesn't use containerd restart manager.
disabled_plugins = ["io.containerd.internal.v1.restart"]
oom_score = -999

[debug]
  level = "info"

[grpc]
  gid = 412

[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  max_container_log_line_size = 262144
  sandbox_image = "gke.gcr.io/pause:3.6@sha256:10008c36b4611b44db1229451675d5d7d86c7ddf4ef00f883d806a01547203f6"
[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "/home/kubernetes/bin"
  conf_dir = "/etc/cni/net.d"
  conf_template = "/home/containerd/cni.template"
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
  endpoint = ["https://mirror.gcr.io","https://registry-1.docker.io"]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "runc"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

The issue here is that it does not configure the prerequisite config_path. Without it Containerd will not look for any mirror configuration.

One interesting point to notice is that GKE uses the old mirror syntax to configure a specific mirror for Docker Hub. This is most likely part of a feature they offer, as they cache Docker Hub images to avoid rate limiting. It would be better if they used the newer mirror configuration method; even if Spegel were to override this, it could be configured to keep the last mirror fallback for Docker Hub when running in a GKE cluster.
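For reference, the missing prerequisite is roughly the following snippet, which tells Containerd to read mirror configuration from the certs.d directory (the same setting appears in other issues below):

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"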

Can't install spegel

Hello,

I tried to install Spegel using helm upgrade --create-namespace --namespace spegel --install --version v0.0.12 spegel oci://ghcr.io/xenitab/helm-charts/spegel, however I get the following error. I tried on 2 different machines, both giving me the same error. Am I missing something?

Error: failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://ghcr.io/token?scope=repository%3Auser%2Fimage%3Apull&scope=repository%3Axenitab%2Fhelm-charts%2Fspegel%3Apull&service=ghcr.io: 403 Forbidden

Clean up containerd mirror configuration

Spegel makes persistent changes to the host's filesystem. This is required to set up the mirror configuration, and the mirror configuration should be present on the node for as long as Spegel is being used. A good practice, however, would be to revert the mirror configuration back to its pre-Spegel state if Spegel is ever removed. This is especially useful for those who are just evaluating Spegel but choose not to use it. Keeping the mirror configuration will not break the cluster, as it would always fall back to the original registry, and the configuration would be removed when old nodes are removed from the cluster.

The old solution was to add an option to remove the configuration files during shutdown. While a good option, it wasn't optimal, as it is beneficial to keep the configuration during version updates of Spegel so that pulls can fall back to other Spegel instances in the cluster.

An alternative solution is to create a Helm uninstall hook which would run and remove the configuration from all nodes when the Helm chart is removed. This has the benefit of cleaning up stateful changes and returning the node configuration to the state it was in before Spegel.
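A very rough sketch of the uninstall hook idea. Note that a single Job only runs on one node, so a real implementation would need to clean up every node (for example one Job per node or a short-lived DaemonSet); all names and the cleanup command are illustrative.

apiVersion: batch/v1
kind: Job
metadata:
  name: spegel-cleanup
  annotations:
    "helm.sh/hook": post-delete
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cleanup
          image: busybox
          # In practice only the hosts.toml files written by Spegel should be removed.
          command: ["sh", "-c", "rm -f /host/certs.d/*/hosts.toml"]
          volumeMounts:
            - name: certs
              mountPath: /host/certs.d
      volumes:
        - name: certs
          hostPath:
            path: /etc/containerd/certs.d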

Cached image not used

I'm getting a mixture of 404 and 500 errors when trying to pull an image. It only happens for my own image, and I think I can see some successful image pulls.

Logs from one spegel pod:

{"level":"info","ts":1699687666.3606877,"logger":"p2p","caller":"routing/p2p.go:59","msg":"starting p2p router","id":"/ip4/10.22.130.68/tcp/5001/p2p/12D3KooWKxpkvZfxSMMYutnYGpfsdRJP4cZJ46NGZStL8N8D6hie"}
{"level":"info","ts":1699687666.3609054,"caller":"build/main.go:169","msg":"running registry","addr":":5000"}
I1111 07:27:46.361268       1 leaderelection.go:245] attempting to acquire leader lease spegel/spegel-leader-election...
{"level":"info","ts":1699687666.3617876,"caller":"state/state.go:42","msg":"running scheduled image state update"}
{"level":"error","ts":1699687671.4085758,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2//REDACTED/server/manifests/sha256:d138dcd6666c3050e31fa2e30963aabd33aa23dba34bce0904cb90429622b5c5","status":500,"method":"GET","latency":3.177211296,"ip":"10.22.136.66","handler":"mirror","error":"mirror resolve retries exhausted for key: sha256:d138dcd6666c3050e31fa2e30963aabd33aa23dba34bce0904cb90429622b5c5","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
{"level":"info","ts":1699687671.4949071,"caller":"state/state.go:49","msg":"received image event","image":"gcr.io/datadoghq/agent:7.46.0@sha256:4e5f6127e43348b78fea1da5393dd2ff038dff72b78b42de5b11c7600412830a"}
{"level":"info","ts":1699687671.5828915,"caller":"state/state.go:49","msg":"received image event","image":"gcr.io/datadoghq/agent:7.46.0@sha256:4e5f6127e43348b78fea1da5393dd2ff038dff72b78b42de5b11c7600412830a"}
{"level":"info","ts":1699687671.6040404,"caller":"state/state.go:49","msg":"received image event","image":"gcr.io/datadoghq/agent"}
{"level":"error","ts":1699687683.8613865,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2//REDACTED/server/blobs/sha256:6967698a904318aed4c1547be609141f22d40a9588e0787dcb801550a1bdca23","status":404,"method":"GET","latency":5.001035143,"ip":"10.22.136.66","handler":"mirror","error":"request closed for key: sha256:6967698a904318aed4c1547be609141f22d40a9588e0787dcb801550a1bdca23","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
{"level":"error","ts":1699687690.478658,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2//REDACTED/server/blobs/sha256:dee98d4b974d4ac8e95c28a17e5dddc4e818169b6a8a9542c25eafd4fc2b1a58","status":404,"method":"GET","latency":5.00098084,"ip":"10.22.136.66","handler":"mirror","error":"request closed for key: sha256:dee98d4b974d4ac8e95c28a17e5dddc4e818169b6a8a9542c25eafd4fc2b1a58","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
{"level":"error","ts":1699687690.4786735,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2//REDACTED/server/blobs/sha256:d05bbff9997e08bbd4b8ddf12f496fd312eb5f691d236dd8d3e29b1ac8b91b9d","status":500,"method":"GET","latency":4.992058285,"ip":"10.22.136.66","handler":"mirror","error":"mirror resolve retries exhausted for key: sha256:d05bbff9997e08bbd4b8ddf12f496fd312eb5f691d236dd8d3e29b1ac8b91b9d","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
{"level":"error","ts":1699687690.4786735,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2//REDACTED/server/blobs/sha256:e7808094cb5d351220c95581e5edd2fdd6f0cbc53154400bf0842a171127e238","status":500,"method":"GET","latency":4.996645105,"ip":"10.22.136.66","handler":"mirror","error":"mirror resolve retries exhausted for key: sha256:e7808094cb5d351220c95581e5edd2fdd6f0cbc53154400bf0842a171127e238","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
{"level":"error","ts":1699687697.153541,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2//REDACTED/server/blobs/sha256:1d98726e0d51e9317902923837e48db6ba3080104d83cab6e95beb91c744bb2a","status":500,"method":"GET","latency":1.386007186,"ip":"10.22.136.66","handler":"mirror","error":"mirror resolve retries exhausted for key: sha256:1d98726e0d51e9317902923837e48db6ba3080104d83cab6e95beb91c744bb2a","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
{"level":"error","ts":1699687697.153541,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2//REDACTED/server/blobs/sha256:077b5992cbfc562fac5ca434250dfc92acb83b59d7de408c2495a1a873186e55","status":404,"method":"GET","latency":5.000501557,"ip":"10.22.136.66","handler":"mirror","error":"request closed for key: sha256:077b5992cbfc562fac5ca434250dfc92acb83b59d7de408c2495a1a873186e55","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
{"level":"error","ts":1699687701.3775723,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2//REDACTED/server/blobs/sha256:64d6ec39685a5d38dc24f9f5987cd01d6bc84d7a0dc21398f977400549cb22eb","status":500,"method":"GET","latency":3.272260102,"ip":"10.22.136.66","handler":"mirror","error":"mirror resolve retries exhausted for key: sha256:64d6ec39685a5d38dc24f9f5987cd01d6bc84d7a0dc21398f977400549cb22eb","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}

Events from my pod:

  Normal   Pulling            2m6s                   kubelet             Pulling image "us-docker.pkg.dev/agones-images/release/agones-sdk:1.34.0"
  Normal   Pulled             118s                   kubelet             Successfully pulled image "us-docker.pkg.dev/agones-images/release/agones-sdk:1.34.0" in 7.60820591s (7.608215832s including waiting)
  Normal   Created            118s                   kubelet             Created container agones-gameserver-sidecar
  Normal   Started            118s                   kubelet             Started container agones-gameserver-sidecar
  Normal   Pulling            118s                   kubelet             Pulling image "us-docker.pkg.dev/REDACTED/server:b25e9d50444856962c9861b6e0842f289490f014"
  Normal   Pulled             92s                    kubelet             Successfully pulled image "us-docker.pkg.dev/REDACTED/server:b25e9d50444856962c9861b6e0842f289490f014" in 26.130419751s (26.130431392s including waiting)
  Normal   Created            92s                    kubelet             Created container days
  Normal   Started            91s                    kubelet             Started container days

Node containerd config: (attached as a screenshot in the original issue)

Environment is:
AWS EKS
Karpenter

failed to update all images: failed to walk image manifests: unexpected media type for digest

I installed spegel 0.0.9 from Helm. It started on most of the nodes, but on some it failed:

{"level":"error","ts":1691505526.569151,"caller":"build/main.go:68","msg":"","error":"failed to update all images: failed to walk image manifests: unexpected media type for digest: sha256:17e484fedc7ecaa8c83404aa85f0b615743ce723c05d3735436426f12d9e9c77","stacktrace":"main.main\n\t/build/main.go:68\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

I don't know how to troubleshoot this further.

Support mirroring requests from Policy Controller

Using cosign to sign OCI artifacts is very useful, especially when using registry mirrors, as a way to verify that the artifact received comes from a trusted source. Currently Spegel has not been verified with cosign. After having a quick look it seems like it should not be an issue, but it should be verified to work and tested.

Use image digest in Helm chart

Currently the image tag is used in the Helm chart. This is not good practice and it is better to use the image digest instead. The release GitHub action should be updated to include the digest.

Getting lots of 500s "mirror resolve retries exhausted for key"

Hello, trying out Spegel and facing these errors across the board:

{"level":"info","ts":1699886843.334517,"caller":"registry/registry.go:174","msg":"handling mirror request from external node","path":"/v2/p2m-qa/silo-p2m-qa-subtemplate-silo/manifests/sha256:2520965b7b02cd4c5090f635622c92c4dfd2446a2d9332106ae140a1e1eb7bc4","ip":"100.96.35.1"}
{"level":"error","ts":1699886843.3367004,"caller":"[email protected]/logger.go:62","msg":"","path":"/v2/p2m-qa/silo-p2m-qa-subtemplate-silo/manifests/sha256:2520965b7b02cd4c5090f635622c92c4dfd2446a2d9332106ae140a1e1eb7bc4","status":500,"method":"GET","latency":0.002227242,"ip":"100.96.35.1","handler":"mirror","error":"mirror resolve retries exhausted for key: sha256:2520965b7b02cd4c5090f635622c92c4dfd2446a2d9332106ae140a1e1eb7bc4","stacktrace":"github.com/xenitab/pkg/gin.NewEngine.Logger.func1\n\t/go/pkg/mod/github.com/xenitab/pkg/[email protected]/logger.go:62\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009"}
Containers:
  p2m-qa-subtemplate-silo:
    Container ID:  containerd://09110087199b09a31ce246c2be47d4046116f9bc64d927b4262036f56028e20b
    Image:         privateharborrepo.com/p2m-qa/silo-p2m-qa-subtemplate-silo:70
    Image ID:      privateharborrepo.com/p2m-qa/silo-p2m-qa-subtemplate-silo@sha256:2520965b7b02cd4c5090f635622c92c4dfd2446a2d9332106ae140a1e1eb7bc4

Containerd config.toml:

## template: jinja

# Use config version 2 to enable new configuration fields.
# Config file is parsed as version 1 by default.
version = 2

imports = ["/etc/containerd/conf.d/*.toml"]

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "projects.registry.vmware.com/tkg/pause:3.6"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
  [plugins."io.containerd.grpc.v1.cri".registry]
    config_path = "/etc/containerd/certs.d"
  [plugins."io.containerd.grpc.v1.cri".containerd]
    discard_unpacked_layers = false

spegel-values.yaml:

...
spegel:
  # -- Registries for which mirror configuration will be created.
  registries:
    - https://docker.io
    - https://ghcr.io
    - https://quay.io
    - https://mcr.microsoft.com
    - https://public.ecr.aws
    - https://gcr.io
    - https://registry.k8s.io
    - https://k8s.gcr.io
    - https://lscr.io
    - https://privateharborrepo.com
...

Is there something obvious that we're missing here? It seems to work just fine at other times, although I am still trying to figure out what defines "fine" :)
Self-hosted Harbor version v2.7.1-6015b3ef (URL changed for obvious reasons)
nodes: v1.23.10+vmware.1 Ubuntu 20.04.5 LTS 5.4.0-135-generic containerd://1.6.6

Private images?

Hi,

Is it possible for private images (the ones that can only be pulled with a password) to not be cached? It seems like Spegel makes it possible to pull private images from the cache without any password.

Memory leaking leads to OOMs

Somehow all of Spegel's pods in my test cluster have linearly increasing memory usage, which has already killed one of the pods twice in one day due to OOM (the limit is 320Mi).

How can I help debug this issue?

Multiple reverse proxy retries

Currently a reverse proxy request is only attempted once. This means that a failed request falls back to going out to the source registry to complete the request. It would be good to retry a configurable number of times against different nodes which have the requested resource.

Single prefer local service

With the old service topology feature in Kubernetes it was possible to create a Service which would "prefer local". This means a NodePort Service could be created which, when called, would always route to the node-local Pod if possible, and only fall back to another Pod in the cluster if that failed. This was a great feature that was sadly left out of the new solution, as stated here.

https://discuss.kubernetes.io/t/deprecated-topologykeys-alternative-for-node-locality/18184

This feature may be re-introduced in the future, and when that happens we should use it instead of having to write mirror registry configurations.
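For context, the deprecated field looked roughly like this (the Service name and ports are illustrative, and the field has since been removed from the API):

apiVersion: v1
kind: Service
metadata:
  name: spegel-registry
spec:
  type: NodePort
  selector:
    app: spegel
  ports:
    - port: 5000
      targetPort: 5000
  topologyKeys:
    - "kubernetes.io/hostname"   # prefer the Pod on the same node
    - "*"                        # otherwise fall back to any Pod in the cluster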

v0.0.15 arm64 build appears to be incorrect

The pods crash loop with exec ./spegel: exec format error from the configuration container, which is usually a sign that the binary has been built for the wrong architecture.

docker inspect shows that it's built with arm in mind though:

        "Architecture": "arm64",
        "Os": "linux",

Reverting to 0.0.14 immediately fixes the issue.

Broken digest: sha256:f7e32e17ee8b9a64efdc7958f6fa1099cfeaaea3854841b3d47bcb7dba31fcbc

Working digest: sha256:18d64359225e16155ed2a138fb88bba163353c8fb79913197148c4ee9c73f600

Document performance measuring

Currently it is difficult to measure performance differences between using Spegel and not using Spegel.

With the merging of containerd/containerd/pull/7313 it will be easier to measure pull speed per registry. This feature will be released in containerd 1.7. Once that version is released, performance measuring should be documented so that end users can compare results.

pod CrashLoopBackOff on EKS v1.27.4 and v1.24.16-eks

Spegel pods are in CrashLoopBackOff with the below error.

Defaulted container "registry" out of: registry, configuration (init)
{"level":"error","ts":1696142954.9523215,"caller":"build/main.go:72","msg":"","error":"Containerd registry config path is /etc/containerd/certs.d:/etc/docker/certs.d but needs to be /etc/containerd/certs.d for mirror configuration to take effect","stacktrace":"main.main\n\t/build/main.go:72\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}

helm chart version used to install

helm upgrade --install --namespace kube-system --version v0.0.11 spegel oci://ghcr.io/xenitab/helm-charts/spegel

image: ghcr.io/xenitab/spegel@sha256:8c73772fbf07b8b0458a26dc991b48f06b407082f3f554a0ee3bffd1ff89bf91

containerd config file:

# cat /etc/containerd/config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"

[grpc]
address = "/run/containerd/containerd.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true

[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5"

[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true

[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"

Update appVersion on new release

Should the appVersion get updated to the latest version for every release that is cut? Right now it is still at 0.0.1, and the image is not working with the later Helm chart releases. I am manually changing the tag on my end, but it would be nice if it just worked without having to do that.

Document using Spegel with pull through registries

After carefully reviewing the following containerd-related errors, I deleted the configurations related to "mirror" in /etc/rancher/k3s/registries.yaml and /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl. Now, spegel is working properly.

time="2023-11-20T23:28:55.907708354+08:00" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="invalid plugin config: `mirrors` cannot be set when `config_path` is provided"

That's a bit unfortunate since the docs state:

Spegel does not aim to replace projects like Harbor or Zot but instead complements them.

I was hoping to use spegel in my cluster and zot as a pull thru cache deployed elsewhere. For what it's worth this is my current containerd mirrors:

mirrors:
  docker.io:
    endpoint:
      - https://zot.domain.tld/v2/docker.io
  ghcr.io:
    endpoint:
      - https://zot.domain.tld/v2/ghcr.io
  quay.io:
    endpoint:
      - https://zot.domain.tld/v2/quay.io
  gcr.io:
    endpoint:
      - https://zot.domain.tld/v2/gcr.io
  registry.k8s.io:
    endpoint:
      - https://zot.domain.tld/v2/registry.k8s.io
  public.ecr.aws:
    endpoint:
      - https://zot.domain.tld/v2/public.ecr.aws

I don't see a way to have Spegel take over this responsibility; it seems like you either have Spegel or a pull-through cache. Maybe this can be a feature request?

Originally posted by @onedr0p in #212 (comment)
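One hedged sketch of what chaining could look like in containerd's hosts.toml, with Spegel tried first and the zot pull-through cache as the next fallback; the Spegel address and the use of override_path are assumptions, not something the docs currently describe:

# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"

[host."http://127.0.0.1:30021"]
  capabilities = ["pull", "resolve"]

[host."https://zot.domain.tld/v2/docker.io"]
  capabilities = ["pull", "resolve"]
  # Tell containerd to use the path as given instead of appending its own path prefix.
  override_path = true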

Requests to local spegel instance are being detected as external with cilium

Hi

I tried Spegel recently and found that when a node pulls an image (etcd in this case), the Spegel instance on the node detects it as an external request:

{"level":"info","ts":1697181744.1945643,"caller":"registry/registry.go:174","msg":"handling mirror request from external node","path":"/v2/etcd-development/etcd/blobs/sha256:47ba7aff063ffd3883d25edc17bc0a92a4b76d48bcc89e8a21a149db234576bd","ip":"10.65.10.213"}

The IP (10.65.10.213) is the one assigned to the cilium_host interface.

I'm not sure how to fix this. Is someone here using Spegel with Cilium?

Nodeport service is broken with proxy-mode=ipvs

I observe constant timeouts while containerd tries to access the 127.0.0.1:30021 local mirror:

root@myhostname:~# journalctl -r -u containerd |grep 30021 |head
Nov 01 00:29:36 myhostname containerd[2413588]: time="2023-11-01T00:29:36.775166395Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/rtl/marketing-promocode/be-1301/manifests/49ad4e18?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:29:33 myhostname containerd[2413588]: time="2023-11-01T00:29:33.774039526Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/mkk/loan/feature-mkk-3870/manifests/bf6460d0?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:29:26 myhostname containerd[2413588]: time="2023-11-01T00:29:26.775053471Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/opendatacollecting/eparser/odc-2259/manifests/ee2d26a6?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:29:24 myhostname containerd[2413588]: time="2023-11-01T00:29:24.843621330Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/inventory/api/supply-return-readmodel/inv-2632/manifests/ddd7f161?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:29:05 myhostname containerd[2413588]: time="2023-11-01T00:29:05.787353142Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/wms/svc/logistics-megasort-facade/wms-41662/manifests/ca271126?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:28:59 myhostname containerd[2413588]: time="2023-11-01T00:28:59.053739919Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/eea/platform/exteca.platform.orchestrator.sublots/master/manifests/9c2bfe17?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:28:26 myhostname containerd[2413588]: time="2023-11-01T00:28:26.772423981Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/eea/platform/exteca.platform.orchestrator.sublots/master/manifests/9c2bfe17?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:28:01 myhostname containerd[2413588]: time="2023-11-01T00:28:01.977061019Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/lsplt/trip-container-service/trip-container-business-layer/master/manifests/f07adc9d?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:27:49 myhostname containerd[2413588]: time="2023-11-01T00:27:49.779983205Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/pricing/services/pricing-comments/f-ppric-3865/manifests/ed7e4e82?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"
Nov 01 00:27:07 myhostname containerd[2413588]: time="2023-11-01T00:27:07.147305757Z" level=info msg="trying next host" error="failed to do request: Head \"http://127.0.0.1:30021/v2/kdp/api/onec-gateway-api/master/manifests/c495844d?ns=gitlab-registry.mycomp.ru\": dial tcp 127.0.0.1:30021: i/o timeout" host="127.0.0.1:30021"

It seems like access from the host network to 127.0.0.1:nodeport only works with proxy-mode=iptables (see: kubernetes/kubernetes#111840). As a result, deploying Spegel in a cluster with IPVS makes image pull times much worse (1s -> 60s).

How can we handle this issue without falling back to iptables?

Validate mirror configuration

It is pretty difficult to determine if Spegel is working or not. There are many things that can go wrong, and image pulling may still work because it falls back to the original registry. A good solution would be to extend the OCI interface with a validate function. This function would somehow validate that the mirror configuration that was added works properly. I am not sure what the best way to do this is. We would want to validate that the configuration is properly picked up and that requests are reaching Spegel. I have listed some ideas for how this could be done; a rough sketch follows the list.

  • Pulling an image and waiting for the request.
  • Using the client to read the mirror configuration.
  • Waiting for a random image to be pulled through Spegel.
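A minimal sketch of the second idea, reading back the generated configuration and checking that it parses; this only proves that the files exist and are valid TOML, not that containerd has actually picked them up:

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/pelletier/go-toml/v2"
)

// validateMirrorConfig checks that a hosts.toml exists and parses for every configured registry.
func validateMirrorConfig(configPath string, registries []string) error {
	for _, reg := range registries {
		p := filepath.Join(configPath, reg, "hosts.toml")
		b, err := os.ReadFile(p)
		if err != nil {
			return fmt.Errorf("missing mirror configuration for %s: %w", reg, err)
		}
		var cfg map[string]interface{}
		if err := toml.Unmarshal(b, &cfg); err != nil {
			return fmt.Errorf("invalid hosts.toml for %s: %w", reg, err)
		}
	}
	return nil
}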

Failure to start on EKS 1.27

Hey just thought I'd file a bug report:

Cluster: v1.27.4-eks-2d98532
Nodes: v1.27.3-eks-a5565ad

Log

registry {"level":"error","ts":1693316449.014054,"caller":"build/main.go:71","msg":"","error":"Containerd registry config path is /etc/containerd/certs.d:/etc/docker/certs.d but needs to be /etc/containerd/certs.d for mirror configuration to take effect","stacktrace":"main.main\n\t/build/main.go:71\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
configuration {"level":"info","ts":1693316396.3836858,"caller":"oci/containerd.go:286","msg":"added containerd mirror configuration","registry":"https://docker.io","path":"/etc/containerd/certs.d/docker.io/hosts.toml"}
configuration {"level":"info","ts":1693316396.3838172,"caller":"oci/containerd.go:286","msg":"added containerd mirror configuration","registry":"https://ghcr.io","path":"/etc/containerd/certs.d/ghcr.io/hosts.toml"}
configuration {"level":"info","ts":1693316396.3839626,"caller":"oci/containerd.go:286","msg":"added containerd mirror configuration","registry":"https://quay.io","path":"/etc/containerd/certs.d/quay.io/hosts.toml"}
configuration {"level":"info","ts":1693316396.3840818,"caller":"oci/containerd.go:286","msg":"added containerd mirror configuration","registry":"https://mcr.microsoft.com","path":"/etc/containerd/certs.d/mcr.microsoft.com/hosts.toml"}
configuration {"level":"info","ts":1693316396.3842103,"caller":"oci/containerd.go:286","msg":"added containerd mirror configuration","registry":"https://public.ecr.aws","path":"/etc/containerd/certs.d/public.ecr.aws/hosts.toml"}
configuration {"level":"info","ts":1693316396.3843098,"caller":"oci/containerd.go:286","msg":"added containerd mirror configuration","registry":"https://gcr.io","path":"/etc/containerd/certs.d/gcr.io/hosts.toml"}
configuration {"level":"info","ts":1693316396.3844151,"caller":"oci/containerd.go:286","msg":"added containerd mirror configuration","registry":"https://registry.k8s.io","path":"/etc/containerd/certs.d/registry.k8s.io/hosts.toml"}
configuration {"level":"info","ts":1693316396.3845265,"caller":"oci/containerd.go:286","msg":"added containerd mirror configuration","registry":"https://k8s.gcr.io","path":"/etc/containerd/certs.d/k8s.gcr.io/hosts.toml"}
configuration {"level":"info","ts":1693316396.3846445,"caller":"build/main.go:74","msg":"gracefully shutdown"}

Flux Config to Deploy

---
apiVersion: v1
kind: Namespace
metadata:
  name: spegel
  labels:
    toolkit.fluxcd.io/tenant: sre-team
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: spegel
  namespace: spegel
spec:
  type: "oci"
  interval: 5m0s
  url: oci://ghcr.io/xenitab/helm-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: spegel
  namespace: spegel
spec:
  interval: 1m
  chart:
    spec:
      chart: spegel
      version: "v0.0.11"
      interval: 5m
      sourceRef:
        kind: HelmRepository
        name: spegel

The deploy is failing with the same error log as above on both managed node group nodes and Karpenter-provisioned spot instances.
