
projectsveltos / addon-controller


The Sveltos Kubernetes add-on controller programmatically deploys add-ons and applications across tens of clusters. It supports ClusterAPI-powered clusters, Helm charts, Kustomize, and raw YAMLs. Sveltos has built-in support for multi-tenancy.

Home Page: https://projectsveltos.github.io/sveltos/

License: Apache License 2.0


addon-controller's Introduction


Sveltos: Kubernetes add-on controller

What is Projectsveltos?

Sveltos is a Kubernetes add-on controller that simplifies the deployment and management of add-ons and applications across multiple clusters. It runs in the management cluster and can programmatically deploy and manage add-ons and applications on any cluster in the fleet, including the management cluster itself. Sveltos supports a variety of add-on formats, including Helm charts, raw YAML, Kustomize, Carvel ytt, and Jsonnet.

Sveltos Kubernetes add-ons management across clusters

Sveltos allows you to represent add-ons and applications as templates. Before deploying to managed clusters, Sveltos instantiates these templates. Sveltos can gather the information required to instantiate the templates from either the management cluster or the managed clusters themselves. This enables you to use the same add-on configuration across all of your clusters, while still allowing for some variation, such as different add-on configuration values. In other words, Sveltos lets you define add-ons and applications in a reusable way. You can then deploy these definitions to multiple clusters, with minor adjustments as needed. This can save you a lot of time and effort, especially if you manage a large number of clusters.
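For instance, a ClusterProfile's Helm values can reference fields of the matching cluster; below is a minimal sketch (the Calico chart coordinates and field paths are illustrative; the same templating style appears in the Cilium example further down):

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: deploy-calico
spec:
  clusterSelector: env=prod
  syncMode: Continuous
  helmCharts:
  - repositoryURL:    https://projectcalico.docs.tigera.io/charts
    repositoryName:   projectcalico
    chartName:        projectcalico/tigera-operator
    chartVersion:     v3.26.1
    releaseName:      calico
    releaseNamespace: tigera-operator
    helmChartAction:  Install
    values: |
      installation:
        calicoNetwork:
          ipPools:
          # pod CIDR fetched from the matching CAPI Cluster at deploy time
          - cidr: "{{ index .Cluster.spec.clusterNetwork.pods.cidrBlocks 0 }}"
            encapsulation: VXLAN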

Sveltos provides precise control over add-on deployment order. Add-ons within a Profile/ClusterProfile are deployed in the exact order they appear, ensuring a predictable and controlled rollout. Furthermore, ClusterProfiles can depend on others, guaranteeing that dependent add-ons only deploy after their dependencies are fully operational (see the sketch below). Finally, Sveltos' event-driven framework offers additional flexibility: it allows deploying add-ons and applications in response to specific events, enabling dynamic and adaptable deployments based on your needs.
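As a sketch of the dependency mechanism, a ClusterProfile can declare that it deploys only after another one is fully provisioned (profile and resource names here are illustrative):

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: deploy-app
spec:
  clusterSelector: env=prod
  syncMode: Continuous
  dependsOn:
  - deploy-kyverno   # this profile's add-ons deploy only after deploy-kyverno is provisioned
  policyRefs:
  - name: app-config
    namespace: default
    kind: ConfigMap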

πŸ‘‰ To get updates ⭐️ star this repository.

Profiles vs. ClusterProfiles

Projectsveltos offers two powerful tools for managing cluster configurations: Profiles and ClusterProfiles. Understanding their distinctions is crucial for efficient setup and administration.

  1. ClusterProfiles: Apply across all clusters in any namespace. Ideal for platform admins maintaining global consistency and managing settings like networking, security, and resource allocation.
  2. Profiles: Limited to a specific namespace, granting granular control to tenant admins. This isolation lets teams manage their clusters from the management cluster independently, without impacting others.

Sveltos Profile vs ClusterProfile
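For example, a Profile looks identical to a ClusterProfile except that it is namespaced and only matches clusters in its own namespace (a minimal sketch; the namespace and labels are illustrative):

apiVersion: config.projectsveltos.io/v1alpha1
kind: Profile
metadata:
  name: team-a-addons
  namespace: team-a        # only clusters in the team-a namespace can match
spec:
  clusterSelector: env=staging
  syncMode: Continuous
  helmCharts:
  - repositoryURL:    https://kyverno.github.io/kyverno/
    repositoryName:   kyverno
    chartName:        kyverno/kyverno
    chartVersion:     v3.0.1
    releaseName:      kyverno-latest
    releaseNamespace: kyverno
    helmChartAction:  Install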

Addon deployment: how it works

The idea is simple:

  1. from the management cluster, select one or more clusters with a Kubernetes label selector;
  2. list which add-ons need to be deployed on such clusters.

where:

  1. clusters represents either a CAPI cluster or any other Kubernetes cluster registered with Sveltos;
  2. addons represents either a Helm release, Kubernetes resource YAMLs, or Kustomize resources.

Here is an example of how to require that any CAPI Cluster with the label env: prod has the following features deployed:

  1. Kyverno helm chart (version v3.0.1)
  2. kubernetes resource(s) contained in the referenced Secret: default/storage-class
  3. kubernetes resource(s) contained in the referenced ConfigMap: default/contour-gateway.

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: deploy-kyverno
spec:
  clusterSelector: env=prod
  syncMode: Continuous
  helmCharts:
  - repositoryURL:    https://kyverno.github.io/kyverno/
    repositoryName:   kyverno
    chartName:        kyverno/kyverno
    chartVersion:     v3.0.1
    releaseName:      kyverno-latest
    releaseNamespace: kyverno
    helmChartAction:  Install
    values: |
      admissionController:
        replicas: 3
  policyRefs:
  - name: storage-class
    namespace: default
    kind: Secret
  - name: contour-gateway
    namespace: default
    kind: ConfigMap

As soon as a cluster is a match for the above ClusterProfile instance, all referenced features are automatically deployed in that cluster.

Kubernetes add-on deployment

Here is an example using Kustomize:

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: flux-system
spec:
  clusterSelector: env=fv
  syncMode: Continuous
  kustomizationRefs:
  - namespace: flux-system
    name: flux-system
    kind: GitRepository
    path: ./helloWorld/
    targetNamespace: eng

where the GitRepository synced by Flux contains the following resources:

β”œβ”€β”€ deployment.yaml
β”œβ”€β”€ kustomization.yaml
β”œβ”€β”€ service.yaml
└── configmap.yaml

Refer to the examples for more complex scenarios.

Different SyncMode

  • OneTime: This mode is designed for bootstrapping critical components during the initial cluster setup. Think of it as a one-shot configuration injection:
    1. Deploying essential infrastructure components like CNI plugins, cloud controllers, or the workload cluster's package manager itself;
    2. Simplifies initial cluster setup;
    3. Hands over management to the workload cluster's own tools, promoting modularity and potentially simplifying ongoing maintenance.
  • Continuous: This mode continuously monitors ClusterProfiles or Profiles for changes and automatically applies them to matching clusters. It ensures ongoing consistency between your desired configuration and the actual cluster state:
    1. Centralized control over deployments across multiple clusters for consistency and compliance;
    2. Simplifies management of configurations across multiple clusters.
  • ContinuousWithDriftDetection: Detects and automatically corrects configuration drifts in managed clusters, ensuring they remain aligned with the desired state defined in the management cluster.

Configuration Drift Detection

Sveltos can automatically detect drift between the desired state, defined in the management cluster, and the actual state of your clusters, and recover from it.

Configuration drift recovery

Projectsveltos intro

Give projectsveltos a try

If you want to try projectsveltos with a test cluster:

  1. git clone https://github.com/projectsveltos/addon-controller
  2. make quickstart

This will create a management cluster using Kind, deploy ClusterAPI and Projectsveltos, and create a workload cluster powered by ClusterAPI.

Sveltos in action

Sveltos in action

To see the full demo, have a look at this YouTube video.

Useful links

Contributing

❀️ Your contributions are always welcome! If you want to contribute, have questions, have noticed a bug, or want to get the latest project news, you can connect with us in the following ways:

  1. Read contributing guidelines
  2. Open a bug/feature enhancement on GitHub
  3. Chat with us on Slack in the #projectsveltos channel
  4. Contact Us

License

Copyright 2022.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

addon-controller's People

Contributors

gianlucam76, kprav33n, pescerosso, realgaurav

addon-controller's Issues

BUG: Resources deployed in the management cluster are incorrectly deployed

Create a ClusterProfile that:

  • matches more than one managed cluster
  • wants to deploy resources in the management cluster (one resource per managed cluster)

Resources deployed by Sveltos have the ClusterProfile as owner reference.

A ClusterSummary is created per managed cluster. The ClusterSummary reconciler will:

  • deploy resources (in the management cluster and the managed cluster)
  • remove stale resources

The problem sits with removing stale resources.

Let's say ClusterProfile is matching two clusters (cluster1 and cluster2) and wants to deploy, in the management cluster, an instance of Crossplane bucket per managed cluster.

When the ClusterSummary for cluster1 reconciles, it creates Bucket instance 1, then gets all Bucket instances currently present in the management cluster (deployed by Sveltos).
It will find Bucket instance 1 and Bucket instance 2 (created by the ClusterSummary for cluster2).
It will leave Bucket instance 1, but it will delete Bucket instance 2 (assuming that resource is stale).

Report missing referenced resources

Is your feature request related to a problem? Please describe.
Profile/ClusterProfile can reference a ConfigMap/Secret. Because of a typo, it can happen that Sveltos ends up referencing a non-existing ConfigMap/Secret.
Currently Sveltos reports no issues in such a situation.

Describe the solution you'd like
When a referenced ConfigMap/Secret does not exist, report it in the Status, so it is easier to debug.

Describe alternatives you've considered
None.

Additional context
None.

(feat): Namespaced ClusterProfile resources

Is your feature request related to a problem? Please describe.
The CAPI Cluster resource is namespaced while ClusterProfile isn't, so a ClusterProfile has different ownership than a Cluster. Usually consumers are not allowed to deploy cluster-scoped resources (like ClusterProfile) into clusters where the Sveltos and CAPI controllers are running. A namespaced ClusterProfile is a quality-of-life change that will improve the experience in situations like this.

Describe the solution you'd like
A namespaced ClusterProfile resource, probably just Profile, which can match only:

  • clusters in the same namespace
  • resources in the same namespace

Describe alternatives you've considered
None

Additional context
None

Add an example to deploy Cilium

Currently there is an example on how to deploy Calico using Projectsveltos.

Add another ClusterProfile example showing how to deploy Cilium using Projectsveltos.

Deploying large CRDs with Spec.PolicyRefs

Is your feature request related to a problem? Please describe.
I like the idea of templating/rendering my helm charts before applying them, as this allows me to review them in GitHub's diff view. I have currently been doing this by having the output of helm template go into a ConfigMap, as described in the section on Spec.PolicyRefs. Unfortunately this fails for Kyverno, as one of the CRDs is bigger than 1 MiB. The Kubernetes documentation has this to say:

Note: A ConfigMap is not designed to hold large chunks of data. The data stored in a ConfigMap cannot exceed 1 MiB. If you need to store settings that are larger than this limit, you may want to consider mounting a volume or use a separate database or file service.

Describe the solution you'd like
If supporting this workflow is out of scope for Project Sveltos, I would still encourage adding a note to the documentation around PolicyRefs stating that the max size of a ConfigMap is 1 MiB and that this will limit its usefulness.

Describe alternatives you've considered
This can be worked around by blindly having Project Sveltos render and apply charts.

Additional context
Not at the moment.

BUG: template Values/template resources

Problem Description

Sveltos allows fetching values for Helm charts and Kubernetes resources at run time from Cluster/InfrastructureProvider/KubeadmControlPlane/SecretRef instances.

For instance, the following sets cidrs by fetching it from the Cluster instance's Spec:

" cidrs: {{ index .Cluster.Spec.ClusterNetwork.Pods.CIDRBlocks 0 }} "

Currently, template values are fetched from Cluster/InfrastructureProvider/KubeadmControlPlane/SecretRef only when:

  1. a helm chart is installed/upgraded
  2. a resource, contained in a referenced Secret/ConfigMap, is deployed

Ideally, any change in the above resources which would cause helm values/template resources to change should trigger a reconciliation.

System Information


SVELTOS VERSION: v0.2.1

BUG: Leftover ClusterConfigurations and ClusterSummaries after CAPI cluster deletion

Problem Description

When deleting a CAPI cluster object from our management cluster, I can see that ClusterConfigurations and ClusterSummaries objects are not deleted.

System Information

CLUSTERAPI VERSION: v1.5.1
SVELTOS VERSION: v0.15.3
KUBERNETES VERSION: v1.27.5

Logs

I can see errors for failed reconciling of ClusterSummaries in addon-controller pod, while I couldn't find logs related to ClusterConfigurations objects.

I0915 10:07:50.539414       1 clustersummary_controller.go:140] "Reconciling" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="my-namespace/deploy-cilium-v1-26-capi-my-cluster" namespace="my-namespace" name="deploy-cilium-v1-26-capi-my-cluster" reconcileID="54854682-127f-4bb4-8806-23fcab276c57"
I0915 10:07:50.539878       1 clustersummary_controller.go:211] "Reconciling ClusterSummary delete" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="my-namespace/deploy-cilium-v1-26-capi-my-cluster" namespace="my-namespace" name="deploy-cilium-v1-26-capi-my-cluster" reconcileID="54854682-127f-4bb4-8806-23fcab276c57"
I0915 10:07:50.539940       1 clusterproxy.go:100] "Cluster does not exist" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="my-namespace/deploy-cilium-v1-26-capi-my-cluster" namespace="my-namespace" name="deploy-cilium-v1-26-capi-my-cluster" reconcileID="54854682-127f-4bb4-8806-23fcab276c57"
E0915 10:07:50.539998       1 clustersummary_controller.go:224] "failed to remove ResourceSummary." err="Cluster my-namespace/my-cluster does not exist: Cluster.cluster.x-k8s.io \"my-cluster\" not found" controller="clustersummary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="my-namespace/deploy-cilium-v1-26-capi-my-cluster" namespace="my-namespace" name="deploy-cilium-v1-26-capi-my-cluster" reconcileID="54854682-127f-4bb4-8806-23fcab276c57"

Here is the deploy-cilium-v1.26 ClusterProfile

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: deploy-cilium-v1-26
spec:
  clusterSelector: sveltos=enabled,kubernetes=v1-26
  helmCharts:
  - chartName: cilium/cilium
    chartVersion: 1.12.12
    helmChartAction: Install
    releaseName: cilium
    releaseNamespace: kube-system
    repositoryName: cilium
    repositoryURL: https://helm.cilium.io/
    values: |
      k8sServiceHost: "{{ .Cluster.spec.controlPlaneEndpoint.host }}"
      k8sServicePort: "{{ .Cluster.spec.controlPlaneEndpoint.port }}"
      hubble:
        enabled: false
      nodePort:
        enabled: true
      kubeProxyReplacement: strict
      operator:
        replicas: 1
        updateStrategy:
          rollingUpdate:
            maxSurge: 0
            maxUnavailable: 1
  reloader: false
  stopMatchingBehavior: WithdrawPolicies
  syncMode: Continuous

I'm not managing cluster via sveltosctl, but just creating and deleting cluster object on the management cluster.

docker images

Currently, for sveltos-manager (and all other Sveltos microservices), all Docker images are stored on Docker Hub under the namespace gianlucam76. Replace this to use the namespace projectsveltos.

BUG: ClusterSummary is not cleaned-up upon Cluster deletion

Problem Description

  1. Define a {Cluster}Profile
  2. Create several clusters
  3. Thanks to labels, wait for the add-ons to be deployed
  4. Delete one of the clusters
  5. Check that the deleted cluster still has a ClusterSummary

tl;dr: the ClusterSummary must be deleted as well when the referencing cluster is deleted.

System Information

CLUSTERAPI VERSION: v1.6.0
SVELTOS VERSION: v1.21.0
KUBERNETES VERSION: v1.27.3

Logs

N.R.

ClusterProfile installing multiple Helm charts does not update clusterconfigurations in case of failures

If a ClusterProfile lists more than one Helm chart, and some are installed before an error happens, the installed Helm charts are not reported.

For instance, deploy following ClusterProfile

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: multiple-helm-charts
spec:
  clusterSelector: env=fv
  syncMode: Continuous
  helmCharts:
  - repositoryURL:    https://prometheus-community.github.io/helm-charts
    repositoryName:   prometheus-community
    chartName:        prometheus-community/prometheus
    chartVersion:     23.4.0
    releaseName:      prometheus
    releaseNamespace: prometheus
    helmChartAction:  Install
  - repositoryURL:    https://grafana.github.io/helm-charts
    repositoryName:   bitnami
    chartName:        grafana/grafana
    chartVersion:     6.58.9
    releaseName:      grafana
    releaseNamespace: grafana
    helmChartAction:  Install

Prometheus gets installed. Grafana does not (as repositoryName is incorrectly set). Yet the ClusterConfiguration does not list Prometheus as installed:

  apiVersion: config.projectsveltos.io/v1alpha1
  kind: ClusterConfiguration
  metadata:
    creationTimestamp: "2023-08-20T14:27:28Z"
    generation: 1
    labels:
      projectsveltos.io/cluster-name: clusterapi-workload
      projectsveltos.io/cluster-type: Capi
    name: capi--clusterapi-workload
    namespace: default
    ownerReferences:
    - apiVersion: config.projectsveltos.io/v1alpha1
      kind: ClusterProfile
      name: multiple-helm-charts
      uid: 634678b8-ff6f-479c-acae-de5dd50ab0c2
    resourceVersion: "20595"
    uid: 8b2db1ac-cfdc-418b-a5ec-e20fce064179
  status:
    clusterProfileResources:
    - clusterProfileName: multiple-helm-charts

The same is visible with:

kubectl exec -it -n projectsveltos sveltosctl-0 -- ./sveltosctl show addons 
+---------+---------------+-----------+------+---------+------+------------------+
| CLUSTER | RESOURCE TYPE | NAMESPACE | NAME | VERSION | TIME | CLUSTER PROFILES |
+---------+---------------+-----------+------+---------+------+------------------+
+---------+---------------+-----------+------+---------+------+------------------+

BUG: Resources stuck in provisioning

Problem Description

Based on previous slack conversation: https://projectsveltos.slack.com/archives/C046P825BBL/p1704806833044409

The cluster summary stays in a provisioning state after resources have been deployed.

Test resources:

---
apiVersion: v1
data:
  ns.yaml: |
    apiVersion: v1
    kind: Namespace
    metadata:
      creationTimestamp: null
      name: test
      namespace: default
    spec: {}
    status: {}
kind: ConfigMap
metadata:
  creationTimestamp: null
  name: test
---
apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: test
spec:
  clusterSelector: env=dev
  policyRefs:
  - name: test
    namespace: default
    kind: ConfigMap

resource in managed cluster:

k describe namespaces test --kubeconfig test-config                                                          
Name:         test
Labels:       kubernetes.io/metadata.name=test
              projectsveltos.io/reason=Resources
              projectsveltos.io/reference-kind=ConfigMap
              projectsveltos.io/reference-name=test
              projectsveltos.io/reference-namespace=default
Annotations:  projectsveltos.io/hash: sha256:e495e475662b02df5ddbb8ab3cf00a69e5a62bb72a7b0fdb958f6ac6276cc04e
Status:       Active

cluster summary:

 k describe clustersummaries.config.projectsveltos.io -n org-azure-workload
Name:         test-capi-magchr-dev-c2
Namespace:    org-azure-workload
Labels:       projectsveltos.io/cluster-name=magchr-dev-c2
              projectsveltos.io/cluster-profile-name=test
              projectsveltos.io/cluster-type=Capi
Annotations:  <none>
API Version:  config.projectsveltos.io/v1alpha1
Kind:         ClusterSummary
Metadata:
  Creation Timestamp:  2024-01-09T14:52:47Z
  Finalizers:
    clustersummaryfinalizer.projectsveltos.io
  Generation:  1
  Owner References:
    API Version:     config.projectsveltos.io/v1alpha1
    Kind:            ClusterProfile
    Name:            test
    UID:             d1c5dcf3-a960-4a7c-aea8-a713e070ebf6
  Resource Version:  36612
  UID:               f7f775ab-bc7d-434d-bac3-f3eaaf7eaabf
Spec:
  Cluster Name:       magchr-dev-c2
  Cluster Namespace:  org-azure-workload
  Cluster Profile Spec:
    Cluster Selector:  env=dev
    Policy Refs:
      Deployment Type:       Remote
      Kind:                  ConfigMap
      Name:                  test
      Namespace:             default
    Reloader:                false
    Stop Matching Behavior:  WithdrawPolicies
    Sync Mode:               Continuous
  Cluster Type:              Capi
Status:
  Dependencies:  no dependencies
  Feature Summaries:
    Deployed Group Version Kind:
      Namespace.v1.
    Failure Message:    failed to get API group resources: unable to retrieve the complete list of server APIs: lib.projectsveltos.io/v1alpha1: the
 server could not find the requested resource
    Feature ID:         Resources
    Hash:               SU4trXmta5VHbjHlzcM+osWJeTRN+Ho/hcS0lXJvSPQ=
    Last Applied Time:  2024-01-09T15:17:50Z
    Status:             Provisioning
Events:                 <none>

System Information

CLUSTERAPI VERSION: v1.5.2
SVELTOS VERSION: main
KUBERNETES VERSION: minikube v1.27.4

Logs

from addon-controller

 name="test-capi-magchr-dev-c2" reconcileID="d3175851-143b-4ac7-923b-1426c360bae2" clusternamespace="org-azure-workload" clustername="magchr-dev
-c2" clusternamespace="org-azure-workload" clustername="org-azure-workload" applicant="test-capi-magchr-dev-c2" feature="Resources" hash="494e2
dad79ad6b95476e31e5cdc33ea2c58979344df87a3f85c4b495726f48f4" status="Provisioning"
I0109 15:20:10.705880       1 clustersummary_controller.go:516] "no helm configuration" controller="clustersummary" controllerGroup="config.proje
ctsveltos.io" controllerKind="ClusterSummary" ClusterSummary="org-azure-workload/test-capi-magchr-dev-c2" namespace="org-azure-workload" name="
test-capi-magchr-dev-c2" reconcileID="d3175851-143b-4ac7-923b-1426c360bae2" clusternamespace="org-azure-workload" clustername="magchr-dev-c2"
I0109 15:20:10.705886       1 clustersummary_controller.go:518] "no helm status. Do not reconcile this" controller="clustersummary" controllerGro
up="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="org-azure-workload/test-capi-magchr-dev-c2" namespace="org-azure-
workload" name="test-capi-magchr-dev-c2" reconcileID="d3175851-143b-4ac7-923b-1426c360bae2" clusternamespace="org-azure-workload" clustername="m
agchr-dev-c2"
I0109 15:20:10.705892       1 clustersummary_controller.go:488] "no policy configuration" controller="clustersummary" controllerGroup="config.pro
jectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="org-azure-workload/test-capi-magchr-dev-c2" namespace="org-azure-workload" name
="test-capi-magchr-dev-c2" reconcileID="d3175851-143b-4ac7-923b-1426c360bae2" clusternamespace="org-azure-workload" clustername="magchr-dev-c2"
I0109 15:20:10.705898       1 clustersummary_controller.go:490] "no policy status. Do not reconcile this" controller="clustersummary" controllerG
roup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="org-azure-workload/test-capi-magchr-dev-c2" namespace="org-azur
e-workload" name="test-capi-magchr-dev-c2" reconcileID="d3175851-143b-4ac7-923b-1426c360bae2" clusternamespace="org-azure-workload" clustername=
"magchr-dev-c2"
E0109 15:20:10.705905       1 clustersummary_controller.go:350] "failed to deploy" err="feature is still being provisioned" controller="clustersu
mmary" controllerGroup="config.projectsveltos.io" controllerKind="ClusterSummary" ClusterSummary="org-azure-workload/test-capi-magchr-dev-c2" na
mespace="org-azure-workload" name="test-capi-magchr-dev-c2" reconcileID="d3175851-143b-4ac7-923b-1426c360bae2"
I0109 15:20:10.706064       1 controller.go:331] "Reconcile done, requeueing after 10s" controller="clustersummary" controllerGroup="config.proje
ctsveltos.io" controllerKind="ClusterSummary" ClusterSummary="org-azure-workload/test-capi-magchr-dev-c2" namespace="org-azure-workload" name="
test-capi-magchr-dev-c2" reconcileID="d3175851-143b-4ac7-923b-1426c360bae2"


Install Flux using helm with Sveltos

Is your feature request related to a problem? Please describe.
Add an example of how to deploy Flux in the management cluster via Sveltos using a Helm chart.
This can be added here

Describe the solution you'd like
This can ideally be added also to documentation. So when Sveltos is deployed in the management cluster, Flux is automatically deployed as well (we can make it part of Sveltos YAML, Helm Chart, Kustomize).

Describe alternatives you've considered
Manually installing Flux in the management cluster as a separate process.
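A hedged sketch of what such an example could look like (the fluxcd-community chart coordinates, chart version, and the management cluster label are assumptions; the management cluster must be registered with Sveltos and carry a matching label):

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: deploy-flux
spec:
  clusterSelector: type=mgmt     # label assumed on the registered management cluster
  syncMode: Continuous
  helmCharts:
  - repositoryURL:    https://fluxcd-community.github.io/helm-charts
    repositoryName:   fluxcd-community
    chartName:        fluxcd-community/flux2
    chartVersion:     2.12.2     # illustrative version
    releaseName:      flux2
    releaseNamespace: flux-system
    helmChartAction:  Install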

BUG: addon-controller keeps redeploying cilium

Problem Description

Posting this Profile (a ClusterProfile behaves the same):

  apiVersion: config.projectsveltos.io/v1alpha1
  kind: Profile
  metadata:
    name: cilium
  spec:
    clusterSelector: cni=cilium
    helmCharts:
    - chartName: cilium/cilium
      chartVersion: v1.14.5
      helmChartAction: Install
      releaseName: cilium
      releaseNamespace: kube-system
      repositoryName: cilium
      repositoryURL: https://helm.cilium.io/
      values: |
        null
    reloader: false
    stopMatchingBehavior: WithdrawPolicies
    syncMode: ContinuousWithDriftDetection

Depending on how long Sveltos takes to install cilium the first time, the addon-controller, due to drift-detection, goes into a loop.

Drift-detection starts detecting this

I0108 18:08:57.316119       1 watcher.go:93] "Resource in ResourceSummary civo2--p--cilium-sveltos-civo-cluster-2 potentially drifted (Secret kube-system/hubble-server-certs)" gvk="/v1, Kind=Secret" key="kube-system/hubble-server-certs"

and the addon-controller then keeps redeploying cilium.

This seems to be an issue with either the addon-controller or the drift-detection manager.
Either the addon-controller redeploys the helm chart the first time even though that is not needed, or drift-detection incorrectly detects the first possible drift.

Cilium documentation: "When using Helm, TLS certificates are (re-)generated every time Helm is used for install or upgrade"

Accept Lua functions for template instantiations

Currently Sveltos accepts Helm charts and raw YAMLs expressed as templates and instantiates those using resources present in the management cluster.

Sveltos extensively uses Lua. So it makes sense for Sveltos to accept Lua functions which are passed the templates and the values fetched from the management cluster, and which return the instantiated templates.

BUG: Stale drift-detection pod

Problem Description

When Sveltos is instructed to deploy agents (sveltos-agent and drift-detection-manager) in the management cluster, the following sequence leaves a stale drift-detection-manager in the management cluster:

  1. create a Profile/ClusterProfile with mode set to driftDetection
  2. have a cluster/sveltoscluster match => drift-detection-manager is created in the management cluster
  3. change cluster/sveltoscluster labels so it does not match any Profile/ClusterProfile anymore
  4. when all ClusterSummaries instances are gone, delete the cluster/sveltosCluster => the drift-detection-manager is stale

The reason is that cleanup of the drift-detection-manager happens within the ClusterSummary reconciler, if the cluster is gone or marked for deletion.
But following the above sequence, the ClusterSummaries are all gone by the time the cluster is deleted, hence the stale drift-detection-manager deployment in the management cluster.

Template annotation

A ClusterProfile can reference ConfigMaps and Secrets (the content of those will be deployed in the managed clusters matching the ClusterProfile's clusterSelector).

Referenced ConfigMaps/Secrets can be expressed as templates, and Sveltos will instantiate those before deploying their content.

ConfigMaps/Secrets expressed as templates need to have a special annotation which instructs Sveltos to treat them as templates.

The annotation is "projectsveltos.io/template" and it is currently defined here in the addon-controller repo.

Since event-manager needs to have access to this annotation, it would be good to:

  1. Move the definition into libsveltos
  2. Add a utility method to check whether a referenced ConfigMap/Secret is a template or not
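For reference, a templated ConfigMap would look like the sketch below (the annotation key is the one defined above; the annotation value and the data content are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-cidr-template
  namespace: default
  annotations:
    projectsveltos.io/template: "ok"   # presence of this annotation marks the content as a template
data:
  cm.yaml: |
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: pod-cidr
      namespace: default
    data:
      cidr: "{{ index .Cluster.spec.clusterNetwork.pods.cidrBlocks 0 }}"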

Document CustomResources ownership

User Story

As a developer I would like comprehensive documentation for the CRs ownership relationship. For instance, when a cluster matches a ClusterProfile, a ClusterSummary is created and it is owned by the ClusterProfile instance.
This comes in particularly handy as the number of CRDs of the project keeps growing.

Detailed Description

In order to make it easier for devs to quickly understand sveltos internals, in the documentation (https://projectsveltos.github.io/sveltos/) create a section with some CRs ownership diagram.
Add information on when CRs are created/updated/deleted and which other CR owns them.

/kind documentation
/help
/good-first-issue

ClusterReport/ClusterSummary name

Currently ClusterReport name is

func getClusterReportName(clusterProfileName, clusterName string, clusterType libsveltosv1alpha1.ClusterType) string {
	// TODO: shorten this value
	return clusterProfileName + "--" + strings.ToLower(string(clusterType)) + "--" + clusterName
}

and ClusterSummary name is

func GetClusterSummaryName(clusterProfileName, clusterName string, isSveltosCluster bool) string {
	prefix := "capi"
	if isSveltosCluster {
		prefix = "sveltos"
	}
	return fmt.Sprintf("%s-%s-%s", clusterProfileName, prefix, clusterName)
}

which might end up exceeding the maximum name length.

Describe the solution you'd like
Implement a solution to shorten these names, making sure no collision ever happens (when the pod starts, we can fetch all existing ClusterSummaries/ClusterReports).

Also, given a clusterProfile and a cluster name, there needs to be a way to get corresponding ClusterReport (which can likely be achieved using labels) and ClusterSummary.

Prometheus Configuration webhook

Is your feature request related to a problem? Please describe.
Currently it is possible to change the Prometheus configuration in a ClusterFeature.
That should not be allowed. The only things that can change are the storage configuration and customer-referenced policies.

Describe the solution you'd like
Add a webhook to block such change.

Ignore sections with just comments and empty lines

Referenced ConfigMaps and Secrets contain the YAML of one or more Kubernetes resources that need to be deployed in clusters matching the ClusterProfile's clusterSelector.

Currently, if one of those sections is just comments and empty lines, sveltos fails.

For instance, if the following section is contained in ConfigMap.Data:

---
# This file is generated from the individual YAML files by generate-provisioner-deployment.sh. Do not
# edit this file directly but instead edit the source files and re-render.
#
# Generated from:
#       examples/contour/01-crds.yaml
#       examples/gateway/00-crds.yaml
#       examples/gateway/00-namespace.yaml
#       examples/gateway/01-admission_webhook.yaml
#       examples/gateway/02-certificate_config.yaml
#       examples/gateway-provisioner/00-common.yaml
#       examples/gateway-provisioner/01-roles.yaml
#       examples/gateway-provisioner/02-rolebindings.yaml
#       examples/gateway-provisioner/03-gateway-provisioner.yaml

---

sveltos will fail.

Sveltos should skip sections that contain only comments or empty lines.

The only method that needs to change is

func collectContent(ctx context.Context, clusterSummary *configv1alpha1.ClusterSummary,
	data map[string]string, logger logr.Logger) ([]*unstructured.Unstructured, error) {

where data is essentially the content of ConfigMap.Data or decoded Secret.Data.

BUG: lagged addons delivery

Problem Description

  1. Create a {Cluster}Profile
  2. Deploy a Cluster (Cluster API or SveltosCluster)
  3. Wait up to 10 minutes before getting the addon delivered

System Information

CLUSTERAPI VERSION: v1.6.0
SVELTOS VERSION: v1.21.0
KUBERNETES VERSION: v1.27.3

Logs

N.R.

Helm chart with CustomResourceDefinition and instances of such CRDs

If a Helm chart contains both CustomResourceDefinitions and instances of such CRDs, deploying that Helm chart with Sveltos will fail.

For instance

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: kubevela-core
spec:
  clusterSelector: env=fv
  syncMode: Continuous
  helmCharts:
  - repositoryURL:    https://kubevela.github.io/charts
    repositoryName:   kubevela
    chartName:        kubevela/vela-core
    chartVersion:     1.9.6
    releaseName:      kubevela-core-latest
    releaseNamespace: vela-system
    helmChartAction:  Install

The reason is that Sveltos uses Helm in dry-run mode first, to get the list of resources the Helm chart would deploy. It then validates those resources against compliance policies (both OpenAPI and Lua ones).

But the Helm dry run would fail for a Helm chart containing both CRDs and instances of such CRDs.

While this behaviour needs to be called out in the documentation, if no compliance policies are defined, Sveltos must be able to deploy such Helm charts.

Reconcile ConfigMap/Secret when annotation change

A ClusterSummary reconciles when a referenced ConfigMap/Secret changes.

As of now, on updates, this reconciliation is triggered only when the content of the ConfigMap/Secret changes.

  • Logic for ConfigMap is here
  • Logic for Secret is here

ConfigMaps and Secrets expressed as templates must have a special annotation to be treated as such.

If a ConfigMap is created as a template and the annotation is not added (a misconfiguration), the customer can later fix the configuration by adding the needed annotation.

Currently that does not retrigger a ClusterSummary reconciliation.

The issue is that the ConfigMap and Secret predicates need to return true if the special annotation is added on update.

Create ClusterProfile first, CAPI Cluster later. Add-ons are not deployed

Problem Description

In this sequence:

  1. create ClusterProfile instance
  2. create Cluster instance with labels matching ClusterProfile

no add-ons are deployed. Essentially, Sveltos never recognizes the cluster as a match for the ClusterProfile instance (proof of that: a ClusterSummary is never created).

Can cluster profile deploy a kustomization ref to only a set number of clusters in a cluster label

In the example provided for deploying a kustomization, https://projectsveltos.github.io/sveltos/addons/kustomize/, it will deploy the app, config map, and secrets to all the clusters with the label env=fv.

In case I have two clusters with the label env=fv, it will deploy it to both clusters.

Does cluster profile have any mechanism to restrict the number of clusters to which the kustomization is deployed when there are multiple clusters with a label?

ex:

I have two clusters with the label env=prod:

kubectl get sveltosclusters -n projectsveltos --show-labels
NAME                         READY   VERSION   LABELS
clusterapi-workload-dev      true    v1.27.1   env=dev,sveltos-agent=present
clusterapi-workload-prod     true    v1.27.1   env=prod,sveltos-agent=present
clusterapi-workload-prod-1   true    v1.27.1   env=prod,sveltos-agent=present

When I deploy the sample kustomization ref, it gets deployed to both the prod-labeled clusters:

sveltosctl/bin/sveltosctl show addons
+-------------------------------------------+-----------------+-------------+----------------------+---------+-------------------------------+-----------------------------------+
| CLUSTER | RESOURCE TYPE | NAMESPACE | NAME | VERSION | TIME | PROFILES |
+-------------------------------------------+-----------------+-------------+----------------------+---------+-------------------------------+-----------------------------------+
| projectsveltos/clusterapi-workload-dev | helm chart | vela-system | kubevela-core-latest | 1.9.9 | 2024-02-08 15:49:19 -0500 EST | ClusterProfile/kubevela-core |
| projectsveltos/clusterapi-workload-prod | helm chart | vela-system | kubevela-core-latest | 1.9.6 | 2024-02-08 15:52:03 -0500 EST | ClusterProfile/kubevela-core-prod |
| projectsveltos/clusterapi-workload-prod | apps:Deployment | eng | the-deployment | N/A | 2024-02-08 17:31:56 -0500 EST | ClusterProfile/flux-system |
| projectsveltos/clusterapi-workload-prod | :Service | eng | the-service | N/A | 2024-02-08 17:31:56 -0500 EST | ClusterProfile/flux-system |
| projectsveltos/clusterapi-workload-prod | :ConfigMap | eng | the-map | N/A | 2024-02-08 17:31:56 -0500 EST | ClusterProfile/flux-system |
| projectsveltos/clusterapi-workload-prod-1 | helm chart | vela-system | kubevela-core-latest | 1.9.6 | 2024-02-08 17:01:49 -0500 EST | ClusterProfile/kubevela-core-prod |
| projectsveltos/clusterapi-workload-prod-1 | apps:Deployment | eng | the-deployment | N/A | 2024-02-08 17:31:56 -0500 EST | ClusterProfile/flux-system |
| projectsveltos/clusterapi-workload-prod-1 | :Service | eng | the-service | N/A | 2024-02-08 17:31:57 -0500 EST | ClusterProfile/flux-system |
| projectsveltos/clusterapi-workload-prod-1 | :ConfigMap | eng | the-map | N/A | 2024-02-08 17:31:57 -0500 EST | ClusterProfile/flux-system |
+-------------------------------------------+-----------------+-------------+----------------------+---------+-------------------------------+-----------------------------------+

Can we restrict it to deploy to only one of the prod clusters?

Shaping the Future Together: Share Your Wishlist for Addon-Controller Features

Hi Sveltos community!

We're constantly striving to improve Sveltos and deliver the best possible experience for managing your Kubernetes add-ons across multiple clusters. Your feedback is invaluable in guiding our development efforts, and we're eager to hear directly from you.

What we're asking:

  • What features are most important to you at this stage?
  • Are there any specific pain points you'd like to see addressed?
  • Do you have any innovative ideas for enhancing Sveltos's capabilities?

Feel free to share any thoughts, suggestions, or feature requests you may have. We encourage detailed descriptions, use cases, and even mockups if you have them!

BUG: Value changes not reconciled/redeployed

Problem Description

Any changes to values of helm charts defined in a ClusterProfile don’t reconcile/redeploy the K8s controllers. Is this expected behavior?

For example, when I apply the below values to the already defined nginx helm chart with no values in the ClusterProfile, the nginx deployment doesn’t get redeployed. The addon-controller log shows added to result with err cannot re-use a name that is still in use. Does this mean the releaseName should be updated with every value change? If I change the releaseName/chartVersion, a reconcile happens and new nginx pods get deployed without any issue.

Seems similar to #93 but for non-templated values.

New Values:

    values: |-
      containerPorts:
        http: 8190

ClusterProfile.yml

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: nginx
spec:
  clusterSelector: cluster=vcenter1-lab
  syncMode: Continuous
  helmCharts:
  - repositoryURL:    artifactory/kubernetes-helm
    repositoryName:   kubernetes-helm
    chartName:        kubernetes-helm/nginx
    chartVersion:     13.1.7
    releaseName:      nginx-sveltos
    releaseNamespace: k8s-admin
    helmChartAction:  Install

System Information

CLUSTERAPI VERSION: v1.6.1
SVELTOS VERSION: v0.26.0
KUBERNETES VERSION: v1.29.0

Logs

β”‚ I0405 07:35:07.142197       1 worker.go:238] "added to result with err cannot re-use a name that is still in use" logger="deployer" worker="6" key="default:::vcenter1-lab:::Capi:::nginx-capi-vcenter1-lab:::Helm:::false"

BUG: configuration drift detection does not work in the scenario described by this bug

Problem Description

When deploying this ClusterProfile, configuration drift detection does not work.

apiVersion: config.projectsveltos.io/v1alpha1
kind: ClusterProfile
metadata:
  name: podinfo
spec:
  clusterSelector: env=fv
  helmCharts:
  - chartName: podinfo/podinfo
    chartVersion: v6.5.1
    helmChartAction: Install
    releaseName: podinfo
    releaseNamespace: test
    repositoryName: podinfo
    repositoryURL: https://stefanprodan.github.io/podinfo
  syncMode: ContinuousWithDriftDetection

An initial debug of the issue indicates the problem to be in addon-controller.

The addon-controller, when a ClusterProfile syncMode is set to ContinuousWithDriftDetection, gets the list of resources deployed and passes it to the drift-detection-manager.

In this case, the manifest returned by Helm does not have the namespace set, even for namespaced resources:

    - group: apps
      kind: Deployment
      name: podinfo
      version: v1

while in general, in other cases, this is what is returned:

    - group: apps
      kind: Deployment
      name: kyverno-admission-controller
      namespace: kyverno
      version: v1

The missing namespace information is causing drift-detection-manager to fail.

BUG: Upgrade drift-detection-manager

Problem Description

When Sveltos is upgraded, addon-controller should upgrade drift-detection-manager (and the CRDs) already deployed in existing managed clusters.
