fred78290 / kubernetes-vmware-autoscaler

Kubernetes autoscaler for vSphere

License: Apache License 2.0

Makefile 0.73% Shell 2.93% Dockerfile 0.64% Go 95.70%
kubernetes vmware-vsphere cluster autoscaling

kubernetes-vmware-autoscaler's Introduction


kubernetes-vmware-autoscaler

Kubernetes autoscaler for vSphere/ESXi, including a custom resource controller to create managed nodes without writing code

Supported releases

  • 1.26.11
    • This version supports Kubernetes v1.26 and the k3s, rke2 and external Kubernetes distributions
  • 1.27.9
    • This version supports Kubernetes v1.27 and the k3s, rke2 and external Kubernetes distributions
  • 1.28.4
    • This version supports Kubernetes v1.28 and the k3s, rke2 and external Kubernetes distributions
  • 1.29.0
    • This version supports Kubernetes v1.29 and the k3s, rke2 and external Kubernetes distributions

How it works

This tool drives vSphere to deploy VMs on demand. The cluster-autoscaler deployment uses either the vanilla cluster-autoscaler or my enhanced version of cluster-autoscaler.

This version uses gRPC to communicate with the cloud provider hosted outside the pod. A Docker image is available here: cluster-autoscaler

A sample of the cluster-autoscaler deployment is available at examples/cluster-autoscaler.yaml. You must fill in the values between <>

First, you must create a Kubernetes cluster on vSphere

You can do it from scratch, or you can use the scripts from the autoscaled-masterkube-vmware project to create a Kubernetes cluster with a single control plane or in HA mode with three control planes.
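For reference, a typical bootstrap with that project looks roughly like the sketch below. The repository URL and the location of create-masterkube.sh (the script name also appears in the issues at the end of this document) are assumptions; check the project's own documentation for the exact invocation and options.

git clone https://github.com/Fred78290/autoscaled-masterkube-vmware.git
cd autoscaled-masterkube-vmware
# Create the cluster (single control plane by default); the HA and sizing
# options are project-specific and not shown here
./bin/create-masterkube.sh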

Command-line arguments

Parameter                   Description
version                     Display version and exit
save                        Tell the tool to save its state in this file
config                      Tell the tool which config file to use
log-format                  The format in which log messages are printed (default: text, options: text, json)
log-level                   Set the level of logging (default: info, options: panic, debug, info, warning, error, fatal)
debug                       Debug mode
distribution                Which Kubernetes distribution to use: kubeadm, k3s, rke2, external
use-vanilla-grpc            Use the vanilla autoscaler externalgrpc cloud provider
use-controller-manager      Use the vSphere controller manager
use-external-etcd           Use an external etcd service (overridden by the config file if defined)
src-etcd-ssl-dir            Location of the source etcd SSL files (overridden by the config file if defined)
dst-etcd-ssl-dir            Location of the destination etcd SSL files (overridden by the config file if defined)
kubernetes-pki-srcdir       Location of the source Kubernetes PKI files (overridden by the config file if defined)
kubernetes-pki-dstdir       Location of the destination Kubernetes PKI files (overridden by the config file if defined)
server                      The Kubernetes API server to connect to (default: auto-detect)
kubeconfig                  Retrieve the target cluster configuration from a Kubernetes configuration file (default: auto-detect)
request-timeout             Request timeout when calling Kubernetes APIs; 0s means no timeout
deletion-timeout            Timeout when deleting a node; 0s means no timeout
node-ready-timeout          Timeout to wait for a node to become ready; 0s means no timeout
max-grace-period            Maximum time evicted pods are given to terminate gracefully
min-cpus                    Limits: minimum CPU (default: 1)
max-cpus                    Limits: maximum CPU (default: 24)
min-memory                  Limits: minimum memory in MB (default: 1G)
max-memory                  Limits: maximum memory in MB (default: 24G)
min-managednode-cpus        Managed node: minimum CPU (default: 2)
max-managednode-cpus        Managed node: maximum CPU (default: 32)
min-managednode-memory      Managed node: minimum memory in MB (default: 2G)
max-managednode-memory      Managed node: maximum memory in MB (default: 24G)
min-managednode-disksize    Managed node: minimum disk size in MB (default: 10MB)
max-managednode-disksize    Managed node: maximum disk size in MB (default: 1T)
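As an illustration, a typical invocation combining some of these flags looks like the sketch below. The binary path and file locations mirror those used in the deployment manifests shown later in this document; adapt them to your setup.

/usr/local/bin/vsphere-autoscaler \
    --config=/etc/cluster/kubernetes-vmware-autoscaler.json \
    --save=/var/run/cluster-autoscaler/vmware-autoscaler-state.json \
    --distribution=rke2 \
    --log-level=info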

Build

The build process uses a Makefile. The simplest way to build is make container.
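For example, assuming a standard Go and Docker toolchain (the repository URL is an assumption):

git clone https://github.com/Fred78290/kubernetes-vmware-autoscaler.git
cd kubernetes-vmware-autoscaler
make container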

New features

Use k3s, rke2 or external as the Kubernetes distribution method

Instead of using kubeadm as the Kubernetes distribution method, it is possible to use k3s, rke2 or external.

external allows the use of a custom shell script to join the cluster.

Samples are provided here
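The distribution is selected either with the --distribution command-line flag or with the distribution key of the JSON configuration file, as in this fragment taken from the full configuration shown later:

{
    "distribution": "rke2"
}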

Use the vanilla autoscaler with the external gRPC cloud provider

You can also use the vanilla autoscaler with the externalgrpc cloud provider.

Samples of the cluster-autoscaler deployment with the vanilla autoscaler are provided. You must fill in the values between <>
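As a sketch, the relevant part of the vanilla cluster-autoscaler pod spec looks like the fragment below, adapted from the working deployment shown in the issues at the end of this document; the cloud-config path is an example and must match your mounted secret or configmap.

containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
  command:
  - ./cluster-autoscaler
  - --cloud-provider=externalgrpc
  - --cloud-config=/etc/cluster/cloud-config
  - --scale-down-enabled=true
  volumeMounts:
  - mountPath: /var/run/cluster-autoscaler
    name: cluster-socket
  - mountPath: /etc/cluster
    name: config-cluster-autoscaler
    readOnly: true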

Use an external Kubernetes distribution

When you use a custom method to create your cluster, you must provide a shell script that vmware-autoscaler will call to join the cluster. The script uses a YAML config created by vmware-autoscaler at the given path.

config: /etc/default/vmware-autoscaler-config.yaml

provider-id: vsphere://42373f8d-b72d-21c0-4299-a667a18c9fce
max-pods: 110
node-name: vmware-dev-rke2-worker-01
server: 192.168.1.120:9345
token: K1060b887525bbfa7472036caa8a3c36b550fbf05e6f8e3dbdd970739cbd7373537
disable-cloud-controller: false

If you declare the use of an external etcd service, the file also contains:

datastore-endpoint: https://1.2.3.4:2379
datastore-cafile: /etc/ssl/etcd/ca.pem
datastore-certfile: /etc/ssl/etcd/etcd.pem
datastore-keyfile: /etc/ssl/etcd/etcd-key.pem

You can also provide extra configuration for this mechanism in the external section of the autoscaler config file:

{
  "external": {
    "join-command": "/usr/local/bin/join-cluster.sh"
    "config-path": "/etc/default/vmware-autoscaler-config.yaml"
    "extra-config": {
        "mydata": {
          "extra": "ball"
        },
        "...": "..."
    }
  }
}

Your script is responsible for setting the correct kubelet flags, such as max-pods=110, provider-id=vsphere://42373f8d-b72d-21c0-4299-a667a18c9fce, cloud-provider=external, ...
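A minimal sketch of such a join script for an rke2 agent is shown below. It relies only on the flat keys of the generated YAML config shown above; the script itself and the rke2-specific parts (config path, agent service) are assumptions to adapt to your own distribution.

#!/bin/bash
# Hypothetical join-cluster.sh: reads the config written by vmware-autoscaler
# and joins the node as an rke2 agent.
CONFIG="${1:-/etc/default/vmware-autoscaler-config.yaml}"

# The generated config is a flat "key: value" YAML, so awk is enough to read it.
value() { awk -v k="$1:" '$1 == k { print $2 }' "$CONFIG"; }

SERVER="$(value server)"
TOKEN="$(value token)"
NODE_NAME="$(value node-name)"
PROVIDER_ID="$(value provider-id)"
MAX_PODS="$(value max-pods)"

mkdir -p /etc/rancher/rke2

cat > /etc/rancher/rke2/config.yaml <<EOF
server: https://${SERVER}
token: ${TOKEN}
node-name: ${NODE_NAME}
kubelet-arg:
  - max-pods=${MAX_PODS}
  - provider-id=${PROVIDER_ID}
  - cloud-provider=external
EOF

systemctl enable --now rke2-agent.service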

Annotation requirements

If you expect to use vmware-autoscaler on an already deployed Kubernetes cluster, you must add some annotations to the existing nodes.

Also, don't forget to create an image usable by vmware-autoscaler to scale up the cluster: create-image.sh

  • cluster-autoscaler.kubernetes.io/scale-down-disabled: avoid scale down for this node (e.g. true)
  • cluster.autoscaler.nodegroup/name: the node group name (e.g. vmware-dev-rke2)
  • cluster.autoscaler.nodegroup/autoprovision: tells whether the node was provisioned by vmware-autoscaler (e.g. false)
  • cluster.autoscaler.nodegroup/instance-id: the VM UUID (e.g. 42373f8d-b72d-21c0-4299-a667a18c9fce)
  • cluster.autoscaler.nodegroup/managed: tells whether the node is managed by vmware-autoscaler rather than autoscaled (e.g. false)
  • cluster.autoscaler.nodegroup/node-index: the node index, set automatically if missing (e.g. 0)

Sample master node

    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
    cluster.autoscaler.nodegroup/autoprovision: "false"
    cluster.autoscaler.nodegroup/instance-id: 42373f8d-b72d-21c0-4299-a667a18c9fce
    cluster.autoscaler.nodegroup/managed: "false" 
    cluster.autoscaler.nodegroup/name: vmware-dev-rke2
    cluster.autoscaler.nodegroup/node-index: "0"

Sample first worker node

    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
    cluster.autoscaler.nodegroup/autoprovision: "false"
    cluster.autoscaler.nodegroup/instance-id: 42370879-d4f7-eab0-a1c2-918a97ac6856
    cluster.autoscaler.nodegroup/managed: "false"
    cluster.autoscaler.nodegroup/name: vmware-dev-rke2
    cluster.autoscaler.nodegroup/node-index: "1"

Sample autoscaled worker node

    cluster-autoscaler.kubernetes.io/scale-down-disabled: "false"
    cluster.autoscaler.nodegroup/autoprovision: "true"
    cluster.autoscaler.nodegroup/instance-id: 3d25c629-3f1d-46b3-be9f-b95db2a64859
    cluster.autoscaler.nodegroup/managed: "false"
    cluster.autoscaler.nodegroup/name: vmware-dev-rke2
    cluster.autoscaler.nodegroup/node-index: "2"
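On an existing node, these annotations can be applied with kubectl, for example (the node name, node group, index and instance-id below are taken from the samples above and must be adapted to your cluster):

kubectl annotate node vmware-dev-rke2-worker-01 \
    "cluster-autoscaler.kubernetes.io/scale-down-disabled=true" \
    "cluster.autoscaler.nodegroup/name=vmware-dev-rke2" \
    "cluster.autoscaler.nodegroup/autoprovision=false" \
    "cluster.autoscaler.nodegroup/managed=false" \
    "cluster.autoscaler.nodegroup/node-index=1" \
    "cluster.autoscaler.nodegroup/instance-id=42370879-d4f7-eab0-a1c2-918a97ac6856"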

Node labels

These labels will be added:

  • node-role.kubernetes.io/control-plane: set to true if the node is a control-plane node
  • node-role.kubernetes.io/master: set to true if the node is a master node
  • node-role.kubernetes.io/worker: set to true if the node is a worker node

Network

It is now possible to disable the default routes provided by DHCP and to declare custom routes.

VMWare CPI compliant

Versions 1.24.6, 1.25.2 and above are vSphere cloud provider compliant, building a provider-id that conforms to the syntax vsphere://<VM UUID>

CRD controller

This release includes a CRD controller allowing you to create Kubernetes nodes without using govc or writing code. Just by applying a configuration file, you can create nodes on the fly.

As an example, you can take a look at artifacts/examples/example.yaml and execute the following command to create a new node:

kubectl apply -f artifacts/examples/example.yaml

If you want to delete the node, just delete the CRD:

kubectl delete -f artifacts/examples/example.yaml

You also have the ability to create a control plane node instead of a worker:

kubectl apply -f artifacts/examples/controlplane.yaml

The resource is cluster scoped, so you don't need a namespace. The name of the resource is not the name of the managed node.

The minimal resource declaration:

apiVersion: "nodemanager.aldunelabs.com/v1alpha1"
kind: "ManagedNode"
metadata:
  name: "vmware-ca-k8s-managed-01"
spec:
  nodegroup: vmware-ca-k8s
  vcpus: 2
  memorySizeInMb: 2048
  diskSizeInMb: 10240

The fully qualified resource, including a networks declaration to override the default controller network management and adding some node labels & annotations. If you declare the managed node as a control plane, you can also allow it to accept deployments like a worker node:

apiVersion: "nodemanager.aldunelabs.com/v1alpha1"
kind: "ManagedNode"
metadata:
  name: "vmware-ca-k8s-managed-01"
spec:
  nodegroup: vmware-ca-k8s
  controlPlane: false
  allowDeployment: false
  vcpus: 2
  memorySizeInMb: 2048
  diskSizeInMb: 10240
  labels:
  - demo-label.acme.com=demo
  - sample-label.acme.com=sample
  annotations:
  - demo-annotation.acme.com=demo
  - sample-annotation.acme.com=sample
  networks:
    -
      network: "VM Network"
      address: 10.0.0.80
      netmask: 255.255.255.0
      gateway: 10.0.0.1
      use-dhcp-routes: false
      routes:
        - to: x.x.0.0/16
          via: 10.0.0.253
          metric: 100
        - to: y.y.y.y/8
          via: 10.0.0.253
          metric: 500
    -
      network: "VM Private"
      address: 192.168.1.80
      netmask: 255.255.255.0
      use-dhcp-routes: false

Declare additional routes and disable default DHCP routes

Release 1.24 and above allows you to add additional routes per interface; it also allows disabling the default routes declared by the DHCP server.

Here is an example configuration generated by the autoscaled-masterkube-vmware scripts:

{
    "use-external-etcd": false,
    "src-etcd-ssl-dir": "/etc/etcd/ssl",
    "dst-etcd-ssl-dir": "/etc/kubernetes/pki/etcd",
    "kubernetes-pki-srcdir": "/etc/kubernetes/pki",
    "kubernetes-pki-dstdir": "/etc/kubernetes/pki",
    "distribution": "rke2",
    "network": "unix",
    "listen": "/var/run/cluster-autoscaler/vmware.sock",
    "cert-private-key": "/etc/ssl/client-cert/tls.key",
    "cert-public-key": "/etc/ssl/client-cert/tls.crt",
    "cert-ca": "/etc/ssl/client-cert/ca.crt",
    "secret": "vmware",
    "minNode": 0,
    "maxNode": 9,
    "maxNode-per-cycle": 2,
    "node-name-prefix": "autoscaled",
    "managed-name-prefix": "managed",
    "controlplane-name-prefix": "master",
    "nodePrice": 0,
    "podPrice": 0,
    "image": "jammy-kubernetes-cni-flannel-v1.27.8-containerd-amd64",
    "optionals": {
        "pricing": false,
        "getAvailableMachineTypes": false,
        "newNodeGroup": false,
        "templateNodeInfo": false,
        "createNodeGroup": false,
        "deleteNodeGroup": false,
    },
    "kubeadm": {
        "address": "192.168.1.120:6443",
        "token": "h1g55p.hm4rg52ymloax182",
        "ca": "sha256:c7a86a7a9a03a628b59207f4f3b3e038ebd03260f3ad5ba28f364d513b01f542",
        "extras-args": [
            "--ignore-preflight-errors=All"
        ]
    },
    "k3s": {
        "address": "192.168.1.120:6443",
        "token": "h1g55p.hm4rg52ymloax182",
        "datastore-endpoint": "https://1.2.3.4:2379",
        "extras-commands": []
    },
    "external": {
        "address": "192.168.1.120:6443",
        "token": "h1g55p.hm4rg52ymloax182",
        "datastore-endpoint": "https://1.2.3.4:2379",
        "join-command": "/usr/local/bin/join-cluster.sh",
        "config-path": "/etc/default/vmware-autoscaler-config.yaml",
        "extra-config": {
            "...": "..."
        }
    },
    "default-machine": "large",
    "machines": {
        "tiny": {
            "memsize": 2048,
            "vcpus": 2,
            "disksize": 10240
        },
        "small": {
            "memsize": 4096,
            "vcpus": 2,
            "disksize": 20480
        },
        "medium": {
            "memsize": 4096,
            "vcpus": 4,
            "disksize": 20480
        },
        "large": {
            "memsize": 8192,
            "vcpus": 4,
            "disksize": 51200
        },
        "xlarge": {
            "memsize": 16384,
            "vcpus": 4,
            "disksize": 102400
        },
        "2xlarge": {
            "memsize": 16384,
            "vcpus": 8,
            "disksize": 102400
        },
        "4xlarge": {
            "memsize": 32768,
            "vcpus": 8,
            "disksize": 102400
        }
    },
    "node-labels": [
        "topology.kubernetes.io/region=home",
        "topology.kubernetes.io/zone=office",
        "topology.csi.vmware.com/k8s-region=home",
        "topology.csi.vmware.com/k8s-zone=office",
    ],
    "cloud-init": {
        "package_update": false,
        "package_upgrade": false,
        "runcmd": [
            "echo 1 > /sys/block/sda/device/rescan",
            "growpart /dev/sda 1",
            "resize2fs /dev/sda1",
            "echo '192.168.1.120 vmware-ca-k8s-masterkube vmware-ca-k8s-masterkube.acme.com' >> /etc/hosts",
        ],
    },
    "ssh-infos": {
        "user": "kubernetes",
        "ssh-private-key": "/root/.ssh/id_rsa"
    },
    "autoscaling-options": {
        "scaleDownUtilizationThreshold": 0.5,
        "scaleDownGpuUtilizationThreshold": 0.5,
        "scaleDownUnneededTime": "1m",
        "scaleDownUnreadyTime": "1m",
    },
    "vmware": {
        "vmware-ca-k8s": {
            "url": "https://[email protected]:[email protected]/sdk",
            "uid": "[email protected]",
            "password": "mySecret",
            "insecure": true,
            "dc": "DC01",
            "datastore": "datastore1",
            "resource-pool": "ACME/Resources/FR",
            "vmFolder": "HOME",
            "timeout": 300,
            "template-name": "jammy-kubernetes-cni-flannel-v1.26.0-containerd-amd64",
            "template": false,
            "linked": false,
            "customization": "",
            "network": {
                "domain": "acme.com",
                "dns": {
                    "search": [
                        "acme.com"
                    ],
                    "nameserver": [
                        "10.0.0.1"
                    ]
                },
                "interfaces": [
                    {
                        "primary": false,
                        "exists": true,
                        "network": "VM Network",
                        "adapter": "vmxnet3",
                        "mac-address": "generate",
                        "nic": "eth0",
                        "dhcp": true,
                        "use-dhcp-routes": true,
                        "routes": [
                            {
                                "to": "172.30.0.0/16",
                                "via": "10.0.0.5",
                                "metric": 500,
                            },
                        ],
                    },
                    {
                        "primary": true,
                        "exists": true,
                        "network": "VM Private",
                        "adapter": "vmxnet3",
                        "mac-address": "generate",
                        "nic": "eth1",
                        "dhcp": true,
                        "use-dhcp-routes": false,
                        "address": "192.168.1.124",
                        "gateway": "10.0.0.1",
                        "netmask": "255.255.255.0",
                        "routes": []
                    }
                ]
            }
        }
    }
}

Unmaintained releases

All releases before 1.26.11 are no longer maintained.

kubernetes-vmware-autoscaler's People

Contributors

fred78290


kubernetes-vmware-autoscaler's Issues

Unable to launch VM - unable to reconfigure VM reason: The operation is not supported on the object

Hello,
I have a problem with the autoscaler failing to finish the VM creation of the new node. I can clearly see in my vCenter console the VM being created, but at almost 99% the cloning is aborted with the error:
time="2022-08-22T16:32:37Z" level=error msg="Unable to launch VM:evak8sautoscale.local-ca-01 for nodegroup: evak8sautoscale.local. Reason: unable to launch the VM owned by node: evak8sautoscale.local-ca-01, reason: unable to reconfigure VM:evak8sautoscale.local-ca-01, reason: The operation is not supported on the object." time="2022-08-22T16:32:37Z" level=error msg="unable to launch the VM owned by node: evak8sautoscale.local-ca-01, reason: unable to launch the VM owned by node: evak8sautoscale.local-ca-01, reason: unable to reconfigure VM:evak8sautoscale.local-ca-01, reason: The operation is not supported on the object."
and afterwards the VM is deleted.
My deployment file:
spec: replicas: 1 selector: matchLabels: app: cluster-autoscaler k8s-app: cluster-autoscaler template: metadata: creationTimestamp: null labels: app: cluster-autoscaler k8s-app: cluster-autoscaler spec: volumes: - name: cluster-socket emptyDir: {} - name: config-cluster-autoscaler configMap: name: config-cluster-autoscaler defaultMode: 420 - name: ssl-certs hostPath: path: /etc/ssl/certs/ca-certificates.crt type: '' - name: autoscaler-ssh-keys secret: secretName: autoscaler-ssh-keys defaultMode: 384 - name: etcd-ssl secret: secretName: etcd-ssl defaultMode: 384 - name: kubernetes-pki configMap: name: kubernetes-pki defaultMode: 420 initContainers: - name: cluster-autoscaler-init image: busybox command: - /bin/sh - '-c' - rm -f /var/run/cluster-autoscaler/vmware.sock resources: {} volumeMounts: - name: cluster-socket mountPath: /var/run/cluster-autoscaler terminationMessagePath: /dev/termination-log terminationMessagePolicy: File imagePullPolicy: Always containers: - name: vsphere-autoscaler image: fred78290/vsphere-autoscaler:v1.23.8 command: - /usr/local/bin/vsphere-autoscaler - '--no-use-external-etcd' - '--src-etcd-ssl-dir=/etc/etcd/ssl' - '--dst-etcd-ssl-dir=/etc/etcd/ssl' - '--config=/etc/cluster/kubernetes-vmware-autoscaler.json' - '--save=/var/run/cluster-autoscaler/vmware-autoscaler-state.json' - '--log-level=debug' resources: limits: cpu: 100m memory: 300Mi requests: cpu: 100m memory: 300Mi volumeMounts: - name: cluster-socket mountPath: /var/run/cluster-autoscaler - name: config-cluster-autoscaler mountPath: /etc/cluster - name: autoscaler-ssh-keys mountPath: /root/.ssh - name: etcd-ssl mountPath: /etc/etcd/ssl - name: kubernetes-pki mountPath: /etc/kubernetes/pki terminationMessagePath: /dev/termination-log terminationMessagePolicy: File imagePullPolicy: Always - name: cluster-autoscaler image: fred78290/cluster-autoscaler:v1.23.8 command: - ./cluster-autoscaler - '--logtostderr=true' - '--stderrthreshold=info' - '--v=5' - '--cloud-provider=grpc' - '--cloud-config=/etc/cluster/grpc-config.json' - '--nodes=0:9:true/evak8sautoscale.local' - '--max-nodes-total=9' - '--cores-total=0:100' - '--memory-total=0:256' - '--node-autoprovisioning-enabled' - '--max-autoprovisioned-node-group-count=1' - '--scale-down-enabled=true' - '--scale-down-delay-after-add=1m' - '--scale-down-delay-after-delete=1m' - '--scale-down-delay-after-failure=1m' - '--scale-down-unneeded-time=1m' - '--scale-down-unready-time=1m' - '--unremovable-node-recheck-timeout=1m' resources: limits: cpu: 100m memory: 300Mi requests: cpu: 100m memory: 300Mi volumeMounts: - name: cluster-socket mountPath: /var/run/cluster-autoscaler - name: ssl-certs readOnly: true mountPath: /etc/ssl/certs/ca-certificates.crt - name: config-cluster-autoscaler readOnly: true mountPath: /etc/cluster terminationMessagePath: /dev/termination-log terminationMessagePolicy: File imagePullPolicy: Always restartPolicy: Always terminationGracePeriodSeconds: 30 dnsPolicy: ClusterFirst nodeSelector: master: 'true' serviceAccountName: cluster-autoscaler serviceAccount: cluster-autoscaler securityContext: {} schedulerName: default-scheduler tolerations: - key: node-role.kubernetes.io/master effect: NoSchedule - key: node-role.kubernetes.io/control-plane effect: NoSchedule strategy: type: RollingUpdate rollingUpdate: maxUnavailable: 25% maxSurge: 25% revisionHistoryLimit: 10 progressDeadlineSeconds: 600 status: observedGeneration: 24 replicas: 1 updatedReplicas: 1 readyReplicas: 1 availableReplicas: 1 conditions: - type: Progressing status: 'True' 
lastUpdateTime: '2022-08-22T16:09:02Z' lastTransitionTime: '2022-08-18T21:55:17Z' reason: NewReplicaSetAvailable message: ReplicaSet "cluster-autoscaler-68f68b5c4f" has successfully progressed. - type: Available status: 'True' lastUpdateTime: '2022-08-22T16:28:35Z' lastTransitionTime: '2022-08-22T16:28:35Z' reason: MinimumReplicasAvailable message: Deployment has minimum availability.
My configmap:
data: grpc-config.json: | { "address": "unix:/var/run/cluster-autoscaler/vmware.sock", "secret": "vmware", "timeout": 300, "config": { "kubeAdmAddress": "x.x.x.x:6443", "kubeAdmToken": "xxxxxxxxxxxxxxxxxxx", "kubeAdmCACert": "sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "kubeAdmExtraArguments": [ "--ignore-preflight-errors=All" ] } } kubernetes-vmware-autoscaler.json: | { "use-external-etcd": false, "src-etcd-ssl-dir": "/etc/etcd/ssl", "dst-etcd-ssl-dir": "/etc/kubernetes/pki/etcd", "kubernetes-pki-srcdir": "/etc/kubernetes/pki", "kubernetes-pki-dstdir": "/etc/kubernetes/pki", "network": "unix", "listen": "/var/run/cluster-autoscaler/vmware.sock", "secret": "vmware", "minNode": 6, "maxNode": 10, "maxNode-per-cycle": 1, "node-name-prefix": "ca", "managed-name-prefix": "managed", "controlplane-name-prefix": "master", "nodePrice": 0, "podPrice": 0, "image": "focal-kubernetes-cni-calico-v1.23.3-containerd-amd64", "optionals": { "pricing": false, "getAvailableMachineTypes": true, "newNodeGroup": false, "templateNodeInfo": true, "createNodeGroup": true, "deleteNodeGroup": false }, "kubeadm": { "address": "x.x.x.x:6443", "token": "xxxxxxxxxxxxxxxxxxxxxxxxxx", "ca": "sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "extras-args": [ "--ignore-preflight-errors=All" ] }, "default-machine": "medium", "machines": { "tiny": { "memsize": 1024, "vcpus": 1, "disksize": 10240 }, "small": { "memsize": 2048, "vcpus": 2, "disksize": 10240 }, "medium": { "memsize": 4096, "vcpus": 2, "disksize": 20480 }, "large": { "memsize": 8192, "vcpus": 4, "disksize": 51200 }, "extra-large": { "memsize": 16384, "vcpus": 4, "disksize": 102400 } }, "cloud-init": { "package_update": false, "package_upgrade": false, "runcmd": [ "echo 1 > /sys/block/sda/device/rescan", "growpart /dev/sda 1", "resize2fs /dev/sda1", "echo 'x.x.x.x k8scacp1 k8scacp1.k8s.domain \n z.z.z.z k8scaw1 k8scaw1.k8s.domain \n e.e.e.e k8scaetcd1 k8scaetcd1.k8s.domain' >> /etc/hosts" ] }, "ssh-infos": { "user": "root", "ssh-private-key": "/root/.ssh/id_rsa" }, "vmware": { "evak8sautoscale.local": { "url": "https://[email protected]:[email protected]", "uid": "[email protected]", "password": "password", "insecure": true, "dc": "dc1.xxx.fr", "datastore": "vsanDatastore01", "resource-pool": "/dc1/host/cl1/Resources", "vmFolder": "/dc/vm/Infrastructure/K8S/k8s-autoscale", "timeout": 300, "template-name": "focal-kubernetes-cni-calico-v1.23.3-containerd-amd64", "template": true, "linked": false, "customization": "", "network": { "dns": { "search": [ "xxx.fr" ], "nameserver": [ "x.x.x.x" ] }, "interfaces": [ { "primary": true, "exists": true, "network": "net199", "adapter": "vmxnet3", "mac-address": "generate", "nic": "ens160", "dhcp": false, "address": "x.x.x.x", "gateway": "y.y.y.y", "netmask": "255.255.255.0" } ] } } } }
Cordially.

Unable to launch VM.. vcenter context deadline exceeded

Hello,

I am having issues with this version of the autoscaler. The autoscaled VM is cloned fine in vCenter; however, the autoscaler app appears to lose track of it and fails with the following errors:

time="2021-11-22T19:49:20Z" level=info msg="Launch VM:vmware-ca-k8s-autoscaled-01 for nodegroup: vmware-ca-k8s" time="2021-11-22T19:49:20Z" level=info msg="Launch VM:vmware-ca-k8s-autoscaled-02 for nodegroup: vmware-ca-k8s" time="2021-11-22T19:56:04Z" level=error msg="Unable to launch VM:vmware-ca-k8s-autoscaled-02 for nodegroup: vmware-ca-k8s. Reason: could not start VM: vmware-ca-k8s-autoscaled-02, reason: Post \"https://vcenter/sdk\": context deadline exceeded" time="2021-11-22T19:56:04Z" level=error msg="unable to launch the VM owned by node: vmware-ca-k8s-autoscaled-02, reason: could not start VM: vmware-ca-k8s-autoscaled-02, reason: Post \"https://vcenter/sdk\": context deadline exceeded" time="2021-11-22T19:56:04Z" level=error msg="Unable to launch VM:vmware-ca-k8s-autoscaled-01 for nodegroup: vmware-ca-k8s. Reason: could not start VM: vmware-ca-k8s-autoscaled-01, reason: Post \"https://vcenter/sdk\": context deadline exceeded"

autoscale pod can't connect to Vmware Vcenter

Hello,
I have a problem with the vsphere-autoscaler container, which cannot contact the VMware vCenter, with this error in its logs:
time="2022-02-09T11:17:31Z" level=error msg="can't get the info for VM: xxxxxx, reason: Post "https:////username:[email protected]": http: no Host in request URL"

I think it's due to //// but I can't find them in the configmap.

Can you please help me.

Cordially.

Is there a way to specify default host?

here is the error:

2021-08-17 19:49:10.030262 I | Cloud provider:vmware:vmware-ca-k8s call NodeGroup::IncreaseSize got error: code:"cloudProviderError" reason:"could not start VM: vmware-ca-k8s-vm-02, reason: default host resolves to multiple instances, please specify"
W0817 19:49:10.030399 1 clusterstate.go:277] Disabling scale-up for node group vmware-ca-k8s until 2021-08-17 19:58:39.770504247 +0000 UTC m=+1065.506283222; errorClass=Other; errorCode=cloudProviderError
E0817 19:49:10.030504 1 static_autoscaler.go:427] Failed to scale up: failed to increase node group size: could not start VM: vmware-ca-k8s-vm-02, reason: default host resolves to multiple instances, please specifycould not start VM: vmware-ca-k8s-vm-02, reason: default host resolves to multiple instances, please specify

Setting "GOVC_HOST" option in create-masterkube.sh does not help

What value should be set in sample?

Hello.

I want to deploy the sample, but I don't understand some of the configmap values between <>.
They are as follows.

    address: <YOUR_GRPC_SERVER_FQCN_OR_IP>
    port: <YOUR_GRPC_SERVER_PORT>
    identifier: <SECRET/IDENTIFIER SHARED BETWEEN CLIENT_SERVER>

I guess the address is the vCenter IP address, but what should the port and identifier be?

And the following is my entire configmap. Please help me check whether the other values are correct.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloud-config
data:
  grpc.conf: |-
    address: <YOUR_GRPC_SERVER_FQCN_OR_IP>
    port: <YOUR_GRPC_SERVER_PORT>
    identifier: <SECRET/IDENTIFIER SHARED BETWEEN CLIENT_SERVER>
    timeout: 60
    config:
        kubeAdmAddress: 10.60.200.170
        kubeAdmToken: 9w6o68.jbdndje84e6xc40s
        kubeAdmCACert: "~~~~"
        kubeAdmExtraArguments:
            - --ignore-preflight-errors=All

Autoscale without DHCP server

Hello,

I successfully deployed the autoscaler in my existing K8S cluster. My question is whether I can use the autoscaler to create a new node without a DHCP server in my network, and if it is possible, how?

Cordially.

HELP: Scaling doesn't work

Hey,

we enabled the debugging option and see the following output:

Call server TargetSize: id:"vmware-ca-k8s"

cluster-autoscaler tells us the following:

I1207 11:23:17.741823 1 orchestrator.go:546] Pod enbitcon/enbitcon-shopware6-64cfd4bbc8-45m4v can't be scheduled on vmware-ca-k8s, predicate checking error: Too many pods, Insufficient cpu, Insufficient memory; predicateName=NodeResourcesFit; reasons: Too many pods, Insufficient cpu, Insufficient memory; debugInfo=

Manifests:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "17"
    meta.helm.sh/release-name: cluster-autoscaler
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2023-12-06T13:45:35Z"
  generation: 17
  labels:
    app.kubernetes.io/instance: cluster-autoscaler
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: vsphere-autoscaler
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: vsphere-autoscaler-0.1.0
  name: cluster-autoscaler-vsphere-autoscaler
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: cluster-autoscaler
      app.kubernetes.io/name: vsphere-autoscaler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: cluster-autoscaler
        app.kubernetes.io/name: vsphere-autoscaler
    spec:
      containers:
      - command:
        - /usr/local/bin/vsphere-autoscaler
        - --no-use-external-etcd
        - --src-etcd-ssl-dir=/etc/etcd/ssl
        - --dst-etcd-ssl-dir=/etc/etcd/ssl
        - --config=/etc/cluster/kubernetes-vmware-autoscaler.json
        - --save=/var/run/cluster-autoscaler/vmware-autoscaler-state.json
        - --log-level=debug
        image: fred78290/vsphere-autoscaler:v1.27.1
        imagePullPolicy: IfNotPresent
        name: vsphere-autoscaler
        resources: {}
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/cluster-autoscaler
          name: cluster-socket
        - mountPath: /etc/cluster
          name: config-cluster-autoscaler
        - mountPath: /etc/ssh
          name: autoscaler-ssh-keys
        - mountPath: /etc/etcd/ssl
          name: etcd-ssl
        - mountPath: /etc/kubernetes/pki
          name: kubernetes-pki
      - command:
        - ./cluster-autoscaler
        - --v=3
        - --stderrthreshold=info
        - --cloud-provider=externalgrpc
        - --cloud-config=/etc/cluster/cloud-config
        - --max-nodes-total=100
        - --node-autoprovisioning-enabled
        - --max-autoprovisioned-node-group-count=1
        - --scale-down-enabled=true
        - --scale-down-delay-after-add=1m
        - --scale-down-delay-after-delete=1m
        - --scale-down-delay-after-failure=1m
        - --scale-down-unneeded-time=1m
        - --scale-down-unready-time=1m
        - --unremovable-node-recheck-timeout=1m
        image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
        imagePullPolicy: IfNotPresent
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/cluster-autoscaler
          name: cluster-socket
        - mountPath: /etc/ssl/certs/ca-certificates.crt
          name: ssl-certs
          readOnly: true
        - mountPath: /etc/cluster
          name: config-cluster-autoscaler
          readOnly: true
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - /bin/sh
        - -c
        - rm -f /var/run/cluster-autoscaler/vmware.sock
        image: busybox
        imagePullPolicy: Always
        name: cluster-autoscaler-init
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/cluster-autoscaler
          name: cluster-socket
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: cluster-autoscaler
      serviceAccountName: cluster-autoscaler
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: cluster-socket
      - name: config-cluster-autoscaler
        secret:
          defaultMode: 420
          secretName: config-cluster-autoscaler
      - hostPath:
          path: /etc/ssl/certs/ca-certificates.crt
          type: ""
        name: ssl-certs
      - name: autoscaler-ssh-keys
        secret:
          defaultMode: 420
          secretName: autoscaler-ssh-keys
      - name: etcd-ssl
        secret:
          defaultMode: 384
          secretName: etcd-ssl
      - configMap:
          defaultMode: 420
          name: kubernetes-pki
        name: kubernetes-pki
---
{
  "use-external-etcd": false,
  "src-etcd-ssl-dir": "/etc/etcd/ssl",
  "dst-etcd-ssl-dir": "/etc/kubernetes/pki/etcd",
  "kubernetes-pki-srcdir": "/etc/kubernetes/pki",
  "kubernetes-pki-dstdir": "/etc/kubernetes/pki",
  "use-vanilla-grpc": true,
  "use-controller-manager": true,
  "network": "unix",
  "listen": "/var/run/cluster-autoscaler/vmware.sock",
  "secret": "vmware",
  "minNode": 0,
  "maxNode": 100,
  "maxNode-per-cycle": 2,
  "node-name-prefix": "autoscaled",
  "managed-name-prefix": "enbitkubwork",
  "controlplane-name-prefix": "enbitkub0",
  "nodePrice": 0,
  "podPrice": 0,
  "image": "acmekubworker",
  "optionals": {
    "pricing": false,
    "getAvailableMachineTypes": false,
    "newNodeGroup": false,
    "templateNodeInfo": false,
    "createNodeGroup": false,
    "deleteNodeGroup": false
  },
  "kubeadm": {
    "address": "10.30.2.16:6443",
    "token": "2e4yeNPLhh....",
    "ca": "...",
    "extras-args": [
      "--ignore-preflight-errors=All"
    ]
  },
  "default-machine": "large",
  "machines": {
    "tiny": {
      "memsize": 2048,
      "vcpus": 2,
      "disksize": 10240
    },
    "small": {
      "memsize": 4096,
      "vcpus": 2,
      "disksize": 20480
    },
    "medium": {
      "memsize": 4096,
      "vcpus": 4,
      "disksize": 20480
    },
    "large": {
      "memsize": 8192,
      "vcpus": 4,
      "disksize": 51200
    },
    "xlarge": {
      "memsize": 16384,
      "vcpus": 4,
      "disksize": 102400
    },
    "2xlarge": {
      "memsize": 16384,
      "vcpus": 8,
      "disksize": 102400
    },
    "4xlarge": {
      "memsize": 32768,
      "vcpus": 8,
      "disksize": 102400
    }
  },
  "node-labels": [
    "topology.kubernetes.io/region=k8s-region",
    "topology.kubernetes.io/zone=k8s-zone",
    "topology.csi.vmware.com/k8s-region=k8s-region",
    "topology.csi.vmware.com/k8s-zone=k8s-zone"
  ],
  "cloud-init": {
    "package_update": false,
    "package_upgrade": false,
    "runcmd": [
      "/home/acmekub/nodesetup.sh"
    ]
  },
  "ssh-infos": {
    "user": "root",
    "ssh-private-key": "/etc/ssh/id_rsa"
  },
  "autoscaling-options": {
    "scaleDownUtilizationThreshold": 0.5,
    "scaleDownGpuUtilizationThreshold": 0.5
  },
  "vmware": {
    "vmware-ca-k8s": {
      "url": "https://administrator:[email protected]/sdk",
      "uid": "[email protected]",
      "password": "redacted",
      "insecure": true,
      "dc": "acme RZ",
      "datastore": "[NVME]",
      "resource-pool": "acme/Resources",
      "vmFolder": "HOME",
      "timeout": 300,
      "template-name": "acmekubworker",
      "template": true,
      "linked": true,
      "customization": "",
      "network": {
        "domain": "acme.com",
        "dns": {
          "search": [
            "acme.com"
          ],
          "nameserver": [
            ""
          ]
        },
        "interfaces": [
          {
            "primary": true,
            "exists": true,
            "network": "DMZ",
            "adapter": "vmxnet3",
            "mac-address": "generate",
            "nic": "eth0",
            "dhcp": true,
            "use-dhcp-routes": true,
            "routes": []
          }
        ]
      }
    }
  }
}

Thanks a lot.

ERROR: the object has been modified; please apply your changes to the latest version and try again

Hello,

Once in a while I get a "the object has been modified; please apply your changes to the latest version and try again" error when autoscaling. During this time the VM is created, but something fails (when adding it to the k8s cluster?) and the VM gets deleted.

time="2021-09-14T15:49:07Z" level=info msg="Launch VM:dev-k8s-as-vm-16 for nodegroup: dev-k8s-as"
time="2021-09-14T15:51:02Z" level=info msg="Wait kubernetes node dev-k8s-as-vm-16 to be ready"
time="2021-09-14T15:51:12Z" level=info msg="The kubernetes node dev-k8s-as-vm-16 is Ready"
time="2021-09-14T15:51:12Z" level=error msg="Unable to launch VM:dev-k8s-as-vm-16 for nodegroup: dev-k8s-as. Reason: set labels on node: dev-k8s-as-vm-16 got error: Operation cannot be fulfilled on nodes \"dev-k8s-as-vm-16\": the object has been modified; please apply your changes to the latest version and try again"
time="2021-09-14T15:51:12Z" level=error msg="unable to launch the VM owned by node: dev-k8s-as-vm-16, reason: set labels on node: dev-k8s-as-vm-16 got error: Operation cannot be fulfilled on nodes \"dev-k8s-as-vm-16\": the object has been modified; please apply your changes to the latest version and try again"
time="2021-09-14T15:51:15Z" level=info msg="Deleted VM:dev-k8s-as-vm-16"

Any ideas what the issue is?

Docker image doesn't contain ssh client tools

The produced Docker image, based on ubuntu:focal, doesn't include the ssh and scp clients needed by vsphere-autoscaler to prepare autoscaled nodes.

In fact, the ssh client tools have been dropped from recent official ubuntu:focal images.
