vexxhost / magnum-cluster-api

Cluster API driver for OpenStack Magnum

License: Apache License 2.0

magnum-cluster-api's Issues

Better handling for default volume

The default volume is not properly selected, we should either rely on Cinder to select the default volume if none is specified or fail early.

Bubble errors up to Magnum API

At the moment, errors in the Cluster API do not bubble up into the Magnum API, for example:

  Warning  Failedcreatenetwork  11m (x19 over 34m)  openstack-controller  Failed to create network k8s-clusterapi-cluster-magnum-system-cluster-2-ikhba5qojt: Expected HTTP response code [201 202] when accessing [POST https://network.openstack.cloud.local/v2.0/networks], but got 409 instead
{"NeutronError": {"type": "OverQuota", "message": "Quota exceeded for resources: ['network'].", "detail": ""}}

We need to bubble those errors up to the Magnum API, or at least the quota check failures.
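
A minimal sketch of one possible approach, assuming the driver's existing pykube client and the magnum-system namespace used elsewhere on this page; the helper name and the event filtering are illustrative only:

import pykube

def collect_warning_events(k8s_api, capi_cluster_name):
    """Summarize Warning events related to a CAPI cluster (illustrative).

    The summary could then be written into the Magnum cluster's status_reason.
    """
    events = pykube.Event.objects(k8s_api, namespace="magnum-system").filter(
        field_selector={"type": "Warning"}
    )
    messages = [
        event.obj["message"]
        for event in events
        if capi_cluster_name in event.obj.get("involvedObject", {}).get("name", "")
    ]
    # Keep only the most recent few so the result fits in status_reason.
    return "; ".join(messages[-3:])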

Add support for `boot_volume_{size,type}`

There are a few labels that are used inside Magnum when doing boot from volume:

  • boot_volume_size
  • boot_volume_type

There is also a corresponding set of functions that allows you to get the volume type:

https://github.com/openstack/magnum/blob/16bdedcf2fe6986c995bd415f4e3c70dac914ada/magnum/common/cinder.py#L25-L37

and some defaults:

It looks like for now, we'll be able to support boot from volume here:

https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/docs/book/src/clusteropenstack/configuration.md#boot-from-volume

so if boot_volume_size and boot_volume_type are set (or their defaults are), then we point to that.
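
For reference, a rough sketch of how those labels could map onto the CAPO boot-from-volume stanza; the rootVolume field names follow the linked CAPO documentation and should be checked against the CAPO API version in use:

def generate_root_volume(labels):
    """Illustrative: build the rootVolume dict for an OpenStackMachineTemplate."""
    boot_volume_size = int(labels.get("boot_volume_size", 0))
    boot_volume_type = labels.get("boot_volume_type", "")
    if boot_volume_size <= 0:
        # No boot-from-volume requested; boot from the image instead.
        return {}
    root_volume = {"diskSize": boot_volume_size}
    if boot_volume_type:
        root_volume["volumeType"] = boot_volume_type
    return root_volume

For example, {"diskSize": 50, "volumeType": "ssd"} would end up under spec.template.spec.rootVolume of the OpenStackMachineTemplate.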

Integrate with Kolla Ansible

Hello.
I want to test this project with a multi-node deployment. Can we have some guidelines for it?
Thanks

Topology managed health checks

The health checks are currently built by creating MHC resources manually; this should be changed to use the topology so they can be managed by it.

Integration with Keystone auth

When following the instructions in #56 my cluster create fails with an error resembling the following:

{
  "status": "CREATE_FAILED",
  .
  .
  "status_reason": "(https://10.10.4.4:35357/v3/users/590bc654b1014a2fa568631ba399524a/application_credentials): The resource could not be found. (HTTP 404) (Request-ID: req-6204ca58-87f5-4868-bc2c-ba4159399fba)",
  .
  .
}

My question: is there a way around using application credentials to authenticate? We are using a somewhat older version of Keystone (Pike) and are probably not going to be able to upgrade very soon.

Management cluster is not upgraded

When trying to bump mcapi, it was supposed to upgrade the management cluster rather than initialize it again.

i.e.:

TASK [vexxhost.atmosphere.magnum : Initialize the management cluster] ******************************************************************************************************************************************************
fatal: [usegonslp020.vistex.local]: FAILED! => {"changed": false, "cmd": ["clusterctl", "init", "--config", "/etc/clusterctl.yaml", "--core", "cluster-api:v1.3.3", "--bootstrap", "kubeadm:v1.3.3", "--control-plane", "kubeadm:v1.3.3", "--infrastructure", "openstack:v0.7.1"], "delta": "0:00:03.355675", "end": "2023-04-25 13:23:42.272366", "msg": "non-zero return code", "rc": 1, "start": "2023-04-25 13:23:38.916691", "stderr": "Fetching providers\nError: installing provider \"infrastructure-openstack\" can lead to a non functioning management cluster: there is already an instance of the \"infrastructure-openstack\" provider installed in the \"capo-system\" namespace", "stderr_lines": ["Fetching providers", "Error: installing provider \"infrastructure-openstack\" can lead to a non functioning management cluster: there is already an instance of the \"infrastructure-openstack\" provider installed in the \"capo-system\" namespace"], "stdout": "", "stdout_lines": []}

IPIP rules in security groups not applying properly

IPIP encapsulated DNS requests from pods to coredns get blocked by iptables.
On closer inspection, the packets don't get picked up by the security group rules:

ID                                   | IP Protocol | Ethertype | IP Range  | Port Range  | Remote Security Group
6a9c3c58-bdd0-474e-8bf8-e8e248ad2cc6 | ipip        | IPv4      | 0.0.0.0/0 |             | ba33108a-ceb1-4ef0-98cd-455fa916166f
c031473b-9da4-414f-94e3-425aa34dc2b8 | ipip        | IPv4      | 0.0.0.0/0 |             | 76192aca-20eb-43e6-bd1c-331979e57157

Changing the IP protocol from ipip to 4 does the trick.

ID                                   | IP Protocol | Ethertype | IP Range  | Port Range  | Remote Security Group
6a9c3c58-bdd0-474e-8bf8-e8e248ad2cc6 | 4           | IPv4      | 0.0.0.0/0 |             | ba33108a-ceb1-4ef0-98cd-455fa916166f
c031473b-9da4-414f-94e3-425aa34dc2b8 | 4           | IPv4      | 0.0.0.0/0 |             | 76192aca-20eb-43e6-bd1c-331979e57157

What I don't understand is how we are hitting these issues, as the fix has been merged since March 2020.

projectcalico/calico#2700
projectcalico/calico#2111

Dynamic `ClusterClass` version

The ClusterClass has a static name; we should probably use something like pbr to generate a dynamic name for the ClusterClass.

Non-existing resources should not stop cluster deletion

Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server [None req-8f39cccc-1fc0-4aab-9062-b9fea14ece2e None None] Exception during message handling: pykube.exceptions.ObjectDoesNotExist: k8s-v1.25.3-74wufkfiei-cloud-config does not exist.
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.8/dist-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python3.8/dist-packages/osprofiler/profiler.py", line 159, in wrapper
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/opt/stack/magnum/magnum/conductor/handlers/cluster_conductor.py", line 191, in cluster_delete
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     cluster_driver.delete_cluster(context, cluster)
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/home/ubuntu/magnum-cluster-api/magnum_cluster_api/driver.py", line 194, in delete_cluster
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     resources.Cluster(context, self.k8s_api, cluster).delete()
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/home/ubuntu/magnum-cluster-api/magnum_cluster_api/resources.py", line 56, in delete
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     resource = self.get_object()
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/home/ubuntu/magnum-cluster-api/magnum_cluster_api/resources.py", line 1299, in get_object
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     utils.generate_cloud_controller_manager_config(
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/home/ubuntu/magnum-cluster-api/magnum_cluster_api/utils.py", line 56, in generate_cloud_controller_manager_config
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     data = pykube.Secret.objects(api, namespace="magnum-system").get_by_name(
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server   File "/home/ubuntu/.local/lib/python3.8/site-packages/pykube/query.py", line 116, in get_by_name
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server     raise ObjectDoesNotExist(f"{name} does not exist.")
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server pykube.exceptions.ObjectDoesNotExist: k8s-v1.25.3-74wufkfiei-cloud-config does not exist.
Nov 07 13:31:09 devstack magnum-conductor[2806024]: ERROR oslo_messaging.rpc.server 

Have a name converter between openstack and kubernetes resources

Context

We create several Kubernetes resources inside this m-capi driver, and their names sometimes include the names of related OpenStack resources.
For instance, we name the Kubernetes storage classes after the corresponding OpenStack volume type.
OpenStack allows the volume type name to include `+`, but Kubernetes doesn't allow this character in a resource name, like this:

f"storageclass-{vt.name}.yaml": yaml.dump(
.

This causes a Kubernetes resource creation error.

Suggested solution

Add a utility function that converts an OpenStack resource name into a Kubernetes-compatible name whenever it is used in a Kubernetes resource name.
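
A minimal sketch of such a utility, assuming the usual RFC 1123 rules (lowercase alphanumerics and "-", at most 63 characters); the function name is illustrative:

import re

def to_kubernetes_name(name, max_length=63):
    """Convert an OpenStack resource name into an RFC 1123 compliant name."""
    name = name.lower()
    # Replace anything that is not a lowercase alphanumeric or "-" with "-".
    name = re.sub(r"[^a-z0-9-]+", "-", name)
    # Names must start and end with an alphanumeric character.
    name = name.strip("-")
    return name[:max_length].rstrip("-")

For example, to_kubernetes_name("ssd+encrypted") returns "ssd-encrypted".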

Dynamically add all StorageClass

An OpenStack cloud can have many volume types. Instead of having a disconnect between the Kubernetes cluster and the OpenStack cloud, we should automatically poll for all the volume types and create a StorageClass for each of them.

At the same time, we will also make the default StorageClass the one that is the default in the cloud, which will keep things nice and simple for the user.
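
A rough sketch of how the manifests could be rendered, assuming the list of volume type names and the cloud default have already been fetched from Cinder; the helper name is illustrative and the provisioner is the upstream Cinder CSI one:

import yaml

def generate_storage_classes(volume_types, default_type):
    """Render one StorageClass per volume type; mark the cloud default as default."""
    manifests = {}
    for vt in volume_types:
        manifests[f"storageclass-{vt}.yaml"] = yaml.dump(
            {
                "apiVersion": "storage.k8s.io/v1",
                "kind": "StorageClass",
                "metadata": {
                    "name": vt,  # would need name sanitization, see the issue above
                    "annotations": {
                        "storageclass.kubernetes.io/is-default-class": str(
                            vt == default_type
                        ).lower(),
                    },
                },
                "provisioner": "cinder.csi.openstack.org",
                "parameters": {"type": vt},
            }
        )
    return manifests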

Implement auto healing features

In Magnum, when the auto_healing_enabled label is set to true, it enables the magnum-auto-healer. However, since we are using the Cluster API, we can rely on its built-in health checking feature:

https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/healthchecking.html

We need to add a MachineHealthCheck resource if auto_healing_enabled is set to true. Also, unlike how Magnum defaults to auto_healing_enabled set to false (i.e. not enabled), we should always enable it since it would be largely beneficial for the user to have it enabled (better user experience).

We'll also have to factor in cluster label updates in case the user wants to enable/disable it dynamically, so we'd end up with the following (a rough sketch of the MachineHealthCheck resource follows the list):

  • Update create_cluster to add MachineHealthCheck if the label is either on cluster or cluster template (might need to move this function to a utils.py)
  • Update update_cluster to add/remove MachineHealthCheck depending on the value of the label
  • Test auto healing by shutting down a node and seeing if Cluster API autoheals it
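
A rough sketch of the MachineHealthCheck that create_cluster could apply, based on the health-checking documentation linked above; the selector labels and timeouts are placeholders:

def generate_machine_health_check(cluster_name, namespace="magnum-system"):
    """Illustrative MachineHealthCheck for a cluster with auto healing enabled."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "MachineHealthCheck",
        "metadata": {
            "name": f"{cluster_name}-unhealthy-nodes",
            "namespace": namespace,
        },
        "spec": {
            "clusterName": cluster_name,
            "selector": {
                "matchLabels": {"cluster.x-k8s.io/cluster-name": cluster_name},
            },
            "nodeStartupTimeout": "10m",
            "unhealthyConditions": [
                {"type": "Ready", "status": "Unknown", "timeout": "300s"},
                {"type": "Ready", "status": "False", "timeout": "300s"},
            ],
        },
    }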

Add Manila CSI

We should detect what services are running in the cloud (such as Manila or Cinder) and then install the appropriate CSI.

We already deploy the Cinder CSI unconditionally; instead, we should deploy it only if we detect the Cinder service, and the same goes for Manila. That way we finally get ReadWriteMany volumes! :)
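
A minimal sketch of the kind of catalog check that could drive this, assuming a keystoneauth session is available; the service types (volumev3, sharev2) are the commonly used ones and may need adjusting per cloud:

from keystoneauth1 import exceptions as ks_exceptions

def detect_csi_services(session):
    """Return which block/share services exist in the cloud (illustrative)."""
    services = {}
    for name, service_type in (("cinder", "volumev3"), ("manila", "sharev2")):
        try:
            session.get_endpoint(service_type=service_type)
            services[name] = True
        except ks_exceptions.EndpointNotFound:
            services[name] = False
    return services

For example, {"cinder": True, "manila": False} would mean deploying only the Cinder CSI.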

Autoscaling

Auto-scaling is a useful feature that used to exist inside Magnum; however, the Cluster API has built-in autoscaling. We should enable it when auto_scaling_enabled is set to true.

I have not done significant research into this, so I will update this later once we have an idea how to get it all to work.

Review `kube-bench` results

This is an issue to track which ones we need to take care of, or offer as an option for the user to configure:

  • kube-bench: [WARN] 1.1.9 Ensure that the Container Network Interface file permissions are set to 644 or more restrictive (Manual)
  • kube-bench: [WARN] 1.1.10 Ensure that the Container Network Interface file ownership is set to root:root (Manual)
  • #28
  • kube-bench: [WARN] 1.2.1 Ensure that the --anonymous-auth argument is set to false (Manual)
  • #29
  • kube-bench: [WARN] 1.2.10 Ensure that the admission control plugin EventRateLimit is set (Manual)
  • kube-bench: [WARN] 1.2.12 Ensure that the admission control plugin AlwaysPullImages is set (Manual)
  • kube-bench: [WARN] 1.2.13 Ensure that the admission control plugin SecurityContextDeny is set if PodSecurityPolicy is not used (Manual)
  • #30
  • #31
  • #32
  • #33
  • #34
  • kube-bench: [WARN] 1.2.23 Ensure that the --request-timeout argument is set as appropriate (Manual)
  • kube-bench: [WARN] 1.2.30 Ensure that the --encryption-provider-config argument is set as appropriate (Manual)
  • kube-bench: [WARN] 1.2.31 Ensure that encryption providers are appropriately configured (Manual)
  • kube-bench: [WARN] 1.2.32 Ensure that the API Server only makes use of Strong Cryptographic Ciphers (Manual)
  • kube-bench: [WARN] 1.3.1 Ensure that the --terminated-pod-gc-threshold argument is set as appropriate (Manual)
  • #35
  • #36
  • #37
  • kube-bench: [WARN] 4.2.9 Ensure that the --event-qps argument is set to 0 or a level which ensures appropriate event capture (Manual)
  • kube-bench: [WARN] 4.2.10 Ensure that the --tls-cert-file and --tls-private-key-file arguments are set as appropriate (Manual)
  • kube-bench: [WARN] 4.2.13 Ensure that the Kubelet only makes use of Strong Cryptographic Ciphers (Manual)

Magnum/CAPI provisioned Kubernetes cluster name length

Is there a way to customize the name used by the Magnum API (using the CAPI driver) built into Atmosphere? We are running into issues where operators are adding labels to artifacts that exceed the 63-character limit built into Kubernetes, and a big chunk of the label is the cluster name in these cases. It would be nice if we could define a name (such as prod-k8s-00 or something like that) that tells the user exactly what it is used for and doesn't include the additional Magnum (and/or CAPI/CAPO) syntax.

trivy k8s cluster --compliance k8s-cis --report summary 
2023-03-25T16:59:00.124-0400	FATAL	get k8s artifacts error: running node-collector job: Job.batch "node-collector-foo-test-4admbibsdj-default-worker-infra-rb4bh-f2h5t" is invalid: spec.template.labels: Invalid value: "node-collector-foo-test-4admbibsdj-default-worker-infra-rb4bh-f2h5t": must be no more than 63 characters

`kube-bench`: [FAIL] 1.2.6 Ensure that the --kubelet-certificate-authority argument is set as appropriate (Automated)

1.2.6 Follow the Kubernetes documentation and setup the TLS connection between
the apiserver and kubelets. Then, edit the API server pod specification file
/etc/kubernetes/manifests/kube-apiserver.yaml on the control plane node and set the
--kubelet-certificate-authority parameter to the path to the cert file for the certificate authority.
--kubelet-certificate-authority=<ca-string>

Remove all load balancers pre-delete

At the moment, the load balancers are not removed by CAPO.

We can work around this by doing something similar to what Magnum does when cleaning up its resources:

https://github.com/openstack/magnum/blob/0ee8abeed0ab90baee98a92cab7c684313bab906/magnum/drivers/heat/driver.py#L306-L311

FTR, pre_delete_cluster is called manually, so we can just add it at the top of our delete_cluster. The function that's already implemented seems tied to Heat, so these two parts of the code should help:

https://github.com/openstack/magnum/blob/0ee8abeed0ab90baee98a92cab7c684313bab906/magnum/common/octavia.py#L89-L101
https://github.com/openstack/magnum/blob/0ee8abeed0ab90baee98a92cab7c684313bab906/magnum/common/octavia.py#L131-L137

With that in place, we'll be able to wipe all of the resources. However, one thing to investigate is the cluster UUID in the load balancer description and what it is set to with CAPI, to make sure the regex works correctly.
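
A minimal sketch of what that could look like, assuming magnum.common.octavia.delete_loadbalancers keeps the (context, cluster) signature shown in the linked code; as noted above, the UUID-in-description matching still needs to be verified for CAPI-created load balancers:

from magnum.common import octavia

class Driver:  # stand-in for the real driver class
    def delete_cluster(self, context, cluster):
        # Delete Octavia load balancers associated with this cluster first,
        # since CAPO does not remove load balancers created by the in-cluster
        # cloud provider or ingress controllers.
        octavia.delete_loadbalancers(context, cluster)
        # ... then continue with the existing Cluster API resource deletion.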

Support `ingress_controller`

We'll need to add the ability to install an Ingress controller onto the cluster. Magnum supports both Octavia and Nginx, but we'll start with Nginx at least.

This will be handled with a ClusterResourceSet that gets applied.
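
A placeholder sketch of that ClusterResourceSet, assuming the rendered ingress-nginx manifests are stored in a ConfigMap; all names are illustrative:

def generate_ingress_crs(cluster_name, namespace="magnum-system"):
    """Illustrative ClusterResourceSet that applies the ingress-nginx manifests."""
    return {
        "apiVersion": "addons.cluster.x-k8s.io/v1beta1",
        "kind": "ClusterResourceSet",
        "metadata": {
            "name": f"{cluster_name}-ingress-nginx",
            "namespace": namespace,
        },
        "spec": {
            "clusterSelector": {
                "matchLabels": {"cluster.x-k8s.io/cluster-name": cluster_name},
            },
            "resources": [
                {"kind": "ConfigMap", "name": f"{cluster_name}-ingress-nginx"},
            ],
            "strategy": "ApplyOnce",
        },
    }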

Cluster deletion stuck in DELETE_IN_PROGRESS

Context

The cluster (3 masters and 1 worker) failed to create because of a lack of resources: 2 masters and 1 worker were created, and the creation of the 3rd master failed.
The cluster was then deleted, but it hangs in DELETE_IN_PROGRESS status.

$ kubectl get clusters
NAMESPACE       NAME                     PHASE      AGE    VERSION
magnum-system   k8s-v1-25-3-8mic9qdzdl   Deleting   160m   v1.25.3

$ kubectl describe openstackclusters -n magnum-system
...
Events:
  Type     Reason                            Age                From                  Message
  ----     ------                            ----               ----                  -------
  Normal   Successfuldisassociatefloatingip  23m                openstack-controller  Disassociated floating IP 172.24.4.74
  Normal   Successfuldeletefloatingip        23m                openstack-controller  Deleted floating IP 172.24.4.74
  Normal   Successfuldeleteloadbalancer      23m                openstack-controller  Deleted load balancer k8s-clusterapi-cluster-magnum-system-k8s-v1-25-3-8mic9qdzdl-kubeapi with id edc6de48-71e3-4da4-b98d-15ad258ba319
  Warning  Faileddeleteloadbalancer          23m (x5 over 23m)  openstack-controller  Failed to delete load balancer k8s-clusterapi-cluster-magnum-system-k8s-v1-25-3-8mic9qdzdl-kubeapi with id edc6de48-71e3-4da4-b98d-15ad258ba319: Expected HTTP response code [202 204] when accessing [DELETE http://38.108.68.181/load-balancer/v2.0/lbaas/loadbalancers/edc6de48-71e3-4da4-b98d-15ad258ba319?cascade=true], but got 409 instead
{"faultcode": "Client", "faultstring": "Invalid state PENDING_DELETE of loadbalancer resource edc6de48-71e3-4da4-b98d-15ad258ba319", "debuginfo": null}
  Warning  Faileddeletesecuritygroup  101s (x14 over 23m)  openstack-controller  Failed to delete security group k8s-cluster-magnum-system-k8s-v1-25-3-8mic9qdzdl-secgroup-controlplane with id 56f209e7-3ed4-4880-83b0-5a52284c9e8d: Expected HTTP response code [202 204] when accessing [DELETE http://38.108.68.181:9696/networking/v2.0/security-groups/56f209e7-3ed4-4880-83b0-5a52284c9e8d], but got 409 instead
{"NeutronError": {"type": "SecurityGroupInUse", "message": "Security Group 56f209e7-3ed4-4880-83b0-5a52284c9e8d in use.", "detail": ""}}
ubuntu@magnum-capi-driver:~$ source /opt/stack/openrc admin admin
WARNING: setting legacy OS_TENANT_NAME to support cli tools.

The load balancer was eventually deleted, but the security group is not deleted because it is in use by an undeleted port.

+--------------------------------------+----------------------------------------------------+-------------------+----------------------------------------------------------------------------------------------------+--------+
| ID                                   | Name                                               | MAC Address       | Fixed IP Addresses                                                                                 | Status |
+--------------------------------------+----------------------------------------------------+-------------------+----------------------------------------------------------------------------------------------------+--------+
| 94720b71-091a-42aa-b758-b0675feee028 | k8s-v1-25-3-8mic9qdzdl-control-plane-8xbtc-vwrjx-0 | fa:16:3e:e6:1e:99 | ip_address='10.6.0.95', subnet_id='33540296-1071-4227-8cb7-fab4d042ea5e'                           | DOWN   |
+--------------------------------------+----------------------------------------------------+-------------------+----------------------------------------------------------------------------------------------------+--------+

Only one port remains undeleted. I guess this is the port that was bound to the 3rd master node.

Workaround

Manually delete the dangling port so the security group can be deleted by CAPO.

Cluster upgrades

We've got to implement cluster upgrades, which will allow us to go from one Kubernetes release to another; we will be relying on the Cluster API cluster upgrade mechanism to get this done.

README.md

  • Update it to include instructions for creating both 1.24 and 1.25 images and cluster templates

upgrade_cluster driver function

  • Recreate all of the MachineTemplates with the new values (see steps 1-3)
  • Update the version value after recreating the MachineTemplate so it forces a rollout for control plane
  • Update the cluster.x-k8s.io/restartedAt annotation to force a rollout after updating MachineTemplate (see "How to schedule a machine rollout")
  • Validate that the cluster ends up in UPDATE_COMPLETE
  • Validate that the cluster is now fully upgraded.

The above can be tested by upgrading from 1.24 to 1.25; we assume that Cluster API can take care of the upgrade cleanly, and we're just focused on making sure the Magnum/Cluster API interaction works properly (a rough sketch of the version-bump step follows).
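
A rough sketch of that version-bump step, assuming pykube objects for the KubeadmControlPlane and MachineDeployments; it omits the MachineTemplate recreation, and the exact placement of the restartedAt annotation should be double-checked against the rollout documentation:

import datetime

def bump_kubernetes_version(kcp, machine_deployments, new_version):
    """Illustrative: roll the control plane and workers onto a new version."""
    restarted_at = datetime.datetime.utcnow().isoformat() + "Z"

    # Control plane: updating spec.version triggers a control-plane rollout.
    kcp.obj["spec"]["version"] = new_version
    kcp.update()

    # Workers: bump the version and force a rollout via the restartedAt
    # annotation, as described in "How to schedule a machine rollout".
    for md in machine_deployments:
        md.obj["spec"]["template"]["spec"]["version"] = new_version
        md.obj["spec"]["template"].setdefault("metadata", {}).setdefault(
            "annotations", {}
        )["cluster.x-k8s.io/restartedAt"] = restarted_at
        md.update()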

Object handling requires some improvement

Found errors like:

2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'magnum.service.periodic.ClusterUpdateJob.update_status' failed: KeyError: 'status'
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall Traceback (most recent call last):
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall   File "/var/lib/openstack/lib/python3.10/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall   File "/var/lib/openstack/lib/python3.10/site-packages/magnum/service/periodic.py", line 73, in update_status
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall     cdriver.update_cluster_status(self.ctx, self.cluster)
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall   File "/var/lib/openstack/lib/python3.10/site-packages/magnum_cluster_api/driver.py", line 55, in update_cluster_status
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall     node_groups = [
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall   File "/var/lib/openstack/lib/python3.10/site-packages/magnum_cluster_api/driver.py", line 56, in <listcomp>
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall     self.update_nodegroup_status(context, cluster, node_group)
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall   File "/var/lib/openstack/lib/python3.10/site-packages/magnum_cluster_api/driver.py", line 206, in update_nodegroup_status
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall     generation = kcp.obj["status"].get("observedGeneration")
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall KeyError: 'status'
2023-02-07 22:50:52.737 1 ERROR oslo.service.loopingcall 

or

2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall [-] Fixed interval looping call 'magnum.service.periodic.ClusterUpdateJob.update_status' failed: AttributeError: 'NoneType' object has no attribute 'reload'
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall Traceback (most recent call last):
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall   File "/var/lib/openstack/lib/python3.10/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall     result = func(*self.args, **self.kw)
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall   File "/var/lib/openstack/lib/python3.10/site-packages/magnum/service/periodic.py", line 73, in update_status
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall     cdriver.update_cluster_status(self.ctx, self.cluster)
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall   File "/var/lib/openstack/lib/python3.10/site-packages/magnum_cluster_api/driver.py", line 69, in update_cluster_status
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall     capi_cluster.reload()
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall AttributeError: 'NoneType' object has no attribute 'reload'
2023-02-07 22:50:46.717 1 ERROR oslo.service.loopingcall 
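
Both tracebacks look like the KubeadmControlPlane or the CAPI Cluster object either does not exist yet or has no status yet; a minimal defensive sketch (names are illustrative):

def get_observed_generation(kcp):
    """Return observedGeneration, or None if the KCP has no status yet."""
    if kcp is None:
        return None
    return kcp.obj.get("status", {}).get("observedGeneration")

def reload_capi_cluster(capi_cluster):
    """Reload the CAPI Cluster object, tolerating the not-created-yet case."""
    if capi_cluster is None:
        # The Cluster object does not exist (yet, or anymore); let the caller
        # map this onto the appropriate Magnum status instead of raising.
        return None
    capi_cluster.reload()
    return capi_cluster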

Cluster status is not updated properly

I've been noticing that every action taken on a Magnum cluster takes up to 1 minute to be reflected in the Magnum API.

In some cases, like a sequence of rolling upgrades, the second time we upgrade a cluster (with UPDATE_COMPLETE status), it keeps performing the upgrade but doesn't update the Magnum status to UPDATE_IN_PROGRESS.

This might confuse the end user about which operation is being performed at the moment.

Possible missing Apache2 License?

It would be great to collaborate on this, and merge it with our helm approach.

Would you be willing to add an Apache 2 license so we could do some of that, please?

Support `container_infra_prefix`

At the moment, all of the images are being pulled directly from the internet. This can be a problem in air-gapped environments or places where internet access might not be reliable.

We've got to add container_infra_prefix so that images can be pulled from a local registry, and provide a clean script for loading said custom registry with all of those images.
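
A minimal sketch of how an image reference could be rewritten onto the local registry; whether only the final image component should be kept (as done here) is an open question:

def rewrite_image(image, container_infra_prefix):
    """e.g. rewrite_image("quay.io/calico/cni:v3.24.2", "registry.local:5000")
    returns "registry.local:5000/cni:v3.24.2"
    """
    if not container_infra_prefix:
        return image
    name_and_tag = image.rsplit("/", 1)[-1]
    return f"{container_infra_prefix.rstrip('/')}/{name_and_tag}"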

Add release infrastructure

We need to set up something like release-please, which will help us track releases and then push them to PyPI to get started.

Add support for `docker_volume_{size,type}`

At the moment, we cannot use this feature because CAPO does not support mounting multiple volumes:

kubernetes-sigs/cluster-api-provider-openstack#1286

Once that's in place, we can use the following labels to create volumes:

  • etcd_volume_size
  • etcd_volume_type
  • docker_volume_size
  • docker_volume_type

We can use the following functions to determine the volume type (if size is set):

https://github.com/openstack/magnum/blob/16bdedcf2fe6986c995bd415f4e3c70dac914ada/magnum/common/cinder.py#L25-L37

And we can get the sizes from labels/API objects.
