Seed cluster cannot recover after failed reconciliation when trying to delete network resources (gardener-extension-provider-openstack, closed)

namsral avatar namsral commented on July 18, 2024
Seed cluster cannot recover after failed reconciliation when trying to delete network resources

from gardener-extension-provider-openstack.

Comments (5)

namsral avatar namsral commented on July 18, 2024

For future reference: the issue was caused by a syntax error in the shoot manifest and was resolved by correcting the syntax in both the shoot manifest and the infra config.

Although the shoot's subnet was correctly created and functional, the difference in notation caused Terraform to recreate the subnet during a reconciliation:

Terraform will perform the following actions:
...
  # openstack_networking_subnet_v2.cluster must be replaced
-/+ resource "openstack_networking_subnet_v2" "cluster" {
      ~ all_tags          = [] -> (known after apply)
      ~ cidr              = "10.240.0.0/16" -> "10.240.0/16" # forces replacement
      ~ gateway_ip        = "10.240.0.1" -> (known after apply)
...
Plan: 2 to add, 0 to change, 2 to destroy.
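
A notation difference like the one in the plan above (`10.240.0/16` versus `10.240.0.0/16`) can be caught before it ends up in a manifest. The following is a minimal sketch, assuming Python's standard `ipaddress` module as a stand-in validator; it is not what Gardener or Terraform use internally, and `check_cidr` is a hypothetical helper name.

```python
import ipaddress


def check_cidr(cidr: str) -> str:
    """Return the canonical form of a CIDR string, or raise ValueError."""
    # strict=True also rejects networks with host bits set, e.g. 10.240.0.1/16.
    network = ipaddress.ip_network(cidr, strict=True)
    canonical = str(network)
    # Reject spellings that parse but differ from the canonical notation,
    # since a textual difference is what triggered the replacement here.
    if canonical != cidr:
        raise ValueError(f"{cidr!r} is not canonical, expected {canonical!r}")
    return canonical


for candidate in ("10.240.0.0/16", "10.240.0/16"):
    try:
        print(candidate, "-> ok:", check_cidr(candidate))
    except ValueError as err:
        print(candidate, "-> rejected:", err)
```

The abbreviated form is rejected because `ipaddress` requires all four octets, so a check like this would flag the manifest value before a reconciliation is triggered.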

kon-angelo avatar kon-angelo commented on July 18, 2024

Hello @namsral. This indicates that there are resources on the infrastructure that fail to be deleted. These keep the subnet "busy", and OpenStack refuses to delete it.

In my experience, the usual suspects in such cases are either load balancers or ports. It would be helpful if you could check which resources have not been deleted so that we can find the root cause more easily.

namsral avatar namsral commented on July 18, 2024

Thanks @kon-angelo, removing the port connecting the shoot's subnet and router resolved the issue. As the port, subnet and router are all managed by Gardener, I consider this a bug, but I'm not sure in which system.

Although not tested, it might have been sufficient to clear the port's device_owner field, which contained network:router_interface.

kon-angelo avatar kon-angelo commented on July 18, 2024

@namsral It's good that you managed to resolve it on your own. If you see this happening consistently, please let us know about the orphaned resources you find and we can discuss which component is responsible.

As a point of reference, if the issue is with load balancers, then it is most likely a problem with OpenStack's cloud-controller-manager. If, however, the ports are used by the nodes, then it is a problem with our MCM.
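
To make that triage concrete, here is a rough sketch that groups the ports still attached to a subnet by their `device_owner`. It assumes port records shaped like Neutron API port objects (`id`, `device_owner`, `fixed_ips` with `subnet_id`), and it assumes that machine ports carry a `compute:` prefix in `device_owner`. The function name, the bucket names and the sample IDs are hypothetical; anything that is neither a router interface nor a machine port lands in `other` and needs manual inspection (load balancer ports would show up there).

```python
from collections import defaultdict


def ports_blocking_subnet(ports, subnet_id):
    """Group the IDs of ports that hold an address on subnet_id by likely owner."""
    groups = defaultdict(list)
    for port in ports:
        fixed_ips = port.get("fixed_ips", [])
        # Skip ports that have no address on the subnet in question.
        if not any(ip.get("subnet_id") == subnet_id for ip in fixed_ips):
            continue
        owner = port.get("device_owner") or ""
        if owner == "network:router_interface":
            bucket = "router_interface"
        elif owner.startswith("compute:"):  # assumption: ports of machines
            bucket = "machine"
        else:
            bucket = "other"  # inspect manually, e.g. load balancer ports
        groups[bucket].append(port["id"])
    return dict(groups)


# Hypothetical sample data, not taken from the cluster in this issue.
sample_ports = [
    {"id": "port-a", "device_owner": "network:router_interface",
     "fixed_ips": [{"subnet_id": "subnet-1", "ip_address": "10.240.0.1"}]},
    {"id": "port-b", "device_owner": "compute:nova",
     "fixed_ips": [{"subnet_id": "subnet-1", "ip_address": "10.240.0.12"}]},
    {"id": "port-c", "device_owner": "compute:nova",
     "fixed_ips": [{"subnet_id": "subnet-2", "ip_address": "10.250.0.7"}]},
]

print(ports_blocking_subnet(sample_ports, "subnet-1"))
```

Ports in the `machine` bucket would point towards MCM, while entries in `other` are worth checking against the load balancers created by the cloud-controller-manager.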

namsral avatar namsral commented on July 18, 2024

New information reveals that ports of newly spawned machines prevent removal of the subnet.

The seed failed with a similar error:

```
Error waiting for openstack_networking_subnet_v2 <omitted> to become deleted: timeout while waiting for state to become 'DELETED' (last state: 'ACTIVE', timeout: 10m0s)
```

Steps to recover the failed seed:

  1. Delete the two remaining ports on the subnet (the ports attached to the spawned machines)
  2. Delete the subnet
  3. Force a reconcile of the seed cluster
  4. Delete the spawned machines that were attached to the ports deleted in step 1

This looks like a race condition between the removal of the subnet and the spawning of machines in the subnet.
