
Comments (9)

WanzenBug commented on July 17, 2024

Please try to update to the latest version.

It also looks like this was not a fresh install? Otherwise, why would there be any resources?

This

    Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.

Looks like the resource (which already existed) is still in use somewhere. So someone still has it mounted or similar. Clean that up first: check the resource state with linstor r l to find where it is "InUse", then unmount it there.
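
For reference, a minimal sketch of that check (the device path below is a placeholder; linstor volume list shows the actual device name per volume):

linstor r l
# on the node reported as InUse, find and remove the mount (placeholder device):
findmnt /dev/drbd1002
umount /dev/drbd1002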


dmrub commented on July 17, 2024

I will try to upgrade to the latest version, but this is a fresh install. We plan to use Linstor in production, but before that we are doing automated testing: we install a fresh Kubernetes cluster on three VMs and then deploy the piraeus operator via Flux CD. This installation was started on Friday evening, and this morning I checked the installation status and found the errors I describe in this issue.

The output of linstor r l:

$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor r l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                             ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m0 ┊ 7002 ┊        ┊       ┊    Unknown ┊                     ┊
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m2 ┊ 7002 ┊ InUse  ┊       ┊    Unknown ┊ 2024-04-05 15:15:27 ┊
┊ pvc-a6a8ed01-2406-4614-8432-fdef2b2c7abe ┊ k8s-m2 ┊ 7000 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:15:24 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m0 ┊ 7001 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2024-04-05 15:16:03 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m1 ┊ 7001 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:16:04 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m2 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-04-05 15:16:02 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

The PVC pvc-80745669-9bf4-4776-9865-f6f419c57863 is used by the monitoring stack, which cannot start:

$ kubectl get pvc -A | grep pvc-80745669-9bf4-4776-9865-f6f419c57863
monitoring           kube-prometheus-stack-grafana         Bound    pvc-80745669-9bf4-4776-9865-f6f419c57863   10Gi       RWO            linstor-fast                 2d17h

$ kubectl get pods -n monitoring
NAME                                                       READY   STATUS     RESTARTS   AGE
alertmanager-kube-prometheus-stack-alertmanager-0          2/2     Running    0          35h
kube-prometheus-stack-grafana-9b8785fdd-m9nkm              0/3     Init:0/1   0          2d17h
kube-prometheus-stack-kube-state-metrics-776c898f6-qbjj9   1/1     Running    0          47h
kube-prometheus-stack-operator-696cbbfbfb-sql6s            1/1     Running    0          35h
kube-prometheus-stack-prometheus-node-exporter-d96g9       1/1     Running    0          2d17h
kube-prometheus-stack-prometheus-node-exporter-dcdh7       1/1     Running    0          2d17h
kube-prometheus-stack-prometheus-node-exporter-gfblh       1/1     Running    0          2d17h
prometheus-kube-prometheus-stack-prometheus-0              2/2     Running    0          35h


WanzenBug commented on July 17, 2024

So it looks like 6610156F-8EC88-000000 indicates that mkfs failed because DRBD was not set up correctly. But in 66101520-00000-000000 we can see that the resource is apparently in use. This does not make much sense: it would indicate that something is keeping the resource in Primary without any actual disk.

Could you please try to run:

kubectl exec k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
kubectl exec k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863

It looks like the CSI driver later tried to create the volume again and somehow determined that the volume already exists, which led to it being bound. I would recommend deleting the PVC and PV and letting them be recreated.


dmrub commented on July 17, 2024

Here is the output of the commands:

$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
pvc-80745669-9bf4-4776-9865-f6f419c57863 role:Primary

$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863
resource "pvc-80745669-9bf4-4776-9865-f6f419c57863" {
    options {
        on-no-data-accessible	suspend-io;
        on-suspended-primary-outdated	force-secondary;
    }
    _this_host {
        node-id			0;
    }
}


WanzenBug commented on July 17, 2024

Ok, this looks like a bug in LINSTOR: it does not properly restore the resource to Secondary after the mkfs call fails. That still leaves the question of how it can be that /dev/drbd1002 does not exist at this point. I have no idea how that can happen.

To fully clean up the volume:

kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup secondary pvc-80745669-9bf4-4776-9865-f6f419c57863

Then, run linstor rd d pvc-80745669-9bf4-4776-9865-f6f419c57863 and delete the PVC and PV.
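
For completeness, a sketch of those remaining steps, using the PVC shown in the kubectl get pvc output above (the explicit PV delete is only needed if the reclaim policy does not remove it automatically):

kubectl exec -n piraeus-datastore deploy/linstor-controller -- linstor rd d pvc-80745669-9bf4-4776-9865-f6f419c57863
kubectl delete pvc -n monitoring kube-prometheus-stack-grafana
kubectl delete pv pvc-80745669-9bf4-4776-9865-f6f419c57863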


dmrub commented on July 17, 2024

Your last suggestion worked; I was able to reinstall the monitoring. What would you recommend now?
Upgrade to the latest version of the Piraeus Operator and create a new issue when I get a new error?
What steps would help you analyze this error?


WanzenBug commented on July 17, 2024

Yes, please upgrade and see if it happens again. In case you encounter an issue, run:

kubectl exec -it deploy/linstor-controller -- linstor sos-report create

Then copy the created file from the pod to your host and attach it to the issue.
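
For example, a sketch of copying the archive out (the pod name, file name, and path are placeholders; the sos-report command prints the actual location of the archive):

kubectl cp piraeus-datastore/<linstor-controller-pod>:/var/log/linstor-controller/<sos-report>.tar.gz ./<sos-report>.tar.gz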


dmrub commented on July 17, 2024

@WanzenBug, I am currently testing the latest version of the Piraeus Operator, v2.5.0, and so far the problem described in this issue has not reoccurred. However, I have again reproduced a problem that I described in another issue: LINBIT/linstor-server#396. Since I never got a response in the linstor-server project, should I recreate the issue in this (piraeus-operator) project?


WanzenBug commented on July 17, 2024

Yes, this is an issue more appropriate for the piraeus project.
