Comments (9)
Please try to update to the latest version.
It also looks like this was not a fresh install? Otherwise, why would there be any resources?
This error:
Resource 'pvc-80745669-9bf4-4776-9865-f6f419c57863' on node 'k8s-m2' is still in use.
indicates that the resource (which already existed) is still in use somewhere, so something still has it mounted or similar. Clean that up first: check the resource state with linstor r l to find where it is "InUse" and unmount it there.
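As a rough sketch of that check, the filter below reads the table printed by linstor r l and lists the resource/node pairs whose Usage column is InUse. The function name inuse_resources is made up for illustration, and the column order (ResourceName, Node, Port, Usage, ...) is assumed to match the table layout shown in this thread; other LINSTOR versions may print different columns.

```shell
# List "<resource> <node>" for every row whose Usage column is InUse.
# Reads the table printed by `linstor r l` on standard input.
# Assumption: columns are separated by '┊' in the order
# ResourceName, Node, Port, Usage, ... as in the output in this thread.
inuse_resources() {
  awk -F'┊' '
    NF >= 6 {
      # Strip the padding around the first four data columns.
      for (i = 2; i <= 5; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      if ($5 == "InUse") print $2, $3
    }
  '
}
```

It could be used as: kubectl exec -n piraeus-datastore deploy/linstor-controller -- linstor r l | inuse_resources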
from piraeus-operator.
I will try to upgrade to the latest version, but this is a fresh install. We plan to use LINSTOR in production, but before that we are running automated tests: we install a fresh Kubernetes cluster on three VMs and then deploy the Piraeus Operator via Flux CD. This installation was started on Friday evening; this morning I checked the installation status and found the errors I describe in this issue.
The output of linstor r l:
$ kubectl exec -ti -n piraeus-datastore deploy/linstor-controller -- linstor r l
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node ┊ Port ┊ Usage ┊ Conns ┊ State ┊ CreatedOn ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m0 ┊ 7002 ┊ ┊ ┊ Unknown ┊ ┊
┊ pvc-80745669-9bf4-4776-9865-f6f419c57863 ┊ k8s-m2 ┊ 7002 ┊ InUse ┊ ┊ Unknown ┊ 2024-04-05 15:15:27 ┊
┊ pvc-a6a8ed01-2406-4614-8432-fdef2b2c7abe ┊ k8s-m2 ┊ 7000 ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-04-05 15:15:24 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m0 ┊ 7001 ┊ Unused ┊ Ok ┊ TieBreaker ┊ 2024-04-05 15:16:03 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m1 ┊ 7001 ┊ InUse ┊ Ok ┊ UpToDate ┊ 2024-04-05 15:16:04 ┊
┊ pvc-b1d25fdb-8729-474b-ab0e-c031cf159d60 ┊ k8s-m2 ┊ 7001 ┊ Unused ┊ Ok ┊ UpToDate ┊ 2024-04-05 15:16:02 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
The PVC pvc-80745669-9bf4-4776-9865-f6f419c57863 is used by the monitoring, which cannot start:
$ kubectl get pvc -A | grep pvc-80745669-9bf4-4776-9865-f6f419c57863
monitoring kube-prometheus-stack-grafana Bound pvc-80745669-9bf4-4776-9865-f6f419c57863 10Gi RWO linstor-fast 2d17h
$ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-kube-prometheus-stack-alertmanager-0 2/2 Running 0 35h
kube-prometheus-stack-grafana-9b8785fdd-m9nkm 0/3 Init:0/1 0 2d17h
kube-prometheus-stack-kube-state-metrics-776c898f6-qbjj9 1/1 Running 0 47h
kube-prometheus-stack-operator-696cbbfbfb-sql6s 1/1 Running 0 35h
kube-prometheus-stack-prometheus-node-exporter-d96g9 1/1 Running 0 2d17h
kube-prometheus-stack-prometheus-node-exporter-dcdh7 1/1 Running 0 2d17h
kube-prometheus-stack-prometheus-node-exporter-gfblh 1/1 Running 0 2d17h
prometheus-kube-prometheus-stack-prometheus-0 2/2 Running 0 35h
from piraeus-operator.
So it looks like 6610156F-8EC88-000000 indicates that mkfs failed because DRBD was not set up correctly. But in 66101520-00000-000000 we can see that the resource is apparently in use. This does not make much sense: it would indicate that something is keeping the resource in primary without any actual disk.
Could you please try to run:
kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863
It looks like the CSI driver later tried to create the volume again and somehow determined that the volume already exists, which led to it being bound. I would recommend deleting the PVC and PV and letting them be recreated.
from piraeus-operator.
Here is the output of the commands:
$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863
pvc-80745669-9bf4-4776-9865-f6f419c57863 role:Primary
$ kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup show pvc-80745669-9bf4-4776-9865-f6f419c57863
resource "pvc-80745669-9bf4-4776-9865-f6f419c57863" {
options {
on-no-data-accessible suspend-io;
on-suspended-primary-outdated force-secondary;
}
_this_host {
node-id 0;
}
}
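In the status output above the resource reports role:Primary but lists no disk: field at all. A minimal sketch of a check for that condition follows; the function name primary_without_disk is hypothetical, and it assumes that a resource with a local volume normally prints a "disk:<state>" field in drbdsetup status, which may not hold for every DRBD version.

```shell
# Succeed (exit 0) if `drbdsetup status <resource>` output on stdin shows
# the local node as Primary while no local "disk:" field is reported.
primary_without_disk() {
  status=$(cat)
  # The local role is reported on the first line of the status output.
  printf '%s\n' "$status" | head -n 1 | grep -q 'role:Primary' || return 1
  # A local volume shows up as "disk:<state>"; "peer-disk:" must not match.
  if printf '%s\n' "$status" | grep -Eq '(^|[[:space:]])disk:'; then
    return 1
  fi
  return 0
}
```

For example: kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup status pvc-80745669-9bf4-4776-9865-f6f419c57863 | primary_without_disk && echo "Primary without disk"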
from piraeus-operator.
OK, this looks like a bug in LINSTOR: it does not properly restore the resource to secondary after the mkfs call fails. That still leaves the question of how /dev/drbd1002 could be missing at this point. I have no idea how that can happen.
To fully clean up the volume:
kubectl exec -n piraeus-datastore k8s-m2 -- drbdsetup secondary pvc-80745669-9bf4-4776-9865-f6f419c57863
Then run linstor rd d pvc-80745669-9bf4-4776-9865-f6f419c57863 and delete the PVC and PV.
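The cleanup steps above could be collected into a small helper that only prints the commands, so they can be reviewed before anything is run. This is a sketch under assumptions: the function name print_cleanup_commands and the LINSTOR_NAMESPACE variable are invented for illustration, the satellite pod is assumed to be named after the node (as in the kubectl exec calls in this thread), and the PV is assumed to have the same name as the LINSTOR resource.

```shell
# Print (do not run) the cleanup commands from this thread for one stuck volume.
# Arguments: <resource/PV name> <node, also the satellite pod name>
#            <PVC namespace> <PVC name>
print_cleanup_commands() {
  res=$1 node=$2 pvc_ns=$3 pvc=$4
  # Hypothetical override; defaults to the namespace used in this thread.
  ns=${LINSTOR_NAMESPACE:-piraeus-datastore}
  echo "kubectl exec -n $ns $node -- drbdsetup secondary $res"
  echo "kubectl exec -n $ns deploy/linstor-controller -- linstor rd d $res"
  echo "kubectl delete pvc -n $pvc_ns $pvc"
  echo "kubectl delete pv $res"
}
```

For the volume in this issue the call would be: print_cleanup_commands pvc-80745669-9bf4-4776-9865-f6f419c57863 k8s-m2 monitoring kube-prometheus-stack-grafana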
from piraeus-operator.
Your last suggestion worked, and I was able to reinstall the monitoring. What would you recommend now?
Should I update to the latest version of the Piraeus Operator and create a new issue if I get a new error?
What steps would help you analyze this error?
from piraeus-operator.
Yes, please upgrade and see if it happens again. In case you encounter an issue, run
kubectl exec -it -n piraeus-datastore deploy/linstor-controller -- linstor sos-report create
Then copy the created file from the pod to your host and attach it to the issue.
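One way to do the copy step is kubectl cp, which takes a source of the form <namespace>/<pod>:<path>. The helper below only prints that command. Its name print_sos_copy_command is hypothetical, and both the controller pod name and the report path are placeholders you would take from kubectl get pods and from the output of the sos-report command; neither is known from this thread.

```shell
# Print the kubectl cp command that copies a report out of the controller pod.
# Arguments: <controller pod name> <report path inside the pod> [namespace]
print_sos_copy_command() {
  pod=$1 remote=$2 ns=${3:-piraeus-datastore}
  # The file is copied into the current directory under its original name.
  echo "kubectl cp $ns/$pod:$remote ./$(basename "$remote")"
}
```

Note that kubectl cp needs the tar binary inside the container, so this may not work with every image.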
from piraeus-operator.
@WanzenBug, I am currently testing the latest version of the Piraeus Operator, v2.5.0, and so far the problem described in this issue has not reoccurred. However, I have just reproduced a problem that I described in another issue: LINBIT/linstor-server#396. Since I never got a response in the linstor-server project, should I recreate the issue in this (piraeus-operator) project?
from piraeus-operator.
Yes, this is an issue more appropriate for the piraeus project.
from piraeus-operator.