rook-ceph cluster osd pod in permanent crash loop (CrashLoopBackOff), with steps on how to fix without deleting the entire cluster (stock-analysis-engine, closed)

algotraders commented on August 15, 2024
rook-ceph cluster osd pod in permanent crash loop CrashLoopBackOff with steps on how to fix without deleting the entire cluster

from stock-analysis-engine.

Comments (4)

jay-johnson commented on August 15, 2024

It looks like the cluster actually had a problem back on 5-13-2019, and the redis cluster never reloaded its data afterwards. Then the osd pod crashed and took it all down last night (5-28-2019).

rook-ceph cluster issue on 5-13-2019

jay-johnson commented on August 15, 2024

The rook-ceph dashboard shows the cluster as stable, even though a monitor pod restarted and there are now multiple osd-prepare pods:

k get po -n rook-ceph
NAME                                          READY   STATUS      RESTARTS   AGE
rook-ceph-mgr-a-68cb58b456-6m8df              1/1     Running     0          58d
rook-ceph-mon-a-855bbddfd4-sxq9m              1/1     Running     0          58d
rook-ceph-mon-b-f949d66dd-rr9lk               1/1     Running     1          58d
rook-ceph-mon-d-5cb4b65d84-tktc7              1/1     Running     0          35m
rook-ceph-osd-0-5dc6b6686f-hmtkd              1/1     Running     1          58d
rook-ceph-osd-1-5fd56d7798-crtbf              1/1     Running     0          39m
rook-ceph-osd-2-5d5965fcc8-l6qz5              1/1     Running     0          58d
rook-ceph-osd-prepare-m10.example.com-pj9p7   0/2     Completed   0          15d
rook-ceph-osd-prepare-m11.example.com-tq8zr   0/2     Completed   0          15d
rook-ceph-osd-prepare-m12.example.com-26p9x   0/2     Completed   2          15d
rook-ceph-tools-bffbf4d8f-znj7q               1/1     Running     0          58d
k get po -n rook-ceph-system
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-agent-d9cdp                1/1     Running   0          58d
rook-ceph-agent-hz6mj                1/1     Running   0          58d
rook-ceph-agent-l9ns4                1/1     Running   2          58d
rook-ceph-operator-d97564799-hzglp   1/1     Running   1          58d
rook-discover-fhg4h                  1/1     Running   0          58d
rook-discover-g7hwd                  1/1     Running   0          58d
rook-discover-mwkcv                  1/1     Running   1          58d
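
With this many pods, the ones that restarted or were recreated recently are easy to miss by eye. Below is a minimal Python sketch that parses the plain-text table printed by `k get po` and lists pods with a non-zero restart count or an age under one hour. The function names, the one-hour threshold, and the trimmed sample table are assumptions for illustration and are not part of the original troubleshooting.

```python
import re

# Seconds per unit for the compact ages kubectl prints (e.g. "58d", "35m", "2d3h").
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 365 * 86400}


def age_to_seconds(age):
    """Convert a kubectl AGE value such as '35m' or '2d3h' to seconds."""
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([smhdy])", age))


def parse_pods(table):
    """Parse the text output of `kubectl get po` into a list of dicts."""
    rows = [line.split() for line in table.splitlines() if line.strip()]
    pods = []
    for fields in rows[1:]:  # rows[0] is the NAME/READY/STATUS header
        pods.append({
            "name": fields[0],
            "status": fields[2],
            "restarts": int(fields[3]),
            # AGE is taken from the last column so extra columns do not shift it.
            "age_s": age_to_seconds(fields[-1]),
        })
    return pods


def suspicious(pods, max_age_s=3600):
    """Return names of pods that restarted or are younger than max_age_s (assumed threshold)."""
    return [p["name"] for p in pods if p["restarts"] > 0 or p["age_s"] < max_age_s]


# A few rows copied from the output above, trimmed for brevity.
SAMPLE = """\
NAME                                          READY   STATUS      RESTARTS   AGE
rook-ceph-mon-a-855bbddfd4-sxq9m              1/1     Running     0          58d
rook-ceph-mon-b-f949d66dd-rr9lk               1/1     Running     1          58d
rook-ceph-mon-d-5cb4b65d84-tktc7              1/1     Running     0          35m
rook-ceph-osd-1-5fd56d7798-crtbf              1/1     Running     0          39m
rook-ceph-osd-2-5d5965fcc8-l6qz5              1/1     Running     0          58d
"""

if __name__ == "__main__":
    for name in suspicious(parse_pods(SAMPLE)):
        print(name)
```

On the sample rows this flags mon-b (one restart) plus mon-d and osd-1 (both under an hour old), which matches the pods that changed during the recovery.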

rook-ceph cluster recovery on 5-29-2019

jay-johnson commented on August 15, 2024

logs from the osd pod that was crashing before the delete:

k logs -n rook-ceph rook-ceph-osd-1-5fd56d7798-fttdh
2019-05-29 18:35:27.969 7f25adf371c0  0 ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable), process ceph-osd, pid 8864
2019-05-29 18:35:27.970 7f25adf371c0  0 pidfile_write: ignore empty --pid-file
starting osd.1 at - osd_data /var/lib/rook/osd1 /var/lib/rook/osd1/journal
2019-05-29 18:35:28.085 7f25adf371c0 -1  Processor -- bind unable to bind to 10.244.2.33:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-29 18:35:28.085 7f25adf371c0 -1  Processor -- bind unable to bind to 10.244.2.33:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-29 18:35:28.086 7f25adf371c0 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
2019-05-29 18:35:28.086 7f25adf371c0 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
2019-05-29 18:35:33.116 7f25adf371c0 -1  Processor -- bind unable to bind to 10.244.2.33:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-29 18:35:33.116 7f25adf371c0 -1  Processor -- bind unable to bind to 10.244.2.33:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-29 18:35:33.119 7f25adf371c0 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
2019-05-29 18:35:33.119 7f25adf371c0 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
2019-05-29 18:35:38.147 7f25adf371c0 -1  Processor -- bind unable to bind to 10.244.2.33:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-29 18:35:38.147 7f25adf371c0 -1  Processor -- bind unable to bind to 10.244.2.33:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-29 18:35:38.150 7f25adf371c0 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
2019-05-29 18:35:38.150 7f25adf371c0 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
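
On Linux, errno 99 is EADDRNOTAVAIL, which bind() returns when the requested address is not assigned to any local interface. One thing worth checking, though this thread never confirms it as the cause, is whether the address in the log still matches the pod's current IP (for example from `k get po -n rook-ceph -o wide`). The Python sketch below pulls the address and errno out of log lines like the ones above so they can be compared; the function name and the trimmed sample are assumptions for illustration.

```python
import re
from collections import Counter

# Matches the ceph-osd "unable to bind to <ip>:<port>/<nonce> ..." lines shown above.
BIND_RE = re.compile(
    r"unable to bind to (?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)/\d+ "
    r"on any port in range (?P<lo>\d+)-(?P<hi>\d+): "
    r"\((?P<errno>\d+)\) (?P<msg>.+)$"
)


def bind_failures(log_text):
    """Count bind failures per (ip, errno, message) found in OSD log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BIND_RE.search(line.rstrip())
        if match:
            key = (match["ip"], int(match["errno"]), match["msg"])
            counts[key] += 1
    return counts


# Three lines copied from the log above: two bind failures and one retry notice.
SAMPLE_LOG = """\
2019-05-29 18:35:28.085 7f25adf371c0 -1  Processor -- bind unable to bind to 10.244.2.33:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
2019-05-29 18:35:28.086 7f25adf371c0 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
2019-05-29 18:35:33.116 7f25adf371c0 -1  Processor -- bind unable to bind to 10.244.2.33:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
"""

if __name__ == "__main__":
    for (ip, err, msg), n in bind_failures(SAMPLE_LOG).items():
        print(f"{ip} errno={err} ({msg}) x{n}")
```

If the reported address differs from the IP the pod currently holds, that would at least narrow the problem to the OSD using a stale address rather than to the port range itself.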

jay-johnson commented on August 15, 2024

I never figured out the rook-related crashes in this issue. I ended up moving over to openebs on kubernetes and never looked back. I do not know if these old issues were caused by the k8s version, the redis version, my kernel (3.something at the time), or something else fun and exciting.

here's the repo:
https://github.com/openebs/openebs
