Comments (6)

portante commented on July 28, 2024

It seems we might be missing the spirit of what a readiness probe means for Elasticsearch. An OpenShift readiness probe reports whether a particular pod is ready to operate. It does not describe the readiness of the distributed service that the pod is part of.

The readiness probe for an Elasticsearch pod should only verify that the low-level services the pod needs are in place: disk space is available (/elasticsearch/persistent/logging-es/...), the configuration files are present and pass a sanity check (/usr/share/elasticsearch/elasticsearch/config/elasticsearch.yml), etc.
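A minimal sketch of such a pod-local check, written as a script that an exec readiness probe could run (exit status 0 means ready). The thread gives the data path only as `/elasticsearch/persistent/logging-es/...`, so `DATA_DIR` uses the known prefix; `DATA_DIR`, the 1 GiB threshold and the "readable, non-empty file" config test are hypothetical choices, not taken from the plugin or the logging images.

```python
import os
import shutil

# Hypothetical values: only the prefix of the data path is known from the thread.
DATA_DIR = "/elasticsearch/persistent"
CONFIG_FILE = "/usr/share/elasticsearch/elasticsearch/config/elasticsearch.yml"
MIN_FREE_BYTES = 1024 ** 3  # hypothetical threshold: 1 GiB


def disk_ok(path: str, min_free: int = MIN_FREE_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `min_free` bytes free."""
    try:
        return shutil.disk_usage(path).free >= min_free
    except OSError:
        # A missing mount or unreadable path means the pod is not ready.
        return False


def config_ok(path: str) -> bool:
    """Minimal sanity check: the config file exists, is readable and is non-empty."""
    return os.path.isfile(path) and os.access(path, os.R_OK) and os.path.getsize(path) > 0


def pod_ready(data_dir: str = DATA_DIR, config_file: str = CONFIG_FILE,
              min_free: int = MIN_FREE_BYTES) -> bool:
    """Pod-local readiness only: says nothing about cluster membership."""
    return disk_ok(data_dir, min_free) and config_ok(config_file)


def main() -> int:
    """Exit status for the probe: 0 = ready, 1 = not ready."""
    return 0 if pod_ready() else 1
```

In a probe script this would be wired up as `sys.exit(main())`. Note that nothing here contacts Elasticsearch itself, which is the point of the suggestion: the pod can be "ready" before the cluster has formed.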

from elasticsearch-cloud-kubernetes.

jcantrill commented on July 28, 2024

@jimmidyson Two questions related to this issue, for which we have a PR:

  1. Can we get a 2.4.4.x branch to which we can merge a fix?
  2. Do you have an aversion to using pod labels for service discovery?

@portante I agree we should pursue a better probe, but in the context of the referenced issue, it appears @wozniakjan's PR would resolve our short-term concerns.

jimmidyson commented on July 28, 2024

I'm slightly surprised this hasn't been reported before by other users, of which there seem to be quite a few. Is this something we can consistently reproduce? I am aware we have no test suite to prove the problem and validate any fixes, and I would be very grateful to anyone who provides one so we know this is really fixed.

Can we get a 2.4.4.x branch to which we merge a fix?

Can't you just use the existing 2.x branch?

Do you have an aversion to using pod labels for service discovery?

No aversion, but if Elasticsearch isn't actually started yet, discovering the pods this way, rather than waiting for them to be ready, is more likely to hit race conditions when joining the cluster. If that can be mitigated, go for it.
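To make the trade-off concrete, here is a sketch of the two discovery sources, operating on already-fetched Kubernetes API objects represented as Python dicts: the Endpoints object behind a Service, and a pod list. The field names (`subsets`, `addresses`, `notReadyAddresses`, `metadata.labels`, `status.podIP`) follow the Kubernetes API; the `component: es` label and the IPs are hypothetical, and this is not the plugin's code.

```python
from typing import Any, Dict, List


def ips_from_endpoints(endpoints: Dict[str, Any]) -> List[str]:
    """Discovery via a Service's Endpoints object: only addresses marked ready.

    Pods failing their readiness probe are listed under "notReadyAddresses"
    instead, so this view never includes them.
    """
    return [addr["ip"]
            for subset in endpoints.get("subsets") or []
            for addr in subset.get("addresses") or []]


def ips_from_pods(pod_list: Dict[str, Any], selector: Dict[str, str]) -> List[str]:
    """Discovery via a label selector on a pod list: every matching pod that
    already has an IP, whether or not Elasticsearch inside it is up yet."""
    ips = []
    for pod in pod_list.get("items") or []:
        labels = pod.get("metadata", {}).get("labels") or {}
        ip = pod.get("status", {}).get("podIP")
        if ip and all(labels.get(k) == v for k, v in selector.items()):
            ips.append(ip)
    return ips


# Two ES pods, only one of which has passed its readiness probe so far.
endpoints = {"subsets": [{"addresses": [{"ip": "10.1.0.2"}],
                          "notReadyAddresses": [{"ip": "10.1.0.3"}]}]}
pods = {"items": [
    {"metadata": {"labels": {"component": "es"}}, "status": {"podIP": "10.1.0.2"}},
    {"metadata": {"labels": {"component": "es"}}, "status": {"podIP": "10.1.0.3"}},
    {"metadata": {"labels": {"component": "kibana"}}, "status": {"podIP": "10.1.0.9"}},
]}
print(ips_from_endpoints(endpoints))             # ['10.1.0.2']
print(ips_from_pods(pods, {"component": "es"}))  # ['10.1.0.2', '10.1.0.3']
```

The extra address returned by label discovery is the case raised above: 10.1.0.3 may not be accepting connections yet, so a joining node has to tolerate and retry failed connections to it.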

wozniakjan commented on July 28, 2024

Is this something we can consistently reproduce?

Yes, it seems to happen every time we try to deploy an ES cluster with more than one node and a readiness probe that waits for ES to respond with 200. An ES node that has not yet successfully participated in master discovery responds with 503, so it never becomes ready, stays out of the service discovery, and therefore cannot participate in master discovery. Another solution could identify a working ES pod by something other than the 200 status code; we are currently not certain which would be better. A more detailed description is in the mentioned Bugzilla and PR.
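A toy model of this deadlock, assuming (as with Elasticsearch 2.x zen discovery and `discovery.zen.minimum_master_nodes` set to a majority) that a master can only be elected once a node sees a quorum of master-eligible nodes, counting itself. The model is deliberately simplified and is not the plugin's logic: service discovery sees only ready pods, label discovery sees all pods, and a pod becomes ready only after a master is elected.

```python
def all_become_ready(n_pods: int, quorum: int, discover_all: bool,
                     max_rounds: int = 10) -> bool:
    """Return True if every pod eventually passes its readiness probe.

    discover_all=False models discovery through the Service (ready pods only);
    discover_all=True models discovery by pod label (all pods).
    """
    ready = [False] * n_pods
    for _ in range(max_rounds):
        for i in range(n_pods):
            if ready[i]:
                continue
            # Peers this pod can see through the chosen discovery mechanism.
            peers = [j for j in range(n_pods)
                     if j != i and (discover_all or ready[j])]
            # Enough master-eligible nodes visible: a master is elected,
            # ES starts answering 200 and the readiness probe passes.
            if len(peers) + 1 >= quorum:
                ready[i] = True
        if all(ready):
            return True
    return False


for n in (1, 2, 3, 4):
    q = n // 2 + 1  # majority quorum
    print(n, all_become_ready(n, q, discover_all=False),
          all_become_ready(n, q, discover_all=True))
# 1 True True
# 2 False True
# 3 False True
# 4 False True
```

Under these assumptions a single node works either way, while any larger cluster can never bootstrap through the Service because no pod is ready until a quorum is visible, and no pod is visible until it is ready.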

Can we get a 2.4.4.x branch to which we merge a fix?

Can't you just use the existing 2.x branch?

My apologies, this confusion came from me: I thought the 2.4.4 tag and the 2.x branch differed, but since they are the same, we can use 2.x.

wozniakjan commented on July 28, 2024

As discussed during the bug scrub meeting, QA will provide more robust test cases. The only tests we did were:

  1. deploy the logging stack with 2 ES nodes (later also tried with 3 and 4 nodes)
  2. kill one of the ES pods to see whether the cluster recovers
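The two manual checks above can be scripted against the Elasticsearch cluster health API (`GET /_cluster/health`), whose response includes `status` and `number_of_nodes`. This is a sketch assuming a plain HTTP endpoint; `base_url` is hypothetical, and a real logging deployment will likely require TLS and client certificates that are not shown here.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_health(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET <base_url>/_cluster/health and decode the JSON body.

    Non-2xx answers (for example from a node that has not found a master)
    surface as urllib.error.HTTPError.
    """
    url = base_url.rstrip("/") + "/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def cluster_formed(health: Dict[str, Any], expected_nodes: int) -> bool:
    """True when every expected node has joined and the cluster is not red."""
    return (health.get("number_of_nodes") == expected_nodes
            and health.get("status") in ("green", "yellow"))
```

For step 1, `cluster_formed(fetch_health(base_url), 2)` should eventually become true. For step 2, delete one ES pod (for example with `oc delete pod <pod-name>`), wait for its replacement to start, and poll the same condition again.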

Step 1 already fails with the current setup from the master branch, where the readiness probe implementation resides. Attached are logs from discovery by service (as currently implemented) and discovery by label (proposed as a possible workaround).

Both steps succeed with label discovery.

Attachment:
logs.zip

wozniakjan commented on July 28, 2024

After discussion with @jcantrill, I created pull request #98.