Comments (10)

alexeyklyukin commented on August 28, 2024

It looks like the old master has advanced its WAL position past the promotion point of the new master. I think this can happen during a network outage, when new data has been written to the master's WAL but has not been propagated to the replicas (this is also possible in synchronous mode). You can configure Patroni to call pg_rewind in order to bring the former master up to date.
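
To illustrate the divergence described here, a minimal sketch follows. It assumes both WAL positions are available as PostgreSQL LSN strings of the form `high/low` in hexadecimal. The names `parse_lsn` and `has_diverged` are hypothetical helpers for illustration and are not part of Patroni.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def has_diverged(old_master_lsn: str, promotion_lsn: str) -> bool:
    """Return True if the old master wrote WAL past the new master's promotion point.

    Hypothetical check: in that situation the old master cannot simply follow
    the new master and needs pg_rewind (or a fresh copy) first.
    """
    return parse_lsn(old_master_lsn) > parse_lsn(promotion_lsn)


if __name__ == "__main__":
    # Example positions are made up for illustration.
    print(has_diverged("0/3000100", "0/3000060"))
    print(has_diverged("0/3000060", "0/3000060"))
```

If the check returns True, the former master holds WAL that the new timeline never saw, which is the case pg_rewind is meant to handle.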

from patroni.

katoquro commented on August 28, 2024

Yep, you are right. But this broken node can become a master if there are no competitors:

2015-11-18 12:05:50,027 INFO: Lock owner: node5; I am node5
2015-11-18 12:05:50,028 INFO: no action.  i am the leader with the lock
172.17.0.101 - - [18/Nov/2015 12:05:52] "OPTIONS / HTTP/1.0" 200 -

alexeyklyukin commented on August 28, 2024

Well, if all other nodes in the cluster have died, then promoting a single leftover node to a master is a sane thing to do, isn't it?

katoquro commented on August 28, 2024

I'm not sure, because this node can be outdated and can contain inconsistent data.
Such cases can corrupt logic on the clients, so it would be better to shut down such nodes.

alexeyklyukin commented on August 28, 2024

It's not Patroni's task to detect such 'broken' nodes. Your monitoring system should do it (based, for instance, on the replication lag), and shutting them down should be a human decision.
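
As a rough sketch of the kind of lag-based check a monitoring system might run, the following assumes the WAL positions of the primary and each replica have already been collected as LSN strings. The function names, the node names, and the 16 MiB default threshold are assumptions made for illustration. None of them come from Patroni.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN string ('high/low' in hex) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica is behind the primary (never negative)."""
    return max(0, lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn))


def unhealthy_replicas(primary_lsn, replica_lsns, max_lag_bytes=16 * 1024 * 1024):
    """Return the sorted names of replicas whose lag exceeds the threshold.

    replica_lsns maps a node name to its last known LSN. The threshold
    is an arbitrary illustrative default, not a recommended value.
    """
    return sorted(
        name
        for name, lsn in replica_lsns.items()
        if replication_lag_bytes(primary_lsn, lsn) > max_lag_bytes
    )


if __name__ == "__main__":
    # Made-up positions: node1 is 256 bytes behind, node2 is 64 MiB behind.
    replicas = {"node1": "0/4FFFF00", "node2": "0/1000000"}
    print(unhealthy_replicas("0/5000000", replicas))
```

The check only flags candidates. Whether to shut a flagged node down remains the human decision described above.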

drnic commented on August 28, 2024

I'd like it to be the task of Patroni to automate and make these decisions.

On Wed, Nov 18, 2015 at 6:05 AM, Oleksii Kliukin wrote:

It's not Patroni's task to detect such 'broken' nodes. Your monitoring system should do it (based, for instance, on the replication lag), and shutting them down should be a human decision.

drnic commented on August 28, 2024

Is it entirely out of scope for Patroni cells to self-administer this? Can the orchestration only be external?

On Wed, Nov 18, 2015 at 5:20 AM, katoquro wrote:

I'm not sure, because this node can be outdated and can contain inconsistent data.
Such cases can corrupt logic on the clients, so it would be better to shut down such nodes.

alexeyklyukin commented on August 28, 2024

You can use pg_rewind and avoid this problem altogether.

alexeyklyukin commented on August 28, 2024

There are other cases of a node that is unable to join the cluster (for instance, if the replication username/password is incorrect). It is not possible, and does not make much sense, for Patroni to detect every issue like this. It should be the task of the monitoring system to realize that some replicas are potentially unhealthy, and then of a human to fix them.

drnic commented on August 28, 2024

@alexeyklyukin thanks for pointing me to pg_rewind
