Comments (10)
It looks like the old master has advanced its WAL position past the promotion point of the new master. This can happen during a network outage, when new data has been written to the master's WAL but has not been propagated to the replicas (this is also possible in synchronous mode). You can configure Patroni to call pg_rewind in order to bring the former master up to date.
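To make the divergence condition concrete, here is a minimal sketch that compares two WAL positions. It assumes both are given in PostgreSQL's textual LSN format (two hexadecimal numbers separated by a slash); the function names `parse_lsn` and `has_diverged` are hypothetical and are not part of Patroni.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to an absolute byte position.

    The part before the slash is the high 32 bits, the part after it the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def has_diverged(old_master_lsn: str, promotion_lsn: str) -> bool:
    """Return True if the old master wrote WAL past the new master's promotion point.

    Hypothetical helper: in that case the old master cannot simply start
    streaming from the new master and needs pg_rewind or a fresh base backup.
    """
    return parse_lsn(old_master_lsn) > parse_lsn(promotion_lsn)
```

For example, `has_diverged("0/5000000", "0/4000028")` is true, because the old master's position lies beyond the promotion point.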
from patroni.
Yep, you are right. But this broken node can become a master if there are no competitors:
2015-11-18 12:05:50,027 INFO: Lock owner: node5; I am node5
2015-11-18 12:05:50,028 INFO: no action. i am the leader with the lock
172.17.0.101 - - [18/Nov/2015 12:05:52] "OPTIONS / HTTP/1.0" 200 -
Well, if all other nodes in the cluster have died, then promoting a single leftover node to a master is a sane thing to do, isn't it?
I'm not sure, because this node can be outdated and can contain inconsistent data.
Such cases can break client logic, so it would be better to shut down such nodes.
It's not Patroni's task to detect such 'broken' nodes. Your monitoring system should do it (based, for instance, on the replication lag), and it should be a human decision to shut them down.
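As an illustration of a lag-based check that a monitoring system might run, here is a minimal sketch. It assumes the monitoring system has already collected the current WAL position of the master and of each replica as textual LSNs; the names `lsn_to_bytes` and `lagging_replicas` and the threshold parameter are hypothetical.

```python
from typing import Dict, List


def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lagging_replicas(master_lsn: str,
                     replica_lsns: Dict[str, str],
                     max_lag_bytes: int) -> List[str]:
    """Return the names of replicas whose lag exceeds max_lag_bytes, sorted by name.

    The result is only meant to feed an alert; shutting a node down
    remains a decision for the operator.
    """
    master_pos = lsn_to_bytes(master_lsn)
    return sorted(
        name
        for name, lsn in replica_lsns.items()
        if master_pos - lsn_to_bytes(lsn) > max_lag_bytes
    )
```

With a 16 MiB threshold, `lagging_replicas("0/5000000", {"node3": "0/4FFFF00", "node5": "0/3000000"}, 16 * 1024 * 1024)` reports only `node5`, which is 32 MiB behind.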
I'd like it to be the task of Patroni to automate and make these decisions.
On Wed, Nov 18, 2015 at 6:05 AM, Oleksii Kliukin wrote:
> It's not Patroni's task to detect such 'broken' nodes. Your monitoring system should do it (based, for instance, on the replication lag), and it should be a human decision to shut them down.
Is it entirely out of scope for Patroni cells to self-administer this? Can the orchestration only be external?
On Wed, Nov 18, 2015 at 5:20 AM, katoquro wrote:
> I'm not sure, because this node can be outdated and can contain inconsistent data.
> Such cases can break client logic, so it would be better to shut down such nodes.
You can use pg_rewind and avoid this problem altogether.
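For reference, a manual pg_rewind run on the former master could be assembled roughly as follows. This is a sketch under assumptions: the data directory and connection string are placeholders, the helper name `build_pg_rewind_command` is hypothetical, and it only builds the command line rather than reproducing how Patroni invokes the tool. pg_rewind itself requires the target server to be shut down and either `wal_log_hints` or data checksums to be enabled.

```python
from typing import List


def build_pg_rewind_command(target_pgdata: str,
                            source_conninfo: str,
                            dry_run: bool = False) -> List[str]:
    """Build the argument list for a pg_rewind run on the former master.

    target_pgdata: data directory of the stopped former master.
    source_conninfo: libpq connection string pointing at the new master.
    dry_run: add --dry-run so that pg_rewind changes nothing on disk.
    """
    command = [
        "pg_rewind",
        "--target-pgdata", target_pgdata,
        "--source-server", source_conninfo,
    ]
    if dry_run:
        command.append("--dry-run")
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`; running with `dry_run=True` first is a cheap way to confirm that a rewind is possible.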
There are other cases in which a node is unable to join the cluster (for instance, if the replication username or password is incorrect). It is not possible, and would not make much sense, for Patroni to detect every issue like this. It should be the task of the monitoring system to recognize that some replicas are potentially unhealthy, and then a human should step in to fix them.
@alexeyklyukin thanks for pointing me to pg_rewind