Comments (5)
This is deliberate corruption of the state and not supported. If this happens you would need to recover from backup or use our warm standby premium feature.
from aeron.
Not sure I would call that corruption since the state fully exists in one node: it just needs to be replicated to the other nodes that don't have any state at all. It feels like not much is missing to make it work (but I don't know enough about the cluster internals to know for sure).
you would need to recover from backup
Actually, this problem occurred while recovering from backup, but it boils down to the test above.
The initial scenario was:
- start 3 nodes
- start one ClusterBackup node, replicating the state
- stop everything
- forget about the 3 initial nodes
- set up a new 3-node cluster where one node reuses the directories generated by ClusterBackup and 2 other nodes start empty
This scenario works only if the ingress log is never truncated.
An alternative would be to start 3 ClusterBackup nodes, but it seems overkill to replicate the state 3 times.
Is ClusterBackup actually a viable solution, or do only the premium features provide a reliable backup?
Not sure I would call that corruption since the state fully exists in one node
This is where the example breaks down. Raft and other similar consensus algorithms that handle fail-stop type faults can only handle 1 failure in a 3 node cluster. You have a scenario where the 3 node cluster has 2 failed nodes. This is beyond what the algorithm has the ability to correctly and automatically recover from. Therefore you would need to fall back to manually fixing the system.
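The arithmetic behind that limit can be sketched as follows. This is a minimal illustration of majority-quorum counting only, not Aeron Cluster code; the class and method names (`QuorumMath`, `quorumSize`, `toleratedFailures`, `canReachConsensus`) are hypothetical.

```java
// Majority-quorum arithmetic for a Raft-style cluster.
// Illustrative sketch only: these names are hypothetical and not part of Aeron.
final class QuorumMath
{
    // Smallest number of members that forms a strict majority.
    static int quorumSize(final int memberCount)
    {
        return (memberCount / 2) + 1;
    }

    // Number of members that can fail while a majority remains.
    static int toleratedFailures(final int memberCount)
    {
        return memberCount - quorumSize(memberCount);
    }

    // True when the surviving members still form a majority.
    static boolean canReachConsensus(final int memberCount, final int failedCount)
    {
        return (memberCount - failedCount) >= quorumSize(memberCount);
    }

    public static void main(final String[] args)
    {
        final int members = 3;
        System.out.println(
            "members=" + members +
            " quorum=" + quorumSize(members) +
            " toleratedFailures=" + toleratedFailures(members));
        System.out.println("one failure ok: " + canReachConsensus(members, 1));
        System.out.println("two failures ok: " + canReachConsensus(members, 2));
    }
}
```

For 3 members the quorum is 2, so one failure leaves a majority and two failures do not.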
An alternative would be to start 3 ClusterBackup nodes, but it seems overkill to replicate the state 3 times.
The premium Cluster Standby would create 3 replicated copies of the data for the scenarios where the user wants to have another cluster that they can fail over to. It has some functionality to support daisy-chain style replication to reduce load on the primary cluster and potential WAN bandwidth consumption. However, we would not see having 3 replicated copies of the state as overkill.
Thank you for your detailed answer.
This is beyond what the algorithm has the ability to correctly and automatically recover from.
In my initial tests with ClusterBackup, this scenario worked perfectly (until I truncated the log). My mistake was then probably to think it was a supported use case. There is not much literature around ClusterBackup.
You have a scenario where the 3 node cluster has 2 failed nodes.
Actually, if I modify the test to have only one failed node (i.e. replace one true by false), it fails just the same.
Therefore you would need to fall back to manually fixing the system.
Yes, I could detect the absence of archive + cluster dirs in one node, wait a bit for other nodes to be ready to start, and automate the download of the state from another node before starting the cluster. Not trivial, but doable. Probably easier to update the fail over procedure so that data is copied manually into the extra nodes :)
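The detection step could look roughly like the sketch below. It only checks whether a node's archive and cluster directories are missing or empty, using the Java standard library; it does not call any Aeron API, and the names (`NodeStateCheck`, `isMissingOrEmpty`, `isEmptyNode`) and the choice to require both directories to be empty are assumptions made for illustration. Waiting for the other nodes and copying the state are left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical pre-start check: does this node have any local state at all?
// Not part of Aeron; the directory locations are supplied by the caller.
final class NodeStateCheck
{
    // True when the directory does not exist or contains no entries.
    static boolean isMissingOrEmpty(final Path dir)
    {
        if (!Files.isDirectory(dir))
        {
            return true;
        }

        try (Stream<Path> entries = Files.list(dir))
        {
            return !entries.findAny().isPresent();
        }
        catch (final IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
    }

    // Assumption: a node counts as "empty" only when both directories are missing or empty.
    static boolean isEmptyNode(final Path archiveDir, final Path clusterDir)
    {
        return isMissingOrEmpty(archiveDir) && isMissingOrEmpty(clusterDir);
    }

    public static void main(final String[] args)
    {
        if (args.length < 2)
        {
            System.out.println("usage: NodeStateCheck <archiveDir> <clusterDir>");
            return;
        }

        final boolean empty = isEmptyNode(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(empty ? "no local state: seed before starting" : "local state present");
    }
}
```

A start-up script could run this check first and only begin the copy from another node when it reports no local state.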
we would not see having 3 replicated copies of the state as overkill.
Agree. I only meant that transmitting the same data 3 times over the network would not be optimal.
The premium Cluster Standby sure looks interesting!
As @mikeb01 has pointed out, this goes beyond the Raft algorithm. The reason the purge causes issues is that the leader must be log complete under the spec. Also consider that without the purge the other nodes have to recover the whole log. For a long-running system this would not be practical as an alternative, even if it works in a simple test.
Snapshots are an optimisation, but again there is very little explanation of how they are implemented in the Raft paper or PhD thesis. Purging an old log needs to be coordinated, and you need to know the other nodes are up to date, so you cannot introduce more than one failure in a 3 node system, as Mike points out.
You can use a combination of Cluster Backup and some scripting to make this work. We have to make a living, so we provide commercial support and a premium offering that makes this much easier. Many open core offerings do not even provide basic replication or fault tolerance in the open offering. We think we have gone pretty far with what we offer openly, given the years of engineering effort that has gone into Aeron.