Comments (3)
Hi @mmarod, it's been a long time since I've looked at this, but I think that's true. I think the idea was that you probably wouldn't want to reconfigure into an already degraded cluster, as that might be a mistake in specifying the new configuration.
If you had additional unexpected failures during the reconfiguration, that could also get pretty messy. Theoretically only a bare majority of old servers and a bare majority of new servers needs to be available during a reconfiguration, but in practice you probably do want a bit more wiggle room to tolerate failures during reconfiguration.
In your example, if node1 isn't functioning and you want to reconfigure the cluster, you should probably remove node1 during that reconfiguration. You can replace it with a new node that is available, like node4. Of course, I can imagine wanting to have different behavior depending on your operational requirements. Are you just asking out of curiosity or are you actually using LogCabin for something? (The project isn't actively maintained these days.)
Since you linked to the dissertation, I should mention that LogCabin uses the joint consensus membership change algorithm described in section 4.3 there. Most of the rest of the concepts of the chapter do still apply or transfer over, but I just wanted to clarify.
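To make the bare-majority condition above concrete, here is a minimal Python sketch of the availability check for a joint consensus reconfiguration. It is not LogCabin's code. The function names and the node2/node3 server names are hypothetical, and the membership of the old and new configurations is assumed for illustration.

```python
def is_majority(available, config):
    """True if strictly more than half of `config` is in `available`."""
    return len(config & available) * 2 > len(config)


def joint_quorum_available(old_config, new_config, available):
    """Joint consensus needs a majority of the old servers AND a majority
    of the new servers to be available at the same time."""
    return is_majority(available, old_config) and is_majority(available, new_config)


# Assumed scenario: node1 is down and is swapped out for node4.
old_config = {"node1", "node2", "node3"}
new_config = {"node2", "node3", "node4"}
available = {"node2", "node3", "node4"}

print(joint_quorum_available(old_config, new_config, available))
```

In this assumed scenario both majorities are bare (two of three old servers, and the new servers with no spare beyond what is listed), so one more failure among node2 and node3 would stall the reconfiguration. That is the lack of wiggle room described above.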
> Additionally, it also looks like step 2 of the AddServer RPC is not being enforced. The routine is checking that all of _newServers are caught up but not the candidate specifically. This means that if we simply changed the check to be a quorum of staging servers, it would not guarantee that the new server is caught up.
I'm a little confused because the AddServer RPC is described in the dissertation for the single-server membership change algorithm (not the joint consensus algorithm). What is "the candidate" in your question? A "quorum of staging servers" also seems a little sloppy (at least in an imaginary world where servers can be staging for various reasons). You probably mean a quorum of the new servers (a majority of the servers in the new target configuration).
Perhaps you'd want to change the check so that all new servers are up, but if a server was already part of the cluster, it doesn't need to be caught up. I'm not convinced this is better than the current approach, though (at least without a real-world use case).
from logcabin.
First off, thanks for the quick and thorough response to a question on an "unmaintained" project!
Taking the example -- I think you are right that it would make sense to remove node1 when node4 is added. My company's software that uses LogCabin only exposes initial bootstrap, add, and remove APIs. Perhaps it should also have a "replace" API, which would seemingly get around this issue, or it could do a remove and then an add. The problem is that the software was initially designed with the assumption that if a member goes down, it will come back up at some point (i.e., node1 will eventually come back online). This assumption does not necessarily hold in a cloud environment, however, as node1 could be gone forever (depending on the implementation, of course) with node4 coming up as a replacement. So, when node1 goes down forever, adding node4 becomes impossible without removing node1 first.
> A "quorum of staging servers" also seems a little sloppy (at least in an imaginary world where servers can be staging for various reasons).
Indeed -- I meant new servers specifically here.
> Perhaps you'd want to change the check so that all new servers are up, but if a server was already part of the cluster, it doesn't need to be caught up. I'm not convinced this is better than the current approach, though (at least without a real-world use case).
In our specific case, node1 is never coming back so this wouldn't work. The workaround I was able to come up with checks:
- That every server in the new servers, but not in the old servers, is caught up.
- That a majority of the new servers are caught up and online.
I also verified that if I brought node1 back online it caught up and membership was accurate.
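The two conditions of the workaround above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual patch: `workaround_check` and its parameters are hypothetical names, `caught_up` is assumed to contain only servers that are both online and caught up, and the node2/node3 names and configuration membership are made up for the example.

```python
def workaround_check(old_servers, new_servers, caught_up):
    """Return True if the reconfiguration may proceed under the workaround.

    `caught_up` is assumed to hold servers that are online AND caught up.
    """
    # Condition 1: every server that is new to the cluster is caught up.
    added = new_servers - old_servers
    if not added <= caught_up:
        return False
    # Condition 2: a strict majority of the new configuration is caught up.
    return len(new_servers & caught_up) * 2 > len(new_servers)


# Assumed scenario: node1 is gone for good and node4 is being added.
old_servers = {"node1", "node2", "node3"}
new_servers = {"node1", "node2", "node3", "node4"}

print(workaround_check(old_servers, new_servers, {"node2", "node3", "node4"}))
print(workaround_check(old_servers, new_servers, {"node2", "node3"}))
```

The first call passes both conditions even though node1 is offline. The second fails condition 1 because node4, the only added server, is not caught up.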
Going to close this out -- thanks for the help.