jgroups-extras / jgroups-raft
Implementation of the RAFT consensus protocol in JGroups
Home Page: https://jgroups-extras.github.io/jgroups-raft/
License: Apache License 2.0
This is related to #24.
When a member P joins for the first time, or is way out of sync, the leader needs to transfer the snapshot to P.
The leader determines that P is out of sync if an AppendEntries message to P (e.g. with index=95) gets a negative result with index=0. If the leader's first index is 40, then the leader will never be able to provide entries 1-39 to P.
If the leader has a snapshot, e.g. due to log compaction (see #7), then that snapshot is used. Otherwise, a new snapshot is generated.
Next, the snapshot is sent to P, which applies the snapshot to its state machine and sets last_applied to the index shipped with the snapshot. The log is then created with that index+1.
Following AppendEntries will get P's log up-to-date with the leader using the default mechanisms.
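Below is a minimal sketch of the leader-side decision described above. It assumes the leader knows the first index still present in its own log. CatchUpPlanner, plan() and logStartAfterSnapshot() are hypothetical names, not jgroups-raft API:

```java
/** Hypothetical helper illustrating when a leader must ship a snapshot instead of log entries. */
final class CatchUpPlanner {
    enum Action { SEND_ENTRIES, SEND_SNAPSHOT }

    // followerNextIndex: first index the follower is missing (1 for a brand-new member)
    // leaderFirstIndex:  first index still present in the leader's log after compaction
    static Action plan(long followerNextIndex, long leaderFirstIndex) {
        // Entries below leaderFirstIndex only survive inside the snapshot
        return followerNextIndex < leaderFirstIndex ? Action.SEND_SNAPSHOT : Action.SEND_ENTRIES;
    }

    // After installing a snapshot taken at snapshotIndex, the follower's log restarts at snapshotIndex+1
    static long logStartAfterSnapshot(long snapshotIndex) {
        return snapshotIndex + 1;
    }
}
```

Take the numbers from the text: P has nothing and the leader's first index is 40. Then plan(1, 40) returns SEND_SNAPSHOT. A follower that merely missed a few updates while it was down would get SEND_ENTRIES, which avoids a costly snapshot transfer.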
Currently, read requests can either be dirty (handled locally) or be added to the log as normal entries. In the latter case, there's a write to disk, which is costly.
Implement a third solution that provides linearizable reads as described in section 6.4 of [1].
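The read-index approach from section 6.4 of the Raft dissertation works in three steps:

1. The leader records its commit index when a read arrives.
2. It confirms it is still leader through a round of heartbeats acked by a majority.
3. It serves the read once its state machine has applied up to the recorded index.

A minimal sketch follows. The class and method names are hypothetical, and members are modeled as strings. The sketch assumes the dissertation's extra precondition already holds: the leader has committed an entry in its current term.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a single linearizable read using the read-index scheme. */
final class ReadIndexRead {
    private final long readIndex;                 // leader's commit index when the read arrived
    private final int majority;
    private final Set<String> acks = new HashSet<>();

    ReadIndexRead(long commitIndex, int majority, String self) {
        this.readIndex = commitIndex;
        this.majority = majority;
        acks.add(self);                           // the leader counts itself
    }

    /** Records a heartbeat ack; duplicate acks from the same member are ignored. */
    void ack(String member) { acks.add(member); }

    /** Serve only once leadership is confirmed and the state machine has caught up. */
    boolean canServe(long lastApplied) {
        return acks.size() >= majority && lastApplied >= readIndex;
    }
}
```

No log entry is written, so the disk write mentioned above is avoided.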
When creating a ForkChannel and adding RAFT over it, there's an NPE on RAFT.init(). This requires a change in JGroups, see [1] for details. The code below doesn't work:
[1] https://issues.jboss.org/browse/JGRP-1926
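import java.util.Arrays;

import org.jgroups.JChannel;
import org.jgroups.fork.ForkChannel;
import org.jgroups.protocols.raft.CLIENT;
import org.jgroups.protocols.raft.ELECTION;
import org.jgroups.protocols.raft.RAFT;
import org.jgroups.protocols.raft.REDIRECT;

public class bla {
    protected JChannel ch, fork_ch;

    protected void start(String name) throws Exception {
        ch=new JChannel("/home/bela/udp.xml").name(name);
        // fork stack "singleton" / fork channel "fc1" on top of ch, containing the RAFT protocols
        fork_ch=new ForkChannel(ch, "singleton", "fc1",
                                new ELECTION(),
                                new RAFT().members(Arrays.asList("A", "B", "C")).raftId(name),
                                new REDIRECT(),
                                new CLIENT());
        fork_ch.connect("ignored");
        ch.connect("demo");
    }

    public static void main(String[] args) throws Exception {
        new bla().start(args[0]);
    }
}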
Exception in thread "main" java.lang.NullPointerException
at org.jgroups.protocols.raft.RAFT.createLogName(RAFT.java:955)
at org.jgroups.protocols.raft.RAFT.start(RAFT.java:430)
at org.jgroups.stack.ProtocolStack.startStack(ProtocolStack.java:965)
at org.jgroups.JChannel.startStack(JChannel.java:890)
at org.jgroups.JChannel._preConnect(JChannel.java:553)
at org.jgroups.JChannel.connect(JChannel.java:288)
at org.jgroups.JChannel.connect(JChannel.java:279)
at org.jgroups.raft.demos.ReplicatedStateMachineDemo.start(ReplicatedStateMachineDemo.java:30)
at org.jgroups.raft.demos.ReplicatedStateMachineDemo.main(ReplicatedStateMachineDemo.java:184)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Currently, AppendEntries are sent by the leader and committed at the leader when a majority of votes has been received.
However, entries are not committed at the followers.
Todos:
Suppose the leader's RequestTable holds 2 pending requests (3-4), with last_applied=4 and commit_index=2. If the leader crashes and is restarted (or another member becomes leader), the RequestTable needs to be populated with the pending requests.
If this is not done, the new leader will start trying to commit pending entries 3 and 4. When the AppendEntries responses are received, the RequestTable is empty, so entries 3 and 4 will never be committed.
Solution: when becoming leader, populate the RequestTable from the log by adding the entries in the range [commit_index+1 .. last_applied].
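Below is a minimal sketch of repopulating the table on leader change. It assumes, as in the example above, that last_applied denotes the last index appended to the log. RequestTableSketch is a hypothetical stand-in for RequestTable, with members modeled as strings:

```java
import java.util.*;

/** Hypothetical stand-in for RequestTable: index -> members that have acked that index. */
final class RequestTableSketch {
    private final int majority;
    private final SortedMap<Long, Set<String>> acks = new TreeMap<>();

    RequestTableSketch(int majority) { this.majority = majority; }

    /** On becoming leader: re-create an entry for every appended-but-uncommitted index. */
    void populate(long commitIndex, long lastAppended, String self) {
        for (long i = commitIndex + 1; i <= lastAppended; i++)
            acks.computeIfAbsent(i, k -> new HashSet<>()).add(self); // the leader has it in its own log
    }

    /** Returns true once the index has been acked by a majority and can be committed. */
    boolean ack(long index, String member) {
        Set<String> s = acks.get(index);
        if (s == null)
            return false;          // without populate(), responses for 3 and 4 end up here
        s.add(member);
        return s.size() >= majority;
    }

    Set<Long> pending() { return acks.keySet(); }
}
```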
Complete CLIENT
[ivy:retrieve] nexus-snapshots: unable to get resource for org/apache/logging/log4j#log4j;2.6.2: res=${nexus.snapshots.url}/org/apache/logging/log4j/log4j/2.6.2/log4j-2.6.2.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/apache/logging/log4j/log4j/2.6.2/log4j-2.6.2.pom
[ivy:retrieve] :: org.mapdb#mapdb;1.0.8: several problems occurred while resolving dependency: org.mapdb#mapdb;1.0.8 {=[]}:
[ivy:retrieve] several problems occurred while resolving dependency: org.sonatype.oss#oss-parent;7 {}:
[ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;7: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/7/oss-parent-7.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/7/oss-parent-7.jar
[ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;7: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/7/oss-parent-7.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/7/oss-parent-7.pom
[ivy:retrieve] nexus-snapshots: unable to get resource for org/mapdb#mapdb;1.0.8: res=${nexus.snapshots.url}/org/mapdb/mapdb/1.0.8/mapdb-1.0.8.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/mapdb/mapdb/1.0.8/mapdb-1.0.8.pom
[ivy:retrieve] :: org.fusesource.leveldbjni#leveldbjni-all;1.8: several problems occurred while resolving dependency: org.fusesource.leveldbjni#leveldbjni-all;1.8 {=[]}:
[ivy:retrieve] several problems occurred while resolving dependency: org.fusesource.leveldbjni#leveldbjni-project;1.8 {}:
[ivy:retrieve] several problems occurred while resolving dependency: org.fusesource#fusesource-pom;1.9 {}:
[ivy:retrieve] nexus-snapshots: unable to get resource for org/fusesource#fusesource-pom;1.9: res=${nexus.snapshots.url}/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.jar
[ivy:retrieve] nexus-snapshots: unable to get resource for org/fusesource#fusesource-pom;1.9: res=${nexus.snapshots.url}/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.pom
[ivy:retrieve] nexus-snapshots: unable to get resource for org/fusesource/leveldbjni#leveldbjni-project;1.8: res=${nexus.snapshots.url}/org/fusesource/leveldbjni/leveldbjni-project/1.8/leveldbjni-project-1.8.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/fusesource/leveldbjni/leveldbjni-project/1.8/leveldbjni-project-1.8.pom
[ivy:retrieve] nexus-snapshots: unable to get resource for org/fusesource/leveldbjni#leveldbjni-all;1.8: res=${nexus.snapshots.url}/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.pom
[ivy:retrieve] :: org.testng#testng;6.8.+: several problems occurred while resolving dependency: org.sonatype.oss#oss-parent;3 {}:
[ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;3: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.jar
[ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;3: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.pom
[ivy:retrieve] :: com.beust#jcommander;1.+: several problems occurred while resolving dependency: org.sonatype.oss#oss-parent;3 {}:
[ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;3: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.jar
[ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;3: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.pom
[ivy:retrieve] :: commons-io#commons-io;2.4: several problems occurred while resolving dependency: commons-io#commons-io;2.4 {=[]}:
[ivy:retrieve] several problems occurred while resolving dependency: org.apache.commons#commons-parent;25 {}:
[ivy:retrieve] several problems occurred while resolving dependency: org.apache#apache;9 {}:
[ivy:retrieve] nexus-snapshots: unable to get resource for org/apache#apache;9: res=${nexus.snapshots.url}/org/apache/apache/9/apache-9.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/apache/apache/9/apache-9.jar
[ivy:retrieve] nexus-snapshots: unable to get resource for org/apache#apache;9: res=${nexus.snapshots.url}/org/apache/apache/9/apache-9.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/apache/apache/9/apache-9.pom
[ivy:retrieve] nexus-snapshots: unable to get resource for org/apache/commons#commons-parent;25: res=${nexus.snapshots.url}/org/apache/commons/commons-parent/25/commons-parent-25.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/apache/commons/commons-parent/25/commons-parent-25.pom
[ivy:retrieve] nexus-snapshots: unable to get resource for commons-io#commons-io;2.4: res=${nexus.snapshots.url}/commons-io/commons-io/2.4/commons-io-2.4.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/commons-io/commons-io/2.4/commons-io-2.4.pom
[ivy:retrieve] ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:retrieve]
The counter init value is never used, because CounterService.getOrCreateCounter() calls get() to find out whether a value already exists for the given counter and, if so, uses that value. However, get() always returns a value: if the counter doesn't exist, it returns 0, so the init value is never used.
Example from CounterServiceDemo, which initializes the counter to 1 (Counter counter=counter_service.getOrCreateCounter("counter", 1);):
-- view: [A(raft-id=A)|0] (1) [A(raft-id=A)]
[1] Increment [2] Decrement [3] Compare and set [4] Dump log
[8] Snapshot [9] Increment N times [x] Exit
first-applied=0, last-applied=0, commit-index=0, log size=0b
-- view: [A(raft-id=A)|1] (2) [A(raft-id=A), C(raft-id=C)]
-- changed role to Candidate
-- changed role to Leader
-- view: [A(raft-id=A)|2] (3) [A(raft-id=A), C(raft-id=C), B(raft-id=B)]
4
index (term): command
---------------------
[1] Increment [2] Decrement [3] Compare and set [4] Dump log
[8] Snapshot [9] Increment N times [x] Exit
first-applied=0, last-applied=0, commit-index=0, log size=0b
3
expected value: 1
update: 5
failed setting counter "counter" from 1 to 5, current value is 0
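One way to make the init value effective is to let the lookup distinguish "no such counter" from "counter is 0". The sketch below is hypothetical: CounterStore is not the jgroups-raft CounterService, and it models the replicated hashmap as a plain local HashMap.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical local model of the counter state; not the jgroups-raft CounterService. */
final class CounterStore {
    private final Map<String, Long> counters = new HashMap<>();

    /** Returns null if the counter does not exist, so "absent" and "0" are distinguishable. */
    Long get(String name) { return counters.get(name); }

    /** Uses initialValue only when the counter does not exist yet. */
    long getOrCreate(String name, long initialValue) {
        return counters.computeIfAbsent(name, k -> initialValue);
    }

    boolean compareAndSet(String name, long expected, long update) {
        Long current = counters.get(name);
        if (current == null || current != expected)
            return false;
        counters.put(name, update);
        return true;
    }
}
```

With this, getOrCreate("counter", 1) followed by a compare-and-set from 1 to 5 succeeds. In the demo output above, the same operation failed against a current value of 0.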
Currently, applications have to retrieve a reference to the RAFT and CLIENT protocols, which is cumbersome, as users should not have to deal with the JGroups protocol stack.
Goal: provide a new class RaftHandle which sits on top of a channel and has methods such as:
- settable raft-id
- RoleChange listeners
In a nutshell: let this new class handle protocol stack interactions and shield the users from the stack.
All building blocks (e.g. CounterService and ReplicatedStateMachine) should also use this class.
This class could be created on regular channels or on fork channels.
The name is yet TBD.
Fix the bug found by Diego et al. in his dissertation. The discussion and proposed fix are at [1].
[1] https://mail.google.com/mail/u/0/?tab=wm#inbox/14e7654a1dfc36e9
Currently, votes are granted without checking the candidate's log index/term. This needs to be changed to conform to 5.4.1 of the RAFT paper.
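Section 5.4.1 of the Raft paper defines "more up-to-date" in two steps. First, compare the terms of the last log entries. Only if those terms are equal, compare the last log indices. A voter grants its vote only if the candidate's log is at least as up-to-date as its own. A minimal sketch (VoteCheck is a hypothetical name):

```java
/** Hypothetical helper implementing the up-to-date comparison from section 5.4.1. */
final class VoteCheck {
    static boolean candidateUpToDate(long candLastTerm, long candLastIndex,
                                     long myLastTerm, long myLastIndex) {
        if (candLastTerm != myLastTerm)
            return candLastTerm > myLastTerm;   // the later last term wins, regardless of length
        return candLastIndex >= myLastIndex;    // same last term: the longer (or equal) log wins
    }
}
```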
Update jgroups-raft to compile against the current JGroups 4.0.5-SNAPSHOT.
It does not compile with JGroups 4, as JChannel no longer extends Channel.
Add RAFT.max_log_size, which defines the max number of bytes in a log. If exceeded, a snapshot is created.
public synchronized boolean updateTermAndLeader(int term, Address new_leader) {
    if(leader == null || (new_leader != null && !leader.equals(new_leader)))
        leader=new_leader;
    if(term > current_term) {
        current_term=term;
        return true;
    }
    return false;
}
Maybe leader == null?
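Below is a possible term-guarded variant. It is sketched under two assumptions: leader information carried by a message from an older term is ignored, and within one term an already known leader is not overwritten (Raft allows at most one leader per term). TermState is hypothetical, and String stands in for Address:

```java
/** Hypothetical term-guarded version of updateTermAndLeader(); String stands in for Address. */
final class TermState {
    private int currentTerm;
    private String leader;

    /** Returns true if the term advanced. */
    synchronized boolean updateTermAndLeader(int term, String newLeader) {
        if (term < currentTerm)
            return false;                  // stale message: touch neither term nor leader
        boolean advanced = term > currentTerm;
        if (advanced) {
            currentTerm = term;
            leader = newLeader;            // new term: the old leader is no longer valid (may be null)
        } else if (leader == null && newLeader != null) {
            leader = newLeader;            // same term: only fill in a leader that was unknown
        }
        return advanced;
    }

    synchronized String leader() { return leader; }
}
```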
Implement CounterService and CounterServiceDemo in jgroups-raft. All counters and their values are essentially stored in a hashmap, so updates to counters simply update the hashmap.
When fewer than N/2+1 servers are running, the service won't be able to make progress (forsaking availability), but it handles network partitions very well.
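The majority is N/2+1 with integer division, e.g. 2 of 3 or 3 of 4 servers. A small sketch (the Majority class is hypothetical):

```java
/** Hypothetical helper for majority arithmetic. */
final class Majority {
    /** Smallest number of servers that forms a majority of clusterSize (integer division). */
    static int majority(int clusterSize) { return clusterSize / 2 + 1; }

    /** The service can only commit (make progress) while a majority is running. */
    static boolean canMakeProgress(int running, int clusterSize) {
        return running >= majority(clusterSize);
    }
}
```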
If we have A.last_applied=1 and A.commit_index=0, then the 1 entry that was added to A's log won't get replicated to a newly started B.
To reproduce:
{A,B,C}
14037 [ERROR] RAFT: A: resending of 0 failed; entry not found
-> RAFT.resend() with index=0 doesn't work (the log starts at 1) !!
Related to #68: when developing locally, it's common to test with a single-node cluster. It seems the commit index is never incremented; perhaps I misconfigured something? Looking through the code, though, it seems to behave as designed: the commit index is only incremented when AppendEntries responses are received, and the leader obviously doesn't send AppendEntries to itself. I think this could be solved by including the leader's own index when calculating the quorum.
Did I miss something or is this the intended behaviour?
EDIT: as a side note, using the leader's own match index when computing quorum has a nice side effect of allowing the append to the local log to be asynchronous, i.e. we can start replicating before the entry has been appended to the local log (this is described in section 10.2.1 in the original thesis)
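Below is a sketch of the suggestion above. It assumes one match index per member, including the leader's own last log index. The highest index stored on a majority is then the majority-th largest match index, so a single-node cluster commits as soon as the leader appends locally. CommitIndexCalc is hypothetical. A real implementation must also restrict this rule to entries from the leader's current term (section 5.4.2 of the Raft paper).

```java
import java.util.Arrays;

/** Hypothetical helper: quorum index computed from all match indexes, the leader's own included. */
final class CommitIndexCalc {
    static long quorumIndex(long[] matchIndex) {
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted);                         // ascending
        int majority = sorted.length / 2 + 1;
        // The majority-th largest value is stored on at least 'majority' members
        return sorted[sorted.length - majority];
    }
}
```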
ELECTION protocol searches exactly for RAFT class : https://github.com/belaban/jgroups-raft/blob/bf5bca0440a68581bb91c6355545fc0dd17a1d13/src/org/jgroups/protocols/raft/ELECTION.java#L354 unlike the findProtocol more general method : https://github.com/belaban/JGroups/blob/1763ee1e06d2f71ee6ebdc8ab646ea4cc3b0b7f1/src/org/jgroups/stack/ProtocolStack.java#L751
Make sure ELECTION can find subclasses of RAFT.
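Class.isInstance() shows the difference between an exact class check and one that accepts subclasses. The sketch scans a plain list standing in for the protocol stack. ProtocolLookup is a hypothetical name, not the JGroups ProtocolStack API:

```java
import java.util.List;

/** Hypothetical lookup that, unlike an exact getClass() comparison, also matches subclasses. */
final class ProtocolLookup {
    static <T> T findProtocol(List<?> stack, Class<T> type) {
        for (Object p : stack)
            if (type.isInstance(p))        // true for type itself and for any subclass of it
                return type.cast(p);
        return null;
    }
}
```

With this style of lookup, a subclass of RAFT placed in the stack is found when searching for RAFT.class.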
Hi,
while trying to make a single-node Raft quorum work, I ended up in a loop where the single node A kept trying to perform an election that never got a response.
Here is the configuration:
<config xmlns="urn:org:jgroups"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
<TCP
bind_addr="127.0.0.1"
bind_port="7800"
port_range="2"
enable_diagnostics="true"
/>
<TCPPING initial_hosts="127.0.0.1[7800]"
port_range="2"
max_dynamic_hosts="3"
async_discovery="true"
/>
<MERGE3 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD_ALL/>
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK2 xmit_interval="500"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="2000"
xmit_table_max_compaction_time="30000"
use_mcast_xmit="false"
discard_delivered_msgs="true"/>
<UNICAST3 xmit_interval="500"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="2000"
xmit_table_max_compaction_time="60000"
conn_expiry_timeout="0"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="4M"/>
<raft.NO_DUPES/>
<pbcast.GMS print_local_addr="true" join_timeout="2000"/>
<!--<UFC max_credits="2M"-->
<!--min_threshold="0.4"/>-->
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
<RSVP resend_interval="2000" timeout="10000"/>
<pbcast.STATE_TRANSFER />
<raft.ELECTION election_min_interval="100" election_max_interval="500"/>
<raft.RAFT members="A" raft_id="A" resend_interval="1000" />
<raft.REDIRECT/>
<raft.CLIENT bind_addr="0.0.0.0" />
</config>
Listening to the role change event, I only get
changed role to Candidate
Is this expected?
These messages don't need to be retransmitted, as they're sent periodically anyway. A missing vote request or response causes no harm; at worst, it delays an election.
This means that protocols such as NAKACK2 or UNICAST3 can be bypassed.
Log compaction dumps the contents of the state machine into a snapshot file and then truncates the log.
This is done as follows:
Log compaction can be triggered when the log size exceeds a certain configured threshold, or manually (e.g. via JMX). Also, when a leader needs to send a snapshot to a member, and the snapshot doesn't exist, one will be created.
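Below is a minimal in-memory sketch of compaction. It assumes compaction has three steps:

1. Check a size threshold, such as the proposed RAFT.max_log_size.
2. Store the serialized state machine together with the last applied index.
3. Drop all log entries covered by that snapshot.

CompactingLog is hypothetical and not the jgroups-raft Log API:

```java
import java.util.*;

/** Hypothetical in-memory log with snapshot-based compaction. */
final class CompactingLog {
    private final TreeMap<Long, byte[]> entries = new TreeMap<>();
    private final long maxLogSize;         // threshold in bytes, analogous to the proposed max_log_size
    private long sizeInBytes;
    private byte[] snapshot;               // serialized state machine
    private long snapshotIndex;            // last index covered by the snapshot

    CompactingLog(long maxLogSize) { this.maxLogSize = maxLogSize; }

    void append(long index, byte[] cmd) {
        entries.put(index, cmd);
        sizeInBytes += cmd.length;
    }

    boolean needsCompaction() { return sizeInBytes > maxLogSize; }

    /** Store the state machine dump taken at lastApplied, then drop every entry it covers. */
    void compact(long lastApplied, byte[] stateMachineDump) {
        snapshot = stateMachineDump;
        snapshotIndex = lastApplied;
        SortedMap<Long, byte[]> covered = entries.headMap(lastApplied, true);
        for (byte[] e : covered.values())
            sizeInBytes -= e.length;
        covered.clear();                   // clearing the view removes the entries from the log
    }

    long firstIndex() { return entries.isEmpty() ? snapshotIndex + 1 : entries.firstKey(); }
    long snapshotIndex() { return snapshotIndex; }
    byte[] snapshot() { return snapshot; }
    long sizeInBytes() { return sizeInBytes; }
}
```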
When we have {A,B,C}, kill C and make a few more updates (e.g. using ReplicatedStateMachineDemo), then after restarting C, A tries to transfer a snapshot to C.
This is costly and (apparently) doesn't work.
Instead, A should simply send the missing log entries to C.
The same problem occurs when we remove C's log: snapshot installation doesn't work.
This is probably a regression caused by adding snapshot installation.
Raft [1] duplicates some of the JGroups functionality: the goal of this issue is to remove that duplication and use JGroups wherever possible, e.g. for heartbeats and election kickoffs.
Also, cluster membership changes as described in ch. 4 of [1] should be simple to implement with this change.
Examples:
- when the view size drops below the majority, the leader steps down
- when a view contains a majority of members, start the election process; when a leader has been chosen, stop the election timer
Advantages:
majority
Details are in [2].
[1] https://github.com/ongardie/dissertation
[2] https://github.com/belaban/jgroups-raft/blob/master/doc/design/Election.txt
Use GitHub Pages to create a simple web site; mainly used to host the manual and point to the discussion list.
Tests RAFT.addServer()/removeServer()
Hi,
I found that a node can't change its role from candidate to follower. Digging into your code, I found that a node drops all messages with a smaller term than its own. But this seems inconsistent with the Raft paper, which says a candidate should change to follower when it receives an AppendEntries request, no matter whether the term in the message is bigger or smaller.
Thanks,
Qi
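Section 5.2 of the Raft paper covers two cases for a candidate that receives AppendEntries:

- If the leader's term is at least as large as the candidate's own, the candidate recognizes that leader and returns to follower.
- If the term is smaller, the candidate rejects the AppendEntries and stays a candidate.

A minimal sketch of that rule (CandidateRules is hypothetical):

```java
/** Hypothetical illustration of the candidate rule from section 5.2 of the Raft paper. */
final class CandidateRules {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    static Role onAppendEntries(Role role, int myTerm, int msgTerm) {
        if (msgTerm < myTerm)
            return role;                   // stale leader: reject the RPC, keep the current role
        return Role.FOLLOWER;              // msgTerm >= myTerm: a legitimate leader exists
    }
}
```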
When calling Log.truncate() with index=N, then closing the log and reopening it, N is unchanged.
LogTest.testTruncate2() shows the issue.
Test Leader Completeness property, e.g. fig 2.9 in [1]: node 3 cannot become leader.
protected boolean voteFor(final Address addr) {
    if(addr == null) {
        voted_for=null;
        return true;
    }
    if(voted_for == null) {
        voted_for=addr;
        return true;
    }
    return voted_for.equals(addr); // a vote for the same candidate in the same term is ok
}

protected void handleVoteRequest(Address sender, int term, int last_log_term, int last_log_index) {
    if(local_addr != null && local_addr.equals(sender))
        return;
    if(log.isTraceEnabled())
        log.trace("%s: received VoteRequest from %s: term=%d, my term=%d, last_log_term=%d, last_log_index=%d",
                  local_addr, sender, term, raft.currentTerm(), last_log_term, last_log_index);
    boolean send_vote_rsp=false;
    synchronized(this) {
        if(voteFor(sender)) {
            if(sameOrNewer(last_log_term, last_log_index))
                send_vote_rsp=true;
            else {
                log.trace("%s: dropped VoteRequest from %s as my log is more up-to-date", local_addr, sender);
            }
        }
        else
            log.trace("%s: already voted for %s in term %d; skipping vote", local_addr, sender, term);
    }
    if(send_vote_rsp)
        sendVoteResponse(sender, term); // raft.current_term);
}
here!!!!!
return voted_for.equals(addr); // a vote for the same candidate in the same term is ok
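voteFor() above only remembers the candidate, so it relies on voted_for being reset on every term change. Below is a sketch of an alternative that stores the term of the vote alongside the candidate, so that at most one vote is granted per term. VoteRecord is hypothetical, and String stands in for Address:

```java
/** Hypothetical per-term vote record; String stands in for Address. */
final class VoteRecord {
    private int votedTerm = -1;
    private String votedFor;

    /** Grants at most one vote per term; re-granting to the same candidate in that term is idempotent. */
    synchronized boolean voteFor(String candidate, int term) {
        if (term > votedTerm) {            // new term: any earlier vote no longer counts
            votedTerm = term;
            votedFor = candidate;
            return true;
        }
        return term == votedTerm && candidate.equals(votedFor);
    }
}
```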
Hi,
It doesn't work when I try to use TCP in jgroups-raft with the following settings:
<TCP bind_port="7800"
recv_buf_size="${tcp.recv_buf_size:130k}"
send_buf_size="${tcp.send_buf_size:130k}"
max_bundle_size="64K"
sock_conn_timeout="300"
thread_pool.min_threads="0"
thread_pool.max_threads="20"
thread_pool.keep_alive_time="30000"/>
<TCPPING async_discovery="true"
initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}"
port_range="2"/>
<MERGE3 max_interval="30000"
min_interval="10000"/>
<FD_SOCK/>
<FD_ALL/>
<VERIFY_SUSPECT timeout="1500" />
<BARRIER />
<pbcast.NAKACK2 xmit_interval="500"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="2000"
xmit_table_max_compaction_time="30000"
max_msg_batch_size="500"
use_mcast_xmit="false"
discard_delivered_msgs="true"/>
<UNICAST3 xmit_interval="500"
xmit_table_num_rows="100"
xmit_table_msgs_per_row="2000"
xmit_table_max_compaction_time="60000"
conn_expiry_timeout="0"
max_msg_batch_size="500"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
max_bytes="4M"/>
<raft.NO_DUPES/>
<pbcast.GMS print_local_addr="true" join_timeout="2000"
view_bundling="true"/>
<UFC max_credits="2M"
min_threshold="0.4"/>
<MFC max_credits="2M"
min_threshold="0.4"/>
<FRAG2 frag_size="60K" />
<RSVP resend_interval="2000" timeout="10000"/>
<pbcast.STATE_TRANSFER />
<raft.ELECTION election_min_interval="100" election_max_interval="500"/>
<raft.RAFT members="1,2" raft_id="${raft_id:undefined}" resend_interval="1000"/>
<raft.REDIRECT/>
<raft.CLIENT bind_addr="0.0.0.0" />
Thanks,
Qi
Change code to take advantage of JDK 8
Implementation of Log in memory only. Main benefits: speed for unit tests and no need to clean up files on disk after unit tests. It also avoids incorrect unit tests caused by inadvertently reusing leftover log files from previous runs or demos.
When running 2 (or more) instances of CounterServiceDemo and incrementing the same counter from 2 instances at the same time (e.g. press [9] and pick 1000 increments), both instances block.
A preliminary investigation showed that the issue is with REDIRECT.
This is re #21: currently probe.sh or JMX needs to be used to dynamically add or remove a server. However, also provide 2 commands which can be submitted by a client (e.g. via a script) to add / remove servers. The commands would try to contact a local server (fixed port) or be given an address:port to connect to.
This requires an additional protocol which listens for client commands and forwards them to the current leader, plus a simple client-server protocol.
Make sure the client-server protocol is generic enough to later be reused for generic client commands, e.g. get(), set() etc.
In some scenarios, we can end up with followers not getting the last commit_id, e.g. (A = leader, B = follower)
The resend task on A should resend the commit-id of 1000 to B, so B can update its commit-id from 999 to 1000 and apply the state change represented by update 1000 to its state machine.
Need to reproduce this first...
I may be missing something, so apologies if I'm reading your code wrong :-)
It appears that when handling a VoteRequest, the heartbeat timer is never reset upon voting for a candidate.
https://github.com/belaban/jgroups-raft/blob/master/src/org/jgroups/protocols/raft/ELECTION.java#L191-L211
Raft specifies that a follower resets its election timeout when it grants a vote to a candidate. This ensures that a follower doesn't vote for a candidate and then immediately timeout, transition to candidate itself, increment its term, vote for itself, and ultimately force a newly elected leader to step down.
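Below is a sketch of the rule cited above: the election timer is reset both on leader heartbeats and when granting a vote. ElectionTimer is hypothetical and takes explicit timestamps (in ms), so it can be exercised without real clocks:

```java
/** Hypothetical election timer driven by explicit timestamps in milliseconds. */
final class ElectionTimer {
    private final long timeoutMs;
    private long lastReset;

    ElectionTimer(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastReset = nowMs;
    }

    /** Call on every heartbeat / AppendEntries from the leader AND whenever a vote is granted. */
    void reset(long nowMs) { lastReset = nowMs; }

    /** When true, the follower would become a candidate and start an election. */
    boolean expired(long nowMs) { return nowMs - lastReset >= timeoutMs; }
}
```

If the follower does not call reset() when granting a vote, it can time out right after voting. It then starts its own election with a higher term and forces the newly elected leader to step down.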
The JGroups version in the current pom.xml is 3.6.0.Final. However, that version does not provide org.jgroups.util.Util.waitUntilAllChannelsHaveSameView, which is used in the current project. That method is only supported in version 4.0.0.Beta3, but that version is not compatible with other symbols used from JGroups.
If we have RAFT.members={A,B,C}, we currently do prevent a member D from joining. However, we don't prevent two members with the same raft-id from joining, e.g. 2 members each with raft-id=C.
This currently only triggers a warning, but ideally we'd like to prevent the second duplicate member C from starting.
This could be done by each new joiner checking its first view and closing the channel when it finds a dupe. This is not nice, as it may not necessarily terminate the application which started the channel.
Alternatively, we could create a new protocol DUPE_PREVENTION, which is a copy of AUTH and rejects all JOIN and MERGE requests which would add a duplicate member to the current view.
'Duplicate member' here means a member whose address (an ExtendedUUID) has a duplicate raft-id.
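The check such a DUPE_PREVENTION protocol would apply to each JOIN or MERGE can be sketched independently of JGroups. The sketch assumes the raft-ids of the current view are available as strings. DupeCheck and checkJoin() are hypothetical:

```java
import java.util.Collection;
import java.util.Set;

/** Hypothetical admission check for joiners, based on raft-ids only. */
final class DupeCheck {
    /** Returns the reason to reject the joiner, or null to accept it. */
    static String checkJoin(Set<String> allowedIds, Collection<String> idsInView, String joinerId) {
        if (!allowedIds.contains(joinerId))
            return "raft-id " + joinerId + " is not in RAFT.members";
        if (idsInView.contains(joinerId))
            return "raft-id " + joinerId + " is already used by a member of the view";
        return null;
    }
}
```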
56773 [DEBUG] RAFT: A: sending snapshot (17b) to B
57775 [DEBUG] RAFT: A: sending snapshot (17b) to B
58777 [DEBUG] RAFT: A: sending snapshot (17b) to B
59779 [DEBUG] RAFT: A: sending snapshot (17b) to B
...
To reproduce:
{A,B,C}
); they'll get the majority and (assume) A becomes leader
CounterServiceDemo
/tmp/B.log
)
gh-pages in jgroups-raft repo
Currently, election and heartbeating are in RAFT. Move them into separate protocols, so they can later be replaced with JGroups' own heartbeating.
Suppose we have cluster {A,B,C,D,E} with A as the leader, and the cluster then splits into {A,B} (A being the leader in term 2) and {C,D,E} (C being the leader in term 3). A could then block client requests for a long time, because it cannot commit them (no majority) until the network split heals.
However, C in term 3 is able to commit changes and would therefore not block client requests.
According to section 6.1 ("Leaders") of [1], a leader can realize that it can no longer get a majority. In JGroups, this can be done by having A check on each view change whether the view's size is still greater than or equal to the majority. If it is not, A steps down as leader.
Thus, when A gets view change A|2={A,B}, it should become a Follower. This would allow clients to possibly access the new leader C.
Followers could do the same: null their leader when they get a view whose size is smaller than the majority. This would eliminate the risk of followers redirecting clients to stale leaders.
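Below is a minimal sketch of the view-based rules proposed above. It assumes the majority is computed from the configured number of members. ViewRoleCheck and its methods are hypothetical names, not jgroups-raft API:

```java
/** Hypothetical view-change rules: leaders step down and followers drop the leader in a minority view. */
final class ViewRoleCheck {
    enum Role { FOLLOWER, CANDIDATE, LEADER }

    static int majority(int members) { return members / 2 + 1; }

    /** A leader whose view no longer contains a majority steps down. */
    static Role onViewChange(Role role, int viewSize, int members) {
        if (role == Role.LEADER && viewSize < majority(members))
            return Role.FOLLOWER;
        return role;
    }

    /** Followers in a minority view forget the (possibly stale) leader. */
    static String leaderAfterView(String leader, int viewSize, int members) {
        return viewSize < majority(members) ? null : leader;
    }
}
```

Take the 5-member example. View A|2={A,B} has size 2 < 3, so onViewChange(LEADER, 2, 5) returns FOLLOWER and leaderAfterView("A", 2, 5) returns null.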
We have members={A,B,C,D} (majority=3) and have members A (leader), B and C running (e.g. CounterServiceDemo).
To reproduce:
-> Probably B's vote was counted twice
-> Solution: maintain not just the number of votes but also who voted and discard duplicate votes
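Counting voters instead of votes makes duplicate responses harmless. A minimal sketch (VoteTally is hypothetical; voters are modeled as strings):

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical vote tally that records who voted, so duplicate votes are discarded. */
final class VoteTally {
    private final int majority;
    private final Set<String> voters = new HashSet<>();

    VoteTally(int majority, String self) {
        this.majority = majority;
        voters.add(self);                  // a candidate votes for itself
    }

    /** Records a vote; returns false if this voter was already counted. */
    boolean addVote(String voter) { return voters.add(voter); }

    boolean won() { return voters.size() >= majority; }
}
```

With members={A,B,C,D} (majority 3), a second vote from B is ignored, and A only wins once C votes as well. With a single member and majority 1, the candidate's own vote is already sufficient.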
Correlate AppendEntries requests and responses:
Currently, RAFT.majority defines the majority needed for elections and log commits. This requires the operator to always start the same set of servers. It is easy to start A,B,C, append and commit some changes to the log, then stop them, start D,E,F and make some other changes, overwriting the previous ones and violating Leader Completeness.
We need to define a static membership, e.g. {A,B,C}, in the config file and compute the majority from it (2). To change the membership, use AddServer or RemoveServer in running members, and edit the XML config.
Every member should be identified by a name (logical name), which also names the persistent log. When a member is started and its name is not in the above list, it should either become read-only (not participate in elections) or throw an exception.
Adding or removing a server would involve calling AddServer (or RemoveServer), which needs to be acked by a majority of the existing view, and then the config would need to be changed as well.
Provide a new building block ConsensusService, which can be used by applications to get consensus on a decision.
The jgroups-raft impl is somewhat tied to a state machine, and this block allows for it to be used in a different scenario.
Hi,
https://github.com/belaban/jgroups-raft/blob/432ad919edcb32915ae174b77a484e123ef065a0/src/org/jgroups/protocols/raft/InMemoryLog.java#L98 is off by one when adding a single log entry and the entries array is full.
The error is:
java.lang.ArrayIndexOutOfBoundsException: 16
at org.jgroups.protocols.raft.InMemoryLog.append(InMemoryLog.java:104)
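An ArrayIndexOutOfBoundsException: 16 means slot 16 of a 16-element array was written. A common fix is to grow the array before writing whenever it is full. Below is a minimal sketch of that pattern; GrowableLog is hypothetical and not the actual InMemoryLog code:

```java
import java.util.Arrays;

/** Hypothetical array-backed log showing the grow-before-write pattern. */
final class GrowableLog {
    private Object[] entries = new Object[16];
    private int size;

    void append(Object entry) {
        if (size == entries.length)                             // full: grow BEFORE writing
            entries = Arrays.copyOf(entries, entries.length * 2);
        entries[size++] = entry;
    }

    int size() { return size; }
    int capacity() { return entries.length; }
    Object get(int i) { return entries[i]; }
}
```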