jgroups-extras / jgroups-raft

Implementation of the RAFT consensus protocol in JGroups

Home Page: https://jgroups-extras.github.io/jgroups-raft/

License: Apache License 2.0

Languages: Java 99.15%, Shell 0.62%, Dockerfile 0.22%
Topics: consensus, distributed-systems, raft, raft-algorithm, raft-protocol

jgroups-raft's People

Contributors

belaban, chr1st0ph, dependabot[bot], fmarinelli, franz1981, hypnoce, jabolina, metanet, npepinpe, oscerd, osheroff, pruivo, valdar, vjuranek

jgroups-raft's Issues

Incorrect commit-index after restart

This is related to #24.

  • Static membership={A,B,C}
  • Start A, B (A is the leader)
  • Kill B, now we have no majority
  • Make a change: last-applied=1, commit-index=0 on A
  • Kill A
  • Restart A and start B
    -> commit-index should now (we have the majority) be 1, but it is still 0 in A and B

Add snapshot transfer

When a member P joins for the first time, or is way out of sync, the leader needs to transfer a snapshot to P.
The leader determines that P is out of sync when an AppendEntries message to P (e.g. with index=95) gets a negative result with index=0. If the leader's first index is 40, the leader will never be able to provide entries 1-39 to P.
If the leader already has a snapshot, e.g. due to log compaction (see #7), that snapshot is used. Otherwise, a new snapshot is generated.
Next, the snapshot is sent to P, which applies it to its state machine and sets last_applied to the index shipped with the snapshot. The log is then created starting at that index+1.
Subsequent AppendEntries messages bring P's log up to date with the leader's via the default mechanism.
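
For illustration, a minimal sketch of the state-machine side of such a transfer, assuming a StateMachine-style interface with writeContentTo()/readContentFrom() for snapshot (de)serialization; KVStateMachine and its map contents are hypothetical:

import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Sketch only: the leader would call writeContentTo() to produce the snapshot,
// P would call readContentFrom() to install it, then set last_applied to the
// index shipped with the snapshot.
public class KVStateMachine {
    protected final Map<String,String> state=new HashMap<>();

    // Serialize the full state; used by the leader to create the snapshot
    public synchronized void writeContentTo(DataOutput out) throws IOException {
        out.writeInt(state.size());
        for(Map.Entry<String,String> e: state.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    // Replace the local state with the snapshot received from the leader
    public synchronized void readContentFrom(DataInput in) throws IOException {
        state.clear();
        int size=in.readInt();
        for(int i=0; i < size; i++)
            state.put(in.readUTF(), in.readUTF());
    }
}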

RAFT doesn't work with ForkChannels

When creating a ForkChannel and adding RAFT over it, there's an NPE in RAFT.init(). This requires a change in JGroups, see [1] for details. The code below doesn't work:

[1] https://issues.jboss.org/browse/JGRP-1926

import java.util.Arrays;

import org.jgroups.JChannel;
import org.jgroups.fork.ForkChannel;
import org.jgroups.protocols.raft.CLIENT;
import org.jgroups.protocols.raft.ELECTION;
import org.jgroups.protocols.raft.RAFT;
import org.jgroups.protocols.raft.REDIRECT;

// Reproducer: creating a ForkChannel with RAFT on top leads to an NPE in RAFT.init()
public class bla {
    protected JChannel   ch, fork_ch;

    protected void start(String name) throws Exception {
        ch=new JChannel("/home/bela/udp.xml").name(name);
        fork_ch=new ForkChannel(ch, "singleton", "fc1",
                                new ELECTION(),
                                new RAFT().members(Arrays.asList("A", "B", "C")).raftId(name),
                                new REDIRECT(),
                                new CLIENT());
        fork_ch.connect("ignored");
        ch.connect("demo");
    }

    public static void main(String[] args) throws Exception {
        new bla().start(args[0]);
    }
}

Demos don't work

Exception in thread "main" java.lang.NullPointerException
at org.jgroups.protocols.raft.RAFT.createLogName(RAFT.java:955)
at org.jgroups.protocols.raft.RAFT.start(RAFT.java:430)
at org.jgroups.stack.ProtocolStack.startStack(ProtocolStack.java:965)
at org.jgroups.JChannel.startStack(JChannel.java:890)
at org.jgroups.JChannel._preConnect(JChannel.java:553)
at org.jgroups.JChannel.connect(JChannel.java:288)
at org.jgroups.JChannel.connect(JChannel.java:279)
at org.jgroups.raft.demos.ReplicatedStateMachineDemo.start(ReplicatedStateMachineDemo.java:30)
at org.jgroups.raft.demos.ReplicatedStateMachineDemo.main(ReplicatedStateMachineDemo.java:184)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

Replicate committed entries from leader to followers

Currently, AppendEntries are sent by the leader and committed at the leader when a majority of votes has been received.
However, entries are not committed at the followers.
Todos:

  • Maintain a commit_table
  • Start a task which periodically (every heartbeat_interval ms) sends AppendEntries messages to members that are lagging behind (see the sketch below)
  • Advance commit_index and apply log entries to the state machine at the followers on reception of an AppendEntries message
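
A minimal sketch of the periodic resend task, using hypothetical names (CommitTask, sendAppendEntriesTo()); this is not the actual jgroups-raft implementation:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CommitTask {
    protected final Map<String,Long> commit_table=new ConcurrentHashMap<>(); // member -> last known commit index
    protected volatile long          commit_index;                           // the leader's commit index
    protected final ScheduledExecutorService timer=Executors.newSingleThreadScheduledExecutor();

    public void start(long heartbeat_interval_ms) {
        timer.scheduleAtFixedRate(this::resendToLaggingMembers,
                                  heartbeat_interval_ms, heartbeat_interval_ms, TimeUnit.MILLISECONDS);
    }

    protected void resendToLaggingMembers() {
        commit_table.forEach((member, member_commit) -> {
            if(member_commit < commit_index)
                sendAppendEntriesTo(member, member_commit + 1); // re-send entries the member is missing
        });
    }

    protected void sendAppendEntriesTo(String member, long from_index) {
        // hypothetical: would send an AppendEntries message starting at from_index
    }
}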

RAFT: RequestTable needs to be populated when becoming leader

When the leader's RequestTable contains 2 pending requests (3-4), with last_applied=4 and commit_index=2, and the leader crashes and is restarted (or a new leader is elected), the RequestTable needs to be re-populated with the pending requests.
If this is not done, the new leader will try to commit the pending entries 3 and 4, but when the AppendEntries responses arrive and the RequestTable is empty, entries 3 and 4 will never be committed.
Solution: when becoming leader, populate the RequestTable from the log by adding entries in the range [commit_index+1 .. last_applied].
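
A minimal sketch of the proposed fix, using a hypothetical RequestTable API rather than the actual jgroups-raft classes:

// When a member becomes leader, re-create request-table entries for all
// appended-but-uncommitted log entries, so they can be committed once a
// majority of AppendEntries responses has been received.
public class LeaderInit {
    public static void populateRequestTable(RequestTable req_table, long commit_index, long last_applied) {
        for(long index=commit_index + 1; index <= last_applied; index++)
            req_table.create(index); // pending entry; committed once a majority acks it
    }

    // Hypothetical request table keyed by log index
    public interface RequestTable {
        void create(long index);
    }
}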

Nexus issue while building from scratch

The Ivy retrieve step fails for every dependency because the ${nexus.snapshots.url} property is not resolved, leading to java.net.MalformedURLException: no protocol:

ivy:retrieve] nexus-snapshots: unable to get resource for org/apache/logging/log4j#log4j;2.6.2: res=${nexus.snapshots.url}/org/apache/logging/log4j/log4j/2.6.2/log4j-2.6.2.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/apache/logging/log4j/log4j/2.6.2/log4j-2.6.2.pom
ivy:retrieve] :: org.mapdb#mapdb;1.0.8: several problems occurred while resolving dependency: org.mapdb#mapdb;1.0.8 {=[]}:
ivy:retrieve] several problems occurred while resolving dependency: org.sonatype.oss#oss-parent;7 {}:
ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;7: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/7/oss-parent-7.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/7/oss-parent-7.jar
ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;7: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/7/oss-parent-7.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/7/oss-parent-7.pom
ivy:retrieve] nexus-snapshots: unable to get resource for org/mapdb#mapdb;1.0.8: res=${nexus.snapshots.url}/org/mapdb/mapdb/1.0.8/mapdb-1.0.8.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/mapdb/mapdb/1.0.8/mapdb-1.0.8.pom
ivy:retrieve] :: org.fusesource.leveldbjni#leveldbjni-all;1.8: several problems occurred while resolving dependency: org.fusesource.leveldbjni#leveldbjni-all;1.8 {=[]}:
ivy:retrieve] several problems occurred while resolving dependency: org.fusesource.leveldbjni#leveldbjni-project;1.8 {}:
ivy:retrieve] several problems occurred while resolving dependency: org.fusesource#fusesource-pom;1.9 {}:
ivy:retrieve] nexus-snapshots: unable to get resource for org/fusesource#fusesource-pom;1.9: res=${nexus.snapshots.url}/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.jar
ivy:retrieve] nexus-snapshots: unable to get resource for org/fusesource#fusesource-pom;1.9: res=${nexus.snapshots.url}/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/fusesource/fusesource-pom/1.9/fusesource-pom-1.9.pom
ivy:retrieve] nexus-snapshots: unable to get resource for org/fusesource/leveldbjni#leveldbjni-project;1.8: res=${nexus.snapshots.url}/org/fusesource/leveldbjni/leveldbjni-project/1.8/leveldbjni-project-1.8.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/fusesource/leveldbjni/leveldbjni-project/1.8/leveldbjni-project-1.8.pom
ivy:retrieve] nexus-snapshots: unable to get resource for org/fusesource/leveldbjni#leveldbjni-all;1.8: res=${nexus.snapshots.url}/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/fusesource/leveldbjni/leveldbjni-all/1.8/leveldbjni-all-1.8.pom
ivy:retrieve] :: org.testng#testng;6.8.+: several problems occurred while resolving dependency: org.sonatype.oss#oss-parent;3 {}:
ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;3: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.jar
ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;3: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.pom
ivy:retrieve] :: com.beust#jcommander;1.+: several problems occurred while resolving dependency: org.sonatype.oss#oss-parent;3 {}:
ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;3: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.jar
ivy:retrieve] nexus-snapshots: unable to get resource for org/sonatype/oss#oss-parent;3: res=${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/sonatype/oss/oss-parent/3/oss-parent-3.pom
ivy:retrieve] :: commons-io#commons-io;2.4: several problems occurred while resolving dependency: commons-io#commons-io;2.4 {=[]}:
ivy:retrieve] several problems occurred while resolving dependency: org.apache.commons#commons-parent;25 {}:
ivy:retrieve] several problems occurred while resolving dependency: org.apache#apache;9 {}:
ivy:retrieve] nexus-snapshots: unable to get resource for org/apache#apache;9: res=${nexus.snapshots.url}/org/apache/apache/9/apache-9.jar: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/apache/apache/9/apache-9.jar
ivy:retrieve] nexus-snapshots: unable to get resource for org/apache#apache;9: res=${nexus.snapshots.url}/org/apache/apache/9/apache-9.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/apache/apache/9/apache-9.pom
ivy:retrieve] nexus-snapshots: unable to get resource for org/apache/commons#commons-parent;25: res=${nexus.snapshots.url}/org/apache/commons/commons-parent/25/commons-parent-25.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/org/apache/commons/commons-parent/25/commons-parent-25.pom
ivy:retrieve] nexus-snapshots: unable to get resource for commons-io#commons-io;2.4: res=${nexus.snapshots.url}/commons-io/commons-io/2.4/commons-io-2.4.pom: java.net.MalformedURLException: no protocol: ${nexus.snapshots.url}/commons-io/commons-io/2.4/commons-io-2.4.pom
ivy:retrieve] ::::::::::::::::::::::::::::::::::::::::::::::
ivy:retrieve]

Counter init value is never used

The counter's initial value is never used: CounterService.getOrCreateCounter() calls get() to find out whether a value already exists for the given counter and, if so, uses it. However, get() always returns a value; if the counter doesn't exist, it returns 0, so the initial value is never applied.

Example from CounterServiceDemo, which initializes the counter to 1
(Counter counter=counter_service.getOrCreateCounter("counter", 1);):

-- view: [A(raft-id=A)|0] (1) [A(raft-id=A)]
[1] Increment [2] Decrement [3] Compare and set [4] Dump log
[8] Snapshot [9] Increment N times [x] Exit
first-applied=0, last-applied=0, commit-index=0, log size=0b

-- view: [A(raft-id=A)|1] (2) [A(raft-id=A), C(raft-id=C)]
-- changed role to Candidate
-- changed role to Leader
-- view: [A(raft-id=A)|2] (3) [A(raft-id=A), C(raft-id=C), B(raft-id=B)]
4

index (term): command
---------------------

[1] Increment [2] Decrement [3] Compare and set [4] Dump log
[8] Snapshot [9] Increment N times [x] Exit
first-applied=0, last-applied=0, commit-index=0, log size=0b

3
expected value: 1
update: 5
failed setting counter "counter" from 1 to 5, current value is 0
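
A minimal sketch of the intended semantics (CounterStore is hypothetical, not the actual CounterService): the initial value should only be used when the counter does not yet exist:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CounterStore {
    protected final Map<String,Long> counters=new ConcurrentHashMap<>();

    public long getOrCreate(String name, long initial_value) {
        // putIfAbsent-style semantics: 'initial_value' is used only on first creation,
        // instead of unconditionally falling back to 0 as described in this issue
        return counters.computeIfAbsent(name, n -> initial_value);
    }
}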

RaftHandle: new class to interact with RAFT

Currently, applications have to retrieve a ref to the RAFT and CLIENT protocols, which is cumbersome, as users should not have to deal with the protocol stack of JGroups.
Goal: provide a new class RaftHandle which sits on top of a channel and has methods such as

  • set() (implementing Settable)
  • register the state machine
  • get the last-applied index, commit-index etc.
  • set raft-id
  • Register RoleChange listeners

In a nutshell: let this new class handle protocol stack interactions and shield the users from the stack.
All blocks (e.g. CounterService and ReplicatedStateMachine) should also use this class.

This class could be created on regular channels or also on fork channels.

The name is still TBD.
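
A sketch of what the proposed API could look like, based on the wish list above; all names are illustrative and not a final design:

// Hypothetical RaftHandle API: sits on top of a (regular or fork) channel and
// shields applications from the protocol stack.
public interface RaftHandle {
    byte[] set(byte[] buf, int offset, int length) throws Exception; // Settable: append to the replicated log
    RaftHandle stateMachine(StateMachine sm);                        // register the state machine
    RaftHandle addRoleListener(RoleChangeListener l);                // register RoleChange listeners
    RaftHandle raftId(String id);                                    // set the raft-id
    long lastApplied();                                              // expose last-applied
    long commitIndex();                                              // expose commit-index

    interface StateMachine {
        byte[] apply(byte[] data, int offset, int length) throws Exception;
    }

    interface RoleChangeListener {
        void roleChanged(String new_role);
    }
}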

Consensus based CounterService

Implement CounterService and CounterServiceDemo in jgroups-raft. All counters and their values are essentially stored in a hash map, so an update to a counter simply updates the map.
When fewer than N/2+1 servers are running, the service won't be able to make progress (forsaking availability), but it handles network partitions very well.
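
A minimal sketch of such a counter state machine (class name and wire format are hypothetical): each committed log entry is a command that simply updates a hash map of counter names to values:

import java.io.*;
import java.util.HashMap;
import java.util.Map;

public class CounterStateMachine {
    protected static final byte ADD_AND_GET=1; // only command shown in this sketch
    protected final Map<String,Long> counters=new HashMap<>();

    public synchronized byte[] apply(byte[] data, int offset, int length) throws IOException {
        DataInputStream in=new DataInputStream(new ByteArrayInputStream(data, offset, length));
        byte type=in.readByte();
        if(type != ADD_AND_GET)
            throw new IllegalArgumentException("unknown command " + type);
        String name=in.readUTF();
        long delta=in.readLong();
        long result=counters.merge(name, delta, Long::sum); // apply the increment to the map
        ByteArrayOutputStream bos=new ByteArrayOutputStream();
        new DataOutputStream(bos).writeLong(result);
        return bos.toByteArray(); // returned to the caller of set()
    }
}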

RAFT: uncommitted entry is not replicated

If we have A.last_applied=1 and A.commit_index=0, then the 1 entry that was added to A's log won't get replicated to a newly started B.
To reproduce:

  • RAFT.members is {A,B,C}
  • Use CounterServiceDemo to reproduce
  • Start A
  • Start B. Assuming that the leader is A:
  • Kill B
  • A is still the leader (this might change in the future)
  • Add an entry to A's state machine. This will fail, as we don't have a majority to commit. However, A will add the entry to its log and set last_applied to 1 (commit_index is 0)
  • Start B again
    -> A won't replicate the entry to B!
    -> Adding another entry to A won't help either:
14037 [ERROR] RAFT: A: resending of 0 failed; entry not found

-> RAFT.resend() with index=0 doesn't work (the log starts at 1)!

Cannot apply entries in single node cluster

Related to #68 - when developing locally, it's common to test with a single-node cluster. It seems the commit index is never incremented; perhaps I misconfigured something? Looking through the code, though, it seems to behave as written: the commit index is only incremented when an AppendEntries request is received, which the leader obviously doesn't send to itself. I think this could be solved by using the leader's own index when calculating the quorum.

Did I miss something or is this the intended behaviour?

EDIT: as a side note, using the leader's own match index when computing the quorum has a nice side effect: it allows the append to the local log to be asynchronous, i.e. we can start replicating before the entry has been appended to the local log (this is described in section 10.2.1 of the original thesis)
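
A minimal sketch of the suggested change (hypothetical code): include the leader's own last appended index when computing the highest index replicated on a majority, so a single-member cluster (majority=1) can advance its commit index:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class QuorumIndex {
    // match_indexes: follower -> highest log index known to be replicated there
    public static long computeCommitIndex(Map<String,Long> match_indexes, long leader_last_index, int majority) {
        List<Long> indexes=new ArrayList<>(match_indexes.values());
        indexes.add(leader_last_index);           // count the leader itself
        if(indexes.size() < majority)
            return 0;                             // not enough members acked anything yet
        indexes.sort(Collections.reverseOrder()); // highest first
        return indexes.get(majority - 1);         // index replicated on at least 'majority' members
    }
}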

Cannot create single member cluster

Hi,

while trying to make a single-node Raft quorum work, I ended up in a situation where the single node A kept trying to perform an election that never got a response.

Here is the configuration:

<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
  <TCP
          bind_addr="127.0.0.1"
          bind_port="7800"
          port_range="2"
          enable_diagnostics="true"
  />


  <TCPPING initial_hosts="127.0.0.1[7800]"
           port_range="2"
           max_dynamic_hosts="3"
           async_discovery="true"
  />
  <MERGE3 max_interval="30000"
          min_interval="10000"/>
  <FD_SOCK/>
  <FD_ALL/>
  <VERIFY_SUSPECT timeout="1500"  />
  <BARRIER />
  <pbcast.NAKACK2 xmit_interval="500"
                  xmit_table_num_rows="100"
                  xmit_table_msgs_per_row="2000"
                  xmit_table_max_compaction_time="30000"
                  use_mcast_xmit="false"
                  discard_delivered_msgs="true"/>
  <UNICAST3 xmit_interval="500"
            xmit_table_num_rows="100"
            xmit_table_msgs_per_row="2000"
            xmit_table_max_compaction_time="60000"
            conn_expiry_timeout="0"/>
  <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                 max_bytes="4M"/>
  <raft.NO_DUPES/>
  <pbcast.GMS print_local_addr="true" join_timeout="2000"/>
  <!--<UFC max_credits="2M"-->
  <!--min_threshold="0.4"/>-->
  <MFC max_credits="2M"
       min_threshold="0.4"/>
  <FRAG2 frag_size="60K"  />
  <RSVP resend_interval="2000" timeout="10000"/>
  <pbcast.STATE_TRANSFER />
  <raft.ELECTION election_min_interval="100" election_max_interval="500"/>
  <raft.RAFT members="A" raft_id="A" resend_interval="1000" />
  <raft.REDIRECT/>
  <raft.CLIENT bind_addr="0.0.0.0" />
</config>

Listening to the role change events, I only get
changed role to Candidate

Is this expected?

Add log compaction

Log compaction dumps the contents of the state machine into a snapshot file and then truncates the log.
This is done as follows:

  • The member stops handling requests
  • The state machine is asked to dump its state into a snapshot file (same location as the log file(s))
  • The log is truncated:
    • If the log was 1-95 (first=1, last_applied=commit_index=95), then 1-94 will be removed, and first=95, last_applied=commit_index=95
  • The member accepts requests again

Log compaction can be triggered when the log size exceeds a certain configured threshold, or manually (e.g. via JMX). Also, when a leader needs to send a snapshot to a member, and the snapshot doesn't exist, one will be created.
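
A minimal sketch of the compaction sequence, using hypothetical Log/StateMachine interfaces rather than the actual jgroups-raft API:

import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class LogCompaction {
    public interface StateMachine { void writeContentTo(DataOutput out) throws IOException; }
    public interface Log {
        long commitIndex();
        void truncate(long index); // removes all entries below 'index'; first index becomes 'index'
    }

    public static synchronized void compact(StateMachine sm, Log log, String snapshot_file) throws IOException {
        // 1. stop handling requests ('synchronized' stands in for that here)
        long commit_index=log.commitIndex();
        // 2. dump the state machine into the snapshot file (same location as the log files)
        try(DataOutputStream out=new DataOutputStream(new FileOutputStream(snapshot_file))) {
            sm.writeContentTo(out);
        }
        // 3. truncate the log: e.g. 1-95 becomes first=95, last_applied=commit_index=95
        log.truncate(commit_index);
        // 4. requests are accepted again when the lock is released
    }
}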

Log synchronization to out-of-sync member doesn't catch up

When we have {A,B,C}, kill C and make a few more updates (e.g. using ReplicatedStateMachineDemo), then after restarting C, A tries to transfer a snapshot to C.
This is costly and (apparently) doesn't work.
Instead, A should simply send the missing log entries to C.
The same problem occurs when we remove C's log: snapshot installation doesn't work.

This is probably a regression from adding snapshot installation.

ELECTION: start / stop elections based on view changes

Raft [1] duplicates some of the JGroups functionality: the goal of this issue is to remove that duplication and use JGroups wherever possible, e.g. for heartbeats and election kickoffs.
Also, cluster membership changes as described in ch. 4 of [1] should be simple to implement with this change.
Examples:

  • When the view size drops below the majority, the leader steps down
  • When the view size reaches the majority, start the election process. When a leader has been chosen, stop the election timer
  • When the current leader is not in the next view, start a new election

Advantages:

  • No constant heartbeating; this is done by JGroups anyway
  • Easy addition of members, which would change the majority
  • No need to mark views as special entries in the log

Details are in [2].

[1] https://github.com/ongardie/dissertation
[2] https://github.com/belaban/jgroups-raft/blob/master/doc/design/Election.txt

Web site

Use GitHub Pages to create a simple web site; mainly used to host the manual and point to the discussion list.

All members change role to Candidate; how to solve it?

   protected boolean voteFor(final Address addr) {
        if(addr == null) {
            voted_for=null;
            return true;
        }
        if(voted_for == null) {
            voted_for=addr;
            return true;
        }
        return voted_for.equals(addr); // a vote for the same candidate in the same term is ok
    }

protected void handleVoteRequest(Address sender, int term, int last_log_term, int last_log_index) {
        if(local_addr != null && local_addr.equals(sender))
            return;
        if(log.isTraceEnabled())
            log.trace("%s: received VoteRequest from %s: term=%d, my term=%d, last_log_term=%d, last_log_index=%d",
                      local_addr, sender, term, raft.currentTerm(), last_log_term, last_log_index);
        boolean send_vote_rsp=false;
        synchronized(this) {
            if(voteFor(sender)) {
                if(sameOrNewer(last_log_term, last_log_index))
                    send_vote_rsp=true;
                else {
                    log.trace("%s: dropped VoteRequest from %s as my log is more up-to-date", local_addr, sender);
                }
            }
            else
                log.trace("%s: already voted for %s in term %d; skipping vote", local_addr, sender, term);
        }
        if(send_vote_rsp)
            sendVoteResponse(sender, term); // raft.current_term);
    }

The problem seems to be here:
return voted_for.equals(addr); // a vote for the same candidate in the same term is ok

Doesn't work when using TCP

Hi

It doesn't work when I try to use TCP in jgroups-raft with the following configuration:


<TCP bind_port="7800"
recv_buf_size="${tcp.recv_buf_size:130k}"
send_buf_size="${tcp.send_buf_size:130k}"
max_bundle_size="64K"
sock_conn_timeout="300"

     thread_pool.min_threads="0"
     thread_pool.max_threads="20"
     thread_pool.keep_alive_time="30000"/>

<TCPPING async_discovery="true"
         initial_hosts="${jgroups.tcpping.initial_hosts:localhost[7800],localhost[7801]}"
         port_range="2"/>
<MERGE3 max_interval="30000"
        min_interval="10000"/>
<FD_SOCK/>
<FD_ALL/>
<VERIFY_SUSPECT timeout="1500"  />
<BARRIER />
<pbcast.NAKACK2 xmit_interval="500"
                xmit_table_num_rows="100"
                xmit_table_msgs_per_row="2000"
                xmit_table_max_compaction_time="30000"
                max_msg_batch_size="500"
                use_mcast_xmit="false"
                discard_delivered_msgs="true"/>
<UNICAST3 xmit_interval="500"
          xmit_table_num_rows="100"
          xmit_table_msgs_per_row="2000"
          xmit_table_max_compaction_time="60000"
          conn_expiry_timeout="0"
          max_msg_batch_size="500"/>
<pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
               max_bytes="4M"/>
<raft.NO_DUPES/>
<pbcast.GMS print_local_addr="true" join_timeout="2000"
            view_bundling="true"/>
<UFC max_credits="2M"
     min_threshold="0.4"/>
<MFC max_credits="2M"
     min_threshold="0.4"/>
<FRAG2 frag_size="60K"  />
<RSVP resend_interval="2000" timeout="10000"/>
<pbcast.STATE_TRANSFER />
<raft.ELECTION election_min_interval="100" election_max_interval="500"/>
<raft.RAFT members="1,2" raft_id="${raft_id:undefined}" resend_interval="1000"/>
<raft.REDIRECT/>
<raft.CLIENT bind_addr="0.0.0.0" />

Thanks,
Qi

In-memory log implementation for unit tests

Implementation of Log in memory only. Main benefit: speed for unit tests and no need to clean up files on disk after unit tests. This also avoids incorrect unit tests caused by inadvertently reusing left-over log files from previous runs or demos.
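
A minimal sketch of what such an in-memory log could look like (hypothetical API; a real implementation would have to implement jgroups-raft's Log interface):

import java.util.ArrayList;
import java.util.List;

public class InMemoryLog {
    protected final List<byte[]> entries=new ArrayList<>(); // index 0 unused; Raft logs start at 1
    protected long               commit_index;

    public InMemoryLog() { entries.add(null); }

    public synchronized long   append(byte[] entry)    { entries.add(entry); return lastAppended(); }
    public synchronized byte[] get(long index)         { return index > 0 && index < entries.size()? entries.get((int)index) : null; }
    public synchronized long   lastAppended()          { return entries.size() - 1; }
    public synchronized long   commitIndex()           { return commit_index; }
    public synchronized void   commitIndex(long index) { commit_index=index; }
    // No files are ever created, so nothing has to be cleaned up after a test run
}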

CounterServiceDemo: concurrent mass increment blocks clients

When running 2 (or more) instances of CounterServiceDemo and incrementing the same counter from 2 instances at the same time (e.g. press [9] and pick 1000 increments), then both instances block.
A preliminary investigation showed that the issue is with REDIRECT.

AddServer / RemoveServer commands

This is re #21: currently, probe.sh or JMX needs to be used to dynamically add or remove a server. However, we should also provide two commands which can be submitted by a client (e.g. via a script) to add / remove servers. The commands would try to contact a local server (fixed port) or be given an address:port to connect to.
This requires an additional protocol which listens for client commands and forwards them to the current leader, plus a simple client-server protocol.
Make sure the client-server protocol is generic enough to later be reused for generic client commands, e.g. get(), set() etc.

RAFT: commit_id of follower(s) not updated

In some scenarios, we can end up with followers not getting the last commit_id, e.g. (A = leader, B = follower)

  • A: last-applied=1000, commit-id=1000
  • B: last-applied=1000, commit-id=999

The resend task on A should resend the commit-id of 1000 to B, so B can update its commit-id from 999 to 1000 and apply the state change represented by update 1000 to its state machine.
Need to reproduce this first...

Reset heartbeat timer on vote

I may be missing something, so apologies if I'm reading your code wrong :-)
It appears that when handling a VoteRequest, the heartbeat timer is never reset upon voting for a candidate.
https://github.com/belaban/jgroups-raft/blob/master/src/org/jgroups/protocols/raft/ELECTION.java#L191-L211
Raft specifies that a follower resets its election timeout when it grants a vote to a candidate. This ensures that a follower doesn't vote for a candidate and then immediately time out, transition to candidate itself, increment its term, vote for itself, and ultimately force a newly elected leader to step down.
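
A minimal sketch of the requested behaviour (hypothetical code, not the actual ELECTION protocol): granting a vote restarts the follower's election timeout so it does not immediately become a candidate itself:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class ElectionTimer {
    protected final ScheduledExecutorService timer=Executors.newSingleThreadScheduledExecutor();
    protected ScheduledFuture<?>             timeout_task;
    protected final long                     election_timeout_ms;

    public ElectionTimer(long election_timeout_ms) { this.election_timeout_ms=election_timeout_ms; }

    public synchronized void resetElectionTimeout() {
        if(timeout_task != null)
            timeout_task.cancel(false);
        timeout_task=timer.schedule(this::becomeCandidate, election_timeout_ms, TimeUnit.MILLISECONDS);
    }

    // Would be called from handleVoteRequest() when a vote is actually granted
    public synchronized void voteGranted() {
        resetElectionTimeout(); // don't time out right after voting for someone else
    }

    protected void becomeCandidate() {
        // hypothetical: increment term, vote for self, send VoteRequests
    }
}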

Jgroups version incompatibility

The JGroups version in the current pom.xml is 3.6.0.Final; however, that version does not provide
org.jgroups.util.Util.waitUntilAllChannelsHaveSameView, which is used in this project. The method is only available from 4.0.0.Beta3 on, but that version is not compatible with other JGroups symbols used here.

Prevent duplicate members

If we have RAFT.members = {A,B,C}, we do currently prevent a member D from joining. However, we don't prevent two members with the same raft-id from joining, e.g. 2 members each with raft-id=C.
This currently only triggers a warning, but ideally we'd like to prevent the second duplicate C member from starting.
This could be done by each new joiner checking its first view and closing the channel when it finds a dupe. This is not nice, as it may not necessarily terminate the application which started the channel.
Alternatively, we could create a new protocol DUPE_PREVENTION which is a copy of AUTH and rejects all JOIN and MERGE requests which would add a duplicate member to the current view.
'Duplicate member' here means a member whose address (an ExtendedUUID) has a duplicate raft-id.

Leader keeps sending snapshot forever

56773 [DEBUG] RAFT: A: sending snapshot (17b) to B
57775 [DEBUG] RAFT: A: sending snapshot (17b) to B
58777 [DEBUG] RAFT: A: sending snapshot (17b) to B
59779 [DEBUG] RAFT: A: sending snapshot (17b) to B
...

To reproduce:

  1. Start A and B (out of {A,B,C}); they'll have the majority and (we assume) A becomes the leader
  2. Make a few updates, e.g. with CounterServiceDemo
  3. Do a snapshot on A
  4. Kill B and remove its log (/tmp/B.log)
  5. Restart B again
    -> Problem as shown above

Manual

  • Write the manual (copy the build and templates from JGroups)
  • Publish to GitHub Pages: branch gh-pages in the jgroups-raft repo

Leader needs to step down if view size falls below the majority

When we have cluster {A,B,C,D,E}, with A being the leader, and the cluster then splits into {A,B} (A being the leader in term 2) and {C,D,E} (C being the leader in term 3), then A could block client requests for a long time, because it cannot commit them (no majority), until the network split heals.

However, C in term 3 is able to commit changes and would therefore not block client requests.

According to section 6.1 ("Leaders") [1], a leader can realize that it cannot get a majority any longer. In JGroups, this can be done by having A check on each view change if the view's size is still greater than or equal to the majority, and if this is not the case, step down as leader.
Thus, when A gets view change A|2={A,B}, it should become a Follower. This would allow clients to possibly access the new coordinator C.

Followers could do the same: null leader when they get a view whose size is smaller than the majority. This would eliminate the risk of followers redirecting clients to stale leaders.

[1] https://github.com/ongardie/dissertation
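
A minimal sketch of the view-change check described above (hypothetical names): on every view change the leader compares the view size with the majority and steps down when it can no longer commit:

import java.util.List;

public class ViewHandler {
    protected final int majority;       // e.g. 3 for RAFT.members={A,B,C,D,E}
    protected boolean   leader;

    public ViewHandler(int majority) { this.majority=majority; }

    public void viewAccepted(List<String> members) {
        if(leader && members.size() < majority) {
            leader=false;            // step down: become Follower
            stopCommitTask();        // hypothetical: stop resending/committing
        }
        // Followers could also null their leader reference here, so clients
        // are not redirected to a stale leader.
    }

    protected void stopCommitTask() {}
}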

Killing and restarting member P lets P vote twice

We have members={A,B,C,D} (majority=3) and have members A (leader), B and C running (e.g. CounterServiceDemo).
To reproduce:

  • Kill C
  • Now A and B don't have a majority to commit
  • Make a change in A (e.g. increment the counter from 3 to 4)
  • Both A and B add the incr to their log but don't commit as A doesn't have the majority (it only has 2 votes for the change). The value of the counter is therefore still 3.
  • Kill B and restart it
  • The change to the counter is now committed and its value is (incorrectly) 4 !

-> Probably B's vote was counted twice
-> Solution: maintain not just the number of votes but also who voted and discard duplicate votes
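
A minimal sketch of the proposed fix (hypothetical code): track who acked an entry instead of only counting acks, so a restarted member cannot be counted twice:

import java.util.HashSet;
import java.util.Set;

public class Votes {
    protected final Set<String> voters=new HashSet<>(); // raft-ids of members that acked this entry
    protected final int         majority;

    public Votes(int majority) { this.majority=majority; }

    /** Returns true if this ack reaches the majority for the first time */
    public synchronized boolean addVote(String raft_id) {
        boolean added=voters.add(raft_id); // duplicate votes from the same member are ignored
        return added && voters.size() == majority;
    }
}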

RAFT: request correlation

Correlate AppendEntries requests and responses:

  • Clients should get return values when calling set()
  • The matchIndex and nextIndex have to be adjusted
  • If we get responses from a majority, we have to advance the commitIndex and apply all committed entries to the state machine, plus unblock clients blocked on set() with results
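
A minimal sketch of the correlation (hypothetical code): each set() registers a future keyed by the log index; when responses from a majority have arrived and the entry has been committed and applied, the future is completed with the result:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class RequestCorrelation {
    protected final Map<Long,CompletableFuture<byte[]>> pending=new ConcurrentHashMap<>();

    // Called by set(): the returned future completes once the entry is committed and applied
    public CompletableFuture<byte[]> register(long index) {
        CompletableFuture<byte[]> f=new CompletableFuture<>();
        pending.put(index, f);
        return f;
    }

    // Called when the commit index advances and the entry was applied to the state machine
    public void committed(long index, byte[] state_machine_result) {
        CompletableFuture<byte[]> f=pending.remove(index);
        if(f != null)
            f.complete(state_machine_result); // unblocks the client blocked on set()
    }
}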

Add / remove server (dynamic membership)

Currently, RAFT.majority defines the majority needed for elections and log commits. This requires the operator to always start the same set of servers: it is easy to start A,B,C, append and commit some changes to the log, then stop them, start D,E,F and make some other changes, overwriting the previous ones and violating Leader Completeness.
We need to define a static membership, e.g. {A,B,C}, in the config file, compute the majority from it (2), and use AddServer or RemoveServer to change it dynamically (in running members), with XML editing to change it in the config.
Every member should be identified by a logical name, which also names the persistent log. When a member is started and its name is not in the above list, it should become read-only (not participate in elections) or throw an exception.
Adding or removing a server would involve calling AddServer, which needs to be acked by a majority of the existing view; the config would then need to be changed as well.

Provide ConsensusService building block

Provide a new building block, ConsensusService, which can be used by applications to reach consensus on a decision.
The jgroups-raft implementation is somewhat tied to a state machine; this block would allow it to be used in different scenarios.
