Giter Site home page Giter Site logo

raft's Introduction

raft

This is an implementation of raft forked from SimpleRaft.

Why?

Python is an easy to learn language, only marginally more complex than pseudo code, but something you can write unit tests against and try new ideas. Python3 also supports type annotations, which can improve readability.

How?

This implementation has a few layers. You can navigate the code base as follows:

  • raft/messages Raft message types
  • raft/state Actual logic. Should be a superset of what you see in the one pager on the raft paper. It's a bit closer to the practice in the theory <-> practice spectrum than the pseudo code in the raft paper. Still fits in about 500 lines of code.
  • raft/servers A server is the networking infra that is needed to make the logic above useful. There are two implementations:
    • ZeromqServer: Single process, multiple thread implementation suitable for unit testing. Uses PUB/SUB sockets.
    • ZREServer: Multi-process implementation suitable for a simple chat client for example. Uses ZRE
  • raft/board This is where the state (x <- 3, y <- 1) in the raft paper is stored. db_board.py is probably the most useful one.

Details

Unlike simpleRaft, this code uses python3's async/await to replace the threading logic that existed before and caused unit tests to hang in the presence of many threads.

Async RPC is implemented on top of zeromq sockets using pyserde. Each message has a UUID. Responses from raft quorum participants contain the UUID of the message being responded to. The server maintains a cache of recently sent out messages. So it's able to infer which message is being responded to in the presence of out of order, pipelined delivery of RPCs (as opposed to traditional request/response pattern on TCP sockets in traditional RPC implementations).

Testing

$ apt install python3-pip python3-pytest
$ pip3 install -r requirements.txt --user
$ alias t=pytest-3
$ t
============================= test session starts =============================
platform linux -- Python 3.9.1, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /home/foo/raft
collected 25 items

tests/test_CandidateServer.py .......                                   [ 28%]
tests/test_FollowerServer.py ........                                   [ 60%]
tests/test_LeaderServer.py .....                                        [ 80%]
tests/test_MemoryBoard.py ..                                            [ 88%]
tests/test_raft.py ...                                                  [100%]

============================= 25 passed in 1.96s ==============================

References:

raft's People

Contributors

adsharma avatar dependabot[bot] avatar kainswor avatar seocam avatar streed avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

raft's Issues

Implement hierarchical quorums fully

59a0755 introduced the idea of hierarchical quorums. It ensures that a node becomes a candidate only if it's the leader in the parent quorum.

However, it doesn't handle the case when the node steps down as a leader in the parent quorum. This part needs to be implemented.

Multiple leaders elected

Setup: three participants - Alice, Bob and Charlie.

Alice Logs:

$ ./zre_raft/chat.py -n alice
using Python native module
bob cce7aeb7-9c55-431b-85c1-3a89bc8624f1 entered
bob cce7aeb7-9c55-431b-85c1-3a89bc8624f1 joined raft
alice raft: Lost Leader: None
bob raft: 
cce7aeb7-9c55-431b-85c1-3a89bc8624f1 exit 
alice raft: Lost Leader: None
bob 2d64670e-48e3-4e35-99c5-359738078b0a entered
bob 2d64670e-48e3-4e35-99c5-359738078b0a joined raft
charlie b0efbf70-9180-4758-9b95-d1bfe02d84ed entered
charlie b0efbf70-9180-4758-9b95-d1bfe02d84ed joined raft
alice raft: Lost Leader: None
4c0768b8461a4c33bdf582d286bf90dc: New Leader
charlie 5eaca74d-2ce4-4aa4-a5a9-790f17a00133 entered
charlie 5eaca74d-2ce4-4aa4-a5a9-790f17a00133 joined raft
b0efbf70-9180-4758-9b95-d1bfe02d84ed exit 
charlie raft: 
bob raft: 
alice raft: /status
4c0768b8-461a-4c33-bdf5-82d286bf90dc
consensus: ZREServer(_name='4c0768b8461a4c33bdf582d286bf90dc', _state=<simpleRaft.states.leader.Leader object at 0x7f8038adc820>, _log=[], _messageBoard=<simpleRaft.boards.memory_board.MemoryBoard object at 0x7f8038adccd0>, _neighbors=[UUID('2d64670e-48e3-4e35-99c5-359738078b0a'), UUID('5eaca74d-2ce4-4aa4-a5a9-790f17a00133')], _commitIndex=0, _currentTerm=5, _lastApplied=0, _lastLogIndex=0, _lastLogTerm=None)

Charlie Logs:

$ ./zre_raft/chat.py -n charlie
using Python native module
bob 2d64670e-48e3-4e35-99c5-359738078b0a entered
bob 2d64670e-48e3-4e35-99c5-359738078b0a joined raft
alice 4c0768b8-461a-4c33-bdf5-82d286bf90dc entered
alice 4c0768b8-461a-4c33-bdf5-82d286bf90dc joined raft
charlie raft: Accepted new leader: 4c0768b8461a4c33bdf582d286bf90dc
Lost Leader: 4c0768b8461a4c33bdf582d286bf90dc
5eaca74d2ce44aa4a5a9790f17a00133: New Leader
Lost Leader: 4c0768b8461a4c33bdf582d286bf90dc
5eaca74d2ce44aa4a5a9790f17a00133: New Leader
charlie raft:
charlie raft: /status
5eaca74d-2ce4-4aa4-a5a9-790f17a00133
consensus: ZREServer(_name='5eaca74d2ce44aa4a5a9790f17a00133', _state=<simpleRaft.states.leader.Leader object at 0x7f28bc501580>, _log=[], _messageBoard=<simpleRaft.boards.memory_board.MemoryBoard object at 0x7f28bc64dc70>, _neighbors=[UUID('2d64670e-48e3-4e35-99c5-359738078b0a'), UUID('4c0768b8-461a-4c33-bdf5-82d286bf90dc')], _commitIndex=0, _currentTerm=5, _lastApplied=0, _lastLogIndex=0, _lastLogTerm=None)

And then Bob alternates between the two leaders:

Accepted new leader: 4c0768b8461a4c33bdf582d286bf90dc
Accepted new leader: 5eaca74d2ce44aa4a5a9790f17a00133
Accepted new leader: 4c0768b8461a4c33bdf582d286bf90dc
Accepted new leader: 5eaca74d2ce44aa4a5a9790f17a00133
Accepted new leader: 4c0768b8461a4c33bdf582d286bf90dc
Accepted new leader: 5eaca74d2ce44aa4a5a9790f17a00133
Accepted new leader: 4c0768b8461a4c33bdf582d286bf90dc

Implement stable storage

Raft requires several things on stable storage, so we don't vote for two different candidates in the same term.
Use shelve to implement it like db_board.py, but in a separate db.

Not all followers get quorum updates

Start cluster with alice and bob.
Say alice is the leader.
Set some keys
Start charlie as a follower
Alice and Charlie have the same hash.
Bob has a different hash, because he's missing Charlies addition to the quorum.

Evaluate the impact of async, out of order replication

https://github.com/adsharma/raft/blob/master/raft/states/leader.py#L168-L180

This loop can send out new log entries before it has received acks from followers. This is not validated for correctness by TLA+ implementation of the raft protocol. See on_response_received() in the same file where acks are processed and leader commitIndex is advanced.

It's possible that multiple followers ack log entry at index 9, but do not agree on log entry at index 7.

The current implementation uses uuids to assign a unique ID to each AppendEntriesRPC. When it receives an ack, it looks up a cache of the uuid -> AppendEntriesRPC to discover which particular log entries were acked. It then advances matchIndex here:

https://github.com/adsharma/raft/blob/master/raft/states/leader.py#L102-L105

It's not clear that this is safe. It's possible that some previous log entries haven't been matched yet if there is packet loss.

We also need to think about all the failure cases, network partition, leader rejoining etc.

Quorum maintenance bugs

After 58f24aa two bugs remain:

  1. A (leader) B, C (followers). A leaves the quorum and C becomes the leader.

A QUORUM_PUT entry noting that A has left is never written to the log. When A rejoins as a follower as A', it sees both A and A' in the log.

  1. A (leader), B (follower). C joins the quorum

A writes add(C) to the log, but self._quorum is not updated. Suspect some kind of a bug in wait_for_consensus

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.