Once I began adding log repair (only very partially done) to schooner, I found a flaw in the leader design. Currently, the leader spawns a persistent task per peer (the leader_peer_handler fn), and those peer handlers manage sending heartbeats. This seemed like a good idea because it frees the main leader task from having to manage the heartbeat interval amongst all its other activities, including responding to clients in a timely fashion.
The problem is that the peer handlers send stale data in heartbeats - prev_log_idx in particular. This is deadly once log repair is in place: a stale prev_log_idx (say 5 when it should be 6) will cause a follower to truncate its log (removing entry 6 because the heartbeat said 5). Disaster.
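To make the failure mode concrete, here is a minimal sketch in Rust of a follower that trusts prev_log_idx blindly. All names are hypothetical and the logic is deliberately simplified; schooner's real follower code is more involved:

```rust
/// Hypothetical, simplified log entry; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    term: u64,
    idx: u64,
}

/// Simplified follower-side handling of a heartbeat's prev_log_idx:
/// truncate the log so its last entry is prev_log_idx. Against a
/// correct leader this only removes genuinely conflicting entries;
/// against a stale prev_log_idx it discards perfectly good ones.
fn apply_stale_heartbeat(log: &mut Vec<Entry>, prev_log_idx: u64) {
    log.retain(|e| e.idx <= prev_log_idx);
}
```

With a six-entry log, a heartbeat carrying a stale prev_log_idx of 5 silently drops entry 6 - exactly the disaster described above.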
So I'd like to open this for review and even invite someone else to implement it if they'd like to do that.
I thought of two alternatives to solving this (neither fully fleshed out, so one or both may be flawed as well):
#### Option 1: put an RWArc around a shared data structure
This shared data structure would be shared between the persistent peer handlers and the main leader_loop task. It could be a new StateMachine struct that we define. The peer handlers would get a handle to it and would base new heartbeats off whatever is currently in the StateMachine.
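A sketch of Option 1 in current Rust, where Arc<RwLock<...>> plays the role the old RWArc did. The LeaderState fields here are pure assumptions about what a heartbeat needs; the real StateMachine contents are an open question, as noted below:

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical shared state for Option 1; field names are assumptions.
struct LeaderState {
    term: u64,
    prev_log_idx: u64,
    prev_log_term: u64,
    commit_idx: u64,
}

/// A peer handler would read the freshest values just before building
/// each heartbeat, instead of caching them once at spawn time.
fn build_heartbeat(shared: &Arc<RwLock<LeaderState>>) -> (u64, u64, u64, u64) {
    let s = shared.read().unwrap();
    (s.term, s.prev_log_idx, s.prev_log_term, s.commit_idx)
}
```

The key property is that any write the leader_loop makes (e.g., bumping prev_log_idx after an append) is visible to the next heartbeat, closing the staleness window - at the cost of the lock-based sharing discussed in the cons.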
Pros
- It solves (I think) the stale heartbeat problem
Cons
- It introduces a mutex-based data structure, something I'd like to avoid if possible
- I'm not yet sure what the StateMachine contains and whether it has the right information for the heartbeat messages
- Having persistent peer handlers (persistent while that node is leader that is) may be flawed in itself.
- There is no way (that I know of) to mark tasks as daemon threads like you can in Java or Go (all goroutines are daemon). "Daemon" here means they shut down when the non-daemon or main thread shuts down.
- Because of that, if the leader_loop task dies, the peer handlers will not, and the only way they shut down (right now) is by receiving a stop message from the leader_loop task. Otherwise they will send heartbeats forever, even after a new leader has been elected. Bad.
#### Option 2: leader_loop manages heartbeats; peer_handlers are no longer long-lived/persistent
The leader_loop has to balance two opposing forces:
- heartbeats must be sent out on time
- clients are served serially and must be responded to within some finite period (probably longer than the heartbeat timeout), and the leader_loop has to wait for followers to respond before it can respond to the client
Here's a redesigned leader_loop model for consideration:
There is an outer loop that selects over three channels:
- A timeout channel that fires every heartbeat_interval milliseconds
- The event Receiver channel (from the network_listener)
- An AEResponse Receiver from (non-persistent) peer handlers
The event Receiver channel brings in:
- client commands
- AERequests from other peers that claim to be leader (may be a valid or bogus claim)
The AEResponse Receiver gets AEResponses from peers whose replies were not needed to make a majority and respond to the client, and so were not picked up in the inner loop I'm about to describe.
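Assuming std::sync::mpsc channels (which have no built-in select! over multiple receivers), one pass of the outer loop can be approximated with recv_timeout against an absolute heartbeat deadline. Everything here is a hypothetical sketch, not schooner's real types:

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical event types; the real enums in schooner will differ.
enum Event {
    ClientCmd(String),
    AERequest(u64), // claimed term from a putative leader
}
struct AEResponse {
    peer: usize,
    success: bool,
}

/// Which "arm" of the outer select fired.
enum Selected {
    Heartbeat,
    Event(Event),
    Response(AEResponse),
}

/// One pass of the outer loop: check for straggler AEResponses, then
/// wait on the event channel until the heartbeat deadline expires.
fn outer_select(
    events: &Receiver<Event>,
    responses: &Receiver<AEResponse>,
    heartbeat_deadline: Instant,
) -> Selected {
    // Drain any pending peer responses first (non-blocking).
    if let Ok(r) = responses.try_recv() {
        return Selected::Response(r);
    }
    let remaining = heartbeat_deadline.saturating_duration_since(Instant::now());
    match events.recv_timeout(remaining) {
        Ok(e) => Selected::Event(e),
        Err(RecvTimeoutError::Timeout) => Selected::Heartbeat,
        // Simplification: a disconnected channel is treated like a
        // quiet one; real code would shut the leader down here.
        Err(RecvTimeoutError::Disconnected) => Selected::Heartbeat,
    }
}
```

Polling the response channel before blocking is one ordering choice among several; a crossbeam-style select! would treat all three sources symmetrically.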
When heartbeat_interval timer is selected
One non-persistent peer handler task is spawned per peer. That handler makes a connection to the peer and is given an AERequest and a Sender to message back (the Receiver end is the "AEResponse Receiver" mentioned as number 3 in the channel list above).
When event Receiver channel is selected - and it is a client cmd
As with heartbeats, one non-persistent peer handler task is spawned per peer. That handler makes a connection to the peer and is given an AERequest and a Sender to message back (the Receiver end is the "AEResponse Receiver" mentioned as number 3 in the channel list above).
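The spawn-per-round idea might look like the following sketch, using plain threads as stand-ins for tasks. The AERequest/AEResponse shapes are assumptions, and the actual network call to the peer is stubbed out:

```rust
use std::sync::mpsc::Sender;
use std::thread;

/// Hypothetical request/response shapes; schooner's real types differ.
#[derive(Clone)]
struct AERequest {
    term: u64,
    prev_log_idx: u64,
}
struct AEResponse {
    peer: usize,
    term: u64,
    success: bool,
}

/// Spawn one short-lived handler per peer. Each handler owns a clone of
/// the request and a Sender back to the leader_loop; when it returns,
/// the task is gone - nothing persists past this round, so there is no
/// stale state and nothing to orphan if the leader dies.
fn spawn_peer_handlers(peers: &[usize], req: AERequest, tx: Sender<AEResponse>) {
    for &peer in peers {
        let req = req.clone();
        let tx = tx.clone();
        thread::spawn(move || {
            // Stand-in for "connect to peer, send req, await reply".
            let _ = tx.send(AEResponse { peer, term: req.term, success: true });
        });
    }
}
```

Because the Sender passed in is dropped when the function returns and each clone dies with its handler, the leader's Receiver sees a clean disconnect once every handler for the round has finished.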
The leader_loop then sets up an inner loop with another select, that selects over:
- AEResponse Receiver, waiting for peer responses in order to respond to the waiting client
- A (new) heartbeat_interval timer
- A client timeout timer (which probably has much longer duration than the heartbeat_interval timer)
The heartbeat_interval timer is set once per round. The first time it is set, it will be heartbeat_interval millis. If the AEResponse Receiver is selected, the leader decides whether a majority of the cluster has logged the cmd; if so it responds to the client, and if not it goes back into this inner select loop. This time, however, when it sets the heartbeat timer, the duration will need to be heartbeat_interval minus the time spent handling the AEResponse.
The client timeout timer, on the other hand, is set once across all rounds. If it goes off, the client is notified of the failure to log its command and leader_loop returns to the outer loop.
Ideally the time remaining on the heartbeat_interval timer should be taken into account when setting the heartbeat timer in the outer loop, but if heartbeat_interval is <= 1/3 of the election_timeout setting, it is probably OK to be a little sloppy on this one.
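One way to get the "heartbeat_interval minus handling time" arithmetic for free is to track an absolute deadline rather than re-arming a fixed-length timer each time. A sketch (hypothetical helper, not schooner code):

```rust
use std::time::{Duration, Instant};

/// Deadline-based heartbeat timer: because the deadline is absolute,
/// time spent handling AEResponses between waits is automatically
/// subtracted from the next wait - no manual bookkeeping per round.
struct HeartbeatTimer {
    interval: Duration,
    deadline: Instant,
}

impl HeartbeatTimer {
    fn new(interval: Duration) -> Self {
        HeartbeatTimer { interval, deadline: Instant::now() + interval }
    }

    /// How long to block in the next select; zero if already overdue.
    fn remaining(&self) -> Duration {
        self.deadline.saturating_duration_since(Instant::now())
    }

    /// Re-arm only after heartbeats have actually been sent.
    fn rearm(&mut self) {
        self.deadline = Instant::now() + self.interval;
    }
}
```

The remaining() value is exactly what the recv_timeout (or select timeout) in both loops should use, which also handles the "carry the remaining time back to the outer loop" concern without special-casing it.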
When the AEResponse Receiver is selected (outer loop)
These are responses that were not handled in the inner loop, either because they weren't needed to make a majority or because they are responses to heartbeat AERequests. When they come in they must be analyzed for success/failure status. On success, the peer's next_idx must be updated in the leader. On failure, additional AERequests must be sent to do log repair - which requires spawning another peer-handler task. The leader then returns to the outer leader loop. Again, the time taken here must be subtracted from heartbeat_interval when resetting the heartbeat timer.
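A sketch of that success/failure bookkeeping. The match_idx field and the back-off-by-one repair rule are assumptions layered on top of what the text specifies:

```rust
/// Hypothetical leader-side per-peer bookkeeping.
struct PeerState {
    next_idx: u64,
}
struct AEResponse {
    peer: usize,
    success: bool,
    match_idx: u64, // assumption: highest index the peer confirmed
}

/// What follow-up work (if any) a response demands of the leader.
#[derive(Debug, PartialEq)]
enum Followup {
    None,
    LogRepair { peer: usize, from_idx: u64 },
}

/// On success, advance the peer's next_idx past what it acknowledged.
/// On failure, back next_idx off by one (never below 1) and signal
/// that a repair AERequest - a fresh peer-handler task - is needed.
fn handle_response(peers: &mut [PeerState], r: &AEResponse) -> Followup {
    let p = &mut peers[r.peer];
    if r.success {
        p.next_idx = r.match_idx + 1;
        Followup::None
    } else {
        p.next_idx = p.next_idx.saturating_sub(1).max(1);
        Followup::LogRepair { peer: r.peer, from_idx: p.next_idx }
    }
}
```

Returning a Followup value instead of spawning inside the function keeps the decision ("does this response require repair?") separate from the task-spawning mechanics, which makes the outer loop easier to test.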
Summary of Option 2
A lot of details are not covered here, including how to handle AERequests from putative new leaders, but hopefully this outlines the model well enough to evaluate.
Pros
- solves the stale-heartbeat data problem
- no mutexes or shared state
- only one persistent task, so no rogue unkillable leader tasks can remain if the leader_loop task dies
Cons
#### Option 3: ????
Profit! (Your idea here)
Look at other implementations?