Paxos

An implementation of the Paxos Consensus Protocol

Paxos is a protocol for solving consensus through state machine replication in an asynchronous environment with unreliable processes. This consensus protocol is extended with a stable leader optimization into a replication protocol (commonly referred to as Multi-Paxos) that assigns a global, persistent, total order to a sequence of client updates. The protocol works by having multiple replicas work in parallel to maintain the same state. Each client request updates this state on every replica, so the state is automatically replicated and preserved even in the case of failures. The basic algorithm was famously described by Leslie Lamport in his 1998 paper, The Part-Time Parliament. It was later clarified in his follow-up paper from 2001, Paxos Made Simple.

This library is implemented with a minimalistic philosophy. The main paxos package implements only the core Paxos algorithm, with storage handling, network transport, and physical clocks left to the clients of the library. This minimalism buys flexibility, determinism, and performance. The design was heavily inspired by CoreOS's raft library.

The library was also heavily influenced by the work of Jonathan Kirsch and Yair Amir in their paper Paxos for System Builders.

Features

The paxos package is a full implementation of the Paxos replication protocol. Features include:

  • Leader election
  • Update replication

Building

Run make to build the binaries client and server

Run make clean to clean all build artifacts

Run make check to perform linting and static analysis

Running

The project comes with two sample binaries, client and server. These binaries exemplify the proper use of the library.

To run a server process, a command like the following can be used:

./server -p 54321 -h hostfile

All server processes are identical; there is no designated leader process. Instead, the Paxos protocol itself periodically performs leader election to establish a stable leader. There is also no order in which processes need to be brought up, although they will exit after 15 seconds if a connection cannot be established to any of their peers.

To run a client process, a command like the following can be used:

./client -p 54321 -h hostfile

The client will connect to all available hosts in the hostfile and then prompt the user for an update string. Whenever an update is entered, it will send the update to a random available server, which attempts to globally order the update.

Verbose Mode (server only)

Adding the -v (--verbose) flag will turn on verbose mode, which will print logging information to standard error. This information includes details about all messages sent and received, as well as round timeout information.

Command Line Arguments

A full list of command line arguments for the two binaries can be seen by passing the --help flag.

Testing

The project comes with an automated test suite containing both unit tests, which exercise individual pieces of functionality within the Paxos state machine, and larger network tests, which exercise a full network of Paxos nodes. The unit tests are scattered throughout the paxos/*_test.go files, while the network tests are located in the paxos/paxos_test.go file.

To run all tests, run the command make test

System Architecture

The library is designed around the paxos type, which is a single-threaded state machine implementing the Paxos consensus protocol. The state machine can be interacted with only through a Node instance, which is a thread-safe handle to a paxos state machine.

Because the library pushes tasks like storage handling and network transport up to its users, these users have a few responsibilities. In a loop, the user should read from the Node.Ready channel and process the updates it contains. Each Ready struct will contain any updates to the persistent state of the node that should be synced to disk, any messages that need to be delivered to other nodes, and any updates that have been successfully ordered. The user should also call Node.Tick at a regular interval (probably via a time.Ticker).

Putting this together, the state machine handling loop will look something like:

for {
    select {
    case <-ticker.C:
        // Drive the state machine's internal timers forward.
        node.Tick()
    case rd := <-node.Ready():
        // Persist any state changes before acting on the rest of the update.
        if rd.PersistentState != nil {
            saveToStorage(rd.PersistentState)
        }
        // Deliver outbound messages to the other nodes over the network.
        for _, msg := range rd.Messages {
            send(msg)
        }
        // Apply updates that have been assigned a slot in the global order.
        for _, update := range rd.OrderedUpdates {
            applyUpdate(update)
        }
    case <-ctx.Done():
        return
    }
}
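
The saveToStorage, send, and applyUpdate helpers above are the application's responsibility, since storage handling and network transport live outside the library. A minimal sketch of what they might look like (the signatures and parameter types here are illustrative assumptions, not part of the library's API):

// Illustrative stubs for the helpers used in the loop above. The parameter
// types are assumptions for this sketch; real implementations would write
// to durable storage and to a real network transport.
func saveToStorage(state *pb.PersistentState) {
    // Sync the state to disk before moving on; losing acknowledged state
    // can violate Paxos's safety guarantees.
}

func send(msg pb.Message) {
    // Hand the message to the network transport for delivery to its
    // destination node.
}

func applyUpdate(update pb.ClientUpdate) {
    // Apply the globally ordered update to the application's state.
}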

To propose a change to the state machine, serialize the update into a byte slice within a pb.ClientUpdate and call:

node.Propose(ctx, update)

If committed, the data will eventually appear in the rd.OrderedUpdates slice.
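
For example (a sketch; the exact field layout of pb.ClientUpdate is an assumption here, with the real definition in the paxospb package):

// Wrap the serialized update in a pb.ClientUpdate and propose it. The
// Update field name is assumed for illustration.
update := pb.ClientUpdate{Update: []byte("set x = 5")}
node.Propose(ctx, update)
// If committed, the update later appears in a Ready's OrderedUpdates.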

Example Transport

While the paxos library itself does not restrict users to a specific network transport, the project includes an example transport built on the gRPC framework. This choice means that the layer can hook directly into the paxospb protocol buffer definitions defined in the paxos library (see Protobuf Message Serialization). It also allows the example implementation to use client-side message streaming between Paxos nodes to achieve improved performance.

The example transport layer can be seen in the transport package.
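
As a rough illustration of the streaming approach (the service, method, and type names below are hypothetical stand-ins, not the actual transport package API), outbound messages are pushed onto a single long-lived client stream instead of issuing one RPC per message:

// sendLoop pushes outbound messages onto one long-lived gRPC client stream,
// amortizing connection setup across many messages. MessengerClient and
// DeliverMessages are assumed names for this sketch.
func sendLoop(ctx context.Context, client MessengerClient, outbox <-chan pb.Message) error {
    stream, err := client.DeliverMessages(ctx) // assumed client-streaming RPC
    if err != nil {
        return err
    }
    for msg := range outbox {
        msg := msg // take a stable copy before passing a pointer to it
        if err := stream.Send(&msg); err != nil {
            return err
        }
    }
    _, err = stream.CloseAndRecv() // close the stream once the outbox drains
    return err
}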

Design Decisions

Injected Time

Instead of including timeouts within the paxos library bound to physical time, the library instead exposes the notion of a "tick". Ticks are injected by clients of the library using the Node.Tick method. These ticks are then transferred to various tickingTimer instances that handle heartbeats and election timeouts. By avoiding a dependence on physical time, the package has remained fully deterministic and is much easier to test.
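
A minimal sketch of the idea (field and method names here are illustrative; the library's actual tickingTimer may differ):

// A tick-driven timer: time advances only when tick is called, so tests
// can step it deterministically instead of sleeping on a wall clock.
type tickingTimer struct {
    timeout int    // number of ticks to wait before firing
    elapsed int    // ticks observed since the last reset
    fire    func() // callback invoked when the timer expires
}

func (t *tickingTimer) tick() {
    t.elapsed++
    if t.elapsed >= t.timeout {
        t.elapsed = 0
        t.fire()
    }
}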

Protobuf Message Serialization

All messages that the paxos library interacts with are implemented as Protocol Buffers. This decision brings several benefits. First, the messages come with a mechanism for trivial serialization, which is essential for sending them over a network or persisting them to disk. Second, future modifications to the message format can remain backwards compatible. Third, the message formats are language-agnostic. Finally, if a user of the library chooses to use gRPC as a network transport, as the example transport does, the messages will play nicely with their service definitions.
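
For instance, serialization reduces to a single call on the generated types (a sketch, assuming pb.Message is one of the library's generated protobuf types):

// Generated protobuf types marshal to a compact byte slice that can be
// sent over the network or written to disk, and unmarshal the same way.
data, err := proto.Marshal(&msg) // msg is an assumed pb.Message value
if err != nil {
    log.Fatalf("marshal failed: %v", err)
}
var decoded pb.Message
if err := proto.Unmarshal(data, &decoded); err != nil {
    log.Fatalf("unmarshal failed: %v", err)
}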

Implementation Issues

Deterministic Failures Testing

One of the implementation issues faced while developing the algorithm was that it was difficult to test: the amount of required machinery was large, and a naive setup necessitated running multiple processes on different hosts. To get around this restriction and make testing easier, we isolated the core state machine from the network transport layer. This meant that we could mock out the network interactions of nodes and perform comprehensive testing on the underlying algorithm. The result can be seen in the paxos/paxos_test.go network tests. There, we can choose to interfere with network traffic, isolate Paxos nodes, or crash Paxos instances completely. All of this is deterministic and easy to test.

These network tests include a number of interesting cases, such as failures during leader election. They verify that as long as fewer than a majority of nodes go down during leader election, the election still succeeds; if a majority of nodes crash, the election blocks and never finishes. They also instrument the Paxos timers themselves to make sure they are behaving as expected.
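
As a rough sketch of the style of control this enables (Node.Step and the message fields below are assumptions modeled on similar libraries; the actual harness lives in paxos/paxos_test.go):

// The test decides the fate of every message, with no real network: it can
// drop messages to a "down" node to simulate a partition or crash.
func deliver(nodes map[uint64]*paxos.Node, msgs []pb.Message, down map[uint64]bool) {
    for _, m := range msgs {
        if down[m.To] {
            continue // drop traffic to isolated or crashed nodes
        }
        nodes[m.To].Step(m) // assumed method for feeding inbound messages
    }
}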

Multiple Processes on the Same Host

Another implementation issue faced while developing the sample applications was that they were difficult to run, because the suggested template "assumes that each host is running only one instance of the process." This meant that even during development, testing an m-process instance of the algorithm required m hosts that had to coordinate and be kept in sync with code changes. To address this, the single-process-per-host restriction was lifted early in the development cycle. This was accomplished by allowing an optional port specification in the hostfile for a given process using a <hostname>:<port> notation. Once individual processes could specify unique ports, an optional -i (--id) flag was used to distinguish the current process in a hostfile where multiple processes were running on the same host. This way, the algorithm could be run on a single host with a hostfile like:

<hostname>:1234
<hostname>:1235
<hostname>:1236
<hostname>:1237

And commands like:

./server -h hostfile -i=0
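
A sketch of how such a hostfile line might be parsed (illustrative, not the binaries' actual parsing code; defaultPort stands in for the port given by the -p flag):

// parseHostLine splits an optional ":port" suffix off a hostfile entry,
// falling back to a default port when the entry is a bare hostname.
func parseHostLine(line, defaultPort string) (host, port string) {
    if h, p, err := net.SplitHostPort(line); err == nil {
        return h, p
    }
    return line, defaultPort
}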

Future Development

There are a few things missing from the library at the moment that will be addressed in future improvements.

Recovery

At the moment, a crashed node has no way to join back into the paxos group that it was previously a member of. We should introduce a mechanism for node recovery.

Reconciliation

Related to recovery, the Paxos algorithm does not specify a reconciliation protocol to bring servers back up to date. This means that the algorithm can allow servers to order updates quickly without being able to actually execute them, because of gaps in the global sequence. A reconciliation strategy should be introduced to address this issue.

Membership Changes

At the moment, the library requires that a set of Paxos instances be specified before starting the protocol. This set must stay the same, and no new nodes are allowed to join. In the future, support for changing the membership of the Paxos group should be added.
