Giter Site home page Giter Site logo

stateright / stateright Goto Github PK

View Code? Open in Web Editor NEW
1.5K 1.5K 53.0 2.78 MB

A model checker for implementing distributed systems.

Home Page: https://docs.rs/stateright

License: MIT License

Rust 95.00% CSS 0.78% JavaScript 2.26% HTML 1.58% Shell 0.17% Nix 0.20%
actor-model distributed-systems model-checker paxos

stateright's Introduction

chat crates.io docs.rs LICENSE

Correctly implementing distributed algorithms such as the Paxos and Raft consensus protocols is notoriously difficult due to inherent nondetermism such as message reordering by network devices. Stateright is a Rust actor library that aims to solve this problem by providing an embedded model checker, a UI for exploring system behavior (demo), and a lightweight actor runtime. It also features a linearizability tester that can be run within the model checker for more exhaustive test coverage than similar solutions such as Jepsen.

Stateright Explorer screenshot

Getting Started

  1. Please see the book, "Building Distributed Systems With Stateright."
  2. A video introduction is also available.
  3. Stateright also has detailed API docs.
  4. Consider also joining the Stateright Discord server for Q&A or other feedback.

Examples

Stateright includes a variety of examples, such as a Single Decree Paxos cluster and an abstract two phase commit model.

Passing a check CLI argument causes each example to validate itself using Stateright's model checker:

# Two phase commit with 3 resource managers.
cargo run --release --example 2pc check 3
# Paxos cluster with 3 clients.
cargo run --release --example paxos check 3
# Single-copy (unreplicated) register with 3 clients.
cargo run --release --example single-copy-register check 3
# Linearizable distributed register (ABD algorithm) with 2 clients
# assuming ordered channels between actors.
cargo run --release --example linearizable-register check 2 ordered

Passing an explore CLI argument causes each example to spin up the Stateright Explorer web UI locally on port 3000, allowing you to browse system behaviors:

cargo run --release --example 2pc explore
cargo run --release --example paxos explore
cargo run --release --example single-copy-register explore
cargo run --release --example linearizable-register explore

Passing a spawn CLI argument to the examples leveraging the actor functionality will cause each to spawn actors using the included runtime, transmitting JSON messages over UDP:

cargo run --release --example paxos spawn
cargo run --release --example single-copy-register spawn
cargo run --release --example linearizable-register spawn

The bench.sh script runs all the examples with various settings for benchmarking the performance impact of changes to the library.

./bench.sh

Features

Stateright contains a general purpose model checker offering:

  • Invariant checks via "always" properties.
  • Nontriviality checks via "sometimes" properties.
  • Liveness checks via "eventually" properties (experimental/incomplete).
  • A web browser UI for interactively exploring state space.
  • Linearizability and sequential consistency testers.
  • Support for symmetry reduction to reduce state spaces.

Stateright's actor system features include:

  • An actor runtime that can execute actors outside the model checker in the "real world."
  • A model for lossy/lossless duplicating/non-duplicating networks with the ability to capture actor message history to check an actor system against an expected consistency model.
  • Pluggable network semantics for model checking, allowing you to choose between fewer assumptions (e.g. "lossy unordered duplicating") or more assumptions (speeding up model checking; e.g. "lossless ordered").
  • An optional network adapter that provides a lossless non-duplicating ordered virtual channel for messages between a pair of actors.

In contrast with other actor libraries, Stateright enables you to formally verify the correctness of your implementation, and in contrast with model checkers such as TLC for TLA+, systems implemented using Stateright can also be run on a real network without being reimplemented in a different language.

Contribution

Contributions are welcome! Please fork the library, push changes to your fork, and send a pull request. All contributions are shared under an MIT license unless explicitly stated otherwise in the pull request.

License

Stateright is copyright 2018 Jonathan Nadal and other contributors. It is made available under the MIT License.

To avoid the need for a Javascript package manager, the Stateright repository includes code for the following Javascript dependency used by Stateright Explorer:

  • KnockoutJS is copyright 2010 Steven Sanderson, the Knockout.js team, and other contributors. It is made available under the MIT License.

stateright's People

Contributors

abhinav-upadhyay avatar andreastedile avatar cjen1 avatar davidrusu avatar degregat avatar graydon avatar haoyang670 avatar jeffa5 avatar jonnadal avatar liangrunda avatar winksaville avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

stateright's Issues

Add ability to save and load checker state

It would be really nice to be able to have the results of a check be saved (including states created/visited etc) so that they could be shared without others having to rerun the check.

Maybe this could lead to the web explorer having an option to upload a file with the check that it can process into an explorable version.

I guess this wouldn't directly mean that the checker can continue from a persisted state but could also enable snapshots while checking is running to enable investigating issues later.

Quicker detection of done state in reporter

Currently when using a reporter there is a sleep of 1 second between each report which is fine for long-running models (arguably could be configurable). However, when checking very small models (first few stages of iterative DFS) it would be nicer to have the reporter notified that the checker is finished and finish up immediately.

DFS and/or DPOR

A fair number of explicit-state model checkers use DFS not BFS. I realize this would be a fairly big revision to the structure (at least of the checker module) but I think there are good reasons for it:

  • We can spit the sources (visited-state-tracking) hashtable in two: one piece that's a strictly-consulted and unbounded (but probably quite small), containing the current DFS path, used for detecting re-visits on a single path; and a second piece that's a size-bounded (randomly evicting on overflow) set of previously-visited states not on the current DFS path. A false negative on the second piece (due to an eviction) causes accidental revisiting of a subspace of the state-space but doesn't affect termination of the algorithm. This allows trading time and space: specifically avoiding memory exhaustion, at the cost of a longer runtime.
  • I think it might better-enable distributed operation in the future (on a cluster of machines) by delegating entire subtrees to separate nodes. Not sure about this.
  • More significantly, I think it's a path towards doing dynamic partial-order reduction (as described here: https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.421.1291). Specifically in an actor system we could classify the choice-points in a given DFS path as splitting or not-splitting the future state space into disjoint sets of communicating actors and only backtrack and reorder within a given disjoint set.

I'm not sure about drawbacks. Maybe a slightly more complicated stack-of-choice-points structure to track progress? What do you think?

Named timers

Currently timers can only be used to perform operations in a combined manner at an actor.

Would it feasible to let timers be named in some way so that they can trigger different events? I think tagging the timer struct with a generic type would make this work, then current behaviour is storing unit.

Order-preserving network flows

This is a note about an addition I'm thinking of making -- thought I'd post here before I work on it to ask if it makes sense to you and/or would be welcome.

(Context: the actor system I'm driving with stateright exhibits pretty bad state-space explosion -- 3 peers and 2 consensus cycles finish in "only" a few hundred thousand states, but 3 consensus cycles exceeds 100m states and exhausts memory on my 128gb test box -- so I'm kinda grasping at straws to figure out things to reduce it, and/or whether there's a bug in my code. I'll file a couple other issues as possible avenues I'm considering for discussion of those also.)

The simplest change I can think of is to add another flag concerning the network semantics, such that messages in a given flow don't get reordered: if peer 1 sends messages A then B to peer 2, I'd like to only explore delivery schedules that have A arriving at peer 2 before B. I think this is not an unreasonable variant -- lots of network abstractions preserve order per-flow -- and it'll cut out a lot of the state space, for a reasonably small change.

This would not alter the interleaving of separate flows: if peer 3 sends message C to peer 2, then we'd still explore arrival schedules at peer 2 of each of [C,A,B], [A,C,B], [A,B,C]. Just not [C,B,A], [B,C,A], or [B,A,C].

What do you think?

Question: Is it possible to share the outputs / states

First, love the project. Superb work and thanks.
We are using this a lot now to test some hypothesis and it's proving magnificent. I recently got a larger 96Gb machine to run some larger models. (would love a gpu handoff).

What I would really like to do is share the data with collaborators to let them investigate any counterexamples and associated sequence diagrams.

Can you suggest any mechanism to do this or perhaps even a route to achieving this.

`smoke_test_states` fails on the master branch.

---- checker::explorer::test::smoke_test_states stdout ----
thread 'checker::explorer::test::smoke_test_states' panicked at 'assertion failed: `(left == right)`
  left: `StateView { action: Some("Id(0) → Ping(0) → Id(1)"), outcome: Some("OUT: [Send(Id(0), Pong(0))]\n\nNEXT_STATE: 1\n\nPREV_STATE: 0\n"), state: Some(ActorModelState { actor_states: [0, 1], history: (1, 2), is_timer_set: [Timers({}), Timers({})], network: UnorderedNonDuplicating({Envelope { src: Id(1), dst: Id(0), msg: Pong(0) }: 1}) }), properties: [(Always, "delta within 1", None), (Sometimes, "can reach max", None), (Eventually, "must reach max", Some("14242056848553854200/2214304369511095301")), (Eventually, "must exceed max", Some("14242056848553854200/3590395016933165332/11432146542073814434/6075559035352024258")), (Always, "#in <= #out", None), (Eventually, "#out <= #in + 1", None)], svg: Some("<svg version='1.1' baseProfile='full' width='500' height='60' viewbox='-20 -20 520 80' xmlns='http://www.w3.org/2000/svg'><defs><marker class='svg-event-shape' id='arrow' markerWidth='12' markerHeight='10' refX='12' refY='5' orient='auto'><polygon points='0 0, 12 5, 0 10' /></marker></defs><line x1='0' y1='0' x2='0' y2='60' class='svg-actor-timeline' />\n<text x='0' y='0' class='svg-actor-label'>0</text>\n<line x1='100' y1='0' x2='100' y2='60' class='svg-actor-timeline' />\n<text x='100' y='0' class='svg-actor-label'>1</text>\n<line x1='0' x2='100' y1='0' y2='30' marker-end='url(#arrow)' class='svg-event-line' />\n<text x='100' y='30' class='svg-event-label'>Ping(0)</text>\n</svg>\n") }`,
 right: `StateView { action: Some("Id(0) → Ping(0) → Id(1)"), outcome: Some("OUT: [Send(Id(0), Pong(0))]\n\nNEXT_STATE: 1\n\nPREV_STATE: 0\n"), state: Some(ActorModelState { actor_states: [0, 1], history: (1, 2), is_timer_set: [Timers({}), Timers({})], network: UnorderedNonDuplicating({Envelope { src: Id(1), dst: Id(0), msg: Pong(0) }: 1}) }), properties: [(Always, "delta within 1", None), (Sometimes, "can reach max", None), (Eventually, "must reach max", None), (Eventually, "must exceed max", None), (Always, "#in <= #out", None), (Eventually, "#out <= #in + 1", None)], svg: Some("<svg version='1.1' baseProfile='full' width='500' height='60' viewbox='-20 -20 520 80' xmlns='http://www.w3.org/2000/svg'><defs><marker class='svg-event-shape' id='arrow' markerWidth='12' markerHeight='10' refX='12' refY='5' orient='auto'><polygon points='0 0, 12 5, 0 10' /></marker></defs><line x1='0' y1='0' x2='0' y2='60' class='svg-actor-timeline' />\n<text x='0' y='0' class='svg-actor-label'>0</text>\n<line x1='100' y1='0' x2='100' y2='60' class='svg-actor-timeline' />\n<text x='100' y='0' class='svg-actor-label'>1</text>\n<line x1='0' x2='100' y1='0' y2='30' marker-end='url(#arrow)' class='svg-event-line' />\n<text x='100' y='30' class='svg-event-label'>Ping(0)</text>\n</svg>\n") }`', src/checker/explorer.rs:475:9


failures:
    checker::explorer::test::smoke_test_states

Reproduced step:

> cargo clean
> cargo update
> cargo build
> cargo test

One weird thing is that if I run cargo test --verbose, the test can pass.

Simulation checker

Hi @jonnadal. I'm looking to add a simulation checker to stateright, similar to TLC's.

A rough outline:

  1. Each thread starts with a seed and the set of initial states.
  2. It picks a random initial state using the seeded rng
  3. It then generates the next states
  4. It picks a random state from this list using the seeded rng, going back to 3
    • This continues until it reaches the boundary of the model
    • Or a max depth is specified and that is reached
  5. Once a single trace has finished it begins from point 2 again.

This is not intended to finish running (similarly to TLC's).

Primarily this aims to enable checking deeper in the search space quickly, typically where more interesting behaviour might be happening (at least for more complex protocols).

I'd be happy to implement this, let me know what you think!

Checker timeout

Executing a check run can take a lot of time for exhaustive checkers, but for the simulation checker it can be unbounded.

While the user can wrap calls with a timeout mechanism, I think it would be a better experience to add a mechanism for timing out the execution.

This could be done at the join level, timing out there, but perhaps better would be within each checker thread, letting the rest of the API be untouched, or perhaps on the job market level, so that there is a single signal for all threads to observe.

Question about stateright states

In stateright, is it possible for the "environment" to have state? Or is all of the state represented in the actors?

For example, could the environment have a clock that is non-deterministically updated?

(Sorry for the naive question.)

Smaller (and lazy/persistent) states

The system state representation in the actor subsystem is relatively large and eagerly copies vectors and btrees full of contents when cloned. This makes it relatively memory-hungry, which in turn limits the size of the state-space it's possible to explore.

How would you feel about changes that reworked the representation to use lazy/persistent types (that copy-on-write only the substructures written) like those in the im crate? They are a bit slower than the eager types but I think they can take up substantially less room.

Remove the `actions` argument from `Model::actions` and return the actions

In the definition of Module::actions, the argument actions: &mut Vec<Self::Action> is just used to be pushed the result in. This makes no sense because it should be the caller to decide how to use the returned actions, and it also introduces a mut reference.

Instead, I'd like the definition of Module::actions to be

fn actions(&self, state: &Self::State) -> Vec<Self::Action>

, and it could also make the API consistent with next_steps and next_states

Handle panics during checking

Would it be possible to catch panics during checking and cleanly terminate the process, maybe with a typical report?

Spawning actors with a reliable network

Currently, actors can be spawned with UDP connections, though we may model them with reliable ordered networks.

It would be cool to be able to spawn the actors with TCP connections to have this be more accurate.

Bug in the explorer

I integrated stateright to my Raft implementation and I was trying to explore state space using Web UI. I got a bug with the explorer and you can easily reproduce it in the following steps:

  1. Clone my Raft project with https://github.com/LiangrunDa/raft-lite.git and run cargo run -- -m explore -n unordered-duplicating to start the explorer (It is bug free with unordered-nonduplicating network)

  2. Start with http://localhost:3000/#/steps/17615423386414597689/8809746236804019274/1839872530274288054/12494312633948216985/16316435326953127737/

image

  1. click the next action "Id(2) → VoteResponse(VoteResponseArgs { voter_id: 2, term: 1, granted: true }) → Id(0)".

  2. The "Path of Actions" will show that the last action is "Id(0) → VoteRequest(VoteRequestArgs { cid: 0, cterm: 1, clog_length: 0, clog_term: 0 }) → Id(2)", which is not the action taken in step 3.

image

I found that it is caused by the following code in ui/app.js line 253 :

step = nextSteps.find(step => step.fingerprint == nextFingerprint);

The array nextSteps contains steps that might be taken next. However, the fingerprint associated with each step may not be unique. Consequently, the find function will simply return the first step encountered with a matching fingerprint.

This lack of uniqueness arises from certain actions not changing the state of the actor and the unordered-duplicate network also remains unchanged. By "unchanged," it is meant that the envelope in the unordered-duplicate network is not consumed.

It is very common to have such action that doesn't change the actor state in a distributed system. For example, in the Raft algorithm, the action VoteResponse does not change the actor state if the recipient is already a leader of the term.

In above case, in step3, we take Id(2) → VoteResponse(VoteResponseArgs { voter_id: 2, term: 1, granted: true }) → Id(0), since Id(0) is already a leader, the fingerprint of this step will be same as the previous one.

Since duplicate delivery of Id(0) → VoteRequest(VoteRequestArgs { cid: 0, cterm: 1, clog_length: 0, clog_term: 0 }) → Id(2) will also not change the actor state, the fingerprint should be the same as the previous one as well.

Consequently, the Web UI encounters two subsequent actions with identical fingerprints, and it simply selects the first one Id(0) → VoteRequest(VoteRequestArgs { cid: 0, cterm: 1, clog_length: 0, clog_term: 0 }) → Id(2) as the action taken in step 3.

Would it be possible to add some additional information to the network state to make the fingerprint unique? That is, we need to let the system "remember" an action that is taken by the user but does not change the actor state.

Relax trait bounds for messages and actor state

Hi. I was upgrading to the latest version and noticed that new trait bounds were introduced (e.g. Ord for Actor::Message). If I understand correctly, this prevents the use of e.g. HashMap's in such messages. Would it be possible to relax these trait bounds?

(I tried to perform such change, but it turned out more complex than what I was expecting (probably because I'm not familiar with the codebase). If you have ideas on how to go about this, I can give it a try. Thanks.)

Go code implementation of stateright project

Hello author, I am very interested in the stateright project, especially its model checker, and I have also tried it. I personally think that Rust is easier to use than TLA+ in terms of writing specifications. However, most of our project teams are familiar with Go and Java code, so we would like to try to use the Go language to implement the stateright project to facilitate its promotion and use. Could you please provide the structure flow chart of the Stateright project, or other information related to Stateright, so as to facilitate our development. If you are interested, we especially welcome you to join to develop the go version of stateright to make it more valuable. thank you!

Simulation checker

Stateright currently has two main checkers (dfs and bfs). BFS uses lots of memory whilst providing shortest paths, DFS uses little memory but doesn't produce shortest paths. An iterative DFS can be implemented using the max depth to get BFS-like behaviour with low memory usage and a little performance overhead for short paths.

For larger models, complex interactions can happen deep in the tree where DFS is able to reach but not able to cover reliably.
For this it would be more useful to have a 'shuffled' DFS implementation, with a seed. This would permute the search order to explore different paths of the state space in different runs.

However, this still can get stuck searching very deep in the space rather than exploring paths from different origins quickly. Thus I think it would be interesting to have a simulation checker that does searches from the initial states to the max depth (or end of the model) repeatedly and randomly. This would not track seen states to avoid memory usage and would be stateless, and so unbounded in time, but able to check deeper paths.

TLC has a simulation mode already. Fibril seems to be going the stateless direction but I wonder if we can have some statelessness in stateright to maintain the models, and Shuttle is a stateless checker in Rust too but lacks the more abstract specifications of properties and tracking history etc.

Tag your releases

Please consider tagging your releases, so that there is a snapshot to go to, when working with specific versions.

Right now I'm trying to get the examples to run standalone, but without a proper tag it's hard to get the correct source version of the example matching to the version in my lock-file. Right now it might be easily fixable, but it's only gonna get harder moving forward.

Report depth in ReportData

Whilst a check run is running it would be nice to be able to monitor the depth progress through the tree as some sort of indication how complete it might be.

`DGraph::with_path` should take the ownership of `self` to avoid cloning.

pub fn with_path(&self, path: Vec<u8>) -> Self {
let mut src = *path.first().unwrap();
DGraph {
inits: {
let mut inits = self.inits.clone();
inits.insert(src);
inits
},
edges: {
let mut edges = self.edges.clone();
for &dst in path.iter().skip(1) {
edges.entry(src).or_default().insert(dst);
src = dst;
}
edges
},
property: self.property.clone(),
}
}

It is better to let with_path consume self to avoid coping.

pub fn with_path(mut self, path: Vec<u8>) -> Self {
    ...
}

Fail fast option

When working with a model that has more than one property it is common, at least in development, that we want to see if any property is broken, rather than waiting for an exhaustive check to report all of the results.

This encourages quicker turnaround times for things that quickly run into issues.

To work with this, I propose that the checkers should implement a fail fast option, that changes the finish condition from requiring all properties to have a discovery, to any property having a negative discovery, an example for an always or eventually condition, a lack of examples for a sometimes property.

Pending-state randomization

Currently it looks like the checker always splits chunks off the end of the pending vecdeque, and then pushes new extensions back on the end. But it doesn't necessarily split off everything pending, and it can often grow the end by more than it tried to consume. When this happens, it ignores (potentially for a very long time, until much later in the run) entries earlier in the deque.

I'm not sure this is a problem (I guess if we're going to explore all states eventually the order might not matter?) but I worry a bit vaguely that there's some possibility that an imbalance in the shape the frontier gets expanded at will wind up costing more work or more memory than if it were more balanced.

Might it make sense to randomize or otherwise permute the pending vector from time to time?

Explorer UI: dynamic property feedback

When exploring states in the UI it shows the overall properties for all states so far, could it also show the status of the properties at the current state being explored.

My hope is that this will allow us to automatically be told (when clicking through) that we are now in a bad state (violating properties).

Leveraging future's state machines

Hello! Love this project. I had a thought the other day...Futures are state machines, and stateright explores state space. Rather than having to manually hoist state into the State associated type forActors, would it possible to automatically plug stateright into the generated state machine for futures? Essentially treating futures as actors, where the state is always Poll::Ready(val) or Poll::Pending?

Heterogenous Actors

What is the preferred way to model actors of different types, like "client" and "server"?

The 2pc example seems to model this with a shared state, but I'd like to use Actors because they are easier to reason about. However, ActorModel appears to only accept a single Actor type.

I see there is a Choice Actor, that simply requires the same type for messages and timers. Should we use that? What if we'd like more than two actor types, like "client, server, database" or similar?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.