Giter Site home page Giter Site logo

jcp19 / spider Goto Github PK

View Code? Open in Web Editor NEW
16.0 4.0 1.0 15.7 MB

Automated data race detection from a distributed trace via SMT constraint solving

License: MIT License

Java 79.54% Shell 3.78% Awk 7.74% Alloy 6.49% SMT 2.44%
distributed-systems traces race-conditions race-detection race-detection-engines runtime-verification testing-tools debugging-tools

spider's Introduction


SPIDER

Automated distributed data race detection from distributed logs via SMT constraint solving.
Report Bug · Request Feature

About The Project

Data races have been shown to be a frequent source of distributed concurrency bugs in major distributed programs. They occur when two memory accesses to the same variable (where at least one of them is a write) are concurrent and thus, their relative order is unpredictable: if two memory accesses A and B are concurrent, then A may occur before B or B may occur before A.

SPIDER provides automated distributed data race detection via SMT constraint solving. It receives as input an event trace captured at runtime, and generates a happens-before model that encodes the causal relationships between the events. After building the constraint system, SPIDER resorts to Z3, an SMT solver, to check for data races. It is capable of detecting race conditions even if no bug caused by race conditions manifests itself in the traced execution. SPIDER was initially intended to be run alongside Minha (in fact, it was originally named Minha-Checker) but it became an independent tool.

Getting Started

Prerequisites

In order to compile SPIDER on your machine, you first need to have the following packages installed:

  • Z3 in your PATH
  • falcon-taz - the easiest way to obtain falcon-taz is to follow these steps:
# 1. clone the repo
git clone [email protected]:fntneves/falcon.git

# 2. change directory to the falcon-taz folder
cd falcon/falcon-taz

# 3. checkout the branch fix_message_handler (might not be
#    needed in future versions of falcon-taz)
git checkout fix_message_handler

# 4. install the package using Maven
mvn clean install

Building SPIDER

# 1. clone this repo
git clone [email protected]:jcp19/SPIDER.git

# 2. build a jar using Maven
mvn clean package

After these commands, SPIDER will be available as a .jar in your target/ folder.

Usage

Tracing Your Target

SPIDER was designed to inspect event traces from distributed systems and look for data races. Currently, it can only read traces in the Event Trace JSON API. As such, before using SPIDER, you must collect a trace from your target system using a suitable method. The language in which your target system is written is not important as long as there is an effective way to trace your system at runtime - take a look at Minha and falcon-tracer.

Running SPIDER

Having built the jar file, you can run it with a combination of the following command and flags:

java -jar spider.jar
 -d,--dataRaces          Check for data races.
 -f,--file <arg>         File containing the distributed trace.
 -m,--messageRaces       Check for message races.
 -r,--removeRedundancy   Removes redundant events before checking for race
                         conditions.

-f is the only required flag and it is used for specifying the path to your trace file. The flag -d indicates that SPIDER will look for data races between different threads while -m indicates that SPIDER should look into data races caused by racing messages (still an experimental feature!). -r is used to perform a (safe) optimization which decreases the time needed to find data races in the trace.

During its execution, SPIDER outputs a list of pairs of code locations that contain racing instructions.

A Brief Example

This section provides a small but descriptive example of how to use SPIDER. We show how to find data races in the following Java program:

class Example1 implements Runnable {
    static int counter = 0;

    public static void main(String[] args) {
        Thread t1 = new Thread(new Ex1());
        t1.start();
        counter++;
        System.out.println("The value of counter is " + counter);
    }

    public void run() {
        counter++;
    }
}

This is by no means a complex program (heck, it only runs two simple threads on a single node!) but it suffers from two data races - that's enough to demonstrate SPIDER's functionality. The two races in question occur between the instructions on lines 7 and 12 because the increments to the static variable counter are not synchronized and between the instructions on lines 8 and 12 because the increment operation on line 12 is not causally related with the reading of counter's value on line 8. Because of this, Example1 has two possible outputs:

  • Possible Output #1:
The value of counter is 1
  • Possible Output #2:
The value of counter is 2

Running this program with a tracing mechanism will produce a trace similar to this one:

{
    "thread": "[email protected]",
    "type": "START",
    "timestamp": 1525270020050
}{
    "thread": "[email protected]",
    "type": "CREATE",
    "child": "[email protected]",
    "timestamp": 1525270020069
}{
    "thread": "[email protected]",
    "type": "START",
    "timestamp": 1525270021812
}{
    "thread": "[email protected]",
    "loc": "demos.Example1.main.7",
    "variable": "demos.Example1.counter",
    "type": "W",
    "timestamp": 1525270021837
}{
    "thread": "[email protected]",
    "loc": "demos.Example1.main.8",
    "variable": "demos.Example1.counter",
    "type": "R",
    "timestamp": 1525270021852
}{
    "thread": "[email protected]",
    "loc": "demos.Example1.run.12",
    "variable": "demos.Example1.counter",
    "type": "W",
    "timestamp": 1525270021875
}{
    "thread": "[email protected]",
    "type": "END",
    "timestamp": 1525270022188
}{
    "thread": "[email protected]",
    "type": "END",
    "timestamp": 1525270022200
}

Note: if you are looking for more complex examples, we suggest you take a look at the programs in this repo whose traces are available here.

We are now ready to run SPIDER with the flag -d to detect data races resulting from concurrent accesses in different threads, which will produce the following simplified output:

java -jar spider.jar -d -f traces/Example1.log
[main] Loading events from traces/Example1.log
[main] Trace successfully loaded!
[RaceDetector] Generate program order constraints
[RaceDetector] Generate fork-start constraints
[RaceDetector] Generate join-end constraints
[RaceDetector] Generate wait-notify constraints
[RaceDetector] Generate locking constraints
[RaceDetector] Generate communication constraints
[RaceDetector] Generate message handling constraints

Data Race Candidates:
-- ([email protected][email protected],
    [email protected][email protected])
-- ([email protected][email protected],
    [email protected][email protected])

Actual Data Races:
-- ([email protected][email protected],
    [email protected][email protected])
-- ([email protected][email protected],
    [email protected][email protected])

=======================
        RESULTS
=======================
> Number of events in trace:          8
> Number of constraints in model:     6
> Time to generate constraint model:  0.001 seconds

## DATA RACES:
> Number of data race candidates:     2
> Number of actual data races:        2
> Time to check all candidates:       0.007 seconds

SPIDER's output is structured in the following maner: after some logging messages, it presents the candidate pairs of events which form a data race.

Even though the candidate pairs look complicated, they are actually very easy to read - each element of the pair is an identifier of an event which starts with W if it is a write or with R if it is a read and is followed by the accessed variable, the thread where it occurs, the node where the thread is running and the line of code responsible for the event.

A pair of events constitutes a candidate if both events access the same variable from different threads and at least one of them is a write. After listing the candidate pairs, the output lists the actual data races, i.e. the candidate pairs whose events are concurrent. Finally, a brief summary is shown.

How Does It Work?

SPIDER builds a causality relation < between the traced events such that events a and b are related (a < b) if and only if a must occur before b in any execution of the system.

Note: < is in fact a poset.

Using <, it is possible to detect which candidate pairs of memory accesses are in fact concurrent - they are exactly the pairs of the form (a,b) such that neither a < b nor b < a. If you're more of a visual learner, you can look at the causality relation < as a directed graph whose nodes are the traced events and whose edges connect nodes a and b iff a < b. Thus, ignoring the actual identifiers for the events and threads, the causality relation for the Example1 trace shown above looks like this:

abstract causality relation of Example1

this image was obtained in Alloy with this model of the causality relation.

Determining wether two events are concurrent can also be done "visually" - two events are concurrent if there is not a directed path in the graph that contains both events. For example, there's no directed path that goes through Write0 and Write1. Thus, they are concurrent.

Internally, SPIDER builds the causality relation as a set of constraints (e.g. for Example1) and queries an SMT solver to decide if two memory accesses are concurrent given those constraints.

If you are curious and want to know more about this project (e.g. how we generate the constraints, benchmarks, etc) check out our TAP 2020 paper (draft).

Roadmap

See the open issues for a list of proposed features and known issues.

Contact

SPIDER was developed by

Acknowledgements

SPIDER couldn't be developed without the work that has been put into some great projects such as

spider's People

Contributors

dependabot[bot] avatar jcp19 avatar nunomachado avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

ryancaicse

spider's Issues

Debug results for the Cyclon trace

Current results without elimination:

 > Data Race #1 : (cyclonlite.CyclonActive.initWithData.73, cyclonlite.CyclonActive.insertReceivedToView.198)
 > Data Race #2 : (cyclonlite.CyclonActive.initWithData.73, cyclonlite.CyclonActive.selectPeersToShuffle.249)
 > Data Race #3 : (cyclonlite.CyclonActive.insertReceivedToView.198, cyclonlite.CyclonActive.selectPeersToShuffle.249)

The first two races should not happen

Old README

Improve description of the project, its goals, dependencies, how to install it and how to use it.

Simplify evaluation scripts

Instead of having two different gawk scripts for each of the two evaluation scenarios (data race detection and message race detection), I believe that one suffices per scenario. The gawk script should accept and argument detailing which of the two csv formats to print and read in the END action. In the body, we should read every required line/argument

Improve `Stats` Class

Ideas:

  • align results in the output
  • convert this class to a singleton instead of using static fields

Test scripts hard to mantain

The couple of scripts in the repository are probably outdated and are unnecessarily convoluted. It is better to create a small class to test the program and generate the reports. At first sight, it looks like it is going to need some hacks because the TraceProcessor class is a singleton -.-

Confusing result reporting

Improve the output of the program. The current output has a bunch of debug information which should not be printed.

Removing Inter-Thread Event Redundancy is leading to races being missed

In the case of the m3274.log in traces/micro-benchmarks, the number of detected data race conditions drops from 3 to 1 when the removing "redundant" inter-thread events. At first, it seems like a bug in the elimination of SND and RCV events but need more testing.

If this happens to not be a bug and the the removal of messages is proven to be unsound, we can create a new flag for Spider (-E) which stands for "Experimental Redundancy Elimination"
which will call the extension on ReX.

Optimization possible

Basically, the races seem to be very "local" in the sense that even though they have hundreds of candidates, what happens, at least in the case of the cyclon, is that these races translate into a few unique locations. Therefore, a possible optimization would be to make a list of unique possible locations in race from the set of candidates to race conditions and then only solve until you find a race. All other candidates can be ignored because even if they are sat, they will give rise to a race that has already been detected

Tests lacking

There's no testing in this project. I believe the best way to do this will be to run the program with small instances of traces and compare it's results to results computed 'by hand'. Besides, I believe this project could gain from having some unit testing.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.