Loom is based on a number of papers that are useful to read, but it would be nice to give an informal description of these as well. I wrote a brain dump of my understanding but it probably should be verified / edited:
Papers:
The idea of permutation testing concurrent code is to permute through every possible interleaving multiple threads can have. However, doing so naively results in combinatorial explosion. The number of possible permutations grows factorially. The idea of partial order reduction is to only run "unique" permutations. Two permutations are identical if they result in the same behavior. As an example, assuming sequential consistency, if two threads concurrently read from the same atomic value, it doesn't matter which thread reads first, the end result is
the same. In this case, the two permutations are identical and the goal of partial order reduction is to only run one of those two permutations.
DPOR does this reduction dynamically, i.e. at runtime. The idea is that you only need to permute concurrent accesses to a cell that are dependent, i.e. where the permutation results in different behavior. The way this is done is by running the execution "naively". It runs through the execution without any thread pre-emptions. Whenever an atomic object is accessed, it continues the execution with the same thread. Whenever a mutex is accessed (as long as it is unlocked), it
continues with the same thread. The only time the scheduler switches threads is when the current thread is blocked on a wait call or if the thread terminates. When it switches threads, it picks another thread randomly and eagerly schedules that thread.
The key is, whenever a thread (lets call it thread B) accesses a shared object (atomic, mutex, condvar, ...) loom tracks this access on the object. When a thread (thread B) accesses a shared object, it also checks when the last dependent access was performed. A dependent
access is another thread (thread A) accessing that object at a point that a) is not prior in the causality of the current thread (happens-before) and b) in a way that could result in different
behavior if the order is permuted. When this happens, the exploration (sequence of execution permutations) has to add a backtrack point in order to check the alternate permutation. To do this, loom will perform the exact same execution up until thread A accessed that
shared object. Right before thread A accesses that shared object, loom then permutes. The goal of the permutation is to get thread B to access the object before thread A. To do this, loom performs a preemption and schedules thread B to run so that it can access the
shared object.
However, there is the potential for thread B to be currently blocked (waiting for a mutex to unlock or a condvar to be signaled). In this case, some other thread X needs to be scheduled in order to unblock B, but we don't know which thread X is, so loom ends up permuting through
activating all threads at this point. By doing so, either thread B will eventually access the shared object before thread A or it is impossible for thread B to access the shared object before thread A.
DPOR is a depth-first exploration of permutations.
That is the overview of DPOR. DPOR performs an exhaustive exploration of the execution. This means that every possible concurrent behavior is explored. DPOR works great for small test
cases, but for more complex test cases, it still cannot reduce enough to complete the test in a reasonable time. Enter bounded DPOR. The idea of bounded dpor is that to catch 99.9% of all concurrency bugs does not require an exhaustive exploration of all possible concurrent
behaviors. most bugs can be caught by limiting the number of thread preemptions. By running all possible permutations that can be done with a low number (2~3) thread preemptions, almost all bugs can be caught. The problem is that naively limiting thread preemptions
would break DPOR because DPOR adds permutations detected at run time and by artificially limiting permutations that are explored, DPOR will be be unable to discover critical permutations to run. The Bounded DPOR algorithm counters this problem by adding potentially unnecessary
backtracking points when hitting a bound. This guarantees that all possible permutations within the bounds are executed, but there may be unnecessary (identical to previous permutations) permutations as well.