swym's Introduction

swym

swym is an experimental software transactional memory (STM) library that can be used to implement concurrent data structures with performance not far from that of lock-free data structures.
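
A minimal usage sketch, assuming the thread_key/TCell interface documented in the 0.1.0-preview docs (exact signatures may differ between versions):

use swym::{tcell::TCell, thread_key, tx::Ordering};

fn main() {
    let x = TCell::new(0);

    // Run a read-write transaction that atomically increments x.
    // The closure may be re-run if the transaction conflicts.
    thread_key::get().rw(|tx| {
        let curr = x.get(tx, Ordering::default())?;
        x.set(tx, curr + 1)?;
        Ok(())
    });

    assert_eq!(x.into_inner(), 1);
}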

See the docs for more information.

Initial benchmarks can be found here.

swym's People

Contributors

avi-d-coder, mtak-

swym's Issues

Need x86_64 hardware transactional memory benchmarks

To create heuristics for determining when HTM is likely to speed up a transaction, benchmarks are necessary.
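
For context, the code path being measured looks roughly like the sketch below: my illustration of the standard RTM begin/fallback pattern using the core::arch intrinsics, not swym-htm's actual code. It assumes a toolchain where the RTM intrinsics are available (they have been nightly feature-gated).

#[cfg(target_arch = "x86_64")]
fn increment_with_htm(counter: &mut u64, fallback: &std::sync::Mutex<()>) {
    use core::arch::x86_64::{_xbegin, _xend, _XBEGIN_STARTED};

    if is_x86_feature_detected!("rtm") {
        unsafe {
            // Start a hardware transaction; on abort, control returns
            // here with a status code != _XBEGIN_STARTED.
            if _xbegin() == _XBEGIN_STARTED {
                // NOTE: a production version must also read the fallback
                // lock's state inside the transaction so it aborts if
                // another thread currently holds the lock.
                *counter += 1; // speculative until _xend commits
                _xend();
                return;
            }
        }
    }
    // No RTM support, or the transaction aborted: take the lock instead.
    let _guard = fallback.lock().unwrap();
    *counter += 1;
}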

If anyone reading this has a few free moments and a CPU that supports transactional memory (some Haswell processors or later), I would be grateful if you could help out.

First, I need your cpuinfo.

Linux:

$ cat /proc/cpuinfo
... output here ...

On macOS, use System Profiler:

Hardware Overview:

  Model Name:	MacBook Pro
  Model Identifier:	MacBookPro13,3
  Processor Name:	Intel Core i7
  Processor Speed:	2.9 GHz
  Number of Processors:	1
  Total Number of Cores:	4
  L2 Cache (per Core):	256 KB
  L3 Cache:	8 MB

In the swym-htm project, there's an x.py script that can be run to produce the benchmarks.
Path: swym/swym-htm/x.py

My output

$ ./x.py bench
test bench_abort  ... bench:  49,516,424 ns/iter (+/- 2,894,024)
test bench_tx0000 ... bench:  12,223,724 ns/iter (+/- 832,878)
test bench_tx0001 ... bench:  12,501,338 ns/iter (+/- 1,302,260)
test bench_tx0002 ... bench:  12,575,727 ns/iter (+/- 1,187,649)
test bench_tx0004 ... bench:  12,615,560 ns/iter (+/- 1,315,751)
test bench_tx0008 ... bench:  12,570,609 ns/iter (+/- 1,079,593)
test bench_tx0016 ... bench:  12,489,782 ns/iter (+/- 1,333,362)
test bench_tx0024 ... bench:  20,527,809 ns/iter (+/- 2,620,641)
test bench_tx0032 ... bench:  21,912,395 ns/iter (+/- 2,646,718)
test bench_tx0040 ... bench:  22,725,125 ns/iter (+/- 2,352,237)
test bench_tx0048 ... bench:  23,449,303 ns/iter (+/- 2,910,261)
test bench_tx0056 ... bench:  24,562,059 ns/iter (+/- 2,760,177)
test bench_tx0064 ... bench:  14,486,932 ns/iter (+/- 767,444)
test bench_tx0072 ... bench:  14,776,765 ns/iter (+/- 1,522,511)
test bench_tx0080 ... bench:  15,262,921 ns/iter (+/- 1,356,767)
test bench_tx0112 ... bench:  17,002,269 ns/iter (+/- 1,317,870)
test bench_tx0120 ... bench:  17,277,812 ns/iter (+/- 1,579,734)
test bench_tx0128 ... bench:  17,917,132 ns/iter (+/- 2,082,900)
test bench_tx0256 ... bench:  26,625,425 ns/iter (+/- 1,667,125)

Additionally, running the capacity test with --nocapture will give a feel for the transactional capacity of your CPU:

$ ./x.py test capacity --release -- --nocapture
Capacity: 30016

Lock Sorting

I'm interested in modifying the current implementation so that locks are acquired in a global order based on memory location (see the sketch after these questions). I'm trying to implement the sorting within the epoch_locks() calls of write_log and read_log, and have a couple of questions:

  • Is this the correct place to implement sorting (given that I don't want to change the underlying data structure to a heap)? Are there additional calls/places where I need to sort to make sure the global lock order is preserved?

  • How are the epoch_locks stored within the "self.data" field (in write_log, for example) that is then transformed via flat_map? All I can see is an FVec with an additional data field, but I'm unsure how those data values translate to the entries and TCells that are returned and processed. Are these values the times of last write, or memory locations? How is the mapping done between them and the actual EpochLock structures that are returned?
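
For reference, here is the general shape of what I mean by address-ordered acquisition (a self-contained sketch with a toy spinlock, not swym's actual EpochLock type):

use std::sync::atomic::{AtomicBool, Ordering};

struct SpinLock(AtomicBool);

impl SpinLock {
    fn acquire(&self) {
        // Spin until we win the compare-exchange.
        while self
            .0
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }
}

// Sorting by address gives every thread the same global acquisition
// order, which rules out deadlock between concurrent committers.
fn lock_all(locks: &mut Vec<&SpinLock>) {
    locks.sort_unstable_by_key(|l| *l as *const SpinLock as usize);
    for lock in locks.iter() {
        lock.acquire();
    }
}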

Transaction Ordering

I am working to understand how transactions can interact with each other in swym. Below is an example that I have been working with. It seems possible for two transactions to commit out of order with respect to the times recorded in the EpochLocks. Is this the case?

If so, is it possible for any other transaction to read or write an inconsistent set of values (for example, to read both A=1 and B=100 in the following example)? I have been unable to find a scenario in which the "out of order" EpochLock values actually cause any issues. Is it correct that the time stored in one EpochLock never needs to be comparable to the time stored in a different EpochLock?

Example: Transaction Tx1 reads B and writes (B+1) to A.  Tx2 writes 100 to B.  Initially, B=1.  If we considered a purely sequential model, there are two possible outcomes: (1) Tx1 executes, then Tx2, and the final values are A=2 and B=100; or (2) Tx2 executes, then Tx1, and the final values are A=101 and B=100.
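
Expressed against swym's public API, the two transactions would look something like the sketch below (assuming the documented TCell/thread_key interface; the interleaving in question happens inside the commit protocol, below this level):

use swym::{tcell::TCell, thread_key, tx::Ordering};

fn run(a: &TCell<usize>, b: &TCell<usize>) {
    // Tx1: read B, write B+1 to A.
    thread_key::get().rw(|tx| {
        let b_val = b.get(tx, Ordering::default())?;
        a.set(tx, b_val + 1)?;
        Ok(())
    });

    // Tx2: write 100 to B.
    thread_key::get().rw(|tx| {
        b.set(tx, 100)?;
        Ok(())
    });
}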

Now suppose we consider a possible execution order with swym. Tx1 reads B=1, then acquires the write lock for A, and validates the read of B=1 successfully.  Next, Tx2 acquires the write lock for B, writes B=100, and increments EpochClock to 1.  Tx2 then releases the lock on B, publishing 100 to B and setting the time in the EpochLock associated with B to 1.  Next, Tx1 writes A=2, increments EpochClock, and releases the lock, setting the time in the EpochLock associated with A to 2.
In this example, the final values are A=2 and B=100, which corresponds to scenario (1) above.  However, the EpochLock times associated with A and B indicate that Tx1 and Tx2 completed in the opposite order.

To summarize, my two questions are: (1) Am I understanding the code correctly that the above situation can indeed occur? (2) Is there any scenario in which this situation (the two transactions seeming to commit in the opposite order) matters for any other transactions?

Undefined Behavior

Running cargo miri test reports the following UB.

running 20 tests
error[E0080]: Miri evaluation error: trying to reborrow for Unique, but parent tag <93516> does not have an appropriate item in the borrow stack
   --> src/internal/alloc/dyn_vec.rs:361:38
    |
361 |             Some(DynElemMut { value: result })
    |                                      ^^^^^^ Miri evaluation error: trying to reborrow for Unique, but parent tag <93516> does not have an appropriate item in the borrow stack
    |
note: inside call to `<internal::alloc::dyn_vec::IterMut<dyn internal::alloc::dyn_vec::test::MyAny> as std::iter::Iterator>::next` at src/internal/alloc/dyn_vec.rs:426:9
   --> src/internal/alloc/dyn_vec.rs:426:9
    |
426 |         self.iter.next().map(|DynElemMut { value }| Owned { value })
    |         ^^^^^^^^^^^^^^^^
note: inside call to `<internal::alloc::dyn_vec::Drain<dyn internal::alloc::dyn_vec::test::MyAny> as std::iter::Iterator>::next` at src/internal/alloc/dyn_vec.rs:550:21
   --> src/internal/alloc/dyn_vec.rs:550:21
    |
550 |         let first = iter.next().unwrap();
    |                     ^^^^^^^^^^^
note: inside call to `internal::alloc::dyn_vec::test::drain` at src/internal/alloc/dyn_vec.rs:530:5
   --> src/internal/alloc/dyn_vec.rs:530:5
    |
530 | /     fn drain() {
531 | |         use core::cell::Cell;
532 | |
533 | |         thread_local! {
...   |
572 | |         assert_eq!(DROP_COUNT.with(|x| x.get()), 85);
573 | |     }
    | |_____^
    = note: (remaining frames elided; they are libtest/libstd test-harness internals)
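
For anyone unfamiliar with this class of report: it is a Stacked Borrows aliasing violation rather than a crash. A minimal self-contained analogue that Miri rejects with a similar message (my illustration, unrelated to the actual dyn_vec code):

fn main() {
    let mut x = 0u32;
    let p = &mut x as *mut u32;
    let a = unsafe { &mut *p }; // first unique reborrow from the raw pointer
    let b = unsafe { &mut *p }; // second reborrow pops `a` off the borrow stack
    *b += 1;
    *a += 1; // UB under Stacked Borrows: `a`'s tag is gone from the stack
}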

powerpc hardware transactional memory support

It would be greatly appreciated if someone would contribute an implementation of PowerPC's hardware transactional memory. It is mostly sketched out already in the swym-htm crate, but needs hardware to test on.

Simple tree using swym

Hi there,

I'm trying to write a simple BST in swym based on this Mutex implementation I wrote. I'm reading through swym-rbtree for inspiration now, but it's a lot more complicated and confusing.

Could you give me any pointers (haha) on the easiest way to translate this? I think TPtr makes sense instead of TCell to avoid cloning, but I don't see examples of TPtr in swym except for TPtr::privatize_as_box.
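
For concreteness, here's the node shape I've been experimenting with: a sketch that keeps children behind Arc handles inside TCells (assuming the documented TCell API), so reads clone only a refcounted pointer rather than the node itself:

use std::sync::Arc;
use swym::tcell::TCell;

// Every link is transactional. Cloning an Option<Arc<Node>> out of a
// TCell costs one refcount bump, which sidesteps TPtr for a first cut.
struct Node {
    key: i32,
    value: TCell<i32>,
    left: TCell<Option<Arc<Node>>>,
    right: TCell<Option<Arc<Node>>>,
}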

I would appreciate any advice!

Segfault with concurrent insertions/deletions using the rbtree

Hi and thanks for the cool project!

I was playing with the rbtree implementation, and while the current benchmarks work as expected, when I had multiple threads performing insertions and deletions at the same time I got a null-pointer dereference (segfault) in the removal method.

I was wondering whether this is a known issue and if you plan to address it. :)

Thanks,
George

100,000 items for benchmark too small?

With only 100k items in the benchmark, the entire working set can be held in the 8 MB L3 cache, which could make the results misleadingly fast, no? Could you kindly post a similar benchmark with, say, 10M or 100M items? Thanks!

Possible improvement: justifying uses of `unsafe { .. }`

Hi there; Interesting library!

It seems generally well documented from a user's perspective. However, I found the comments justifying uses of unsafe { .. } and unsafe impl to be lacking. I think it would help everyone (including future you...) to document why each part of the library is sound.

As an example, I found it unclear why https://docs.rs/swym/0.1.0-preview/src/swym/tcell.rs.html#197-210 is sound. In particular, I conflated Borrow with the standard library's trait and didn't spot the exclusion of uninhabited types T (if you didn't rule those out it might have been unsound).
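
Concretely, the convention I'm suggesting is a SAFETY comment on every unsafe block, stating the invariant it relies on. A generic example (not from swym):

fn first_or_zero(slice: &[u32]) -> u32 {
    if slice.is_empty() {
        return 0;
    }
    // SAFETY: the emptiness check above guarantees index 0 is in bounds,
    // so get_unchecked(0) cannot read past the end of the slice.
    unsafe { *slice.get_unchecked(0) }
}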

Best wishes // Centril

Backoff algorithm

Presently, on transaction failure, the transaction is immediately retried. Running rbtree with --features stats reveals that this can have terrible outcomes: occasionally a transaction is attempted 80,000 times or more.

Adding a simple std::thread::yield_now on transaction failure brings the max retries down to roughly 30, increasing overall throughput substantially. Is yield_now the right tool, or should there be a more complex backoff algorithm?

In the critical section of the commit algorithm, write locks are grabbed and held for a limited amount of time. While they are held, no allocations, syscalls, etc. happen on that thread, which makes me think that parking the thread might be overkill. I don't know enough about schedulers to say for sure, though. A sketch of one candidate policy follows.
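
Bounded exponential backoff that spins briefly before it starts yielding (a hypothetical helper, not part of swym; crossbeam-utils' Backoff implements a similar policy):

use std::{hint, thread};

// Hypothetical helper: spin with geometrically increasing delays while
// the conflicting transaction is likely still running, then yield.
struct Backoff {
    step: u32,
}

impl Backoff {
    fn new() -> Self {
        Backoff { step: 0 }
    }

    fn snooze(&mut self) {
        if self.step <= 6 {
            for _ in 0..(1u32 << self.step) {
                hint::spin_loop();
            }
        } else {
            thread::yield_now();
        }
        self.step = (self.step + 1).min(10);
    }
}

// Usage on transaction failure: backoff.snooze(); then retry.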
