swym is an experimental software transactional memory (STM) library that can be used to implement concurrent data structures with performance not far from that of lock-free data structures.
See the docs for more information.
Initial benchmarks can be found here.
Efficient transactional memory in Rust.
License: MIT
Hi there,
I'm trying to write a simple BST in swym based on this Mutex implementation I wrote. I'm reading through swym-rbtree for inspiration now, but it's a lot more complicated and confusing.
Could you give me any pointers (haha) on the easiest way to translate this? I think `TPtr` makes sense instead of `TCell` to avoid cloning, but I don't see examples of `TPtr` in swym except for `TPtr::privatize_as_box`.
I would appreciate any advice!
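The linked Mutex implementation isn't shown here, but a minimal std-only sketch of the kind of BST being translated might look like the following. All names (`Node`, `insert`, `contains`) are hypothetical, not the actual code from the linked implementation:

```rust
use std::sync::Mutex;

// Hypothetical fine-grained-locking BST node: each child pointer is
// guarded by its own Mutex, which is the shape one would translate
// into transactional cells.
struct Node {
    key: i32,
    left: Mutex<Option<Box<Node>>>,
    right: Mutex<Option<Box<Node>>>,
}

impl Node {
    fn new(key: i32) -> Self {
        Node {
            key,
            left: Mutex::new(None),
            right: Mutex::new(None),
        }
    }

    fn insert(&self, key: i32) {
        if key == self.key {
            return; // key already present
        }
        let child = if key < self.key { &self.left } else { &self.right };
        let mut guard = child.lock().unwrap();
        match &*guard {
            // Recurse into the existing subtree (the child's own locks
            // are taken as the recursion descends).
            Some(node) => node.insert(key),
            None => *guard = Some(Box::new(Node::new(key))),
        }
    }

    fn contains(&self, key: i32) -> bool {
        if key == self.key {
            return true;
        }
        let child = if key < self.key { &self.left } else { &self.right };
        let guard = child.lock().unwrap();
        match &*guard {
            Some(node) => node.contains(key),
            None => false,
        }
    }
}
```

A swym translation would roughly replace each `Mutex<Option<Box<Node>>>` with a transactional cell holding the child link and run each operation inside a transaction, which is where the `TPtr`-vs-`TCell` question comes in.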
It would be greatly appreciated if someone would contribute an implementation of PowerPC's hardware transactional memory. It is mostly already sketched out in the swym-htm crate, but needs hardware to test.
To create heuristics for determining when HTM is likely to speed up a transaction, benchmarks are necessary.
If anyone reading this has a few free moments, and a CPU that supports transactional memory (some Haswell processors or later), I would be grateful if you would help out.
First, I need your cpuinfo.
On Linux:
$ cat /proc/cpuinfo
... output here ...
On macOS, use System Profiler:
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro13,3
Processor Name: Intel Core i7
Processor Speed: 2.9 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 8 MB
In the swym-htm project, there's an x.py script that can be run to create benchmarks.
Path: swym/swym-htm/x.py
My output:
$ ./x.py bench
test bench_abort ... bench: 49,516,424 ns/iter (+/- 2,894,024)
test bench_tx0000 ... bench: 12,223,724 ns/iter (+/- 832,878)
test bench_tx0001 ... bench: 12,501,338 ns/iter (+/- 1,302,260)
test bench_tx0002 ... bench: 12,575,727 ns/iter (+/- 1,187,649)
test bench_tx0004 ... bench: 12,615,560 ns/iter (+/- 1,315,751)
test bench_tx0008 ... bench: 12,570,609 ns/iter (+/- 1,079,593)
test bench_tx0016 ... bench: 12,489,782 ns/iter (+/- 1,333,362)
test bench_tx0024 ... bench: 20,527,809 ns/iter (+/- 2,620,641)
test bench_tx0032 ... bench: 21,912,395 ns/iter (+/- 2,646,718)
test bench_tx0040 ... bench: 22,725,125 ns/iter (+/- 2,352,237)
test bench_tx0048 ... bench: 23,449,303 ns/iter (+/- 2,910,261)
test bench_tx0056 ... bench: 24,562,059 ns/iter (+/- 2,760,177)
test bench_tx0064 ... bench: 14,486,932 ns/iter (+/- 767,444)
test bench_tx0072 ... bench: 14,776,765 ns/iter (+/- 1,522,511)
test bench_tx0080 ... bench: 15,262,921 ns/iter (+/- 1,356,767)
test bench_tx0112 ... bench: 17,002,269 ns/iter (+/- 1,317,870)
test bench_tx0120 ... bench: 17,277,812 ns/iter (+/- 1,579,734)
test bench_tx0128 ... bench: 17,917,132 ns/iter (+/- 2,082,900)
test bench_tx0256 ... bench: 26,625,425 ns/iter (+/- 1,667,125)
Additionally, running the capacity test with `--nocapture` will give a feel for the transaction capacity of your CPU:
$ ./x.py test capacity --release -- --nocapture
Capacity: 30016
I'm interested in modifying the current implementation so that locks are acquired in order of memory location. I'm trying to implement sorting within the `epoch_locks()` calls in `write_log` and `read_log`, and have a couple of questions:
Is this the correct place to implement sorting (given that I don't want to change the underlying data structure to a heap)? Are there additional calls/places that I need to sort to make sure global lock order is preserved?
How are the epoch locks stored within the `self.data` field (in `write_log`, for example) that is then transformed with a flat map? All I can see is an `FVec` with an additional data field, but I'm unsure how those data values translate to the entries and `TCell`s that are returned and processed. Are these values the times of the last write, or memory locations? How is the mapping done between them and the actual `EpochLock` structures that are returned?
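As a std-only illustration of the idea being proposed (not swym's actual internals), acquiring a set of locks in a global order keyed on memory address can be sketched like this; `lock_in_address_order` is a made-up helper:

```rust
use std::sync::{Mutex, MutexGuard};

// Sketch: acquire a set of Mutexes in a globally consistent order by
// sorting on each lock's address. Because every thread derives the
// same total order from the addresses, lock-order inversion (and the
// deadlocks it causes) is ruled out.
fn lock_in_address_order<'a>(
    locks: &mut Vec<&'a Mutex<u64>>,
) -> Vec<MutexGuard<'a, u64>> {
    // Sort by the memory location of the lock itself.
    locks.sort_by_key(|m| (*m as *const Mutex<u64>) as usize);
    locks.iter().map(|m| m.lock().unwrap()).collect()
}
```

In swym the analogous step would be sorting the entries yielded by `epoch_locks()` before acquisition, which is why the question of where those entries live matters.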
Is there a plan for Wasm?
swym-rbtree is not published on crates.io. Do you have any plan to release it soon?
rust-stm has a lot of helpful background info to pull from. The Software Transactional Memory chapter summary from Simon Marlow's Parallel and Concurrent Programming in Haskell also looks valuable.
Also, that project has a handful of watchers who might be interested in this project.
Hi and thanks for the cool project!
I was playing with the rbtree implementation, and while the current benchmarks work as expected, when I tried to have multiple threads performing insertions and deletions at the same time, I got a null-pointer exception in the removal method.
I was wondering whether this is a known issue and if you plan to address it. :)
Thanks,
George
Running `cargo miri test` reports the following UB.
running 20 tests
error[E0080]: Miri evaluation error: trying to reborrow for Unique, but parent tag <93516> does not have an appropriate item in the borrow stack
--> src/internal/alloc/dyn_vec.rs:361:38
|
361 | Some(DynElemMut { value: result })
| ^^^^^^ Miri evaluation error: trying to reborrow for Unique, but parent tag <93516> does not have an appropriate item in the borrow stack
|
note: inside call to `<internal::alloc::dyn_vec::IterMut<dyn internal::alloc::dyn_vec::test::MyAny> as std::iter::Iterator>::next` at src/internal/alloc/dyn_vec.rs:426:9
--> src/internal/alloc/dyn_vec.rs:426:9
|
426 | self.iter.next().map(|DynElemMut { value }| Owned { value })
| ^^^^^^^^^^^^^^^^
note: inside call to `<internal::alloc::dyn_vec::Drain<dyn internal::alloc::dyn_vec::test::MyAny> as std::iter::Iterator>::next` at src/internal/alloc/dyn_vec.rs:550:21
--> src/internal/alloc/dyn_vec.rs:550:21
|
550 | let first = iter.next().unwrap();
| ^^^^^^^^^^^
note: inside call to `internal::alloc::dyn_vec::test::drain` at src/internal/alloc/dyn_vec.rs:530:5
--> src/internal/alloc/dyn_vec.rs:530:5
|
530 | / fn drain() {
531 | | use core::cell::Cell;
532 | |
533 | | thread_local! {
... |
572 | | assert_eq!(DROP_COUNT.with(|x| x.get()), 85);
573 | | }
| |_____^
= note: inside call to closure at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:227:5
= note: inside call to `<[closure@src/internal/alloc/dyn_vec.rs:530:5: 573:6] as std::ops::FnOnce<()>>::call_once - shim` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:227:5
= note: inside call to `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:1467:5
= note: inside call to `memory::test::__rust_begin_short_backtrace::<fn()>` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:1458:30
= note: inside call to closure at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:227:5
= note: inside call to `<[closure@DefId(38:367 ~ test[b7d9]::run_test[0]::{{closure}}[3]) 0:fn()] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/boxed.rs:922:9
= note: inside call to `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:315:9
= note: inside call to `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:292:40
= note: inside call to `std::panicking::try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:288:5
= note: inside call to `std::panicking::try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:9
= note: inside call to `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:1413:26
= note: inside call to closure at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:1435:13
= note: inside call to `memory::test::run_test::run_test_inner` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:1454:28
= note: inside call to `memory::test::run_test` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:1107:13
= note: inside call to `memory::test::run_tests::<[closure@DefId(38:316 ~ test[b7d9]::run_tests_console[0]::{{closure}}[2]) 0:&mut memory::test::ConsoleTestState, 1:&mut std::boxed::Box<dyn memory::test::formatters::OutputFormatter>]>` at /home/user/.rustup/toolchains/nightly
-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:957:5
= note: inside call to `memory::test::run_tests_console` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:295:15
= note: inside call to `memory::test::test_main` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:329:5
= note: inside call to `memory::test::test_main_static`
= note: inside call to `main` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:64:34
= note: inside call to closure at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:49:73
= note: inside call to closure at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sys_common/backtrace.rs:126:5
= note: inside call to `std::sys_common::backtrace::__rust_begin_short_backtrace::<[closure@DefId(1:6021 ~ std[8641]::rt[0]::lang_start_internal[0]::{{closure}}[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:49:13
= note: inside call to closure at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:292:40
= note: inside call to `std::panicking::try::do_call::<[closure@DefId(1:6020 ~ std[8641]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:288:5
= note: inside call to `std::panicking::try::<i32, [closure@DefId(1:6020 ~ std[8641]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe]>` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:9
= note: inside call to `std::panic::catch_unwind::<[closure@DefId(1:6020 ~ std[8641]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/user/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:48:25
It looks like `increment::unlogged` and `increment::logged` read and write the same value without incrementing it. Am I misreading the code? Is this the intended behavior?
https://github.com/mtak-/swym/blob/master/benches/increment.rs#L23
Presently, on transaction failure, the transaction is immediately retried. Using `--features stats` on the rbtree benchmark reveals that this can have terrible outcomes: there are occasionally transactions that have been attempted 80,000 times or more.
Adding a simple `std::thread::yield_now` on transaction failure brings the max retry count down to roughly 30, increasing overall throughput substantially. Is `yield_now` the right tool, or should there be a more complex backoff algorithm?
In the critical section of the commit algorithm, write locks are grabbed and held for a limited amount of time. While they are held, no allocations, syscalls, etc. happen on that thread, making me think that parking the thread might be overkill. I don't know enough about schedulers to say for sure, though.
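A middle ground between an unconditional `yield_now` and a full parking-based backoff might look like the sketch below. The thresholds are made up, and `Backoff` is not a swym type; the idea is to spin briefly (betting the lock holder finishes its short critical section) before yielding:

```rust
use std::thread;

// Illustrative capped backoff for transaction retries: spin on-core for
// short contention bursts, then yield to the scheduler if the conflict
// persists. The threshold of 4 is arbitrary.
struct Backoff {
    retries: u32,
}

impl Backoff {
    fn new() -> Self {
        Backoff { retries: 0 }
    }

    // Called once per failed commit attempt.
    fn snooze(&mut self) {
        self.retries += 1;
        if self.retries < 4 {
            // Exponentially longer spin: stay on-core while the
            // (allocation- and syscall-free) critical section finishes.
            for _ in 0..(1u32 << self.retries) {
                std::hint::spin_loop();
            }
        } else {
            // Persistent contention: let another thread run.
            thread::yield_now();
        }
    }
}
```

This mirrors the observation above: since the critical section is short, spinning is usually enough, and yielding is reserved for the pathological many-retry cases.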
I am working to understand how transactions can possibly interact with each other in swym. Below is an example that I have been working with. It seems that it is possible for two transactions to commit out of order with respect to the recorded times in the EpochLocks. Is this the case?
If so, is it possible for any other transaction to read/write a set of inconsistent values (for example, to read both A=1 and B=100 in the following example)? I have been unable to find a scenario in which the "out of order" EpochLock values actually cause any issues; is it correct that the time associated with each EpochLock does not need to be comparable to the time in a different EpochLock?
Example: Transaction Tx1 reads B and writes (B+1) to A. Tx2 writes 100 to B. Initially, B=1. If we considered a purely sequential model, there are two possible outcomes: (1) Tx1 executes, then Tx2, and the final values are A=2 and B=100; or (2) Tx2 executes, then Tx1, and the final values are A=101 and B=100.
Now suppose we consider a possible execution order with swym. Tx1 reads B=1, then acquires the write lock for A, and validates the read of B=1 successfully. Next, Tx2 acquires the write lock for B, writes B=100, and increments EpochClock to 1. Tx2 then releases the lock on B, publishing 100 to B and setting the time in the EpochLock associated with B to 1. Next, Tx1 writes A=2, increments EpochClock, and releases the lock, setting the time in the EpochLock associated with A to 2.
In this example, the final values are A=2 and B=100, which corresponds to scenario (1) above. However, the EpochLock times associated with A and B indicate that Tx1 and Tx2 completed in the opposite order.
To summarize, my two questions are: (1) Am I understanding the code correctly that the above situation can indeed occur? (2) Is there any scenario in which this situation (the two transactions seeming to commit in the opposite order) matters for any other transactions?
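The intuition behind question (2) can be illustrated with a minimal TL2-style sketch (std-only, not swym's actual implementation): each location's version is only ever compared against the reading transaction's own snapshot of the global clock, so the versions of two different locations never need to be comparable to each other:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Global epoch clock, incremented by every committing writer.
static GLOBAL_CLOCK: AtomicU64 = AtomicU64::new(0);

// One transactional memory location: a value plus the time of its
// last committed write (the role an EpochLock's time plays).
struct VersionedCell {
    version: AtomicU64,
    value: AtomicU64,
}

impl VersionedCell {
    fn new(value: u64) -> Self {
        VersionedCell {
            version: AtomicU64::new(0),
            value: AtomicU64::new(value),
        }
    }

    // A transactional read is valid only if this location has not been
    // written after the transaction's start snapshot. Note the version
    // is compared against `snapshot`, never against another cell.
    fn read(&self, snapshot: u64) -> Option<u64> {
        let v = self.value.load(Ordering::Acquire);
        if self.version.load(Ordering::Acquire) <= snapshot {
            Some(v)
        } else {
            None // conflict: the transaction must retry
        }
    }

    // A committed write publishes a new value stamped with a fresh time.
    // (A real STM would hold the location's write lock around this.)
    fn write(&self, value: u64) {
        let t = GLOBAL_CLOCK.fetch_add(1, Ordering::AcqRel) + 1;
        self.value.store(value, Ordering::Release);
        self.version.store(t, Ordering::Release);
    }
}
```

Under this scheme, the A-before-B vs. B-before-A ordering of final timestamps is invisible to other transactions: a reader either sees both locations within its snapshot or aborts and retries.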
With only 100k items in the benchmark, all the items could be held in the 8 MB cache, which would give misleading performance numbers, wouldn't it? Could you kindly post a similar benchmark with, say, 10M or 100M items? Thanks!
Hi there; Interesting library!
It seems generally well documented from a user's perspective. However, I found the amount of comments justifying uses of `unsafe { .. }` and `unsafe impl` to be lacking. I think it could help everyone (including future you...) to document why parts of the library are sound.
As an example, I found it unclear why https://docs.rs/swym/0.1.0-preview/src/swym/tcell.rs.html#197-210 is sound. In particular, I conflated `Borrow` with the standard library's trait, and I didn't spot the exclusion of uninhabited types `T` (if you hadn't ruled those out, it might have been unsound).
Best wishes // Centril