
utah-scs / splinter

52.0 14.0 12.0 9.1 MB

A low-latency, extensible, multi-tenant key-value store.

Makefile 1.11% JavaScript 0.56% C++ 4.54% Awk 0.02% R 0.58% Python 3.45% Go 0.49% Rust 75.32% Shell 7.26% C 2.28% Vim Script 4.26% Dockerfile 0.14%
key-value rust distributed-systems kernel-bypass low-latency

splinter's People

Contributors

ankitbhrdwj, chinkulkarni, jmbarzee, manasatm, mazharn, rstutsman, smadamson


splinter's Issues

Sandstorm panics if compiled in debug mode

Steps to reproduce:

  1. Compile in debug mode
  2. Run sudo scripts/run-server and use a client with use_invoke = true to run sudo scripts/run-ycsb

Note: I'm not sure whether other configurations also trigger these errors.

Sandstorm fails with two types of panics present in the log:

ethanr@sandstorm01:~/Sandstorm$ sudo scripts/run-server
INFO:server: Starting up Sandstorm server with config ServerConfig { mac_address: "3c:fd:fe:04:9f:c2", ip_address: "192.168.0.2", udp_port: 0, nic_pci: "0000:04:00.1", client_mac: "3c:fd:fe:04:b0:e2", client_ip: "192.168.0.1", num_tenants: 8, install_addr: "127.0.0.1:7700" }
INFO:server: Populating test data table and extensions...
INFO:server: Finished populating data and extensions
EAL: Detected 32 lcore(s)
EAL: Probing VFIO support...
Devname: "0000:04:00.1"
EAL: PCI device 0000:04:00.1 on NUMA socket 0
EAL:   probe driver: 8086:1572 net_i40e
INFO:server: Successfully added scheduler(TID 43978) with rx,tx,sibling queues (3, 3, 4) to core 13.
INFO:server: Successfully added scheduler(TID 43975) with rx,tx,sibling queues (0, 0, 1) to core 10.
INFO:server: Successfully added scheduler(TID 43979) with rx,tx,sibling queues (4, 4, 5) to core 14.
INFO:server: Successfully added scheduler(TID 43980) with rx,tx,sibling queues (5, 5, 6) to core 15.
INFO:server: Successfully added scheduler(TID 43977) with rx,tx,sibling queues (2, 2, 3) to core 12.
INFO:server: Successfully added scheduler(TID 43982) with rx,tx,sibling queues (7, 7, 0) to core 17.
INFO:server: Successfully added scheduler(TID 43981) with rx,tx,sibling queues (6, 6, 7) to core 16.
INFO:server: Successfully added scheduler(TID 43976) with rx,tx,sibling queues (1, 1, 2) to core 11.
thread 'sched-17' panicked at 'attempt to subtract with overflow', /users/ethanr/Sandstorm/net/framework/src/native/zcsi/mbuf.rs:91:9
thread 'sched-10' panicked at 'attempt to subtract with overflow', /users/ethanr/Sandstorm/net/framework/src/native/zcsi/mbuf.rs:91:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.
WARN:server: Detected misbehaving task 43975 on core 10.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', libcore/result.rs:983:5
  1. attempt to subtract with overflow, from net/framework/src/native/zcsi/mbuf.rs:91
  2. called `Result::unwrap()` on an `Err` value: "SendError(..)"; I believe the unwrap comes from net/framework/src/scheduler/context.rs, line 200
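For context (an editorial sketch, not code from the repo): rustc enables overflow checks in debug builds, so the unsigned subtraction at mbuf.rs:91 panics where a release build silently wraps. A hypothetical `decrement_refcnt` shows how an explicit checked operation removes the difference between build modes:

```rust
// Hypothetical sketch: unsigned subtraction that can underflow panics in
// debug builds ("attempt to subtract with overflow") but wraps in release.
// An explicit checked operation behaves identically in both build modes.
fn decrement_refcnt(refcnt: u16) -> u16 {
    // Saturate at zero instead of underflowing; a real fix would likely
    // want to treat refcnt == 0 here as a logic error.
    refcnt.checked_sub(1).unwrap_or(0)
}
```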

I'm trying to debug the panic issue (#9), and it would be nice to be able to work from a binary built with debug info.

Unify the client code

Client code is scattered across multiple files, and the same logic (request packet formation, request Tx, response Rx, etc.) is duplicated across clients, so every change must be made manually in each client file.

  • Design the client layout, perhaps so that the standard code is shared across all clients and each client only needs to implement a trait.
  • Allow both open-loop and closed-loop load generation from the client.
  • Add a few test cases; maybe automate the YCSB (or another) benchmark to compare performance across commits?
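The trait-per-client idea above could be sketched like this (editorial sketch; all names here are hypothetical, not Splinter's actual types):

```rust
// Hypothetical layout: a shared driver owns packet dispatch, and each client
// implements only a small Workload trait.
trait Workload {
    fn next_request(&mut self) -> Vec<u8>;
    fn handle_response(&mut self, resp: &[u8]);
}

struct Driver<W: Workload> {
    workload: W,
    sent: u64,
    recvd: u64,
}

impl<W: Workload> Driver<W> {
    fn new(workload: W) -> Self {
        Driver { workload, sent: 0, recvd: 0 }
    }

    // One closed-loop iteration: send a request, then wait for its response.
    // An open-loop driver would decouple these two halves behind the same trait.
    fn tick(&mut self) {
        let req = self.workload.next_request();
        self.sent += 1;
        // The shared Tx/Rx path would go here; echo the request back as a stand-in.
        self.workload.handle_response(&req);
        self.recvd += 1;
    }
}

// A trivial example workload exercising the trait.
struct EchoWorkload {
    responses: u64,
}

impl Workload for EchoWorkload {
    fn next_request(&mut self) -> Vec<u8> {
        vec![1, 2, 3]
    }
    fn handle_response(&mut self, resp: &[u8]) {
        assert_eq!(resp, &[1u8, 2, 3][..]);
        self.responses += 1;
    }
}
```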

A panicking extension can sometimes cause a double panic and bring down the db

I will be adding an ext/panic extension soon to make it easier to reproduce this issue. I will edit this issue when I have done so.

For now, this issue can be trivially reproduced by adding a panic!("muahaha") to the main closure of the get extension. The first handful of invocations will panic and be caught, as intended, but then invariably a double panic will occur, which causes the runtime to abort.

This issue has been hard to debug, but here is what I do know:

  • Recall that panic handling has two phases
    1. A call to panic!(), which calls the rust runtime handlers present in frames 0-5 of the backtrace, as well as the panic hook.
    2. An unwinding phase, where the program traces back up through the call stack, running drop functions along the way. (And possibly other things?)
  • The second panic does not appear to happen during the call to panic!() itself. When code panics during the formatting of the panic string, std::panicking::rust_panic_with_hook appears twice in the backtrace; we only see one copy of this function in our backtrace.
  • Therefore, the panic seems to be happening during the unwind phase. From the backtrace we can see that the second panic appears to be in the main generator closure of the extension. A panic during a drop method shows the same shape: as expected, only one set of runtime panic functions appears in the backtrace.
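The two phases can be seen in a minimal sketch (editorial, not code from the repo): catch_unwind handles a single panic cleanly, while a second panic raised from a Drop impl that runs during unwinding is exactly the double-panic case that aborts:

```rust
use std::panic::{self, AssertUnwindSafe};

// Phase 1 plus a clean unwind: the panic hook fires, the stack unwinds back
// to the catch_unwind boundary, and the caller gets an Err.
fn run_extension<F: FnOnce()>(ext: F) -> Result<(), ()> {
    panic::catch_unwind(AssertUnwindSafe(ext)).map_err(|_| ())
}

// The double-panic case: if this Drop ran while a panic was already
// unwinding, a panic! here would abort the process instead of unwinding.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        // panic!("panicking here during unwind => double panic => abort");
    }
}
```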

Below is a backtrace of the double panic.

Here is a more complete log of a full session, including a backtrace of one of the single panics that was successfully caught:
fullPanicIssueLog.txt.

stack backtrace:
   0:     0x7f3d01b67c6b - std::sys::unix::backtrace::tracing::imp::unwind_backtrace::h8458cd77216b6cb4
			       at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1:     0x7f3d01b35b10 - std::sys_common::backtrace::print::hc884ca89c7ab7468
			       at libstd/sys_common/backtrace.rs:71
			       at libstd/sys_common/backtrace.rs:59
   2:     0x7f3d01b5b3bd - std::panicking::default_hook::{{closure}}::h4a3e30c6d4d0cba4
			       at libstd/panicking.rs:206
   3:     0x7f3d01b5b11b - std::panicking::default_hook::hea868ab86a1b7a87
			       at libstd/panicking.rs:222
   4:     0x7f3d01b5b8cf - std::panicking::rust_panic_with_hook::h2568e23a59a493fa
			       at libstd/panicking.rs:400
   5:     0x7f3d01b21065 - std::panicking::begin_panic::hce7e5a88f7ff4fa1
   6:     0x7f3d01b20ee2 - get::init::{{closure}}::h620c422872f2a80f
   7:     0x555e505f5e88 - std::panicking::try::do_call::ha34c11298de1ecc2
   8:     0x555e50709cae - __rust_maybe_catch_panic
			       at libpanic_unwind/lib.rs:102
   9:     0x555e505e2d0d - <db::container::Container as db::task::Task>::run::h29a2f1bd716d50bd
  10:     0x555e505f587f - db::sched::RoundRobin::poll::hd3a9151cfc6bbe36
  11:     0x555e50653927 - e2d2::scheduler::standalone_scheduler::StandaloneScheduler::execute_internal::h4d688f42b578547c
  12:     0x555e5065371a - e2d2::scheduler::standalone_scheduler::StandaloneScheduler::handle_request::h393b287b1d551873
  13:     0x555e50662b31 - std::sys_common::backtrace::__rust_begin_short_backtrace::h7a42e3ab2bac4c68
  14:     0x555e5066111b - std::panicking::try::do_call::h8e2568bebf30af60
  15:     0x555e50709cae - __rust_maybe_catch_panic
			       at libpanic_unwind/lib.rs:102
  16:     0x555e50659d0c - <F as alloc::boxed::FnBox<A>>::call_box::h3ddc12d9236471d3
			       at /checkout/src/liballoc/boxed.rs:645
  17:     0x555e506ff117 - std::sys_common::thread::start_thread::h441a470255b0983b
			       at libstd/sys_common/thread.rs:24
  18:     0x555e506f24d8 - std::sys::unix::thread::Thread::new::thread_start::h8246db0ba3b8ab5d
			       at libstd/sys/unix/thread.rs:90
  19:     0x7f3d1c2836b9 - start_thread
  20:     0x7f3d1bda341c - clone
  21:                0x0 - <unknown>

Server log lines that were interleaved with the backtrace (untangled above):

WARN:server: Detected misbehaving task 84558 on core 10.
WARN:server: Detected misbehaving task 84559 on core 17.
INFO:server: Successfully added scheduler(TID 84560) with rx,tx,sibling queues (0, 0, 1) to core 10.
INFO:server: Successfully added scheduler(TID 84561) with rx,tx,sibling queues (7, 7, 0) to core 17.
thread '<unnamed>' panicked at 'explicit panic', src/lib.rs:46:13  (×4)

Handle stack overflows by untrusted code

Currently, extensions can't cause arbitrary memory accesses using large stack values, but they can exhaust the stack, since they run on the database workers' kernel threads.

Rust's default stack-overflow handler is essentially a signal handler that checks whether the faulting access hit the guard page at the end of the stack; if it did, it crashes the program with a "stack overflow" message. If it didn't, the handler removes itself and retries the access, usually producing a segfault.

This is all memory-safe, but it's a clear denial-of-service vector. Sandstorm needs to override the signal handler and replace it with one that doesn't terminate the database process.

How we unwind the extension call that caused the problem is a separate issue. For now, detecting the case and preventing the crash is key.
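Overriding the SIGSEGV handler is the real fix; as an illustration of a cheaper cooperative stopgap (editorial sketch, all names hypothetical), the trusted runtime could probe a soft stack budget at every extension entry point:

```rust
use std::cell::Cell;

thread_local! {
    // Address of a local recorded near the top of this worker's stack.
    static STACK_BASE: Cell<usize> = Cell::new(0);
}

const SOFT_LIMIT: usize = 512 * 1024; // hypothetical per-extension stack budget

fn record_stack_base() {
    let marker = 0u8;
    STACK_BASE.with(|b| b.set(&marker as *const u8 as usize));
}

// Stacks grow downward, so bytes used = base address - current address.
fn stack_used() -> usize {
    let marker = 0u8;
    let here = &marker as *const u8 as usize;
    STACK_BASE.with(|b| b.get().saturating_sub(here))
}

// The trusted runtime would call this at extension entry points and kill
// (or eventually unwind) the task when the budget is exceeded.
fn check_stack() -> Result<(), &'static str> {
    if stack_used() > SOFT_LIMIT {
        Err("extension exceeded its stack budget")
    } else {
        Ok(())
    }
}

// Deliberately stack-hungry recursion to exercise the check.
fn deep(n: u32) -> Result<u32, &'static str> {
    check_stack()?;
    let pad = [0u8; 8192];
    std::hint::black_box(&pad); // keep the array from being optimized away
    if n == 0 { Ok(0) } else { deep(n - 1) }
}
```

This only catches overflows in code that actually reaches a probe; untrusted code that recurses without calling back into the runtime still needs the guard-page signal handler.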

About the YCSB+T benchmark

Hi, @ankit-iitb .

I have a question about the implementation of YCSB+T.

Where did you get the latest YCSB+T project?

The only place I can find to fork it is on Akon Dey's github.
https://github.com/akon-dey/YCSB

Could you help me implement the YCSB+T benchmark correctly?

In his article he describes the following methods:

"
• doTransactionInsert() creates a new account with an
initial balance captured from doTransactionDelete() operation described below.

• doTransactionRead() reads a set of account balances
determined by the key generator.

• doTransactionScan() scans the database given the start
key and the number of records and fetches them from the
data base.

• doTransactionUpdate() reads a record and add $1 from
the balance captured from delete operations to it and write
it back.

• doTransactionDelete() reads an account record, add the
amount to the captured the balance (capture used in
doTransactionInsert()) and then deletes the record.

• doTransactionReadModifyWrite() reads two records,
subtracts $1 from the one of the two and adds $1 to
the other before writing them both back.
"

In the akon repository where I made my fork, I didn't find implementations of the methods doTransactionInsert(), doTransactionRead(), doTransactionScan(), doTransactionUpdate(), and doTransactionDelete().

I only noticed that the doTransactionReadModifyWrite() method is implemented; it subtracts the value 1 from account A and adds that value to account B.
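For reference, the described doTransactionReadModifyWrite() behavior (move $1 between two accounts) can be sketched as follows (editorial sketch in Rust, not the Java YCSB code):

```rust
use std::collections::HashMap;

// Sketch of the described semantics: read two account records, subtract $1
// from one, add $1 to the other, and write both back. Total balance is
// conserved, which is what YCSB+T's consistency check validates.
fn read_modify_write(db: &mut HashMap<String, i64>, from: &str, to: &str) {
    let a = db[from];
    let b = db[to];
    db.insert(from.to_string(), a - 1);
    db.insert(to.to_string(), b + 1);
}
```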

Could you help me understand this part of the implementation?

Regards,
Caio

how to build env locally?

I want to run Splinter on my local cluster, but I've run into a number of problems.
When I use build.sh in the net directory, the build fails with these errors:

Compiling zcsi-delay v0.1.0 (/root/splinter/net/test/delay-test)
error: the legacy LLVM-style asm! syntax is no longer supported
  --> test/delay-test/src/nf.rs:7:9
   |
 7 |           asm!("nop"
   |           ^--- help: replace with: llvm_asm!
 8 |           :
 9 |           :
10 |           :
11 |           : "volatile");
   | |_____________________^
   |
   = note: consider migrating to the new asm! syntax specified in RFC 2873
   = note: alternatively, switch to llvm_asm! to keep your code working as it is

warning: use of deprecated function time::precise_time_ns: Use OffsetDateTime::now() - OffsetDateTime::unix_epoch() to get a Duration since a known epoch.
--> test/delay-test/src/main.rs:85:29
|
85 | let mut start = time::precise_time_ns() as f64 / CONVERSION_FACTOR;
| ^^^^^^^^^^^^^^^^^^^^^
|
= note: #[warn(deprecated)] on by default

warning: use of deprecated function time::precise_time_ns: Use OffsetDateTime::now() - OffsetDateTime::unix_epoch() to get a Duration since a known epoch.
--> test/delay-test/src/main.rs:90:27
|
90 | let now = time::precise_time_ns() as f64 / CONVERSION_FACTOR;
| ^^^^^^^^^^^^^^^^^^^^^

error[E0593]: closure is expected to take 4 arguments, but it takes 2 arguments
  --> test/delay-test/src/main.rs:70:45
   |
70 |   context.add_pipeline_to_run(Arc::new(
   |  _____________________________^
71 | |     move |p, s: &mut StandaloneScheduler| test(p, s, delay),
   | |     ------------------------------------- takes 2 arguments
72 | | ));
   | |_^ expected closure that takes 4 arguments

error: aborting due to 2 previous errors; 2 warnings emitted

For more information about this error, try rustc --explain E0593.
error: could not compile zcsi-delay

I don't know how to fix this; please help. :)
Or should I run Splinter on CloudLab instead? I don't know how to get a CloudLab account.
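The first error can be addressed by porting the legacy asm! call to the syntax stabilized in Rust 1.59 (the successor to RFC 2873). A hypothetical sketch of what the delay-test nop loop might become (not an actual patch from the repo):

```rust
use std::arch::asm;

// The stabilized asm! syntax: the old GCC-style `:` output/input/clobber
// sections are gone, and asm! is treated as volatile by default.
fn nop_delay(iters: u64) {
    for _ in 0..iters {
        unsafe { asm!("nop") };
    }
}
```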

failed to send request

When I run the YCSB test, the following error occurs:

WARN:splinter::dispatch: Failed to send all packets!
(the warning above repeats dozens of times)
thread 'sched-0' panicked at 'Failed to allocate packet for request!', /root/splinter/db/src/rpc.rs:129:10

My hugepage configuration is 4 × 1 GB plus 10480 × 2 MB pages.
I'd also like to know the environment configuration used in the paper. Please help; thanks!

Track and limit heap allocations for extensions

Right now, extensions can allocate arbitrary amounts of the database process's heap. We need to inject a custom allocator into all extensions so that we can track heap allocations made while running untrusted code.

For now, detecting that an extension is past some allocation limit and forcing a panic is acceptable. Eventually, we'll want to track all allocations, unwind the extension's call stack, and free all heap allocations, but that will require us tackling the larger issue of safely unwinding extension call stacks.

There are some tricky edges to this: for example, if a vector is allocated in an extension call, later operations on that vector could cause it to reallocate. Relatedly, if an extension calls out to the database, we probably shouldn't count heap allocations made in trusted code against the extension (in fact, ideally, no heap allocations would escape the trusted scope at all).
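A minimal sketch of the injection point (editorial; Splinter would need this per extension rather than process-wide, and the limit here is made up): a GlobalAlloc wrapper that counts live bytes and refuses allocations past a cap.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical cap; a real system would track per-extension, not globally.
const HEAP_LIMIT: usize = 64 * 1024 * 1024;

pub static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

pub struct TrackingAlloc;

unsafe impl GlobalAlloc for TrackingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let prev = ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        if prev + layout.size() > HEAP_LIMIT {
            // Undo the bookkeeping and fail the allocation; the caller's
            // allocation-error path (e.g. a forced panic) takes it from there.
            ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
            return std::ptr::null_mut();
        }
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) };
    }
}

#[global_allocator]
static GLOBAL: TrackingAlloc = TrackingAlloc;
```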

Index out of range when running PUSHBACK

I got this error when trying to run the PUSHBACK workload

INFO:splinter::dispatch: Received many responses...
INFO:pushback: PUSHBACK
thread 'sched-7' panicked at 'index 377 out of range for slice of length 278', /rustc/XXXX/src/libcore/slice/mod.rs

Configuration used:

key_size = 30
value_size = 100
num_aggr = 2
order = 1000

Add -L argument to curl in get-dpdk.sh

Adding it here in case I forget.

The DPDK web server now redirects requests for dpdk-(version).tar.gz elsewhere. This results in downloading an invalid file, and the error "gzip: stdin: not in gzip format" appears when trying to untar it.

To let curl follow the redirects, add the -L argument to the curl command in ./net/3rdparty/get-dpdk.sh.

Update the readme file

  • Keep setup information in the main readme
  • Add separate readme files for client and server
  • Maybe add 1 or 2 graphs to show Splinter's gains?
  • Update the OSDI and ATC papers on the main page
  • Readme for the multiclient run is confusing; update it

Add delete_table and delete_key.

DeleteKey functionality would be especially useful as some stored items may no longer be needed. DeleteTable might also be useful in future extensions.
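A sketch of what the two operations could look like on the store interface (editorial; the names and signatures are hypothetical, not Splinter's actual Master/Table API):

```rust
use std::collections::HashMap;

type Table = HashMap<Vec<u8>, Vec<u8>>;

// Hypothetical store: a map of named tables.
#[derive(Default)]
struct Store {
    tables: HashMap<String, Table>,
}

impl Store {
    // Remove one key; returns true if it existed, freeing the stored item.
    fn delete_key(&mut self, table: &str, key: &[u8]) -> bool {
        self.tables
            .get_mut(table)
            .map_or(false, |t| t.remove(key).is_some())
    }

    // Drop an entire table; returns true if it existed.
    fn delete_table(&mut self, table: &str) -> bool {
        self.tables.remove(table).is_some()
    }
}
```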

Cross-Platform solution needed?

Hello,

I am looking for something like this for a cross-platform project that also includes Windows, Linux, and macOS.

Could this be made to work for that purpose?

Thanks

cannot compile ext

I am a Rust beginner. When I try to run the server, I get this error:

thread 'main' panicked at 'Failed to load get() extension.', /root/splinter/db/src/master.rs:598:13
stack backtrace:
0: std::panicking::begin_panic
at /rustc/1c389ffeff814726dec325f0f2b0c99107df2673/library/std/src/panicking.rs:519:12
1: db::master::Master::load_test
at /root/splinter/db/src/master.rs:598:13
2: client::main
at ./src/bin/client/client.rs:632:9
3: core::ops::function::FnOnce::call_once
at /rustc/1c389ffeff814726dec325f0f2b0c99107df2673/library/core/src/ops/function.rs:227:5

I used ext/safe-compile to compile the extension.

~/splinter/ext# ./safe-compile get get
Compiling src/lib.rs for get target into target/get/deps/libget.so
ERROR: All extensions must include #![no_std]; extensions must only use modules exposed through the sandstorm crate.

Please help me; thanks very much. :)
