I decided to try starfish by just building it and running the tests. I tripped over a couple of problems along the way and will describe each of them in detail. In the end I had to upgrade both Rust and SPDK to make it work, but here is the longer story.
The first problem I encountered was the same as reported in ticket #5:
232 | #[allow(clippy::cast_ptr_alignment)]
I think this problem appears only for certain combinations of Rust toolchain and feature crate versions. Upgrading the Rust toolchain to a more recent version solved it.
Another problem was a crash in SPDK when running the test: glibc aborted on a corrupted doubly linked list in malloc, called from spdk_nvmf_subsystem_state_change(). The stack is as follows (shortened version):
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7a17535 in __GI_abort () at abort.c:79
#2 0x00007ffff7a7e516 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7ba2c00 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff7a853aa in malloc_printerr (str=str@entry=0x7ffff7ba4c88 "malloc(): smallbin double linked list corrupted") at malloc.c:5336
#4 0x00007ffff7a89144 in _int_malloc (av=av@entry=0x7ffff0000020, bytes=bytes@entry=32) at malloc.c:3632
#5 0x00007ffff7a8bbc1 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3420
#6 0x00007ffff7cbf930 in spdk_nvmf_subsystem_state_change (subsystem=0x7ffff0428b40, requested_state=requested_state@entry=SPDK_NVMF_SUBSYSTEM_INACTIVE,
cb_fn=cb_fn@entry=0x7ffff7c8aed0 <nvmf_tgt_subsystem_stopped>, cb_arg=cb_arg@entry=0x0) at subsystem.c:492
#7 0x00007ffff7cbfcbd in spdk_nvmf_subsystem_stop (subsystem=<optimized out>, cb_fn=cb_fn@entry=0x7ffff7c8aed0 <nvmf_tgt_subsystem_stopped>,
cb_arg=cb_arg@entry=0x0) at subsystem.c:529
#8 0x00007ffff7c8ac2a in nvmf_tgt_advance_state () at nvmf_tgt.c:300
#9 0x00007ffff7ca54e1 in _spdk_vhost_fini_remove_vdev_cb (vdev=<optimized out>, arg=0x7ffff7c8b130 <spdk_vhost_subsystem_fini_done>) at vhost.c:1377
#10 0x00007ffff7ca55aa in spdk_vhost_call_external_event_foreach (fn=0x7ffff7ca54c0 <_spdk_vhost_fini_remove_vdev_cb>,
arg=0x7ffff7c8b130 <spdk_vhost_subsystem_fini_done>) at vhost.c:1300
#11 0x00007ffff7cb6eda in _spdk_event_queue_run_batch (reactor=0x7ffff000d0c0, reactor=0x7ffff000d0c0) at reactor.c:207
#12 _spdk_reactor_run (arg=0x7ffff000d0c0) at reactor.c:506
#13 0x00007ffff7cb7529 in spdk_reactors_start () at reactor.c:692
#14 0x00007ffff7cb61a3 in spdk_app_start (opts=0x7ffff717d3a0, start_fn=0x55555558f4d0 <spdk_sys::event::AppOpts::start::start_wrapper>,
arg1=0x7ffff717d198, arg2=0x0) at app.c:576
#15 0x000055555558f2ef in spdk_sys::event::AppOpts::start (self=..., f=...) at spdk-sys/src/event.rs:62
#16 0x0000555555585d80 in spdk_sys::ete_test::ete_test () at spdk-sys/src/lib.rs:48
I decided to upgrade to the latest released SPDK version (v18.10), as there were quite a few bug fixes in this area. And indeed, the newer SPDK version does not crash, but it brings a lot of other problems when building. The way shared libraries are built is different: libspdk.so is no longer produced as a by-product of building the static library archives. Instead, a new configure option, --with-shared, must be used to build the shared libs. That builds a set of shared libs plus a linker script, libspdk.so, which combines all of those libs together. For the record, here are the contents of the linker script:
GROUP ( AS_NEEDED ( libspdk_app_rpc.so libspdk_bdev.so libspdk_bdev_aio.so libspdk_bdev_malloc.so libspdk_bdev_null.so libspdk_bdev_nvme.so libspdk_bdev_rpc.so libspdk_bdev_virtio.so libspdk_blob.so libspdk_blob_bdev.so libspdk_blobfs.so libspdk_conf.so libspdk_copy.so libspdk_copy_ioat.so libspdk_event.so libspdk_event_bdev.so libspdk_event_copy.so libspdk_event_iscsi.so libspdk_event_nbd.so libspdk_event_net.so libspdk_event_nvmf.so libspdk_event_scsi.so libspdk_event_vhost.so libspdk_ioat.so libspdk_iscsi.so libspdk_json.so libspdk_jsonrpc.so libspdk_log.so libspdk_log_rpc.so libspdk_lvol.so libspdk_nbd.so libspdk_net.so libspdk_nvme.so libspdk_nvmf.so libspdk_rpc.so libspdk_rte_vhost.so libspdk_scsi.so libspdk_sock.so libspdk_sock_posix.so libspdk_thread.so libspdk_trace.so libspdk_util.so libspdk_vbdev_error.so libspdk_vbdev_gpt.so libspdk_vbdev_lvol.so libspdk_vbdev_passthru.so libspdk_vbdev_raid.so libspdk_vbdev_split.so libspdk_vhost.so libspdk_virtio.so ) )
Due to "AS_NEEDED", only the libs which are actually used by the application are linked to the executable. So far so good, but there is one more problem: the DPDK libs are missing. They are neither built nor installed by default, so I had to explicitly run make CONFIG_RTE_BUILD_SHARED_LIB=y and make install in the dpdk subdirectory of spdk. One more thing: since dpdk is an optional part of spdk, the build.rs script in spdk-sys has to be modified to include the following:
+ println!("cargo:rustc-link-lib=spdk_env_dpdk");
+ println!("cargo:rustc-link-lib=dpdk");
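For context, a minimal sketch of how those two lines could sit in a build script. The library names are the ones from the diff above. The helper functions and the /usr/local/lib search path are assumptions for illustration and are not taken from the actual spdk-sys build.rs.

```rust
// Sketch of the linking part of spdk-sys/build.rs (hypothetical layout).

/// Build the cargo directives that tell rustc where to search and what to link.
/// `search_dir` is an assumption; use wherever `make install` put the libraries.
fn link_directives(search_dir: &str, libs: &[&str]) -> Vec<String> {
    let mut out = vec![format!("cargo:rustc-link-search=native={}", search_dir)];
    for lib in libs {
        // Each entry becomes a -l<lib> argument on the linker command line.
        out.push(format!("cargo:rustc-link-lib={}", lib));
    }
    out
}

/// Print the directives; cargo reads them from the build script's stdout.
fn emit_link_directives() {
    for line in link_directives("/usr/local/lib", &["spdk_env_dpdk", "dpdk"]) {
        println!("{}", line);
    }
}
```

In a real build script, emit_link_directives() would be called from fn main(), next to whatever link directives the script already prints.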
With these changes we get a test binary which no longer fails at the dynamic linking stage when executed. But we are still not done:
Starting SPDK v18.10 / DPDK 18.08.0 initialization...
[ DPDK EAL parameters: hello_blob --no-shconf -c 0x1 --file-prefix=spdk_pid4320 ]
EAL: Detected 3 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Probing VFIO support...
app.c: 602:spdk_app_start: *NOTICE*: Total cores available: 1
reactor.c: 703:spdk_reactors_init: *NOTICE*: Occupied cpu socket mask is 0x1
reactor.c: 724:spdk_reactors_init: *NOTICE*: Event_mempool creation failed on preferred socket 0.
reactor.c: 745:spdk_reactors_init: *ERROR*: spdk_event_mempool creation failed
app.c: 612:spdk_app_start: *ERROR*: Invalid reactor mask.
test ete_test::ete_test ... FAILED
Using a debugger I found out that the root cause of the problem is that librte_mempool_ring.so.1.1 is never loaded. How is that possible? When the test binary is built, only those shared libraries are recorded as dependencies which are actually referenced from the main program or from the libraries it depends on. This library is not referenced directly, but it contains an important piece of code which is executed in the library constructor, i.e. when the lib is loaded (in drivers/mempool/ring/rte_mempool_ring.c):
MEMPOOL_REGISTER_OPS(ops_mp_mc);
MEMPOOL_REGISTER_OPS(ops_sp_sc);
MEMPOOL_REGISTER_OPS(ops_mp_sc);
MEMPOOL_REGISTER_OPS(ops_sp_mc);
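To make the failure mode concrete, here is a toy model of a registry that is only filled by load-time constructors. It is not DPDK code: OpsRegistry, load_ring_driver and create_pool are hypothetical names, and the handler names are only meant to mirror the four registrations above.

```rust
use std::collections::HashSet;

/// Toy stand-in for the table of mempool handlers (not the real data structure).
struct OpsRegistry {
    names: HashSet<String>,
}

impl OpsRegistry {
    fn new() -> Self {
        OpsRegistry { names: HashSet::new() }
    }

    /// Conceptually what one MEMPOOL_REGISTER_OPS constructor does at load time.
    fn register(&mut self, name: &str) {
        self.names.insert(name.to_string());
    }

    /// Creating a pool needs a registered handler; otherwise it fails.
    fn create_pool(&self, ops_name: &str) -> Result<(), String> {
        if self.names.contains(ops_name) {
            Ok(())
        } else {
            Err(format!("no mempool ops named '{}'", ops_name))
        }
    }
}

/// Models loading the ring driver library: its four constructors run.
fn load_ring_driver(reg: &mut OpsRegistry) {
    for &name in ["ring_mp_mc", "ring_sp_sc", "ring_mp_sc", "ring_sp_mc"].iter() {
        reg.register(name);
    }
}
```

If load_ring_driver is never called, every create_pool call fails, which is the analogue of the "spdk_event_mempool creation failed" error in the log.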
This problem is normally easy to fix: the linker can be instructed to always record a library as a dependency, even if nothing references it, with the --no-as-needed linker flag. The default in rustc is the opposite, and I haven't found a way to change it in spite of trying many things. So a dirty workaround for this last problem is to manually preload the library:
LD_PRELOAD=/usr/local/lib/librte_mempool_ring.so.1.1 cargo test --all
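Another option that might avoid the environment variable is to load the library from the test itself before SPDK starts, so that its constructors still run. The sketch below is untested against starfish and rests on assumptions: force_load is a hypothetical helper, dlopen is declared by hand to avoid extra crates, and the flag values are the Linux/glibc ones from dlfcn.h.

```rust
use std::ffi::{c_void, CString};
use std::os::raw::{c_char, c_int};

// Hand-written binding to keep the sketch dependency-free;
// the libc crate provides the same function.
extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
}

// Values from <dlfcn.h> on Linux/glibc (assumption: other platforms differ).
const RTLD_NOW: c_int = 0x2;
const RTLD_GLOBAL: c_int = 0x100;

/// Load a shared library so that its constructors run.
/// Returns false if the path is invalid or the library cannot be loaded.
fn force_load(path: &str) -> bool {
    let c_path = match CString::new(path) {
        Ok(p) => p,
        Err(_) => return false, // path contained an interior NUL byte
    };
    // Safety: c_path is a valid NUL-terminated string that outlives the call.
    let handle = unsafe { dlopen(c_path.as_ptr(), RTLD_NOW | RTLD_GLOBAL) };
    !handle.is_null()
}
```

The idea would be to call force_load("/usr/local/lib/librte_mempool_ring.so.1.1") at the start of the test, before AppOpts::start. The handle is deliberately never closed, because the library has to stay loaded for the lifetime of the process.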
I have a patch which is dirty and incomplete because a proper solution for this last problem is missing, but it might serve as inspiration for people having the same problem: jkryl@c8f6c7d