Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory
Hi,
When I compile the project, it reports the following errors.
/home/v-baotonglu/Sherman/include/Common.h:28:55: error: dereferencing a null pointer in ‘*0’
(char *)&((type *)(0))->field - (char *)((type *)(0))
^
/home/v-baotonglu/Sherman/src/Tree.cpp:84:34: note: in expansion of macro ‘STRUCT_OFFSET’
constexpr int kLeafHdrOffset = STRUCT_OFFSET(LeafPage, hdr);
^~~~~~~~~~~~~
/home/v-baotonglu/Sherman/include/Common.h:28:35: error: conversion of ‘LeafPage*’ null pointer to ‘char*’ is not a constant expression
(char *)&((type *)(0))->field - (char *)((type *)(0))
^~~~~~~~~~~~~~~~~~~~~
/home/v-baotonglu/Sherman/src/Tree.cpp:84:34: note: in expansion of macro ‘STRUCT_OFFSET’
constexpr int kLeafHdrOffset = STRUCT_OFFSET(LeafPage, hdr);
^~~~~~~~~~~~~
/home/v-baotonglu/Sherman/include/Common.h:28:55: error: dereferencing a null pointer in ‘*0’
(char *)&((type *)(0))->field - (char *)((type *)(0))
^
/home/v-baotonglu/Sherman/src/Tree.cpp:85:38: note: in expansion of macro ‘STRUCT_OFFSET’
constexpr int kInternalHdrOffset = STRUCT_OFFSET(InternalPage, hdr);
^~~~~~~~~~~~~
/home/v-baotonglu/Sherman/include/Common.h:28:35: error: conversion of ‘InternalPage*’ null pointer to ‘char*’ is not a constant expression
(char *)&((type *)(0))->field - (char *)((type *)(0))
^~~~~~~~~~~~~~~~~~~~~
/home/v-baotonglu/Sherman/src/Tree.cpp:85:38: note: in expansion of macro ‘STRUCT_OFFSET’
constexpr int kInternalHdrOffset = STRUCT_OFFSET(InternalPage, hdr);
Any suggestions to fix it? Thanks
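A common fix is to replace the null-pointer arithmetic in STRUCT_OFFSET with the standard offsetof macro, which GCC accepts in constant expressions. Below is a minimal sketch with simplified stand-ins for the real structs (the actual LeafPage/Header layout in Sherman has more fields):

```cpp
#include <cstddef>  // offsetof
#include <cstdint>

// Simplified stand-ins for the real Sherman structs (hypothetical layout).
struct Header {
  uint64_t level;
};

struct LeafPage {
  uint32_t crc;
  uint8_t front_version;
  Header hdr;
};

// Unlike the null-pointer-dereference trick, offsetof is a constant
// expression, so it is valid in a constexpr initializer.
constexpr size_t kLeafHdrOffset = offsetof(LeafPage, hdr);

static_assert(kLeafHdrOffset > 0, "hdr should not be the first member here");
```

Redefining the macro as `#define STRUCT_OFFSET(type, field) offsetof(type, field)` in Common.h should keep all call sites unchanged.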
inline Key to_key(uint64_t k) {
  return (CityHash64((char *)&k, sizeof(k)) + 1) % kKeySpace;
}
After hashing, range search becomes impossible; moreover, if key1 and key2 collide, the new {key2, value} will overwrite the original {key1, value}.
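The collision concern is easy to reproduce with a toy version of the mapping. In this sketch, kKeySpace is shrunk to a hypothetical small value so collisions are guaranteed, and std::hash stands in for CityHash64:

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <unordered_map>

using Key = uint64_t;
using Value = uint64_t;

constexpr uint64_t kKeySpace = 16;  // tiny keyspace (hypothetical) to force collisions

// std::hash stands in for CityHash64 from the original code.
inline Key to_key(uint64_t k) {
  return (std::hash<uint64_t>{}(k) + 1) % kKeySpace;
}

// Insert n consecutive keys through to_key and report how many survive.
inline size_t distinct_after_hashing(uint64_t n) {
  std::unordered_map<Key, Value> tree;  // stand-in for the B+tree
  for (uint64_t k = 0; k < n; ++k) tree[to_key(k)] = k;
  return tree.size();
}
```

With 17 inputs mapped into 16 slots, the pigeonhole principle guarantees at least one pair of keys collides, so at least one earlier {key, value} is overwritten.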
Hello, the evaluation section of Sherman tests the performance of in-memory database read and write operations. I wonder whether Sherman can be used as a disaggregated-memory system to replay the network traffic traces of real applications, e.g., TensorFlow and Phoenix.
Can I carry out this experiment with only ConnectX-4 NICs and a 56Gb/s switch? It seems that Sherman requires ConnectX-5 or newer NICs to leverage the on-chip memory.
Hello,
I was testing your system on a single machine and only varying the number of threads in the compute node. For a single thread, it works fine. With multiple threads, the behavior is less predictable and it fails with a segmentation fault while allocating a new internal page. Attached is the stack trace from a failure when running ./benchmark 1 0 4
#0 0x00007ffff7aa48d6 in malloc () from /lib64/libc.so.6
#1 0x0000000000438ad9 in IndexCache::add_to_cache (this=0xd315d90, page=0x7fffba200450)
at /home/x/Sherman/include/IndexCache.h:104
#2 0x00000000004341c1 in Tree::page_search (this=0xd2b7890, page_addr=...,
k=@0x7ffdb10cb628: 3814333, result=..., cxt=0x0, coro_id=0, from_cache=false)
at /home/x/Sherman/src/Tree.cpp:674
#3 0x0000000000433270 in Tree::insert (this=0xd2b7890, k=@0x7ffdb10cb628: 3814333,
v=@0x7ffdb10cb620: 4785902, cxt=0x0, coro_id=0) at /home/x/Sherman/src/Tree.cpp:421
#4 0x00000000004266e0 in thread_run (id=3) at /home/x/Sherman/test/benchmark.cpp:110
Have you experienced this issue, or am I doing something wrong?
Thank you!!
When I tried to reproduce Sherman using the open-source code at https://github.com/ruihong123/Sherman (because I lack the required NIC hardware) and ran ./benchmark:
(1) It first fails in serverEnter(): SERVER_NUM_KEY does not exist on the memcached server. I only got past this by manually adding the key to the memcached server with memcached_set.
(2) It then fails in connectNode: at line 44 of DSMKeeper.cpp, ExchangeMeta *remoteMeta = (ExchangeMeta *)memGet(getK.c_str(), getK.size()); when memGet executes, memcached_get never returns the expected result, i.e., rc == MEMCACHED_SUCCESS is never satisfied, and the program ends up stuck in the while(true) loop. Could you help me resolve this?
Hello! After reading your paper, I understand that there will be RDMA network traffic between compute nodes and memory nodes. But I'm not sure whether there will also be RDMA traffic between memory nodes, or between compute nodes. Alternatively, I plan to capture packets on the switch. Is it feasible to identify the traffic type by analyzing the packet headers? We are looking forward to your reply. Thank you very much!
Hi,
From the operation interface of Sherman (e.g., insert, search), it seems that the coroutine logic is already implemented.
But it is unclear to me how to enable coroutine execution in Sherman, for example, how to fill the parameters CoroContext and coro_id.
Could you give me an example program of coroutine execution in Sherman?
Thanks!
Hello,
I have some questions about Memcached and the cluster setup.
I don't quite understand what role Memcached plays in this project. Does every node in the cluster have to communicate with Memcached? Why don't the nodes connect to each other directly?
Does Memcached need to run on a separate node? Could you also provide instructions on how to build/run Memcached? This part is not very clear in the README.
Regarding the command ./benchmark kNodeCount kReadRatio kThreadCount, what do kNodeCount, kReadRatio, and kThreadCount each mean?
Thanks!
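For what it's worth, other issues in this thread suggest the following reading of the three arguments (not confirmed by the README):

```shell
# ./benchmark kNodeCount kReadRatio kThreadCount
#   kNodeCount   - number of machines participating in the run
#   kReadRatio   - percentage of read operations (100 = read-only workload)
#   kThreadCount - worker threads per compute node
./benchmark 2 100 4   # 2 nodes, read-only workload, 4 threads per node
```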
I don't see the caching of "the highest two levels of nodes (including the root)" mentioned in the paper anywhere in this code. There is only a skiplist index_cache for the level above the leaves.
Was it removed intentionally? I was trying to find this code because, in my opinion, caching the first two levels does not seem compatible with fence-key checks: imagine that a B-link-style split has split the children but has not yet propagated to the upper levels.
BTW, regardless, I like the paper!
class LeafEntry {
public:
  uint8_t f_version : 4;
  Key key;
  Value value;
  uint8_t r_version : 4;

  LeafEntry() {
    f_version = 0;
    r_version = 0;
    value = kValueNull;
    key = 0;
  }
} __attribute__((packed));
The size of LeafEntry is still 18 bytes, not 17. Is there any other solution to save 1 byte?
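For reference, the 18-byte size can be reproduced in isolation: even with __attribute__((packed)), GCC/Clang start the non-bitfield members key and value on byte boundaries, so each 4-bit version field consumes a full byte on its own (a sketch assuming GCC/Clang on x86-64, with Key and Value assumed to be 8-byte integers as in Common.h):

```cpp
#include <cstdint>

using Key = uint64_t;    // assumed 8-byte key
using Value = uint64_t;  // assumed 8-byte value

struct LeafEntry {
  uint8_t f_version : 4;  // byte 0: 4 bits used, 4 bits wasted
  Key key;                // bytes 1-8: non-bitfield, starts on a byte boundary
  Value value;            // bytes 9-16
  uint8_t r_version : 4;  // byte 17: 4 bits used, 4 bits wasted
} __attribute__((packed));

static_assert(sizeof(LeafEntry) == 18,
              "the two half-used version bytes cannot be merged in place");
```

The two version nibbles cannot share a single byte because they deliberately bracket the entry for the consistency check, so saving the extra byte would require stealing 4 bits from key or value instead.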
Hello! The Sherman paper says that "we make the granularity of RDMA_CAS finer (16 bits rather than 64 bits), by applying an infrequently used RDMA verb called masked compare and swap [2], which allows us to select a portion of 64-bit for RDMA_CAS operations". But I didn't find any use of masked compare and swap in the project; instead, the function cas_dm uses a 64-bit compare and swap.
Could you please explain the reason?
Hi.
I'm currently looking into the caching mechanism of Sherman and came up with some questions.
First of all, I have been trying to adjust the amount of index cache used by Sherman and found that it can be done by changing kIndexCacheSize in Sherman/include/Common.h.
Just to verify, is it the right way to adjust the amount of index cache?
In addition, I also have a question regarding the cache eviction code.
In the get_a_random_entry function, k is selected through the following random function.
auto k = rand_r(&seed) % (1000ull * define::MB);
What I have understood is that a random key between 0 and 1000ull * define::MB will be selected.
Then it tries to find a cached page whose key range includes the selected key and evicts that page.
My question is: what if no cached page is left whose range covers any key between 0 and 1000ull * define::MB?
Will it fail to find a page to evict and fall into an infinite loop (since it continuously retries)?
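One defensive pattern, sketched below with a hypothetical page map rather than Sherman's actual skiplist, is to bound the number of sampling retries and fall back to evicting an arbitrary entry:

```cpp
#include <cstdint>
#include <cstdlib>  // rand_r (POSIX)
#include <map>

// Hypothetical cache: maps a page's lowest key to its highest key.
using PageMap = std::map<uint64_t, uint64_t>;

// Try random sampling a few times; if no cached page covers the sampled
// key, evict the first page instead of looping forever.
inline bool evict_one(PageMap &pages, unsigned *seed, uint64_t key_space,
                      int max_retries = 8) {
  if (pages.empty()) return false;
  for (int i = 0; i < max_retries; ++i) {
    uint64_t k = rand_r(seed) % key_space;
    auto it = pages.upper_bound(k);        // first page with lowest > k
    if (it != pages.begin() && (--it)->second >= k) {
      pages.erase(it);                     // found a page covering k
      return true;
    }
  }
  pages.erase(pages.begin());              // fallback: arbitrary eviction
  return true;
}
```

The retry bound keeps the common case (random sampling) cheap while guaranteeing progress even when the sampled key range and the cached pages no longer overlap.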
Hi, thanks for your paper and the open-source repo of Sherman! I have a question after reading the source code: Does the write completion indicate the write is globally visible? For example, based on my understanding, the workflow of Sherman's node split is:
(1) Create the sibling node and copy data
(2) Wait for write completion of (1)
(3) Insert the sibling node pointer into the parent node
(4) Wait for write completion of (3)
To ensure correctness, (1) must be globally visible before (3). When the parent node and sibling node belong to different MS, the RDMA ordering rule can not be applied here due to different QPs. If write completion can not guarantee visibility (the written data may still reside in MS RNIC buffer), (3) may be "reordered" before (1), since (1) may become visible after (3). I think an RDMA READ to the sibling node's MS should be added before (3) in this case, but I didn't find it in the source code (maybe I missed it).
So my question is: does the write completion guarantee visibility? Some papers about RDMA+PM [1] suggest that write completion guarantees neither durability nor visibility. Thanks!
[1] Challenges and Solutions for Fast Remote Persistent Memory Access ("The test works by proving that such an RDMA write may not even be visible in the server’s memory hierarchy")
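The read-after-write flush proposed in the question can be sketched as follows. The stub functions below merely record the operation order; they are hypothetical stand-ins for synchronous RDMA verbs (post + wait for completion), since the point here is the protocol ordering, not the transport:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for synchronous RDMA verbs; they record the
// operation order so the split protocol can be inspected.
struct GlobalAddress { uint16_t node_id; uint64_t offset; };

inline std::vector<std::string> &op_log() {
  static std::vector<std::string> log;
  return log;
}

inline void rdma_write_sync(GlobalAddress, const void *, size_t) {
  op_log().push_back("WRITE");
}
inline void rdma_read_sync(GlobalAddress, void *, size_t) {
  op_log().push_back("READ");
}

// Node split with the read-after-write flush: the READ's completion on the
// same QP implies the preceding WRITE has reached the sibling's memory
// server, so the page is visible before the parent pointer is published.
inline void split_with_flush(GlobalAddress sibling, GlobalAddress parent_slot) {
  uint8_t page[64] = {}, entry[16] = {}, probe;
  rdma_write_sync(sibling, page, sizeof(page));       // (1)+(2) write sibling
  rdma_read_sync(sibling, &probe, sizeof(probe));     // flush: force visibility
  rdma_write_sync(parent_slot, entry, sizeof(entry)); // (3)+(4) publish pointer
}
```

Whether this extra READ is actually required is exactly the question above; the sketch only shows where it would slot into the split workflow if write completion does not imply visibility.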
Hi, thanks for your open source repo of Sherman, we are happy that we can run Sherman on our cluster to learn more about this system.
We encountered a protection error and a deadlock when running multithreaded, multi-machine benchmarks.
We use the following instructions on each machine to run the multithreaded, multi-machine benchmarks, which produce runtime errors. The Memcached server is on a third machine.
./hugepage.sh
./restartMemc.sh
./benchmark 2 100 4
We run the following instructions for the single-thread, single-machine benchmark, which runs well:
./hugepage.sh
./restartMemc.sh
./benchmark 1 100 1
The total number of huge pages in hugepage.sh is modified to 4096 to reduce preparation time; the huge page size is 2MiB.
We were able to run a single-thread benchmark on a single machine, but we encountered the following errors when running multithread and multi-machine tests.
As shown above, the RDMA poll failed due to a protection error, and a deadlock was detected. We are not sure whether this is caused by a wrong hardware configuration or by software bugs. The machine configuration is as follows:
The hardware configuration seems to meet Sherman's requirements (OFED version and firmware version).
The protection error is caused by access to an invalid memory region, but we are not sure whether this comes from software bugs or a wrong hardware setup. The deadlock error is also confusing because the benchmarks are read-only. Can you give us some tips to debug these errors?
Hi, the paper states that only the modified entry is written back to the memory pool to avoid write amplification. However, in the source code, it seems that the whole node is written back to the pool. Which part of the implementation covers the fine-grained write-back? Looking forward to your reply.
Hi,
I am trying to understand the code of initRDMAConnection, but it seems complex. Could you explain the difference between RemoteConnection, ThreadConnection, and DirectoryConnection?
Also, what do MAX_APP_THREAD and NR_DIRECTORY mean?
Thanks.
Hello, I'd like to ask: when setting up a multi-node service, do all nodes need to connect to a single memcached instance?
For example, if I want to set up a distributed Sherman service on nodes 10, 12, and 14, do I need all of them to connect to a memcached instance at the same IP address?
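Based on the memcached.conf format shown in another issue in this thread (first line IP, second line port), every node would point at the same shared instance, e.g.:

```shell
# memcached.conf on every node: IP of the single shared memcached
# instance on the first line, port on the second (format taken from
# another issue in this thread; the IP shown there is reused here).
printf '%s\n' 10.150.240.28 11211 > memcached.conf
cat memcached.conf
```

The memcached server itself is then started (or restarted via restartMemc.sh) only on that one machine.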
Hello! I wonder why range query is not implemented. Is it because supporting version consistency during a range query may take a long time? I would also like to know why you don't use sibling pointers when doing range queries.
Hi,
Your paper mentions that you evaluated with the YCSB benchmark, but benchmark.cpp seems to use custom generated data. Was it a typo in the paper, or did I misunderstand something?
Thanks!
Hi,
I have three machines whose IP addresses are 10.150.240.28, 10.150.240.30, and 10.150.240.33, respectively. I restrict the on-chip memory size to 64KB.
I run the Memcached instance on machine 10.150.240.28, so the memcached.conf content is as follows:
10.150.240.28
11211
Then I run the command ./benchmark 3 100 10 on each machine, but I get the following errors on each machine.
10.150.240.30
kNodeCount 3, kReadRatio 100, kThreadCount 10
shared memory size: 8GB, 0x7f0240000000
cache size: 1GB
Machine NR: 3
NIC Device Memory is 128KB
I am servers 0 [255.255.255.255]
I connect server 1
failed to modify QP state to RTR
failed to modify QP state to RTS
failed to modify QP state to RTR
failed to modify QP state to RTS
failed to modify QP state to RTR
failed to modify QP state to RTS
10.150.240.28 :
kNodeCount 3, kReadRatio 100, kThreadCount 10
shared memory size: 8GB, 0x7f00c0000000
cache size: 1GB
Machine NR: 3
NIC Device Memory is 128KB
I am servers 1 [255.255.255.255]
I connect server 0
failed to modify QP state to RTR
failed to modify QP state to RTS
failed to modify QP state to RTR
failed to modify QP state to RTS
10.150.240.33:
kNodeCount 3, kReadRatio 100, kThreadCount 10
shared memory size: 8GB, 0x7f9c02000000
cache size: 1GB
Machine NR: 3
NIC Device Memory is 128KB
I am servers 2 [255.255.255.255]
I connect server 0
I connect server 1
failed to modify QP state to RTR
failed to modify QP state to RTS
failed to modify QP state to RTR
failed to modify QP state to RTS
Any suggestions on the cause of this issue and how to solve it? Thanks.