Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory
Hi,
When I compile the project, it reports the following errors.
/home/v-baotonglu/Sherman/include/Common.h:28:55: error: dereferencing a null pointer in ‘*0’
(char *)&((type *)(0))->field - (char *)((type *)(0))
^
/home/v-baotonglu/Sherman/src/Tree.cpp:84:34: note: in expansion of macro ‘STRUCT_OFFSET’
constexpr int kLeafHdrOffset = STRUCT_OFFSET(LeafPage, hdr);
^~~~~~~~~~~~~
/home/v-baotonglu/Sherman/include/Common.h:28:35: error: conversion of ‘LeafPage*’ null pointer to ‘char*’ is not a constant expression
(char *)&((type *)(0))->field - (char *)((type *)(0))
^~~~~~~~~~~~~~~~~~~~~
/home/v-baotonglu/Sherman/src/Tree.cpp:84:34: note: in expansion of macro ‘STRUCT_OFFSET’
constexpr int kLeafHdrOffset = STRUCT_OFFSET(LeafPage, hdr);
^~~~~~~~~~~~~
/home/v-baotonglu/Sherman/include/Common.h:28:55: error: dereferencing a null pointer in ‘*0’
(char *)&((type *)(0))->field - (char *)((type *)(0))
^
/home/v-baotonglu/Sherman/src/Tree.cpp:85:38: note: in expansion of macro ‘STRUCT_OFFSET’
constexpr int kInternalHdrOffset = STRUCT_OFFSET(InternalPage, hdr);
^~~~~~~~~~~~~
/home/v-baotonglu/Sherman/include/Common.h:28:35: error: conversion of ‘InternalPage*’ null pointer to ‘char*’ is not a constant expression
(char *)&((type *)(0))->field - (char *)((type *)(0))
^~~~~~~~~~~~~~~~~~~~~
/home/v-baotonglu/Sherman/src/Tree.cpp:85:38: note: in expansion of macro ‘STRUCT_OFFSET’
constexpr int kInternalHdrOffset = STRUCT_OFFSET(InternalPage, hdr);
Any suggestions to fix it? Thanks
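A common fix is to replace the null-pointer arithmetic in STRUCT_OFFSET with the standard offsetof macro, which GCC accepts in constant expressions. Below is a minimal sketch with simplified stand-ins for the real structs (the actual LeafPage/Header layout in Sherman has more fields):

```cpp
#include <cstddef>  // offsetof
#include <cstdint>

// Simplified stand-ins for the real Sherman structs (hypothetical layout).
struct Header {
  uint64_t level;
};

struct LeafPage {
  uint32_t crc;
  uint8_t front_version;
  Header hdr;
};

// Unlike the null-pointer-dereference trick, offsetof is a constant
// expression, so it is valid in a constexpr initializer.
constexpr size_t kLeafHdrOffset = offsetof(LeafPage, hdr);

static_assert(kLeafHdrOffset > 0, "hdr should not be the first member here");
```

Redefining the macro as `#define STRUCT_OFFSET(type, field) offsetof(type, field)` in Common.h should keep all call sites unchanged.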
inline Key to_key(uint64_t k) {
  return (CityHash64((char *)&k, sizeof(k)) + 1) % kKeySpace;
}
After hashing, range search becomes impossible; moreover, if key1 and key2 collide, the new {key2, value} will overwrite the original {key1, value}.
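The collision concern is easy to reproduce with a toy version of the mapping. In this sketch, kKeySpace is shrunk to a hypothetical small value so collisions are guaranteed, and std::hash stands in for CityHash64:

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>
#include <unordered_map>

using Key = uint64_t;
using Value = uint64_t;

constexpr uint64_t kKeySpace = 16;  // tiny keyspace (hypothetical) to force collisions

// std::hash stands in for CityHash64 from the original code.
inline Key to_key(uint64_t k) {
  return (std::hash<uint64_t>{}(k) + 1) % kKeySpace;
}

// Insert n consecutive keys through to_key and report how many survive.
inline size_t distinct_after_hashing(uint64_t n) {
  std::unordered_map<Key, Value> tree;  // stand-in for the B+tree
  for (uint64_t k = 0; k < n; ++k) tree[to_key(k)] = k;
  return tree.size();
}
```

With 17 inputs mapped into 16 slots, the pigeonhole principle guarantees at least one pair of keys collides, so at least one earlier {key, value} is overwritten.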
Hello, the evaluation section of Sherman tests the performance of in-memory database read and write operations. I wonder whether Sherman can be used as a disaggregated-memory system to replay the network traffic traces of real applications, e.g., TensorFlow and Phoenix.
Can I carry out this experiment with only ConnectX-4 NICs and a 56Gb/s switch? It seems that Sherman requires ConnectX-5 or newer NICs to leverage the on-chip memory.
Hello,
I was testing your system on a single machine and only varying the number of threads in the compute node. For a single thread, it works fine. With multiple threads, the behavior is less predictable and it fails with a segmentation fault while allocating a new internal page. Attached is the stack trace from a failure when running ./benchmark 1 0 4
#0 0x00007ffff7aa48d6 in malloc () from /lib64/libc.so.6
#1 0x0000000000438ad9 in IndexCache::add_to_cache (this=0xd315d90, page=0x7fffba200450)
at /home/x/Sherman/include/IndexCache.h:104
#2 0x00000000004341c1 in Tree::page_search (this=0xd2b7890, page_addr=...,
k=@0x7ffdb10cb628: 3814333, result=..., cxt=0x0, coro_id=0, from_cache=false)
at /home/x/Sherman/src/Tree.cpp:674
#3 0x0000000000433270 in Tree::insert (this=0xd2b7890, k=@0x7ffdb10cb628: 3814333,
v=@0x7ffdb10cb620: 4785902, cxt=0x0, coro_id=0) at /home/x/Sherman/src/Tree.cpp:421
#4 0x00000000004266e0 in thread_run (id=3) at /home/x/Sherman/test/benchmark.cpp:110
Have you experienced this issue, or am I doing something wrong?
Thank you!!
When I tried to reproduce Sherman using the open-source code at https://github.com/ruihong123/Sherman (because I lack the required NIC hardware) and ran ./benchmark:
(1) It first fails in serverEnter(): SERVER_NUM_KEY does not exist on the memcached server. I only got past this by manually adding the key to the memcached server with memcached_set.
(2) It then fails in connectNode: at line 44 of DSMKeeper.cpp, ExchangeMeta *remoteMeta = (ExchangeMeta *)memGet(getK.c_str(), getK.size()); when memGet executes, memcached_get never returns the expected result, i.e., rc == MEMCACHED_SUCCESS is never satisfied, and the program ends up stuck in the while(true) loop. Could you help me resolve this?
Hello! After reading your paper, I understand that there will be RDMA network traffic between compute nodes and memory nodes. But I'm not sure whether there will also be RDMA traffic between memory nodes, or between compute nodes. Alternatively, I plan to capture packets on the switch. Is it feasible to identify the traffic type by analyzing the packet headers? We are looking forward to your reply. Thank you very much!
Hi,
From the operation interface of Sherman (e.g., insert, search), it seems that the coroutine logic is already implemented.
But it is unclear to me how to enable coroutine execution in Sherman, for example, how to fill the parameters CoroContext and coro_id.
Could you give me an example program of coroutine execution in Sherman?
Thanks!
Hello,
I have some questions about Memcached and the cluster setup.
I don't quite understand what role Memcached plays in this project. Does every node in the cluster have to communicate with Memcached? Why don't the nodes connect to each other directly?
Does Memcached need to run on a separate node? Could you also provide instructions on how to build/run Memcached? This part is not very clear in the README.
Regarding the command ./benchmark kNodeCount kReadRatio kThreadCount, what do kNodeCount, kReadRatio, and kThreadCount each mean?
Thanks!
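For what it's worth, other issues in this thread suggest the following reading of the three arguments (not confirmed by the README):

```shell
# ./benchmark kNodeCount kReadRatio kThreadCount
#   kNodeCount   - number of machines participating in the run
#   kReadRatio   - percentage of read operations (100 = read-only workload)
#   kThreadCount - worker threads per compute node
./benchmark 2 100 4   # 2 nodes, read-only workload, 4 threads per node
```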
I don't see the caching of "the highest two levels of nodes (including the root)" mentioned in the paper anywhere in this code. There is only a skiplist index_cache for the level above the leaves.
Was it removed intentionally? I was trying to find this code because, in my opinion, caching the first two levels does not seem compatible with fence-key checks: imagine that a B-link-style split has split the children but has not yet propagated to the upper levels.
BTW, regardless, I like the paper!
class LeafEntry {
public:
  uint8_t f_version : 4;
  Key key;
  Value value;
  uint8_t r_version : 4;

  LeafEntry() {
    f_version = 0;
    r_version = 0;
    value = kValueNull;
    key = 0;
  }
} __attribute__((packed));
The size of LeafEntry is still 18 bytes, not 17. Is there any other solution to save 1 byte?
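For reference, the 18-byte size can be reproduced in isolation: even with __attribute__((packed)), GCC/Clang start the non-bitfield members key and value on byte boundaries, so each 4-bit version field consumes a full byte on its own (a sketch assuming GCC/Clang on x86-64, with Key and Value assumed to be 8-byte integers as in Common.h):

```cpp
#include <cstdint>

using Key = uint64_t;    // assumed 8-byte key
using Value = uint64_t;  // assumed 8-byte value

struct LeafEntry {
  uint8_t f_version : 4;  // byte 0: 4 bits used, 4 bits wasted
  Key key;                // bytes 1-8: non-bitfield, starts on a byte boundary
  Value value;            // bytes 9-16
  uint8_t r_version : 4;  // byte 17: 4 bits used, 4 bits wasted
} __attribute__((packed));

static_assert(sizeof(LeafEntry) == 18,
              "the two half-used version bytes cannot be merged in place");
```

The two version nibbles cannot share a single byte because they deliberately bracket the entry for the consistency check, so saving the extra byte would require stealing 4 bits from key or value instead.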
Hello! The Sherman paper says that "we make the granularity of RDMA_CAS finer (16 bits rather than 64 bits), by applying an infrequently used RDMA verb called masked compare and swap [2], which allows us to select a portion of 64-bit for RDMA_CAS operations". But I didn't find any use of masked compare and swap in the project; instead, the function cas_dm uses a 64-bit compare and swap.
Could you please explain the reason?
Hi.
I'm currently looking into the caching mechanism of Sherman and came up with some questions.
First of all, I have been trying to adjust the amount of index cache used by Sherman and found that it can be done by changing kIndexCacheSize in Sherman/include/Common.h.
Just to verify, is it the right way to adjust the amount of index cache?
In addition, I also have a question regarding the cache eviction code.
In the get_a_random_entry function, k is selected through the following random function.
auto k = rand_r(&seed) % (1000ull * define::MB);
What I have understood is that a random key between 0 and 1000ull * define::MB will be selected.
Then it tries to find a cached page whose key range includes the selected key and evicts that page.
My question is: what if no cached page is left whose range covers any key between 0 and 1000ull * define::MB?
Will it fail to find a page to evict and fall into an infinite loop (since it continuously retries)?
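One defensive pattern, sketched below with a hypothetical page map rather than Sherman's actual skiplist, is to bound the number of sampling retries and fall back to evicting an arbitrary entry:

```cpp
#include <cstdint>
#include <cstdlib>  // rand_r (POSIX)
#include <map>

// Hypothetical cache: maps a page's lowest key to its highest key.
using PageMap = std::map<uint64_t, uint64_t>;

// Try random sampling a few times; if no cached page covers the sampled
// key, evict the first page instead of looping forever.
inline bool evict_one(PageMap &pages, unsigned *seed, uint64_t key_space,
                      int max_retries = 8) {
  if (pages.empty()) return false;
  for (int i = 0; i < max_retries; ++i) {
    uint64_t k = rand_r(seed) % key_space;
    auto it = pages.upper_bound(k);        // first page with lowest > k
    if (it != pages.begin() && (--it)->second >= k) {
      pages.erase(it);                     // found a page covering k
      return true;
    }
  }
  pages.erase(pages.begin());              // fallback: arbitrary eviction
  return true;
}
```

The retry bound keeps the common case (random sampling) cheap while guaranteeing progress even when the sampled key range and the cached pages no longer overlap.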
Hi, thanks for your paper and the open-source repo of Sherman! I have a question after reading the source code: Does the write completion indicate the write is globally visible? For example, based on my understanding, the workflow of Sherman's node split is:
(1) Create the sibling node and copy data
(2) Wait for write completion of (1)
(3) Insert the sibling node pointer into the parent node
(4) Wait for write completion of (3)
To ensure correctness, (1) must be globally visible before (3). When the parent node and sibling node belong to different MS, the RDMA ordering rule can not be applied here due to different QPs. If write completion can not guarantee visibility (the written data may still reside in MS RNIC buffer), (3) may be "reordered" before (1), since (1) may become visible after (3). I think an RDMA READ to the sibling node's MS should be added before (3) in this case, but I didn't find it in the source code (maybe I missed it).
So my question is: does the write completion guarantee visibility? Some papers about RDMA+PM [1] suggest that write completion guarantees neither durability nor visibility. Thanks!
[1] Challenges and Solutions for Fast Remote Persistent Memory Access ("The test works by proving that such an RDMA write may not even be visible in the server’s memory hierarchy")
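The read-after-write flush proposed in the question can be sketched as follows. The stub functions below merely record the operation order; they are hypothetical stand-ins for synchronous RDMA verbs (post + wait for completion), since the point here is the protocol ordering, not the transport:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for synchronous RDMA verbs; they record the
// operation order so the split protocol can be inspected.
struct GlobalAddress { uint16_t node_id; uint64_t offset; };

inline std::vector<std::string> &op_log() {
  static std::vector<std::string> log;
  return log;
}

inline void rdma_write_sync(GlobalAddress, const void *, size_t) {
  op_log().push_back("WRITE");
}
inline void rdma_read_sync(GlobalAddress, void *, size_t) {
  op_log().push_back("READ");
}

// Node split with the read-after-write flush: the READ's completion on the
// same QP implies the preceding WRITE has reached the sibling's memory
// server, so the page is visible before the parent pointer is published.
inline void split_with_flush(GlobalAddress sibling, GlobalAddress parent_slot) {
  uint8_t page[64] = {}, entry[16] = {}, probe;
  rdma_write_sync(sibling, page, sizeof(page));       // (1)+(2) write sibling
  rdma_read_sync(sibling, &probe, sizeof(probe));     // flush: force visibility
  rdma_write_sync(parent_slot, entry, sizeof(entry)); // (3)+(4) publish pointer
}
```

Whether this extra READ is actually required is exactly the question above; the sketch only shows where it would slot into the split workflow if write completion does not imply visibility.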
Hi, thanks for your open source repo of Sherman, we are happy that we can run Sherman on our cluster to learn more about this system.
We encountered a protection error and a deadlock when running multithreaded, multi-machine benchmarks.
We use the following instructions on each machine to run the multithreaded, multi-machine benchmarks, which produce runtime errors. The Memcached server is on a third machine.
./hugepage.sh
./restartMemc.sh
./benchmark 2 100 4
We run the following instructions for the single-thread, single-machine benchmark, which runs well:
./hugepage.sh
./restartMemc.sh
./benchmark 1 100 1
The total number of huge pages in hugepage.sh is modified to 4096 to reduce preparation time; the huge page size is 2MiB.
We were able to run a single-thread benchmark on a single machine, but we encountered the following errors when running multithread and multi-machine tests.
As shown above, the RDMA poll failed due to a protection error, and a deadlock was detected. We are not sure whether this is caused by a wrong hardware configuration or by software bugs. The machine configuration is as follows:
The hardware configuration seems to meet Sherman's requirements (OFED version and firmware version).
The protection error is caused by access to an invalid memory region, but we are not sure whether this comes from software bugs or a wrong hardware setup. The deadlock error is also confusing because the benchmarks are read-only. Can you give us some tips to debug these errors?
Hi, the paper states that only the modified entry is written back to the memory pool to avoid write amplification. However, in the source code, it seems that the whole node is written back to the pool. Which part of the implementation covers the fine-grained write-back? Looking forward to your reply.
Hi,
I am trying to understand the code of initRDMAConnection, but it seems complex. Could you explain the difference between RemoteConnection, ThreadConnection, and DirectoryConnection?
Also, what do MAX_APP_THREAD and NR_DIRECTORY mean?
Thanks.
Hello, I'd like to ask: when setting up a multi-node service, do all nodes need to connect to a single memcached instance?
For example, if I want to set up a distributed Sherman service on nodes 10, 12, and 14, do I need all of them to connect to a memcached instance at the same IP address?
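Based on the memcached.conf format shown in another issue in this thread (first line IP, second line port), every node would point at the same shared instance, e.g.:

```shell
# memcached.conf on every node: IP of the single shared memcached
# instance on the first line, port on the second (format taken from
# another issue in this thread; the IP shown there is reused here).
printf '%s\n' 10.150.240.28 11211 > memcached.conf
cat memcached.conf
```

The memcached server itself is then started (or restarted via restartMemc.sh) only on that one machine.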
Hello! I wonder why range query is not implemented. Is it because supporting version consistency during a range query may take a long time? I would also like to know why you don't use sibling pointers when doing range queries.
Hi,
Your paper mentions that you evaluated with the YCSB benchmark, but benchmark.cpp seems to use custom generated data. Was it a typo in the paper, or did I misunderstand something?
Thanks!
Hi,
I have three machines whose IP addresses are 10.150.240.28, 10.150.240.30, and 10.150.240.33, respectively. I restrict the on-chip memory size to 64KB.
I run the Memcached instance on machine 10.150.240.28, so the memcached.conf content is as follows:
10.150.240.28
11211
Then I run the command ./benchmark 3 100 10 on each machine, but I get the following errors on each machine.
10.150.240.30
kNodeCount 3, kReadRatio 100, kThreadCount 10
shared memory size: 8GB, 0x7f0240000000
cache size: 1GB
Machine NR: 3
NIC Device Memory is 128KB
I am servers 0 [255.255.255.255]
I connect server 1
failed to modify QP state to RTR
failed to modify QP state to RTS
failed to modify QP state to RTR
failed to modify QP state to RTS
failed to modify QP state to RTR
failed to modify QP state to RTS
10.150.240.28 :
kNodeCount 3, kReadRatio 100, kThreadCount 10
shared memory size: 8GB, 0x7f00c0000000
cache size: 1GB
Machine NR: 3
NIC Device Memory is 128KB
I am servers 1 [255.255.255.255]
I connect server 0
failed to modify QP state to RTR
failed to modify QP state to RTS
failed to modify QP state to RTR
failed to modify QP state to RTS
10.150.240.33:
kNodeCount 3, kReadRatio 100, kThreadCount 10
shared memory size: 8GB, 0x7f9c02000000
cache size: 1GB
Machine NR: 3
NIC Device Memory is 128KB
I am servers 2 [255.255.255.255]
I connect server 0
I connect server 1
failed to modify QP state to RTR
failed to modify QP state to RTS
failed to modify QP state to RTR
failed to modify QP state to RTS
Any suggestions on the cause of this issue and how to solve it? Thanks.