00pauln00 / niova-core
A recent test containing ~130 million raft entry indexes is showing variable startup scan speed, sometimes as low as several thousand entries per second. Warming the page cache does show a large improvement and makes the situation more tenable; however, it's not a great solution.
There are a few considerations regarding the scanning of entries during startup:
- RAFT_INSTANCE_STORE_ROCKSDB_PERSISTENT_APP guarantees that all application state has been persisted to the backend per raft state machine "apply" operation.
- RAFT_INSTANCE_STORE_ROCKSDB_PERSISTENT_APP persists the last-applied-cumulative-crc value, making it possible to verify the entire stream of entries against this CRC.
- RAFT_INSTANCE_STORE_ROCKSDB_PERSISTENT_APP will allow for state-machine-independent garbage collection of old raft log entries.

What should be done in the case of RAFT_INSTANCE_STORE_ROCKSDB_PERSISTENT_APP?
I believe the scan may be removed for now and partially reinstated once the feature for state machine independent garbage collection is implemented. This feature will make it possible to bound the number of raft log entries to some limit which is reasonable for scanning.
writeKV returns errno instead of rc, but rc seems to hold the error number.
This is the output from errno when an error is encountered:
ERRO[0000] PMDB put error: PmdbObjPut(): 0
This is the change I applied to return rc:
```diff
@@ -229,8 +229,8 @@ func (obj *PmdbClientObj) writeKV(rncui string, key *C.char,
 	rc := C.PmdbObjPut(obj.pmdb, obj_id, key, c_key_len, &obj_stat)
 	if rc != 0 {
-		var errno syscall.Errno
-		return fmt.Errorf("PmdbObjPut(): %d", errno)
+		//var errno syscall.Errno
+		return fmt.Errorf("PmdbObjPut(): %d", rc)
 	}
 	return nil
```
This is the output after the change, which allowed me to see the correct error number (-107, i.e., -ENOTCONN):
ERRO[0000] PMDB put error: PmdbObjPut(): -107
Similar to the ctl-interface command history, the pmdb server should present a history of recently completed operations, including information about request origination (i.e., raft leader, term, etc.).
The error strings returned by the RocksDB C API are allocated using strdup(). Users of the API must be checked to ensure that they pass in a NULL'd pointer address (i.e., initialized on the stack) and that free() is called in the error case.
```cpp
static bool SaveError(char** errptr, const Status& s) {
  assert(errptr != nullptr);
  if (s.ok()) {
    return false;
  } else if (*errptr == nullptr) {
    *errptr = strdup(s.ToString().c_str());
  } else {
    // TODO(sanjay): Merge with existing error?
    // This is a bug if *errptr is not created by malloc()
    free(*errptr);
    *errptr = strdup(s.ToString().c_str());
  }
  return true;
}
```
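A minimal sketch of the caller pattern this implies, assuming the standard RocksDB C API (open_db_example is a hypothetical helper, not niova-core code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <rocksdb/c.h>

/* Hypothetical helper showing the contract SaveError() imposes on callers:
 * the error pointer must start as NULL, and must be free()'d on error. */
static void
open_db_example(const char *path)
{
    rocksdb_options_t *opts = rocksdb_options_create();
    char *err = NULL; /* NULL-initialized so SaveError() takes the strdup() path */

    rocksdb_t *db = rocksdb_open(opts, path, &err);
    if (err != NULL)
    {
        fprintf(stderr, "rocksdb_open(): %s\n", err);
        free(err);  /* error strings are malloc()'d via strdup() */
        err = NULL; /* reset before passing to another API call */
    }
    else
    {
        rocksdb_close(db);
    }

    rocksdb_options_destroy(opts);
}
```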
Running the control plane range query recipe shows some write failures with a "server not available" error, even though the proxy is up and running and the cluster is in a good state.
The proxy doesn't have any write errors either.
That means the key was never sent to the proxy for the write operation.
NCPC is wrongly declaring the proxy as dead.
"nisd.075d6832-7df9-472c-8445-cd2979ec7e4d.e40ec2b0-91d9-4d58-a82a-275a9c824a42": {
"Status": 1,
"Value": "(Service discovery) Server not available"
},
In the non-coalesce case, both the raft block and the pmdb_obj are immediately written into RocksDB, albeit the write may be asynchronous. This write is done in raft write context, which is serialized. Therefore, a second write arriving from a different client for the same rncui
would be detected as invalid due to conflicting with a pending write to the same seqno. However, if the pmdb_obj is not written to RocksDB, because it is pending in the coalesce buffer, the second write may enter when it would otherwise have been denied.
Without changing the code too much (i.e. modifying filling and storage of the coalesce buffer OR writing the pmdb_objects synchronously and ahead of the raft block), it may be possible to detect the scenario in the Apply
phase and return an error to all but one of the conflicting writes (the write whose information is in the pmdb_object would see OK).
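A minimal sketch of what that Apply-phase check could look like, with illustrative types and names (the real pmdb_object layout and request handle in pumice_db.c differ):

```c
#include <errno.h>
#include <stdint.h>
#include <uuid/uuid.h>

/* Illustrative stand-ins only; not the actual niova-core structures. */
struct pmdb_object_sketch
{
    int64_t pobj_commit_seqno;   /* last applied seqno for this rncui */
    uuid_t  pobj_pending_client; /* client whose write owns the next seqno */
};

struct pmdb_apply_req_sketch
{
    int64_t par_seqno;
    uuid_t  par_client_uuid;
};

/* In the Apply phase, let through only the write recorded in the
 * pmdb_object; the other coalesced writes to the same rncui/seqno,
 * which slipped past the RocksDB check, get an error. */
static int
pmdb_apply_resolve_conflict_sketch(const struct pmdb_object_sketch *obj,
                                   const struct pmdb_apply_req_sketch *req)
{
    if (obj->pobj_commit_seqno + 1 != req->par_seqno)
        return -EBADE; /* conflicting or stale seqno */

    if (uuid_compare(obj->pobj_pending_client, req->par_client_uuid) != 0)
        return -EPERM; /* a different client's write owns this seqno */

    return 0; /* the winning write applies normally */
}
```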
f97565a introduces a minor leak where a compiled regex is not freed; a sketch of the correct regcomp()/regfree() pairing follows the sanitizer output below.
<574.109560627:warn:raft-net-test:raft_net_init@2357> enter
<574.110220321:warn:raft-net-test:raft_net_init@2367> exit
<574.110609917:warn:raft-net-test:raft_net_client_user_id_parse@2275> 1a636bd0-d27d-11ea-8cad-90324b2d1e89:2341523123:32452300123:1:0
=================================================================
==21243==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 256 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a4054f5 in regcomp (/lib/x86_64-linux-gnu/libc.so.6+0x1024f5)
Direct leak of 224 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7f338a404ab4 (/lib/x86_64-linux-gnu/libc.so.6+0x101ab4)
Indirect leak of 2048 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743dc6 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10ddc6)
#1 0x7f338a403678 (/lib/x86_64-linux-gnu/libc.so.6+0x100678)
Indirect leak of 1888 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a403b5b (/lib/x86_64-linux-gnu/libc.so.6+0x100b5b)
Indirect leak of 1888 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a403641 (/lib/x86_64-linux-gnu/libc.so.6+0x100641)
Indirect leak of 1888 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a403b6b (/lib/x86_64-linux-gnu/libc.so.6+0x100b6b)
Indirect leak of 1296 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a404476 (/lib/x86_64-linux-gnu/libc.so.6+0x101476)
Indirect leak of 952 byte(s) in 81 object(s) allocated from:
#0 0x7f338a743ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7f338a404556 (/lib/x86_64-linux-gnu/libc.so.6+0x101556)
Indirect leak of 472 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a403b3a (/lib/x86_64-linux-gnu/libc.so.6+0x100b3a)
Indirect leak of 316 byte(s) in 66 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a3f7c69 (/lib/x86_64-linux-gnu/libc.so.6+0xf4c69)
Indirect leak of 256 byte(s) in 8 object(s) allocated from:
#0 0x7f338a743dc6 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10ddc6)
#1 0x7f338a4000ee (/lib/x86_64-linux-gnu/libc.so.6+0xfd0ee)
Indirect leak of 88 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743dc6 in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10ddc6)
#1 0x7f338a3f9d93 (/lib/x86_64-linux-gnu/libc.so.6+0xf6d93)
Indirect leak of 64 byte(s) in 8 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a4042fa (/lib/x86_64-linux-gnu/libc.so.6+0x1012fa)
Indirect leak of 64 byte(s) in 16 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a40428a (/lib/x86_64-linux-gnu/libc.so.6+0x10128a)
Indirect leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7f338a3f41f4 (/lib/x86_64-linux-gnu/libc.so.6+0xf11f4)
#2 0x6020000053cf ()
Indirect leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7f338a3f41f4 (/lib/x86_64-linux-gnu/libc.so.6+0xf11f4)
#2 0x60200000568f ()
Indirect leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7f338a3f41f4 (/lib/x86_64-linux-gnu/libc.so.6+0xf11f4)
#2 0x60200000554f ()
Indirect leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7f338a3f41f4 (/lib/x86_64-linux-gnu/libc.so.6+0xf11f4)
#2 0x6020000053ef ()
Indirect leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7f338a3f41f4 (/lib/x86_64-linux-gnu/libc.so.6+0xf11f4)
#2 0x60200000528f ()
Indirect leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe)
#1 0x7f338a3f2bc3 (/lib/x86_64-linux-gnu/libc.so.6+0xefbc3)
Indirect leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a3f2ad2 (/lib/x86_64-linux-gnu/libc.so.6+0xefad2)
Indirect leak of 4 byte(s) in 1 object(s) allocated from:
#0 0x7f338a743bc8 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dbc8)
#1 0x7f338a3f9dcf (/lib/x86_64-linux-gnu/libc.so.6+0xf6dcf)
SUMMARY: AddressSanitizer: 12348 byte(s) leaked in 206 allocation(s).
FAIL test/raft-net-test (exit status: 1)
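For reference, a minimal sketch of the regcomp()/regfree() pairing that avoids this class of leak (illustrative helper, not the actual patch; the rncui pattern here is an assumption):

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Compile, use, and free a regex in one scope. regfree() releases the
 * internal allocations made by regcomp(), which is what the leak missed. */
static bool
matches_rncui_format(const char *input)
{
    regex_t re;
    bool match = false;

    if (regcomp(&re, "^[0-9a-f-]+(:[0-9]+){4}$", REG_EXTENDED) != 0)
        return false;

    match = (regexec(&re, input, 0, NULL, 0) == 0);

    regfree(&re); /* must pair with every successful regcomp() */
    return match;
}
```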
The output shows that lkvt completes without issuing any error, even though the pmdb client sees no successful puts.
$ ./lkvt -d 0 -cp ./config -pp 100.0
INFO[0000] starting the app...
INFO[0000] this is the random key size187
INFO[0000] this is the random value size828720
0 / 1 request completed
1 / 1 request completed
... (hangs here for a while) ...
INFO[0160] stopping member updater
INFO[0160] done
average ms per get=0
average ms per put=150002
get per sec=-Inf
operations completed=1
put per sec=0.6666576526329919
seconds to complete=150.002028185
time for gets=0
time for puts=150.002028185
INFO[0160] ms latency for puts
1.5e+05-1.5e+05 100% █▏ 1
INFO[0160] ms latency for gets
The niovakv_server's ctl-interface reports the following after the above:
{
"raft_client_root_entry" : [
{
"raft-uuid" : " ^",
"client-uuid" : "",
"leader-uuid" : "776cab6a-0fea-11ec-87e0-1b1bbe9ed57b",
"state" : "client",
"default-request-timeout-sec" : 60,
"commit-latency-msec" : {},
"read-latency-msec" : {},
"leader-viable" : true,
"leader-alive-cnt" : 71,
"last-request-sent" : "Thu Jan 01 00:00:00 UTC 1970",
"last-request-ack" : "Thu Jan 01 00:00:00 UTC 1970",
"recent-ops-wr" : [
{
"sub-app-user-id" : "6ec6f9c2-9008-4000-8491-9632f4732774:0:0:0:1",
"rpc-id" : 8906007806612602918,
"rpc-user-tag" : 0,
"blocking" : true,
"status" : "Communication error on send",
"server" : "0.0.0.0:0", <---------------------------------------------
"submitted" : "Tue Sep 21 17:28:23 UTC 2021",
"attempts" : 0,
"completion-time-ms" : 0,
"timeout-ms" : 60000,
"reply-size" : 0,
"op" : "write"
}
],
"recent-ops-rd" : [],
"pending-ops" : []
}
],
"raft_net_bulk_recovery_info" : {},
"raft_net_info" : {
"ignore_timer_events" : false
}
}
tcp_mgr_connection_verify() may return EAGAIN if the connection is not established between the client and the leader.
However, raft_net_send_msg() currently typecasts all error codes to -ECOMM, and later, on the error condition in raft_client_rpc_sendq_dequeue_head_and_send(), rcrh_cancel gets set and the client never retries that request.
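A minimal sketch of the behavior being asked for, with assumed signatures (the prototypes below stand in for the real tcp_mgr/raft_net functions and are not the niova-core API):

```c
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Sketch only: these two prototypes are assumptions. */
struct tcp_mgr_connection;
int tcp_mgr_connection_verify(struct tcp_mgr_connection *tmc);
int tcp_mgr_send(struct tcp_mgr_connection *tmc,
                 const struct iovec *iov, size_t niovs);

static int
raft_net_send_msg_sketch(struct tcp_mgr_connection *tmc,
                         const struct iovec *iov, size_t niovs)
{
    int rc = tcp_mgr_connection_verify(tmc);

    if (rc == -EAGAIN)
        return rc;     /* not yet connected: leave the request queued, retry later */
    else if (rc < 0)
        return -ECOMM; /* genuine communication failure: cancel is appropriate */

    return tcp_mgr_send(tmc, iov, niovs);
}
```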
Multiple servers abort with the following assert from raft_server_append_entry_log_prepare_and_check():
// At this point, current_idx should not exceed the one from the leader.
NIOVA_ASSERT(reh.reh_index <= raerq->raerqm_prev_log_index);
<17238497.184134622:warn:pumice-dict-ser:raft_server_becomes_follower@1961> r:F et[s:u]=56:56 ei[s:u]=0:0 ht=56 hs=56 c=-1 la=-1:-1 lck=-1 v=7f583666-a963-11eb-8986-bf8a88ea0b58 l=7f583666-a963-11eb-8986-bf8a88ea0b58 sender-uuid=7f58dbd4-a963-11eb-a701-8f7d081fe073 term=58 rsn=stale-leader
<17238497.188614012:warn:pumice-dict-ser:raft_server_append_entry_log_prune_if_needed@2623> r:F et[s:u]=56:56 ei[s:u]=0:0 ht=58 hs=57 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=7f5a494c-a963-11eb-976c-030a056a075f raerqm_prev_log_term=0 raerqm_prev_log_index=-1
<17238497.189560109:fatal:pumice-dict-ser:raft_server_append_entry_log_prepare_and_check@2762> failed assertion: reh.reh_index <= raerq->raerqm_prev_log_index
SIGABRT: abort
The initial gossip delay provides empty tag data to the niovakv_client, which uses the tag data to send HTTP requests. Since the tag is empty, the niovakv_client sends the HTTP request to an invalid address. This has been solved by adding a check before using the addr to send the request.
The leader had some entries in the cowr RB tree (term 10), then the leader changed and this node became a follower (term 11).
The same node was then re-elected as the new leader (term 11).
If a client retries the same writes that were attempted in term 10 and are still present in the coalesced-write RB tree, the leader will find the older entry in the cowr RB tree with stale data.
We should remove the stale entries from the cowr RB tree whenever the leader changes.
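A minimal sketch of that cleanup, using names that approximate pumice_db.c's internals (all declarations here are assumptions):

```c
#include <stdint.h>
#include "tree.h" /* niova's bundled BSD sys/tree.h RB macros */

/* Assumed approximations of pumice_db.c's internals. */
struct pmdb_cowr_sub_app
{
    RB_ENTRY(pmdb_cowr_sub_app) pcsa_tentry;
    /* ... rncui, client-uuid, term, etc. ... */
};

RB_HEAD(pmdb_cowr_tree, pmdb_cowr_sub_app);

extern struct pmdb_cowr_tree pmdb_cowr_root;
extern int64_t               pmdb_cowr_tree_term;

int  pmdb_cowr_sub_app_cmp(struct pmdb_cowr_sub_app *,
                           struct pmdb_cowr_sub_app *);
void pmdb_cowr_sub_app_put(struct pmdb_cowr_sub_app *sa);

RB_PROTOTYPE(pmdb_cowr_tree, pmdb_cowr_sub_app, pcsa_tentry,
             pmdb_cowr_sub_app_cmp);

/* On leadership / term change, reap every pending coalesced-write entry
 * so a client retry in the new term cannot match stale state from the
 * old one. */
static void
pmdb_cowr_tree_release_all_sketch(int64_t new_term)
{
    if (pmdb_cowr_tree_term == -1 || pmdb_cowr_tree_term == new_term)
        return; /* nothing stale to reap */

    struct pmdb_cowr_sub_app *sa;
    while ((sa = RB_MIN(pmdb_cowr_tree, &pmdb_cowr_root)) != NULL)
    {
        RB_REMOVE(pmdb_cowr_tree, &pmdb_cowr_root, sa);
        pmdb_cowr_sub_app_put(sa); /* release the per-entry allocation */
    }

    pmdb_cowr_tree_term = -1; /* tree is empty again */
}
```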
When the niovakv server is started, serf is started before the gossip is issued. This creates a problem in niovakv_client_api while sending the HTTP request.
As Paul investigated, there is a missing free for a corresponding malloc in raft_net_write_supp_add(), and it only happens on followers.
It was happening because raft_net_sm_write_supplements_merge() in raft_server_state_machine_apply() was getting called only when ri_server_sm_request_cb() returned success.
But ri_server_sm_request_cb() can return an error on a follower, since pmdb_object_lookup() returns an error on followers: the pmdb object gets created in the write phase only on the leader, as a marker.
Solution: raft_net_sm_write_supplements_merge() should always get called.
"raft-uuid" : "11afb04a-f9cc-11ea-b0e5-831d7b183899",
"peer-uuid" : "11b0c976-f9cc-11ea-b60b-ff721df40ac4",
"voted-for-uuid" : "00000000-0000-0000-0000-000000000000",
"leader-uuid" : "11b0e7f8-f9cc-11ea-9d54-7bda2902f51c",
"state" : "follower",
"follower-reason" : "leader-already-present",
"client-requests" : "redirect-to-leader",
"term" : 125840,
----------> "commit-idx" : -1,
----------> "last-applied" : 130836835,
"last-applied-cumulative-crc" : 943050041,
"sync-freq-us" : 4000,
"sync-cnt" : 14508,
----------> "sync-entry-idx" : 130978289,
"sync-entry-term" : 125805,
"sync-entry-data-size" : 128,
"sync-entry-crc" : 3468325892,
"unsync-entry-idx" : 130978307,
"unsync-entry-term" : 125805,
"unsync-entry-data-size" : 128,
"unsync-entry-crc" : 241182324,
While testing the niovakv application using lkvt, we hit the following assert:
<1198901.930547566:fatal:niovakv_pmdbser:pmdb_sm_handler_pmdb_sm_apply_remove_coalesce_tree_item@867> failed assertion: pmdb_cowr_tree_term == -1 || pmdb_cowr_tree_term == rncr->rncr_current_term
This assert expects that any stale entries in the cowr sub-app RB tree have already been released and pmdb_cowr_tree_term has been reset to -1, OR that pmdb_cowr_tree_term is the same as rncr->rncr_current_term, i.e., all the elements in the cowr sub-app tree are from the same term.
Multiple leader elections happened in the cluster before this assert was hit.
Analysis:
When can pmdb_cowr_tree_term be neither -1 nor the same as rncr->rncr_current_term?
In the bulk recovery recipe, after some num_of_writes we try to start a new peer (say peer4), but it fails to boot up properly every time.
The error on the console:
TASK [common : Start the peer] ******************************************************************************************************************************************
Thursday 25 November 2021 12:02:16 +0000 (0:00:00.098) 0:04:30.929 *****
ok: [localhost]
TASK [common : pause] ***************************************************************************************************************************************************
Thursday 25 November 2021 12:02:17 +0000 (0:00:00.286) 0:04:31.216 *****
Pausing for 3 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [localhost]
TASK [common : Wait for server process to bootup completely] ************************************************************************************************************
Thursday 25 November 2021 12:02:20 +0000 (0:00:03.034) 0:04:34.250 *****
ERROR! A worker was found in a dead state
And the peer4 log file shows the error:
<16843861.914079839:warn:pumice-referenc:env_parse@201> env-var NIOVA_INOTIFY_BASE_PATH value /home/makshata/bulk_testing/7e4e791c-4df8-11ec-9f8c-b765ffa10fdc/ctl-interface applied from environment
<16843861.917472515:warn:pumice-referenc:env_parse@201> env-var NIOVA_LOCAL_CTL_SVC_DIR value /home/makshata/bulk_testing/7e4e791c-4df8-11ec-9f8c-b765ffa10fdc/configs applied from environment
<16843861.929415922:warn:pumice-referenc:raft_server_set_max_scan_entries@4912> max_scan_entries=100000
<16843861.929461477:warn:pumice-referenc:raft_server_set_log_reap_factor@4924> log_reap_factor=5
<16843861.929470882:warn:pumice-referenc:raft_server_set_num_checkpoints@4937> num_checkpoints=5
<16843861.929598491:warn:pumice-referenc:raft_epoll_setup_net@869> epoll for tcp for server
<16843861.929659826:warn:pumice-referenc:raft_net_epoll_setup@1051> return code of raft_epoll_setup_net is: 0
<16843861.931249832:error:pumice-referenc:rsbr_db_open_internal@2608> rocksdb_open_column_families(/home/makshata/bulk_testing/7e4e791c-4df8-11ec-9f8c-b765ffa10fdc/raftdb/7eac16c6-4df8-11ec-944f-a3d2c7f96389.raftdb/db'): Invalid argument: /home/makshata/bulk_testing/7e4e791c-4df8-11ec-9f8c-b765ffa10fdc/raftdb/7eac16c6-4df8-11ec-944f-a3d2c7f96389.raftdb/db: does not exist (create_if_missing is false) (try-create=no)
<16843861.955763581:warn:pumice-referenc:rsbr_db_open_internal@2608> rocksdb_open_column_families(/home/makshata/bulk_testing/7e4e791c-4df8-11ec-9f8c-b765ffa10fdc/raftdb/7eac16c6-4df8-11ec-944f-a3d2c7f96389.raftdb/db'): Success (try-create=yes)
<16843861.958094715:warn:pumice-referenc:rsbr_prep_raft_instance_from_db@2652> entry-idxs: lowest=-1 highest=-1
<16843861.982384089:warn:pumice-referenc:raft_server_process_bulk_recovery_check@3297> r:F et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=0 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l= needs bulk recovery (leader-lowest-idx=16243)
<16843861.982420611:warn:pumice-referenc:raft_server_main_loop@5745> epoll_mgr_wait_and_process_events(): Success
<16843861.982455203:warn:pumice-referenc:epoll_mgr_close@170> epm=0x7f664d60ef10 cannot be destroyed num_handles=2 (Device or resource busy)
<16843861.982469429:error:pumice-referenc:raft_net_instance_shutdown@1202> raft_net_epoll_cleanup(): Device or resource busy
<16843861.983068527:error:pumice-referenc:raft_net_evp_remove@999> raft_net_epoll_handle_remove_by_fd(1): No such file or directory
<16843861.983088491:error:pumice-referenc:raft_net_evp_remove@999> raft_net_epoll_handle_remove_by_fd(2): No such file or directory
<16843861.983097167:error:pumice-referenc:raft_net_evp_remove@999> raft_net_epoll_handle_remove_by_fd(3): No such file or directory
<16843861.983124291:error:pumice-referenc:raft_server_instance_shutdown@5668> raft_server_evp_cleanup(): No such file or directory
<16843861.984653600:error:pumice-referenc:raft_net_instance_shutdown@1211> ri_shutdown_cb(): No such file or directory
<16843861.984734368:error:pumice-referenc:raft_server_instance_run@5954> raft_net_instance_shutdown(): Device or resource busy
The basic_leader_election2 recipe is failing continuously with the niova-core master branch. It is failing while electing a leader.
Error on console:
TASK [basic_leader_election2: Wait until leader election happens.] ****************************************************************************************************************************************
Thursday 13 January 2022 08:41:41 +0000 (0:00:00.064) 0:00:16.018 ******
ERROR! A worker was found in a dead state
And the error from pmdb server logs:
<21057754.661246683:warn:raft-reference-:env_parse@201> env-var NIOVA_INOTIFY_BASE_PATH value /home/makshata/holon_log/98a6d294-744c-11ec-ba45-87528381a0dc/ctl-interface applied from environment
<21057754.661311728:warn:raft-reference-:env_parse@201> env-var NIOVA_LOCAL_CTL_SVC_DIR value /home/makshata/holon_log/98a6d294-744c-11ec-ba45-87528381a0dc/configs applied from environment
<21057754.855329109:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057755.038581618:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057755.335830090:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057755.536117475:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057755.746356631:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057755.905560337:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057756.085834968:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057756.310067763:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057756.543342434:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057756.734550075:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057756.960809726:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057757.248043264:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057757.426285901:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057757.630489983:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057757.888717280:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057758.136951212:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057758.293212469:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057758.585437112:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057758.848726960:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057759.124964410:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057759.324255956:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057759.543424201:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057759.694657303:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057759.924861058:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057760.122084118:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057760.398276729:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057760.623478066:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057760.874726445:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057761.030914781:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057761.296111184:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057761.563381553:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057761.826651712:warn:raft-reference-:raft_server_become_candidate@1962> r:P et[s:u]=0:0 ei[s:u]=-1:-1 ht=0 hs=1 c=-1 la=-1:-1 lck=-1 v=00000000-0000-0000-0000-000000000000 l=
<21057761.827759377:warn:raft-reference-:raft_server_become_candidate@1962> r:C et[s:u]=0:0 ei[s:u]=-1:-1 ht=1 hs=2 c=-1 la=-1:-1 lck=-1 v=98f1fe2c-744c-11ec-80cb-3351cd5aaf2d l=
<21057761.828802377:warn:raft-reference-:raft_server_candidate_becomes_leader@2286> r:L et[s:u]=0:1 ei[s:u]=-1:0 ht=1 hs=2 c=-1 la=-1:-1 lck=-1 v=98f1fe2c-744c-11ec-80cb-3351cd5aaf2d l=
<21057761.859106449:fatal:raft-reference-:rst_sm_handler_commit@180> failed assertion: rtdb->rtdb_op == RAFT_TEST_DATA_OP_WRITE
#0 0x0000556194aec2fd in niova_backtrace_dump () at src/niova_backtrace.c:66
#1 0x0000556194aee8f2 in rst_sm_handler_commit () at test/raft-reference-server.c:180
#2 0x0000556194aee8f2 in raft_server_test_rst_sm_handler () at test/raft-reference-server.c:532
#3 0x00007f664053b750 in raft_server_state_machine_apply () at src/raft_server.c:4705
#4 0x00007f664053e076 in raft_server_sm_apply_evp_cb () at src/raft_server.c:4820
#5 0x00007f66407e1925 in epoll_mgr_wait_and_process_events () at src/epoll_mgr.c:605
#6 0x00007f6640554ac5 in raft_server_main_loop () at src/raft_server.c:5744
#7 0x00007f6640554ac5 in raft_server_instance_run () at src/raft_server.c:5931
#8 0x0000556194aec07a in main () at test/raft-reference-server.c:632
Please find the attached .zip file for the logs.
basic_leader_election2.zip
The Basic Leader Election recipe (https://github.com/00pauln00/niova-core/wiki/Recipe%3A-Basic-Leader-Election) is failing due to the occasional occurrence of a second election. This error was detected during a CI build in the niova-CI
branch. The issue is that the recipe does not cope with the case where peer-X becomes leader but cannot complete its initial commit before peer-Y times out, becomes a candidate, and issues its term value to the network. Eventually peer-X (or another peer which has stored peer-X's initial commit) will win again, but this causes one more election to occur, which pushes the commit-idx past the intended value of 0.
Sometimes the LOOKUP operation on an expired lease gives the following output, with the TTL set to some value instead of 0:
{
    "Request": {
        "Client": "529fc41a-a770-11ed-a40b-3a0bc855fd65",
        "Resource": "52638892-a770-11ed-bcf8-3a0bc855fd65",
        "Operation": "LOOKUP"
    },
    "Response": {
        "Client": "529fc41a-a770-11ed-a40b-3a0bc855fd65",
        "Resource": "52638892-a770-11ed-bcf8-3a0bc855fd65",
        "Status": "Success",
        "LeaseState": "EXPIRED",
        "TTL": 56,
        "TimeStamp": {
            "LeaderTerm": 1,
            "LeaderTime": 1146
        }
    }
}
With the new changes to the client request to pass the number of values in a commit (56eb7de), wrong behavior shows up in the following 3 cases.
Following are the client commands and their output:
Observation:
As the number of values is 0, a random value should be used for num_vals.
And as there are 100 writes, the total writes/seqno should be a multiple of 100.
But here the app-seqno is 842.
cmd: "[email protected]:ce5ba4be-781c-11eb-a693-b3e3dccb1b44:0:0:0:0.100.1"
Output: "/0/app-seqno": 1,
"/0/app-sync": false,
"/0/app-user-id": "ce5ba4be-781c-11eb-a693-b3e3dccb1b44:0:0:0:0",
"/0/pmdb-seqno": 0,
"/0/pmdb-write-pending": false,
"/0/status": "Success"
Observation:
Here the number of values is 1 for 100 write requests, but the app-seqno is 1.
Expected result: 99
cmd: "[email protected]:019f37da-781f-11eb-8b51-8ffbba46e3cd:0:0:0:0.100
"/0/app-seqno": 1765,
"/0/app-sync": false,
"/0/app-user-id": "019f37da-781f-11eb-8b51-8ffbba46e3cd:0:0:0:0",
"/0/pmdb-seqno": 0,
"/0/pmdb-write-pending": false,
"/0/status": "Success"
Observation: here the number of values is not passed at all. In this case as well, we take a random number. But as the number of write requests is 100, the app-seqno should be a multiple of 100.
As @00pauln00 suggested, if num_of_values is not specified, it should default to 1 to keep it compatible with the older version.
<19230.236487815:warn:niovakv_pmdbser:raft_server_leader_calculate_committed_idx@3248> idx=0 sorted-idx=-1, peer-sync-idx=-1
<19230.236495255:warn:niovakv_pmdbser:raft_server_leader_calculate_committed_idx@3248> idx=1 sorted-idx=-1, peer-sync-idx=4524
<19230.236500298:warn:niovakv_pmdbser:raft_server_leader_calculate_committed_idx@3248> idx=2 sorted-idx=-1, peer-sync-idx=4524
<19230.236503862:warn:niovakv_pmdbser:raft_server_leader_calculate_committed_idx@3248> idx=3 sorted-idx=4524, peer-sync-idx=-1
<19230.236507029:warn:niovakv_pmdbser:raft_server_leader_calculate_committed_idx@3248> idx=4 sorted-idx=4524, peer-sync-idx=-1
panic: runtime error: slice bounds out of range [:33] with length 30
goroutine 1 [running]:
niova/go-pumicedb-lib/server.PmdbLookupKey(0xc0001183a4, 0xa, 0xa, 0x52b99e, 0x9, 0xc000126230, 0x2, 0x2, 0xc000126230)
/home/lucusn/niova-core/go/pumiceDB/server/PumiceDBServer.go:221 +0x2b0
niova/go-pumicedb-lib/server.PmdbReadKV(...)
/home/lucusn/niova-core/go/pumiceDB/server/PumiceDBServer.go:282
niova/go-pumicedb-lib/server.(*PmdbServerObject).ReadKV(0xc0001316c0, 0x7f2c25ec9be0, 0xc0001183a4, 0xa, 0xa, 0x52b99e, 0x9, 0xc000159a10, 0x0, 0x64, ...)
/home/lucusn/niova-core/go/pumiceDB/server/PumiceDBServer.go:292 +0x5d
main.(*NiovaKVServer).Read(0xc000131500, 0x7f2c25ec9be0, 0x7f2c25ec9c30, 0x55, 0x29d60b0, 0x3ff9a8, 0x465925)
/home/lucusn/niova-core/go/pumiceDB/examples/niovakv/pmdb/server/niovakv_pmdbserver.go:183 +0x28e
niova/go-pumicedb-lib/server.goRead(0x7f2c25ec9be0, 0x7f2c25ec9c30, 0x55, 0x29d60b0, 0x3ff9a8, 0x295cf40, 0xa)
/home/lucusn/niova-core/go/pumiceDB/server/PumiceDBServer.go:112 +0x88
niova/go-pumicedb-lib/server._Cfunc_PmdbExec(0x29594a0, 0x294d2b0, 0xc0001013d0, 0xc0001220a0, 0x100000001, 0x295cf40, 0x0)
_cgo_gotypes.go:163 +0x48
niova/go-pumicedb-lib/server.PmdbStartServer.func2(0x29594a0, 0x294d2b0, 0xc0001013d0, 0xc0001220a0, 0x295cf40, 0x10)
/home/lucusn/niova-core/go/pumiceDB/server/PumiceDBServer.go:169 +0xda
niova/go-pumicedb-lib/server.PmdbStartServer(0xc0001316c0, 0x0, 0x0)
/home/lucusn/niova-core/go/pumiceDB/server/PumiceDBServer.go:169 +0x325
niova/go-pumicedb-lib/server.(*PmdbServerObject).Run(...)
/home/lucusn/niova-core/go/pumiceDB/server/PumiceDBServer.go:179
main.main()
/home/lucusn/niova-core/go/pumiceDB/examples/niovakv/pmdb/server/niovakv_pmdbserver.go:67 +0x3b8
The pmdb server crashed while making reads/writes with lkvt. I have been able to reproduce this error by doing puts/gets with a total of only 16 puts issued by 5 concurrent methods, and also by running 100 puts with a single method, using a key size of 10 bytes and a value size of 20 bytes. 5 pmdb servers were running along with one niovakv_server as Node1. The pmdb servers will not come back up until the niovakv_server is killed. No error was returned to the log.
The leader requires an untenable amount of time to determine the next-idx of a newly initialized peer.
To deal with this issue, the discovery method should employ some sort of binary reduction. Such a method would reduce the time taken from O(N) to O(log N).
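A minimal sketch of the binary reduction, assuming a hypothetical peer_accepts_idx() probe that performs one AppendEntries consistency-check round trip at a given index:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe: returns true if the follower's log matches the
 * leader's at 'idx' (one AppendEntries round trip per call). */
bool peer_accepts_idx(int64_t idx);

/* Binary reduction over the unknown region of the follower's log.
 * Invariant: the follower is known to match at 'lo' (use -1 for an
 * empty log) and is unverified in (lo, hi). O(log N) probes instead
 * of the O(N) one-entry-at-a-time walk-back. */
static int64_t
next_idx_binary_search(int64_t lo, int64_t hi /* leader's last idx + 1 */)
{
    while (lo + 1 < hi)
    {
        int64_t mid = lo + (hi - lo) / 2;

        if (peer_accepts_idx(mid))
            lo = mid; /* match: the divergence point is above mid */
        else
            hi = mid; /* mismatch: the divergence point is at or below mid */
    }

    return lo + 1; /* first index the leader needs to send */
}
```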
Connection management should only be done from epoll thread, so there should be a mechanism to guarantee that behavior.
I'm wondering if we should create a ctl-svc entry for a client whose info had not been declared at server startup time. This would reinstate some of the functionality which existed prior to TCP support while still (hopefully) allowing TCP operation to function properly.
If not, we need to address the comment in raft_server_client_recv_ignore_request(), which incorrectly states:
/* Lookup the client in the ctl-svc-node tree - existence is not
* mandatory.
*/
If the PMDB client has finished its execution and terminates, we get the following epoll error:
<11042993.414332931:warn:LeaseApp_pmdbSe:epoll_handle_del_complete@330> epoll_ctl(epm_fd=18, eph_fd=35, EPOLL_CTL_DEL): Bad file descriptor
<11042993.414358470:warn:LeaseApp_pmdbSe:epoll_mgr_reap_destroy_list@436> epoll_handle_del_complete(eph=0x95bcc8): Bad file descriptor
This behavior is reproducible in the leaseApplication branch by starting the PMDB servers and sending any request from the client side. When the client finishes executing, the above error appears in the PMDB server logs.
After writing just 7000 raft transactions, the niovakv server and pmdb servers are generating rather large logs full of debug info such as:
time="21-09-2021 13:11:39" level=info msg="Key passed by client: Key289
This information should be available if requested by the user; however, by default such messages must be disabled.
pauln@niova:/tmp$ ls -lrt [0-9a-f]*.log
-rw-r--r-- 1 pauln pauln 2425139 Sep 21 13:11 776cab6a-0fea-11ec-87e0-1b1bbe9ed57b.log
-rw-r--r-- 1 pauln pauln 59028765 Sep 21 13:11 776c882e-0fea-11ec-a557-3b6c731ca67e.log
-rw-r--r-- 1 pauln pauln 2300431 Sep 21 13:11 776c638a-0fea-11ec-a731-dbaee6d9855a.log
-rw-r--r-- 1 pauln pauln 2653105 Sep 21 13:11 776c3f68-0fea-11ec-bfa8-87ef3c13525a.log
-rw-r--r-- 1 pauln pauln 2300638 Sep 21 13:11 776c1b28-0fea-11ec-adb3-5b7b4f0914f1.log
```c
NIOVA_ASSERT(rncr && rncr->rncr_request && rncr->rncr_is_leader &&
             rncr->rncr_type == RAFT_NET_CLIENT_REQ_TYPE_WRITE &&
             raft_net_client_request_handle_writes_raft_entry(rncr));
```
The condition that's failing is raft_net_client_request_handle_writes_raft_entry(rncr), which implies that rncr->rncr_write_raft_entry is false.
Reproduction steps:
scripts/raft-mk-configs.sh /tmp
PmdbObjGetX asks for a buffer to store the value of the key, and the length of that buffer.
But at the client side, the application may not know the exact size of the value and may allocate a smaller buffer to read the value. In that case, an E2BIG error is returned to the application. But currently there is no way for the application to know the exact size of the value.
On getting E2BIG, if it knew the exact value size, it could allocate a bigger buffer and retry the read request.
We get the size of the value from the server in rcrh_reply_size; this value can be returned to the application.
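A minimal sketch of how a client could use that, assuming a hypothetical variant of PmdbObjGetX that reports the actual value size (taken from the server's rcrh_reply_size) through an in/out length parameter:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical API variant: 'value_len' is the buffer capacity on input;
 * on -E2BIG it is set to the actual value size. Not the current API. */
int pmdb_obj_get_x_sketch(const char *rncui, const char *key, size_t key_len,
                          char *value, size_t *value_len);

static int
read_with_retry(const char *rncui, const char *key, size_t key_len,
                char **value_out, size_t *size_out)
{
    size_t len = 512; /* optimistic first guess at the value size */
    char *buf = malloc(len);
    if (!buf)
        return -ENOMEM;

    int rc = pmdb_obj_get_x_sketch(rncui, key, key_len, buf, &len);
    if (rc == -E2BIG) /* len now holds the real value size */
    {
        char *bigger = realloc(buf, len);
        if (!bigger)
        {
            free(buf);
            return -ENOMEM;
        }
        buf = bigger;
        rc = pmdb_obj_get_x_sketch(rncui, key, key_len, buf, &len);
    }

    if (rc)
    {
        free(buf);
        return rc;
    }

    *value_out = buf;
    *size_out  = len;
    return 0;
}
```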
When using lkvt directly connected to pmdb, the ctl-interface of the client shows the following:
"raft_client_root_entry" : [
{
"raft-uuid" : "<8E>^^^^B",
"client-uuid" : "W^^^]^B",
"leader-uuid" : "337fda78-d7bb-11ec-9592-2ff3607520ea",
"state" : "client",
"default-request-timeout-sec" : 60,
"commit-latency-msec" : {},
"read-latency-msec" : {},
"leader-viable" : true,
"leader-alive-cnt" : 21,
"last-request-sent" : "Thu Jan 01 00:00:00 UTC 1970",
"last-request-ack" : "Thu Jan 01 00:00:00 UTC 1970",
"recent-ops-wr" : [],
"recent-ops-rd" : [],
"pending-ops" : []
}
],
$ ./niovakv_client -c ../lkvt/config -l /tmp/foo -n 200
logfile: /tmp/foo
0 / 200 write request completed
0 / 200 read request completed
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x75ca38]
goroutine 1193 [running]:
niovakv/clientapi.(*ClientAPI).Get(...)
/home/pauln/Code/niova-core/go/pumiceDB/examples/niovakv/niovakv_client_api/niovakv_client_api.go:72
main.(*niovakv_client).sendReq(0xc000220270, 0xc000618a40, 0x0)
/home/pauln/Code/niova-core/go/pumiceDB/examples/niovakv/niovakv_client/niovakv_client.go:95 +0x5f8
created by main.(*niovakv_client).doWrite_Read
/home/pauln/Code/niova-core/go/pumiceDB/examples/niovakv/niovakv_client/niovakv_client.go:144 +0x1ad
<2174445.790382511:fatal:niovakv_pmdbser:pmdb_sm_handler_pmdb_sm_apply_remove_coalesce_tree_item@901> unmatching client / cowr (776e099c-0fea-11ec-b722-b70849783703, 00000000-0000-0000-0000-000000000000) uuid or rncui (7a94c7d4-f7e8-4c0d-9265-a0c1f35bb726:0:0:0:1218887, 7a94c7d4-f7e8-4c0d-9265-a0c1f35bb726:0:0:0:1218887), or cowr_sa term: 1008/ rncr term: 1008
When testing the invalid-write-seqno case, the write cmd uses a write-seqno == 1 (which is expected to fail since '0' is the expected seqno). The expected result should be status: "Invalid exchange" and pmdb-seqno: -1.
But the actual result is:
"pumice_db_test_client" : {
"pmdb-test-apps" : [
{
"app-user-id" : "43516d34-da3e-11eb-b47f-7b78b41234d2:0:0:0:0",
"status" : "Invalid or incomplete multibyte or wide character"
,
"pmdb-seqno" : 0
,
"pmdb-write-pending" : false,
"last-request" : "Thu Jul 01 07:30:54 UTC 2021",
"last-request-duration-ms" : 12,
"last-request-tag" : 1254333918,
"app-sync" : false,
"app-seqno" : 2,
"app-value" : 1211055621,
"app-validated-seqno" : 0
}
],
ctl_svc_node_destruct() must ensure that all users have released their dynamically allocated memory. For now, we need to watch out for struct tcp_mgr_connection, but later there may be others which also need to be checked.