basho / riak_kv
Riak Key/Value Store
License: Apache License 2.0
Add the following from https://github.com/basho/cabal/issues/15 for 1.3 release:
PUT FSM timings
GET FSM timings
Per Vnode request operation and duration
Read Repair Stats (count, sent to dest node, repair reason, dest type (fallback/primary))
riak_api's pb_connects / pb_active should use supervisor:count_children(), since a crashing connection will not call the terminate fun, so the active count does not shrink appropriately.
It'd be nice if we tracked down all of the lingering timeouts that can potentially cause a stats call to fail. We often error out with a 500 that has all of our stats contained in the body. If some stats cannot be fetched in a timely manner, it would be better to drop them or report "couldn't fetch" for those stats instead of failing the entire call. Partial information is better than none, and this sounds easier than teaching monitoring solutions to parse our error messages for whatever data they may contain.
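A minimal sketch of the degrade-gracefully behavior described above (all names here are hypothetical, not the actual riak_kv stats API): collect each stat independently and substitute a placeholder when one cannot be fetched, rather than failing the whole call.

```python
def gather_stats(producers):
    """Collect stats from independent producer callables.

    A producer that raises (or, in a real system, times out) contributes
    a placeholder value instead of failing the entire stats response.
    """
    stats = {}
    for name, produce in producers.items():
        try:
            stats[name] = produce()
        except Exception:
            stats[name] = "couldn't fetch"  # partial info beats a 500
    return stats

def broken_stat():
    raise TimeoutError("stat backend timed out")

producers = {"node_gets": lambda: 1042, "cpu_nprocs": broken_stat}
print(gather_stats(producers))
# -> {'node_gets': 1042, 'cpu_nprocs': "couldn't fetch"}
```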
Right now, the output from 'riak-admin vnode-status' is mostly unusable, due to incorrect column formatting.
There is a workaround using the Erlang console right now:
1> Bin = <<" Compactions\nLevel Files Size(MB) Time(sec) Read(MB) Write(MB)\n--------------------------------------------------\n 0 1 5 0 0 17\n 1 0 0 1 24 23\n 2 74 99 2 82 82\n 3 654 999 4 110 110\n 4 6320 10000 0 0 0\n 5 51123 99998 0 0 0\n 6 21816 44506 0 0 0\n">>.
3> io:put_chars(Bin).
 Compactions
Level  Files  Size(MB)  Time(sec)  Read(MB)  Write(MB)
------------------------------------------------------
    0      1         5          0         0         17
    1      0         0          1        24         23
    2     74        99          2        82         82
    3    654       999          4       110        110
    4   6320     10000          0         0          0
    5  51123     99998          0         0          0
    6  21816     44506          0         0          0
ok
The code in question adds the "riak" dispatch path even if we have a custom "raw_name":
[riak_kv_web.erl]
raw_dispatch() ->
    case app_helper:get_env(riak_kv, raw_name) of
        undefined -> raw_dispatch("riak");
        Name -> lists:append(raw_dispatch(Name), raw_dispatch("riak"))
    end.
See the review comment on #414 for details and context.
Migrating issue from Bugzilla. See the original thread for existing comments and code.
(Moving here from basho/riak_pipe#47, because it's a KV issue.)
PR/PW are checked by looking at the preflist for the partition and checking
that the number of primaries in the preflist satisfies PR/PW. However, in the
case of a yanked network cable, kernel panic or VM pause, where no FIN packet
is sent, in the interval before the other nodes notice a node has become
unavailable, the preference list is not recalculated and so PR/PW checks will
continue to pass. Once Erlang notices the node is down, this goes away, but
there is a significant window before this happens.
A possible fix, at least for reads, is to check which nodes actually serviced
the request against the preflist; if there's a discrepancy that would make PR
invalid, throw an error at that point. I'm not sure how to properly fix writes
so they would be 'all or nothing'.
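The read-side check proposed above can be sketched as follows (a hypothetical illustration, not riak_kv code): count only the primaries that actually answered, and fail the PR check when a silently-dead primary was assumed reachable.

```python
def pr_satisfied(preflist, responders, pr):
    """preflist: list of (node, is_primary) pairs from preflist calculation.
    responders: nodes that actually serviced the request.
    Count only primaries that really answered, so a node that vanished
    without a FIN (yanked cable, kernel panic, VM pause) cannot satisfy PR."""
    primaries = {node for node, is_primary in preflist if is_primary}
    return len(primaries & set(responders)) >= pr

preflist = [("a", True), ("b", True), ("c", False)]
# Node "b" is silently unreachable: only "a" (primary) and "c" (fallback) reply.
pr_satisfied(preflist, ["a", "c"], pr=2)  # -> False: raise the PR error here
pr_satisfied(preflist, ["a", "b"], pr=2)  # -> True: normal case
```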
GET /buckets/bucket/keys?keys=true
and GET /buckets?buckets=true
return Link headers for each item returned. This can result in a very long header when listing a large number of items, which makes certain HTTP clients (such as txriak) very unhappy.
This is in reference to the riak-admin transfers
command and written in response to Matt Ranney's comment on a recent blog post.
IFF the information is easy to obtain then it would be very helpful to display the percentage complete for each transfer.
When performing a JS MR job against master (commit 8fe7e1ec9434a616d656c400bd51423f69590f37) that results in an error, strings containing Erlang terms are returned inside the object. I produced this error by manually running the badMap test case found in the riak-javascript-client (1).
Stand up an instance of Riak
Checkout riak-javascript-client
cd riak-javascript-client/tests
./setup -i
(you may need to modify the setup script to point to correct port/host)
Finally, run the following curl command:
curl -XPOST http://10.0.1.4:8098/mapred -d '{"inputs":"mr_test", "query":[{"map":{"language":"javascript","source":"function(value) { return [JSON.parse(value)]; }", "keep":true}}], "timeout":60000}'
Below is a copy of my output
$ curl -XPOST http://10.0.1.4:8098/mapred -d '{"inputs":"mr_test", "query":[{"map":{"language":"javascript","source":"function(value) { return [JSON.parse(value)]; }", "keep":true}}], "timeout":60000}' | jsonpp
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 210 100 56 100 154 2365 6506 --:--:-- --:--:-- --:--:-- 8105
{
"lineno": 477,
"message": "JSON.parse",
"source": "unknown"
}
$ curl -XPOST http://10.0.1.4:8098/mapred -d '{"inputs":"mr_test", "query":[{"map":{"language":"javascript","source":"function(value) { return [JSON.parse(value)]; }", "keep":true}}], "timeout":60000}' | jsonpp
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 808 100 654 100 154 63513 14955 --:--:-- --:--:-- --:--:-- 106k
{
"phase": 0,
"error": "[{<<\"lineno\">>,477},{<<\"message\">>,<<\"JSON.parse\">>},{<<\"source\">>,<<\"unknown\">>}]",
"input": "{ok,{r_object,<<\"mr_test\">>,<<\"third\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\"Links\">>]],[],[],[],[],[],[],[],[[<<\"content-type\">>,116,101,120,116,47,112,108,97,105,110],[<<\"X-Riak-VTag\">>,49,117,106,81,100,85,75,113,105,88,119,84,49,86,99,55,121,121,67,55,85,53]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1329,256793,914740}]],[],[[<<\"X-Riak-Meta\">>]]}}},<<\"1\">>}],[{<<35,9,254,249,79,57,154,63>>,{3,63496475993}}],{dict,1,16,16,...},...},...}"
}
(1) https://github.com/basho/riak-javascript-client/blob/slf-sigsegv-bug/tests/unit_tests.js#L313
The riak_kv_stat:system_stats/0 function calls erlang:system_info(global_heaps_size), but that call results in a badarg exception under Erlang R15B02. The OTP team removed support for global_heaps_size in commit 9cfeeb9b in the erlang/otp repo.
rebar fails to build nsprpub under riak_kv/deps/erlang_js because the configure script updates CC="$CC -m32".
make[3]: Entering directory `/home/kuenishi/src/riak_kv/deps/erlang_js/c_src/nsprpub'
cd config; make -j1 export
make[4]: Entering directory `/home/kuenishi/src/riak_kv/deps/erlang_js/c_src/nsprpub/config'
ccache gcc -m32 -o now.o -c -Wall -O2 -fPIC -UDEBUG -DNDEBUG=1 -DHAVE_VISIBILITY_HIDDEN_ATTRIBUTE=1 -DHAVE_VISIBILITY_PRAGMA=1 -DXP_UNIX=1 -D_GNU_SOURCE=1 -DHAVE_FCNTL_FILE_LOCKING=1 -DLINUX=1 -Di386=1 -D_REENTRANT=1 -DFORCE_PR_LOG -D_PR_PTHREADS -UHAVE_CVAR_BUILT_ON_SEM now.c
In file included from /usr/include/stdio.h:28:0,
from now.c:38:
/usr/include/features.h:323:26: fatal error: bits/predefs.h: No such file or directory
compilation terminated.
Bypassing rebar and directly building nsprpub also fails:
kuenishi@kushana> $ make # in ~/src/erlang_js/c_src
gunzip -c nsprpub-4.8.tar.gz | tar xf -
patching file nsprpub/lib/tests/Makefile.in
(snip)
checking for pthread_create in -lc... no
checking whether ccache gcc -m32 accepts -pthread... no
checking whether ccache gcc -m32 accepts -pthreads... no
creating ./config.status
creating Makefile
(snip)
creating pr/src/pthreads/Makefile
make[1]: Entering directory `/home/kuenishi/src/erlang_js/c_src/nsprpub'
cd config; make -j1 export
make[2]: Entering directory `/home/kuenishi/src/erlang_js/c_src/nsprpub/config'
ccache gcc -m32 -o now.o -c -Wall -O2 -fPIC -UDEBUG -DNDEBUG=1 -DHAVE_VISIBILITY_HIDDEN_ATTRIBUTE=1 -DHAVE_VISIBILITY_PRAGMA=1 -DXP_UNIX=1 -D_GNU_SOURCE=1 -DHAVE_FCNTL_FILE_LOCKING=1 -DLINUX=1 -Di386=1 -D_REENTRANT=1 -DFORCE_PR_LOG -D_PR_PTHREADS -UHAVE_CVAR_BUILT_ON_SEM now.c
In file included from /usr/include/stdio.h:28:0,
from now.c:38:
/usr/include/features.h:323:26: fatal error: bits/predefs.h: No such file or directory
Building erlang_js on its own always succeeds, so I think rebar is adding --disable-64bit or omitting --enable-64bit (or both) when building dependent repositories.
This may be a rebar bug or a misconfiguration in rebar.config, but since the case is unique to riak_kv I filed the issue here. I'm also not sure whether it's specific to my Linux system (Debian wheezy, amd64).
See https://github.com/basho/riak_kv/blob/master/src/riak_kv_wm_object.erl#L1023
In this case the user is always told that their w values are wrong, even on a fetch. Try
curl localhost:8098/riak/test/foo?r=10
on a new cluster; the response is:
Specified w/dw/pw values invalid for bucket n value of 3
Thanks, Ingo, for the report: http://riak-users.197444.n3.nabble.com/Data-loads-td4025122.html#a4025385
https://github.com/basho/riak_kv/blob/master/src/riak_kv_put_fsm.erl#L211
assumes that length(Preflist2) > 1, which it isn't when n_val = 1 for that bucket. Calling rand_uniform(1, 1) is a badarg and causes this to fail. The simple fix is to check the length of Preflist2 before using it in rand_uniform/2. A quick audit of other uses of rand_uniform in Riak might find similar mistakes.
https://issues.basho.com/show_bug.cgi?id=1034
To reproduce:
curl http://127.0.0.1:8098/riak/test/key -XPUT -H 'content-type: text/plain' -d
''
curl http://127.0.0.1:8098/mapred -XPOST -d '{"inputs": {"key_filters":
[["matches", ".*"]], "bucket":"test"}, "query": [{"reduce": {"source":
"function() { return [1] }", "language": "javascript", "keep": true}}]}'
OR
Run a MapReduce query with a full bucket key listing and a single reduce phase
curl http://127.0.0.1:8098/mapred -XPOST -d '{"inputs": "test", "query":
[{"reduce": {"source": "function() { return [1] }", "language": "javascript",
"keep": true}}]}'
OR
Run a MapReduce query with explicit bucket/key pairs and a single reduce phase
curl http://127.0.0.1:8098/mapred -XPOST -d '{"inputs": [["test","key"]],
"query": [{"reduce": {"source": "function() { return [1] }", "language":
"javascript", "keep": true}}]}'
Expected:
Output from reduce phase
Actual:
{"error":"bad_json"}
Error report from log file:
=ERROR REPORT==== 3-Mar-2011::14:30:07 ===
** State machine <0.17958.0> terminating
** Last event in was inputs_done
** When State == executing
** Data == {state,0,riak_kv_reduce_phase,
{state,
{javascript,
{reduce,
{jsanon,<<"function() { return [1] }">>},
none,true}},
[],
[{<<"test">>,<<"key">>},1]},
true,true,undefined,
[<0.17957.0>],
undefined,0,0,<0.17956.0>,66000}
** Reason for termination =
** {error,bad_json}
Since some backends provide an efficient range fold (in particular eleveldb and hanoidb), we should add an API that provides fast M/R for objects in a given key range.
Right now, this is done by first getting the keys in a given range (which is actually reasonably fast in eleveldb and hanoidb) and then getting each object individually.
Since leveldb and hanoi actually store the objects close to each other, it would be much faster to simply provide all the values in a given key range to a reduce phase.
Much faster. Grr.. :-)
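A toy sketch of the proposal (hypothetical names, an in-memory dict standing in for a sorted backend): fold the objects in a key range straight into the reduce function, instead of listing keys first and issuing one get per key.

```python
from bisect import bisect_left, bisect_right

def range_fold(sorted_keys, store, lo, hi, reduce_fn, acc):
    """Fold objects whose keys fall in [lo, hi] directly into a reduce
    accumulator, exploiting that a sorted backend keeps them adjacent."""
    i, j = bisect_left(sorted_keys, lo), bisect_right(sorted_keys, hi)
    for k in sorted_keys[i:j]:
        acc = reduce_fn(acc, store[k])
    return acc

store = {"a": 1, "b": 2, "c": 3, "d": 4}
keys = sorted(store)
range_fold(keys, store, "b", "c", lambda a, v: a + v, 0)  # -> 5
```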
This problem could be more widespread, but at the least:
If you make an HTTP request for either list-keys or 2i, and the coverage FSM (one of riak_kv_index_fsm or riak_kv_keys_fsm) dies, the HTTP socket remains open. To illustrate the error, try this patch:
diff --git src/riak_kv_keys_fsm.erl src/riak_kv_keys_fsm.erl
index 95608e1..faedaa2 100644
--- src/riak_kv_keys_fsm.erl
+++ src/riak_kv_keys_fsm.erl
@@ -96,6 +96,7 @@ process_results({From, Bucket, Keys},
StateData=#state{client_type=ClientType,
from={raw, ReqId, ClientPid}}) ->
process_keys(ClientType, Bucket, Keys, ReqId, ClientPid),
+ exit(self(), error),
riak_kv_vnode:ack_keys(From), % tell that vnode we're ready for more
{ok, StateData};
process_results({Bucket, Keys},
Add equivalent functionality to the get FSM to track the request stages like the put FSM is able to do.
([email protected])2> C:put(riak_object:new(<<"b">>,<<"k">>,<<"v">>),[details]).
{ok,[{response_usecs,113057},
{stages,[{prepare,45894},
{validate,12677},
{precommit,6448},
{execute_local,64},
{waiting_local_vnode,14082},
{execute_remote,47},
{waiting_remote_vnode,33845}]}]}
It may even be valuable to try and break down wait time by vnode.
Riak should have a returnHead parameter, quite similar to the returnBody request parameter, but returning only the head information of the k/v rather than the body, so the client is able to retrieve the updated vector clock after a store operation and can update the local domain object. This would be quite useful for working with POJOs in the riak-java-client. Without the updated vector clock, a refetch of the data would always be necessary before storing a POJO again.
Looking at riak_kv_put_fsm:add_timing, it uses os:timestamp() to provide time. If NTP decides to set the clock backward to deskew, then timer:now_diff(StageEnd, StageStart) could return a negative value, which will cause /stats to return 500 Internal Server Errors until the negative value falls out of the histogram window.
The suggestion is to not capture negative timing statistics.
Steps to reproduce:
From riak attach
run:
folsom_metrics:notify_existing_metric({riak_kv,node_put_fsm_time},-9999,histogram).
From a browser make a call to /stats
Result:
15:44:13.936 [error] Error in process <0.5758.0> on node '[email protected]' with exit value: {badarith,[{bear,scan_values,2,[]},{bear,get_statistics,1,[]},{lists,keysort,2,[]}]}
15:44:13.941 [error] webmachine error: path="/stats"
{error,{error,function_clause,[{proplists,delete,[disk,{error,{badarith,[{bear,scan_values,2,[]},{bear,get_statistics,1,[]},{lists,keysort,2,[]}]}}],[{file,"proplists.erl"},{line,362}]},{riak_kv_wm_stats,get_stats,0,[{file,"src/riak_kv_wm_stats.erl"},{line,88}]},{riak_kv_wm_stats,produce_body,2,[{file,"src/riak_kv_wm_stats.erl"},{line,77}]},{webmachine_resource,resource_call,3,[{file,"src/webmachine_resource.erl"},{line,169}]},{webmachine_resource,do,3,[{file,"src/webmachine_resource.erl"},{line,128}]},{webmachine_decision_core,resource_call,1,[{file,"src/webmachine_decision_core.erl"},{line,48}]},{webmachine_decision_core,decision,1,[{file,"src/webmachine_decision_core.erl"},{line,532}]},{webmachine_decision_core,handle_request,2,[{file,"src/webmachine_decision_core.erl"},{line,33}]}]}}
When trying to access /stats it crashes hard on me, here's the output from the response and the error that pops up in the logs:
https://gist.github.com/3496819
riak-admin status succeeds, but only after a timeout. It seems to have trouble fetching the number of processors, as this line comes up in the output: cpu_nprocs : {error,timeout}
Using Riak 1.2.0, binary distribution, with the included Erlang. Anything I could do to provide more info or to get this working temporarily? Thanks!
Tested as below; the issue seems to be that index-doc buckets are not excluded from the precommit hook set in the default bucket props. When a value is put, precommit is invoked on the bucket/key and then again on the corresponding _rsid_ index bucket/key, resulting in a 500 internal server error for every put.
See Also: https://help.basho.com/tickets/1507
If MapReduce on a bucket is cancelled (timeout or dropped client connection) the listkeys feeding the objects from the bucket is not cancelled, instead it generates a large number of error messages
Pipe worker startup failed:fitting was gone before startup
Additionally, these running listkey jobs continue to tie up resources that would otherwise be capable of running other MapReduce queries. If enough listkey jobs end up in this state, subsequent MapReduce queries will be unable to complete and will timeout. In the worst case, all MapReduce queries submitted will timeout until the running listkeys eventually terminate and the cluster recovers.
Related user report: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-August/009307.html
The riak_kv_mapred_filters:build_filter/2 function processes the filter expression as if it were a flat list. For the nesting expressions and, or, and not, this means that their arguments go unbuilt until execution time. Unfortunately, this means that they also go unvalidated until execution time.
This delay becomes a problem when a query is submitted with an invalid filter nested inside an and, or, or not. Instead of being detected before execution, the bad query makes its way through to the KV vnode workers, which fail attempting to execute it. Those KV vnode worker failures go undetected by the Pipe vnode workers that are waiting on their results, and the MapReduce job that the Pipe is implementing hangs around dormant until its timeout.
Minimal reproduction (at Erlang console):
riak_kv_mrc_pipe:mapred({<<"foo">>,[[<<"not">>,[[[<<"breaking things">>]]]]]},[]).
That query will appear to hang until the default 60 second timeout, at which time it will return {error,{timeout,[]}}. In the meantime, log messages like the following will print:
08:32:51.006 [error] gen_fsm <0.655.0> in state active terminated with reason: no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189
08:32:51.016 [error] CRASH REPORT Process <0.655.0> with 1 neighbours exited with reason: no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189 in gen_fsm:terminate/7 line 611
08:32:51.022 [error] gen_fsm <0.565.0> in state active terminated with reason: no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189
08:32:51.023 [error] CRASH REPORT Process <0.565.0> with 1 neighbours exited with reason: no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189 in gen_fsm:terminate/7 line 611
08:32:51.024 [error] gen_fsm <0.658.0> in state ready terminated with reason: no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189
08:32:51.025 [error] CRASH REPORT Process <0.658.0> with 10 neighbours exited with reason: no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189 in gen_fsm:terminate/7 line 611
08:32:51.026 [error] Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.655.0> exit with reason no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189 in context child_terminated
08:32:51.028 [error] Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.565.0> exit with reason no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189 in context child_terminated
08:32:51.029 [error] gen_fsm <0.1150.0> in state active terminated with reason: no match of right hand value {error,{bad_filter,[<<"breaking things">>]}} in riak_kv_mapred_filters:logical_not/1 line 189
The behavior should instead be like that of the same query without the nesting under not:
riak_kv_mrc_pipe:mapred({<<"foo">>,[[<<"breaking things">>]]},[]).
That command immediately returns the following, with no other log output:
{error,{error,{bad_filter,<<"breaking things">>}},{ok,[]}}
Background. Prior to Riak 0.14.2, all fold operations would block the relevant vnode and prevent the vnode from servicing requests. This was changed in Riak 1.0, with the introduction of asynchronous folds that used an async worker pool, as well as additions to the various backends to support async folds.
To support async folds, Bitcask freezes its in-memory keydir and has async folds iterate over the frozen keydir, with new concurrent writes going to a pending keydir. Since the keydir is in-memory, Bitcask only allows a single frozen keydir. Multiple folds can reuse the same keydir, but only if there have been no writes since the keydir was frozen. If a fold is started, a write occurs, and then a new fold is started, the second fold will block until the first fold finishes and then re-freeze the keydir.
The Problem. Blocking async folders is expected and not a big deal. However, when determining whether a vnode should hand off data, Riak will end up calling riak_kv_vnode:is_empty, which for a Bitcask vnode calls bitcask:is_empty. In Bitcask, the is_empty check is implemented as a fold (start a fold and exit as soon as any key is found) to deal with tombstones, expired keys, etc. This fold is executed directly in the vnode pid, not in an async worker, and will block the vnode in scenarios such as the above.
There are two scenarios: the is_empty check occurs before calling the handoff manager. So handoff A to B, then a write, then handoff A to B again will cause the vnode to block on the second handoff request, again servicing no requests and leading to a growing message queue.
History:
The riak_kv_wm_mapred module, which provides the MapReduce HTTP interface, implements its timeout mechanism by scheduling a message to be delivered to its process after the timeout has occurred:
https://github.com/basho/riak_kv/blob/master/src/riak_kv_wm_mapred.erl#L148
When a MapReduce query finishes before that message is received, however, mochiweb is likely to receive it instead:
https://github.com/mochi/mochiweb/blob/master/src/mochiweb_http.erl#L69-70
The mochiweb_http module expects only TCP socket messages, so if this unknown pipe_timeout message is received, mochiweb immediately sends a 400 error message to the client. The message is safe to ignore, but it can be confusing to prove that it came from this bit of code and not somewhere else.
What riak_kv_wm_mapred should do instead is keep the timer reference around and cancel the timer before returning control to mochiweb. Alternatively, all pipe and timeout messages could be handled by a process other than the mochiweb/webmachine/socket-holding process, which would also make it possible to remove the use of the mochiweb_request_force_close hack:
https://github.com/basho/riak_kv/blob/master/src/riak_kv_wm_mapred.erl#L299-300
I believe all users who need more performance would like to process all operations via the PBC API. However, users cannot do so due to the lack of some APIs.
The known_content_type WM callback crashes if no content-type is given. The error happens around these lines: https://github.com/basho/riak_kv/blob/master/src/riak_kv_wm_mapred.erl#L98 because the call to wrq:get_req_header("content-type", RD) returns undefined, and soon thereafter there is a no-match inside base_ctype.
In a many-core environment, use of the zlib:zip() and zlib:unzip() functions isn't a good idea. Each appears to open and close an Erlang port (for the "zlib_drv" driver) on every invocation. Inside the R15B01 virtual machine, all allocations of a new port number are serialized by the 'get_free_port' mutex.
Suggestion: eliminate zlib's use in favor of term_to_binary(VClock, [compressed]), or something else that doesn't allocate and free an Erlang port on each call.
NOTE: zlib:zip() and unzip() are also used to compress the handoff payload. On extremely busy systems (e.g. 8K Riak ops/sec/node or more), this same port allocation serialization probably interferes quite a bit with handoff.
Right now we use Erlang's term_to_binary to encode the vector clock, which means that without using the Erlang client, most people can't do anything with that information even if they wanted to, unless they're willing to write and support a version of binary_to_term in their target language.
I have a partially written Python version, but it isn't small and wasn't fun to write.
Rather than duplicating that effort for all clients, it might be better to decide on a standard output format that we can produce and optionally consume, which would allow users to actually use the data that is encoded there.
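One possible shape for such a format, purely as an illustration (the field names and layout here are invented, not a proposed standard): encode each vclock entry as JSON with a base64 actor id, which any client language can parse.

```python
import base64
import json

def vclock_to_json(vclock):
    """Encode vector-clock entries (actor bytes -> (counter, timestamp))
    as JSON with base64 actor ids, readable from any client language."""
    return json.dumps([
        {"actor": base64.b64encode(actor).decode("ascii"),
         "counter": counter, "ts": ts}
        for actor, (counter, ts) in vclock
    ])

def vclock_from_json(text):
    """Inverse of vclock_to_json."""
    return [(base64.b64decode(e["actor"]), (e["counter"], e["ts"]))
            for e in json.loads(text)]

vc = [(b"\x23\x09\xfe\xf9", (3, 63496475993))]
assert vclock_from_json(vclock_to_json(vc)) == vc  # lossless round trip
```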
In short, "latency leveling" (1) is about bounding the amount of time
a request can take. On the surface this sounds no different than a
timeout, but the difference is that you attempt to do all the work while
accepting partial work after a given time bound. It's the same idea as
trading yield for harvest. The purpose is to make a predictable
system out of unpredictable parts. This is essential in a webapp
that has a hard ceiling on response time but requires the coordination
of many services. If a single service's response time can grow
unbounded then that ceiling is breached. A timeout will prevent this
but will also throw away any work that may have occurred.
E.g. Given a read request with N=3
, R=2
and a timeout of 50ms
two of the replicas must respond to the coordinator in less than
50ms
for the client to get a response. If only one replica replies
in time then the client receives a timeout. The read request could be
changed to use R=1
but now all requests are giving up some
consistency for latency. It would be nice if there was a way to
sometimes tradeoff the desired consistency level for latency. This
is what latency leveling achieves. Returning to the first case, if
one replica returns its value in 29ms but then nothing else is seen
but then nothing else is seen
for the next 31ms
then the coordinator, if told to, will return the
value it did see with an indicator that the value returned was read
with less than desired consistency.
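The coordinator behavior described above can be sketched as a simple decision function (hypothetical names; real coordination is message-driven, not a list scan):

```python
def leveled_read(replies, r, deadline_ms):
    """replies: list of (arrival_ms, value) from replicas.
    Returns (value, r_satisfied). If fewer than R replies arrive before
    the deadline, return the value we did see, flagged as a read with
    less than the desired consistency, instead of discarding the work."""
    in_time = [v for t, v in replies if t <= deadline_ms]
    if len(in_time) >= r:
        return in_time[0], True    # normal R-satisfied read
    if in_time:
        return in_time[0], False   # degraded: value seen, R not met
    return None, False             # nothing in time: plain timeout

leveled_read([(29, "v1")], r=2, deadline_ms=50)             # -> ("v1", False)
leveled_read([(29, "v1"), (40, "v1")], r=2, deadline_ms=50) # -> ("v1", True)
```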
http://features.basho.com/entries/20518738-latency-leveling-for-reads
(1): AFAICT, this term was made up by Coda Hale but I think it's
fitting so I will use it too.
If there's a way to differentiate it from a leave, we should do so and leave the node up; it's irritating to have to restart it every time after you cancel a cluster operation.
Tombstones may not be reaped if reaping occurs before tombstones are written to all replicas.
riak_kv/src/riak_kv_delete.erl, line 80 in cada9c7
riak_kv/src/riak_kv_delete.erl, line 86 in cada9c7
(delete | read_repair | noop)
riak_kv/src/riak_kv_get_fsm.erl, line 282 in cada9c7
If one of the replicas returns an older version during tombstone reaping, the riak_kv_get_core check returns read_repair rather than delete.
The riak_kv_get_fsm should be able to delete the object rather than read repairing if one of the replicas returns an object older than the tombstone.
Below is an example set of replicas that will result in a read repair rather than a delete:
[{274031556999544297163190906134303066185487351808,
{ok,{r_object,<<"foo">>,<<"9698">>,
[{r_content,{dict,6,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],...},
{{[],[],[[<<"Links">>]],[],[],[],[],...}}},
<<"hello world">>}],
[{<<"úZ»/O{=\t">>,{1,63500707125}}],
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],[],...}}},
undefined}}},
{296867520082839655260123481645494988367611297792,
{ok,{r_object,<<"foo">>,<<"9698">>,
[{r_content,{dict,4,16,16,8,80,48,
{[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],...}}},
<<>>}],
[{<<"úZ»/O{=\t">>,{2,63500707280}}],
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],[],...}}},
undefined}}},
{251195593916248939066258330623111144003363405824,
{ok,{r_object,<<"foo">>,<<"9698">>,
[{r_content,{dict,4,16,16,8,80,48,
{[],[],[],[],[],[],[],...},
{{[],[],[],[],[],...}}},
<<>>}],
[{<<"úZ»/O{=\t">>,{2,63500707280}}],
{dict,1,16,16,8,80,48,
{[],[],[],[],[],[],[],[],...},
{{[],[],[],[],[],[],...}}},
undefined}}}]
Streamed MapReduce output from riak_pipe is sent to the HTTP or PB client as one message per result. It may be more efficient to batch up results and send several results per message, as the legacy MapReduce implementation did.
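The batching idea can be sketched like this (a generic illustration, not the riak_pipe streaming code): accumulate streamed results and emit them several-per-message up to a cap.

```python
def batched(results, max_batch=64):
    """Group a stream of one-result-per-message outputs into batches,
    so each client message carries several results. The final partial
    batch is flushed at end of stream."""
    batch = []
    for r in results:
        batch.append(r)
        if len(batch) == max_batch:
            yield batch
            batch = []
    if batch:
        yield batch

list(batched(range(5), max_batch=2))  # -> [[0, 1], [2, 3], [4]]
```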
It would be really useful for us automation and monitoring types to be able to connect to any running node of a riak cluster and get the /stats page, then extract out a list of all the nodes in the cluster, with their bound ip and port.
e.g. we currently have in /stats
ring_members: [
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]"
],
I'd really like to see:
ring_members_http: [
"127.0.0.1:11098",
"127.0.0.1:12098",
"127.0.0.1:13098",
"127.0.0.1:14098",
"127.0.0.1:15098"
],
and
ring_members_pb: [
"10.0.2.11:11099",
"10.0.2.12:12099",
"10.0.2.13:13099",
"10.0.2.14:14099",
"10.0.2.15:15099"
],
When using Content-Type: application/x-erlang-binary, Riak calls binary_to_term on a PUT body, even if you've set Content-Encoding: gzip.
Steps to repro:
In an erlang shell:
T = {some, <<"term">>}.
file:write_file("erlang-binary", term_to_binary(T)).
gzip erlang-binary
curl -X PUT -H "Content-Encoding: gzip" -H "Content-Type: application/x-erlang-binary" localhost:8091/riak/gzip/test --data-binary @erlang-binary.gz
This gives a 500 error; the problem appears to be here.
There is a race condition in the riak_kv_js_manager (riak_kv_js_manager:blocking_dispatch/4) wherein:
1. The riak_kv_js_manager:blocking_dispatch function calls reserve_vm, a gen_server:call to the riak_kv_js_manager, which reserves a VM.
2. It then calls riak_kv_js_vm:blocking_dispatch/3, a gen_server:call to the riak_kv_js_vm, which normally would return the JS VM to the VM pool after successfully running the provided JS.
3. If the calling process dies between these steps, the reserved JS VM will never be returned to the JS VM pool.
The offending code is as follows:
blocking_dispatch(Name, JSCall, MaxCount, Count) ->
    case reserve_vm(Name) of %% reserve_vm is a gen_server:call to the riak_kv_js_manager
        {ok, VM} ->
            JobId = {VM, make_ref()},
            riak_kv_js_vm:blocking_dispatch(VM, JobId, JSCall); %% riak_kv_js_vm:blocking_dispatch is a gen_server:call to the riak_kv_js_vm
        {error, no_vms} ->
            back_off(MaxCount, Count),
            blocking_dispatch(Name, JSCall, MaxCount, Count - 1)
    end.
The simple solution would appear to be linking the calling process to the JS VM before allowing it to be checked out of the pool.
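The invariant the linking fix restores can be illustrated with a toy pool (hypothetical, non-Erlang analog: a try/finally plays the role of the process link, guaranteeing the VM is returned even when the caller crashes mid-dispatch):

```python
import queue

class VMPool:
    """Toy VM pool. dispatch() guarantees the VM goes back to the pool
    even if the caller's job raises -- the analog of linking the calling
    process to the reserved VM so a crash cannot leak it."""
    def __init__(self, vms):
        self._q = queue.Queue()
        for vm in vms:
            self._q.put(vm)

    def dispatch(self, job):
        vm = self._q.get()      # reserve_vm
        try:
            return job(vm)      # blocking_dispatch
        finally:
            self._q.put(vm)     # always returned, even on crash

pool = VMPool(["vm1"])
try:
    pool.dispatch(lambda vm: 1 / 0)  # caller "dies" mid-dispatch
except ZeroDivisionError:
    pass
pool.dispatch(lambda vm: vm)  # -> "vm1": the VM was not leaked
```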
When a precommit hook fails, the put operation might return an {error, {precommit_fail, Message}} tuple. The riak_kv_wm_object:handle_common_error/3 function immediately passes Message to riak_kv_wm_object:send_precommit_error/2.
Unfortunately, send_precommit_error/2 only handles the cases where Message is an iolist or the atom undefined. For all other cases, a badarg is raised, causing an ugly error message to be sent to the client. For example, if the precommit hook exits (as described in basho/riak_search#134), the client sees:
{error,
{error,badarg,
[{erlang,iolist_to_binary,
[{hook_crashed,{riak_search_kv_hook,precommit,error,badarg}}],
[]},
{wrq,append_to_response_body,2,[{file,"src/wrq.erl"},{line,205}]},
{riak_kv_wm_object,handle_common_error,3,
[{file,"src/riak_kv_wm_object.erl"},{line,998}]},
{webmachine_resource,resource_call,3,
[{file,"src/webmachine_resource.erl"},{line,171}]},
{webmachine_resource,do,3,
[{file,"src/webmachine_resource.erl"},{line,130}]},
{webmachine_decision_core,resource_call,1,
[{file,"src/webmachine_decision_core.erl"},{line,48}]},
{webmachine_decision_core,accept_helper,0,
[{file,"src/webmachine_decision_core.erl"},{line,606}]},
{webmachine_decision_core,decision,1,
[{file,"src/webmachine_decision_core.erl"},{line,512}]}]}}
The hook_crashed tuple is generated here: https://github.com/basho/riak_kv/blob/master/src/riak_kv_put_fsm.erl#L715
It looks like a similar problem could happen just below that, if a precommit hook returns something that is not a failure message, but also not a riak object: https://github.com/basho/riak_kv/blob/master/src/riak_kv_put_fsm.erl#L726
The client should see the returned failure message, not a stack trace of webmachine.
The function riak_kv_js_manager:blocking_dispatch/3 may exit with a noproc error if either the manager process has died, or the VM that it attempts to call riak_kv_js_vm:checkout_to/2 on has died. We may want to catch this exit in riak_kv_mrc_map:map_js/3 to avoid bringing down the whole MapReduce pipeline with a cryptic error, as was seen in this mailing list post: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-October/009798.html
Deleted keys remain in indexes indefinitely if the tombstone reap process fails. The key will finally be removed once a GET request is made on the key.
https://github.com/basho/riak_kv/blob/master/src/riak_kv_vnode.erl#L196
Same question for riak_kv_vnode:local_put too, I guess.
I think it would be good to add statistics for node local backend performance to "riak-admin status".
This would make it possible to measure effects of tuning a single node.
Something like:
node_backend_gettime_mean: <>
node_backend_gettime_99: <<99th percentile for node-local gets>>
etc.
The other timing measurements include network latencies, making it impossible to isolate node-local problems.
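For illustration, the proposed stats could be computed from a window of node-local backend GET latencies. A minimal Python sketch follows; the stat names are the ones proposed above, while the windowing and nearest-rank percentile method are illustrative assumptions, not how Riak's stats subsystem actually works:

```python
import math

def backend_get_stats(latencies_us):
    # Sketch of the proposed node-local backend GET stats, computed
    # from a window of latencies in microseconds. The nearest-rank
    # p99 here is an assumption for illustration only.
    if not latencies_us:
        return {"node_backend_gettime_mean": 0.0,
                "node_backend_gettime_99": 0.0}
    ordered = sorted(latencies_us)
    mean = sum(ordered) / len(ordered)
    rank = math.ceil(0.99 * len(ordered))  # nearest-rank 99th percentile
    return {"node_backend_gettime_mean": mean,
            "node_backend_gettime_99": float(ordered[rank - 1])}
```

Because these numbers would be measured at the backend call site, they would exclude network latency entirely, which is exactly what makes them useful for single-node tuning.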
When an error occurs on the PBC API, an RpbErrorResp is returned, but it is not very useful because the error code is always RIAKC_ERR_GENERAL. The error codes should be aligned with the values used by the REST API.
A "not found" on a GET operation is not a real problem for the client, so it should not trigger an expensive exception on the server. If "modified" is returned, the application can choose whether to repair the data or simply retry the get; it is not a server-side issue.
All human-readable fields, e.g. the errmsg field, should be understandable to the user. Their contents will be tuned over a long development cycle, so some instability in early releases is acceptable. If detailed error handling is required, it should be done by reading the errcode field.
I suggest we improve the current version of the PBC API.
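To illustrate the difference, a client that can branch on a machine-readable errcode never needs to parse errmsg. A minimal Python sketch, with hypothetical code values (these are not Riak's actual assignments):

```python
# Hypothetical numeric error codes; today the PBC API effectively
# returns only a generic code, which is the problem described above.
ERR_GENERAL = 0
ERR_NOTFOUND = 1
ERR_TIMEOUT = 2

def handle_get_response(errcode, errmsg, value=None):
    # Branch on the machine-readable code; errmsg is for humans only.
    if errcode is None:
        return ("ok", value)
    if errcode == ERR_NOTFOUND:
        return ("notfound", None)   # cheap, expected outcome
    if errcode == ERR_TIMEOUT:
        return ("retry", None)      # caller may retry or repair
    return ("error", errmsg)        # fall back to the human message
```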
To reproduce this issue do the following:
make devrel with Riak 1.0.3
set map_js_vm_count to 0 on one node
curl -XPUT http://localhost:8091/riak/docs/foo \
-H 'Content-Type: text/plain' -d 'demo data goes here' -v
curl -XPUT http://localhost:8091/riak/docs/bar \
-H 'Content-Type: text/plain' -d 'demo demo demo demo' -v
curl -XPUT http://localhost:8091/riak/docs/baz \
-H 'Content-Type: text/plain' -d 'nothing to see here' -v
curl -XPUT http://localhost:8091/riak/docs/qux \
-H 'Content-Type: text/plain' -d 'demo demo' -v
curl -XPOST http://localhost:8091/mapred \
-H 'Content-Type: application/json' \
-d '{"inputs":"docs",
"query":[{"map":{"language":"javascript",
"source":"Riak.mapValues"}}]}'
Only a partial result set will be returned. For example:
["demo demo demo demo","demo data goes here","demo demo"]
You should get:
["demo demo demo demo","demo data goes here","demo demo","nothing to see here"]
In the logs on the node with 0 map_js_vms you will see notifications like this:
2012-02-07 12:38:36.700 [notice] <0.26723.1821>@riak_kv_js_manager:blocking_dispatch:246 JS call failed: All VMs are busy.
Riak KV handoff can be stymied indefinitely by a badarg error in erlang:binary_to_term/1, e.g.:
2012-06-12 01:25:10.263 [error] <0.89.0> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.9312.1> exit with reason bad argument in call to erlang:binary_to_term(<<131,104,7,100,0,8,114,95,111,98,106,101,99,116,109,0,0,0,10,114,101,97,100,95,105,110,100,101,...>>) in riak_kv_vnode:do_diffobj_put/3 in context child_terminated
GET operations can fail with a similar error:
2012-06-12 17:47:00.750 [error] <0.89.0> Supervisor riak_core_vnode_sup had child undefined started with {riak_core_vnode,start_link,undefined} at <0.27729.58> exit with reason bad argument in call to erlang:binary_to_term(<<131,104,7,100,0,8,114,95,111,98,106,101,99,116,109,0,0,0,10,114,101,97,100,95,105,110,100,101,...>>) in riak_kv_vnode:do_get_term/3 in context child_terminated
In both of these cases, the KV backend is leveldb. Neither the LevelDB sanity checks (run against the bad SST copied to a local box for testing) nor the Snappy compression library detects any error, which is more than a little strange and needs further investigation.
In 1.0 and above with vnode_vclocks enabled, nodes must be a member of the preference list to coordinate a put.
Currently nodes forward put requests to the first member of the preference list by talking to riak_kv_put_fsm_sup on the remote node - code here https://github.com/basho/riak_kv/blob/master/src/riak_kv_put_fsm.erl#L193
If the node goes down while the remote process is being started, instead of getting an {error, Reason} response from the remote supervisor, the start_put_fsm call throws an uncaught exception, which crashes the local put FSM.
2012-03-09 06:11:00 =SUPERVISOR REPORT====
Supervisor: {local,riak_kv_put_fsm_sup}
Context: child_terminated
Reason: {{nodedown,'riak@host1'},{gen_server,call,[{riak_kv_put_fsm_sup,'riak@host1'},{start_child,[{raw,12339479,<0.4491.12>},{r_object,<<"bucket">>,<<"key">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<"Links">>]],[],[],[],[],[],[],[],[[<<"content-type">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<"X-Riak-VTag">>,52,113,74,65,119,52,87,90,51,116,119,86,107,73,112,113,101,65,48,52,73,53]],[[<<"index">>]],[],[[<<"X-Riak-Last-Modified">>|{1331,269199,636663}]],[],[[<<"X-Riak-Meta">>]]}}},<<"body">>}],[{<<54,74,223,68,79,76,172,101>>,{2,63498487778}}],{dict,4,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<"Links">>]],[],[],[],[],[],[],[],[[<<"content-type">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110]],[[<<"index">>]],[],[],[],[[<<"X-Riak-Meta">>]]}}},<<"{"started":1331268781747}">>},[{w,default},{dw,default},{pw,default},{timeout,60000}]]},infinity]}}
Offender: [{pid,<0.4817.12>},{name,undefined},{mfargs,{riak_kv_put_fsm,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
If the node is under heavy load you'll get a crash dump for however many puts ran in the last 4*net_tick_timeout seconds (typically 60s) which can be in the thousands and overload the logging subsystem.
The easiest path seems to be to replace the remote supervisor call with an rpc:call to the remote node, which would then start the FSM under its supervisor. That way we can supply a timeout that honors the timeout requested for the FSM (worst case this could be 2*timeout, if the node forwards the request and the remote node applies the same timeout again, but distributed time is hard).
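The pattern being proposed, converting a crash during the remote start into an error return value rather than letting the exception kill the caller, can be sketched language-agnostically in Python (all names here are hypothetical; the real fix would be an Erlang rpc:call wrapper):

```python
def safe_remote_start(start_fn, timeout_s):
    # Illustrative sketch only: call a remote-start function and
    # translate any failure (e.g. node down mid-start) into an
    # ("error", Reason)-style return instead of an exception, so
    # the caller's FSM survives and can fail the request cleanly.
    try:
        return ("ok", start_fn(timeout_s))
    except Exception as exc:
        return ("error", str(exc))
```

The key property is that a node-down during startup produces a value the caller already knows how to handle, identical in shape to the {error, Reason} it expects from the supervisor.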
To prove the issue is resolved, please do a comparative run of the current code vs. the fixed code. I'd suggest using basho_bench against a 4-node cluster, setting N=1 for the test bucket (so that 75% of requests are forwarded), with the memory backend, over protobuffs.
If the times are within 95% of the current solution, we'll accept the performance trade-off for robustness. This failure can be enough to take out a node by exhausting processes/memory.
Please make sure the fix can be applied against the 1.0 branch as well as 1.1 and master.
A problem was uncovered with riak_kv_pipe_get's failover strategy.
When the KV vnode that the pipe worker talks to returns an error, "not found" for example, the worker attempts to have another worker for this fitting try on a different vnode by returning the forward_preflist result. The last worker in the fallback list is then supposed to forward the error to the next phase if it also fails.
This strategy does not work when the kvget workers are unable to keep up with the rate of incoming inputs. The forward_preflist strategy has to use a non-blocking enqueue to prevent workers in a fitting from mutually blocking on one another. If the workers' queues are already full of other inputs, forwarding retry inputs to them will never succeed. When forwarding retry inputs fails, the worker does not get a chance to send the error to the next phase; Pipe's internal code sends an error trace message and drops the input instead.
This bug manifests itself as MapReduce queries failing with a
timeout-like error before their configured timeout, since both the
HTTP and PB mapreduce endpoints treat error trace messages as reasons
to terminate the query. The issue was raised on the mailing list:
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-February/007590.html
One quick and incomplete fix may be to simply set the q_limit much higher on the riak_kv_pipe_get fittings in the riak_kv_mrc_pipe module. This would raise the level/rate of inputs necessary to trigger the bug. Raising the following riak_kv_mrc_map fitting's q_limit may help as well.
One long-term fix is likely to change riak_kv_pipe_get such that it does not use the forward_preflist return value. Either it will need to talk directly to the fallback KV vnodes without forwarding the input, or it will need to be able to call riak_pipe_vnode_worker:recurse_input/5 directly (which will also require passing the used preflist to the worker) to be able to handle full-queue errors.
Another long-term fix might be to alter the behavior of forward_preflist such that, instead of producing a trace error, it forwards some standard message to the downstream fitting.
Below is a basic method for reproducing the problem. The method is to follow a riak_kv_pipe_get fitting with a fitting that takes forever to process its first input. This in turn causes the riak_kv_pipe_get fitting's queues to back up. The nval, q_limit, and chashfun parameters have been tuned to provide a minimal case.
The sleeps are not needed, but they make the trace easier to read.
Each call to riak_pipe:queue_work/3 is preceded by a description of how the worker (W), input queue (Q), and blocking queue (B) look after each input settles in its final position.
KVGet = #fitting_spec{name = kvget,
module = riak_kv_pipe_get,
chashfun = chash:key_of(now()),
nval = 2,
q_limit = 1},
SlowMap = #fitting_spec{name = slowmap,
module = riak_pipe_w_xform,
chashfun = chash:key_of(now()),
arg = fun(I,P,F) ->
receive never -> ok end
end,
q_limit = 1},
{ok, Pipe} = riak_pipe:exec([KVGet, SlowMap],
[{log, sink}, {trace, all}]),
%% 1: makes it all the way to the slowmap worker
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W={notfound,1}, Q=[], B=[]
riak_pipe:queue_work(Pipe, {<<"notthere">>, <<"1">>}, noblock),
timer:sleep(1000),
%% 2: makes it into the slowmap worker's queue
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[]
riak_pipe:queue_work(Pipe, {<<"notthere">>, <<"2">>}, noblock),
timer:sleep(1000),
%% 3: makes it to the fallback kvget worker,
%% blocking that worker on the slowmap worker's queue
%% K1: W=idle, Q=[], B=[]
%% K2: W={notthere 3}, Q=[], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[{notthere 3}]
riak_pipe:queue_work(Pipe, {<<"notthere">>, <<"3">>}, noblock),
timer:sleep(1000),
%% 4: makes it to the fallback worker's queue
%% K1: W=idle, Q=[], B=[]
%% K2: W={notthere 3}, Q=[{notthere, 4}], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[{notfound, 3}]
riak_pipe:queue_work(Pipe, {<<"notthere">>, <<"4">>}, noblock),
timer:sleep(1000),
%% 5: fails because preflist forwarding would block on the
%% fallback worker's queue
%% K1: W={notthere 5}, Q=[], B=[]
%% K2: W={notthere 3}, Q=[{notthere, 4}], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[{notfound, 3}]
riak_pipe:queue_work(Pipe, {<<"notthere">>, <<"5">>}, noblock),
timer:sleep(1000),
{timeout, [], Trace} = riak_pipe:collect_results(Pipe),
rp(Trace).
Below is the value captured in the Trace variable, with additional comments about what is happening at each step.
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W=idle, Q=[], B=[]
[{slowmap,{trace,all,{fitting,init_started}}},
{slowmap,{trace,all,{fitting,init_finished}}},
{kvget,{trace,all,{fitting,init_started}}},
{kvget,{trace,all,{fitting,init_finished}}},
%% 1: send first input to primary kvget vnode
%% K1: W=idle, Q=[{notthere,1}], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W=idle, Q=[], B=[]
{kvget,
{trace,all,
{fitting,
{get_details,#Ref<0.0.0.34313>,
1438665674247607560106752257205091097473808596992,
<11776.302.0>}}}},
{kvget,
{trace,all,
{vnode,
{start,1438665674247607560106752257205091097473808596992}}}},
{kvget,
{trace,all,
{vnode,
{queued,1438665674247607560106752257205091097473808596992,
{<<"notthere">>,<<"1">>}}}}},
%% 1: primary kvget worker processes first input
%% K1: W={notthere,1}, Q=[], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W=idle, Q=[], B=[]
{kvget,
{trace,all,
{vnode,
{dequeue,
1438665674247607560106752257205091097473808596992}}}},
%% 1: primary kvget worker forwards first input to fallback vnode
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[{notthere,1}], B=[]
%% M1: W=idle, Q=[], B=[]
{kvget,
{trace,all,
{fitting,{get_details,#Ref<0.0.0.34313>,0,<0.282.0>}}}},
{kvget,{trace,all,{vnode,{start,0}}}},
{kvget,
{trace,all,{vnode,{queued,0,{<<"notthere">>,<<"1">>}}}}},
%% 1: fallback kvget vnode processes first input
%% K1: W=idle, Q=[], B=[]
%% K2: W={notthere,1}, Q=[], B=[]
%% M1: W=idle, Q=[], B=[]
{kvget,{trace,all,{vnode,{dequeue,0}}}},
{kvget,
{trace,all,
{vnode,
{waiting,
1438665674247607560106752257205091097473808596992}}}},
%% 1: fallback kvget worker sends notfound output to slowmap vnode
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W=idle, Q=[{notfound,1}], B=[]
{slowmap,
{trace,all,
{fitting,
{get_details,#Ref<0.0.0.34313>,
593735040165679310520246963290989976735222595584,
<11775.290.0>}}}},
{slowmap,
{trace,all,
{vnode,
{start,593735040165679310520246963290989976735222595584}}}},
{slowmap,
{trace,all,
{vnode,
{queued,593735040165679310520246963290989976735222595584,
{{error,notfound},{<<"notthere">>,<<"1">>},undefined}}}}},
{kvget,{trace,all,{vnode,{waiting,0}}}},
%% 1: slowmap vnode picks up first input
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W={notfound,1}, Q=[], B=[]
{slowmap,
{trace,all,
{vnode,
{dequeue,
593735040165679310520246963290989976735222595584}}}},
%% 2: send second input to primary kvget vnode
%% K1: W=idle, Q=[{notthere,2}], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W={notfound,1}, Q=[], B=[]
{kvget,
{trace,all,
{vnode,
{queued,1438665674247607560106752257205091097473808596992,
{<<"notthere">>,<<"2">>}}}}},
%% 2: primary kvget worker forwards second input to fallback vnode
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[{notthere,2}], B=[]
%% M1: W={notfound,1}, Q=[], B=[]
{kvget,
{trace,all,{vnode,{queued,0,{<<"notthere">>,<<"2">>}}}}},
{kvget,
{trace,all,
{vnode,
{waiting,
1438665674247607560106752257205091097473808596992}}}},
%% 2: fallback worker sends notfound output to slowmap vnode
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[]
{slowmap,
{trace,all,
{vnode,
{queued,593735040165679310520246963290989976735222595584,
{{error,notfound},{<<"notthere">>,<<"2">>},undefined}}}}},
{kvget,{trace,all,{vnode,{waiting,0}}}},
%% 3: send third input to primary kvget vnode
%% K1: W=idle, Q=[{notthere 3}], B=[]
%% K2: W=idle, Q=[], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[]
{kvget,
{trace,all,
{vnode,
{queued,1438665674247607560106752257205091097473808596992,
{<<"notthere">>,<<"3">>}}}}},
%% 3: primary kvget worker forwards third input to fallback vnode
%% K1: W=idle, Q=[], B=[]
%% K2: W=idle, Q=[{notthere 3}], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[]
{kvget,
{trace,all,{vnode,{queued,0,{<<"notthere">>,<<"3">>}}}}},
{kvget,
{trace,all,
{vnode,
{waiting,
1438665674247607560106752257205091097473808596992}}}},
%% 3: fallback worker blocks on sending third notfound to slowmap vnode
%% K1: W=idle, Q=[], B=[]
%% K2: W={notthere 3}, Q=[], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[{notthere 3}]
{slowmap,
{trace,all,
{vnode,
{queue_full,
593735040165679310520246963290989976735222595584,
{{error,notfound},{<<"notthere">>,<<"3">>},undefined}}}}},
%% 4: send fourth input to primary kvget vnode
%% K1: W=idle, Q=[{notthere, 4}], B=[]
%% K2: W={notthere 3}, Q=[], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[{notfound, 3}]
{kvget,
{trace,all,
{vnode,
{queued,1438665674247607560106752257205091097473808596992,
{<<"notthere">>,<<"4">>}}}}},
%% 4: primary kvget worker forwards fourth input to fallback vnode
%% K1: W=idle, Q=[], B=[]
%% K2: W={notthere 3}, Q=[{notthere, 4}], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[{notfound, 3}]
{kvget,
{trace,all,{vnode,{queued,0,{<<"notthere">>,<<"4">>}}}}},
{kvget,
{trace,all,
{vnode,
{waiting,
1438665674247607560106752257205091097473808596992}}}},
%% 5: send fifth input to primary kvget vnode
%% K1: W=idle, Q=[{notthere 5}], B=[]
%% K2: W={notthere 3}, Q=[{notthere, 4}], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[{notfound, 3}]
{kvget,
{trace,all,
{vnode,
{queued,1438665674247607560106752257205091097473808596992,
{<<"notthere">>,<<"5">>}}}}},
%% 5: primary kvget worker fails to forward input to fallback
%% because fallback's queue is full
%% K1: W={notthere 5}, Q=[], B=[]
%% K2: W={notthere 3}, Q=[{notthere, 4}], B=[]
%% M1: W={notfound,1}, Q=[{notfound,2}], B=[{notfound, 3}]
{kvget,
{trace,all,
{error,
[{module,riak_kv_pipe_get},
{partition,
1438665674247607560106752257205091097473808596992},
{details,
[{fitting,
#fitting{
pid = <0.1434.0>,ref = #Ref<0.0.0.34313>,
chashfun =
<<249,252,247,9,220,75,44,145,58,167,246,118,61,
183,57,33,176,79,180,32>>,
nval = 2}},
{name,kvget},
{module,riak_kv_pipe_get},
{arg,undefined},
{output,
#fitting{
pid = <0.1433.0>,ref = #Ref<0.0.0.34313>,
chashfun =
<<102,4,176,137,137,246,153,214,136,172,94,97,
152,118,215,52,233,8,52,4>>,
nval = 1}},
{options,
[{sink,
#fitting{
pid = <0.1223.0>,ref = #Ref<0.0.0.34313>,
chashfun = sink,nval = undefined}},
{log,sink},
{trace,all}]},
{q_limit,1}]},
{type,forward_preflist},
{error,[timeout]},
{input,{<<"notthere">>,<<"5">>}},
{modstate,
{state,1438665674247607560106752257205091097473808596992,
#fitting_details{
fitting =
#fitting{
pid = <0.1434.0>,ref = #Ref<0.0.0.34313>,
chashfun =
<<249,252,247,9,220,75,44,145,58,167,246,
118,61,183,57,33,176,79,180,32>>,
nval = 2},
name = kvget,module = riak_kv_pipe_get,
arg = undefined,
output =
#fitting{
pid = <0.1433.0>,ref = #Ref<0.0.0.34313>,
chashfun =
<<102,4,176,137,137,246,153,214,136,172,
94,97,152,118,215,52,233,8,52,4>>,
nval = 1},
options =
[{sink,
#fitting{
pid = <0.1223.0>,
ref = #Ref<0.0.0.34313>,
chashfun = sink,nval = undefined}},
{log,sink},
{trace,all}],
q_limit = 1}}},
{stack,[]}]}}},
{kvget,
{trace,all,
{vnode,
{waiting,
1438665674247607560106752257205091097473808596992}}}}]
Moved from https://issues.basho.com/show_bug.cgi?id=984, reported by @jpartogi
Currently we need to use the Erlang client to interface with Riak from the command line. This is not very user-friendly and usually forces the user to understand Erlang, which means we often have to consult the manual.
For example, taking the redis CLI as a model, this:
del "groceries" "mine"
is much more user friendly and productive than this:
riakc_pb_socket:delete(Pid, <<"groceries">>, <<"mine">>).
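A thin CLI could map such commands onto Riak's existing HTTP API (the same /riak/Bucket/Key paths used by the curl examples elsewhere in these issues), with no Erlang required. A minimal Python sketch of the command parsing; the command set and quoting rules here are hypothetical:

```python
def parse_command(line, base_url="http://localhost:8098"):
    # Map a redis-style command line onto a Riak HTTP API request.
    # Only 'get' and 'del' are sketched; real quoting/escaping and
    # the rest of a plausible command set are left out.
    parts = line.replace('"', "").split()
    if len(parts) != 3 or parts[0] not in ("get", "del"):
        raise ValueError("usage: get|del <bucket> <key>")
    verb = "GET" if parts[0] == "get" else "DELETE"
    return verb, f"{base_url}/riak/{parts[1]}/{parts[2]}"
```

Issuing the resulting verb and URL with any HTTP client would then give the one-line ergonomics the request is asking for.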
riak_client.erl has a reference to a non-existent module riak_kv_mapred_filter (without the trailing "s").
YAMAMOTO Takashi
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-May/008374.html
For consistency, should POST return the new format, i.e. "buckets/bucket/keys/key"?