Comments (6)
Reported in zd://1139
from riak_kv.
Also, there is a possibly related issue in the thread that starts here:
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-May/008389.html
It looks more like this issue from this message onward:
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-May/008423.html
It isn't clear to me whether they're the same thing.
@reiddraper might be able to say with more authority if we're looking at one issue or two related issues.
https://gist.github.com/2938621 reproduces the issue on my machine on a clean cluster
Moving to 2.1 milestone. Speak up if there are any objections, please.
In case it helps anyone, here's a quick Ruby script to crawl through and remove the tombstones.
Use at your own risk: you're not supposed to list all keys in production, but for us the alternative was to have 560K tombstones sitting around taking up space when we only needed ~3K active keys. Toss this in a monthly cron job and there you go. Also, if you need the script to use less memory, you could alter the first curl request to: curl 'localhost:8098/buckets/YOUR_BUCKET_HERE/keys?keys=stream' > /tmp/tmp_file.json
and then parse the resulting JSON file one chunk at a time.
Note that the ?pw=all&pr=all&w=all&r=all query string
is the magic part (as per http://riak-users.197444.n3.nabble.com/Riak-Client-Resources-Deleting-a-Key-Doesn-t-Remove-it-from-bucket-keys-td4003576.html).
riak_tombstone_cleanup.rb
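require 'json'
require 'erb' # ERB::Util.url_encode, so keys with special characters don't break the URL

puts "Getting list of riak keys"
# Listing every key is expensive; see the warning above.
keys = JSON.parse(`curl -s 'localhost:8098/buckets/YOUR_BUCKET_HERE/keys?keys=true'`)['keys']
puts "#{keys.count} keys loaded"

bad_keys_counter = 0
good_keys_counter = 0
keys.each do |key|
  # HEAD request for the key; the all-replica quorum parameters are the "magic part" noted above.
  result = `curl -s -I 'localhost:8098/buckets/YOUR_BUCKET_HERE/keys/#{ERB::Util.url_encode(key)}?pw=all&pr=all&w=all&r=all' 2>&1`
  if result['404 Object Not Found']
    bad_keys_counter += 1   # key was listed but the object is gone (tombstone)
  else
    good_keys_counter += 1  # live object
  end
  # Progress report every 1000 keys.
  puts "Processed #{bad_keys_counter} bad key(s) and #{good_keys_counter} good key(s)" if (bad_keys_counter + good_keys_counter) % 1000 == 0
end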
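The lower-memory variant mentioned above (save the keys=stream output to a file, then parse it piece by piece) could look roughly like the sketch below. It assumes the saved file is a series of back-to-back JSON objects of the form {"keys":[...]}, which is how I understand the streamed key listing to arrive; check your own output first. The helper name each_streamed_key is made up for illustration and is not part of any Riak client.

```ruby
require 'json'

# Hypothetical helper: yields every key found in an IO that holds
# concatenated JSON objects such as {"keys":["a"]}{"keys":["b","c"]}.
# Assumption: this is the shape of the keys=stream response body.
# Only one object is held in memory at a time.
def each_streamed_key(io)
  buffer = +''
  depth = 0          # current nesting level of { }
  in_string = false  # true while inside a JSON string literal
  escaped = false    # true right after a backslash inside a string

  io.each_char do |ch|
    buffer << ch
    if in_string
      if escaped
        escaped = false
      elsif ch == '\\'
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == '{'
      depth += 1
    elsif ch == '}'
      depth -= 1
      if depth.zero?
        # A complete top-level object has been read: parse it and move on.
        JSON.parse(buffer).fetch('keys', []).each { |key| yield key }
        buffer.clear
      end
    end
  end
end
```

With that in place, the keys.each loop in the script could be driven by File.open('/tmp/tmp_file.json') { |f| each_streamed_key(f) { |key| ... } } instead of loading the whole key list up front.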
We spent quite a bit of time today discussing this behavior and have decided to roll better reaping functionality into AAE, as opposed to relying entirely on the get after put, which might not be done propagating (depending on dw/pw settings). Additionally, when AAE is used as a view of the data for scans etc., we'll be smarter about sifting out tombstones.
Thanks everyone for their discussion and contributions. Closing this issue.