Comments (23)
Guys,
Just for the case if you don't find solution for this.
We solved the issue by using hash of lock key string instead of the key string itself - thus limiting lock space to some reasonable and limited number. e.g.:
def lockPath = "/our/lock/items/${item.hashCode() % maxNumberOfLocks}"
from curator.
This is actually an inherit problem with ZooKeeper. The desired behavior is for the parent node to get deleted when it has no children. But, there is no way to do this atomically with the deletion of the last child - at least not from the client. Server support is needed for this. There is a feature enhancement being discussed here: https://issues.apache.org/jira/browse/ZOOKEEPER-723
FYI - it's my understanding that these garbage nodes don't add much overhead.
from curator.
I've experienced an issue with Curator where, after many InterProcessMutex locks for unique paths, the leftover garbage nodes have caused a very large snapshot file to be generated. This causes leader election in ZK to fail while trying to transfer the snapshot between servers, unless initLimit and syncLimit in ZK configs are bumped up considerably.
The fix i've implemented in our code is to simply delete the node that is generated to support the lock (right above the leaf nodes) once a release() is called. This way, locks no longer leave behind garbage that collects over time.
Does this approach make sense? If so, would it be something that I should try to implement in Curator?
from curator.
I'd like to see what you've done. From what I've tried, there's no way to safely remove the parent as other processes might be assuming that it exists.
from curator.
What I've implemented to account for this is some simple retry logic.
Considering the following situation:
Process A: Taking the lock
Process B: Releasing the lockA: Check to see that /namespace/#{lock_id} exits, and if not create it. (Let's say it already exists)
B: Delete the ephemeral node /namespace/#{lock_id}/FOO
B: Delete /namespace/#{lock_id}, since it is now empty
A: Try to create /namespace/#{lock_id}/BAR
--> NoNodeException with message "/namespace/#{lock_id} does not exist."
The simple solution would be for Process A to continue trying to acquire() until SUCCESS (or until X number of retries) in the event of a NoNodeException (I see no other circumstances where this exception can arise in InterProcessMutex). Does this seem like an adequate solution on our end?
In terms of implementing something like this in Curator, here is my (untested) idea:
- Process A calls mutex.acquire()
- In or after the 'ensurePath' step in LockInternals (which makes sure that the path exists, creating nodes as appropriate), set the ACL of /namespace/#{lock_id} to disallow DELETE globally
- Upon successful acquisition, remove the NO-DELETE flag. If another process (B) has set the flag after step 2 (tried to acquire a lock), skip this step. Only the last process calling acquire() should be able to remove the flag. (not sure how to go about this)
from curator.
That's an interesting idea to use the ACL. I'm not sure a general purpose library like Curator can do that, though, as the client may have its own ACL for that node. I'll think about that a bit.
Another avenue: I've been thinking recently that EnsurePath and the create method creatingParentsIfNeeded() are a bit backwards. Instead of pre-checking for the parent paths, they should be reactive. i.e. only create the paths on Exception. This way, deleting the parent node is safe as subsequent locks will get a NoNodeExists exception which would cause it to create the parents.
from curator.
After attempting to implement the solution posted above (to fix things on my end), I found that any instance of EnsurePath can only have EnsurePath.ensure() be called once (subsequent calls are NOPs).
Due to this, I ended up implementing something along the lines of 'ZookeeperClient.newNamespaceAwareEnsurePath(path).ensure(ZookeeperClient)' in the retry logic for the acquisition step (Pretty much exactly the same solution you posted above, but outside of Curator).
Why is this the case? Wouldn't it be safe to assume that a path may need to be created more than once?
If this can be changed, might it be a good idea for ensurePath.ensure() to be called in both circumstances (i.e. both preemptive and reactive)?
If this (or the reactive-only version) can be implemented, I think the only missing piece would be for the InterProcessMutex, or the client, to manually remove unneeded znodes on lock release.
from curator.
The reason is for performance. It would be expensive to check for the parent paths each time.
from curator.
Right, makes sense.
But, what happens in the event of something like this?
Processes A, B, C:
A: Release lock on #{lock_id}
A: Delete the ephemeral node /namespace/#{lock_id}/FOO
A: Delete /namespace/#{lock_id}, since it is now emptyB: Try to get lock for /namespace/#{lock_id}. NoNodeException occurs, so run EnsurePath.ensure()
C: Acquire, then release lock
C: Delete /namespace/#{lock_id}, since it is now emptyB: Try to get lock for /namespace/#{lock_id}. NoNodeException occurs, so run EnsurePath.ensure(), which is now a NOP. #{lock_id} doesn't get created, and process B loops eternally (until another lock on #{lock_id} is attempted by another process).
Thank you for your help, by the way!
from curator.
artemip - I'd end not using EnsurePath or heavily modifying it. Let me try some ideas and I'll report back here.
from curator.
Awesome, thank you.
from curator.
I don't see any way to do this with good guarantees. So, a workaround occurred to me. Why not have a reaper thread that periodically checks for registered nodes. If they are empty, delete them. Here's what I'm thinking of: https://gist.github.com/2970233
Thoughts? Should I add this?
from curator.
It still has the potential race condition where another process taking that lock might have the directory yanked out from under it, right? (It's greatly reduced the probability though, I think.)
@ph1lm that's an interesting approach too. Shouldn't really be done in Curator of course.
from curator.
As to the race - if the recipes/usages are written correctly there should be no race. FYI - I rewrote creatingParentsIfNeeded() as I talked about above. So, if the Reaper deletes the parent, the lock recipes in Curator will be fine as they will just re-make the parents when the error is caught.
from curator.
Oh ok, I thought you were going to create the reaper instead of the above change. My bad.
from curator.
So, to be clear, Curator users will need to create/start the Reaper. Curator won't do it by default. If you like, I can have it on tomorrow's release.
from curator.
@artemip ?
from curator.
Didn't get to it today. I tried, I swear ;) I'll have it by Monday hopefully.
from curator.
I think this is a great solution. Would the reaper be configurable to only listen to and clean up certain directories in ZK? Would be annoying if it consistently removed nodes that are meant to be temporarily empty.
from curator.
Yes - you give the Reaper the paths to check.
from curator.
Thank you very much for the fix - exactly what we needed.
I do have some concerns though:
- The currently available Reaper modes do not take into account the scenario where a path is removed before the Reaper gets a chance to delete it. Would it be possible to implement a mode like REAP_UNTIL_GONE (or similar)? With this mode, you can configure the Reaper to only remove nodes that exist, and forget them if they do not (as opposed to REAP_UNTIL_DELETE, in which case the Reaper only forgets about the path after it successfully deletes it). If this is something I can do myself and send you a pull request for, that would be great.
- There seems to be a possible memory leak with the 'activePaths' variable. In our application, we never know when a path should be removed from the Reaper. So, in our case, this set would grow indefinitely, unless the application explicitly tells the Reaper to remove the paths once it knows that they no longer need to be monitored. Ideally, once a path is deleted, the Reaper should remove all references to it forever. Is this something that can be made configurable as well?
Again, thank you very much for your help.
Let me know if there's anything I can do to assist you with this.
from curator.
Good ideas. Please try to do the implementation and submit a pull request.
from curator.
Pull request submitted: #102
Thanks
from curator.
Related Issues (20)
- java.nio.channels.CancelledKeyException HOT 1
- java.lang.NoSuchMethodError: com.google.common.cache.CacheBuilder.build HOT 4
- adding nodes to zookeeper from local properties through archaius HOT 1
- InterProcessMutex is not releasing when called inside a future's onSuccess function. HOT 1
- Curator's Watch triggered two times for the same notification HOT 3
- Curator integration with Exhibitor-values to be provided for ExhibitorEnsembleProvider arguments HOT 1
- What's the plan for releasing changes to this? HOT 3
- org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists HOT 3
- curator use this.client.create().creatingParentsIfNeeded().withMode(CreateMode.PERSISTENT).forPath(path, data); but I find zk didnot have this node. HOT 2
- Link in wiki is broken HOT 1
- Examples Link in Table of Contents leads to 404 HOT 2
- NoSuchMethodError exception HOT 2
- Curator connecting to a secured SASL zookeeper HOT 1
- Background operation retry gave up HOT 1
- ..
- The result of event.getPath() in BackgroundCallback confused me
- Missing method declaration addAuthInfo
- TestingServer cannot start, it always throw FailedServerStartException
- curator 5.3 connect docker zookeeper cluster
- the parent node never delete cause thousands of parent node
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from curator.