Below is groovy code I used for testing: CuratorFramework

InterProcessMutex leaves empty node about curator HOT 23 CLOSED

netflix commented on June 19, 2024

InterProcessMutex leaves empty node

from curator.

Comments (23)

ph1lm commented on June 19, 2024 1

Guys,
Just for the case if you don't find solution for this.
We solved the issue by using hash of lock key string instead of the key string itself - thus limiting lock space to some reasonable and limited number. e.g.:
def lockPath = "/our/lock/items/${item.hashCode() % maxNumberOfLocks}"

from curator.

Randgalt commented on June 19, 2024

This is actually an inherit problem with ZooKeeper. The desired behavior is for the parent node to get deleted when it has no children. But, there is no way to do this atomically with the deletion of the last child - at least not from the client. Server support is needed for this. There is a feature enhancement being discussed here: https://issues.apache.org/jira/browse/ZOOKEEPER-723

FYI - it's my understanding that these garbage nodes don't add much overhead.

from curator.

artemip commented on June 19, 2024

I've experienced an issue with Curator where, after many InterProcessMutex locks for unique paths, the leftover garbage nodes have caused a very large snapshot file to be generated. This causes leader election in ZK to fail while trying to transfer the snapshot between servers, unless initLimit and syncLimit in ZK configs are bumped up considerably.

The fix i've implemented in our code is to simply delete the node that is generated to support the lock (right above the leaf nodes) once a release() is called. This way, locks no longer leave behind garbage that collects over time.

Does this approach make sense? If so, would it be something that I should try to implement in Curator?

from curator.

Randgalt commented on June 19, 2024

I'd like to see what you've done. From what I've tried, there's no way to safely remove the parent as other processes might be assuming that it exists.

from curator.

artemip commented on June 19, 2024

What I've implemented to account for this is some simple retry logic.

Considering the following situation:

Process A: Taking the lock
Process B: Releasing the lock

A: Check to see that /namespace/#{lock_id} exits, and if not create it. (Let's say it already exists)
B: Delete the ephemeral node /namespace/#{lock_id}/FOO
B: Delete /namespace/#{lock_id}, since it is now empty
A: Try to create /namespace/#{lock_id}/BAR
--> NoNodeException with message "/namespace/#{lock_id} does not exist."

The simple solution would be for Process A to continue trying to acquire() until SUCCESS (or until X number of retries) in the event of a NoNodeException (I see no other circumstances where this exception can arise in InterProcessMutex). Does this seem like an adequate solution on our end?

In terms of implementing something like this in Curator, here is my (untested) idea:

Process A calls mutex.acquire()

In or after the 'ensurePath' step in LockInternals (which makes sure that the path exists, creating nodes as appropriate), set the ACL of /namespace/#{lock_id} to disallow DELETE globally

Upon successful acquisition, remove the NO-DELETE flag. If another process (B) has set the flag after step 2 (tried to acquire a lock), skip this step. Only the last process calling acquire() should be able to remove the flag. (not sure how to go about this)

from curator.

Randgalt commented on June 19, 2024

That's an interesting idea to use the ACL. I'm not sure a general purpose library like Curator can do that, though, as the client may have its own ACL for that node. I'll think about that a bit.

Another avenue: I've been thinking recently that EnsurePath and the create method creatingParentsIfNeeded() are a bit backwards. Instead of pre-checking for the parent paths, they should be reactive. i.e. only create the paths on Exception. This way, deleting the parent node is safe as subsequent locks will get a NoNodeExists exception which would cause it to create the parents.

from curator.

artemip commented on June 19, 2024

After attempting to implement the solution posted above (to fix things on my end), I found that any instance of EnsurePath can only have EnsurePath.ensure() be called once (subsequent calls are NOPs).
Due to this, I ended up implementing something along the lines of 'ZookeeperClient.newNamespaceAwareEnsurePath(path).ensure(ZookeeperClient)' in the retry logic for the acquisition step (Pretty much exactly the same solution you posted above, but outside of Curator).

Why is this the case? Wouldn't it be safe to assume that a path may need to be created more than once?

If this can be changed, might it be a good idea for ensurePath.ensure() to be called in both circumstances (i.e. both preemptive and reactive)?

If this (or the reactive-only version) can be implemented, I think the only missing piece would be for the InterProcessMutex, or the client, to manually remove unneeded znodes on lock release.

from curator.

Randgalt commented on June 19, 2024

The reason is for performance. It would be expensive to check for the parent paths each time.

from curator.

artemip commented on June 19, 2024

Right, makes sense.

But, what happens in the event of something like this?

Processes A, B, C:

A: Release lock on #{lock_id}
A: Delete the ephemeral node /namespace/#{lock_id}/FOO
A: Delete /namespace/#{lock_id}, since it is now empty

B: Try to get lock for /namespace/#{lock_id}. NoNodeException occurs, so run EnsurePath.ensure()

C: Acquire, then release lock
C: Delete /namespace/#{lock_id}, since it is now empty

B: Try to get lock for /namespace/#{lock_id}. NoNodeException occurs, so run EnsurePath.ensure(), which is now a NOP. #{lock_id} doesn't get created, and process B loops eternally (until another lock on #{lock_id} is attempted by another process).

Thank you for your help, by the way!

from curator.

Randgalt commented on June 19, 2024

artemip - I'd end not using EnsurePath or heavily modifying it. Let me try some ideas and I'll report back here.

from curator.

artemip commented on June 19, 2024

Awesome, thank you.

from curator.

Randgalt commented on June 19, 2024

I don't see any way to do this with good guarantees. So, a workaround occurred to me. Why not have a reaper thread that periodically checks for registered nodes. If they are empty, delete them. Here's what I'm thinking of: https://gist.github.com/2970233

Thoughts? Should I add this?

from curator.

jlaban commented on June 19, 2024

It still has the potential race condition where another process taking that lock might have the directory yanked out from under it, right? (It's greatly reduced the probability though, I think.)

@ph1lm that's an interesting approach too. Shouldn't really be done in Curator of course.

from curator.

Randgalt commented on June 19, 2024

As to the race - if the recipes/usages are written correctly there should be no race. FYI - I rewrote creatingParentsIfNeeded() as I talked about above. So, if the Reaper deletes the parent, the lock recipes in Curator will be fine as they will just re-make the parents when the error is caught.

from curator.

jlaban commented on June 19, 2024

Oh ok, I thought you were going to create the reaper instead of the above change. My bad.

from curator.

Randgalt commented on June 19, 2024

So, to be clear, Curator users will need to create/start the Reaper. Curator won't do it by default. If you like, I can have it on tomorrow's release.

from curator.

jlaban commented on June 19, 2024

@artemip ?

from curator.

Randgalt commented on June 19, 2024

Didn't get to it today. I tried, I swear ;) I'll have it by Monday hopefully.

from curator.

artemip commented on June 19, 2024

I think this is a great solution. Would the reaper be configurable to only listen to and clean up certain directories in ZK? Would be annoying if it consistently removed nodes that are meant to be temporarily empty.

from curator.

Randgalt commented on June 19, 2024

Yes - you give the Reaper the paths to check.

from curator.

artemip commented on June 19, 2024

Thank you very much for the fix - exactly what we needed.

I do have some concerns though:

The currently available Reaper modes do not take into account the scenario where a path is removed before the Reaper gets a chance to delete it. Would it be possible to implement a mode like REAP_UNTIL_GONE (or similar)? With this mode, you can configure the Reaper to only remove nodes that exist, and forget them if they do not (as opposed to REAP_UNTIL_DELETE, in which case the Reaper only forgets about the path after it successfully deletes it). If this is something I can do myself and send you a pull request for, that would be great.
There seems to be a possible memory leak with the 'activePaths' variable. In our application, we never know when a path should be removed from the Reaper. So, in our case, this set would grow indefinitely, unless the application explicitly tells the Reaper to remove the paths once it knows that they no longer need to be monitored. Ideally, once a path is deleted, the Reaper should remove all references to it forever. Is this something that can be made configurable as well?

Again, thank you very much for your help.

Let me know if there's anything I can do to assist you with this.

from curator.

Randgalt commented on June 19, 2024

Good ideas. Please try to do the implementation and submit a pull request.

from curator.

artemip commented on June 19, 2024

Pull request submitted: #102

Thanks

from curator.

InterProcessMutex leaves empty node about curator HOT 23 CLOSED

Comments (23)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent