netflix / curator Goto Github PK

View Code? Open in Web Editor NEW

2.1K 610.0 435.0 7.37 MB

ZooKeeper client wrapper and rich ZooKeeper framework

Home Page: http://netflix.github.com/curator

License: Other

Java 100.00%

curator's Introduction

http://netflix.github.com/curator/curator.png

IMPORTANT NOTE!!!

Curator has moved to Apache. The Netflix Curator project will remain to hold Netflix extensions to Curator.

http://curator.apache.org

The previous Netflix branch is now in a branch named "archive".

ZKCLIENT BRIDGE

Please see the doc at https://github.com/Netflix/curator/wiki/ZKClient-Bridge

Jordan Zimmerman (mailto:[email protected])

LICENSE

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

curator's People

Contributors

Stargazers

Watchers

Forkers

seekvence thaingo d5nguyenvan acnithin jmhsieh zarfide ranibagai fengzanfeng yuhanonescreen stigkj nehanarkhede ntolia cb372 sgargan jaxlaw si262001 eatfresh quidryan youest forexware johnnagro stetzer as016194 gxm marczhu simon-og spycos kavanista trevorbernard gwestersf qddxct randgalt seyun springpear artemip amuraru samuelgmartinez bytbox pgdad axemblr zhuomingliang sclasen pbh101 wt alexzuo liweijie tangzk ekdd1987 qingw sumingfei utopiazh c2nes dsias evilkirin baskard maohong answashe longcongduoi strategist922 jameswei pm47 cesaralbloz dougnukem vintagewang fangjh cfregly steamshon butterflywithguns jametong andreaturli angkorcn chenld jeffliu anismiles glimmerveen dhavalsharma kdlan cffyh theambidextrousboy yaja trungz unixcrh bag-of-projects conductor taehui olegshaldybin hunshuiyu jdamick trun baeeq gigfork appliedcode ballerjh ferdiknight sballesteros shaikidris jonohart agiamas haohonglin davidwen

curator's Issues

ConnectionLossException using TestingCluster on Java 7 for Mac (Lion)

We ran into this problem in our own application tests but I was able to write a Curator-specific test that exhibits the same problem (see below). This seems to fail over 90% of the time but I have seen it pass randomly. If I fall back to Java 6 this passes consistently, though our application-level tests still exhibit some flakiness but not in the same exact / reliable manner so we're not sure if this is related.

Here is the stack trace:

2012-03-14 12:54:31,908 ERROR [main] [com.netflix.curator.framework.imps.CuratorFrameworkImpl] Ensure path threw exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /namespace
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1003)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1031)
at com.netflix.curator.utils.ZKPaths.mkdirs(ZKPaths.java:163)
at com.netflix.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:101)
at com.netflix.curator.utils.EnsurePath.ensure(EnsurePath.java:88)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.ensurePath(CuratorFrameworkImpl.java:479)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.fixForNamespace(CuratorFrameworkImpl.java:461)
at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:285)
at com.netflix.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:279)
at com.netflix.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:207)
at com.netflix.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:165)

Here is the test code:

////// start test code ////

import java.io.IOException;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.retry.ExponentialBackoffRetry;
import com.netflix.curator.test.TestingCluster;

public class CuratorFixtureTest {
private static final String NAMESPACE = "namespace";

TestingCluster testingCluster;

@Before
public void before() throws Exception {
    testingCluster = new TestingCluster(3);
    testingCluster.start();
}

@After
public void after() throws IOException {
    testingCluster.close();

}

@Test
public void foo() throws Exception {
    CuratorFramework framework = CuratorFrameworkFactory.builder().connectString(testingCluster.getConnectString())
            .namespace(NAMESPACE).retryPolicy(new ExponentialBackoffRetry(1, 10)).build();
    try {
        framework.start();

        framework.create().creatingParentsIfNeeded().forPath("/foo22/bar");
    } finally {
        framework.close();
    }

}

}

////// end test code ////

curator leader reconnect

Posted to the ZK DL:

Hi,

i work on a small demo application using the Curator Leader-Election.
What i understand from the wiki is that on a connection LOST-event, the
leader should end his takeLeadership method.

But what should be done with the LeaderSelector instance?
Should this be also closed on a LOST-event? Or can it be re-used, when
RECONNECTED occurs?
How can the LeaderSelector be restarted on RECONNECTED?

Hope someone can help,
Hartmut

Use more specific exception types

This is half question, half feature request.

Is there a particular reason that the forPath() methods throw Exception? If not, I think it would be more useful to throw a more specific exception, e.g. CuratorException, ZooKeeperException or whatever.

In my application I have a lot of places where I mix ZK-related code and app-specific code and surround the whole thing with one try-catch, e.g. (using the standard Apache ZK client):

try {
    if (zk.exists(path, false) != null) {
        // deserialize the zNode's data, which could throw a MyDeserializationFailedException
    } else {
        // zNode should exist, but does not, so throw a MyDataNotFoundException
    }
} catch (KeeperException e) {
    throw new ZooKeeperIsBorkedException(e)
}

This way, I can handle the 3 different types of exception at a higher level, easily differentiating between the various error cases.

But with Curator, I have to either completely rewrite this code or add ugly rethrows like this:

try {
    if (curator.checkExists().forPath(path) != null) {
        // deserialize the zNode's data, which could throw a MyDeserializationFailedException
    } else {
        // zNode should exist, but does not, so throw a MyDataNotFoundException
    }
} catch (MyDeserializationFailedException | MyDataNotFoundException e) {
    throw e;
} catch (Exception e) {
    throw new ZooKeeperIsBorkedException(e)
}

LeaderSelectorListener connection LOST late

Hi Jordan. I've been having great success with Curator, but have one small problem I'd like to ask about.

I'm working on making my application robust in the face of a LOST connection state. The problem is that the LeaderSelectorListener appears to be behave differently than other ConnectStateListeners. Specifically, it doesn't deliver the LOST state when I think it should.

Here's exactly what I'm doing. I've got a cluster of 3 zk servers. I shut them down one by one, and when I've got only one running, I'm under quorum and all the Curator connections go into the SUSPENDED state as expected. I've got RetryOneTime(10000)).sessionTimeoutMs(1000) settings on all my clients, and after 10 seconds, the state switches to LOST on all clients, again, as expected, except for the LeaderSelectorListeners. They stay in the SUSPENDED state indefinitely. When I start one of the zk servers back up -- getting back over quorum -- then and only then do I see the LOST from LeaderSelectorListener.

Is this behavior intentional? If so, what's the reasoning? It seems to be a breaking of the retry policy contract on the client.

I'm working around this by treating SUSPENDED and LOST as the same for now.

Thanks,

Mark

Zookeeper Snapshot and Log Mainenance

According to the zookeeper docs it is the responsibility of the operator to clean up the old transaction logs and snapshots. Does curator add any functionality in this regard to make it easier to automate cleanup?

http://zookeeper.apache.org/doc/r3.2.2/zookeeperAdmin.html#sc_maintenance

Is there a plan for a new release to maven central?

Hi, I was wondering if you are planning to push the latest version to Maven central any time soon.

Many thanks,
Amir

PathChildrenCache IllegalStateException: null

In my connection state testing, I see these in my logs. They don't seem harmful, but I can't figure out what they mean.

java.lang.IllegalStateException: null
at com.google.common.base.Preconditions.checkState(Preconditions.java:129) ~[guava-11.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getChildren(CuratorFrameworkImpl.java:294) ~[curator-framework-1.1.7.jar:na]
at com.netflix.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:361) [curator-recipes-1.1.7.jar:na]
at com.netflix.curator.framework.recipes.cache.PathChildrenCache.clearAndRefresh(PathChildrenCache.java:291) [curator-recipes-1.1.7.jar:na]
at com.netflix.curator.framework.recipes.cache.PathChildrenCache.handleStateChange(PathChildrenCache.java:320) [curator-recipes-1.1.7.jar:na]
at com.netflix.curator.framework.recipes.cache.PathChildrenCache.access$000(PathChildrenCache.java:64) [curator-recipes-1.1.7.jar:na]
at com.netflix.curator.framework.recipes.cache.PathChildrenCache$1.stateChanged(PathChildrenCache.java:85) [curator-recipes-1.1.7.jar:na]
at com.netflix.curator.framework.state.ConnectionStateManager$2.apply(ConnectionStateManager.java:170) [curator-framework-1.1.7.jar:na]
at com.netflix.curator.framework.state.ConnectionStateManager$2.apply(ConnectionStateManager.java:166) [curator-framework-1.1.7.jar:na]
at com.netflix.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:85) [curator-framework-1.1.7.jar:na]
at com.netflix.curator.framework.state.ConnectionStateManager.processEvents(ConnectionStateManager.java:163) [curator-framework-1.1.7.jar:na]
at com.netflix.curator.framework.state.ConnectionStateManager.access$000(ConnectionStateManager.java:40) [curator-framework-1.1.7.jar:na]
at com.netflix.curator.framework.state.ConnectionStateManager$1.call(ConnectionStateManager.java:96) [curator-framework-1.1.7.jar:na]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [na:1.6.0_29]
at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_29]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_29]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_29]
at java.lang.Thread.run(Thread.java:662) [na:1.6.0_29]

TestingZooKeeperServer: %s appears in the log

we have a test for %s appearing in the log as part of our build process. it found this:

curator/curator-test/src/main/java/com/netflix/curator/test/TestingZooKeeperServer.java:97

               logger.error("From testing server (random state: %s)", String.valueOf(configBuilder.isFromRandom()), e);

This produces log entries like:

com.netflix.curator.test.TestingZooKeeperServer | From testing server (random state: %s)

It will be convenient if ConnectionStateListener send a CONNECTED Event also

Currently we get RECONNECTED, SUSPENDED and LOST events. It will be good to get a initial CONNECTED event too. This can be inferred from the CuratorListener today, even though a little hacky.

Shutdown problems with ConnectionStateManager

Yes, I know I should be cleanly shutting down any CuratorFramework instances but I noticed that, if I failed to do so, my JVM would never terminate. This occurs even if I pass a daemon thread factory to CuratorFrameworkFactory.Builder. On looking through the code, I see that ConnectionStateManager builds its own thread factory instead of using what was provided to the builder. Is this intentional?

Waiting for leader election to be done.

Hi Jordan!

Is there a way to pass a callback to LeaderSelector to be called once the new node for each follower has been created? I basically want to wait for each follower until it has completed it's registration on the election node for my tests i.e. to have a synchronized start() method instead of the current background one.

Many thanks,
Amir

Issues with runtime Dependency on guava version > r09

i am using Curator leader recipe in my project (https://github.com/InMobi/data-bus/) which is WIP and faced the following

If i include any version of guava > r09 i.e. 10.0.1 it fails with the following exception

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.sameThreadExecutor()Ljava/util/concurrent/ExecutorService;
at com.netflix.curator.framework.recipes.leader.LeaderSelector.(LeaderSelector.java:62)

to avoid this i tried using the other constructor of LeaderSelector which allows me to pass the ExecutorService instead of using the G! one.

public LeaderSelector(CuratorFramework client, String mutexPath, ThreadFactory threadFactory, Executor executor, LeaderSelectorListener listener)

again resulted in the same issue as ListenerContainer.addListener method uses the G! MoreExecutors irrespective of user providing his own in the constructor.
Reference Code -
@OverRide
public void addListener(T listener)
{
addListener(listener, MoreExecutors.sameThreadExecutor());
}

After some struggle i was able to get it working with the following

com.google.guava guava r09

Suggestions

Documenting the exact version of this runtime dependency would be helpful
If the user has supplied it's own executorservice it shouldn't try using the G! version.

LeaderSelector AutoRequeue and ConnectionState.LOST

The lack of user mailing list forces me to post this here :s

I'm taking a look at new autorequeue functionality and i'm wondering something. Since i'm using an ExponentialBackOffRetry policy, the use of autorequeue, will requeue automatically LeaderSelector after a connection lost? Or i'm forced to reinstantiate and start the new instance again?

DirectoryUtils.deleteDirectoryContents on windows

Hello,

Thank's a lot for this great project !

A small issue : when using TestingServer on some windows systems, the temp directory of the server is not deleted when the server is closed.

This is because of the following equality check in method DirectoryUtils.deleteDirectoryContents(File directory)

        // Symbolic links will have different canonical and absolute paths 
        if (!directory.getCanonicalPath().equals(directory.getAbsolutePath())) { 
            return; 
        }

In my case, directory.getCanonicalPath() is
C:\WINDOWS\Temp\1332440117841-0

and directory.getAbsolutePath() is :
C:\WINDOWS\TEMP\1332440117841-0

There is a case-sensitivity issue.

InterruptedException when closing DistributedQueue

I'm having the following problem with Curator 1.0.8, with both ZooKeeper 3.3.3 (Linux) and 3.3.5 (OS X). I wrote a test program that starts a CuratorFramework, starts a DistributedQueue with a BlockingQueueConsumer, adds a few items, waits for them to reach the consumer, removes them, closes the queue, then closes the CuratorFramework. Everything succeeds until I try to close the queue, which gives me the following error:

SEVERE: Exception caught in background handler
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:485)
    at com.netflix.curator.framework.recipes.queue.DistributedQueue.runLoop(DistributedQueue.java:421)
    at com.netflix.curator.framework.recipes.queue.DistributedQueue.access$100(DistributedQueue.java:62)
    at com.netflix.curator.framework.recipes.queue.DistributedQueue$2.call(DistributedQueue.java:187)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

What might be causing this?

Thanks,

Esteban

mvn eclipse failing with client-jar dependency

'mvn eclipse:eclipse -Partifactory' currently blows up:

[ERROR] Failed to execute goal on project curator-framework: Could not resolve dependencies for project com.netflix.curator:curator-framework:jar:1.1.1-SNAPSHOT: Failure to find com.netflix.curator:curator-client:jar:1.1.1-SNAPSHOT in https://oss.sonatype.org/content/repositories/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of sonatype-nexus-snapshots has elapsed or updates are forced

InterProcessMutex leaves empty node

Below is groovy code I used for testing:

CuratorFramework framework = CuratorFrameworkFactory.builder().
        connectString(System.getProperty('zookeeper.ensemble', config?.zookeeper?.ensemble ?: 'localhost:2281')).
        sessionTimeoutMs(config?.zookeeper?.session?.timeout ?: 2 * 5000).
        connectionTimeoutMs(config?.zookeeper?.connect?.timeout ?: 2 * 5000).
        retryPolicy(new RetryNTimes(0, 0)).
        build();
framework.start();

def getInteger(property, defaultValue) {
    System.getProperty(property, "${defaultValue}") as Integer
}

getInteger('zookeeper.lock.quantity', 1).times {
    def mutex = new InterProcessMutex(framework, '/some/test');
    mutex.acquire()
    try {
        Thread.sleep(getInteger('zookeeper.lock.sleep', 0))
    } finally {
        mutex.release()
    }
}

framework.close()

After execution I just connect to ZK using Cli:

[zk: localhost:2281(CONNECTED) 7] ls2 /some/test
[]
cZxid = 0x100000004
ctime = Tue Feb 14 17:20:05 EET 2012
mZxid = 0x100000004
mtime = Tue Feb 14 17:20:05 EET 2012
pZxid = 0x100000006
cversion = 2
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0

Node /some/node still exists even after session disconnect.
That may cause a problem because if we don't clear child nodes - parent node may exceed size limit.

Jars used:
curator-client-1.1.0.jar
curator-framework-1.1.0.jar
curator-recipes-1.1.0.jar

Connection States

So I could probably start digging through code, but I figured I'd ask:

If my ConnectionStateListener gets sent a LOST state change, is that the equivalent of Zookeeper's SESSION_EXPIRED?

In other words, will I not get a LOST until I actually reconnect to the cluster (which is how SESSION_EXPIRED works) or will I get a LOST right away after the retry policy is exhausted (but perhaps before I reconnect to the cluster)?

I'm hoping it's the latter. I ask because I'm trying to figure out at which point my locks have become garbage: at SUSPENDED or at LOST.

(Ignoring ZK session timeouts vs retry policy lengths for the moment.)

Thanks, and apologies if this is documented somewhere,
John

PathChildrenCache and unhandledError

From echeddar:

listener on PathChildrenCache, docs for unhandledError message seem to indicate that if it gets called, I need to throw away my cache object, make a new one and start over again. Is that true? 01:20 PM
Yes

Leader recipe: Notify on another leader election

The title is quite self explanatory: It would be great to have the possibility to be notified when a leader is elected while we're still in the waiting queue. This would require two changes:

Exposing the underlying list of waiting nodes
Adding the ability to attach data to a node (an arbitrary user provided string) so the user can identify who the new master node is.

How to get ClosedChannelExceptions to bubble up

Hello again!

Is there an easy way of having the ClosedChannelException's bubble up? I have these cases that I make calls to the Curator framework before it is started and it results in these deadly exceptions:

FATAL: org.apache.zookeeper.server.SyncRequestProcessor.run - Severe unrecoverable error, exiting
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:243)
at org.apache.zookeeper.server.persistence.Util.padLogFile(Util.java:214)
at org.apache.zookeeper.server.persistence.FileTxnLog.padFile(FileTxnLog.java:237)
at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:215)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:315)
at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:468)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:107)

But they are not bubbled up and my code just waits there...
Is there a way to pass a callback to catch these or make them be thrown to the runner thread instead of just being logged?

Many thanks,
Amir

Threading Question

Re-posting from Eric Tschetter:

Ok, so I have the choice of using the default ZK thread inside curator
and doing handoff manually in my callbacks, or I can pass the
CuratorFramework instance an Executor that it will use for callbacks.
Are all of the recipes written in a thread-safe manner that would
allow me to pass in a 10 thread pool or something, or is the
synchronization model similar to Swing's in that there's only one
thread updating state at a time?

FATAL errors when accessing Curator objects before they are fully initialized.

Hi JZ,
In my tests, I often run into these errors:

INFO : org.apache.zookeeper.server.PrepRequestProcessor.run - PrepRequestProcessor exited loop!
FATAL: org.apache.zookeeper.server.SyncRequestProcessor.run - Severe unrecoverable error, exiting
java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:88)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:243)
at org.apache.zookeeper.server.persistence.Util.padLogFile(Util.java:214)
at org.apache.zookeeper.server.persistence.FileTxnLog.padFile(FileTxnLog.java:237)
at org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:215)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:315)
at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:468)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:107)

Which block the execution of my tests. I seem to be able to get around them by adding waits after I create the Curator objects that I'm creating but I was wondering if there is a better way of handling this.

I'm using Curator's test client as well.

Many thanks,
Amir

Zookeeper SASL Vs Curator

Hi there,

I post this question here since the mailing list has been just set up.

I was wondering if someone is aware of any issue trying to use Curator against a Zookeeper Server that is set to use SASL (ConnectionLossException) since I cannot make it work Curator to make operations or watching events.

I've actually posted a similar message onto the Zookeeper mailing list. Thanks Jordan for the answer but I think I've mislead you.

After some tests I can add more details:

I have a zookeeper server with:

"-Djava.security.auth.login.config=C:\apps\zookeeper-3.4.3\jaas.conf"
and authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider

and using the zkClient.sh to add a node everything works fine.

When I try to do any operation using a Curator client I get a:

Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss

after the node is added.

I'm using zookeeper 3.4.3 and Curator 1.1.5 (but I've tried with ZK 3.4.2 as well)

Example:
Using a Curator client I do:

System.setProperty("java.security.auth.login.config", "C:\apps\zookeeper-3.4.3\client_jaas.conf");

CuratorFramework client = CuratorFrameworkFactory.newClient("127.0.0.1:2183", new RetryNTimes(10, 1000));

client.start();
[...]

client.create().withMode(nodeMode).withACL(ZooDefs.Ids.READ_ACL_UNSAFE).forPath(fullPath, data);

this returns in the client shell:
Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:84)
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:90)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:381)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:184)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:173)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:169)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:161)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:36)
at com.netflix.curator.framework.recipes.cache.PathChildrenCache.rebuild(PathChildrenCache.java:199)
at com.netflix.curator.framework.recipes.cache.PathChildrenCache.start(PathChildrenCache.java:179)
at com.workday.zookeeper.examples.MyChildrenWatcher.main(MyChildrenWatcher.java:53)

while onto the zk server log:

[myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:54393
[myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:ZooKeeperServer@838] - Client attempting to establish new session at /127.0.0.1:54393
[myid:] - INFO [SyncThread:0:ZooKeeperServer@604] - Established session 0x136b09915f00008 with negotiated timeout 15000 for client /127.0.0.1:54393
[myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:SaslServerCallbackHandler@114] - Successfully authenticated client: authenticationID=bob; authorizationID=bob.
[myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:SaslServerCallbackHandler@130] - Setting authorizedID: bob
[myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:ZooKeeperServer@934] - adding SASL authorization for authorizationID: bob

[...]
[myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@466] - Processed session termination for sessionid: 0x136b09915f00009
[myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:54398 which had sessionid 0x136b09915f00009

The node is actually created but after that the client hangs.

Also If I use another client as watcher like:

System.setProperty("java.security.auth.login.config","C:\apps\zookeeper-3.4.3\client_jaas.conf");
[...]

CuratorFramework client = CuratorFrameworkFactory.newClient("127.0.0.1:2183", new RetryNTimes(10, 1000));
client.start();

final PathChildrenCache cache = new PathChildrenCache(client, "/path", true);
cache.getListenable().addListener(new PathChildrenCacheListener() {
@OverRide
public void childEvent(CuratorFramework client, PathChildrenCacheEvent event)
throws Exception {

            System.out.println("event: " + event.getType());
            System.out.println("data: " + new String(event.getData().getData()));
        }
    });

it hangs as well and doesn't get any event. After the retry loop ends it gets the ConnectionLossException too.

If I remove the java.security.policy config file setting and the authProvide everything works fine.

Thanks
Antonio

LeaderSelector autoRequeue() doesn't handle connection issues

Reported by user: ahfeel

https://gist.github.com/2244298

Example shows that during connection instability, LeaderSelector does not requeue.

Feature request: finalize() and closing

For things that are Closeable, can I request that you also override finalize() to emit an error message if the thing is not closed when it gets finalized?

I've found this useful in other libraries to help catch bugs a little quicker.

DistributedBarrier Issue

I'm attempting to use DistributedBarrier, and I'm seeing unexpected behavior.

If I create a new DistributedBarrier for a node that doesn't exist, I expect waitOnBarrier() to block indefinitely, however, what I see is that the method returns immediately.

Looking at the unit test DistributedBarrierTest.testNoBarrier, I see that the line
Assert.assertTrue(barrier.waitOnBarrier(10, TimeUnit.SECONDS));
returns immediately returning true. I expect it to wait 10 seconds and return false.

I suspect the issue is in DistributedBarrier.waitOnBarrier. The line below
result = (client.checkExists().usingWatcher(watcher).forPath(barrierPath) == null);
should be
result = (client.checkExists().usingWatcher(watcher).forPath(barrierPath) != null);

As if checkExists returns null, the node doesn't exist, so we should wait until the node exists or the timeout.

Foreign node in queue path causes infinite loop

If an item doesn't start with "queue-" it will warn that a "Foreign node in queue path" but it will never make a decision on what to do with that item which causes an infinite loop in com.netflix.curator.framework.recipes.queue.DistributedQueue:421.

Is there a way dequeue an entry put into a distributed queue?

I'm using the Queue as a JobRunner and would like to have the option of killing jobs. Is there a way to do this cleanly?
(I can think of keeping track of killed jobs and going over them the consume method.)

Watcher events include the namespace in the path

Unless I am missing something, if I have an object that wants to use watches then I must pass it the namespace that was used with the CuratorFramework client was created.

In the following tiny example program, I would expect it to print "/foo" for the path but it prints "/namespace/foo".

public class WatchIncludesNamespace {

public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.builder().connectString("pt.devel.vivisimo.com:21181")
            .retryPolicy(new RetryOneTime(10)).namespace("/namespace").build();
    client.start();
    Watcher watcher = new Watcher() {

        public void process(WatchedEvent event) {
            System.out.println("Event for " + event.getPath());
        }
    };
    try {
        client.create().forPath("/foo", null);
    } catch (NodeExistsException e) {
    }
    client.getData().usingWatcher(watcher).forPath("/foo");
    client.setData().forPath("/foo", "bar".getBytes("UTF-8"));
    client.close();
}

}

Accessing Queued item's ID for an IdQueue

Hi Jordan,

Many thanks for adding the distributed id queue. I was wondering if there is a way to retrieve the id I'm setting for the item inside the consumeMessage() method.

Cheers,
Amir

Ability to use TestableZooKeeper

I am trying to write some unit tests for our application code (using Curator & ZK 3.4.2).

I'd like to be able to simulate connection loss / timeouts by using the following -

http://svn.apache.org/repos/asf/zookeeper/trunk/src/java/test/org/apache/zookeeper/TestableZooKeeper.java

But there doesn't seem to be a way to control how ZooKeeper gets instanciated in HandleHolder.java. It'd be nice to be able to provide and/or control how ZooKeeper is instanciated.

If there is already an existing way to do something like this, please let me know.

Thanks,
John

Correct way of using Double Barries

(moved from ZooKeeper users mailing list)

Dear JZ,

I am curious to know how DistributedDoubleBarries can be correctly used in the same JVM. I basically want to sync the start and end of an execution inside a DistributedQueue's consume method and a Listener call for a PathChildrenCache.

I would truly appreciate an example snippet.

Cheers,
Amir

distributed queue in curator

Hi, Mr Zimmerman:

I came across the curator framework based on zookeeper, and got a question on distributed queue recipe in curator. I did not find any specific email group or forum I can post question, so send directly to you, let me know if there is such place to ask.

Basically the zookeeper recipe on queue (http://zookeeper.apache.org/doc/r3.4.2/zookeeperTutorial.html#sc_producerConsumerQueues) sounds like could be multiple producers and multiple consumers and they are not necessarily in same JVM, e.g. a produce and 2 consumers are in 3 different JVM.

I looked the DistributedQueue recipe in curator, it look like

both producer and consumers must be in same JVM
Only one consumer can consume message, you cannot have 2 or more consumers.

Is my understanding right ?

Is it possible to use curator distributed queue receipe to get

multiple producers and multiple consumers
each of them may live in different process.

Thanks
Jason Zhang

Feature Request: Add the path to PathChildrenCacheEvent

Having the path available on the event is great for logging purposes. Simplifies debugging production issues and whatnot a lot.

ServiceDiscovery usage

Not so much an issue here (well, maybe one for documentation), but I am wondering about the cacheability of the ServiceProvider. Is it the intent that one creates a ServiceProvider (not the actual instance, of course) once, starts it up and then caches it, or are you supposed to create one more often? I'm assuming it's meant to be cached, but didn't see any specific docs on it. Is the ServiceProvider thread-safe?

Struggling with watches

Hi there. I'm having some problems with surfacing watches in CuratorListener.

event.getData() always returns null.
WatchedEvent.NodeCreated never surfaces
Occasionally I get WATCHED type events and event.getPath() returns null

I'll work on creating unit tests that demonstrates this, but would be grateful if you could point me at code that shows me I'm wrong.

Thanks.

IPv6 problems with ServiceInstanceBuilder

We ran into this exception today with ServiceInstanceBuilder

Caused by: java.net.SocketException: No such device
        at java.net.NetworkInterface.isP2P0(Native Method)
        at java.net.NetworkInterface.isPointToPoint(NetworkInterface.java:354)
        at com.netflix.curator.x.discovery.ServiceInstanceBuilder.getAllLocalIPs(ServiceInstanceBuilder.java:139)
        at com.netflix.curator.x.discovery.ServiceInstance.builder(ServiceInstance.java:49)

Seeing native code in your stacktrace is always scary but it turned out the getAllLocalIPs() function was coming across an interface with an address of /fe80:0:0:0:5a55:caff:fe24:b9a and that had problems with NetworkInterface.isPointToPoint().

Disabling IPv6 on this machine helped fix the issue but I wanted to point this out in case others ran into it and if you guys wanted to tweak the code to sidestep this issue.

Curator 1.0.1 and ZK keeps retrying to connect to server

import org.apache.commons.lang.StringUtils;

import com.knewton.pettingzoo.config.ZookeeperClientConfig;
import com.netflix.curator.framework.CuratorFramework;
import com.netflix.curator.framework.CuratorFrameworkFactory;
import com.netflix.curator.retry.RetryNTimes;

public class Scratch {
public static void main(String[] args) throws Exception{
CuratorFramework cf = CuratorFrameworkFactory.builder()
.connectionTimeoutMs(1000)
.retryPolicy(new RetryNTimes(10, 0))
.connectString("localhost:2181,localhost:2182,localhost:2183")
.build();
System.out.println("_Starting...");
cf.start();
System.out.println("_Started...");
try{
new EnsurePath ("/somepath").ensure(cf.getZookeeperClient());
cf.setData().forPath("/somepath", "mydata".getBytes());
System.out.println("* Set some data* ");
Thread.sleep(10000);
cf.close();
}catch(Exception e){
System.out.println("** Dropped");
}
}
}

The problem: I executed this piece of code using the CLI. When I start up a cluster and run this piece of code things are fine the method runs and exits. But If say The server is shut down and the code tries to connect to it I see org.apache.zookeeper.ClientCnxn keeps trying a connection infinitely INSPITE OF A RETRY POLICY. Which is weird. I THINK this is a zookeeper bug but want to be sure why curator is not handling this properly? Or weather its really a curator bug thats hiding.

=========Debug log=============
15115 [main-SendThread(localhost:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181
15117 [main-SendThread(localhost:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
16998 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2182
16999 [main-SendThread(localhost:2182)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
17169 [main-SendThread(localhost:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2182
17171 [main-SendThread(localhost:2182)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

InterProcessMutex questions

-- From John Laban, PagerDuty Inc

I've been dabbling with Curator and it's very nice. I had a few quick questions about it and I was hoping you could help me. If there's a mailing list or somewhere I should be directing these questions, I'd be happy to send them there instead.

I'm mainly using Curator currently for the InterProcessMutex recipe.

Based on the name, I'm guessing that InterProcessMutex doesn't work intra-process, right? To prevent two threads in the same process from acquiring the same lock at the same time, I'll have to layer some other lock on top? (Like a java.util.concurrent.locks.ReentrantLock).
I want to make sure that no two processes think they own the lock at the same time, even if one of those processes disconnects from ZooKeeper part way through a transaction. It sounds like a ConnectionStateListener is the way to detect this... is the listener notified immediately upon a disconnect, or is there a delay? (Also: do you know if there's any way to configure Zookeeper so that ephemeral nodes take a few seconds to disappear after a session ends? This would fix the problem too, afaict, although I don't think it's possible.)
Have you thought about implementing read vs write locks? (Or is there a recipe that I missed?)

Thanks for the help, and thanks a ton for putting this client out there,
John Laban
PagerDuty Inc

possible failure mode in locking

In LockInternals I think this line is problematic,

client.create().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath(path, new byte[0]);

The problem is that the create call may succeed on the server, but the server may crash before it can send the response to the client. If that happened, an ephemeral node would exist in zookeeper, but the client would see an error, and not realize that it had acquired the lock.

This can occur without the session becoming invalid. For example if a client is talking to server A, A crashes, and the client reconnects to server B. The clients session is still valid, and the ephemeral node exists, but the client does not know it exists. What is worse, the client does not even know the path of the node to delete since the node is created with EPHEMERAL_SEQUENTIAL.

What we do is embed a uuid in path, with the uuid being unique for every lock attempt. On failure to create, the client schedules a getChildren to be performed when it reconnects, and deletes the node based on the uuid.

The zk tree looks like,

 /path/to/lock-<uuid2><zk added sequence>
             /lock-<uuid2><zk added sequence>
             /lock-<uuid3><zk added sequence>

PathChildrenCache anomalies

Seeing some unexpected behaviors with the PathChildrenCache recipe in both CACHE_DATA and CACHE_DATA_AND_STAT modes.

When a child is deleted or created, the expected event is triggered in the listeners, but there are also extra CHILD_UPDATED events for all the other children as well.
When a child is updated with the same data it had before a CHILD_UPDATED event is triggered. This sounds right for CACHE_DATA_AND_STAT mode (since the Stat object is changing), but not for CACHE_DATA mode? Not sure.
When start() is called, there is CHILD_UPDATED event triggered for each (already existing) child. Was not expecting this, but I can see how this makes sense.

curator-test dependencies

I was wondering why curator-test's pom.xml defines testng as a dependency. It doesn't seem to be used and, because of the lack of a test scope, it pulls testng into projects that import curator-test. Would a patch to remove it be acceptable?

javassist.NotFoundException

Hi again,

I'm having a bit of trouble using the TestingCluster, and I think it's because of the javassist stuff that it does in it's static initializer block.

ClassPool.get can't seem to find the LearnerZooKeeperServer class:

javassist.NotFoundException: org.apache.zookeeper.server.quorum.LearnerZooKeeperServer
at javassist.ClassPool.get(ClassPool.java:440)
at com.pagerduty.locking.ZookeeperLockSpec.(ZookeeperLockSpec.scala:33)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
...

However I'm sure that LearnerZooKeeperServer is on my classpath as I can even import the thing into my tests:

import org.apache.zookeeper.server.quorum.LearnerZooKeeperServer

I'm using Curator from Scala (via SBT) so there could be something kooky going on due to that. (Curator works great, it's just this javassist test stuff that is failing.)

However, loading the class with javassist works fine when I add the middle line below that first appends to the classpath:

// this is Scala btw
val pool = ClassPool.getDefault
pool.appendClassPath(new javassist.LoaderClassPath(getClass.getClassLoader)) // NEW LINE
pool.get("org.apache.zookeeper.server.quorum.LearnerZooKeeperServer")

It seems pretty benign. Would you be able to add a similar line (the Java equivalent) to the TestingCluster's initializer? (Unless there's a better way to fix this.)

Thanks,
John

Storage of Scheme Name for ServiceDiscovery

Does it make sense to have a value to store the scheme used for the Service that is being monitored (i.e. http, https, etc.)?

I see we have ssl port on the ServiceInstanceBuilder, but what about other protocols than just HTTP? For now, I think I will hook it into a strongly typed payload (looks like it gets serialized to JSON using Jackson, right?), but I was curious if I am missing something.

ConnectionStateListener inconsistent feedback

I'm seeing a strange behavior using the ConnectionStateListener interface. My goal is to pause a service if the ZooKeeper connection is lost.

My listener gets called once or twice and then the listener just don't get called at all, no idea why.

The scenario is that I have my two ZK servers running, and then I just shutdown one, breaking the Quorum.

[2012-03-20 11:11:45,766][main-EventThread][com.netflix.curator.framework.state.ConnectionStateManager] INFO - State change: SUSPENDED
[2012-03-20 11:11:45,769][ConnectionStateManager-0][com.archivezen.server.master.sync.ReplicationService] WARN - Pausing replication (ZooKeeper connection was SUSPENDED)
[2012-03-20 11:11:46,310][LearnerHandler-/127.0.0.1:64639][org.apache.zookeeper.server.quorum.LearnerHandler] ERROR - Unexpected exception causing shutdown while sock still open
[2012-03-20 11:11:48,959][main-EventThread][com.netflix.curator.framework.imps.CuratorFrameworkImpl] ERROR - Background operation retry gave up org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
[2012-03-20 11:11:48,960][main-EventThread][com.netflix.curator.framework.state.ConnectionStateManager] INFO - State change: LOST
[2012-03-20 11:11:48,976][ConnectionStateManager-0][com.archivezen.server.master.sync.ReplicationService] WARN - Pausing replication (ZooKeeper connection was LOST)
[2012-03-20 11:11:50,791][main-EventThread][com.netflix.curator.framework.state.ConnectionStateManager] INFO - State change: RECONNECTED
=== HERE MY LISTENER DOESN'T GET CALLED ===

In my experience using the Leader recipe, when I have a connection LOSS I just call start() again which adds the listener back to the listenable. Should I just do the same with my service ?

PathChildrenCache doesn't throw exceptions when path doesn't exist

PathChildrenCache doesn't throw exceptions when path doesn't exist on zookeeper server,
so in this case if you start a cache previous of path creation, then the listener doesn't catch any event.
This behavior is not logged in, so the user cannot do action to actuate any sort of compensation.

CuratorFramework addListener() throws NoSuchMethod Exception

Wonder if anyone can help me out with this, using the latest versions of curator and guava but still running into this exception!

Caused by: java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.sameThreadExecutor()Lcom/google/common/util/concurrent/ListeningExecutorService;

at com.netflix.curator.framework.listen.ListenerContainer.addListener(ListenerContainer.java:40)

Curator Client/Framework/Receipes : Version 1.1.5 (same with 1.1.4 and other earlier releases)
Google Guava : Version 11.0.1 (same with 10.0.1)

Any help would be greatly appreciated!
Nick

DistributeQueue has no way to set an Executor for the consumer

it seems like that during the consumeMessage of the DistributedQueue, all other watchers are blocked and no other watch fires before that method has left.

InterProcessMutex from multiple threads

I started testing with the InterProcessMutex today and was surprised to find out that it threw an IllegalMonitorException if another thread in the same jvm already had the lock. What I expected was behaviour within the same jvm similar to Java's standard ReentrantLock.

What are the best practices for the case where one would like the lock to be shared both within and across jvms? What I can do short term is to try/catch the illegal monitor exception everywhere I need to call the lock but that seems cludgy and is less than ideal. It seems like to get the behaviour I would really want the acquire method would need the ability to block.

netflix / curator Goto Github PK

curator's Introduction

IMPORTANT NOTE!!!

ZKCLIENT BRIDGE

LICENSE

curator's People

Contributors

Stargazers

Watchers

Forkers

curator's Issues

Recommend Projects

Recommend Topics

Recommend Org