Comments (12)

zhuchenwang commented on June 20, 2024

zkcluster blocks the execution thread until it connects to ZooKeeper. Since we start an embedded ZooKeeper instance locally, there might be a timing issue.

from squbs.

akara commented on June 20, 2024

Good point. How do we make it safe?

zhuchenwang commented on June 20, 2024

Here is the code: https://github.com/paypal/squbs/blob/master/squbs-zkcluster/src/test/scala/org/squbs/cluster/ZkClusterMultiActorSystemTestKit.scala#L107
I might not have much time to look at it in the near future. If you can't figure it out, I can take care of it later. BTW, can I access Travis CI now?

anilgursel commented on June 20, 2024

I do not think the builds are hanging because of zkcluster. Some tests in unicomplex (MultilistenerSpec) and in test-kit (CustomTestkitSpec) have issues. In my local environment, I even see compile issues with some test classes. I will update when I have more information.

anilgursel commented on June 20, 2024

The issues I mentioned in my previous comment seem to be unrelated. They still need to be addressed, though.

I built zkcluster on its own many times and it never hung. Builds with zkcluster excluded were also hanging.

The problem seems to be caused by ActorMonitorSpec, which gets stuck in this loop:

https://github.com/paypal/squbs/blob/master/squbs-actormonitor/src/test/scala/org/squbs/actormonitor/ActorMonitorSpec.scala#L65

akara commented on June 20, 2024

By the time you get to L65, the system is already fully initialized. Individual actor initialization is asynchronous, so some actors may not have started yet. You're right, we could not use an event to wake it up. Still, an infinite loop does not seem right either. We should definitely fail the build instead.

Failing solves the hang, but it would lead to sporadic failures, since a failure just means that fewer than 12 actors are currently active. So we need to make sure all 12 actors are active before checking. This can be done by sending a message to each actor and awaiting the responses. Then we check that all 12 actors expose their stats through JMX.

Let me try this out.

zhuchenwang commented on June 20, 2024

We already use awaitAssert here: https://github.com/paypal/squbs/blob/master/squbs-actormonitor/src/test/scala/org/squbs/actormonitor/ActorMonitorSpec.scala#L114
Shall we just change all the test cases to use awaitAssert? Then we can probably get rid of the infinite loop.
The idea is that if everything goes correctly, the actors will be started at some point, so we just give the test more chances to read the bean value. If some actors were not started correctly, something must be wrong, and the build should fail.
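
The bounded-retry idea can be sketched without any Akka dependency. This is only an illustration of the pattern, not the code in ActorMonitorSpec: `BoundedWait`, `awaitAssertLike`, and `beanCount` are hypothetical names, and the helper merely approximates what Akka TestKit's awaitAssert does (retry an assertion at an interval, then rethrow the last failure once the time budget is spent).

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of squbs or Akka.
object BoundedWait {

  // Re-evaluates `check` every `interval` until it stops throwing.
  // Once `max` has elapsed, the last failure is rethrown, so the test
  // fails instead of looping forever.
  def awaitAssertLike[A](max: FiniteDuration, interval: FiniteDuration = 100.millis)(check: => A): A = {
    val deadline = max.fromNow

    @tailrec
    def loop(): A = Try(check) match {
      case Success(value)                      => value
      case Failure(e) if deadline.isOverdue()  => throw e
      case Failure(_) =>
        Thread.sleep(interval.toMillis)
        loop()
    }

    loop()
  }

  // Counts the MBeans registered on the platform MBean server whose
  // name matches `pattern` (an ObjectName, wildcards allowed).
  def beanCount(pattern: String): Int =
    ManagementFactory.getPlatformMBeanServer.queryNames(new ObjectName(pattern), null).size
}
```

A test could then wait for the 12 beans with something like `BoundedWait.awaitAssertLike(5.seconds) { assert(BoundedWait.beanCount(pattern) == 12) }`, where `pattern` stands for whatever ObjectName pattern the monitor beans are actually registered under, and the 5-second budget is an arbitrary choice.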

akara commented on June 20, 2024

I really like this path. The only concern I have is that awaitAssert itself does not force the actor to become active. So is it possible for some of these actors to get stuck in the actor shell creation (empty shell)? Since they never received a message, could the creation of the actor itself happen a bit later, or even get optimized into lazy initialization?

I'd still want to ping each of the critical actors in this test once, just to make sure they're good. The identify message is hopefully good enough to ensure the actor is indeed created (causing the JMX beans to be created). But if we want a sure path, we probably need to hit each actor with an app message. That can be done, too.
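
A rough sketch of such a ping, assuming akka-actor is on the classpath: it sends Identify to each actor path and fails if any path does not resolve to a live actor. `PingActors` and `awaitAllIdentified` are hypothetical names, and the paths are supplied by the caller. Whether an Identify reply guarantees that the actor instance (and therefore its JMX bean) has been created is exactly the open question above, so this is not a proven fix.

```scala
import akka.actor.{ActorIdentity, ActorRef, ActorSystem, Identify}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical helper, not part of squbs.
object PingActors {

  // Sends Identify to every path and blocks until all replies arrive or
  // `max` elapses. Throws if a path does not resolve to an actor.
  def awaitAllIdentified(system: ActorSystem, paths: Seq[String], max: FiniteDuration): Seq[ActorRef] = {
    implicit val timeout: Timeout = Timeout(max)
    import system.dispatcher // execution context for Future.sequence

    val replies: Seq[Future[ActorIdentity]] = paths.map { path =>
      // The path doubles as the correlation id so failures can name it.
      (system.actorSelection(path) ? Identify(path)).mapTo[ActorIdentity]
    }

    Await.result(Future.sequence(replies), max).map { identity =>
      identity.ref.getOrElse(
        throw new AssertionError(s"No actor found at ${identity.correlationId}"))
    }
  }
}
```

If Identify turns out not to be enough, the same shape works with an application-level message in place of Identify, provided each actor replies to it.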

zhuchenwang commented on June 20, 2024

I am open to that.

akara commented on June 20, 2024

I think this issue is resolved. Please let me know before I close it. Thx!

anilgursel commented on June 20, 2024

It looks like we still have sporadic failures around those lines, although much less frequently. I would keep this open until we fully fix it. Here is the link to your related Gitter message: https://gitter.im/paypal/squbs?at=55ea283e0b6aa72b12ffd02d

akara commented on June 20, 2024

Failures resolved. We should not have sporadic build failures any longer.
