
Comments (16)

rosskevin commented on May 18, 2024

This has just happened again. As I take down pods and deploy new ones, it just runs away:

2018-11-09T22:54:08.709Z 11 TID-go5eoizov SidekiqAlive::Worker JID-13c5f2051107f52a25ef5f1e INFO: start
2018-11-09T22:54:08.710Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-29372509d901455f06733abb INFO: done: 0.005 sec
2018-11-09T22:54:08.713Z 11 TID-go5eoizov SidekiqAlive::Worker JID-13c5f2051107f52a25ef5f1e INFO: done: 0.003 sec
2018-11-09T22:54:08.713Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-cad550f796b0ee11ea24af10 INFO: start
2018-11-09T22:54:08.716Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-cad550f796b0ee11ea24af10 INFO: done: 0.003 sec
2018-11-09T22:54:08.716Z 11 TID-go5eoizef SidekiqAlive::Worker JID-cc66acbdebf274f111b40c58 INFO: start
2018-11-09T22:54:08.719Z 11 TID-go5eoizef SidekiqAlive::Worker JID-cc66acbdebf274f111b40c58 INFO: done: 0.003 sec
2018-11-09T22:54:08.719Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-31c532b2ec2d1aa57b37d0d2 INFO: start
2018-11-09T22:54:08.725Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-31c532b2ec2d1aa57b37d0d2 INFO: done: 0.005 sec
2018-11-09T22:54:08.725Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-1c4f6526cadc2a67b52b5ce0 INFO: start
2018-11-09T22:54:08.729Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-1c4f6526cadc2a67b52b5ce0 INFO: done: 0.004 sec
2018-11-09T22:54:08.730Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-e7447ac5992ca48667b2aa0d INFO: start
2018-11-09T22:54:08.735Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-e7447ac5992ca48667b2aa0d INFO: done: 0.005 sec
2018-11-09T22:54:08.735Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-48fa855268670a1c77584efe INFO: start
2018-11-09T22:54:08.738Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-48fa855268670a1c77584efe INFO: done: 0.003 sec
2018-11-09T22:54:08.739Z 11 TID-go5eoj0fn SidekiqAlive::Worker JID-dbb66602fac45881bec1ff59 INFO: start
2018-11-09T22:54:08.743Z 11 TID-go5eoj3dn SidekiqAlive::Worker JID-be3aa1aa7ec2db3402005825 INFO: start
2018-11-09T22:54:08.744Z 11 TID-go5eoj0fn SidekiqAlive::Worker JID-dbb66602fac45881bec1ff59 INFO: done: 0.005 sec
2018-11-09T22:54:08.746Z 11 TID-go5eoj3dn SidekiqAlive::Worker JID-be3aa1aa7ec2db3402005825 INFO: done: 0.003 sec
2018-11-09T22:54:08.746Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-e053558a23400b40139cf321 INFO: start
2018-11-09T22:54:08.750Z 11 TID-go5eoizov SidekiqAlive::Worker JID-2dec473f542077fcf88ea7d1 INFO: start
2018-11-09T22:54:08.751Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-e053558a23400b40139cf321 INFO: done: 0.004 sec
2018-11-09T22:54:08.753Z 11 TID-go5eoizov SidekiqAlive::Worker JID-2dec473f542077fcf88ea7d1 INFO: done: 0.003 sec
2018-11-09T22:54:08.754Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-9abdc83f275f941e630dfa49 INFO: start
2018-11-09T22:54:08.757Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-9abdc83f275f941e630dfa49 INFO: done: 0.003 sec
2018-11-09T22:54:08.757Z 11 TID-go5eoizef SidekiqAlive::Worker JID-3f5ed59a06e7e926c9c26e6d INFO: start
2018-11-09T22:54:08.760Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-19e69736237e3f93faefc95e INFO: start
2018-11-09T22:54:08.768Z 11 TID-go5eoizef SidekiqAlive::Worker JID-3f5ed59a06e7e926c9c26e6d INFO: done: 0.011 sec
2018-11-09T22:54:08.771Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-ad2c19842bb736bbaddd0406 INFO: start
2018-11-09T22:54:08.771Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-19e69736237e3f93faefc95e INFO: done: 0.011 sec
2018-11-09T22:54:08.773Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-ad2c19842bb736bbaddd0406 INFO: done: 0.003 sec
2018-11-09T22:54:08.774Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-dffe4b1ae40923aefde8098d INFO: start
2018-11-09T22:54:08.776Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-dffe4b1ae40923aefde8098d INFO: done: 0.003 sec
2018-11-09T22:54:08.777Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-eda2ac20cd709bf30792d59a INFO: start
2018-11-09T22:54:08.779Z 11 TID-go5eoj0fn SidekiqAlive::Worker JID-50eaefb18564c5780ecc802c INFO: start
2018-11-09T22:54:08.780Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-eda2ac20cd709bf30792d59a INFO: done: 0.003 sec
2018-11-09T22:54:08.782Z 11 TID-go5eoj0fn SidekiqAlive::Worker JID-50eaefb18564c5780ecc802c INFO: done: 0.002 sec
2018-11-09T22:54:08.782Z 11 TID-go5eoj3dn SidekiqAlive::Worker JID-e599764cdacd0fe43481b5cb INFO: start
2018-11-09T22:54:08.785Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-63e250a424575821374ad89c INFO: start
2018-11-09T22:54:08.786Z 11 TID-go5eoj3dn SidekiqAlive::Worker JID-e599764cdacd0fe43481b5cb INFO: done: 0.003 sec
2018-11-09T22:54:08.789Z 11 TID-go5eoj2d3 SidekiqAlive::Worker JID-63e250a424575821374ad89c INFO: done: 0.004 sec
2018-11-09T22:54:08.789Z 11 TID-go5eoizov SidekiqAlive::Worker JID-ceca7482d9946899dc8c80f9 INFO: start
2018-11-09T22:54:08.792Z 11 TID-go5eoizov SidekiqAlive::Worker JID-ceca7482d9946899dc8c80f9 INFO: done: 0.003 sec
2018-11-09T22:54:08.793Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-6d532f1c87cb3d6a5ac73391 INFO: start
2018-11-09T22:54:08.796Z 11 TID-go5eoizef SidekiqAlive::Worker JID-368216db9b0db2bc3d8d073b INFO: start
2018-11-09T22:54:08.796Z 11 TID-go5eoj1z3 SidekiqAlive::Worker JID-6d532f1c87cb3d6a5ac73391 INFO: done: 0.003 sec
2018-11-09T22:54:08.815Z 11 TID-go5eoizef SidekiqAlive::Worker JID-368216db9b0db2bc3d8d073b INFO: done: 0.019 sec
2018-11-09T22:54:08.818Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-b905c7eb0f0b8ab58613e5e3 INFO: start
2018-11-09T22:54:08.822Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-50699c6860cff70fa3384955 INFO: start
2018-11-09T22:54:08.823Z 11 TID-go5eoj4xr SidekiqAlive::Worker JID-b905c7eb0f0b8ab58613e5e3 INFO: done: 0.005 sec
2018-11-09T22:54:08.826Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-df0acb5dae800558f88efbf1 INFO: start
2018-11-09T22:54:08.826Z 11 TID-go5eoiytv SidekiqAlive::Worker JID-50699c6860cff70fa3384955 INFO: done: 0.004 sec
2018-11-09T22:54:08.829Z 11 TID-go5eoj1ar SidekiqAlive::Worker JID-256f96301792de537316b1e5 INFO: start
2018-11-09T22:54:08.829Z 11 TID-go5eoj0p3 SidekiqAlive::Worker JID-df0acb5dae800558f88efbf1 INFO: done: 0.004 sec

from sidekiq_alive.

rosskevin commented on May 18, 2024

I guess it slows down after it catches up, but there wasn't a Sidekiq process running for some amount of time. @fruwe, is this what you intend to solve with #7?

rosskevin commented on May 18, 2024

This just happened again: we are sorting out our new production environment and had 30 minutes of downtime. Upon startup, Sidekiq was slammed with 58,000 SidekiqAlive::Worker jobs in less than 10 seconds.

gsmetal commented on May 18, 2024

It looks like the problem occurs when a new version of the pod is being deployed:

  1. The old pod schedules its job and then stops accepting new jobs (because it caught the termination signal).
  2. The new pod starts and receives the old pod's job.
  3. The old pod's hostname is still registered, because its time-to-live is the configured time_to_live plus one minute:

       def self.register_instance(instance_name)
         redis.set(instance_name,
                   Time.now.to_i,
                   { ex: config.time_to_live.to_i + 60 })
       end

     So the new pod tries to execute the job, but its hostname is different, so it reschedules the job immediately:

       else
         # requeue for hostname to validate its own liveness probe
         self.class.perform_async(hostname)

  4. Step 3 repeats over and over until the old pod's hostname key expires (about one minute).
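To make the loop in steps 3 and 4 concrete, here is a minimal single-threaded simulation. The `decide` logic (handle its own job, requeue while the target host's key is still alive, otherwise drop) is inferred from the description above, not copied from sidekiq_alive; `decide`, `count_executions`, the host names and the 3 ms cycle (roughly the per-job durations in the log at the top) are all hypothetical. Real clusters run several threads and pods, so actual counts multiply.

```ruby
# Inferred decision for one execution of a liveness job (assumption, not the gem's code).
def decide(target_host, current_host, target_registered)
  if target_host == current_host
    :handle   # this pod's own liveness job
  elsif target_registered
    :requeue  # another pod's job whose key has not expired yet
  else
    :drop     # stale job for a pod that is gone
  end
end

# Simulated clock in milliseconds. A job addressed to "old-pod" keeps landing
# on "new-pod"; each execution plus immediate requeue takes cycle_ms, and the
# old pod's key expires ttl_left_ms after it shut down.
def count_executions(ttl_left_ms:, cycle_ms:)
  raise ArgumentError, "cycle_ms must be positive" unless cycle_ms.positive?

  now = 0
  runs = 0
  loop do
    runs += 1
    action = decide("old-pod", "new-pod", now < ttl_left_ms)
    break unless action == :requeue

    now += cycle_ms
  end
  runs
end

puts count_executions(ttl_left_ms: 60_000, cycle_ms: 3) # prints 20001
```

With an immediate requeue, the only brake is how fast one execution completes, which is why a one-minute window turns into tens of thousands of jobs.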

I think #7 should help limit the old pod's job to only ~60 executions.

gsmetal commented on May 18, 2024

Another case where this problem may occur is when there are several replicas of one Sidekiq deployment and one of them gets stuck. Its job will then be picked up by the other pods, which will ping-pong it between themselves until the stuck pod's hostname times out.

gsmetal commented on May 18, 2024

> I think #7 should help limit the old pod's job to only ~60 executions.

That's not accurate, because it depends on the average_scheduled_poll_interval option in the Sidekiq config.
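A rough way to see why the poll interval matters: if the requeue is delayed (for example with `perform_in`) instead of immediate, the job waits in Sidekiq's scheduled set until a poll moves it back onto a queue, so each cycle lasts at least the delay and, typically, about one poll interval. The sketch below assumes each cycle lasts about the longer of the two; the 1 s delay and the interval values are example numbers, not what #7 necessarily uses, and `estimated_executions` is a hypothetical helper.

```ruby
# Rough estimate only (assumption): each requeue cycle lasts about the longer
# of the requeue delay and the scheduled-set poll interval.
def estimated_executions(window_s:, requeue_delay_s:, poll_interval_s:)
  cycle_s = [requeue_delay_s, poll_interval_s].max
  raise ArgumentError, "cycle must be positive" unless cycle_s.positive?

  (window_s / cycle_s.to_f).ceil
end

puts estimated_executions(window_s: 60, requeue_delay_s: 1, poll_interval_s: 1) # prints 60
puts estimated_executions(window_s: 60, requeue_delay_s: 1, poll_interval_s: 5) # prints 12
```

So "~60 times" only holds when the effective cycle is about one second; a slower poller gives fewer executions, a faster one more.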

arturictus commented on May 18, 2024

Hi, sorry for the delay.
I'll add a Sidekiq callback on stop that removes the registered hostname from the list.
That way, no new jobs will be triggered.

arturictus commented on May 18, 2024

Hi,
I think this pull request should solve it: #12
This works when pods follow the regular Kubernetes lifecycle, i.e. when deploying or when Kubernetes kills the pod organically. It will not hold up in edge cases like a complete shutdown without a signal, but that should not happen except in really weird conditions.

I think the on-quiet callback should be enough; I added on-shutdown as well, just in case the deployment is not set up the way the example deployment explains.

What do you think?
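A minimal sketch of the callback approach, assuming the registration is a simple key per hostname. To run without Sidekiq or Redis it uses a stand-in `Lifecycle` class and a Hash; in a real app the hooks would be registered through Sidekiq's `Sidekiq.configure_server { |config| config.on(:quiet) { ... } }`. The names `Lifecycle`, `registry` and `unregister` are hypothetical and not part of sidekiq_alive.

```ruby
# Stand-in for Sidekiq's server lifecycle events (config.on(:quiet) etc.).
class Lifecycle
  def initialize
    @hooks = Hash.new { |hash, event| hash[event] = [] }
  end

  def on(event, &block)
    @hooks[event] << block
  end

  def fire(event)
    @hooks[event].each(&:call)
  end
end

registry = {} # stand-in for the Redis keys that mark instances as alive
hostname = "old-pod"
registry[hostname] = Time.now.to_i

# Deleting a missing key is a no-op, so running this on both events is safe.
unregister = -> { registry.delete(hostname) }

lifecycle = Lifecycle.new
lifecycle.on(:quiet, &unregister)    # normal path: the pod is asked to stop
lifecycle.on(:shutdown, &unregister) # fallback in case :quiet never fires

lifecycle.fire(:quiet)
lifecycle.fire(:shutdown)
puts registry.key?(hostname) # prints false
```

Making the unregister step idempotent is what allows hooking it to both events without special-casing which one ran first.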

arturictus commented on May 18, 2024

Hi @rosskevin,
If I understand correctly, it only happens when pods are deleted and new ones start processing. Is that the case?

arturictus commented on May 18, 2024

Version 1.0.1 released. Should solve this issue.

rosskevin commented on May 18, 2024

This should be reopened (verified on 1.0.1):

Throughput of the Redis database indicated below is 1553 RPS (requests per second).

This is a brand new cluster/business with essentially no traffic other than alpha users. The spike occurs on a new deployment (old one deleted, new one added).

[Screenshot: Redis Labs throughput graph (redis_labs)]

What information could I provide that can help?

fruwe commented on May 18, 2024

> I guess it slows down after it catches up, but there wasn't a Sidekiq process running for some amount of time. @fruwe, is this what you intend to solve with #7?

@rosskevin, sorry for my late reply. Yes, something similar was happening on my cluster too.

Even with my PR, I experienced spikes of 150k jobs on an empty cluster. Last week, I had to remove sidekiq-alive, since my cluster kept crashing (in fact, I had to remove the HTTP health check as well). I will have to investigate whether this problem is caused by sidekiq-alive, a GKE update, or something else.

I will try v1.0.1 from @arturictus (Thanks for #12) and see how it goes.

arturictus commented on May 18, 2024

Reopening this.
I will add a queue per pod.
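A minimal sketch of why a queue per pod stops the ping-pong: each pod's liveness job is pushed to a queue named after that pod, and each pod only fetches from its own liveness queue, so a stale job can never land on another pod. The queue-name format, the `LivenessJob` class name and the in-memory Hash are illustrative assumptions; in real Sidekiq a job can be sent to a specific queue with `SomeWorker.set(queue: name).perform_async(...)` and a process listens on the queues passed with `-q`.

```ruby
# Hypothetical naming scheme: one liveness queue per hostname.
def liveness_queue(hostname)
  "sidekiq_alive-#{hostname}"
end

# A pod fetches only from its own liveness queue.
def fetch_liveness_job(queues, hostname)
  queues[liveness_queue(hostname)].shift
end

queues = Hash.new { |hash, name| hash[name] = [] }

# Just before terminating, the old pod enqueues its next liveness job.
queues[liveness_queue("old-pod")] << { "class" => "LivenessJob", "args" => ["old-pod"] }

puts fetch_liveness_job(queues, "new-pod").inspect # prints nil
puts queues[liveness_queue("old-pod")].size        # prints 1
```

The trade-off in this sketch is that the dead pod's queue and its single job stay behind until something deletes them, but they are never executed, so nothing multiplies.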

arturictus commented on May 18, 2024

It's already open.

arrowcircle commented on May 18, 2024

It looks like the problem persists in one way or another.
On every deploy, I get 100k jobs scheduled.

I think the problem is that sidekiqctl is no longer included in the binary.

Any ideas how to handle this problem?

arturictus commented on May 18, 2024

Hi @arrowcircle, did you try version 2?
Sidekiq 6 removed sidekiqctl, but when I tried it, shutdown was still triggering the callbacks.
Also, in SidekiqAlive version 2 each instance has its own queue, so I do not see how this can be happening.
