Comments (10)
Is Redis set to use noeviction or another maxmemory policy?
from sidekiq.
redis.info["maxmemory_policy"] # => "volatile-lru"
from sidekiq.
Oh dear, that’s half of your problem. Redis gets under memory pressure and starts evicting data to free up RAM. You lose data and errors like these are symptomatic of that loss. See the Using Redis wiki page.
from sidekiq.
"Great! Now for the other half of the problem." -Zeno
from sidekiq.
Seriously, though... we'll change the policy, which sounds like it should eliminate the specific behavior observed. I'll give the wiki page a more careful read-through.
Is there any other diagnostic info you need and/or any other action [not detailed in the wiki page, which I'll review] that we should take? (Guessing not, but you said "half of [our] problem", so I'd like to be thorough.) :) I'll take "closed without comment" as a "no." :)
from sidekiq.
Sorry, just a poor turn of phrase. Sidekiq expects Redis to use noeviction; otherwise data can be randomly evicted and left inconsistent. If that happens, I don't make any claims for how Sidekiq may behave. Push back on me if you think there is still room for improvement. Should the batch code silently swallow missing callback data?
from sidekiq.
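For readers who want to catch this misconfiguration before it causes data loss, here is a minimal sketch of a startup check. It assumes only what the thread shows: that the Redis INFO reply is available as a Hash with a "maxmemory_policy" key. The helper name eviction_warning and the constant SAFE_POLICY are hypothetical and are not part of Sidekiq or the redis gem.

```ruby
# Hypothetical helper, not part of Sidekiq or the redis gem.
# The policy Sidekiq expects, per the maintainer's comment above.
SAFE_POLICY = "noeviction"

# Takes the Hash returned by Redis INFO (string keys, as in the
# redis.info["maxmemory_policy"] call earlier in the thread) and returns
# a warning String, or nil when the policy is the expected one.
def eviction_warning(info)
  policy = info["maxmemory_policy"]
  return nil if policy == SAFE_POLICY

  "Redis maxmemory_policy is #{policy.inspect}; Sidekiq expects " \
    "#{SAFE_POLICY.inspect}, otherwise job and batch data can be evicted " \
    "under memory pressure."
end
```

One possible use is to call it once at boot with the same client used above, for example `msg = eviction_warning(redis.info)` followed by `warn(msg) if msg`. The policy itself is changed on the Redis side, with the `maxmemory-policy noeviction` directive in redis.conf.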
Disclaiming support for a dysfunctional eviction policy seems perfectly reasonable. The job raising an error is also perfectly reasonable. The point at which (IMO) it went pear-shaped, though, was when those errors went through the same retry process as anything else. Given the scale that Sidekiq can allow users to achieve, things can go very wrong very fast.
The problem I had with this error, once I noticed it, was that (unlike an error in one of my app's jobs) I didn't see any obvious way to fix it. All I could do was decide that the jobs weren't coming back, pop a rails console into prod, and manually delete those jobs from the retry set so they didn't keep doing a "Sorcerer's Apprentice" number on our Sentry quota.
Silently swallowing a Sidekiq::Batch::NoSuchBatch probably isn't the right behavior, but I don't think the standard retry policy is the way to go either. Is it likely that a missing batch will come back? If not, raise once [see GIF below] and be done, or maybe put it into a dead letter queue for manual disposition? Or, maybe the first few get retried, but as [internal] error rates go up, a Circuit Breaker shunts them into a DLQ instead of back into the retry set?
from sidekiq.
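The "park it instead of retrying" idea from the comment above could look roughly like the sketch below, written in the shape of a Sidekiq server middleware (a class whose call method receives the job instance, the job payload Hash and the queue name, and yields to run the job). This is an illustration of the commenter's proposal, not something Sidekiq ships. The class ParkMissingBatchJobs, the stand-in error NoSuchBatchStandIn and the sink object are all hypothetical, and whether Sidekiq::Batch::NoSuchBatch is actually raised at a point where a custom middleware can rescue it is an assumption.

```ruby
# Stand-in for Sidekiq::Batch::NoSuchBatch so the sketch runs without
# Sidekiq Pro loaded. Hypothetical; pass the real class when wiring it up.
class NoSuchBatchStandIn < StandardError; end

# Hypothetical middleware: records jobs that fail with the given error
# class in a sink and swallows the error, so it never reaches retry logic
# that wraps the middleware chain. Every other error propagates unchanged.
class ParkMissingBatchJobs
  # sink is any object responding to << (an Array here; a real setup would
  # need something durable, such as a database table or a Redis list).
  def initialize(sink, error_class = NoSuchBatchStandIn)
    @sink = sink
    @error_class = error_class
  end

  def call(_job_instance, job_payload, queue)
    yield
  rescue @error_class => e
    # Park the job for manual disposition instead of re-raising.
    @sink << { jid: job_payload["jid"], queue: queue, error: e.message }
  end
end
```

Under these assumptions it would be registered with something like `chain.add ParkMissingBatchJobs, sink, Sidekiq::Batch::NoSuchBatch` inside a `Sidekiq.configure_server` block. The sink is injected rather than held in an instance variable so that parked jobs survive even if a new middleware object is built for each job.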
Oh, one other easy potential intervention: mention a potential cause of the exception in the error message? I did search for it at the time (and checked again just now) but didn't find anything that would've pointed me in the right direction.
from sidekiq.
I've added code comments to those two places in the batch code where data inconsistency rears its head, explaining about noeviction, which should make it much easier to fix once you've opened up the code to see the problem area.
I don't know how I can safely write another retry path for bad batch data. It seems like a bad time: testing would be difficult, and that code path would rarely be used, so there's a high likelihood of it hiding bugs. I'm going to stop here; open a new issue if you have other ideas for improvement.
from sidekiq.
Entirely reasonable, given the low probability and only moderate severity. Thanks!
from sidekiq.