Comments (5)

wheee commented on July 24, 2024

Hey @rosa, thanks for responding!

> This is different from Resque, where the Resque::TermException bubbles up to the jobs. Right now there's nothing similar to this in Solid Queue, but I'm reevaluating all this, starting with no longer having the workers exit via exit!, but rather via exit (#119). I'm still unsure about that, which is why I haven't merged that PR yet.

I think there's merit in being able to bubble up the 'shutdown' signal/exception to the jobs (and not just the workers). Although, I can see why it may not be as useful unless the ability to kill a job via signals was a possibility.

That being said, I do have scenarios where I leverage these jobs as long-running processes that remain up and running until explicitly shut down by a user command. These jobs typically follow a pub/sub paradigm and can take commands from the UI while streaming data from an external data source. In these cases, it would be nice to be able to clean up gracefully when a TERM signal is received, within the allotted duration before the QUIT signal is issued.

EDIT: the fact that we have a configuration for shutdown_timeout would suggest that jobs should have the ability to respond to the TERM signal... otherwise, why provide the extra time before the QUIT signal is sent?

As a side note, while exploring GoodJob in more detail, I did run across this useful bit:
[image]

Happy to report that this works pretty nicely with SolidQueue. While it may not be possible to explicitly kill jobs via signals, the use of Timeout at least provides assurance that jobs that get stuck for whatever reason will eventually time out, can be handled gracefully, and return to the pool. More importantly, it lets me decide whether I wish to fail the job, retry it, etc.
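For anyone reading along without the image, here is a minimal sketch of the Timeout idea in plain Ruby. It is not copied from GoodJob or Solid Queue; `JobTimeoutError`, `perform_with_timeout`, and `run_job` are hypothetical names used only for illustration. In a Rails app the same wrapper would typically live in an `around_perform` callback on the job class.

```ruby
require "timeout"

# Hypothetical error class, named here for illustration only.
class JobTimeoutError < StandardError; end

# Runs the block and raises JobTimeoutError if it takes longer than `seconds`.
def perform_with_timeout(seconds, &block)
  Timeout.timeout(seconds, JobTimeoutError, &block)
end

# Hypothetical wrapper showing where the "fail or retry" decision would go.
def run_job(seconds, &block)
  perform_with_timeout(seconds, &block)
  :finished
rescue JobTimeoutError
  # Clean up here, then decide whether to fail the job, retry it, etc.
  :timed_out
end

run_job(1) { :quick_work }   # the block finishes well within the limit
run_job(0.05) { sleep 1 }    # the block is interrupted after about 0.05s
```

One caveat: Timeout interrupts the block at whatever point it happens to be executing, so any cleanup that must always run belongs in an `ensure` or in the rescue branch.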

from solid_queue.

bdewater commented on July 24, 2024

> If you have any ideas or suggestions, feel free to contribute them! 🙏

For gems like https://github.com/Shopify/job-iteration it is useful to have a way to know a graceful shutdown was initiated, so that it can stop after the current iteration is finished and do its own graceful shutdown (pushing the job back on the queue with the persisted progress).

It interacts in various ways with background queues: it uses the Sidekiq quiet callback, GoodJob recently added current_thread_shutting_down? to query for this, and for Resque a monkey patch is used.

IMO a callback is a bit more flexible since for other use cases one might not be able to/want to poll for this.
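To make the polling vs. callback distinction concrete, here is a rough in-memory sketch. Everything in it is hypothetical: `ShutdownNotifier`, `on_shutdown`, `shutdown!`, and `process` are made-up names, not part of Solid Queue, GoodJob, Sidekiq, or job-iteration.

```ruby
# Hypothetical notifier that a worker could share with its running jobs.
class ShutdownNotifier
  def initialize
    @mutex = Mutex.new
    @shutting_down = false
    @callbacks = []
  end

  # Polling style: the job checks this between iterations.
  def shutting_down?
    @mutex.synchronize { @shutting_down }
  end

  # Callback style: the job registers a block to run when shutdown starts.
  def on_shutdown(&block)
    @mutex.synchronize { @callbacks << block }
  end

  # Called by the (hypothetical) worker when a graceful shutdown begins.
  # Callbacks run only once, even if this is called repeatedly.
  def shutdown!
    callbacks = @mutex.synchronize do
      next [] if @shutting_down
      @shutting_down = true
      @callbacks.dup
    end
    callbacks.each(&:call)
  end
end

# Iterating job: stops before the next item once shutdown has been requested
# and returns the items it completed, so progress could be persisted.
def process(items, notifier)
  done = []
  items.each do |item|
    break if notifier.shutting_down?
    done << item
    yield item if block_given?
  end
  done
end
```

Polling fits jobs that already loop over discrete units of work; the callback covers jobs that block on something (a socket, a stream) and never get a chance to check a flag.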

wheee commented on July 24, 2024

Also wondering... how would you define a callback for your jobs to gracefully handle TERM signals?

When I'm using Resque, I would handle Resque::TermException in the on_failure hook and put any cleanup/shutdown logic in there.

I took a quick look at the source and I do see SolidQueue::Processes::GracefulTerminationRequested but trying to rescue that via rescue_from in my job didn't seem to do anything.

rosa commented on July 24, 2024

Hey @wheee! These are great questions, and I'm afraid Solid Queue doesn't have a way to support your scenario:

> Scenario would be a long-running job that is taking too long and the user wishes to kill it and not have it restarted.

As you saw, SolidQueue::Processes::GracefulTerminationRequested doesn't reach the jobs because it's only raised within the supervisor when it receives a TERM signal.

> I also noticed if I were to send a TERM signal to the worker (assuming it was a 1 thread/1 process worker), then the worker would get restarted and pick up the same job again.

That's right. The supervisor would notice the worker has exited and would start it again. If the worker didn't have time to finish the job within the configured shutdown_timeout time, then all jobs currently being run by its thread pool would have been killed and left in a claimed state. Then, after being deregistered, these claimed jobs would have been released back to the queue so they could be picked up again.

This is different from Resque, where the Resque::TermException bubbles up to the jobs. Right now there's nothing similar to this in Solid Queue, but I'm reevaluating all this, starting with no longer having the workers exit via exit!, but rather via exit (#119). I'm still unsure about that, which is why I haven't merged that PR yet.

If you have any ideas or suggestions, feel free to contribute them! 🙏

rosa commented on July 24, 2024

> EDIT: the fact that we have a configuration for shutdown_timeout would suggest that jobs should have the ability to respond to the TERM signal... otherwise, why provide the extra time before the QUIT signal is sent?

To give the in-flight jobs time to finish without taking on any other jobs. If we don't provide any extra time, any job in-flight will be stopped right away. With the extra time, the worker knows it shouldn't pick up any more jobs; it just waits until the ones running finish and then exits.
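As a rough illustration of that waiting period (a simplified model with plain threads, not Solid Queue's actual implementation; `drain` and the job names are hypothetical):

```ruby
# Waits up to `shutdown_timeout` seconds in total for in-flight job threads.
# Threads still running at the deadline are killed, and their names are
# returned so a caller could release those jobs back to the queue.
def drain(in_flight, shutdown_timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + shutdown_timeout
  finished = []
  killed = []

  in_flight.each do |name, thread|
    remaining = [deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC), 0].max
    if thread.join(remaining) # returns nil when the wait times out
      finished << name
    else
      thread.kill
      thread.join
      killed << name
    end
  end

  { finished: finished, killed: killed }
end

jobs = {
  "quick" => Thread.new { sleep 0.01 }, # finishes inside the grace period
  "stuck" => Thread.new { sleep 5 }     # still running when time is up
}
drain(jobs, 0.2)
```

In this model the job threads are never told that a shutdown is in progress; they either finish in time or are killed, which matches the behaviour described above.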
