

nsaje commented on July 28, 2024

I think both of those cases would be better served by leaving the task in the Queued state while still loudly logging an error (e.g. to Rollbar). That way the only manual intervention needed would be fixing the issue in the code. Right now you also have to figure out exactly which tasks failed because of the issue and retry them manually.

The parent process handles the state/queue transitions, so we do need to get that information back from the child somehow. Passing it via a Redis entry sounds fine too; I don't have a strong opinion on it.
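A minimal sketch of the parent-side decision described above, assuming the child leaves a JSON report in a Redis-like store. `FakeStore`, `report_key`, `decide_transition`, the key layout and the outcome labels are all hypothetical illustrations, not TaskTiger's actual internals. A real worker would use a redis-py client in place of `FakeStore`.

```python
import json
import logging

logger = logging.getLogger("worker")


class FakeStore:
    """Hypothetical in-memory stand-in for the worker's Redis connection."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def report_key(task_id):
    # Hypothetical key layout, not TaskTiger's actual schema.
    return f"child_report:{task_id}"


def decide_transition(store, task_id, child_succeeded):
    """Parent side: choose what to do with a task after its child exits."""
    raw = store.get(report_key(task_id))
    report = json.loads(raw) if raw else {}
    if report.get("setup_failed"):
        # The task body never ran (import or context-manager failure): keep the
        # task queued so it runs once the code is fixed, but log loudly.
        logger.error("Task %s could not be set up: %s", task_id, report.get("error"))
        return "queued"
    return "completed" if child_succeeded else "error"
```

Note that with this policy a permanently broken task reference stays queued indefinitely; the error log is the only thing that surfaces it.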


tsx commented on July 28, 2024

Not all TaskImportErrors are equal. There are two kinds:

  • When we expect the task to become importable on its own. This typically happens because of race conditions during deployments: for example, a deployment that schedules a new task has been rolled out, but the workers that handle that task are not aware of it yet. These recover automatically, so leaving such tasks in the Queued state would be fine.
  • When the task reference is actually broken, e.g. the task was removed or moved while there were in-flight tasks referencing the old (now nonexistent) name. These need manual intervention, and leaving them in Queued means we'd ignore a real problem.

Unless we have a reliable way to distinguish between the two kinds of failure (honestly, I don't see how a worker could know whether a task is expected to appear in the future; we don't have the crystal ball necessary for that), I wouldn't recommend leaving those tasks as Queued.

That said, keeping a list of tasks that have failed with an import error, and retrying them on the next worker restart (presumably a redeployment), might make recovery easier in both scenarios by reducing the need to manually track down tasks and remove them.
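The retry-on-restart idea above could look roughly like this. It is a sketch under stated assumptions: `ImportFailureLog`, `record` and `requeue_all` are hypothetical names, and the queue is modelled as a plain list. In practice the failed-task set would have to live in Redis to survive a worker restart.

```python
from dataclasses import dataclass, field


@dataclass
class ImportFailureLog:
    """Hypothetical registry of task IDs that failed with an import error.

    A real worker would keep this in Redis (for example a set per queue) so it
    survives restarts; a plain Python set keeps the sketch self-contained.
    """

    failed: set[str] = field(default_factory=set)

    def record(self, task_id: str) -> None:
        # Called when a task fails with TaskImportError.
        self.failed.add(task_id)

    def requeue_all(self, queue: list[str]) -> list[str]:
        """Run once at worker startup: push every recorded task back onto the queue."""
        retried = sorted(self.failed)
        queue.extend(retried)
        self.failed.clear()
        return retried
```

A task whose reference is genuinely broken fails again after the retry and is recorded again, so it stays visible after each deploy instead of silently disappearing.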

As for the signalling mechanism, I don't think exit codes are the best idea. The child process can catch the appropriate ImportError exceptions and write a richer state/report to TaskTiger's Redis instead of trying to communicate through a 1-bit channel. It already stores execution data from the forked child, so why not add it there?
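A sketch of the child-side half of that suggestion: resolve the task by dotted path, catch import failures separately from task failures, and always write a structured report rather than relying on an exit code. `load_callable`, `run_in_child`, the report fields and the `child_report:` key are hypothetical, and `store` is any dict-like mapping standing in for Redis.

```python
import importlib
import json
import traceback


def load_callable(path):
    """Resolve a dotted path such as 'package.module.func' to a callable."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def run_in_child(store, task_id, func_path, args=()):
    """Child side: run the task and always leave a structured report behind.

    `store` is any dict-like mapping standing in for Redis (hypothetical).
    """
    report = {"task_id": task_id}
    try:
        func = load_callable(func_path)
    except (ImportError, AttributeError, ValueError) as exc:
        # Could not resolve the task: record why, instead of a bare exit code.
        report.update(outcome="import_error", error=repr(exc),
                      traceback=traceback.format_exc())
    else:
        try:
            func(*args)
            report["outcome"] = "success"
        except Exception as exc:
            report.update(outcome="task_error", error=repr(exc),
                          traceback=traceback.format_exc())
    store[f"child_report:{task_id}"] = json.dumps(report)
    return report
```

Because the report carries the exception type and traceback, the parent (or an operator) can tell a missing module from a missing attribute without re-running anything.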


thomasst commented on July 28, 2024

Do you have an example where a broken child context manager caused a TaskImportError?


nsaje commented on July 28, 2024

No, there's an "or" in there :-) i.e. either a TaskImportError or any error caused by the context managers.
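To make that "or" concrete, a worker could separate failures raised while entering the child context managers from failures in the task body. This sketch uses the standard library's `contextlib.ExitStack`; `run_task` and the outcome labels are hypothetical names, and it assumes the context managers are handed over as a list.

```python
from contextlib import ExitStack


def run_task(func, context_managers):
    """Run `func` inside `context_managers`, classifying where a failure happened.

    Returns ("setup_error", msg) if entering a context manager failed, so the
    task body never ran; ("task_error", msg) if the body raised; ("ok", result)
    otherwise. Errors raised while *exiting* the managers are not classified
    here and propagate to the caller.
    """
    with ExitStack() as stack:
        try:
            for cm in context_managers:
                stack.enter_context(cm)
        except Exception as exc:
            # ExitStack unwinds any managers that were already entered.
            return ("setup_error", repr(exc))
        try:
            return ("ok", func())
        except Exception as exc:
            return ("task_error", repr(exc))
```

A "setup_error" here falls in the same bucket as a TaskImportError: the task itself never started.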

