
simplekiq's Introduction

Simplekiq

Any time that you find yourself needing to string together a long chain of jobs, particularly when there are multiple stages of Sidekiq-pro batches and callbacks involved, come home instead to the simple flavor of orchestrated job flow with Simplekiq.

Installation

Add this line to your application's Gemfile:

gem "simplekiq"

Note that this gem requires a Sidekiq Pro paid subscription. After following the installation docs for getting the private gem configured with your system, ensure you have sidekiq-pro at version ~> 5.0.0 or higher (at least 5.2.1 if you want to capture on_death callbacks percolating up to parent batches - a supported feature which is not required for typical orchestration behavior) and that it's being required:

gem "sidekiq-pro", "~> 5.0.0"

And then execute:

$ bundle install

Or install it yourself as:

$ gem install simplekiq

Usage

There are currently two primary components of the system which were designed to work in harmony:

  • Simplekiq::OrchestrationJob - A mixin for a Sidekiq job to orchestrate a flow of jobs in one place. It makes long, complicated flows between jobs easier to understand, iterate on, and test. It eliminates the need to hop between dozens of files to determine when, where, and why a particular job gets called.
  • Simplekiq::BatchingJob - A mixin designed to make breaking a large job into a batched process dead simple and contained within a single class while still being trivially composable in orchestrations.

Tool Drilldown

Simplekiq::OrchestrationJob

Mixing in the Simplekiq::OrchestrationJob module lets you define a human-readable workflow of jobs in a single file with almost* no special requirements or restrictions on how the child jobs are designed. In most cases, Sidekiq jobs that weren't designed with orchestrations in mind will still be compatible with them. A job implementing OrchestrationJob might look like:

class SomeOrchestrationJob
  include Simplekiq::OrchestrationJob

  def perform_orchestration(some_id)
    @some_model = SomeModel.find(some_id) # 1.

    run SomeInitialSetupJob, some_model.id # 2.

    in_parallel do
      some_related_models.each do |related_model|
        run SomeParallelizableJob, related_model.id # 3.
      end
    end

    run SomeFinalizationJob, some_model.id # 4.
  end

  def on_death(status, options) # 5.
    SomeModel.find(options["args"].first).failure_happened!
  end

  def on_complete(status, options) # 6.
    failures = Array(status&.failure_info) # sidekiq-pro batch status api
    return if failures.empty?

    SomeModel.find(options["args"].first).it_was_these_failures(failures)
  end

  private

  attr_reader :some_model

  def some_related_models
    @some_related_models ||= some_model.some_relation
  end
end

Let's use the above example to describe some specifics of how the flow works.

  1. SomeOrchestrationJob pulls up some instance of parent model SomeModel.
  2. It does some initial work in SomeInitialSetupJob, which blocks the rest of the workflow until it completes successfully.
  3. Then it will run a SomeParallelizableJob for each of the associated some_related_models. These jobs all run in parallel, independently of one another.
  4. Finally, after all of the parallel jobs from #3 complete successfully, SomeFinalizationJob will run, and once it finishes the orchestration is complete.
  5. If a job dies at some point, on_death will get fired with the first failure (please use sidekiq-pro of at least 5.2.1 for this feature).
  6. on_complete will be called at the end of the orchestration no matter what; this is the place to collect any failures and persist them somewhere.

Note - it's fine to add utility methods and attr_accessors to keep the code tidy and maintainable.
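
Enqueuing the orchestration works like enqueuing any other Sidekiq job (assuming here that jobs including OrchestrationJob respond to the usual perform_async interface):

# Kick off the workflow defined in perform_orchestration above,
# passing the id of whichever SomeModel record should be orchestrated.
SomeOrchestrationJob.perform_async(some_model.id)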

When SomeOrchestrationJob itself gets called, the first thing it does is turn these directives into a big serialized structure indicating which job will be called under what conditions (e.g., serially or in parallel) and with what arguments, and then it keeps passing that structure between the simplekiq-internal jobs that actually conduct the flow.

This means that when you deploy a change to this flow, all previously in-flight workflows will continue undisturbed, because each workflow is frozen into Sidekiq job arguments and remains frozen until it completes. This is generally a boon, but note that if you remove a job from a workflow you'll need to remember to either keep the job class itself (e.g., the SomeFinalizationJob class file from our above example) in the codebase or replace it with a stub, so that any in-flight workflows won't crash due to not being able to pull up the prior-specified workflow.
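
A minimal stub, sketched here under the assumption that a plain Sidekiq worker is an acceptable stand-in, might look like:

# Hypothetical no-op stub left in place of a removed job so that in-flight
# orchestrations serialized before the deploy can still load and run the class.
class SomeFinalizationJob
  include Sidekiq::Worker

  def perform(*)
    # Intentionally does nothing; the real work has moved elsewhere.
  end
end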

"almost* no special requirements or restrictions on how the child jobs are designed" - The one thing you'll want to keep in mind when feeding arbitrary jobs into orchestrations is that if the job creates any new sidekiq batches then those new sidekiq batches should be added as child sidekiq batches of the parent sidekiq batch of the job. The parent sidekiq batch of the job is the sidekiq batch that drives the orchestration from step to step, so if you don't do this it will move onto the next step in the orchestration once your job finishes even if the new sidekiq batches it started didn't finish. This sounds more complicated than it is, you can see an example of code that does this in BatchingJob#perform:

if batch # is there a parent batch?
  batch.jobs do # open the parent batch back up
    create_a_new_batch_and_add_jobs_to_it_to_run # make our new batch as a child batch of the parent batch
  end # close the parent batch again
else # there's no parent batch; this job was run directly, outside of an orchestration
  create_a_new_batch_and_add_jobs_to_it_to_run # make our new batch without a parent batch
end
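
For reference, here is a rough, self-contained sketch of the same pattern using Sidekiq Pro's Batch API directly (SomeFanoutJob and SomeChildJob are hypothetical names used only for illustration, not the gem's actual implementation):

# Illustrative sketch only; assumes Sidekiq Pro's standard Batch API.
class SomeFanoutJob
  include Sidekiq::Worker

  def perform(*args)
    if batch # is there a parent batch driving the orchestration?
      batch.jobs do # reopen the parent batch...
        enqueue_child_batch(*args) # ...so the new batch is registered as its child
      end
    else # no parent batch; the job was run directly, outside of an orchestration
      enqueue_child_batch(*args)
    end
  end

  private

  def enqueue_child_batch(*args)
    child = Sidekiq::Batch.new
    child.jobs { SomeChildJob.perform_async(*args) }
  end
end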

Simplekiq::BatchingJob

See the Simplekiq::BatchingJob module itself for a description and example usage in the header comments. In a nutshell: use this when you're building a batched asynchronous process, as it shaves off a lot of ceremony and unexpressive structure. For example, instead of having a BeerBottlerJob which queues some number of BeerBottlerBatchJobs to handle the broken-down sub-tasks, you can have a single BeerBottlerJob with a method for batching, a method for executing individual batches, and a callback that runs after all batches have completed successfully, roughly as sketched below.
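
A hypothetical sketch of that shape (the hook names here are assumptions drawn from the pattern described above; the module's header comments are the authoritative reference):

# Hypothetical sketch only; consult Simplekiq::BatchingJob's header comments
# for the actual interface. Brewery/Keg are made-up models for illustration.
class BeerBottlerJob
  include Simplekiq::BatchingJob

  def perform_batching(brewery_id)
    Brewery.find(brewery_id).kegs.find_each do |keg|
      batch keg.id # queue one batched sub-task per keg
    end
  end

  def perform_batch(keg_id)
    Keg.find(keg_id).bottle! # handle a single broken-down sub-task
  end

  def on_success(_status, _options)
    # runs after all batches have completed successfully
  end
end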

History

Simplekiq was initially released for private use within Doximity applications in Oct 2020 where it continued to be iterated on towards stability and general use until Jan 2022 when it was deemed settled enough for public release.

The primary driving factor that inspired this work was a single workflow composed of over a dozen differently defined and structured jobs whose logical flow was extraordinarily difficult to trace. This led to extreme difficulty in debugging and following problematic instances of the workflow in production, as well as a needlessly high cost for refactoring and iterative adjustments.

The crux of the problem was that each job was highly coupled to its position in the overall flow, and there was no central mechanism indicating what that overall flow was. After building Simplekiq and adopting it in that flow, significant changes became quick adjustments requiring only a couple of lines of code, and folks unfamiliar with the system could quickly get up to speed by reading through the orchestration job.

Versioning

This project follows semantic versioning. See https://semver.org/ for details.

Development

After checking out the repo, run bin/setup to install dependencies. Note that this depends on sidekiq-pro which requires a commercial license to install and use.

Then, run rake ci:specs to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install.

To get a new release cut, please open a PR or an issue describing your ask with as much context as possible, and someone from Doximity will consider your request. If it makes sense for the direction of the project, we'll get it done and let you know when a release with the changes is available.

For internal employees: consult the company wiki on the current standard process for conducting releases for our public gems.

Contributing

  1. See CONTRIBUTING.md
  2. Fork it
  3. Create your feature branch (git checkout -b my-new-feature)
  4. Commit your changes (git commit -am 'Add some feature')
  5. Push to the branch (git push origin my-new-feature)
  6. Create a new Pull Request

License

The gem is licensed under an Apache 2 license. Contributors are required to sign a contributor license agreement. See LICENSE.txt and CONTRIBUTING.md for more information.

simplekiq's Issues

Orchestration failure handling

Simplekiq::BatchedJob provides access to sidekiq-pro's on_death callback, but there's no similar behavior provided for orchestrations. Also, even if we did add a way to tap into batch callbacks, that wouldn't give us the actual exception, stacktrace, etc.

There are some additional error-handling mechanisms provided by sidekiq which might be useful for this - https://github.com/mperham/sidekiq/wiki/Error-Handling - perhaps we can get the exception info from there and somehow tie it into on_death batch handling, or perhaps batch-level error handling wouldn't be useful for handling orchestration failures.
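
For reference, the global error handlers mentioned there are registered roughly like this (a sketch only; how to correlate the captured exception with an orchestration's batch is the open question):

# Sketch of Sidekiq's global error handler hook. The context hash typically
# includes the failing job's payload, which could be used to tie the
# exception back to an orchestration's batch.
Sidekiq.configure_server do |config|
  config.error_handlers << proc { |exception, context|
    Sidekiq.logger.warn("job failed: #{exception.class}: #{exception.message} #{context.inspect}")
  }
end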

Ideally we would want to make sure this respects retry behavior: a job that failed, retried, and then succeeded shouldn't trigger it (the sidekiq-pro on_death callback follows this behavior, so we should match that). It should also have a simple interface so custom handling can be easily defined for an orchestration, similar to how easy it is to define failure handling for Simplekiq::BatchedJob.

Setting reserved queue for job doesn't include BatchTrackerJob

I set up a Simplekiq::OrchestrationJob which I then call via a sidekiq schedule, specifying that it should run on a new queue called "priority". This mostly works well, and I can see that the job executes on this reserved queue. However, there are some Simplekiq logs related to this job which appear on the default queue from class=Simplekiq::BatchTrackerJob.

Is it possible to configure Simplekiq to perform all work on a specified reserved queue? E.g. something along the lines of setting sidekiq_options queue: "priority" in the OrchestrationJob file, as in the Sidekiq docs.
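
For illustration, the kind of configuration being asked about (assuming sidekiq_options is available on jobs that include Simplekiq::OrchestrationJob; whether Simplekiq's internal jobs would inherit the queue is the open question):

class MyOrchestrationJob
  include Simplekiq::OrchestrationJob

  # Standard Sidekiq option; the question is whether Simplekiq's internal
  # jobs (e.g. Simplekiq::BatchTrackerJob) would also run on this queue.
  sidekiq_options queue: "priority"

  def perform_orchestration
    # ...
  end
end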

Equivalent to perform_in for the run method?

Hey all! I've been using Simplekiq for several months and love it. ❤️ I hit an interesting challenge this past week and had an idea. I have an orchestration job that looks something like this:

class OrchestrateFooBarJob
  include Simplekiq::OrchestrationJob

  def perform_orchestration(id)
    foo = Foo.find(id)
    return if foo.complete?

    in_parallel do
      foo_bar_ids = foo.foo_bars.ids # Foo has_many :foo_bars
      foo_bar_ids.each do |foo_bar_id|
        run FooBarJob, foo_bar_id
      end
    end
  end
end

Each FooBarJob makes an API request and stores some data on the FooBar AR model. The problem is, during peak app usage, there may be several OrchestrateFooBarJob jobs running at a time, firing off many parallel FooBarJob jobs nearly simultaneously and exceeding our API request concurrency.

We've upgraded to Sidekiq Enterprise and I'm about to implement Concurrent Rate Limiting on the FooBarJob class. But in my mind, ideally I wouldn't be relying solely on the OverLimit exceptions plus a backoff with jitter.

As the docs say under Limiting is not Throttling:

If you push 1000 jobs to Redis, Sidekiq will run those jobs as fast as possible which may cause many of those jobs to fail with an OverLimit error. If you want to trickle jobs into Sidekiq slowly, the only way to do that is with manual scheduling. Here's how you can schedule 1 job per second to ensure that Sidekiq doesn't run all jobs immediately:

1000.times do |index|
  SomeWorker.perform_in(index, some_args)
end

Looking at the Batches docs Notes section, I see that:

Batches can contain scheduled jobs too, e.g. perform_in(10.minutes).

So now I'm wondering: would it be possible (and would it make sense) to have an equivalent of run for perform_in, something like run_in below?

class OrchestrateFooBarJob
  include Simplekiq::OrchestrationJob

  def perform_orchestration(id)
    foo = Foo.find(id)
    return if foo.complete?

    in_parallel do
      foo_bar_ids = foo.foo_bars.ids
      foo_bar_ids.each_with_index do |foo_bar_id, i|
        run_in i.seconds, FooBarJob, foo_bar_id
      end
    end
  end
end

Or have you maybe found another way to solve this?

Limited parallelization for in_parallel and/or BatchedJob

https://github.com/doximity/campaigns/blob/f3b5a7d457a0fc4b5e310c851ac9b55570ce18b2/app/jobs/audiences/sleepy_mass_cache_orchestration_job.rb

^ This job limits parallelization by recursively queuing itself after each partial set has completed. We could probably use a similar approach transparently in our batching behaviors, perhaps something like in_parallel(10) or a sidekiq option in BatchedJob.

This is useful for avoiding flooding the queue so that new jobs can still interleave with the orchestrated batch.
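
A rough sketch of the recursive pattern described above (hypothetical class and job names, not part of the gem):

# Hypothetical illustration: fan out a limited slice of work per pass and
# re-enqueue from the batch's success callback to cap parallelism.
class LimitedFanoutJob
  include Sidekiq::Worker

  SLICE_SIZE = 10

  def perform(remaining_ids)
    return if remaining_ids.empty?

    slice = remaining_ids.take(SLICE_SIZE)
    rest = remaining_ids.drop(SLICE_SIZE)

    child = Sidekiq::Batch.new
    child.on(:success, self.class, "rest" => rest) unless rest.empty?
    child.jobs do
      slice.each { |id| SomeWorkerJob.perform_async(id) }
    end
  end

  def on_success(_status, options)
    # continue with the next slice only after the previous one has finished
    self.class.perform_async(options["rest"])
  end
end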

Ability to quiet orchestrations

Sidekiq has the ability to quiet jobs (e.g. keep new jobs from being enqueued), enabling graceful shutdown. Because orchestrations contain jobs that are often coupled to each other, code changes can be painful. If you have an orchestration like:

run JobB
run JobA

And you move work from JobB to JobA, then when you deploy you might have JobB jobs already queued. When your updated code starts running, the queued JobB jobs will run, but you'll lose the work that you've moved to JobA. For these reasons, a convenient tool would be the ability to stop starting new orchestrations. You could then wait for your running orchestrations to finish before making your code changes.

Gracefully handle callback removal

BatchingJob auto-registers Sidekiq batch callbacks when you define an on_x method. The problem is that if you have a callback you no longer need and you update the code to remove the method, you'll get exceptions when the previously registered callbacks try to run. Since we're auto-registering, it seems reasonable for us to also check that the methods are still there before trying to run them -- basically defining BatchingJob callbacks that no-op if the user's job hasn't implemented the callback methods. There's some tricky async code-change stuff to think through.

Should we opensource workflows?

I think we should -- it was part of our original vision and helps solve the same set of problems. But we probably should first make significant changes -- mainly because there is a lot of random code in there that none of us understand.

Also, while the tree view can definitely be helpful, it probably shouldn't be the primary view, since the raison d'être of simplekiq is to enable you to think of your background code as a sequential flow. This view would be more orchestration-centric, leveraging knowledge of orchestrations to flatten the tree based on orchestration callbacks, hiding jobs that aren't user-owned, and leveraging orchestration "checkpoints" if we build those out. These two views could share some underlying abstractions, but separating them also means you could use Simplekiq workflows without Orchestrations.

Batch invalidation

We have an initial experiment of this in Campaigns that went well. We could extract it into the gem, but we also need to figure out exactly what should happen when a job realizes that its orchestration has been invalidated.
