Giter Site home page Giter Site logo

Comments (19)

mperham avatar mperham commented on July 25, 2024 2

I apologize for the less than useful initial reply. It looks like the transactional push delay causes the batch to go out of scope and so the job does not get a BID associated. That seems like a footgun I should handle. 👍🏻

from sidekiq.

mperham avatar mperham commented on July 25, 2024

We can either guarantee that Batch jobs are pushed atomically at the end of the jobs block or transactionally, but not both. What do you expect to happen?

from sidekiq.

ernie avatar ernie commented on July 25, 2024

Heya @mperham! Long time no see. Is this a case of "we're using it wrong" or is there a real question here?

from sidekiq.

noaelad avatar noaelad commented on July 25, 2024

Clarified the description a little bit, I hope that helps. Ideally the jobs would still be pushed at the end of the transaction, but they would get their BID assigned. If it's not possible to do both, then I guess it's more important for a BID to be assigned than to enqueue at the end of the transaction.
It seems like transactional_push! already doesn't support push_bulk, so it would make sense to explicitly decide that it doesn't support batches either, but the way it currently stands it's definitely a footgun.
Thanks for looking into it!

from sidekiq.

mperham avatar mperham commented on July 25, 2024

Somehow this works for me. I can't reproduce the behavior yet. Can you use 7.2.0?

  it "works transactionally" do
    q = Sidekiq::Queue.new
    assert_equal 0, q.size

    require "active_record"
    ActiveRecord::Base.establish_connection(
      adapter: "sqlite3",
      database: "test.db"
    )
    require "after_commit_everywhere"
    Sidekiq.default_job_options["client_class"] = Sidekiq::TransactionAwareClient
    Sidekiq::JobUtil::TRANSIENT_ATTRIBUTES << "client_class"

    class XAJob
      include Sidekiq::Job
    end

    ActiveRecord::Base.transaction do
      b = Sidekiq::Batch.new
      b.jobs do
        XAJob.perform_async
      end
      assert_equal 1, q.size
    end
    assert_equal 1, q.size
  end

from sidekiq.

mperham avatar mperham commented on July 25, 2024

I suspect the batch refactoring in 7.1.x fixing inline execution also changed this behavior.

from sidekiq.

mperham avatar mperham commented on July 25, 2024

Try main and see if it works better for you.

from sidekiq.

noaelad avatar noaelad commented on July 25, 2024

The problem still exists on main. The test you wrote isn't testing the issue I mentioned, which involves the batch id not being set. I'm also not convinced this test passes, given that it's trying to use transactional push, I would expect the q.size outside the transaction to be +1 the q.size inside the transaction...

I would rewrite it as:

it "works transactionally" do
  q = Sidekiq::Queue.new
  assert_equal 0, q.size

  require "active_record"
  ActiveRecord::Base.establish_connection(
    adapter: "sqlite3",
    database: "test.db"
  )
  require "after_commit_everywhere"
  Sidekiq.default_job_options["client_class"] = Sidekiq::TransactionAwareClient
  Sidekiq::JobUtil::TRANSIENT_ATTRIBUTES << "client_class"

  class XAJob
    include Sidekiq::Job
  end

  ActiveRecord::Base.transaction do
    b = Sidekiq::Batch.new
    b.jobs do
      XAJob.perform_async
    end
    assert_equal 1, q.size
    assert_equal q.first.klass, 'Sidekiq::Batch::Empty'
  end
  assert_equal 2, q.size
  assert_equal q.map{|j| j.klass}, ['Sidekiq::Batch::Empty', 'XAJob']
  assert_equal q.map{|j| j.bid}, [b.bid, b.bid]
end

the last assertion is still broken on main.

from sidekiq.

noaelad avatar noaelad commented on July 25, 2024

@mperham sorry it took me a while to respond, wanted to make sure you saw this? ^
Can we reopen this issue?

from sidekiq.

mperham avatar mperham commented on July 25, 2024

Are you using Sidekiq + Sidekiq Pro 7.2.0? 7.1.0 will not work.

from sidekiq.

mperham avatar mperham commented on July 25, 2024

Batch jobs do not pay attention to transactional boundaries. The batch.jobs semantic is clear: all job are pushed at the end of the jobs block.

  ActiveRecord::Base.transaction do
    b = Sidekiq::Batch.new
    b.jobs do
      XAJob.perform_async
    end # the job will be pushed here
    assert_equal 1, q.size
    assert_equal q.first.klass, 'Sidekiq::Batch::Empty' # nope, this will be XAJob
  end
  assert_equal 2, q.size # nope, only one job
  assert_equal q.map{|j| j.klass}, ['Sidekiq::Batch::Empty', 'XAJob']
  assert_equal q.map{|j| j.bid}, [b.bid, b.bid] # and yes, you should get a bid

from sidekiq.

noaelad avatar noaelad commented on July 25, 2024

Batch jobs do not pay attention to transactional boundaries

Is this documented somewhere? This is surprising to me and also does not match the behavior I'm seeing, in both 7.1.0 and 7.2.0

With Sidekiq + Sidekiq Pro both on 7.2.0 what I'm seeing is:

  ActiveRecord::Base.transaction do
    b = Sidekiq::Batch.new
    b.jobs do
      XAJob.perform_async
    end # the XAJob job is not pushed here, instead there is a Sidekiq::Batch::Empty job
    assert_equal 1, q.size
    assert_equal q.first.klass, 'Sidekiq::Batch::Empty' # this part passes
  end
  assert_equal 2, q.size # I see two jobs here
  assert_equal q.map{|j| j.klass}, ['Sidekiq::Batch::Empty', 'XAJob'] # this part passes
  assert_equal q.map{|j| j.bid}, [b.bid, b.bid] # this part fails, with the second bid being nil

from sidekiq.

noaelad avatar noaelad commented on July 25, 2024

As a side note, with 7.1.0 I am not seeing the Sidekiq::Batch::Empty job, but the missing bid issue is still there

  ActiveRecord::Base.transaction do
    b = Sidekiq::Batch.new
    b.jobs do
      XAJob.perform_async
    end # the XAJob job is not pushed here
    assert_equal 0, q.size # passes
  end
  assert_equal 1, q.size # passes
  assert_equal q.map{|j| j.klass}, ['XAJob'] # passes
  assert_equal q.map{|j| j.bid}, [b.bid] # fails, j.bid is nil

from sidekiq.

mperham avatar mperham commented on July 25, 2024

You need Sidekiq main + Pro 7.2.

from sidekiq.

noaelad avatar noaelad commented on July 25, 2024

🤦 I didn't realize main was ahead of 7.2.0 and included this fix. My bad 😬

Looks like that resolved the issue! I will (ask our infra team to) update when it gets released. Thank you!

from sidekiq.

adrian-gomez avatar adrian-gomez commented on July 25, 2024

@mperham I'm working on updating sidekiq and bump into this issue ((isolator alerted us of pushing to jobs while inside a transaction). Our scenario is as follows:

  • create a batch
  • push jobs into the batch that we actually want to track and consider part of the batch
  • when those jobs are processed they might open a db transaction perform some tasks that might also push other jobs to sidekiq that we don't really care to track inside the batch and would rather just keep the old behavior of pushing after the transaction is done.

For the pushing after the transaction part I believe we can add AfterCommitEverywhere.after_commit { Worker.peform_async } manually to all those places which is doable but kind of pain (and not needed when those jobs are pushing for other parts of the app that's not using batches). Can you think of any other way to get back old behavior?

It seems that side effect of that would be that those jobs would be hanlded outside the batch. Is this assumption correct?

I think the uniqueness of our scenario is that we have jobs that we care to track as part of the batch and some jobs that we don't. Idealy we would be able to provide a list to the bach of what classes are part of it and it would only track those instead of assuming every job pushed in the context of a batch is part of it.

pd: sorry to post on a closed issue, not sure if that was the way to go or if I should have opened a new one

from sidekiq.

mperham avatar mperham commented on July 25, 2024
  • Batch guarantees that jobs are pushed at the end of the jobs block atomically.
  • Transactional push guarantees that jobs are pushed after the transaction is committed.

We cannot provide both guarantees at the same time.

Here's an idea: do the work to separate those two concerns. Either commit your transactions before creating the batch or scope the transaction within the jobs block so your jobs are committed before the batch jobs are pushed:

Commit in batch:

batch.jobs do
  User.transaction do
    ...
  end
end

Commit then batch:

User.transaction do
  ...
end
batch.jobs do
end

from sidekiq.

adrian-gomez avatar adrian-gomez commented on July 25, 2024

We don't have a problem or a transaction at the initial level. We do:

  batch.jobs do
    UserWorker.perform_async(id)
  end

  class UserWorker
    def perform(id)
      user = User.find(id)
      user.transaction do
         user.thing
         OtherWorker.perform_async(id)
      end
    end
  end

with the latest changes OtherWorker is fired inmediatly (before the commit) and its also added as part of the batch. Where before it would wait for the transaction to commit and not be part of the barch.
We could do:

  class UserWorker
    def perform(id)
      user = User.find(id)
      user.transaction do
         user.thing
         AfterCommitEverywhere.after_commit { OtherWorker.perform_async(id) }
      end
    end
  end

but thats kind of annoing since the call for that worker might be nested insde a bunch of method calls (and sometimes its called without a batch being present in those cases we wouldn't need the AfterCommitEverywhere block.

from sidekiq.

adrian-gomez avatar adrian-gomez commented on July 25, 2024

Just to close the loop, turns out that pushing jobs from a job inside a batch DOES NOT push the jobs to the batch automatically that was happening on our test because of some workarounds we have to be able to perform integration tests on jobs that depend on batches. Sorry for the noise!

from sidekiq.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.