nesquena / backburner
Simple and reliable beanstalkd job queue for Ruby
Home Page: http://nesquena.github.com/backburner
License: MIT License
Allow ResqueCompat module to support old resque 1.1 users
Discussing with @bradgessler:
Currently:
Backburner::Worker.enqueue NewsletterSender, [self.id, user.id], :ttr => 1000
instead:
Backburner::Worker.enqueue Backburner::Job.new(NewsletterSender, [self.id, user.id], :ttr => 1000)
or:
# include module
Backburner::Worker.enqueue NewsletterSender.job(self.id, user.id).tap { |p| p.ttr = 200 }
Right now jobs are buried if they raise an exception or time out. Instead, perhaps retry with a delay up to some maximum number of times before burying.
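The proposed behaviour could be sketched like this, with a stubbed failing job and the release/bury calls recorded in an array instead of talking to beanstalkd (MAX_RETRIES, RETRY_DELAY, and the linear backoff are all my own illustrative choices, not backburner settings):

```ruby
MAX_RETRIES = 3
RETRY_DELAY = 5 # seconds

actions = []
retries = 0

perform = -> { raise "boom" } # a job that always fails

loop do
  begin
    perform.call
    break
  rescue
    if retries < MAX_RETRIES
      retries += 1
      actions << [:release, RETRY_DELAY * retries] # back off a little more each time
    else
      actions << [:bury] # only bury once the retries are exhausted
      break
    end
  end
end

p actions
# => [[:release, 5], [:release, 10], [:release, 15], [:bury]]
```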
From @bradgessler:
A stub could be provided for people who want to assert that jobs are put on the queue in a test environment, making it easy to test that a job is performed. Right now I use a hacky thing in my projects:
# Backburner::Worker.enqueue NewsletterSender, [self.id, user.id], :ttr => 1000
Backburner::Worker.class_eval do
  class << self; alias_method :original_enqueue, :enqueue; end
  def self.enqueue(job_class, args=[], opts={})
    job_class.perform(*args)
  end
end
to force the jobs to be executed inline. Open to a simple, proper way to do this.
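A slightly less hacky shape for the same stub (my own sketch, not a backburner feature) is to prepend a module onto the worker's singleton class, so the original enqueue stays reachable via super. Worker here is a stand-in for Backburner::Worker so the example is self-contained:

```ruby
class Worker
  def self.enqueue(job_class, args = [], opts = {})
    :queued # stand-in for the real network call
  end
end

module InlineEnqueue
  def enqueue(job_class, args = [], opts = {})
    job_class.perform(*args) # run the job now instead of queueing it
  end
end

# Prepending beats alias_method: no name collisions, easy to undo or stack.
Worker.singleton_class.prepend(InlineEnqueue)

class EchoJob
  def self.perform(*args)
    args
  end
end

p Worker.enqueue(EchoJob, [1, 2]) # => [1, 2]
```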
The beanstalk-client gem allows you to define multiple beanstalk servers, but I don't see how that works with backburner's configure block.
This way, workers could listen to the same queue on multiple beanstalkd servers, and if enqueuing a job fails, backburner should try another server in the pool.
When the queue is empty, the worker explodes with this message:
/Users/jdudulski/.rvm/gems/ruby-1.9.2-p290/gems/backburner-0.1.0/lib/backburner/worker.rb:100:in `rescue in work_one_job': undefined method `bury' for nil:NilClass (NoMethodError)
Hello
I defined one job in my Rails app as follows:
class ImageDownloadJob
  include Backburner::Queue
  queue "image-download"
  queue_priority 1000 # most urgent priority is 0

  def self.perform(image_id, url, expected_mime_type)
    Rails.logger.debug("ImageDownloadJob.perform #{image_id} #{url} #{expected_mime_type}")
    Image.find(image_id).save_file_from_url(url, expected_mime_type)
  end
end
In config/initializers/backburner.rb I put the following code:
Backburner.configure do |config|
  config.beanstalk_url = "beanstalk://127.0.0.1"
  config.tube_namespace = "myapp.#{Rails.env}"
  config.on_error = lambda { |e| Rails.logger.error "Backburner / Beanstalk error = #{e}" }
  config.max_job_retries = 0 # default 0 retries
  config.retry_delay = 2 # default 5 seconds
  config.default_priority = 65536
  config.respond_timeout = 120
  config.default_worker = Backburner::Workers::Simple
  config.logger = Rails.logger
  config.primary_queue = "backburner-jobs"
  config.priority_labels = { :custom => 50, :useless => 1000 }
end
Beanstalkd is running.
Then I am starting a worker with the following command:
QUEUES=image-download bundle exec rake backburner:work
I get the following in my log
Working 1 queues: [ myapp.development.backburner-jobs ]
which does not make sense to me, because it should start waiting on my image-download queue.
Then if I start my Rails app with bundle exec rails s, the app enqueues some jobs, but the worker never picks them up.
Then, if I kill the backburner:work process with ctrl+c and relaunch it, it picks up my image-download queue and processes the jobs.
Working 2 queues: [ myapp.development.image-download, myapp.development.backburner-jobs ]
Work job ImageDownloadJob with [45, "test", "image"]
ImageDownloadJob.perform 45 test image
Why do I have to start the worker after my Rails app has put something in the queue?
Is it a bug or just me?
Thanks for your help!
Best
Geoffroy
Is it possible to move this block to a YAML file?
Backburner.configure do |config|
...
end
It's needed when you have different configurations for development, staging, and production.
Or is there another convenient way to split the configuration?
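One way to do this (a sketch, not a built-in backburner feature; the file contents and key names are my own invention) is to keep per-environment values in a YAML file and feed them into the configure block from an initializer:

```ruby
require 'yaml'

# Hypothetical config/backburner.yml, inlined here so the example is runnable.
yaml = <<~YAML
  development:
    beanstalk_url: "beanstalk://127.0.0.1"
    max_job_retries: 0
  production:
    beanstalk_url: "beanstalk://10.0.0.5"
    max_job_retries: 3
YAML

env = "development" # e.g. Rails.env in a Rails app
settings = YAML.safe_load(yaml).fetch(env)

# In a real initializer you would then pass these into Backburner.configure:
# Backburner.configure do |config|
#   config.beanstalk_url   = settings["beanstalk_url"]
#   config.max_job_retries = settings["max_job_retries"]
# end
p settings["beanstalk_url"] # the value for the current environment
```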
Some companies will only use gems with a certain license.
The canonical and easy way to check is via the gemspec, e.g.
spec.license = 'MIT'
# or
spec.licenses = ['MIT', 'GPL-2']
Even for projects that already specify a license, including a license in your gemspec is a good practice, since it is easily
discoverable there without having to check the readme or for a license file.
For example, there is a License Finder gem to help companies ensure all gems they use
meet their licensing needs. This tool depends on license information being available in the gemspec. This is an important enough
issue that even Bundler now generates gems with a default 'MIT' license.
If you need help choosing a license (sorry, I haven't checked your readme or looked for a license file),
github has created a license picker tool.
In case you're wondering how I found you and why I made this issue: I'm collecting stats on gems (I was originally
looking for download data) and decided to collect license metadata, too, and make issues for gemspecs not specifying a license as a public service :).
I hope you'll consider specifying a license in your gemspec. If not, please just close the issue and let me know. In either case, I'll follow up. Thanks!
p.s. I've written a blog post about this project
I didn't find it anywhere in docs, but what I would like to do is within a job call a delay method, for example:
class ImportSongs
  include Backburner::Queue

  def self.perform(api_token, songs)
    api = API.new api_token
    songs.each_with_index do |song, i|
      # make current worker proceed with another job while it's sleeping
      delay 60*60 if i != 0 && i % 100 == 0
      api.import_song song
    end
  end
end
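Rather than sleeping inside perform (which ties up the worker), one option is to split the songs into batches up front and enqueue each batch with an increasing :delay, which backburner's enqueue forwards to beanstalkd as a job option. The batch size, delays, and the commented enqueue call are my own sketch; ImportSongs is the job class from the question:

```ruby
songs = (1..250).to_a # stand-in for the real song list

# Slice into batches of 100 and give each batch a delay of 0h, 1h, 2h, ...
batches = songs.each_slice(100).with_index.map do |batch, i|
  { songs: batch, delay: i * 60 * 60 }
end

batches.each do |b|
  # Backburner::Worker.enqueue ImportSongs, [api_token, b[:songs]], delay: b[:delay]
end

p batches.map { |b| [b[:songs].size, b[:delay]] }
# => [[100, 0], [100, 3600], [50, 7200]]
```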
Is there any init script for linux to start/stop backburner?
For example, if you use monit and want to monitor your workers by pid, it's really good to have an init script for start/stop/restart.
Backburner::Worker#retry_connection! does not call close, leaving open connections to servers until the process holding those connections is killed.
Can be reproduced by starting 2 servers. Then, queue jobs in a loop.
require 'backburner'

Backburner.configure do |config|
  config.beanstalk_url = ['beanstalk://127.0.0.1:11300', 'beanstalk://127.0.0.1:11301']
end

class Job
  def self.perform(message)
    p message
  end
end

loop do
  Backburner::Worker.enqueue(Job, ['Hello'])
end
In another process, run the worker:
Backburner.work
Kill one of the servers. The other server's output will look like this:
$ beanstalkd -V -p 11300
pid 4318
bind 3 0.0.0.0:11300
accept 5
accept 6
accept 7
accept 8
accept 9
accept 10
accept 11
accept 12
accept 13
accept 14
accept 15
lsof output for this process:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
beanstalk 32480 vagrant cwd DIR 8,3 4096 261123 /home/vagrant
beanstalk 32480 vagrant rtd DIR 8,3 4096 2 /
beanstalk 32480 vagrant txt REG 8,3 63744 789844 /usr/bin/beanstalkd
beanstalk 32480 vagrant mem REG 8,3 156928 522458 /lib64/ld-2.12.so
beanstalk 32480 vagrant mem REG 8,3 1926800 522604 /lib64/libc-2.12.so
beanstalk 32480 vagrant 0u CHR 136,4 0t0 7 /dev/pts/4
beanstalk 32480 vagrant 1u CHR 136,4 0t0 7 /dev/pts/4
beanstalk 32480 vagrant 2u CHR 136,4 0t0 7 /dev/pts/4
beanstalk 32480 vagrant 3u IPv4 97801 0t0 TCP *:11300 (LISTEN)
beanstalk 32480 vagrant 4u REG 0,9 0 3780 [eventpoll]
beanstalk 32480 vagrant 5u IPv4 97810 0t0 TCP localhost:11300->localhost:55278 (ESTABLISHED)
beanstalk 32480 vagrant 6u IPv4 98268 0t0 TCP localhost:11300->localhost:55306 (ESTABLISHED)
beanstalk 32480 vagrant 7u IPv4 98274 0t0 TCP localhost:11300->localhost:55308 (ESTABLISHED)
beanstalk 32480 vagrant 8u IPv4 98279 0t0 TCP localhost:11300->localhost:55310 (ESTABLISHED)
beanstalk 32480 vagrant 9u IPv4 98284 0t0 TCP localhost:11300->localhost:55312 (ESTABLISHED)
beanstalk 32480 vagrant 10u IPv4 98291 0t0 TCP localhost:11300->localhost:55314 (ESTABLISHED)
beanstalk 32480 vagrant 11u IPv4 98296 0t0 TCP localhost:11300->localhost:55316 (ESTABLISHED)
beanstalk 32480 vagrant 12u IPv4 98301 0t0 TCP localhost:11300->localhost:55318 (ESTABLISHED)
beanstalk 32480 vagrant 13u IPv4 98306 0t0 TCP localhost:11300->localhost:55320 (ESTABLISHED)
beanstalk 32480 vagrant 14u IPv4 98311 0t0 TCP localhost:11300->localhost:55322 (ESTABLISHED)
beanstalk 32480 vagrant 16u IPv4 98236 0t0 TCP localhost:11300->localhost:55302 (ESTABLISHED)
beanstalk 32480 vagrant 17u IPv4 98252 0t0 TCP localhost:11300->localhost:55304 (ESTABLISHED)
A Sinatra web frontend would be awesome.
Right now the logger only prints to stdout; maybe allow a custom logger to be specified.
Talking with @bradgessler, we want a way to hook into the worker and define arbitrary code to run before and after jobs.
Backburner::Worker.on_start do |job|
  NewRelic.add_instrumentation "..."
end

Backburner::Worker.before_enqueue do |job|
end

Backburner::Worker.after_enqueue do |job|
end
Hi, I'm loving the backburner/beanstalk combo and am using it to process large amounts of data within my app.
From what I have read so far, the prevailing advice seems to be to pass database ids into enqueue and then have the worker read the information it needs from the database.
I am doing this now: I persist data, the backburner worker then (almost) immediately reads that data back out, and the row is deleted once processing is complete.
This database write/read/delete cycle is proving to be a bit of a bottleneck within my app.
What I'd like to do is not to have to touch the database at all but to pass all my data straight to backburner and then have my workers process it without having to read it back from the database.
My data is around 12 distinct text strings (none more than a few hundred characters long) although some can be non-ASCII (ie, UTF-8) text.
Am I likely to hit any real problems with this approach?
Is anyone else already doing it this way?
Thanks in advance for any help.
Darren.
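For what it's worth, the main hard limit to watch with this approach is beanstalkd's maximum job size, which defaults to 65535 bytes (raisable with the -z flag). A quick sanity check that a dozen few-hundred-character UTF-8 strings serialize well under that limit (the payload field names here are invented for illustration, as is the commented job class):

```ruby
require 'json'

payload = {
  "title"  => "Drosophilidae compound eye",
  "notes"  => "très détaillé " * 20,                # non-ASCII is fine once serialized
  "fields" => Array.new(10) { |i| "field-#{i} " * 30 }
}

body = JSON.generate(payload)
p body.bytesize          # a few KB: well under the 65535-byte default limit
p body.bytesize < 65_535 # => true
# Backburner::Worker.enqueue(ProcessText, [payload]) # then enqueue the data directly
```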
Hello
I am running Ubuntu 12.04 LTS and beanstalkd 1.9.
I have a very simple Rails 4 application which uses Backburner to do some audio processing, image and video downloads in the background.
It mostly works, but sometimes jobs are not processed.
I can reproduce it fairly easily; here is the behaviour I see.
The code to enqueue the image download is:
Rails.logger.debug("[Asset.enqueue_download_from_url] id = #{id}, url = #{url}")
Backburner.enqueue ImageDownloadJob, self.id, url
And the code in my job is:
class ImageDownloadJob
  include Backburner::Queue
  queue "image-download"
  queue_priority 1000 # most urgent priority is 0

  def self.perform(image_id, url)
    Rails.logger.debug("[ImageDownloadJob.perform] id = #{image_id}, url = #{url}")
    Image.find(image_id).download_from_url(url)
  end
end
Even though it works 9 times out of 10, sometimes I get the following in my development log:
[Asset.enqueue_download_from_url] id = 71, url = http://upload.wikimedia.org/wikipedia/commons/8/89/Drosophilidae_compound_eye_.jpg
but no ImageDownloadJob.perform trace afterwards. It seems like the job is never enqueued or never processed.
I'm using backburner:threads_on_fork:work
Here is my Procfile
web: bundle exec rails s
backburner: env QUEUE=image-download:3:50:2,video-generation:1:50:1,audio-analysis:1:50:1 rake backburner:threads_on_fork:work
rails_logs: tail -f log/development.log
backburner_logs: tail -f log/backburner.development.log
Is it a known bug of the backburner:threads_on_fork:work strategy? Any advice?
Thanks in advance, best regards
Geoffroy
beanstalkd://localhost/foo?retry_times=5
# parse prefix and retry from the url
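A sketch of parsing the host, tube prefix, and retry count out of a URL like the one above, using only stdlib URI/CGI. The retry_times query-parameter name comes from the example; nothing here is an existing backburner API:

```ruby
require 'uri'
require 'cgi'

url = URI.parse("beanstalkd://localhost/foo?retry_times=5")

host        = url.host                    # "localhost"
tube_prefix = url.path.delete_prefix("/") # "foo"
params      = CGI.parse(url.query.to_s)   # {"retry_times"=>["5"]}
retry_times = Integer(params.fetch("retry_times", ["0"]).first)

p [host, tube_prefix, retry_times]
# => ["localhost", "foo", 5]
```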
From @bradgessler:
I don't think having different queues per app is a good idea in a beanstalkd world. Resque does this because there's no concept of priorities in Redis. Since beanstalkd lets people specify priorities of jobs, it's a moot point to run different numbers of workers for different queue names. Put more emphasis on priority to deal with this.
I like the idea of having all classes by default piping to the same tube and using priorities more heavily. If a job should be in a different tube, then that is of course possible using the same format today.
Hi, I am looking into setting up beanstalkd and backburner in an HA context.
To protect against a beanstalkd instance going down I have multiple instances fronted by haproxy.
My test is using the Simple worker.
Whilst executing the work_one_job method in worker.rb, backburner attempts to reserve a job without using a timeout. When there are no jobs on a particular tube, the connection is kept open until one arrives.
When connecting to beanstalkd directly, the connection is held open for as long as it takes for a job to arrive. However haproxy terminates the connection, after the timeout that it is configured with is exceeded.
At this point the exception is caught, but a further exception is thrown because the job variable is nil. The fix to this problem is unfortunately not a one-liner, as the TCPSocket connection held in a class variable is also broken, so subsequent retries would also fail.
What do you think the solution should be to make connections to beanstalkd more resilient when operating in this context? Should these terminations be handled? I would like to contribute to the project, but would like to know your thoughts before I spend too much time going in the wrong direction.
Thanks for your help,
lashd
Backburner looks great by the way.
I've noticed a problem with the ThreadsOnFork worker once the job queue goes empty. If there is a thread trying to reserve a job on an empty queue, it holds the mutex so no other communication can happen on that connection (note: the mutex code is in beaneater, though the mutex code itself is not necessarily a bug, IMO). This is problematic if another thread has just reserved a job and is trying to process said job. Most importantly, in order to do actual job processing, the 'stats-job' is run to retrieve the ttr from beanstalkd, and if this job is held up then the already-reserved jobs will fail to process, and you're deadlocked.
The only way I've seen this deadlock broken is when beanstalkd eventually returns 'DEADLINE_SOON' to the blocking reserve jobs, by default 120s later. This is pretty terrible latency, however.
You can alleviate this by having more connection addresses defined in your backburner config; beaneater will use these as a pool, but you'll still get collisions where two threads are trying to use the same connection. Really, no two threads should ever try to use the same connection/socket if a blocking reserve is involved.
An easy way to reproduce this is to instantiate a ThreadsOnFork worker for a particular tube with n (>= 2) max threads, and then queue m (n < m < 2n) jobs onto that tube. The first n jobs should process immediately, and the remaining ones should get hung in a state where they have technically been reserved, but can't communicate with the beanstalkd server so they won't process for 120 seconds (unless you've overridden ttr). In my particular case, I had 10 threads and 15 jobs.
Resque uses forking to control memory management and bloat, and I think it would be a good idea to have a forking worker that applies the same principles.
From @bradgessler:
CLI for querying/filtering jobs for performing batch operations on jobs that are in the queue. This is important for when things go bad in production and certain jobs with certain payloads may need to be buried until a patch can be pushed to prod and the jobs are re-run. Web GUI might be able to use this CLI interface.
From @bradgessler:
We had a mis-numbered priority in production once that brought our system down. We'd like to have "named" priorities that are stack-ranked. This is a simple DSL that looks like Backburner.config.priorities = [:high, :medium, :custom_pri, :low], which is mapped to the int values that beanstalkd understands. When tossing a job on the queue, a named priority could be specified like User.new.async(pri: :high).blah.
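The stack-ranking could be sketched like this, mapping labels onto beanstalkd's integer priorities (0 is most urgent). The spacing of 100 between ranks is an arbitrary choice of mine, not something backburner defines:

```ruby
PRIORITIES = [:high, :medium, :custom_pri, :low]

# Rank order becomes the integer priority: earlier label => smaller int => more urgent.
PRIORITY_MAP = PRIORITIES.each_with_index.to_h { |label, i| [label, i * 100] }
# => {:high=>0, :medium=>100, :custom_pri=>200, :low=>300}

def resolve_priority(pri)
  PRIORITY_MAP.fetch(pri) { Integer(pri) } # raw ints still work for power users
end

p resolve_priority(:high) # => 0
p resolve_priority(:low)  # => 300
p resolve_priority(50)    # => 50
```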
I am using DataMapper callbacks to call a method that imports user data from Zendesk when I create a "desk" object in my app. This is working great. However, importing the user data can take some time for some of the larger accounts I work with. Does anyone have any advice on how to call this callback via backburner?
http://datamapper.org/docs/callbacks.html
How does backburner handle restarts or shutdowns? Do the running jobs get put back into the queue or do they just "timeout"?
From #52, need to support priority labels in configure block
Track backburner tasks with newrelic API for background processing.
When I update my code and restart Backburner, the worker still seems to run the old code. Any pointers?
From @bradgessler:
I'm not convinced that the Job mix-in is the best approach. A job should be a class itself. Mixins make sense as syntactical sugar to make it easier to queue an instance of a class into a job, and to pull the job off the queue and shove it back into that class for processing.
I agree, would be ideal to support any ruby object that responds to perform ala resque. I still want to keep the mixin around for the nice syntactic sugar it affords.
After chatting with @kr today, have a few ideas to jot down for a first class backburner admin frontend panel. First I would want visibility into the jobs in the ready queue as well as a view showing failed jobs with backtraces and a way to 'kick' a job, all available via a sinatra web UI.
In particular, some ideas on how to do this.
Create a sinatra view that reserve(0) 100 jobs and then collects the information and immediately releases them. Aggregate the 100 jobs and display them in the sinatra view. By reserving and releasing immediately, we can create a view for both development and production by showing the next 100 jobs.
I would want a place for buried jobs where you can see the next 100 buried jobs across all queues perhaps. For this we could have a special tube called 'failed-jobs' that is used by the admin panel. In Backburner, every time a job fails and is buried, we can then insert the job into the 'failed-jobs' queue before burying it. Then when we want to show the last 100 failing jobs, simply reserve(0) and release 100 jobs from that tube. In this way we can have a buried jobs list. We can even have a button to kick the job (which will then kick the job based on the real id). The way I was thinking about it, the failed-jobs tube could have jobs with this format:
{ "job-id" : 1234, "tube" : "foo", "backtrace" : "...", "tries" : 3 }
and then this can be used to display the buried jobs in a table.
Also, I love beanstalkd_view and it's such a great start. I wonder, @denniskuczynski, if you would have any interest in being a core contributor to backburner and helping out with the admin panel. Ideally we could have a familiar feel to https://github.com/defunkt/resque#section_The_Front_End and obviously be unabashedly clear that we are inspired by that project's interface as a point of reference (as well as beanstalkd_view itself).
Cleanup github pages and add logo. As suggested by @bradgessler we can use http://thenounproject.com/noun/stove/#icon-No4325 as a starting point.
Download the SVG and PNG here: https://www.dropbox.com/s/kndb4kv5py4m5nh/stove.zip
Wouldn't it be convenient if we could specify a timeout in the reserve method called from work_one_job? If the worker waits on the reserve command longer than the specified timeout, it would stop automatically. This is actually the first parameter that can be passed to the reserve method of the underlying Beaneater::Tubes class.
I want to run such a worker from time to time, so it would only process the current requests and then stop.
I am going to send 20 tasks at once to beanstalkd. Each pack of 20 tasks will be assigned to a different tube, and each tube will have only one worker processing tasks, so I can log the data from those particular 20 tasks to a separate file. That is why I would like to be able to run a worker that stops once it sees there is nothing more for it to do, and this 'seeing' could basically be a timeout on reserve.
I believe reserve-with-timeout is the way this may be achieved.
Please correct me if I am wrong, and tell me whether you are going to add this nice feature to backburner.
Regards,
Mateusz
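The desired behaviour, sketched against a plain in-process queue: Thread::Queue stands in for a beanstalkd tube, and reserve-with-timeout becomes a non-blocking pop. With beaneater the equivalent would be tube.reserve(timeout) and stopping when the reserve times out; that wiring is hedged since this is not an existing backburner option:

```ruby
queue = Thread::Queue.new
3.times { |i| queue << "job-#{i}" }

processed = []
loop do
  job = begin
    queue.pop(true) # non-blocking pop: raises ThreadError when the queue is empty
  rescue ThreadError
    break           # nothing left to do, so the worker stops instead of blocking
  end
  processed << job
end

p processed # => ["job-0", "job-1", "job-2"]
```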
It is expected that the client should try to connect to each server. If a connection is not established, it will remove the connection from its connection pool. As long as the number of successful connections to a server is greater than zero, work should continue to occur. Ideally, the client would be smart enough to retry the connection to the failed server and add it back to the pool when its connection is restored.
The following will reproduce the error if you start a worker on port 11300 or 11301, not both.
require 'backburner'

Backburner.configure do |config|
  config.beanstalk_url = ['beanstalk://127.0.0.1:11300', 'beanstalk://127.0.0.1:11301']
end

class Job
  def self.perform(message)
    p message
  end
end

Backburner::Worker.enqueue(Job, ['Hello'])
Beaneater::NotConnected: Could not connect to '127.0.0.1:11301'
from /usr/lib64/ruby/gems/2.1.0/gems/beaneater-0.3.2/lib/beaneater/connection.rb:96:in `rescue in establish_connection'
from /usr/lib64/ruby/gems/2.1.0/gems/beaneater-0.3.2/lib/beaneater/connection.rb:92:in `establish_connection'
from /usr/lib64/ruby/gems/2.1.0/gems/beaneater-0.3.2/lib/beaneater/connection.rb:36:in `initialize'
from /usr/lib64/ruby/gems/2.1.0/gems/beaneater-0.3.2/lib/beaneater/pool.rb:25:in `new'
from /usr/lib64/ruby/gems/2.1.0/gems/beaneater-0.3.2/lib/beaneater/pool.rb:25:in `block in initialize'
from /usr/lib64/ruby/gems/2.1.0/gems/beaneater-0.3.2/lib/beaneater/pool.rb:25:in `map'
from /usr/lib64/ruby/gems/2.1.0/gems/beaneater-0.3.2/lib/beaneater/pool.rb:25:in `initialize'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/connection.rb:27:in `new'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/connection.rb:27:in `connect!'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/connection.rb:13:in `initialize'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:58:in `new'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:58:in `connection'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:185:in `retry_connection!'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:173:in `rescue in retryable_command'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:170:in `retryable_command'
from /usr/lib64/ruby/gems/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:33:in `enqueue'
The problem lies within Backburner::Worker#retry_connection!. Instead of the simple @connection = nil, the failed connection should be removed before retrying.
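The suggested fix could be sketched like this: keep a pool of connections and, when one raises on use, drop it from the pool and retry with the remainder instead of just clearing @connection. The FakeConn struct and DeadConnection error are stand-ins for beaneater connections; this is not backburner's actual code:

```ruby
DeadConnection = Class.new(StandardError)

FakeConn = Struct.new(:addr, :alive) do
  def put(job)
    raise DeadConnection, addr unless alive
    "put #{job} on #{addr}"
  end
end

pool = [FakeConn.new("127.0.0.1:11301", false), FakeConn.new("127.0.0.1:11300", true)]

def enqueue_with_failover(pool, job)
  begin
    pool.first.put(job)
  rescue DeadConnection
    pool.shift           # remove the failed connection from the pool...
    raise if pool.empty? # ...give up only when no connections are left...
    retry                # ...and retry on the next one
  end
end

result = enqueue_with_failover(pool, "Hello")
p result      # => "put Hello on 127.0.0.1:11300"
p pool.size   # => 1 (the dead connection was removed)
```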
I have some trouble with backburner. I have a setup with two workers running against two beanstalkd instances. Somehow, jobs sometimes seem to get 'stuck'. I can see the workers in beanstalkd view like this:
And if I keep restarting the workers, it eventually picks up the jobs and process them. I'm wondering if this could be related to having multiple beanstalk servers, but I am a bit at a loss about how to debug further.
When I run backburner as a daemon, it seems to always prefix the queue with 'backburner.worker' irrespective of what is in my apps config file or what I provide as command-line parameters.
Working 1 queues: [ backburner.worker.queue.myapp.mailer ]
I tried to change the app config to load work into the queue named above, but then it seems that the environment isn't loaded properly so jobs that get queued get immediately buried.
The rake task always works but is ugly and difficult to connect to monit or God. This is what I'm doing as a short-term work around.
nohup rake backburner:work &
Any ideas?
config.beanstalk_url = ["beanstalk://127.0.0.1:11300"]
config.tube_namespace = "myapp"
config.on_error = lambda { |e| puts e }
config.max_job_retries = 3 # default 0 retries
config.retry_delay = 5 # default 5 seconds
config.default_priority = 65536
config.respond_timeout = 120
config.default_worker = Backburner::Workers::Simple
config.default_queues = ["staging", "staging-mailer"]
config.logger = Logger.new("backburner-staging.log")
Currently the max_job_retries and retry_delay settings apply to every tube that backburner watches. For some tubes I'd like a large number of retries, while for others I'd like a single retry. It would be great if backburner could be configured to support this.
I am trying to use backburner in my rails application and am using Backburner.enqueue to add jobs to the beanstalkd queue. However, if the connection to the beanstalkd server breaks, all calls to enqueue fail with the exception Beaneater::NotConnected. Even if beanstalkd is back online, enqueue continues to fail since @connection in Backburner::Worker is still set and a new Connection is not made.
Maybe I am just missing something simple in the code/documentation, but what is the best way to reattempt/reset a connection without creating a setter for the connection method (patching) in the Worker? I do not want to restart my rails application in order for a new connection to be made.
Is there a way for me to check on a job after it's been enqueued? Preferably, after calling "async.command" I could get back an id that I can check later so I can follow up.
My use-case is this. I am triggering async jobs from several parts of my web application. I'd like to give the user feedback about the completion of those jobs and I can't move the user forward until they finish. There are other parts where we trigger a job but I need to give them a way to see if they were successful or failed. I'm trying to find a way where I don't need to wrap all of these with another layer of database tables to provide this information.
Not really an issue, I just don't know where is the best place to ask this question.
I'm just starting with background jobs and am wondering what the advantage of backburner over beaneater is.
Shreko
As part of moving work away from rails runner based cron jobs into a backburner based system I've started on a simple way to enqueue jobs from the command line. Is that something you'd be interested in incorporating into backburner?
Tube namespacing to support versioning? in the wiki
When I run my workers, they can't seem to find the objects. I get this error:
burying undefined method `getagents' for #<Enumerator: Desk:find(2)>
Has anyone else seen this before?
@ShadowBelmolve is making great progress with an interesting threads_on_fork worker hybrid. Hope to get that released into backburner with docs and tests soon.
At the moment I have 16 workers processing jobs in parallel.
The outcome of these jobs is either no action or a write to a database.
As each process saves to the database it invokes its own separate BEGIN/COMMIT database transaction. This is proving to be quite slow and I'd like to find a way to speed that bit up.
I was wondering about this approach instead:
Instead of saving to the database, send the data to a different beanstalk queue/tube.
For that queue I'd like to read a batch of jobs into a single worker and then commit them in a single database transaction.
If I could process jobs in a batch of 50 at a time I could significantly reduce the number of database commits.
Can I do such a thing with backburner?
Or if not, maybe using beaneater instead?
Any issues with it from a theoretical standpoint?
Thanks in advance, Darren.
How can I delete a job from the beanstalkd queues when using the backburner gem?
If a worker process receives SIGTERM while a mysql query is running, rails apparently rescues the SignalException and re-raises it as an ActiveRecord::StatementInvalid. This causes Backburner to gracefully recover from the exception, bury the job, and keep on working additional jobs, effectively ignoring the intent of the TERM signal.
Apparently this issue is at least somewhat common:
http://stackoverflow.com/questions/548048/activerecordstatementinvalid-when-process-receives-sigterm
I'm not sure if this is a problem on all workers, but we can confirm it is present on the simple worker, and the net effect is that once every few deploys, our old workers simply refuse to die when we attempt to shut them down.
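One way a worker loop could detect this: Ruby chains the original error as Exception#cause, so the rescue block can walk the cause chain and shut down (instead of burying the job) when a SignalException is buried inside an ActiveRecord error. This is my own sketch; WrappedError stands in for ActiveRecord::StatementInvalid:

```ruby
WrappedError = Class.new(StandardError)

# Walk the cause chain looking for a wrapped signal.
def signal_in_cause_chain?(error)
  while error
    return true if error.is_a?(SignalException)
    error = error.cause
  end
  false
end

# Simulate what rails effectively does: rescue the signal mid-query and
# re-raise it as a database error (the original becomes the cause).
caught = begin
  begin
    raise SignalException, "TERM"
  rescue SignalException
    raise WrappedError, "Mysql2::Error: query interrupted"
  end
rescue WrappedError => e
  e
end

p signal_in_cause_chain?(caught) # => true: the worker should stop, not bury and continue
```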
I'm trying to use Backburner in my application to create, say, 50 workers all watching the same tube for incoming jobs. The problem is, each of those 50 workers, when performing a job, does some heavy file manipulation, etc., and each of them needs to perform some operations on a uniquely numbered directory. For example, worker 7 needs to work with a directory named dir7. How do I handle this situation using Backburner?
Initially I thought I could make a ThreadsOnFork worker with 50 threads, and I'd be able to access a number associated with each thread from the worker's perform() method, but I haven't been able to do this yet.
Please help. Thanks!
PS: Apologies for asking this question on GitHub Issues, but I couldn't find a link to any official forum / google group on the backburner page at http://nesquena.github.com/backburner/.
Add hooks for plugins ala resque