seomoz / qless

Queue / Pipeline Management

License: MIT License

Ruby 38.28% Shell 0.30% Lua 24.05% CSS 7.01% JavaScript 22.59% HTML 7.77%

qless's Introduction

qless

Qless is a powerful Redis-based job queueing system inspired by resque, but built on a collection of Lua scripts, maintained in the qless-core repo.

Philosophy and Nomenclature

A job is a unit of work identified by a job id or jid. A queue can contain several jobs that are scheduled to be run at a certain time, several jobs that are waiting to run, and jobs that are currently running. A worker is a process on a host, identified uniquely, that asks for jobs from the queue, performs some process associated with that job, and then marks it as complete. When it's completed, it can be put into another queue.

Jobs can only be in one queue at a time. That queue is whatever queue they were last put in. So if a worker is working on a job, and you move it, the worker's request to complete the job will be ignored.

A job can be canceled, which means it disappears into the ether, and we'll never pay it any mind ever again. A job can be dropped, which is when a worker fails to heartbeat or complete the job in a timely fashion, or a job can be failed, which is when a host recognizes some systematically problematic state about the job. A worker should only fail a job if the error is likely not a transient one; otherwise, that worker should just drop it and let the system reclaim it.

Features

Jobs don't get dropped on the floor -- Sometimes workers drop jobs. Qless automatically picks them back up and gives them to another worker
Tagging / Tracking -- Some jobs are more interesting than others. Track those jobs to get updates on their progress. Tag jobs with meaningful identifiers to find them quickly in the UI.
Job Dependencies -- One job might need to wait for another job to complete
Stats -- qless automatically keeps statistics about how long jobs wait to be processed and how long they take to be processed. Currently, we keep track of the count, mean, standard deviation, and a histogram of these times.
Job data is stored temporarily -- Job info sticks around for a configurable amount of time so you can still look back on a job's history, data, etc.
Priority -- Jobs with the same priority get popped in the order they were inserted; a higher priority means that it gets popped faster
Retry logic -- Every job has a number of retries associated with it, which are renewed when it is put into a new queue or completed. If a job is repeatedly dropped, then it is presumed to be problematic, and is automatically failed.
Web App -- With the advent of a Ruby client, there is a Sinatra-based web app that gives you control over certain operational issues
Scheduled Work -- A job cannot be popped by workers until its specified delay (defaults to 0) has elapsed
Recurring Jobs -- Scheduling's all well and good, but we also support jobs that need to recur periodically.
Notifications -- Tracked jobs emit events on pubsub channels as they get completed, failed, put, popped, etc. Use these events to get notified of progress on jobs you're interested in.

Enqueuing Jobs

First things first, require qless and create a client. The client accepts all the same arguments that you'd use when constructing a redis client.

require 'qless'

# Connect to localhost
client = Qless::Client.new
# Connect to somewhere else
client = Qless::Client.new(:host => 'foo.bar.com', :port => 1234)

Jobs should be classes or modules that define a perform method, which must accept a single job argument:

class MyJobClass
  def self.perform(job)
    # job is an instance of `Qless::Job` and provides access to
    # job.data, a means to cancel the job (job.cancel), and more.
  end
end

Now you can access a queue, and add a job to that queue.

# This references a new or existing queue 'testing'
queue = client.queues['testing']
# Let's add a job, with some data. Returns Job ID
queue.put(MyJobClass, :hello => 'howdy')
# => "0c53b0404c56012f69fa482a1427ab7d"
# Now we can ask for a job
job = queue.pop
# => <Qless::Job 0c53b0404c56012f69fa482a1427ab7d (MyJobClass / testing)>
# And we can do the work associated with it!
job.perform

The job data must be serializable to JSON, and it is recommended that you use a hash for it. See below for a list of the supported job options.
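To see what "serializable to JSON" means in practice, here is a minimal sketch that round-trips a data hash through Ruby's standard json library, without touching Qless or Redis. The round_trip helper is hypothetical and exists only for illustration; the point is that symbol keys come back as strings, so job code should read job data with string keys.

```ruby
require 'json'

# Hypothetical helper (not part of Qless): encode and then decode a data
# hash, the way any JSON-based transport would.
def round_trip(data)
  JSON.parse(JSON.generate(data))
end

original = { :hello => 'howdy', :count => 3 }
decoded  = round_trip(original)

puts decoded.key?(:hello)   # false: the symbol key did not survive
puts decoded['hello']       # howdy
puts decoded['count']       # 3
```

Values that JSON has no type for (dates, times, symbols) come back as strings, so it is safest to store plain strings, numbers, arrays, and hashes.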

The argument returned by queue.put is the job ID, or jid. Every Qless job has a unique jid, and it provides a means to interact with an existing job:

# find an existing job by its jid
job = client.jobs[jid]

# Query it to find out details about it:
job.klass # => the class of the job
job.queue # => the queue the job is in
job.data  # => the data for the job
job.history # => the history of what has happened to the job so far
job.dependencies # => the jids of other jobs that must complete before this one
job.dependents # => the jids of other jobs that depend on this one
job.priority # => the priority of this job
job.tags # => array of tags for this job
job.original_retries # => the number of times the job is allowed to be retried
job.retries_left # => the number of retries left

# You can also change the job in various ways:
job.requeue("some_other_queue") # move it to a new queue
job.cancel # cancel the job
job.tag("foo") # add a tag
job.untag("foo") # remove a tag

Running A Worker

The Qless ruby worker was heavily inspired by Resque's worker, but thanks to the power of the qless-core lua scripts, it is much simpler and you are welcome to write your own (e.g. if you'd rather save memory by not forking the worker for each job).

As with resque...

The worker forks a child process for each job in order to provide resilience against memory leaks. Pass the RUN_AS_SINGLE_PROCESS environment variable to force Qless to not fork the child process. Single process mode should only be used in some test/dev environments.
The worker updates its procline with its status so you can see what workers are doing using ps.
The worker registers signal handlers so that you can control it by sending it signals.
The worker is given a list of queues to pop jobs off of.
The worker logs output based on VERBOSE or VVERBOSE (very verbose) environment variables.
Qless ships with a rake task (qless:work) for running workers. It runs qless:setup before starting the main work loop so that users can load their environment in that task.
The sleep interval (for when there are no jobs available) can be configured with the INTERVAL environment variable.

Resque uses queues for its notion of priority. In contrast, qless has priority support built in. Thus, the worker supports two strategies for the order in which jobs are popped off the queues: ordered and round-robin. The ordered reserver will keep popping jobs off the first queue until it is empty, before trying to pop a job off the second queue. The round-robin reserver will pop a job off the first queue, then the second queue, and so on. You could also easily implement your own.
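The difference between the two strategies can be sketched in plain Ruby. This is an illustration only, not the code of Qless's own reservers: FakeQueue, OrderedSketch, RoundRobinSketch, and drain are hypothetical names, and the only assumption about a queue is that pop returns the next job or nil when the queue is empty.

```ruby
# Stand-in for a queue: `pop` returns the next job, or nil when empty.
FakeQueue = Struct.new(:name, :jobs) do
  def pop
    jobs.shift
  end
end

# Ordered: drain the first queue before looking at the next one.
class OrderedSketch
  def initialize(queues)
    @queues = queues
  end

  def reserve
    @queues.each do |queue|
      job = queue.pop
      return job if job
    end
    nil
  end
end

# Round-robin: start each attempt at the queue after the last one tried.
class RoundRobinSketch
  def initialize(queues)
    @queues = queues
    @next_index = 0
  end

  def reserve
    @queues.size.times do
      queue = @queues[@next_index]
      @next_index = (@next_index + 1) % @queues.size
      job = queue.pop
      return job if job
    end
    nil
  end
end

# Keep reserving until every queue is empty; collect the jobs in order.
def drain(reserver)
  jobs = []
  while (job = reserver.reserve)
    jobs << job
  end
  jobs
end

def sample_queues
  [FakeQueue.new('queue_1', %w[a1 a2 a3]), FakeQueue.new('queue_2', %w[b1 b2])]
end

ordered     = drain(OrderedSketch.new(sample_queues))
round_robin = drain(RoundRobinSketch.new(sample_queues))
puts ordered.join(' ')      # a1 a2 a3 b1 b2
puts round_robin.join(' ')  # a1 b1 a2 b2 a3
```

A custom strategy would follow the same shape: hold on to the queues and decide, on each call, which one to pop from next.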

To start a worker, write a bit of Ruby code that instantiates a worker and runs it. You could write a rake task to do this, for example:

namespace :qless do
  desc "Run a Qless worker"
  task :work do
    # Load your application code. All job classes must be loaded.
    require 'my_app/environment'

    # Require the parts of qless you need
    require 'qless'
    require 'qless/job_reservers/ordered'
    require 'qless/worker'

    # Create a client
    client = Qless::Client.new(:host => 'foo.bar.com', :port => 1234)

    # Get the queues you use
    queues = %w[ queue_1 queue_2 ].map do |name|
      client.queues[name]
    end

    # Create a job reserver; different reservers use different
    # strategies for which order jobs are popped off of queues
    reserver = Qless::JobReservers::Ordered.new(queues)

    # Create a forking worker that uses the given reserver to pop jobs.
    worker = Qless::Workers::ForkingWorker.new(reserver)

    # Start the worker!
    worker.run
  end
end

The following signals are supported in the parent process:

TERM: Shutdown immediately, stop processing jobs.
INT: Shutdown immediately, stop processing jobs.
QUIT: Shutdown after the current job has finished processing.
USR1: Kill the forked child immediately, continue processing jobs.
USR2: Don't process any new jobs, and dump the current backtrace.
CONT: Start processing jobs again after a USR2

You should send these to the master process, not the child.

The child process supports the USR2 signal, which causes it to dump its current backtrace.

Workers also support middleware modules that can be used to inject logic before, after or around the processing of a single job in the child process. This can be useful, for example, when you need to re-establish a connection to your database in each job.

Define a module with an around_perform method that calls super where you want the job to be processed:

module ReEstablishDBConnection
  def around_perform(job)
    MyORM.establish_connection
    super
  end
end

Then, mix it into the worker class. You can mix in as many middleware modules as you like:

require 'qless/worker'
Qless::Worker.class_eval do
  include ReEstablishDBConnection
  include SomeOtherAwesomeMiddleware
end

Per-Job Middlewares

Qless also supports middleware on a per-job basis, when you have some orthogonal logic to run in the context of some (but not all) jobs.

Per-job middlewares are defined the same as worker middlewares:

module ReEstablishDBConnection
  def around_perform(job)
    MyORM.establish_connection
    super
  end
end

To add them to a job class, you first have to make your job class middleware-enabled by extending it with Qless::Job::SupportsMiddleware, then extend your middleware modules:

class MyJobClass
  extend Qless::Job::SupportsMiddleware
  extend ReEstablishDBConnection
  extend MyOtherAwesomeMiddleware

  def self.perform(job)
  end
end

Note that Qless::Job::SupportsMiddleware must be extended onto your job class before any other middleware modules.
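The ordering requirement follows from how Ruby resolves super among extended modules: the module extended last is consulted first. The sketch below imitates this with stand-ins and does not use Qless itself. SupportsMiddlewareSketch is an assumed approximation of what Qless::Job::SupportsMiddleware provides (a base around_perform that calls perform), and LogSteps is a hypothetical middleware.

```ruby
# Assumed stand-in for Qless::Job::SupportsMiddleware: the innermost
# around_perform, which simply runs the job.
module SupportsMiddlewareSketch
  def around_perform(job)
    perform(job)
  end
end

# Hypothetical middleware: records what happens before and after the job.
module LogSteps
  def around_perform(job)
    job[:log] << 'before'
    super
    job[:log] << 'after'
  end
end

class SketchJob
  extend SupportsMiddlewareSketch  # must come first so it ends up innermost
  extend LogSteps                  # extended later, so it is called first

  def self.perform(job)
    job[:log] << 'perform'
  end
end

job = { :log => [] }  # a plain hash stands in for a Qless::Job here
SketchJob.around_perform(job)
puts job[:log].join(', ')  # before, perform, after
```

If the base module were extended after LogSteps, its around_perform would be found first and, because it never calls super, the middleware would be skipped silently.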

Web Interface

Qless ships with a resque-inspired web app that lets you easily deal with failures and see what it is processing. If your project has a rack-based ruby web app, we recommend you mount Qless's web app in it. Here's how you can do that with Rack::Builder in your config.ru:

client = Qless::Client.new(:host => "some-host", :port => 7000)

Rack::Builder.new do
  use SomeMiddleware

  map('/some-other-app') { run Apps::Something.new }
  map('/qless')          { run Qless::Server.new(client) }
end

For an app using Rails 3+, check the router documentation for how to mount rack apps.

If you wish to run the web interface from the exe directory, you have the option to run the server as a daemon. Running as a daemon is default behavior. To run in the foreground, pass the --foreground or -F flag:

PATH_TO_QLESS_DIST/exe/qless-web -F

Job Dependencies

Let's say you have one job that depends on another, but the task definitions are fundamentally different. You need to bake a turkey, and you need to make stuffing, but you can't make the turkey until the stuffing is made:

queue        = client.queues['cook']
stuffing_jid = queue.put(MakeStuffing, {:lots => 'of butter'})
turkey_jid   = queue.put(MakeTurkey  , {:with => 'stuffing'}, :depends=>[stuffing_jid])

When the stuffing job completes, the turkey job is unlocked and free to be processed.

Priority

Some jobs need to get popped sooner than others. Whether it's a trouble ticket, or debugging, you can do this pretty easily when you put a job in a queue:

queue.put(MyJobClass, {:foo => 'bar'}, :priority => 10)

What happens when you want to adjust a job's priority while it's still waiting in a queue?

job = client.jobs['0c53b0404c56012f69fa482a1427ab7d']
job.priority = 10
# Now this will get popped before any job of lower priority
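The pop order described here (higher priority first, insertion order among equal priorities) can be modelled in a few lines of plain Ruby. This is a toy in-memory model for illustration, not how qless stores jobs; Entry and PriorityQueueSketch are hypothetical names.

```ruby
# Toy model of one queue's pop order. `sequence` records insertion order.
Entry = Struct.new(:jid, :priority, :sequence)

class PriorityQueueSketch
  def initialize
    @entries = []
    @sequence = 0
  end

  def put(jid, priority = 0)
    @entries << Entry.new(jid, priority, @sequence += 1)
    jid
  end

  # Highest priority wins; ties are broken by who was inserted first.
  def pop
    entry = @entries.min_by { |e| [-e.priority, e.sequence] }
    @entries.delete(entry)
    entry && entry.jid
  end
end

queue_sketch = PriorityQueueSketch.new
queue_sketch.put('a')       # default priority 0
queue_sketch.put('b', 10)
queue_sketch.put('c')
queue_sketch.put('d', 10)

popped = Array.new(4) { queue_sketch.pop }
puts popped.join(' ')  # b d a c
```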

Scheduled Jobs

If you don't want a job to be run right away but some time in the future, you can specify a delay:

# Run at least 10 minutes from now
queue.put(MyJobClass, {:foo => 'bar'}, :delay => 600)

This doesn't guarantee that the job will run exactly 10 minutes from now; it only becomes eligible to be popped at that point. To have it run as close to that time as possible, give it a higher priority so that once 10 minutes have elapsed, it's put ahead of lower-priority jobs:

# Run in 10 minutes
queue.put(MyJobClass, {:foo => 'bar'}, :delay => 600, :priority => 100)

Recurring Jobs

Sometimes it's not enough simply to schedule one job, but you want to run jobs regularly. In particular, maybe you have some batch operation that needs to get run once an hour and you don't care what worker runs it. Recurring jobs are specified much like other jobs:

# Run every hour
queue.recur(MyJobClass, {:widget => 'warble'}, 3600)
# => 22ac75008a8011e182b24cf9ab3a8f3b

You can even access them in much the same way as you would normal jobs:

job = client.jobs['22ac75008a8011e182b24cf9ab3a8f3b']
# => < Qless::RecurringJob 22ac75008a8011e182b24cf9ab3a8f3b >

Changing the interval at which it runs after the fact is trivial:

# I think I only need it to run once every two hours
job.interval = 7200

If you want it to run every hour on the hour, but it's 2:37 right now, you can specify an offset which is how long it should wait before popping the first job:

# 23 minutes of waiting until it should go
queue.recur(MyJobClass, {:howdy => 'hello'}, 3600, :offset => 23 * 60)
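Rather than working out the 23 minutes by hand, the offset can be computed from the current time. A minimal sketch, assuming you want the next top of the hour in the clock of the Time object you pass in; seconds_until_next_hour is a hypothetical helper, not part of Qless.

```ruby
# Hypothetical helper: seconds from `now` until the next top of the hour.
# Returns 0 when `now` is exactly on the hour.
def seconds_until_next_hour(now = Time.now)
  (3600 - (now.min * 60 + now.sec)) % 3600
end

# At 2:37:00 there are 23 minutes left in the hour.
offset = seconds_until_next_hour(Time.utc(2012, 3, 9, 2, 37, 0))
puts offset       # 1380
puts offset / 60  # 23
```

The result can then be passed as the :offset option to queue.recur, as in the example above.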

Recurring jobs also have priority, a configurable number of retries, and tags. These settings don't apply to the recurring jobs, but rather the jobs that they create. In the case where more than one interval passes before a worker tries to pop the job, more than one job is created. The thinking is that while it's completely client-managed, the state should not be dependent on how often workers are trying to pop jobs.

# Recur every minute
queue.recur(MyJobClass, {:lots => 'of jobs'}, 60)
# Wait 5 minutes
queue.pop(10).length
# => 5 jobs got popped

Configuration Options

You can get and set global (read: in the context of the same Redis instance) configuration to change the behavior for heartbeating, and so forth. There aren't a tremendous number of configuration options, but an important one is how long job data is kept around. Job data is expired after it has been completed for jobs-history seconds, but is limited to the last jobs-history-count completed jobs. These default to 50k jobs, and 30 days, but depending on volume, your needs may change. To only keep the last 500 jobs for up to 7 days:

client.config['jobs-history'] = 7 * 86400
client.config['jobs-history-count'] = 500

Tagging / Tracking

In qless, 'tracking' means flagging a job as important. Tracked jobs have a tab reserved for them in the web interface, and they also emit subscribable events as they make progress (more on that below). You can flag a job from the web interface, or the corresponding code:

client.jobs['b1882e009a3d11e192d0b174d751779d'].track

Jobs can be tagged with strings which are indexed for quick searches. For example, jobs might be associated with customer accounts, or some other key that makes sense for your project.

queue.put(MyJobClass, {:tags => 'aplenty'}, :tags => ['12345', 'foo', 'bar'])

This makes them searchable in the web interface, or from code:

jids = client.jobs.tagged('foo')

You can add or remove tags at will, too:

job = client.jobs['b1882e009a3d11e192d0b174d751779d']
job.tag('howdy', 'hello')
job.untag('foo', 'bar')

Notifications

Tracked jobs emit events on specific pubsub channels as things happen to them, whether that's getting popped off of a queue, being completed by a worker, and so on. Good examples of how to make use of this are qless-campfire and qless-growl. The gist of it goes like this, though:

client.events.listen do |on|
  on.canceled  { |jid| puts "#{jid} canceled"   }
  on.stalled   { |jid| puts "#{jid} stalled"    }
  on.track     { |jid| puts "tracking #{jid}"   }
  on.untrack   { |jid| puts "untracking #{jid}" }
  on.completed { |jid| puts "#{jid} completed"  }
  on.failed    { |jid| puts "#{jid} failed"     }
  on.popped    { |jid| puts "#{jid} popped"     }
  on.put       { |jid| puts "#{jid} put"        }
end

Those familiar with redis pubsub will note that a redis connection can only be used for pubsub-y commands once listening. For this reason, invoking client.events actually creates a second connection so that client can still be used as it normally would be:

client.events.listen do |on|
  on.failed do |jid|
    puts "#{jid} failed in #{client.jobs[jid].queue_name}"
  end
end

Heartbeating

When a worker is given a job, it is given an exclusive lock to that job. That means that job won't be given to any other worker, so long as the worker checks in with progress on the job. By default, a worker has to either report progress on a job every 60 seconds or complete it, but that interval is configurable. For longer jobs, the default may not make sense.

# Hooray! We've got a piece of work!
job = queue.pop
# How long until I have to check in?
job.ttl
# => 59
# Hey! I'm still working on it!
job.heartbeat
# => 1331326141.0
# Ok, I've got some more time. Oh! Now I'm done!
job.complete

If you want to increase the heartbeat in all queues, set the global configuration option; individual queues can still override it:

# Now jobs get 10 minutes to check in
client.config['heartbeat'] = 600
# But the testing queue doesn't get as long.
client.queues['testing'].heartbeat = 300

When choosing a heartbeat interval, realize that this is the amount of time that can pass before qless notices that a job has been dropped. At the same time, you don't want to burden qless with heartbeating every 10 seconds if your job is expected to take several hours.

An idiom you're encouraged to use for long-running jobs that want to check in their progress periodically:

# Wait until we have 5 minutes left on the heartbeat, and if we find that
# we've lost our lock on a job, then honorably fall on our sword
if (job.ttl < 300) && !job.heartbeat
  return # or raise / exit, whichever suits your job
end
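To show how this idiom fits into a longer job, here is a sketch of a work loop that checks in only when the lock is close to expiring. It runs against a stand-in object rather than a real Qless::Job: FakeJob, elapse, and process are hypothetical, and the assumption (taken from the idiom above) is that heartbeat returns a falsy value once the lock has been lost.

```ruby
# Stand-in for a popped job. `ttl` is the seconds left on the lock;
# `heartbeat` renews it and returns false once the lock has been lost.
class FakeJob
  attr_reader :ttl, :heartbeats

  def initialize(ttl, lock_lost_after: nil)
    @initial_ttl = ttl
    @ttl = ttl
    @heartbeats = 0
    @lock_lost_after = lock_lost_after
  end

  # Simulate time passing while the job works.
  def elapse(seconds)
    @ttl -= seconds
  end

  def heartbeat
    return false if @lock_lost_after && @heartbeats >= @lock_lost_after
    @heartbeats += 1
    @ttl = @initial_ttl
    true
  end
end

# Work through the items, heartbeating only when fewer than `threshold`
# seconds remain. Returns :completed, or :lost_lock if the lock is gone.
def process(job, items, threshold: 300, seconds_per_item: 200)
  items.each do |_item|
    return :lost_lock if (job.ttl < threshold) && !job.heartbeat
    job.elapse(seconds_per_item)  # stands in for the real work
  end
  :completed
end

healthy = FakeJob.new(600)
puts process(healthy, 1..6)   # completed
puts healthy.heartbeats       # 2

reclaimed = FakeJob.new(600, lock_lost_after: 1)
puts process(reclaimed, 1..6) # lost_lock
```

Checking the ttl first keeps the number of heartbeat calls low: in the healthy run above, six units of work needed only two check-ins.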

Stats

One nice feature of qless is that you can get statistics about usage. Stats are aggregated by day, so when you want stats about a queue, you need to say what queue and what day you're talking about. By default, you just get the stats for today. These stats include information about the mean job wait time, standard deviation, and histogram. This same data is also provided for job completion:

# So, how're we doing today?
stats = client.stats.get('testing')
# => { 'run' => {'mean' => ..., }, 'wait' => {'mean' => ..., }}
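For readers unfamiliar with these summary figures, the sketch below computes a count, mean, standard deviation, and histogram for a list of wait times in plain Ruby. It only illustrates what the numbers mean: summarize is a hypothetical helper, the key names other than 'mean' are assumptions, and the choice of population standard deviation and one-second histogram buckets is made for the example rather than taken from qless.

```ruby
# Hypothetical helper: summarize a list of durations given in seconds.
def summarize(durations)
  count = durations.size
  return { 'count' => 0, 'mean' => 0.0, 'std' => 0.0, 'histogram' => {} } if count.zero?

  mean = durations.sum.to_f / count
  # Population variance (an assumption made for this example).
  variance = durations.sum { |d| (d - mean)**2 } / count
  # One bucket per whole second (also an assumption).
  histogram = durations.each_with_object(Hash.new(0)) { |d, h| h[d.floor] += 1 }

  { 'count' => count, 'mean' => mean, 'std' => Math.sqrt(variance), 'histogram' => histogram }
end

wait_times = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(wait_times)
puts summary['count']         # 8
puts summary['mean']          # 5.0
puts summary['std']           # 2.0
puts summary['histogram'][4]  # 3
```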

Time

It's important to note that Redis doesn't allow access to the system time if you're going to be making any manipulations to data (which our scripts do). And yet, we have heartbeating. This means that the clients actually send the current time when making most requests, and for consistency's sake, means that your workers must be relatively synchronized. This doesn't mean down to the tens of milliseconds, but if you're experiencing appreciable clock drift, you should investigate NTP. For what it's worth, this hasn't been a problem for us, but most of our jobs have heartbeat intervals of 30 minutes or more.

Ensuring Job Uniqueness

As mentioned above, jobs are uniquely identified by an id, their jid. Qless will generate a UUID for each enqueued job, or you can specify one manually:

queue.put(MyJobClass, { :hello => 'howdy' }, :jid => 'my-job-jid')

This can be useful when you want to ensure a job's uniqueness: simply create a jid that is a function of the job's class and data, which guarantees that Qless won't have multiple jobs with the same class and data.
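One way to build such a jid is to hash the class name together with a canonical encoding of the data. The following is a sketch under stated assumptions, not something Qless provides: jid_for is a hypothetical helper, SHA-1 is an arbitrary choice of digest, and only top-level keys are sorted, so nested hashes must be built in a consistent key order.

```ruby
require 'digest'
require 'json'

# Hypothetical helper: derive a deterministic jid from a job class name
# and its data. Top-level keys are sorted so that the order in which the
# hash was built does not change the result.
def jid_for(klass_name, data)
  canonical = JSON.generate(data.sort_by { |key, _| key.to_s }.to_h)
  Digest::SHA1.hexdigest("#{klass_name}:#{canonical}")
end

jid_a = jid_for('MyJobClass', { :hello => 'howdy', :account => 42 })
jid_b = jid_for('MyJobClass', { :account => 42, :hello => 'howdy' })
jid_c = jid_for('OtherJobClass', { :hello => 'howdy', :account => 42 })

puts jid_a == jid_b  # true: same class and data, regardless of key order
puts jid_a == jid_c  # false: a different class gives a different jid
```

The resulting string can be passed as the :jid option to queue.put, as in the example above.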

Setting Default Job Options

Qless::Queue#put accepts a number of job options (see above for their semantics):

jid
delay
priority
tags
retries
depends

When enqueueing the same kind of job with the same args in multiple places it's a pain to have to declare the job options every time. Instead, you can define default job options directly on the job class:

class MyJobClass
  def self.default_job_options(data)
    { :priority => 10, :delay => 100 }
  end
end

queue.put(MyJobClass, { :some => "data" }, :delay => 10)

Individual jobs can still specify options, so in this example, the job would be enqueued with a priority of 10 and a delay of 10.
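The override rule amounts to a hash merge in which explicitly passed options win over the class defaults. The sketch below reproduces the example's outcome in plain Ruby; effective_options is a hypothetical stand-in for whatever queue.put does internally, and SketchJobClass mirrors the class above.

```ruby
class SketchJobClass
  def self.default_job_options(data)
    { :priority => 10, :delay => 100 }
  end
end

# Hypothetical stand-in for the option resolution inside queue.put:
# start from the class defaults (if any), then apply the caller's options.
def effective_options(klass, data, opts = {})
  defaults = klass.respond_to?(:default_job_options) ? klass.default_job_options(data) : {}
  defaults.merge(opts)
end

options = effective_options(SketchJobClass, { :some => 'data' }, :delay => 10)
puts options[:priority]  # 10 (from the class default)
puts options[:delay]     # 10 (the explicit option wins over 100)
```

Because default_job_options receives the job data, defaults can also be computed per job, for example a jid derived from the data.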

Testing Jobs

When unit testing your jobs, you will probably want to avoid the overhead of round-tripping them through Redis. You can of course use a mock job object and pass it to your job class's perform method. Alternatively, if you want a real full-fledged Qless::Job instance without round-tripping it through Redis, use Qless::Job.build:

describe MyJobClass do
  let(:client) { Qless::Client.new }
  let(:job)    { Qless::Job.build(client, MyJobClass, :data => { "some" => "data" }) }

  it 'does something' do
    MyJobClass.perform(job)
    # make an assertion about what happened
  end
end

The options hash passed to Qless::Job.build supports all the same options a normal job supports. See the source for a full list.

Contributing

To bootstrap an environment, first make sure you have a Redis server available.

Have rvm or rbenv. Then to install the dependencies:

rbenv install                 # rbenv only.  Install bundler if you need it.
bundle install
./exe/install_phantomjs       # Bring in phantomjs 1.7.0 for tests.
rbenv rehash                  # rbenv only
git submodule init
git submodule update
bundle exec rake core:build

To run the tests:

bundle exec rake spec

The locally installed redis will be flushed before and after each test run.

To change the redis instance used in tests, put the connection information into ./spec/redis.config.yml.

To help develop the web UI, run bundle exec ./utils/dev/qless-web-dev to run the server with seed data.

To contribute, fork the repo, use feature branches, run the tests and open PRs.

Mailing List

For questions and general Qless discussion, please join the Qless Mailing list.

Release Notes

0.12.0

The metric failures provided by qless-stats has been replaced by failed for compatibility with users of graphite. See #275 for more details.


qless's Issues

Add support for displaying recurring jobs

With the advent of recurring jobs, we should be able to look at them in the UI

On Web UI I occasionally get InternalServerError w/ no error message or backtrace

I think it's just a matter of tweaking some sinatra settings on the qless app. Ideally I'd like the error to be raised by the app because I have middleware higher up that will catch it and display it the way I like. This might simply be a config issue in our app but figured I'd mention it here.

Tag Management

We need to have a way to both add and remove tags from a job. There is support for this in the client libraries, and it should be trivial to add to the server. It really just needs a way to do it in the UI.

I'm not terribly happy with the tags in Twitter bootstrap, but I'm looking for something that probably has a little close button on each tag (à la <span class="icon-remove"></span>), and then maybe a plus sign that expands out to add new tags? All the tagging I've done thus far has been using the client directly.

Web UI Tag searches don't show all jobs that match the tag

It appears to only show the first 25. If you have more jobs matching a tag (which we frequently do), there's no way to see all the jobs. It'd be nice to have a way to paginate through all the jobs.

Also, it'd be nice if they were sorted so that completed jobs came last. When searching on a tag, I'm most interested in failed or waiting jobs for that tag.

Remove client argument from Job.build

Currently, the signature for Job.build is:

def self.build(client, klass, attributes = {})
end

However, that client argument feels like it doesn't belong. On several occasions I've used Job.build and instinctively called it like so:

Qless::Job.build(MyJobClass, data: { "some" => "data" })

...which of course, doesn't work as I expect. I think that if you're using Job.build you don't want a real client, anyway, since the point is to unit test a job and not hit redis. So I think Job.build should change to not accept a real client, and simply pass a null implementation client into #new.

@proby / @dlecocq -- thoughts?

Sort Failures by Number of Jobs

In the overview, we should probably show jobs in descending order of the number of affected jobs

Indicate whether or not a queue is paused on the Qless UI

As recommended by @proby in #64, we should indicate on the Qless UI whether or not a queue is paused.

Provide a way to see which queues a worker is processing

The UI shows the current job a worker is processing (and what queue it is in) but it doesn't show all of the queues a worker is pulling jobs from.

Failure Grouping Too Long

When the failure group is too long, it can look funny when reporting the number of jobs affected.

Fix `depends` to work when a job is not in the `depends` state

Currently Qless::Job#depend(jid) only works if the job is in the depends state. This surprised me; as I see it, the primary use case of this method is to add dependencies to a job that has none, but it doesn't even work in this case.

Better Messaging for 'Cancel' Button

Some people think the 'cancel' button is misleading, and looks more like a 'pause' / 'stop.' Perhaps this should be rethunk.

Views missing in 0.9.1 gem?

Hey,

I'm just trying to get Qless running and it looks like the lib/qless/server folder is just missing in the 0.9.1 gem which is up on rubygems.org. Is that correct? Am I missing something or am I just stupid as hell?

Need to reconnect to redis after forking (in the child process)

The redis-rb 3.x gem demands it: https://github.com/redis/redis-rb/blob/master/lib/redis/client.rb#L276-L279

Subqueue Large Numbers of Jobs

Phil suggested that depending on the performance profile, it might be worth subqueueing large queues to avoid the penalty of maintaining a sorted set.

In light of some benchmarks, it seems to scale well (about a 10% performance degradation) up to millions of jobs in the queue at a single time. Memory is the limiting factor long before we hit this issue, but it's something that we might consider in the future.

Depends Tab for Tracked Jobs

Just realized that there's no 'depends' tab under 'track'

Blocking Pop

Phil thinks that this is important, depending on how many jobs we're expecting to come through, how often, etc. I'd like to follow up with him to make sure that this is necessary, though.

Allow heartbeat timeout to be set on a per-job basis

It should fall back to the queue's setting and then the global one.

Robust Retry Selection

Instead of just retrying all failures of a certain grouping, it might be really useful to be able to select a subset of these jobs that meet certain criteria. I'm not sure how useful it will be, though.

History Presentation

It might be worth looking into alternative ways of displaying the history portion. It's essentially a timeline, and I don't know if sort of tabular text is really the way to go. From a cursory glance, none of the JavaScript timeline packages out there really appeal to me.

That said, I really like the look of the network activity interface in Chrome's debug console -- would be neat to emulate that. It also might be pretty involved. I got some CSS working to look similar to it stylistically, but there would still be layout to overcome, tooltips, etc.

Allow web app's qless client to be set on a per-instance basis

Currently, you set the web app's qless client globally:

Qless::Server.client = Qless::Client.new

However, I'm thinking that we may want to have an array of redis servers (and corresponding qless workers); this will allow us to scale infinitely as long as we have a good sharding strategy. Once we're doing that, we're going to want to have multiple qless web apps mounted in the same rack process, each bound to a different qless client (each of which is bound to a different redis server). Something like:

map('/qless1') { run Qless::Server.new(client1) }
map('/qless2') { run Qless::Server.new(client2) }

Provide means to run a worker in the same process

The worker runs jobs in a forked child process. This is great for production, and probably a good idea to do in a true end-to-end acceptance test, but for some levels of testing this gets in the way and is annoying (e.g. if you're trying to use VCR or whatever).

It'd be nice to have a way to run a worker w/o it forking.

/cc @proby

Qless::Client#inspect should be cleaned up

It spits out a huge string, and on our staging server, it takes a long time to run the #inspect method. I haven't benchmarked it yet to see what is taking so long, but regardless, we should clean it up and simplify it.

Overview Should Include Tracked Categories

The overview view should probably show the number of jobs in each category for tracked jobs. Something like:

Tracked Jobs
=============
waiting | 4
complete | 10

Just so that users don't have to go 'track' every time they want to check up on the state of things, but only when there's something they know they can update.

Priority Changes

Like tags, need a way to edit jobs' priority in the UI

Rate Limiting

In a comment on our blog post, @drano mentioned that he'd like rate limiting based on job type. We'll see what interest is like.

Show when scheduled jobs are scheduled to be run.

It'd be super helpful and handy to show when a job is scheduled to be run whenever looking at a scheduled job.

Job Data Archival

Brandon suggested the possibility of archiving completed jobs to permanent storage. This might be nice, but I think it might be best to leave this as the responsibility of clients. It's also possible that there just be a consumer helper that archives to S3, disk, etc.

Display Number of Recurring Jobs

In the overview, we should display the number of recurring jobs

Host / Queue Throttling

It might be nice to limit the level of concurrency on a host or queue basis. This might include:

Constant limit of jobs running on a queue basis or a host basis
Constant limit of jobs of a certain type
Hosts gone awry and burning through jobs and failing them

Brandon had a really good suggestion that perhaps hosts should be throttled based on how many jobs that they drop or fail. The suggestion was that we could track statistics about how many jobs get dropped by all hosts and then mark a host as 'bad' when it is sufficiently faily.

The difficulty with this is that it can be hard to determine how many jobs have been dropped. Failures, on the other hand, could definitely be handled.

Roundtrip Job.build data through JSON

When a job's data hash is round-tripped through JSON (e.g. encoded when the job is enqueued, decoded when the job is popped), the hash can wind up being different from the original hash. For example, it's idiomatic to use symbols for identifiers in ruby, so the data hash might be { id: 'foo' }. However, when the job is run, the hash will be { "id" => "foo" } due to how JSON deserializes. Likewise, date/time objects will wind up as strings when round-tripping through JSON.

Since the purpose of Job.build is to get a job instance for unit testing without incurring the cost of going through redis, it would be good if it round-tripped the JSON arguments to mirror the production behavior.

Enqueuing a recurring job isn't fully idempotent (or appears not to be)

I've got code that runs at app boot time that enqueues a recurring job. It appears that when this runs the next time the app boots, the counter gets reset to 1 and there wind up being jid collisions.

Allow job classes to define default job options

It's nice to be able to set job options directly in the job class; that way, when your app enqueues that type of job from multiple places, each place does not have to pass the same job options to queue.put. Here's an API @dlecocq and I discussed:

class MyJob
  def self.perform(job)
  end

  def self.default_job_options(data)
    { jid: jid_for(data) }
  end
end

Essentially, a job class can define a default_job_options method. If it's defined, queue.put(MyJob...) will use it to get default job options. A single use of queue.put(MyJob...) can override default options by passing its own options.
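
The merge order described above could be sketched as follows. This is not qless code: effective_job_options is a hypothetical helper standing in for whatever queue.put would do internally, and the jid and priority values are illustrative only.

```ruby
# Hypothetical helper (not qless API): compute the options queue.put would use.
# Class-level defaults are applied first; per-call options win on conflict.
def effective_job_options(klass, data, overrides = {})
  defaults =
    if klass.respond_to?(:default_job_options)
      klass.default_job_options(data)
    else
      {}
    end
  defaults.merge(overrides)
end

class MyJob
  def self.perform(job); end

  # Illustrative defaults derived from the job data.
  def self.default_job_options(data)
    { jid: "my-job-#{data['id']}", priority: 10 }
  end
end

# A job class that does not define default_job_options.
class PlainJob
  def self.perform(job); end
end

effective_job_options(MyJob, { 'id' => 7 })                   # class defaults only
effective_job_options(MyJob, { 'id' => 7 }, { priority: 50 }) # call site overrides priority
effective_job_options(PlainJob, {}, { priority: 5 })          # no defaults defined
```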

Provide a way to see recurring jobs on the web UI

Currently there doesn't appear to be a way to view recurring jobs.

Better webapp docs for non-Ruby crowd

So given that the webapp is supposed to be language agnostic, it would be nice to have more complete docs on how to run it and set it up.

I'm using qless-py, for instance, and my Ruby knowledge is next to none. I did manage to get it running using the qless-web script in the exe folder, after realising that it needs Ruby 1.9.*

But now I need to specify my Redis connection as it's not running on the same machine and I'm at a loss here.

Also, cloning the repo just to run the app seems clumsy. I guess there's probably a gem that one can install, and if so, how does one get the webapp running with it (and specify the connection!)?

PS: perhaps a mailing list would be in order?

Retain instance history longer

I've run into a number of trouble tickets where it would have been most helpful to find some basic historical information on a given crawl in the qless status pages. Right now, crawl instance information seems to vanish the moment a more recent crawl exists. It would be nice if the metadata for old crawls lingered in the qless status pages for 6 weeks (i.e. the same length of time we retain crawler logs for).

The most important bit of information is which host a given crawl ran on, because I need that information to be able to locate the log files for the crawl.

qless-0.9.0 ruby gem doesn't include the lua files

Qless::Client.new results in an error about missing lua files. Further I can't find any lua files browsing the gem directory. There also doesn't seem to be a separate 'qless-core' package or similar on rubygems.org

What am I doing wrong here?

Provide a sidekiq-style worker as an alternative to the resque-style one

@jstorimer and I were talking about this on twitter. I haven't looked much at the sidekiq code yet but wanted to get a conversation rolling here.

Dropped Job Backoff

We came to the consensus that we should probably re-schedule jobs that get dropped on the floor to avoid transient failures. Phil suggested constant backoff since we're likely dealing with jobs on a larger granularity than sub-second.

Provide means to enumerate all jobs in a queue

Enumerating all the jobs in a single queue, regardless of their state, is harder than it needs to be. This is useful for things like moving all the jobs in a queue to new queues (if you are splitting out your queues or whatever). I got it to work with code like:

require 'set' # Set is used to de-duplicate jids below

types = [:running, :stalled, :scheduled, :depends, :recurring]
jids_for_queue = lambda do |queue|
  types.inject(Set.new) do |set, type|
    set | queue.jobs.send(type)
  end | queue.peek(25).map(&:jid)
end

queue = ShardCreation.qless.queues["some_queue_name"]
while (jids = jids_for_queue[queue]).any?
  jids.each do |jid|
    job = ShardCreation.qless.jobs[jid]
    # do something with the job
  end
end

That works but it'd be nice to be able to use something like:

queue.jobs.all.each do |job|
  # do something with the job
end

Ideally, jobs.all would return an Enumerator, which can lazily iterate over the jids from the various states and return them. There is a potential maintenance issue with code like this: it assumes knowledge of all the states. If a new state were added, we could forget to add it to the list here. So it might be worth adding some support to the qless-core scripts for this (unless there's a better way to go about this?).
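
A rough sketch of what such an Enumerator could look like on the client side. It assumes, as in the snippet above, that each state method can be called without arguments and returns a list of jids; all_jids and FakeQueueJobs are hypothetical names, and the state list is still hard-coded, so the maintenance concern remains.

```ruby
require 'set'

# Assumed list of states, copied from the snippet in this issue.
STATES = [:running, :stalled, :scheduled, :depends, :recurring].freeze

# Hypothetical helper: lazily yield each jid once across all states.
# Nothing is fetched until the enumerator is actually iterated.
def all_jids(queue_jobs, states = STATES)
  Enumerator.new do |yielder|
    seen = Set.new
    states.each do |state|
      queue_jobs.public_send(state).each do |jid|
        # Set#add? returns nil when the jid was already seen.
        yielder << jid if seen.add?(jid)
      end
    end
  end
end

# Stand-in for queue.jobs so the sketch runs without Redis.
FakeQueueJobs = Struct.new(:running, :stalled, :scheduled, :depends, :recurring)
fake = FakeQueueJobs.new(%w[a b], %w[b], %w[c], [], %w[d])

all_jids(fake).each do |jid|
  # look up the job by jid and do something with it
end
```

Waiting jobs are not covered by these state methods; the original snippet picks them up through queue.peek, so a real jobs.all would need to include them as well.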

Force workers to quit when a job's heartbeat times out

Currently when a job times out its heartbeat interval, the job fails, but a worker may keep trying to work on it.

It would be useful if we could cause the heartbeat timeout to kill the worker, so the worker is not stuck, continuing to work on the job.

Here's an idea of how we can do that:

Just before a worker process starts working on a job, have it spin up a thread that subscribes to a redis pub/sub channel named after the worker process.
When Qless times out the heartbeat interval of a job, have it publish a "stop working on this job" message in the worker's channel.
The worker's listening thread can use ruby's Thread#raise API to cause an exception to be raised in the main worker thread, effectively killing it.
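
The steps above can be sketched without Redis by letting a thread-safe Queue stand in for the pub/sub channel. Everything named here (JobTimedOut, run_with_timeout_listener, the :stop message) is hypothetical; only Thread#raise and Queue are standard Ruby.

```ruby
# Hypothetical error raised in the main worker thread on heartbeat timeout.
class JobTimedOut < StandardError; end

# Runs the given block (the job) while a listener thread waits for a
# "stop working" message. `messages` stands in for the worker's pub/sub
# channel; a real implementation would subscribe to Redis instead.
def run_with_timeout_listener(messages)
  main = Thread.current
  listener = Thread.new do
    # Queue#pop blocks until a message arrives.
    main.raise(JobTimedOut, 'heartbeat timed out') if messages.pop == :stop
  end
  yield
  :completed
rescue JobTimedOut
  :interrupted
ensure
  # Stop listening once the job has finished or been interrupted.
  listener.kill if listener
end

channel = Queue.new
channel << :stop # simulate Qless publishing the timeout message
run_with_timeout_listener(channel) { sleep 5 } # interrupted well before 5 seconds
```

One caveat with this approach: Thread#raise can interrupt the main thread at any point, including inside cleanup code, so the job would need to tolerate being stopped mid-operation.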

Thoughts from @proby, @dlecocq, @benkirzhner, @waltjones ?

Allow job classes to define retryable exceptions

Currently, the Ruby worker fails the job on any exception. However, there are certain types of errors (e.g. transient network failures) that we expect, and that we'd like it to automatically retry. Here's a suggested API for that:

class MyJob
  extend Qless::RetryExceptions(TimeoutError, SomeOtherError)

  def self.perform(data)
  end
end
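
One way the suggested API could work, as a sketch only: a module-level method builds an anonymous module that remembers the exception classes, and the worker asks the job class whether an error is retryable. QlessSketch, retryable_exceptions and retryable? are hypothetical names, and Timeout::Error from the standard library is used in place of TimeoutError.

```ruby
require 'timeout'

module QlessSketch
  # Hypothetical: returns a module that can be mixed in with `extend`,
  # giving the job class a list of exceptions worth retrying.
  def self.RetryExceptions(*classes)
    Module.new do
      define_method(:retryable_exceptions) { classes }
      define_method(:retryable?) do |error|
        classes.any? { |klass| error.is_a?(klass) }
      end
    end
  end
end

class SomeOtherError < StandardError; end

class MyJob
  extend QlessSketch::RetryExceptions(Timeout::Error, SomeOtherError)

  def self.perform(job); end
end

MyJob.retryable?(Timeout::Error.new) # the worker would retry this one
MyJob.retryable?(ArgumentError.new)  # the worker would fail the job
```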

Nanny Process

Certain pieces of data are amenable to being cleaned up as we go. Jobs, for instance. Whenever a job is completed, we can delete any jobs whose data is expired. But what of our list of workers? What about statistics? Should we have an admin operation that's along the lines of clean-stale-data?

That said, I'm not thrilled with the idea of abandoning the completely-client managed nature of qless.

Stable sorting when displaying workers

When displaying workers (not through the client API), we should use a stable sort.

Stale workers don't drop off the web ui

Currently when a worker is started it shows up in the web ui, and even if it gets stopped it'll stay there forever. It looks like ql:workers should be truncated or limited to recently seen workers. If you have a preferred method of implementing this, I can submit a patch.

Improve Job#move so that it can take new dependencies

It would be nice if Job#move allowed dependencies to be added.

Prefer raising errors to returning false when Qless commands cannot be completed

Currently, some qless-core commands silently fail and return false when they cannot be completed. For example, depends.

It would be nice if it would raise an error instead. As a caller, I expect that the command worked unless it gives me an error. I didn't realize it had situations where it would return false. Even if I checked the return value, it's not clear what false means...it tells me nothing why it couldn't complete or what I can do to fix that.

Generally, I prefer things to "fail loudly" and raise errors if they can't do what they've been asked. Sometimes this leads to rescuing exceptions for control flow (which is generally frowned upon) but I like that failing loudly forces the user to be aware of the problem and do something about it.

If you move in this direction, it'd be nice if there were specific error classes that were raised (rather than a generic LuaError (or whatever) with a detailed error message), as that would allow the caller flexibility to rescue specific errors.
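
To illustrate the kind of hierarchy meant here, a small sketch with made-up names (QlessErrorSketch, DependencyError, raise_on_false); none of these exist in qless. The wrapper converts the current return-false convention into a specific exception that callers can rescue.

```ruby
module QlessErrorSketch
  # Common base class so callers can rescue everything at once if they want.
  class Error < StandardError; end
  # Hypothetical specific error for a failed dependency update.
  class DependencyError < Error; end

  # Runs the block and raises the given error class if it returns false,
  # otherwise passes the block's result through unchanged.
  def self.raise_on_false(error_class, message)
    result = yield
    raise error_class, message if result == false
    result
  end
end

begin
  QlessErrorSketch.raise_on_false(QlessErrorSketch::DependencyError,
                                  'could not add dependency') do
    false # stands in for a command that silently failed
  end
rescue QlessErrorSketch::DependencyError => e
  warn "dependency update failed: #{e.message}"
end
```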

Update signal handling

Resque's signal handling has been updated to follow more standard unix conventions, and to allow for jobs to perform cleanup when killed:

http://hone.heroku.com/resque/2012/08/21/resque-signals.html

It'd be good to change Qless as well (since we pretty much copied the signal handling from resque).

Ideally, this should be done before a 1.0 release.
