mercadolibre / toiler

Toiler is an AWS SQS long-polling thread-based message processor.

License: GNU Lesser General Public License v3.0

Ruby 100.00%
polling sqs-message workers ruby sqs sqs-poller threaded

toiler's Introduction

Toiler

Toiler is an AWS SQS long-polling thread-based message processor. It's based on Shoryuken but takes a different approach to load balancing and uses long-polling.

Features

Concurrency

Toiler lets you specify the number of processors (threads) to spawn for each queue. Instead of Shoryuken's load-balancing approach, Toiler delegates this work to the kernel, which schedules the threads.
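As an illustration of the idea, here is a minimal sketch of a fixed pool of processor threads draining one queue. It uses only Ruby's core Queue and Thread classes. The name run_pool and the in-memory queue are assumptions made for this example, not Toiler's internals.

```ruby
# Hypothetical sketch: a fixed number of processor threads for one queue.
# `run_pool` is an illustrative name and is not part of Toiler.
def run_pool(messages, concurrency)
  inbox = Queue.new                 # thread-safe FIFO from Ruby core
  messages.each { |m| inbox << m }
  inbox.close                       # pop returns nil once the queue is drained

  results = Queue.new
  threads = Array.new(concurrency) do |index|
    Thread.new do
      # No user-level load balancer: whichever thread the scheduler
      # runs next picks up the next message.
      while (msg = inbox.pop)
        results << [index, msg.upcase]
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

processed = run_pool(%w[a b c d e f], 3)
puts processed.map(&:last).sort.inspect
```

Which thread handles which message is not deterministic, so the example sorts the results before printing them.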

Long-Polling

A Fetcher thread is spawned for each queue. Fetchers are responsible for polling SQS/PubSub and retrieving messages. They are optimised to never fetch more messages than the number of processors available for that queue. With long-polling, a fetcher waits a configurable amount of time for messages to become available on a single request, which avoids repeatedly requesting messages when there are none.
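The batch-size rule described above can be sketched as follows. FakeQueueClient and fetch_batch are hypothetical names invented for this example. The fake client copies the keyword arguments of Aws::SQS::Client#receive_message so the sketch runs without AWS credentials, but it returns a plain array rather than a response object.

```ruby
# Stand-in for an SQS client (an assumption, for illustration only).
class FakeQueueClient
  def initialize(messages)
    @messages = messages
  end

  # A real long poll would block for up to wait_time_seconds when the
  # queue is empty; this fake returns immediately.
  def receive_message(queue_url:, max_number_of_messages:, wait_time_seconds:)
    @messages.shift(max_number_of_messages)
  end
end

# Hypothetical helper: never request more messages than there are idle
# processors. SQS itself accepts at most 10 messages per request.
def fetch_batch(client, queue_url, free_processors, wait: 20)
  return [] if free_processors.zero?

  client.receive_message(
    queue_url: queue_url,
    max_number_of_messages: [free_processors, 10].min,
    wait_time_seconds: wait
  )
end

client = FakeQueueClient.new((1..12).map { |i| "msg-#{i}" })
puts fetch_batch(client, 'https://example.invalid/queue', 3).inspect
```

With three idle processors, the fetcher asks for three messages and leaves the rest on the queue for a later request.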

Message Parsing

Workers can configure a parser Class or Proc to parse a message body before being processed.
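A minimal sketch of how a configured parser might be applied to a message body is shown below. The parse_body dispatcher and the Message struct are assumptions for illustration; Toiler's own dispatch logic may differ. The last branch assumes the parser class exposes a MultiJson-style load method.

```ruby
require 'json'

# Hypothetical message type and dispatcher, not Toiler's implementation.
Message = Struct.new(:body)

def parse_body(message, parser)
  case parser
  when :json then JSON.parse(message.body)
  when Proc  then parser.call(message)      # lambdas receive the whole message
  when nil   then message.body              # no parser: pass the raw body through
  else parser.load(message.body)            # assumes a `load` class method
  end
end

msg = Message.new('{"id": 7}')
puts parse_body(msg, :json).inspect
puts parse_body(msg, ->(m) { m.body.length }).inspect
puts parse_body(msg, nil).inspect
```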

Deadline Extension

Toiler can automatically extend the ack deadline of messages so that a message does not re-enter the queue when processing it takes longer than the queue's ack deadline or visibility timeout.
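One way to picture deadline extension is a background thread that periodically renews the deadline while the worker is busy. The helper below is a hypothetical sketch, not Toiler's code. With SQS the callback would typically wrap a call such as change_message_visibility; here it only increments a counter.

```ruby
# Hypothetical helper: call `extend_deadline` every `interval` seconds
# while the block runs, and stop as soon as the block finishes or raises.
def with_deadline_extension(interval:, extend_deadline:)
  extender = Thread.new do
    loop do
      sleep interval
      extend_deadline.call
    end
  end
  yield
ensure
  extender&.kill
  extender&.join
end

extensions = 0
result = with_deadline_extension(interval: 0.05, extend_deadline: -> { extensions += 1 }) do
  sleep 0.3 # simulated slow processing
  :done
end
puts "result=#{result} extended=#{extensions >= 1}"
```

In practice the interval would be chosen comfortably below the queue's visibility timeout so that a renewal always lands before the deadline expires.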

Installation

Add this line to your application's Gemfile:

gem 'toiler'

And then execute:

$ bundle

Or install it yourself as:

$ gem install toiler

Usage

Worker class

class MyWorker
  include Toiler::Worker

  toiler_options queue: 'default', concurrency: 5, auto_delete: true
  toiler_options parser: :json

  # toiler_options parser: ->(sqs_msg){ REXML::Document.new(sqs_msg.body) }
  # toiler_options parser: MultiJson
  # toiler_options deadline_extension: true
  # toiler_options batch: true
  # toiler_options queue: 'subscription', concurrency: 5, auto_delete: true, provider: :gcp

  # Example connection client that should be shared across all instances of MyWorker
  @@client = ConnectionClient.new
    
  def initialize
    @last_message = nil
  end

  def perform(sqs_msg, body)
    # Workers are thread safe, yay!
    # Each worker instance is assured to be processing only one message at a time
    @last_message = sqs_msg
    puts body
  end
end

Configuration

aws:
  access_key_id:     ...             # or <%= ENV['AWS_ACCESS_KEY_ID'] %>
  secret_access_key: ...             # or <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
  region:            us-east-1       # or <%= ENV['AWS_REGION'] %>
gcp:
  project_id:  my-project            # or <%= ENV['GCP_PROJECT'] %>
  credentials: /path/to/keyfile.json # or <%= ENV['GCP_CREDENTIALS'] %>
wait: 20                             # The time in seconds to wait for messages during long-polling
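The `<%= ... %>` placeholders in the configuration suggest that the file is rendered through ERB before being parsed as YAML. The following sketch shows that two-step load using only Ruby's standard library. The load_config helper is a hypothetical name, and the loading order is an assumption rather than a description of Toiler's internals.

```ruby
require 'erb'
require 'yaml'

# Assumed strategy: render ERB first, then parse the result as YAML.
def load_config(text)
  YAML.safe_load(ERB.new(text).result)
end

ENV['AWS_REGION'] ||= 'us-east-1' # fallback so the example runs anywhere

config = load_config(<<~CONFIG)
  aws:
    region: <%= ENV['AWS_REGION'] %>
  wait: 20
CONFIG

puts config.inspect
```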

Rails Integration

You can tell Toiler to load your Rails application by passing the -R or --rails flag to the "toiler" command.

If you load Rails, and assuming your workers are located in the app/workers directory, they will be auto-loaded. This means you don't need to require them explicitly with -r.

Start Toiler

bundle exec toiler -r worker.rb -C toiler.yml

Other options:

toiler --help

    -d, --daemon                     Daemonize process
    -r, --require [PATH|DIR]         Location of the worker
    -q, --queue QUEUE1,QUEUE2,...    Queues to process
    -C, --config PATH                Path to YAML config file
    -R, --rails                      Load Rails
    -L, --logfile PATH               Path to writable logfile
    -P, --pidfile PATH               Path to pidfile
    -v, --verbose                    Print more verbose output
    -h, --help                       Show help

Credits

Sebastian Schepens for the creation of the project. But much of the credit goes to Pablo Cantero, creator of Shoryuken, and everybody who contributed to it.

Contributing

  1. Fork it ( https://github.com/sschepens/toiler/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

toiler's People

Contributors

andylaurito92, ddanielr, jclemson4, meli-websec-advanced-security[bot], sebisujar, sschepens

toiler's Issues

Whitelist or specify queues to process in configuration.yml

Shoryuken supports this, and we plan to add it and open a PR when done. We could have used this last week, but we just moved the job we needed to segregate back to DelayedJob for now. We hope to move all our jobs over to toiler and will want to partition them across different boxes based on kind of work and importance. I think this won't be hard to add, but thought I'd open an issue here first to document the need/convention. If you @sschepens or anyone else has a preferred interface, happy to consider it; otherwise we would probably plan to match Shoryuken's interface as a goal.

DB_POOL default value weirdness

It's Friday and we just had a weird bug in our new Kubernetes infrastructure that I want to capture here because it doesn't make sense to me. In the Rails console, database connections were working fine, but in toiler every database connection was timing out. Normally DB_POOL defaults to 5 in Rails if no value is set, but somehow in Toiler, without this set it was... I dunno, defaulting to 0 or something. Does that make any sense to you @sschepens? May investigate next week, but seems weird. Also FYI to @gtaschuk to discuss/look into together next week.

Test suite

@sschepens hey so I was looking at maybe starting on the config extension for #7 and thought I'd start by reviewing existing tests, maybe try and even start TDD with a new failing test (I'm not great at TDD but like to try now and then), but I'm not seeing any tests in the repo... any chance you just didn't commit them for some reason? If not, maybe I can adapt some of shoryuken's tests and get a suite running on Travis or something, seems like we'd like at least decent test coverage going forward to enable safe releases.

Concurrency/parallelism question/concern

Hey @sschepens ! We got our app using Toiler and it's chugging along nicely in our test environment and it looks to have solved a somewhat severe memory leak problem there! So hooray and thanks for help!

The processing characteristics look plenty fast, but very different from what we see on Shoryuken though, and while not necessarily bad, it wasn't obvious to me and I'm wondering if a) we need better docs, and/or b) if we need to/should add some additional config around the differences.

For context, our worker looks approx like this (split the worker out from creator because of memoize/ivar issue discussed previously to make one object per message), and we have 3 queues at present and concurrency of 25:

class CreationQueueWorker
  Queue = ENV.fetch('SQS_CREATE_QUEUE')
  include Toiler::Worker
  toiler_options queue: Queue,
                 auto_delete: true,
                 parser: :json

  def perform(_body, sqs_hash_message)
    Creator.perform(sqs_hash_message)
  end
end

The surprising thing to me, which I think I understand now but wanted to discuss, is that we only ever see one message in flight from the queue... this makes sense now I think about it because it's one thread per queue at a time, but a) am I right that there's no point having higher concurrency than your number of queues? and b) though this may be desirable where you want to guarantee messages are processed in order, in a lot of cases ordering is unimportant, and it would be nice to be able to specify a queue should have as many threads as are available working on it. I'm not sure how I'd do that now without creating multiple worker classes all naming the same queue or something.

Possibly specifying a pool_size option in worker config would make sense for this, with a default size of 1, for no parallelism per queue, but will use up to pool_size threads in parallel when available?

Do I understand correctly, and what do you think? Nice work though, so far I'm impressed with toiler!

ActiveJob interface

@sschepens we talked about this and I know it's on the roadmap, but I wanted to document it here with an issue. We may tackle this at work in a fork and PR it back, or let me know if you've already started. Toiler's working great for us in production now by the way, much better than Shoryuken was, lower memory and faster processing!
