Giter Site home page Giter Site logo

wsq's Introduction

Work Stealing Queue

This is an implementation of a Work Stealing Queue described in a series of blog articles by Stefan Reinalter at Molecular Matters, beginning with Job System 2.0.

Origin

In the blog article, Stefan uses a lock-free structure with memory barriers and compare-exchange primitives. This algorithm also uses these ideas, but does so in a slightly different way then described by Stefan. It dispenses with the memory barriers and uses compare-exchange with exchange primitives to schedule jobs using a Work Stealing Queue which is probably taught in many CS classes like CSE P 506 - Concurrency.

The version described by Stefan is a variation of a Chase Lev dequeue. There is an implementation of that work queue in Aamanieu's asyncplusplus in the default task scheduler. This is the header file for it: work_steal_queue.h.

Description of Algorithm

A Work Stealing Queue is a double ended queue that each thread/core maintains. The thread which owns the queue puts jobs at the bottom end of the queue and other threads to steal jobs from the top end, when they have nothing to do in their own queues.

+--------+ <- entries[ 0 ]
|  top   | <- stealers consume here: job = entries[ top++ ]
|        |
|   ||   |
|        |
|   vv   |
| bottom | <- owner pushes here:    entries[ bottom++ ] = job
|        |    owner consumes here:  job = entries[ --bottom ]
|        |
+--------+ <- entries[ MASK_JOBS ]

The parallelism obtained by this lock-free structure is fine grained. The test included is able to schedule synthetic jobs with approximately 100 ns (per thread) of overhead on my i9-7960X cpu. I surmise that most of the this is due to the cache coherency overhead of the x86 cmpxchg and xchg instructions along with push/pop mechanics of the queue.

I measure this 100 ns overhead by putting jobs into the queue with only one worker thread.

$ git clone https://injinj.github.com/WSQ
$ cd WSQ
$ g++ -Wall -Wextra -std=c++11 -O3 test_job.cpp -pthread
$ a.out -c 1
Sizeof Job Sys Ctx: 528
Sizeof Job Thread:  524408
Sizeof Job:         48
Sizeof Job Alloc:   65480
Number of threads:  1
Serial workload:    1000 iterations
Parallel workload:  10000 jobs
......................................................................
Workload  Serial Elapsed  Parallel Elapsed  Speedup
--------  --------------  ----------------  -------
     100          142 ns            192 ns     0.74  (- 50 / thr: 50)
     200          287 ns            335 ns     0.86  (- 48 / thr: 48)
     300          432 ns            474 ns     0.91  (- 42 / thr: 42)
     400          579 ns            605 ns     0.96  (- 26 / thr: 26)
^C
$ a.out -c 2                                                            
...
Workload  Serial Elapsed  Parallel Elapsed  Speedup
--------  --------------  ----------------  -------
     100          142 ns            106 ns     1.34
     200          289 ns            169 ns     1.71
     300          432 ns            237 ns     1.82
     400          579 ns            308 ns     1.88
^C
$ a.out -c 3
...
Workload  Serial Elapsed  Parallel Elapsed  Speedup
--------  --------------  ----------------  -------
     100          142 ns             87 ns     1.63
     200          287 ns            128 ns     2.24
     300          435 ns            174 ns     2.50
     400          577 ns            218 ns     2.65
^C
$ a.out -c 4
...
Workload  Serial Elapsed  Parallel Elapsed  Speedup
--------  --------------  ----------------  -------
     100          142 ns             70 ns     2.03
     200          289 ns             99 ns     2.92
     300          432 ns            133 ns     3.25
     400          580 ns            174 ns     3.33
^C

I also created a gnuplot script to graph the speedup of this synthetic workload. This graph is the result of running that.

$ gnuplot
Terminal type set to 'qt'
gnuplot> load "plot.gnuplot"
1-core
2-core
3-core
4-core
5-core
6-core
7-core
8-core
9-core
10-core
11-core
12-core
13-core
14-core
15-core
16-core

Job Stealing Queue

wsq's People

Contributors

injinj avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.