atemerev / skynet

Skynet 1M threads microbenchmark (MIT License)
skynet's Introduction

Skynet 1M concurrency microbenchmark

Creates an actor (goroutine, whatever), which spawns 10 new actors; each of those spawns 10 more, and so on, until one million actors exist at the final level. Each leaf actor then sends back its ordinal number (from 0 to 999999); the values are summed on the previous level and sent back upstream, until the total reaches the root actor. The answer should be 499999500000, i.e. 0 + 1 + ... + 999999 = 999999 × 1000000 / 2.

Results (on my shitty 12" MacBook from 2015, Core M, OS X):

Actors

  • Scala/Akka: 6379 ms.
  • Erlang (non-HIPE): 4414 ms.
  • Erlang (HIPE): 3999 ms.

Coroutines / fibers / channels

  • Haskell (GHC 7.10.3): 6181 ms.
  • Go: 979 ms.
  • Quasar fibers and channels (Java 8): TODO

Futures / promises

  • .NET Core: 650 ms.
  • RxJava: 219 ms.

Results (i7-4770, Win8.1):

Actors

  • Scala/Akka: 4419 ms
  • Erlang (non-HIPE): 1700 ms.

Coroutines / fibers / channels

  • Haskell (GHC 7.10.3): 2820 ms.
  • Go: 629 ms.
  • F# MailboxProcessor: 756ms. (should be faster?..)
  • Quasar fibers and channels (Java 8): TODO

Futures / promises

  • .NET Core: Async (8 threads) 290 ms
  • Node-bluebird (Promise) 285ms / 195ms (after warmup)
  • .NET Full (TPL): 118 ms.

Results (i7-4771, Ubuntu 15.10):

  • Scala/Akka: 1700-2700 ms
  • Haskell (GHC 7.10.3): 41-44 ms
  • Erlang (non-HIPE): 700-1100 ms
  • Erlang (HIPE): 2100-3500 ms
  • Go: 200-224 ms
  • LuaJit: 297 ms

How to run

Scala/Akka

Install latest Scala and SBT.

Go to scala/, then run sbt compile run.

Java/Quasar

Install the Java 8 SDK.

Go to java-quasar/, then run ./gradlew.

Go

Install latest Go compiler/runtime.

In go/, run go run skynet.go.

Pony

Install latest Pony compiler.

In pony/, run ponyc -b skynet && ./skynet.

Erlang

Install latest Erlang runtime.

In erlang/, run erl +P 2000000 (to raise the process limit), then compile:

  • For non-HIPE: c(skynet).
  • For HIPE (if supported on your system): hipe:c(skynet).

Then, run:

skynet:skynet(1000000,10).

.NET Core

Install the latest version of .NET Core.

In dnx/, run dotnet restore (first time only), then dotnet run --configuration Release.

Haskell

Install Stack

In haskell/, run stack build && stack exec skynet +RTS -N

Node (bluebird)

Install node.js

In node-bluebird/, run npm install, then node skynet.

FSharp

Install FSharp Interactive

Run fsi skynet.fsx, or run fsi and paste the code in (it runs faster this way).

Crystal

Install the latest version of Crystal.

In crystal/, run crystal build skynet.cr --release && ./skynet.

.NET/TPL

Build the solution with VS2015. Windows only :(


Java

Install the Java 8 SDK.

Go to java/, then run ./gradlew :run.

Rust (with coroutine-rs)

cd ./rust-coroutine
cargo build --release
cargo run --release

LuaJIT

Install luajit

Run luajit luajit/skynet.lua

Scheme/Guile Fibers

Install Guile, Guile fibers, and wisp; for example via guix package -i guile guile-fibers guile-wisp.

Go to guile-fibers/, then run ./skynet.scm.

skynet's People

Contributors

aotenko, asterite, atemerev, benaadams, bitemyapp, christopherking42, d-led, kirushik, mironor, ociule, peterbourgon, rkuhn, roryrjb, sergey-scherbina, snapcracklepopgone, spion, theangrybyrd, tonysimpson, tpetricek, witeman, wizzard0


skynet's Issues

.NET Native version

Would be great to get .NET Native results. The .NET Core version should work on a UWP project that's compiled in Release mode with .NET Native.

Is .NET TPL version synchronous?

I looked at the code, found it a bit weird and went on to debug it.

It seems we have one task running in parallel with the Main task and doing, if I understand it correctly, only an asynchronous reduce (aggregate) over all the synchronous, recursive results.

Can anyone help me understand this?

I've created a stackoverflow question hoping for more attention.

Quasar version

It would be interesting to see a version of this benchmark implemented in Java/Scala/Kotlin using Quasar.

Not totally fair

I think the test may have some issues:

  • Scala uses the Java VM, which is well known to need time to "warm up" before the JIT compiler kicks in. Some info: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java and http://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html
  • Scala, Erlang and Go actors have truly different features, like networking and even monitoring support. And Java objects will always be slower to instantiate, no matter what.
  • You are mixing instantiation metrics with execution metrics, so you are targeting a very specific use case (an application with very short-lived calls, each performing a very short computation). In real-world scenarios, actors are more resilient and live through the user session lifecycle. At that point, inter-actor communication matters more than anything else, and all these languages take different approaches to the problem.

Good job anyway. Regards, Ricardo Mello

License?

A License file should probably be added. The Haskell folder already has its own, somehow.

Interesting Haskell Results

Haskell went from being one of the slowest (on Macbook) to the fastest in the Linux test case. Any indication of what's causing this relative slowdown? Memory/CPU? Perhaps even OS?

Tuning the Erlang VM

Simple standard tunings of the Erlang VM cuts processing time in half here on my Linux Core i7 machine:

erl +P 2000000 +sbt db +sbwt very_long +swt very_low +sub true +Mulmbcs 32767 +Mumbcgs 1 +Musmbcs 2047

This makes the system lock the schedulers to physical cores so your kernel threads are not jumping around like mad. It also tells the VM to allocate memory carriers in blocks of 32 megabytes and to use 2048-kilobyte blocks inside them. This means you get superpages in the TLB and far fewer TLB misses in the long run :)

Scala implementation with Futures

Here's the benchmark implemented using Scala Future

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object Root extends App {
  def skynet(num: Int, size: Int, div: Int): Future[Long] = {
    if(size == 1)
      Future.successful(num.toLong)
    else {
      Future.sequence((0 until div).map(n => skynet(num + n*(size/div), size/div, div))).map(_.sum)
    }
  }
  val startTime = System.currentTimeMillis()
  val x = Await.result(skynet(0, 1000000, 10), 10.seconds)
  val diffMs = System.currentTimeMillis() - startTime
  println(s"Result: $x in $diffMs ms.")
}

Runs in 250ms on my i7 laptop. For comparison the Akka version takes about 8200ms and Go takes about 750ms.

Alternative Go version

Just in case you're curious: this one avoids using channels for communication and uses sync.WaitGroup instead. Results are collected via arrays and then aggregated inside a single goroutine. On my machine it's slightly faster than the channel-based one.

package main

import "fmt"
import "time"
import "sync"

func skynet(out *int, num int, size int, div int, wg *sync.WaitGroup) {
    if size == 1 {
        *out = num
        wg.Done()
        return
    }

    var wgInner sync.WaitGroup
    wgInner.Add(div)
    results := make([]int, div)
    for i := 0; i < div; i++ {
        subNum := num + i*(size/div)
        go skynet(&results[i], subNum, size/div, div, &wgInner)
    }
    wgInner.Wait()
    var sum int
    for _, r := range results {
        sum += r
    }
    *out = sum
    wg.Done()
}

func main() {
    start := time.Now()
    var wg sync.WaitGroup
    wg.Add(1)
    var result int
    go skynet(&result, 0, 1000000, 10, &wg)
    wg.Wait()
    took := time.Since(start)
    fmt.Printf("Result: %d in %d ms.\n", result, took.Nanoseconds()/1e6)
}

Crystal version doesn't compile with 1.4.1

As the subject says, the Crystal version doesn't compile with version 1.4.1; something has changed.
This is how it should be:

def skynet(c, num, size, div)
  if size == 1
    c.send num
  else
    rc = Channel(Int64 | Float64).new
    sum = 0_i64
    div.times do |i|
      sub_num = num + i*(size/div)
      spawn skynet(rc, sub_num, size/div, div)
    end
    div.times do
      sum += rc.receive
    end
    c.send sum
  end
end

c = Channel(Int64 | Float64).new
start_time = Time.local
spawn skynet(c, 0_i64, 1_000_000, 10)
result = c.receive
end_time = Time.local
puts "Result: #{result} in #{(end_time - start_time).total_milliseconds.to_i}ms."

Note for Linux users: the number of allowed memory mappings has to be raised with

sysctl -w vm.max_map_count=10000000

What, if anything, have you done to thwart CPU frequency scaling?

You need to make sure your CPU is running at full clock speed when starting the test. Otherwise, part of the test is a scaling of the CPU frequency, which isn't fair.

If you run the test 20 times for each application and measure all the results, what is the variance of the measurements? If you use something like Haskell's criterion or my own eministat to analyze the results, are they stable against outlier variance?

Intel TBB version?

It would be interesting to see performance of Intel TBB with your microbenchmark.

List language/runtime versions

Benchmarks can be highly dependent on which version was used to run the test. For example, was Node v4 or v5 used to generate the results?

comparison doesn't make any sense

.NET SYNC is not concurrent at all. This would be fast in all languages.

.NET ASYNC does NOT have a million actors alive at any one time. Other implementations probably don't either, because the task is so easy that actors complete faster than they can be spawned.

Sorry, but you should state clearly what you want to compare and then create a fitting benchmark.
Currently, I fear the only thing you've shown is that the underlying implementations are very different, not whether they are better or worse at anything but this particular program...

Haskell Stack swallows RTS flags

I ran into an issue when I tried to get stack to generate .prof files with the +RTS flags. It seems that stack/GHC on Windows swallows the +RTS flags meant for the application! This bug means the Windows runs never even got the -N flag and weren't parallelized at all. One workaround is to run the benchmarks against the executable directly, without calling stack (somewhere in .stack-work\install\xxxxxxx\bin), until this is resolved.

Use monotonic_time for Erlang

This is really a minor nitpick, but system_time is under the influence of NTP and shouldn't be used as a difference clock source. Use erlang:monotonic_time(native) and then convert the difference to milliseconds. This is time-warp safe.

It probably will not hit your benchmark in any way, but it is important to get right in larger production systems.

Update results in README.md to reflect current code

As explained in the merged Akka implementation PR #13, the timings are very different from what was previously in the readme; to avoid further confusing readers, it would be nice to update the README with the current numbers.

Thanks in advance!

Prolog version!

Out of interest, I have included my own implementation in Prolog. This is written in my own Prolog (https://github.com/infradig/yxtrang), which implements Erlang-style processes and message-passing. It is quite slow, as I haven't done much optimisation in this area (in fact, I just got it working well enough to run the test at all).

:-module(skynet).
:-export([start/2]).

start(Size,Div) :-
    spawn(skynet(0,Size,Div)),
    recv(Tot),
    writeln('###=> ',Tot).

skynet(Num,1,_Div) :- send(Num).

skynet(Num,Size,Div) :-
    NewSize is Size div Div,
    between(1,Div,Idx),
    NewNum is Num+((Idx-1)*NewSize),
    spawn(skynet(NewNum,NewSize,Div)),
    fail.

skynet(_Num,_Size,Div) :- process_sum(0,Div).

process_sum(Tot,0) :- send(parent,Tot).

process_sum(Tot,Idx) :-
    recv(N),
    NewTot is Tot+N,
    NewIdx is Idx-1,
    process_sum(NewTot,NewIdx).

It would be trivially easy to distribute the skynet processes over multiple nodes, as send/recv work transparently over networks as well.

Can't compile `rust-coroutine` with Rust 1.8.0

With Rust 1.8.0 installed by Linuxbrew on Linux Mint 17.4 Rosa x64 (based on Trusty), when running cargo build --release:

/home/fabio/.cargo/registry/src/github.com-88ac128001ac3a9a/fs2-0.2.2/src/unix.rs:63:67: 63:79 error: mismatched types:
 expected `i64`,
    found `u64` [E0308]
/home/fabio/.cargo/registry/src/github.com-88ac128001ac3a9a/fs2-0.2.2/src/unix.rs:63     let ret = unsafe { libc::posix_fallocate(file.as_raw_fd(), 0, len as off_t) };
                                                                                                                                                       ^~~~~~~~~~~~

Sorry, not knowledgeable enough in Rust to open a PR but wanted to give it a try nevertheless.

rust-jobsteal results

There is rphmeier/skynet-jobsteal, an implementation of this kata which is much better and faster than my own (~5× faster on my laptop).

It would be very cool to see those results in the repo's readme.
