atemerev / skynet

Skynet 1M threads microbenchmark (MIT License)
skynet's Introduction

Skynet 1M concurrency microbenchmark

Creates an actor (goroutine, whatever), which spawns 10 new actors; each of those spawns 10 more, and so on, until one million actors exist at the final level. Each leaf actor then sends back its ordinal number (from 0 to 999999); the values are summed on the previous level and sent back upstream, until the total reaches the root actor. The answer should be 499999500000, i.e. 0 + 1 + ... + 999999 = 999999 × 1000000 / 2.

Results (on my shitty 12" MacBook from 2015, Core M, OS X):

Actors

  • Scala/Akka: 6379 ms.
  • Erlang (non-HIPE): 4414 ms.
  • Erlang (HIPE): 3999 ms.

Coroutines / fibers / channels

  • Haskell (GHC 7.10.3): 6181 ms.
  • Go: 979 ms.
  • Quasar fibers and channels (Java 8): TODO

Futures / promises

  • .NET Core: 650 ms.
  • RxJava: 219 ms.

Results (i7-4770, Win8.1):

Actors

  • Scala/Akka: 4419 ms
  • Erlang (non-HIPE): 1700 ms.

Coroutines / fibers / channels

  • Haskell (GHC 7.10.3): 2820 ms.
  • Go: 629 ms.
  • F# MailboxProcessor: 756ms. (should be faster?..)
  • Quasar fibers and channels (Java 8): TODO

Futures / promises

  • .NET Core: Async (8 threads) 290 ms
  • Node-bluebird (Promise) 285ms / 195ms (after warmup)
  • .NET Full (TPL): 118 ms.

Results (i7-4771, Ubuntu 15.10):

  • Scala/Akka: 1700-2700 ms
  • Haskell (GHC 7.10.3): 41-44 ms
  • Erlang (non-HIPE): 700-1100 ms
  • Erlang (HIPE): 2100-3500 ms
  • Go: 200-224 ms
  • LuaJit: 297 ms

How to run

Scala/Akka

Install latest Scala and SBT.

Go to scala/, then run sbt compile run.

Java/Quasar

Install the Java 8 SDK.

Go to java-quasar/, then run ./gradlew.

Go

Install latest Go compiler/runtime.

In go/, run go run skynet.go.

Pony

Install latest Pony compiler.

In pony/, run ponyc -b skynet && ./skynet.

Erlang

Install latest Erlang runtime.

In erlang/, run erl +P 2000000 (to raise the process limit), then compile:

  • For non-HIPE: c(skynet).
  • For HIPE (if supported on your system): hipe:c(skynet).

Then, run:

skynet:skynet(1000000,10).

.NET Core

Install the latest version of .NET Core.

In dnx/, run dotnet restore (first time only), then dotnet run --configuration Release.

Haskell

Install Stack

In haskell/, run stack build && stack exec skynet +RTS -N

Node (bluebird)

Install node.js

In node-bluebird/, run npm install, then node skynet.

FSharp

Install FSharp Interactive

Run fsi skynet.fsx, or run fsi and paste the code in (it runs faster this way).

Crystal

Install the latest version of Crystal.

In crystal/, run crystal build skynet.cr --release && ./skynet.

.NET/TPL

Build the solution with VS2015. Windows only :(


Java

Install the Java 8 SDK.

Go to java/, then run ./gradlew :run.

Rust (with coroutine-rs)

cd ./rust-coroutine
cargo build --release
cargo run --release

LuaJIT

Install luajit

Run luajit luajit/skynet.lua

Scheme/Guile Fibers

Install Guile, Guile fibers, and wisp; for example via guix package -i guile guile-fibers guile-wisp.

Go to guile-fibers/, then run ./skynet.scm.

skynet's People

Contributors

aotenko, asterite, atemerev, benaadams, bitemyapp, christopherking42, d-led, kirushik, mironor, ociule, peterbourgon, rkuhn, roryrjb, sergey-scherbina, snapcracklepopgone, spion, theangrybyrd, tonysimpson, tpetricek, witeman, wizzard0


skynet's Issues

.NET Native version

Would be great to get .NET Native results. The .NET Core version should work on a UWP project that's compiled in Release mode with .NET Native.

Is .NET TPL version synchronous?

I looked at the code, found it a bit weird and went on to debug it.

It seems we have one task running in parallel with the Main task and doing, if I understand it correctly, only an asynchronous reduce (aggregate) over all the synchronous, recursive results.

Can anyone help me understand this?

I've created a stackoverflow question hoping for more attention.

Quasar version

It would be interesting to see a version of this benchmark implemented in Java/Scala/Kotlin using Quasar.

Not totally fair

I think the test may have some issues:

  • Scala uses the Java VM, which is well known to need time to "warm up" before the JIT compiler kicks in. Some info: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java and http://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html
  • Scala, Erlang and Go actors have truly different features, like networking and even monitoring support. And Java objects will always be slower to instantiate, no matter what.
  • You are mixing instantiation metrics with execution metrics, so you are targeting a very specific use case (an application with very short-lived calls, each performing a very short computation). In real-world scenarios, actors are more resilient and live through the user session lifecycle. At that point, inter-actor communication matters more than anything else, and all these languages take different approaches to the problem.

Good job anyway. Regards, Ricardo Mello

License?

A License file should probably be added. The Haskell folder already has its own, somehow.

Interesting Haskell Results

Haskell went from being one of the slowest (on Macbook) to the fastest in the Linux test case. Any indication of what's causing this relative slowdown? Memory/CPU? Perhaps even OS?

Tuning the Erlang VM

Simple standard tunings of the Erlang VM cuts processing time in half here on my Linux Core i7 machine:

erl +P 2000000 +sbt db +sbwt very_long +swt very_low +sub true +Mulmbcs 32767 +Mumbcgs 1 +Musmbcs 2047

This makes the system lock the schedulers to physical cores so your kernel threads are not jumping around like mad. It also tells the VM to allocate memory carriers in blocks of 32 megabytes and to use 2048-kilobyte blocks inside them. This means you get superpages in the TLB and far fewer TLB misses in the long run :)

Scala implementation with Futures

Here's the benchmark implemented using Scala Future

import scala.concurrent._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object Root extends App {
  def skynet(num: Int, size: Int, div: Int): Future[Long] = {
    if(size == 1)
      Future.successful(num.toLong)
    else {
      Future.sequence((0 until div).map(n => skynet(num + n*(size/div), size/div, div))).map(_.sum)
    }
  }
  val startTime = System.currentTimeMillis()
  val x = Await.result(skynet(0, 1000000, 10), 10.seconds)
  val diffMs = System.currentTimeMillis() - startTime
  println(s"Result: $x in $diffMs ms.")
}

Runs in 250ms on my i7 laptop. For comparison the Akka version takes about 8200ms and Go takes about 750ms.

Alternative Go version

Just in case you're curious: this one avoids using channels for communication and uses sync.WaitGroup instead. Results are collected via arrays and then aggregated inside a single goroutine. On my machine it's slightly faster than the channel-based one.

package main

import "fmt"
import "time"
import "sync"

func skynet(out *int, num int, size int, div int, wg *sync.WaitGroup) {
    if size == 1 {
        *out = num
        wg.Done()
        return
    }

    var wgInner sync.WaitGroup
    wgInner.Add(div)
    results := make([]int, div)
    for i := 0; i < div; i++ {
        subNum := num + i*(size/div)
        go skynet(&results[i], subNum, size/div, div, &wgInner)
    }
    wgInner.Wait()
    var sum int
    for _, r := range results {
        sum += r
    }
    *out = sum
    wg.Done()
}

func main() {
    start := time.Now()
    var wg sync.WaitGroup
    wg.Add(1)
    var result int
    go skynet(&result, 0, 1000000, 10, &wg)
    wg.Wait()
    took := time.Since(start)
    fmt.Printf("Result: %d in %d ms.\n", result, took.Nanoseconds()/1e6)
}

Crystal version doesn't compile with 1.4.1

As the subject says, the Crystal version doesn't compile with version 1.4.1; something has changed.
This is how it should be:

def skynet(c, num, size, div)
  if size == 1
    c.send num
  else
    rc = Channel(Int64 | Float64).new
    sum = 0_i64
    div.times do |i|
      sub_num = num + i*(size/div)
      spawn skynet(rc, sub_num, size/div, div)
    end
    div.times do
      sum += rc.receive
    end
    c.send sum
  end
end

c = Channel(Int64 | Float64).new
start_time = Time.local
spawn skynet(c, 0_i64, 1_000_000, 10)
result = c.receive
end_time = Time.local
puts "Result: #{result} in #{(end_time - start_time).total_milliseconds.to_i}ms."

Note for Linux users: the number of allowed memory mappings has to be raised with

sysctl -w vm.max_map_count=10000000

What, if anything, have you done to thwart CPU frequency scaling?

You need to make sure your CPU is running at full clock speed when starting the test. Otherwise, part of the test is a scaling of the CPU frequency, which isn't fair.

If you run the test 20 times for each application and measure all the results, what is the variance of the measurements? If you use something like Haskell's criterion or my own eministat to analyze the results, are they stable against outlier variance?

Intel TBB version?

It would be interesting to see performance of Intel TBB with your microbenchmark.

List language/runtime versions

Benchmarks can be highly dependent on which version was used to run the test. For example, was Node v4 or v5 used to generate the results?

comparison doesn't make any sense

.NET SYNC is not concurrent at all. This would be fast in all languages.

.NET ASYNC does NOT have a million actors alive at any one time. Other implementations probably don't either, because the task is so easy that actors complete faster than they can be spawned.

Sorry, but you should state clearly what you want to compare and then create a fitting benchmark.
Currently, I fear the only thing you've shown is that the underlying implementations are very different, not whether they are better or worse at anything but this particular program...

Haskell Stack swallows RTS flags

I ran into an issue when I tried to get stack to generate .prof files with the +RTS flags. It seems that stack/GHC on Windows swallows the +RTS flags meant for the application! This bug means the Windows runs never even got the -N flag and weren't parallelized at all. One workaround is to run the benchmarks against the executable directly, without calling stack (somewhere in .stack-work\install\xxxxxxx\bin), until this is resolved.

Use monotonic_time for Erlang

This is really a minor nitpick, but system_time is under the influence of NTP and shouldn't be used as a difference clock source. Use erlang:monotonic_time(native) and then convert the difference to milliseconds. This is time-warp safe.

It probably will not hit your benchmark in any way, but it is important to get right in larger production systems.

Update results in README.md to reflect current code

As explained in the merged Akka implementation PR #13, the timings are very different from what was previously in the readme; to avoid further confusing readers, it would be nice to update the README with the current numbers.

Thanks in advance!

Prolog version!

Out of interest, I have included my own implementation in Prolog. This is written in my own Prolog (https://github.com/infradig/yxtrang), which implements Erlang-style processes and message-passing. It is quite slow, as I haven't done much optimisation in this area (in fact, I just got it working well enough to run the test at all).

:-module(skynet).
:-export([start/2]).

start(Size,Div) :-
    spawn(skynet(0,Size,Div)),
    recv(Tot),
    writeln('###=> ',Tot).

skynet(Num,1,_Div) :- send(Num).

skynet(Num,Size,Div) :-
    NewSize is Size div Div,
    between(1,Div,Idx),
    NewNum is Num+((Idx-1)*NewSize),
    spawn(skynet(NewNum,NewSize,Div)),
    fail.

skynet(_Num,_Size,Div) :- process_sum(0,Div).

process_sum(Tot,0) :- send(parent,Tot).

process_sum(Tot,Idx) :-
    recv(N),
    NewTot is Tot+N,
    NewIdx is Idx-1,
    process_sum(NewTot,NewIdx).

It would be trivially easy to distribute the skynet processes over multiple nodes, as send/recv work transparently over networks as well.

Can't compile `rust-coroutine` with Rust 1.8.0

With Rust 1.8.0 installed by Linuxbrew on Linux Mint 17.4 Rosa x64 (based on Trusty), when running cargo build --release:

/home/fabio/.cargo/registry/src/github.com-88ac128001ac3a9a/fs2-0.2.2/src/unix.rs:63:67: 63:79 error: mismatched types:
 expected `i64`,
    found `u64` [E0308]
/home/fabio/.cargo/registry/src/github.com-88ac128001ac3a9a/fs2-0.2.2/src/unix.rs:63     let ret = unsafe { libc::posix_fallocate(file.as_raw_fd(), 0, len as off_t) };
                                                                                                                                                       ^~~~~~~~~~~~

Sorry, not knowledgeable enough in Rust to open a PR but wanted to give it a try nevertheless.

rust-jobsteal results

There is rphmeier/skynet-jobsteal, an implementation of this kata which is much better and faster than my own (~5× faster on my laptop).

It would be very cool to see those results in the repo's readme.
