
Chairmarks.jl's Introduction

Chairmarks


Chairmarks measures performance hundreds of times faster than BenchmarkTools without compromising on accuracy.

Installation

julia> import Pkg; Pkg.add("Chairmarks")

Usage

julia> using Chairmarks

julia> @b rand(1000) # How long does it take to generate a random array of length 1000?
720.214 ns (3 allocs: 7.875 KiB)

julia> @b rand(1000) hash # How long does it take to hash that array?
1.689 μs

julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise?
172.970 ns (3 allocs: 7.875 KiB)

Why Chairmarks?

Tutorial

API Reference

Chairmarks.jl's People

Contributors

etiennedeg, lilithhafner, phlaster, samuelbadr, simonp0420, zentrik


Chairmarks.jl's Issues

Tuning is not robust to highly variable runtimes

julia> f(::Int) = nothing
f (generic function with 1 method)

julia> f(::Float64) = sleep(.01)
f (generic function with 2 methods)

julia> using StableRNGs

julia> rng = StableRNG(0)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000001)

julia> @be rand(rng, (1, 2.0)) f
[ Info: Loading Chairmarks ...
Benchmark: 17 samples with 1 evaluation
min    0 ns
median 42.000 ns
mean   5.206 ms (1.88 allocs: 52.706 bytes)
max    11.079 ms (4 allocs: 112 bytes)

julia> rng = StableRNG(1)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000003)

julia> @be rand(rng, (1, 2.0)) f
[hangs for 5 minutes]

A less reliably reproducing variant was originally reported by @mbauman here.

Proposed fix:

When reporting final results (or maybe halfway through the runtime budget), check whether evals is actually reasonable. If not, rerun, or warn that auto-tuning failed and prompt the user to manually tune the benchmark.

When choosing a high number of evals, increase the number of evals run by at most a factor of 10x at a time and make each of those trials a new sample (with new setup & teardown).

This will not cover the @be rand() < .01 if _ sleep(10) end case, but that case is nearly impossible to cover, and this will cover all reasonable cases (I hope).
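A hedged workaround sketch in the meantime: pinning evals to 1 sidesteps auto-tuning entirely, so a single slow evaluation cannot inflate the eval count (the samples and seconds values below are arbitrary, not a recommendation from the package):

julia> @be rand(rng, (1, 2.0)) f evals=1 samples=100 seconds=5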

How to handle functions which run slowly the first few thousand times and fast on subsequent runs?

In a fresh REPL session:

foo(::Type{<:Array}) = nothing
T = Vector{Int};
g() = foo(T)
using Chairmarks
@b g  # timing always larger than below
@b g  # timing always smaller than above

Example:

julia> foo(::Type{<:Array}) = nothing
foo (generic function with 1 method)

julia> T = Vector{Int};

julia> g() = foo(T)
g (generic function with 1 method)

julia> using Chairmarks

julia> @b g
333.609 ns

julia> @b g
317.712 ns

julia> @b g
310.692 ns

julia> @b g
314.212 ns
(@v1.12) pkg> st Chairmarks
Status `~/.julia/environments/v1.12/Project.toml`
  [0ca39b1e] Chairmarks v1.2.1

`seconds=Inf` does not work

Is it possible to have unlimited seconds?

julia> using Chairmarks

julia> @be exp(rand(10, 10)) evals=1 samples=10 seconds=Inf
ERROR: InexactError: trunc(UInt64, Inf)
Stacktrace:
 [1] trunc
   @ ./float.jl:881 [inlined]
 [2] round(::Type{UInt64}, x::Float64)
   @ Base ./float.jl:385
 [3] benchmark(init::Any, setup::Any, f::Any, teardown::Any; evals::Union{…}, samples::Union{…}, seconds::Union{…}, gc::Bool, checksum::Bool, _map::Any, _reduction::Any)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:102
 [4] benchmark
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:20 [inlined]
 [5] benchmark (repeats 2 times)
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:14 [inlined]
 [6] #benchmark#5
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:13 [inlined]
 [7] top-level scope
   @ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.
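Until Inf is handled, a hedged workaround is to pass a very large finite budget; the explicit evals and samples limits then end the run long before the time budget matters:

julia> @be exp(rand(10, 10)) evals=1 samples=10 seconds=1e6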

use `Base.donotdelete`?

I saw the checksum description from https://chairmarks.lilithhafner.com/v1.1.0/why#Truthful; that sounds a lot like

help?> Base.donotdelete
  Base.donotdelete(args...)

  This function prevents dead-code elimination (DCE) of itself and any arguments passed to it, but
  is otherwise the lightest barrier possible. In particular, it is not a GC safepoint, does model
  an observable heap effect, does not expand to any code itself and may be re-ordered with respect
  to other side effects (though the total number of executions may not change).

  A useful model for this function is that it hashes all memory reachable from args and escapes
  this information through some observable side-channel that does not otherwise impact program
  behavior. Of course that's just a model. The function does nothing and returns nothing.

  This is intended for use in benchmarks that want to guarantee that args are actually computed.
  (Otherwise DCE may see that the result of the benchmark is unused and delete the entire benchmark
  code).

  │ Note
  │
  │  donotdelete does not affect constant folding. For example, in donotdelete(1+1), no add
  │  instruction needs to be executed at runtime and the code is semantically equivalent to
  │  donotdelete(2).

  Examples
  ≡≡≡≡≡≡≡≡

  function loop()
      for i = 1:1000
          # The compiler must guarantee that there are 1000 program points (in the correct
          # order) at which the value of `i` is in a register, but has otherwise
          # total control over the program.
          donotdelete(i)
      end
  end
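For comparison, a minimal hand-rolled timing loop could use Base.donotdelete instead of a checksum to keep the benchmarked call alive. This is a hedged sketch under that assumption, not how Chairmarks is actually implemented, and time_per_call is a made-up helper name:

function time_per_call(f, x, n)
    t0 = time_ns()
    for _ in 1:n
        Base.donotdelete(f(x))   # prevent DCE without observing the result
    end
    (time_ns() - t0) / n         # average nanoseconds per call
end

time_per_call(hash, rand(1000), 10_000)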

@b Produces `Infs` in matrix

Hi there,

I might just be using the @b macro wrongly, but I was confused that in the following code

# This is on v. 1.2.1
using LinearAlgebra, Chairmarks
M = rand(3,3)
D = Diagonal(2ones(3))
@b lmul!(D,M) # expectation: M is not filled with Infs

M gets filled with Infs.
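Presumably each evaluation doubles M in place (D is 2I), so after enough evaluations the entries overflow to Inf. A hedged workaround sketch: put the matrix construction in the setup position so every sample mutates fresh data:

julia> @b rand(3,3) lmul!(D, _)   # D as defined above; a new M is generated per sample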

[Discussion] Global settings/defaults

I noticed that there was some instability when benchmarking functions with a runtime of ~3ms, which was reduced significantly when I set seconds=1.

Would it be possible to set "global defaults" when using Chairmarks, so that I could set the default value of seconds to be 1s instead of 0.1s?
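A hedged sketch of what such a setting might look like (hypothetical API, not something Chairmarks exposes today), next to the per-call override that works now:

# Hypothetical, for illustration only:
# Chairmarks.set_default!(seconds = 1.0)

# What works today: pass it on every call (my_3ms_function is a placeholder).
@b my_3ms_function() seconds=1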

Samples parameter is off by one?

julia> @be sleep(0.01) evals=1 samples=1
Benchmark: 0 samples

julia> @be sleep(0.01) evals=1 samples=2
Benchmark: 1 sample with 1 evaluation
       11.362 ms (4 allocs: 112 bytes)

Suggestion to Follow BenchmarkTools More Closely

Just a couple of suggestions for this superb blazingly fast package! In summary, I propose following BenchmarkTools more closely regarding layout.

Thanks for your work!

  1. Add "0 allocations: 0 bytes" when there are no allocations, instead of printing nothing. This follows @btime and is also consistent with @b's own output when there is at least one allocation. Example:
7.107 ns (0 allocations: 0 bytes) # this is @btime
7.300 ns                          # this is @b

# but 
34.694 ns (1 allocation: 896 bytes) # this is @btime
33.846 ns (1 allocs: 896 bytes)     # this is @b

It might seem trivial, but it becomes important when you have several lines of results, you're new to the package, and for teaching.

  2. Display results by default, as BenchmarkTools does, to reduce boilerplate code. Right now, to get the same behavior when executing a whole code snippet, we need to explicitly call display to see the result. Example:
using BenchmarkTools
using Chairmarks

x = rand(100)
foo1(x) = x^2
foo2(x) = x^3

# this displays all the results directly when the whole code is executed
@btime foo1.($x)
@btime foo2.($x)

# you need to add display to get the same behavior as above
display(@b foo1.($x))
display(@b foo2.($x))
  3. This is also a suggestion, but it's probably too late: maybe change @b to a more explicit name? I can imagine that reading code with @b without any context invites confusion.

Count CPU cycles

e.g. ccall("llvm.x86.rdtsc", llvmcall, Int, ()) + thread pinning
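For reference, a minimal sketch of reading the timestamp counter from Julia with that intrinsic (x86-only; it ignores serialization and frequency scaling, so treat the raw cycle counts with care):

rdtsc() = ccall("llvm.x86.rdtsc", llvmcall, UInt64, ())

start = rdtsc()
hash(rand())
cycles_elapsed = rdtsc() - start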

Will this be registered?

Hi Lilith!
I'm updating my blog Modern Julia Workflows and I'm wondering if I should include this among the benchmarking tools, or if it's still a prototype mainly for yourself?
Thanks

Less accurate answers than BenchmarkTools on some microbenchmarks

g is a little faster than f, as justified by these macrobenchmarks and analysis of generated code. Chairmarks fails to detect this while BenchmarkTools succeeds.

Originally reported by @matthias314 here with macrobenchmarks by @MasonProtter here

julia> function macro_benchmark_5!(out, f)
           for j in axes(out, 2)
               x = UInt128(j)
               for i in axes(out, 1)
                   out[i, j] = f(x, i)
               end
           end
       end;

julia> function macro_benchmark_5_noinline!(out, f)
           for j in axes(out, 2)
               x = UInt128(j)
               for i in axes(out, 1)
                   out[i, j] = @noinline f(x, i)
               end
           end
       end;

julia> function macro_benchmark_6!(f, N)
           for j in 1:N
               x = UInt128(j)
               for i in 1:N
                   Base.donotdelete(f(x, i))
               end
           end
       end;

julia> function macro_benchmark_6_noinline!(f, N)
           for j in 1:N
               x = UInt128(j)
               for i in 1:N
                   Base.donotdelete(@noinline f(x, i))
               end
           end
       end;

julia> f(x, n) = x << n;

julia> g(x, n) = x << (n & 63);

julia> let
           out = Matrix{UInt128}(undef, 10_000, 10_000)
           @time macro_benchmark_5!(out, f)
           @time macro_benchmark_5!(out, g)
           println()
           @time macro_benchmark_5_noinline!(out, f)
           @time macro_benchmark_5_noinline!(out, g)
           println()
           @time macro_benchmark_6!(f, 10_000)
           @time macro_benchmark_6!(g, 10_000)
           println()
           @time macro_benchmark_6_noinline!(f, 10_000)
           @time macro_benchmark_6_noinline!(g, 10_000)
       end
  0.185143 seconds
  0.121225 seconds

  0.220452 seconds
  0.181451 seconds

  0.114753 seconds
  0.115903 seconds

  0.258359 seconds
  0.204093 seconds

julia> x = UInt128(1); n = 1;

julia> @btime f($x, $n);
  2.500 ns (0 allocations: 0 bytes)

julia> @btime g($x, $n);
  1.958 ns (0 allocations: 0 bytes)

julia> @b f($x, $n)
1.137 ns

julia> @b g($x, $n)
1.137 ns

`@b @b hash(rand()) seconds=.0000001` throws

julia> @b @b hash(rand()) seconds=.0000001
ERROR: InexactError: trunc(Int64, NaN)
Stacktrace:
  [1] trunc
    @ ./float.jl:905 [inlined]
  [2] floor(::Type{Int64}, x::Float64)
    @ Base ./float.jl:383
  [3] benchmark(init::Any, setup::Any, f::Any, teardown::Any; evals::Union{…}, samples::Union{…}, seconds::Union{…}, map::Any, reduction::Any)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:67
  [4] benchmark
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:15 [inlined]
  [5] benchmark (repeats 2 times)
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:9 [inlined]
  [6] benchmark
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:8 [inlined]
  [7] #41
    @ ~/.julia/packages/Chairmarks/bdfFn/src/macro_tools.jl:20 [inlined]
  [8] _benchmark(f::var"#41#43", map::typeof(Chairmarks.default_map), reduction::typeof(Chairmarks.default_reduction), args::Tuple{}, evals::Int64, warmup::Bool)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:100
  [9] bench
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:27 [inlined]
 [10] (::Chairmarks.var"#bench#6"{typeof(Chairmarks.default_map), typeof(Chairmarks.default_reduction), Nothing, var"#41#43", Nothing, Tuple{}})(evals::Int64)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:26
 [11] benchmark(init::Any, setup::Any, f::Any, teardown::Any; evals::Union{…}, samples::Union{…}, seconds::Union{…}, map::Any, reduction::Any)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:87
 [12] benchmark
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:15 [inlined]
 [13] benchmark (repeats 2 times)
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:9 [inlined]
 [14] benchmark(f::Function)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:8
 [15] top-level scope
    @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.

Need better performance regression testing

  • Must have very low false positivity rate
  • Must run quickly
  • Should be data efficient
  • Should provide nice visualizations on failures that paint a compelling picture that the failure is real
  • Should perform drift detection
  • Should track performance separately across operating systems or other configurable values
  • Should provide searchable, browsable visualizations of all tracked parameters
  • Should easily interoperate with other visualization and benchmarking programs and data formats
  • Should be used to test this package
  • Should provide an interface for users of this package to use
  • Should provide regression testing to Base
  • Should support measuring TTFX, load time, compile time, and classic runtime.
  • Should support arbitrary real numbers

Use `Base._stable_typeof`

Whenever we tell the compiler the type of something, we should follow Base._stable_typeof and also tell it the value if the value is a type. i.e.

julia> @b Int rand
84.453 ns (1.17 allocs: 19.085 bytes)

Should be type-stable
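For context, a small sketch of the difference (Base._stable_typeof is an internal Base helper, so its behavior may change across Julia versions):

julia> typeof(Int)                 # DataType: the specific type is lost to inference
DataType

julia> Base._stable_typeof(Int)    # Type{Int64}: the value is kept in the type domain
Type{Int64}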

Detect cases where first eval is slower than subsequent evals

If I have something like @b rand(1000) sort!, the first eval is much slower than subsequent evals within a given sample, which violates benchmarking assumptions and produces misleading results. For example, @b rand(1000) sort! reports a super fast runtime while @b rand(100_000) sort! is realistic.

See: compintell/Tapir.jl#140

julia> @be rand(100_000) sort!
Benchmark: 100 samples with 1 evaluation
min    761.379 μs (6 allocs: 789.438 KiB)
median 871.046 μs (6 allocs: 789.438 KiB)
mean   890.113 μs (6 allocs: 789.438 KiB, 2.74% gc time)
max    1.223 ms (6 allocs: 789.438 KiB, 14.46% gc time)

julia> @be rand(1000) sort!
Benchmark: 2943 samples with 7 evaluations
min    2.345 μs (0.86 allocs: 1.429 KiB)
median 3.208 μs (0.86 allocs: 1.429 KiB)
mean   4.221 μs (0.86 allocs: 1.434 KiB, 0.25% gc time)
max    701.837 μs (0.86 allocs: 1.714 KiB, 98.49% gc time)
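A hedged workaround sketch while this is not detected automatically: force one evaluation per sample so every sort! call receives a freshly generated array from setup:

julia> @be rand(1000) sort! evals=1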

`@b rand hash seconds=1`

julia> @b rand hash seconds=1
ERROR: TypeError: in keyword argument seconds, expected Union{Nothing, Float64}, got a value of type Int64
Stacktrace:
 [1] benchmark(setup::Function, f::Function, teardown::Nothing; kw::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:seconds,), Tuple{Int64}}})
   @ Tablemarks ~/.julia/packages/Tablemarks/KkyUq/src/benchmarking.jl:9
 [2] top-level scope
   @ REPL[5]:1

Support comparative benchmarking

It would be nice to have a way to communicate that two implementations of the same function are to be compared (e.g. @be Compare(f, g) or @b init setup {f g h} teardown or some such). This would allow a more efficient backend that takes this into account in experimental design.

See also: JuliaCI/BenchmarkTools.jl#239

support $x variable interpolation

It would be nice if the @b macro supported $x interpolation of globals and other expressions into benchmarked expressions, to make it easier to benchmark expressions using global data without using @eval and without writing a function.

This is a widely used feature of BenchmarkTools / @btime … why not copy it?
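For illustration, the requested usage next to the setup-based alternative that avoids benchmarking a global access (a sketch of the feature being asked for, not a statement about current support):

x = rand(1000)
@b sum($x)          # requested: splice the current value of `x` into the benchmark
@b rand(1000) sum   # alternative: build the data in the setup position instead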

Parsing error on standalone literal symbols

julia> x = :my_func
:my_func

julia> @b x isdefined(Main, _)
178.125 ns

julia> @b :my_func isdefined(Main, _)
ERROR: TypeError: in isdefined, expected Symbol, got a value of type QuoteNode
Stacktrace:
 [1] (::var"#23#24")(242::QuoteNode)
   @ Main ~/.julia/packages/Chairmarks/7hE0Y/src/macro_tools.jl:52
 [2] _benchmark(f::var"#23#24", map::typeof(Core.donotdelete), reduction::typeof(Chairmarks.default_reduction), args::Tuple{QuoteNode}, evals::Int64, warmup::Bool)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:119
 [3] (::Chairmarks.var"#bench#8"{Bool, typeof(Core.donotdelete), typeof(Chairmarks.default_reduction), Returns{QuoteNode}, var"#23#24", Nothing, Tuple{}})(evals::Int64, warmup::Bool)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:34
 [4] benchmark(init::Any, setup::Any, f::Any, teardown::Any; evals::Union{…}, samples::Union{…}, seconds::Union{…}, gc::Bool, checksum::Bool, _map::Any, _reduction::Any)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:44
 [5] benchmark
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:20 [inlined]
 [6] benchmark
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:14 [inlined]
 [7] benchmark(setup::Function, f::Function)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:14
 [8] top-level scope
   @ REPL[58]:1
Some type information was truncated. Use `show(err)` to see complete types.

Histograms of benchmark times

Hi,

I ported the code that produces the nice histograms of the @benchmark macro from BenchmarkTools to Chairmarks. For now, I have created a new package, PrettyChairmarks.jl.

If there is general agreement that this is useful, I would be happy to make a pull request to include this code directly in Chairmarks.

Does not suppress allocation & GC time as much as BenchmarkTools

Chairmarks.jl doesn't reach its best performance until the corresponding methods from BenchmarkTools have been executed.
See my test:

julia> using Chairmarks

julia> @b rand(100, 10000, 100)
178.522 ms (2 allocs: 762.940 MiB, 24.89% gc time)

julia> @b rand(100, 10000, 100)
174.507 ms (2 allocs: 762.940 MiB, 24.95% gc time)

julia> using BenchmarkTools

julia> @b rand(100, 10000, 100)
180.840 ms (2 allocs: 762.940 MiB, 24.67% gc time)

julia> @b rand(100, 10000, 100)
172.184 ms (2 allocs: 762.940 MiB, 24.11% gc time)

julia> @btime rand(100, 10000, 100);
  123.355 ms (2 allocations: 762.94 MiB)

julia> @b rand(100, 10000, 100)
126.622 ms (2 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100)
125.907 ms (2 allocs: 762.940 MiB)

Most strangely, this seems to happen randomly; sometimes the trick doesn't work at all.
The performance also fluctuates randomly even when nothing else is running, sometimes reaching 200% of the baseline time given by the best-performing @b and @btime runs.
Would a manual way to configure GC help?
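A hedged mitigation sketch in the meantime: trigger a full collection right before benchmarking so the run does not pay for garbage left over from earlier work (this does not disable GC during the measurement):

julia> GC.gc(); @b rand(100, 10000, 100)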

benchmark results oscillate

In this example, the timings returned by @b oscillate between two values, one of which agrees with @btime:

julia> using StaticArrays, BenchmarkTools, Chairmarks

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.522 ns (0 allocations: 0 bytes)
16.110 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.510 ns (0 allocations: 0 bytes)
9.763 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.515 ns (0 allocations: 0 bytes)
16.536 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.517 ns (0 allocations: 0 bytes)
9.764 ns

I only get this on my laptop, not on another machine. versioninfo() for my laptop:

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 4 virtual cores

Packages:

Status `/tmp/jl_AUb4oc/Project.toml`
  [6e4b80f9] BenchmarkTools v1.5.0
  [0ca39b1e] Chairmarks v1.1.2
  [90137ffa] StaticArrays v1.9.3

Addition: For @btime I once got the higher number when I ran the benchmark for the first time:

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  16.658 ns (0 allocations: 0 bytes)
17.701 ns

but I cannot reproduce this.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
