
Chairmarks.jl's Introduction

Chairmarks


Chairmarks measures performance hundreds of times faster than BenchmarkTools without compromising on accuracy.

Installation

julia> import Pkg; Pkg.add("Chairmarks")

Usage

julia> using Chairmarks

julia> @b rand(1000) # How long does it take to generate a random array of length 1000?
720.214 ns (3 allocs: 7.875 KiB)

julia> @b rand(1000) hash # How long does it take to hash that array?
1.689 μs

julia> @b rand(1000) _.*5 # How long does it take to multiply it by 5 element wise?
172.970 ns (3 allocs: 7.875 KiB)

Why Chairmarks?

Tutorial

API Reference

Chairmarks.jl's People

Contributors

etiennedeg, lilithhafner, phlaster, samuelbadr, simonp0420, zentrik


Chairmarks.jl's Issues

Tuning is not robust to highly variable runtimes

julia> f(::Int) = nothing
f (generic function with 1 method)

julia> f(::Float64) = sleep(.01)
f (generic function with 2 methods)

julia> using StableRNGs

julia> rng = StableRNG(0)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000001)

julia> @be rand(rng, (1, 2.0)) f
[ Info: Loading Chairmarks ...
Benchmark: 17 samples with 1 evaluation
min    0 ns
median 42.000 ns
mean   5.206 ms (1.88 allocs: 52.706 bytes)
max    11.079 ms (4 allocs: 112 bytes)

julia> rng = StableRNG(1)
StableRNGs.LehmerRNG(state=0x00000000000000000000000000000003)

julia> @be rand(rng, (1, 2.0)) f
[hangs for 5 minutes]

A less reliably reproducing variant was originally reported by @mbauman here.

Proposed fix:

When reporting final results (or maybe halfway through the runtime budget), check whether evals is actually reasonable. If not, rerun, or warn that auto-tuning failed and prompt the user to manually tune the benchmark.

When choosing a high number of evals, increase the number of evals run by at most a factor of 10x at a time and make each of those trials a new sample (with new setup & teardown).

This will not cover the @be rand() < .01 if _ sleep(10) end case, but that case is nearly impossible to cover, and this will cover all reasonable cases (I hope).
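A hedged workaround sketch in the meantime: pinning evals to 1 sidesteps auto-tuning entirely, so a single slow evaluation cannot inflate the eval count (the samples and seconds values below are arbitrary, not a recommendation from the package):

julia> @be rand(rng, (1, 2.0)) f evals=1 samples=100 seconds=5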

How to handle functions which run slowly the first few thousand times and fast on subsequent runs?

In a fresh REPL session:

foo(::Type{<:Array}) = nothing
T = Vector{Int};
g() = foo(T)
using Chairmarks
@b g  # timing always larger than below
@b g  # timing always smaller than above

Example:

julia> foo(::Type{<:Array}) = nothing
foo (generic function with 1 method)

julia> T = Vector{Int};

julia> g() = foo(T)
g (generic function with 1 method)

julia> using Chairmarks

julia> @b g
333.609 ns

julia> @b g
317.712 ns

julia> @b g
310.692 ns

julia> @b g
314.212 ns
(@v1.12) pkg> st Chairmarks
Status `~/.julia/environments/v1.12/Project.toml`
  [0ca39b1e] Chairmarks v1.2.1

`seconds=Inf` does not work

Is it possible to have unlimited seconds?

julia> using Chairmarks

julia> @be exp(rand(10, 10)) evals=1 samples=10 seconds=Inf
ERROR: InexactError: trunc(UInt64, Inf)
Stacktrace:
 [1] trunc
   @ ./float.jl:881 [inlined]
 [2] round(::Type{UInt64}, x::Float64)
   @ Base ./float.jl:385
 [3] benchmark(init::Any, setup::Any, f::Any, teardown::Any; evals::Union{…}, samples::Union{…}, seconds::Union{…}, gc::Bool, checksum::Bool, _map::Any, _reduction::Any)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:102
 [4] benchmark
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:20 [inlined]
 [5] benchmark (repeats 2 times)
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:14 [inlined]
 [6] #benchmark#5
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:13 [inlined]
 [7] top-level scope
   @ REPL[2]:1
Some type information was truncated. Use `show(err)` to see complete types.
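Until Inf is handled, a hedged workaround is to pass a very large finite budget; the explicit evals and samples limits then end the run long before the time budget matters:

julia> @be exp(rand(10, 10)) evals=1 samples=10 seconds=1e6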

use `Base.donotdelete`?

I saw the checksum description from https://chairmarks.lilithhafner.com/v1.1.0/why#Truthful; that sounds a lot like

help?> Base.donotdelete
  Base.donotdelete(args...)

  This function prevents dead-code elimination (DCE) of itself and any arguments passed to it, but
  is otherwise the lightest barrier possible. In particular, it is not a GC safepoint, does model
  an observable heap effect, does not expand to any code itself and may be re-ordered with respect
  to other side effects (though the total number of executions may not change).

  A useful model for this function is that it hashes all memory reachable from args and escapes
  this information through some observable side-channel that does not otherwise impact program
  behavior. Of course that's just a model. The function does nothing and returns nothing.

  This is intended for use in benchmarks that want to guarantee that args are actually computed.
  (Otherwise DCE may see that the result of the benchmark is unused and delete the entire benchmark
  code).

  │ Note
  │
  │  donotdelete does not affect constant folding. For example, in donotdelete(1+1), no add
  │  instruction needs to be executed at runtime and the code is semantically equivalent to
  │  donotdelete(2).

  Examples
  ≡≡≡≡≡≡≡≡

  function loop()
      for i = 1:1000
          # The compiler must guarantee that there are 1000 program points (in the correct
          # order) at which the value of `i` is in a register, but has otherwise
          # total control over the program.
          donotdelete(i)
      end
  end
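For comparison, a minimal hand-rolled timing loop could use Base.donotdelete instead of a checksum to keep the benchmarked call alive. This is a hedged sketch under that assumption, not how Chairmarks is actually implemented, and time_per_call is a made-up helper name:

function time_per_call(f, x, n)
    t0 = time_ns()
    for _ in 1:n
        Base.donotdelete(f(x))   # prevent DCE without observing the result
    end
    (time_ns() - t0) / n         # average nanoseconds per call
end

time_per_call(hash, rand(1000), 10_000)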

@b Produces `Infs` in matrix

Hi there,

I might just be using the @b macro wrongly, but I was confused that in the following code

# This is on v. 1.2.1
using LinearAlgebra, Chairmarks
M = rand(3,3)
D = Diagonal(2ones(3))
@b lmul!(D,M) # expectation: M is not filled with Infs

M gets filled with Infs.
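Presumably each evaluation doubles M in place (D is 2I), so after enough evaluations the entries overflow to Inf. A hedged workaround sketch: put the matrix construction in the setup position so every sample mutates fresh data:

julia> @b rand(3,3) lmul!(D, _)   # D as defined above; a new M is generated per sample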

[Discussion] Global settings/defaults

I noticed that there was some instability when benchmarking functions with a runtime of ~3ms, which was reduced significantly when I set seconds=1.

Would it be possible to set "global defaults" when using Chairmarks, so that I could set the default value of seconds to be 1s instead of 0.1s?
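A hedged sketch of what such a setting might look like (hypothetical API, not something Chairmarks exposes today), next to the per-call override that works now:

# Hypothetical, for illustration only:
# Chairmarks.set_default!(seconds = 1.0)

# What works today: pass it on every call (my_3ms_function is a placeholder).
@b my_3ms_function() seconds=1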

Samples parameter is off by one?

julia> @be sleep(0.01) evals=1 samples=1
Benchmark: 0 samples

julia> @be sleep(0.01) evals=1 samples=2
Benchmark: 1 sample with 1 evaluation
       11.362 ms (4 allocs: 112 bytes)

Suggestion to Follow BenchmarkTools More Closely

Just a couple of suggestions for this superb blazingly fast package! In summary, I propose following BenchmarkTools more closely regarding layout.

Thanks for your work!

  1. Add "0 allocations: 0 bytes" when there are no allocations, instead of printing nothing. This follows @btime and is also consistent with @b's own output when there is at least one allocation. Example:
7.107 ns (0 allocations: 0 bytes) # this is @btime
7.300 ns                          # this is @b

# but 
34.694 ns (1 allocation: 896 bytes) # this is @btime
33.846 ns (1 allocs: 896 bytes)     # this is @b

It might seem trivial, but it becomes important when you have several lines of results, you're new to the package, and for teaching.

  2. Display results by default, as BenchmarkTools does, to reduce boilerplate code. Right now, to get the same behavior when executing a whole code snippet, we need to explicitly call display to see the result. Example:
using BenchmarkTools
using Chairmarks

x = rand(100)
foo1(x) = x^2
foo2(x) = x^3

# this displays all the results directly when the whole code is executed
@btime foo1.($x)
@btime foo2.($x)

# you need to add display to get the same behavior as above
display(@b foo1.($x))
display(@b foo2.($x))
  3. This is also a suggestion, but it's probably too late: maybe change @b to a more explicit name? I can imagine that reading code with @b without any context invites confusion.

Count CPU cycles

e.g. ccall("llvm.x86.rdtsc", llvmcall, Int, ()) + thread pinning
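For reference, a minimal sketch of reading the timestamp counter from Julia with that intrinsic (x86-only; it ignores serialization and frequency scaling, so treat the raw cycle counts with care):

rdtsc() = ccall("llvm.x86.rdtsc", llvmcall, UInt64, ())

start = rdtsc()
hash(rand())
cycles_elapsed = rdtsc() - start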

Will this be registered?

Hi Lilith!
I'm updating my blog Modern Julia Workflows and I'm wondering if I should include this among the benchmarking tools, or if it's still a prototype mainly for yourself?
Thanks

Less accurate answers than BenchmarkTools on some microbenchmarks

g is a little faster than f, as justified by these macrobenchmarks and analysis of generated code. Chairmarks fails to detect this while BenchmarkTools succeeds.

Originally reported by @matthias314 here with macrobenchmarks by @MasonProtter here

julia> function macro_benchmark_5!(out, f)
           for j in axes(out, 2)
               x = UInt128(j)
               for i in axes(out, 1)
                   out[i, j] = f(x, i)
               end
           end
       end;

julia> function macro_benchmark_5_noinline!(out, f)
           for j in axes(out, 2)
               x = UInt128(j)
               for i in axes(out, 1)
                   out[i, j] = @noinline f(x, i)
               end
           end
       end;

julia> function macro_benchmark_6!(f, N)
           for j in 1:N
               x = UInt128(j)
               for i in 1:N
                   Base.donotdelete(f(x, i))
               end
           end
       end;

julia> function macro_benchmark_6_noinline!(f, N)
           for j in 1:N
               x = UInt128(j)
               for i in 1:N
                   Base.donotdelete(@noinline f(x, i))
               end
           end
       end;

julia> f(x, n) = x << n;

julia> g(x, n) = x << (n & 63);

julia> let
           out = Matrix{UInt128}(undef, 10_000, 10_000)
           @time macro_benchmark_5!(out, f)
           @time macro_benchmark_5!(out, g)
           println()
           @time macro_benchmark_5_noinline!(out, f)
           @time macro_benchmark_5_noinline!(out, g)
           println()
           @time macro_benchmark_6!(f, 10_000)
           @time macro_benchmark_6!(g, 10_000)
           println()
           @time macro_benchmark_6_noinline!(f, 10_000)
           @time macro_benchmark_6_noinline!(g, 10_000)
       end
  0.185143 seconds
  0.121225 seconds

  0.220452 seconds
  0.181451 seconds

  0.114753 seconds
  0.115903 seconds

  0.258359 seconds
  0.204093 seconds

julia> x = UInt128(1); n = 1;

julia> @btime f($x, $n);
  2.500 ns (0 allocations: 0 bytes)

julia> @btime g($x, $n);
  1.958 ns (0 allocations: 0 bytes)

julia> @b f($x, $n)
1.137 ns

julia> @b g($x, $n)
1.137 ns

`@b @b hash(rand()) seconds=.0000001` throws

julia> @b @b hash(rand()) seconds=.0000001
ERROR: InexactError: trunc(Int64, NaN)
Stacktrace:
  [1] trunc
    @ ./float.jl:905 [inlined]
  [2] floor(::Type{Int64}, x::Float64)
    @ Base ./float.jl:383
  [3] benchmark(init::Any, setup::Any, f::Any, teardown::Any; evals::Union{…}, samples::Union{…}, seconds::Union{…}, map::Any, reduction::Any)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:67
  [4] benchmark
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:15 [inlined]
  [5] benchmark (repeats 2 times)
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:9 [inlined]
  [6] benchmark
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:8 [inlined]
  [7] #41
    @ ~/.julia/packages/Chairmarks/bdfFn/src/macro_tools.jl:20 [inlined]
  [8] _benchmark(f::var"#41#43", map::typeof(Chairmarks.default_map), reduction::typeof(Chairmarks.default_reduction), args::Tuple{}, evals::Int64, warmup::Bool)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:100
  [9] bench
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:27 [inlined]
 [10] (::Chairmarks.var"#bench#6"{typeof(Chairmarks.default_map), typeof(Chairmarks.default_reduction), Nothing, var"#41#43", Nothing, Tuple{}})(evals::Int64)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:26
 [11] benchmark(init::Any, setup::Any, f::Any, teardown::Any; evals::Union{…}, samples::Union{…}, seconds::Union{…}, map::Any, reduction::Any)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:87
 [12] benchmark
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:15 [inlined]
 [13] benchmark (repeats 2 times)
    @ ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:9 [inlined]
 [14] benchmark(f::Function)
    @ Chairmarks ~/.julia/packages/Chairmarks/bdfFn/src/benchmarking.jl:8
 [15] top-level scope
    @ REPL[9]:1
Some type information was truncated. Use `show(err)` to see complete types.

Need better performance regression testing

  • Must have very low false positivity rate
  • Must run quickly
  • Should be data efficient
  • Should provide nice visualizations on failures that paint a compelling picture that the failure is real
  • Should perform drift detection
  • Should track performance separately across operating systems or other configurable values
  • Should provide searchable, browsable visualizations of all tracked parameters
  • Should easily interoperate with other visualization and benchmarking programs and data formats
  • Should be used to test this package
  • Should provide an interface for users of this package to use
  • Should provide regression testing to Base
  • Should support measuring TTFX, load time, compile time, and classic runtime.
  • Should support arbitrary real numbers

Use `Base._stable_typeof`

Whenever we tell the compiler the type of something, we should follow Base._stable_typeof and also tell it the value if the value is a type. i.e.

julia> @b Int rand
84.453 ns (1.17 allocs: 19.085 bytes)

Should be type-stable
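For context, a small sketch of the difference (Base._stable_typeof is an internal Base helper, so its behavior may change across Julia versions):

julia> typeof(Int)                 # DataType: the specific type is lost to inference
DataType

julia> Base._stable_typeof(Int)    # Type{Int64}: the value is kept in the type domain
Type{Int64}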

Detect cases where first eval is slower than subsequent evals

If I have something like @b rand(1000) sort!, the first eval is much slower than subsequent evals within a given sample, which violates benchmarking assumptions and produces misleading results. For example, @b rand(1000) sort! reports a super fast runtime while @b rand(100_000) sort! is realistic.

See: compintell/Tapir.jl#140

julia> @be rand(100_000) sort!
Benchmark: 100 samples with 1 evaluation
min    761.379 μs (6 allocs: 789.438 KiB)
median 871.046 μs (6 allocs: 789.438 KiB)
mean   890.113 μs (6 allocs: 789.438 KiB, 2.74% gc time)
max    1.223 ms (6 allocs: 789.438 KiB, 14.46% gc time)

julia> @be rand(1000) sort!
Benchmark: 2943 samples with 7 evaluations
min    2.345 μs (0.86 allocs: 1.429 KiB)
median 3.208 μs (0.86 allocs: 1.429 KiB)
mean   4.221 μs (0.86 allocs: 1.434 KiB, 0.25% gc time)
max    701.837 μs (0.86 allocs: 1.714 KiB, 98.49% gc time)
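A hedged workaround sketch while this is not detected automatically: force one evaluation per sample so every sort! call receives a freshly generated array from setup:

julia> @be rand(1000) sort! evals=1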

`@b rand hash seconds=1`

julia> @b rand hash seconds=1
ERROR: TypeError: in keyword argument seconds, expected Union{Nothing, Float64}, got a value of type Int64
Stacktrace:
 [1] benchmark(setup::Function, f::Function, teardown::Nothing; kw::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:seconds,), Tuple{Int64}}})
   @ Tablemarks ~/.julia/packages/Tablemarks/KkyUq/src/benchmarking.jl:9
 [2] top-level scope
   @ REPL[5]:1

Support comparative benchmarking

It would be nice to have a way to communicate that two implementations of the same function are to be compared (e.g. @be Compare(f, g) or @b init setup {f g h} teardown or some such). This would allow a more efficient backend that takes this into account in experimental design.

See also: JuliaCI/BenchmarkTools.jl#239

support $x variable interpolation

It would be nice if the @b macro supported $x interpolation of globals and other expressions into benchmarked expressions, to make it easier to benchmark expressions using global data without using @eval and without writing a function.

This is a widely used feature of BenchmarkTools / @btime … why not copy it?
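For illustration, the requested usage next to the setup-based alternative that avoids benchmarking a global access (a sketch of the feature being asked for, not a statement about current support):

x = rand(1000)
@b sum($x)          # requested: splice the current value of `x` into the benchmark
@b rand(1000) sum   # alternative: build the data in the setup position instead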

Parsing error on standalone literal symbols

julia> x = :my_func
:my_func

julia> @b x isdefined(Main, _)
178.125 ns

julia> @b :my_func isdefined(Main, _)
ERROR: TypeError: in isdefined, expected Symbol, got a value of type QuoteNode
Stacktrace:
 [1] (::var"#23#24")(242::QuoteNode)
   @ Main ~/.julia/packages/Chairmarks/7hE0Y/src/macro_tools.jl:52
 [2] _benchmark(f::var"#23#24", map::typeof(Core.donotdelete), reduction::typeof(Chairmarks.default_reduction), args::Tuple{QuoteNode}, evals::Int64, warmup::Bool)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:119
 [3] (::Chairmarks.var"#bench#8"{Bool, typeof(Core.donotdelete), typeof(Chairmarks.default_reduction), Returns{QuoteNode}, var"#23#24", Nothing, Tuple{}})(evals::Int64, warmup::Bool)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:34
 [4] benchmark(init::Any, setup::Any, f::Any, teardown::Any; evals::Union{…}, samples::Union{…}, seconds::Union{…}, gc::Bool, checksum::Bool, _map::Any, _reduction::Any)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:44
 [5] benchmark
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:20 [inlined]
 [6] benchmark
   @ ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:14 [inlined]
 [7] benchmark(setup::Function, f::Function)
   @ Chairmarks ~/.julia/packages/Chairmarks/7hE0Y/src/benchmarking.jl:14
 [8] top-level scope
   @ REPL[58]:1
Some type information was truncated. Use `show(err)` to see complete types.

Histograms of benchmark times

Hi,

I ported the code that produces the nice histograms of the @benchmark macro from BenchmarkTools to Chairmarks. For now, I have created a new package, PrettyChairmarks.jl.

If there is general agreement that this is useful, I would be happy to make a pull request to include this code directly in Chairmarks.

Does not suppress allocation & GC time as much as BenchmarkTools

Chairmarks.jl doesn't reach its best performance until the corresponding methods from BenchmarkTools have been executed.
See my test:

julia> using Chairmarks

julia> @b rand(100, 10000, 100)
178.522 ms (2 allocs: 762.940 MiB, 24.89% gc time)

julia> @b rand(100, 10000, 100)
174.507 ms (2 allocs: 762.940 MiB, 24.95% gc time)

julia> using BenchmarkTools

julia> @b rand(100, 10000, 100)
180.840 ms (2 allocs: 762.940 MiB, 24.67% gc time)

julia> @b rand(100, 10000, 100)
172.184 ms (2 allocs: 762.940 MiB, 24.11% gc time)

julia> @btime rand(100, 10000, 100);
  123.355 ms (2 allocations: 762.94 MiB)

julia> @b rand(100, 10000, 100)
126.622 ms (2 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100)
125.907 ms (2 allocs: 762.940 MiB)

Most strangely, this seems to happen randomly; sometimes the trick doesn't work at all.
The performance also fluctuates randomly even when nothing else is running, sometimes reaching 200% of the baseline time given by the best-performing @b and @btime runs.
Would a manual way to configure GC help?
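A hedged mitigation sketch in the meantime: trigger a full collection right before benchmarking so the run does not pay for garbage left over from earlier work (this does not disable GC during the measurement):

julia> GC.gc(); @b rand(100, 10000, 100)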

benchmark results oscillate

In this example, the timings returned by @b oscillate between two values, one of which agrees with @btime:

julia> using StaticArrays, BenchmarkTools, Chairmarks

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.522 ns (0 allocations: 0 bytes)
16.110 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.510 ns (0 allocations: 0 bytes)
9.763 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.515 ns (0 allocations: 0 bytes)
16.536 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  9.517 ns (0 allocations: 0 bytes)
9.764 ns

I only get this on my laptop, not on another machine. versioninfo() for my laptop:

Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 4 virtual cores

Packages:

Status `/tmp/jl_AUb4oc/Project.toml`
  [6e4b80f9] BenchmarkTools v1.5.0
  [0ca39b1e] Chairmarks v1.1.2
  [90137ffa] StaticArrays v1.9.3

Addition: For @btime I once got the higher number when I ran the benchmark for the first time:

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @b $v == $v
  16.658 ns (0 allocations: 0 bytes)
17.701 ns

but I cannot reproduce this.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
