ecraven / r7rs-benchmarks

Benchmarks for various Scheme implementations. Taken with kind permission from the Larceny project, based on the Gabriel and Gambit benchmarks.
The CPU limit in the bench script is 300 seconds of CPU time (5 minutes): compilation gets 5 minutes, then running gets 5 minutes. Unfortunately, this penalizes parallelization, whose goal is usually to minimize wall-clock time rather than CPU time, and the end result of the benchmarking process is reported as wall-clock time. This effect gets worse the more cores you have.
I think Guile has hit this limit in ctak at run time, and now that some optimizations give the compiler more to do, we hit it at compile time as well.
I think the limit should be raised to 15 minutes. It's a decent trade-off between wanting short benchmark runs and allowing for parallelism in GC.
The Makefile references graph.scm, but it doesn't seem to be committed. Is it available?
chudnovsky is a bignum benchmark, as is pi; they both compute up to 500 digits of π.

Some chudnovsky benchmark times are so small that one can't have confidence in their accuracy (compared to startup times, for example). One could increase computation times by increasing the number of computed digits of π. However, I think a general benchmark suite should test performance on smallish bignums, as they are more useful in practice, so I would recommend increasing the number of repetitions instead of increasing the size of the bignums involved.
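To make the trade-off concrete, the core of such a computation can be sketched in Python with integer-only (bignum) arithmetic. This is an illustrative reimplementation of the kind of work a Chudnovsky π benchmark does, not the benchmark's actual code; math.isqrt stands in for an exact integer square root.

```python
import math

def pi_digits(n):
    # floor(pi * 10**n) via the Chudnovsky series, using only
    # integer (bignum) arithmetic in a fixed-point representation.
    prec = n + 10                   # guard digits
    one = 10 ** prec                # fixed-point scale
    c3_over_24 = 640320 ** 3 // 24
    k, a_k, a_sum, b_sum = 0, one, one, 0
    while a_k != 0:                 # each term adds ~14 digits
        k += 1
        a_k = a_k * -((6 * k - 5) * (2 * k - 1) * (6 * k - 1))
        a_k //= k * k * k * c3_over_24
        a_sum += a_k
        b_sum += k * a_k
    total = 13591409 * a_sum + 545140134 * b_sum
    pi_fixed = 426880 * math.isqrt(10005 * one * one) * one // total
    return pi_fixed // 10 ** (prec - n)  # drop the guard digits
```

With small digit counts, almost all the time goes to startup and loop overhead rather than bignum work, which is why repeating the computation (as in the proposed diff below) gives more trustworthy numbers than one tiny run.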
So here's a proposed diff:
diff --git a/inputs/chudnovsky.input b/inputs/chudnovsky.input
index c310e06..46b9135 100644
--- a/inputs/chudnovsky.input
+++ b/inputs/chudnovsky.input
@@ -1,4 +1,4 @@
-2
+20
50
500
50
Brad
Please...
Thanks for maintaining this set of benchmarks.
In the 'bench' script, the chez command refers to '--libdirs /home/nex/scheme/chez'
Could you please describe how you set up the compatibility libraries for chez?
Thanks,
Jan Erik
It would be interesting to have S7 Scheme in the benchmark. It is an interpreted Scheme, a successor of TinyScheme, but much faster. S7 has a compatibility layer for R7RS; see below for a possible prelude.
I've performed some preliminary tests on my machine (MacBook Air 2019), and the results are the following (comparison with Guile 1.8.8 and 3.0.4).
test | S7 | Guile 1.8.8 | Guile 3.0.4 |
---|---|---|---|
browse | 24.27 | 80.32 | 12.060597 |
deriv | 25.194 | 61.39 | 18.581995 |
destruc | 52.077 | TIMELIM | 7.143701 |
diviter | 9.685 | 77.85 | 15.453743 |
divrec | 11.803 | 78.55 | 17.41294 |
puzzle | 27.716 | 191.35 | 18.086531 |
triangl | 33.931 | 98.16 | 8.519252 |
tak | 12.925 | 134.2 | 4.757643 |
takl | 20.968 | TIMELIM | 9.456034 |
ntakl | 17.073 | TIMELIM | 9.516082 |
cpstak | 103.358 | 221.03 | 59.444873 |
ctak | 44.139 | TIMELIM | TIMELIM |
fib | 10.218 | 195.78 | 12.090909 |
fibc | 25.799 | TIMELIM | TIMELIM |
fibfp | 1.885 | 45.98 | 22.001634 |
sum | 6.637 | 281.63 | 6.866215 |
sumfp | 2.499 | 105.1 | 42.058511 |
fft | 32.198 | TIMELIM | 7.685201 |
mbrot | 24.403 | TIMELIM | 50.086067 |
mbrotZ | 18.556 | TIMELIM | 67.011491 |
nucleic | 19.946 | 67.46 | 15.347245 |
pi | NO | TIMELIM | 0.564552 |
pnpoly | 17.981 | TIMELIM | 24.886723 |
ray | 20.455 | TIMELIM | 18.51229 |
simplex | 46.344 | TIMELIM | 13.895531 |
ack | 10.572 | TIMELIM | 8.413945 |
array1 | 11.483 | 160.88 | 9.241778 |
string | 1.714 | 1.82 | 1.872806 |
sum1 | 0.47 | 1.63 | 4.427402 |
cat | 1.187 | TIMELIM | 28.396944 |
tail | 1.188 | TIMELIM | 9.821691 |
wc | 8.266 | 57.91 | 16.963138 |
read1 | 406 | 0.95 | 5.804979 |
compiler | 41.155 | TIMELIM | 5.149011 |
conform | 51.031 | TIMELIM | 10.508732 |
dynamic | 22.736 | 69.58 | 7.374259 |
earley | TIMELIM | TIMELIM | 9.489885 |
graphs | 127.611 | TIMELIM | 23.026826 |
lattice | 139.275 | 292.7 | 15.937364 |
matrix | 72.073 | TIMELIM | 9.881781 |
maze | 23.258 | TIMELIM | 4.70391 |
mazefun | 19.51 | 129.61 | 9.664338 |
nqueens | 55.11 | TIMELIM | 19.372148 |
paraffins | 31.424 | TIMELIM | 4.24542 |
parsing | 39.443 | TIMELIM | 10.687959 |
peval | 29.677 | 98.91 | 15.644764 |
primes | 7.73 | 39.33 | 7.521318 |
quicksort | 93.996 | TIMELIM | 13.252736 |
scheme | 71.462 | TIMELIM | 15.142413 |
slatex | 32.069 | 48.96 | 45.047143 |
chudnovski | NO | TIMELIM | 0.306648 |
nboyer | 39.274 | 151.42 | 5.10214 |
sboyer | 31.537 | 168.81 | 4.755798 |
gcbench | 20.54 | TIMELIM | 3.511493 |
mperm | 173.33 | TIMELIM | 10.650118 |
equal | 781 | TIMELIM | TIMELIM |
bv2string | 10.782 | TIMELIM | 4.489627 |
chudnovski and pi fail, but it should be easy to arrange for them to run.
This is the s7.prelude I'm using:
(define (this-scheme-implementation-name) "s7")
(define exact-integer? integer?)
(define (exact-integer-sqrt i) (let ((sq (floor (sqrt i)))) (values sq (- i (* sq sq)))))
(define inexact exact->inexact)
(define exact inexact->exact)
(define (square x) (* x x))
(define (vector-map f v) (copy v)) ; for quicksort.scm
(define-macro (import . args) #f)
(define (jiffies-per-second) 1000)
(define (current-jiffy) (round (* (jiffies-per-second) (*s7* 'cpu-time))))
(define (current-second) (floor (*s7* 'cpu-time)))
(define read-u8 read-byte)
(define write-u8 write-byte)
(define u8-ready? char-ready?)
(define peek-u8 peek-char)
(define* (utf8->string v (start 0) end)
(if (string? v)
v
(substring (byte-vector->string v) start (or end (length v)))))
(define* (string->utf8 s (start 0) end)
(if (byte-vector? s)
s
(string->byte-vector (utf8->string s start end))))
(define write-simple write)
(define* (string->vector s (start 0) end)
(let ((stop (or end (length s))))
(copy s (make-vector (- stop start)) start stop)))
(define vector-copy string->vector)
(define* (vector-copy! dest at src (start 0) end) ; end is exclusive
(let ((len (or end (length src))))
(if (or (not (eq? dest src))
(<= at start))
(do ((i at (+ i 1))
(k start (+ k 1)))
((= k len) dest)
(set! (dest i) (src k)))
(do ((i (- (+ at len) start 1) (- i 1))
(k (- len 1) (- k 1)))
((< k start) dest)
(set! (dest i) (src k))))))
(define make-bytevector make-byte-vector)
(define bytevector-ref byte-vector-ref)
(define bytevector-set! byte-vector-set!)
(define bytevector-copy! vector-copy!)
(define bytevector-u8-ref byte-vector-ref)
(define bytevector-u8-set! byte-vector-set!)
;; records
(define-macro (define-record-type type make ? . fields)
(let ((obj (gensym))
(args (map (lambda (field)
(values (list 'quote (car field))
(let ((par (memq (car field) (cdr make))))
(if (pair? par) (car par) #f))))
fields)))
`(begin
(define (,? ,obj)
(and (let? ,obj)
(eq? (let-ref ,obj 'type) ',type)))
(define ,make
(inlet 'type ',type ,@args))
,@(map
(lambda (field)
(when (pair? field)
(if (null? (cdr field))
(values)
(if (null? (cddr field))
`(define (,(cadr field) ,obj)
(let-ref ,obj ',(car field)))
`(begin
(define (,(cadr field) ,obj)
(let-ref ,obj ',(car field)))
(define (,(caddr field) ,obj val)
(let-set! ,obj ',(car field) val)))))))
fields)
',type)))
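One caveat about the shim near the top of the prelude: its exact-integer-sqrt goes through inexact (sqrt i), which can lose precision once i exceeds what a double represents exactly. The R7RS contract, sketched here in Python for illustration (math.isqrt standing in for an exact integer square root), is:

```python
import math

def exact_integer_sqrt(i):
    # R7RS semantics: return (s, r) with s*s + r == i, where s is
    # the largest integer whose square does not exceed i.
    s = math.isqrt(i)
    return s, i - s * s

# exact_integer_sqrt(17) == (4, 1)
# A float-based (floor (sqrt i)) can be off by one for large bignums,
# since doubles carry only ~53 bits of precision.
```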
I think that for a fairer comparison, the Chez benchmark should precompile the program and compatibility libraries before execution, as is done with Gambit.
Well, I know this might be off-topic, but I would really like to see a comparison against C of the Schemes (like Chez) that claim performance comparable to statically compiled languages like C. I did some unscientific benchmarks, and Chez was 2x to 10x slower; Racket was of course even slower. But, again, unscientific.
I've seen on Reddit and StackOverflow that people want benchmarks between Scheme and C. Adding C implementations would of course require some work, but not that much. If this is considered helpful, maybe I'll start translating some of the benchmarks into C.
Ypsilon is now developed at https://github.com/fujita-y/ypsilon and has become an LLVM-based R6RS/R7RS compiler.
The author claims that its performance is comparable to Guile 3.x on his machine. It would be great if Ypsilon could return to the benchmarking matrix officially.
There are two entries for Chez, effectively safe and unsafe. The fact that other entries are unlabelled makes it easy to assume that they are all "safe", but, as the Gambit prelude shows, some of them are actually "unsafe".
Looks like the mperm benchmark is failing on all Schemes. I checked on both Chez and Racket, and it appears the function run-benchmark is defined twice at the end of the file.
I was having the same problems until I did the following:
% raco pkg uninstall r7rs
% raco pkg install -i r7rs
now things (fib* so far) run for me.
It allows more programs to actually compile and run:
https://gist.github.com/gambiteer/03f2d16f3ca6e76489da70b4bed71984
The implementation names in the first two graphs of https://ecraven.github.io/r7rs-benchmarks/ seem to be consistent, except for the strange combined entry "gambit/gerbil" in the top-9 graph. Can this be split?
Hello!
Version 5 of CHICKEN Scheme was released in November 2018. Do you mind bumping the version used in the benchmark?
I have some patches, but am not allowed to contribute them since there is no license file.
Can I send you a PR to add one? Which license would you want?
Here is a project doing something similar https://github.com/dyu/ffi-overhead
I was wondering about the reasoning behind the current CHICKEN flags (in particular for C5). For example, according to the wiki, -O2 already includes -optimize-leaf-routines and -inline, so specifying these seems redundant.
I was also wondering if there's any particular reason not to compile with -O3, or even -C -O3, which passes the -O3 flag to the C compiler. I'm not entirely certain if or how these would work or break existing tests; the nature of the question is more exploratory.
Quite a number of these tests have a high garbage collection component. It's well known that allocation-heavy benchmarks will run faster with larger heap sizes, and different implementations may be tuned to different heap sizes relative to live data. For consistency, it would be good to tune all implementations to have the same heap size for each benchmark -- i.e. for each benchmark, determine the minimum heap size at which the benchmark runs on any implementation, and then run all implementations at, say, 2.5x that heap size.
For Guile you can do this by setting the GC_INITIAL_HEAP_SIZE and GC_MAXIMUM_HEAP_SIZE environment variables. Say you want to determine the minimum heap size for chudnovsky; then you run GC_INITIAL_HEAP_SIZE=3m GC_MAXIMUM_HEAP_SIZE=3m ./bench guile chudnovsky to try at 3 megabytes, and you vary the 3m until you find the smallest heap size at which the benchmark still runs. You record that size for chudnovsky, then do the same for all the others. For chudnovsky, for example, I find it to be 2700k or so. So, say we run at 2.5x that heap size; then when running the tests you do GC_INITIAL_HEAP_SIZE=6750k GC_MAXIMUM_HEAP_SIZE=6750k ./bench guile chudnovsky. But it's better to set GC_INITIAL_HEAP_SIZE only when running the compiled artifact and not the compiler!
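The bookkeeping for that procedure can be sketched in Python (the helper and its name are hypothetical; the 2.5x factor and the 2700k measurement come from the comment above):

```python
def heap_env(min_kb, factor=2.5):
    # Scale a measured minimum heap size (in KiB) by the chosen
    # factor and render the Boehm-GC environment settings that pin
    # Guile's heap to that size.
    kb = int(min_kb * factor)
    return f"GC_INITIAL_HEAP_SIZE={kb}k GC_MAXIMUM_HEAP_SIZE={kb}k"

# e.g. a measured 2700k minimum for chudnovsky scales to 6750k:
# heap_env(2700) == "GC_INITIAL_HEAP_SIZE=6750k GC_MAXIMUM_HEAP_SIZE=6750k"
```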
Anyway, a thought, just if you were interested :) I will probably do this for Guile at some point for our internal benchmarks.
total-accumulated-runtime shows Chez with a small lead over Gambit, and Gambit with a small lead over Larceny. But tests-finished shows that Chez is only running 45 benchmarks, while Gambit is running 51 and Larceny is running 54. Are Gambit and Larceny taking a hit in that first graph just because they're doing more work?
PS Really, really love this!
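One rough way to adjust for the differing benchmark counts (a hypothetical normalization, not something the published graphs compute) would be average runtime per finished benchmark:

```python
def mean_runtime(total_seconds, tests_finished):
    # Average seconds per completed benchmark.  Still imperfect: it
    # ignores *which* benchmarks each implementation finished, and a
    # slow benchmark that times out drops out of the total entirely,
    # lowering both the numerator and the denominator.
    return total_seconds / tests_finished
```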
Can you please add benchmarks for it? Marking version 6.12 is interesting, because in 7.0 Racket switched to the Chez Scheme 'backend'.
Adding Clojure is out of scope, I guess?
Will Clinger has updated his versions of a few of these benchmarks, see
http://www.larcenists.org/benchmarksAboutR7.html
This patch for fft.scm and paraffins.scm gets your versions to match his.
Brad
patch.txt
firefly:~/programs/r7rs-benchmarks> git diff src/GambitC-prelude.scm
diff --git a/src/GambitC-prelude.scm b/src/GambitC-prelude.scm
index 51ffde0..73f9cdd 100644
--- a/src/GambitC-prelude.scm
+++ b/src/GambitC-prelude.scm
@@ -35,6 +35,23 @@
(define write-string write)
(define (this-scheme-implementation-name) (string-append "gambitc-" (system-version-string)))
+
+(define (string->utf8 s)
+ (with-output-to-u8vector
+ '()
+ (lambda ()
+ (display s))))
+
+(define (utf8->string v)
+ (call-with-input-u8vector
+ v
+ (lambda (p)
+ (list->string (read-all p read-char)))))
+
+(define make-bytevector make-u8vector)
+
+(define bytevector-u8-set! u8vector-set!)
+
;; TODO: load syntax-case here, to get syntax-rules.
;; google says (load "~~/syntax-case"), but that doesn't work on my machine :-/
I'm not sure that these lines in the femtolisp prelude:
+(define utf8->string identity)
+(define string->utf8 identity)
are really true: are UTF-8-encoded strings really just byte vectors in femtolisp? It does make the benchmark go faster, though!
In PR #15, a change was made to ensure all implementations were run in "safe mode". However this was reverted in b196000. Currently the benchmarks compare safe and unsafe implementations. What's the goal here?
My expectation would be that all Schemes should be compiled in such a way that they don't use unsafe optimizations.
r7rs has exact-integer-sqrt.
In Gambit, if you replace square-root with integer-sqrt, and quartic root with (lambda (x) (integer-sqrt (integer-sqrt x))), then the results on my machine go from
+!CSVLINE!+gambitc-v4.8.5,pi:50:500:50:2,.7766382694244385
to
+!CSVLINE!+gambitc-v4.8.5,pi:50:500:50:2,.03497314453125
I.e., it's about 20 times as fast. Perhaps it would be a better benchmark if a similar replacement were made. (In other words, perhaps most of the time is being spent in inefficient implementations of width and root, which may not be where you think the CPU is spending its time.)
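The replacement rests on a simple identity: the floor of the fourth root equals two nested integer square roots. A Python sketch, with math.isqrt playing the role of Gambit's integer-sqrt:

```python
import math

def quartic_root(n):
    # floor(n ** (1/4)) for a non-negative integer n, computed with
    # two nested integer square roots -- the same trick as the
    # suggested (lambda (x) (integer-sqrt (integer-sqrt x))).
    return math.isqrt(math.isqrt(n))

# quartic_root(624) == 4   (since 5**4 == 625)
# quartic_root(625) == 5
```

Staying in exact integer arithmetic avoids both the cost and the rounding hazards of going through inexact square roots on bignums.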
It would be nice to see the startup time of each Scheme. Those that take very long to start up can't be considered for writing Unix-like one-shot tools like cat or grep. From my own trials, some take extremely long to start.
In results.GambitC you find
Testing compiler under GambitC
Including prelude /home/nex/src/r7rs-benchmarks/src/GambitC-prelude.scm
Compiling...
gambitc_comp /tmp/larcenous/GambitC/compiler.scm /tmp/larcenous/GambitC/compiler.exe
{standard input}: Assembler messages:
{standard input}:5355: Warning: end of file not at end of a line; newline inserted
{standard input}:6227: Error: no such instruction: `mo'
{standard input}: Error: open CFI at the end of file; missing .cfi_endproc directive
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.archlinux.org/> for instructions.
+!CSVLINE!+gambitc,compiler,COMPILEERROR
So it appears that it's not a Gambit bug per se, but rather a problem compiling the C file that Gambit produces.
I looked for this because I had no problem with running compiler
on my own Ubuntu box. I don't know why your setup has this problem.
Brad