ecraven / r7rs-benchmarks

Benchmarks for various Scheme implementations. Taken with kind permission from the Larceny project, based on the Gabriel and Gambit benchmarks.
The CPU limit in the bench script is 300 seconds of CPU time (5 minutes): compilation gets 5 minutes, then running gets 5 minutes. Unfortunately, this penalizes parallelization, whose goal is usually to minimize wall-clock time rather than CPU time, and the end result of the benchmarking process is reported as wall-clock time. This effect gets worse the more cores you have.
I think Guile has hit this limit in ctak at run time, and now that some optimizations give the compiler more to do, we hit it at compile time as well.
I think the limit should be raised to 15 minutes. It's a decent trade-off between wanting short benchmark runs and allowing for parallelism in GC.
The Makefile references graph.scm, but it doesn't seem to be committed. Is it available?
chudnovsky is a bignum benchmark, as is pi; they both compute up to 500 digits of π.

Some chudnovsky benchmark times are so small that one can't have confidence in their accuracy (compared to startup times, for example). One could increase computation times by increasing the number of computed digits of π. However, I think a general benchmark suite should test performance on smallish bignums, as they are more useful in practice, so I would recommend increasing the number of repetitions instead of increasing the size of the bignums involved.
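To make the trade-off concrete, the core of such a computation can be sketched in Python with integer-only (bignum) arithmetic. This is an illustrative reimplementation of the kind of work a Chudnovsky π benchmark does, not the benchmark's actual code; math.isqrt stands in for an exact integer square root.

```python
import math

def pi_digits(n):
    # floor(pi * 10**n) via the Chudnovsky series, using only
    # integer (bignum) arithmetic in a fixed-point representation.
    prec = n + 10                   # guard digits
    one = 10 ** prec                # fixed-point scale
    c3_over_24 = 640320 ** 3 // 24
    k, a_k, a_sum, b_sum = 0, one, one, 0
    while a_k != 0:                 # each term adds ~14 digits
        k += 1
        a_k = a_k * -((6 * k - 5) * (2 * k - 1) * (6 * k - 1))
        a_k //= k * k * k * c3_over_24
        a_sum += a_k
        b_sum += k * a_k
    total = 13591409 * a_sum + 545140134 * b_sum
    pi_fixed = 426880 * math.isqrt(10005 * one * one) * one // total
    return pi_fixed // 10 ** (prec - n)  # drop the guard digits
```

With small digit counts, almost all the time goes to startup and loop overhead rather than bignum work, which is why repeating the computation (as in the proposed diff below) gives more trustworthy numbers than one tiny run.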
So here's a proposed diff:
diff --git a/inputs/chudnovsky.input b/inputs/chudnovsky.input
index c310e06..46b9135 100644
--- a/inputs/chudnovsky.input
+++ b/inputs/chudnovsky.input
@@ -1,4 +1,4 @@
-2
+20
50
500
50
Brad
Please...
Thanks for maintaining this set of benchmarks.
In the 'bench' script, the chez command refers to '--libdirs /home/nex/scheme/chez'
Could you please describe how you set up the compatibility libraries for chez?
Thanks,
Jan Erik
It would be interesting to have S7 Scheme in the benchmark. It is an interpreted Scheme, a successor of TinyScheme, but much faster. S7 has a compatibility layer for R7RS; see below for a possible prelude.
I've performed some preliminary tests on my machine (MacBook Air 2019), and the results are the following (comparison with Guile 1.8.8 and 3.0.4).
test | S7 | Guile 1.8.8 | Guile 3.0.4 |
---|---|---|---|
browse | 24.27 | 80.32 | 12.060597 |
deriv | 25.194 | 61.39 | 18.581995 |
destruc | 52.077 | TIMELIM | 7.143701 |
diviter | 9.685 | 77.85 | 15.453743 |
divrec | 11.803 | 78.55 | 17.41294 |
puzzle | 27.716 | 191.35 | 18.086531 |
triangl | 33.931 | 98.16 | 8.519252 |
tak | 12.925 | 134.2 | 4.757643 |
takl | 20.968 | TIMELIM | 9.456034 |
ntakl | 17.073 | TIMELIM | 9.516082 |
cpstak | 103.358 | 221.03 | 59.444873 |
ctak | 44.139 | TIMELIM | TIMELIM |
fib | 10.218 | 195.78 | 12.090909 |
fibc | 25.799 | TIMELIM | TIMELIM |
fibfp | 1.885 | 45.98 | 22.001634 |
sum | 6.637 | 281.63 | 6.866215 |
sumfp | 2.499 | 105.1 | 42.058511 |
fft | 32.198 | TIMELIM | 7.685201 |
mbrot | 24.403 | TIMELIM | 50.086067 |
mbrotZ | 18.556 | TIMELIM | 67.011491 |
nucleic | 19.946 | 67.46 | 15.347245 |
pi | NO | TIMELIM | 0.564552 |
pnpoly | 17.981 | TIMELIM | 24.886723 |
ray | 20.455 | TIMELIM | 18.51229 |
simplex | 46.344 | TIMELIM | 13.895531 |
ack | 10.572 | TIMELIM | 8.413945 |
array1 | 11.483 | 160.88 | 9.241778 |
string | 1.714 | 1.82 | 1.872806 |
sum1 | 0.47 | 1.63 | 4.427402 |
cat | 1.187 | TIMELIM | 28.396944 |
tail | 1.188 | TIMELIM | 9.821691 |
wc | 8.266 | 57.91 | 16.963138 |
read1 | 406 | 0.95 | 5.804979 |
compiler | 41.155 | TIMELIM | 5.149011 |
conform | 51.031 | TIMELIM | 10.508732 |
dynamic | 22.736 | 69.58 | 7.374259 |
earley | TIMELIM | TIMELIM | 9.489885 |
graphs | 127.611 | TIMELIM | 23.026826 |
lattice | 139.275 | 292.7 | 15.937364 |
matrix | 72.073 | TIMELIM | 9.881781 |
maze | 23.258 | TIMELIM | 4.70391 |
mazefun | 19.51 | 129.61 | 9.664338 |
nqueens | 55.11 | TIMELIM | 19.372148 |
paraffins | 31.424 | TIMELIM | 4.24542 |
parsing | 39.443 | TIMELIM | 10.687959 |
peval | 29.677 | 98.91 | 15.644764 |
primes | 7.73 | 39.33 | 7.521318 |
quicksort | 93.996 | TIMELIM | 13.252736 |
scheme | 71.462 | TIMELIM | 15.142413 |
slatex | 32.069 | 48.96 | 45.047143 |
chudnovski | NO | TIMELIM | 0.306648 |
nboyer | 39.274 | 151.42 | 5.10214 |
sboyer | 31.537 | 168.81 | 4.755798 |
gcbench | 20.54 | TIMELIM | 3.511493 |
mperm | 173.33 | TIMELIM | 10.650118 |
equal | 781 | TIMELIM | TIMELIM |
bv2string | 10.782 | TIMELIM | 4.489627 |
chudnovski and pi fail, but it should be easy to arrange for them to run.
This is the s7.prelude I'm using:
(define (this-scheme-implementation-name) "s7")
(define exact-integer? integer?)
(define (exact-integer-sqrt i) (let ((sq (floor (sqrt i)))) (values sq (- i (* sq sq)))))
(define inexact exact->inexact)
(define exact inexact->exact)
(define (square x) (* x x))
(define (vector-map f v) (copy v)) ; for quicksort.scm
(define-macro (import . args) #f)
(define (jiffies-per-second) 1000)
(define (current-jiffy) (round (* (jiffies-per-second) (*s7* 'cpu-time))))
(define (current-second) (floor (*s7* 'cpu-time)))
(define read-u8 read-byte)
(define write-u8 write-byte)
(define u8-ready? char-ready?)
(define peek-u8 peek-char)
(define* (utf8->string v (start 0) end)
(if (string? v)
v
(substring (byte-vector->string v) start (or end (length v)))))
(define* (string->utf8 s (start 0) end)
(if (byte-vector? s)
s
(string->byte-vector (utf8->string s start end))))
(define write-simple write)
(define* (string->vector s (start 0) end)
(let ((stop (or end (length s))))
(copy s (make-vector (- stop start)) start stop)))
(define vector-copy string->vector)
(define* (vector-copy! dest at src (start 0) end) ; end is exclusive
(let ((len (or end (length src))))
(if (or (not (eq? dest src))
(<= at start))
(do ((i at (+ i 1))
(k start (+ k 1)))
((= k len) dest)
(set! (dest i) (src k)))
(do ((i (- (+ at len) start 1) (- i 1))
(k (- len 1) (- k 1)))
((< k start) dest)
(set! (dest i) (src k))))))
(define make-bytevector make-byte-vector)
(define bytevector-ref byte-vector-ref)
(define bytevector-set! byte-vector-set!)
(define bytevector-copy! vector-copy!)
(define bytevector-u8-ref byte-vector-ref)
(define bytevector-u8-set! byte-vector-set!)
;; records
(define-macro (define-record-type type make ? . fields)
(let ((obj (gensym))
(args (map (lambda (field)
(values (list 'quote (car field))
(let ((par (memq (car field) (cdr make))))
(if (pair? par) (car par) #f))))
fields)))
`(begin
(define (,? ,obj)
(and (let? ,obj)
(eq? (let-ref ,obj 'type) ',type)))
(define ,make
(inlet 'type ',type ,@args))
,@(map
(lambda (field)
(when (pair? field)
(if (null? (cdr field))
(values)
(if (null? (cddr field))
`(define (,(cadr field) ,obj)
(let-ref ,obj ',(car field)))
`(begin
(define (,(cadr field) ,obj)
(let-ref ,obj ',(car field)))
(define (,(caddr field) ,obj val)
(let-set! ,obj ',(car field) val)))))))
fields)
',type)))
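One caveat about the shim near the top of the prelude: its exact-integer-sqrt goes through inexact (sqrt i), which can lose precision once i exceeds what a double represents exactly. The R7RS contract, sketched here in Python for illustration (math.isqrt standing in for an exact integer square root), is:

```python
import math

def exact_integer_sqrt(i):
    # R7RS semantics: return (s, r) with s*s + r == i, where s is
    # the largest integer whose square does not exceed i.
    s = math.isqrt(i)
    return s, i - s * s

# exact_integer_sqrt(17) == (4, 1)
# A float-based (floor (sqrt i)) can be off by one for large bignums,
# since doubles carry only ~53 bits of precision.
```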
I think that for a fairer comparison, the Chez benchmark should precompile the program and compatibility libraries before execution, as is done with Gambit.
Well, I know this might be off-topic, but I would really like to see a comparison against C of the Schemes (like Chez) that claim performance comparable to statically compiled languages like C. I did some unscientific benchmarks, and Chez was 2x to 10x slower; Racket was of course even slower. But, again, unscientific.
I've seen on Reddit and StackOverflow that people want benchmarks between Scheme and C. Adding C implementations would of course require some work, but not that much. If this is considered helpful, maybe I'll start translating some of the benchmarks into C.
Ypsilon is now developed at https://github.com/fujita-y/ypsilon and has become an LLVM-based R6RS/R7RS compiler.
The author claims that its performance is comparable to Guile 3.x on his machine. It would be great if Ypsilon could return to the benchmarking matrix officially.
There are two entries for Chez, effectively safe and unsafe. The fact that other entries are unlabelled makes it easy to assume that they are all "safe", but, as the Gambit prelude shows, some of them are actually "unsafe".
Looks like the mperm benchmark is failing on all Schemes. I checked on both Chez and Racket, and it appears the function run-benchmark is defined twice at the end of the file.
I was having the same problems until I did the following:
% raco pkg uninstall r7rs
% raco pkg install -i r7rs
now things (fib* so far) run for me.
It allows more programs to actually compile and run:
https://gist.github.com/gambiteer/03f2d16f3ca6e76489da70b4bed71984
The implementation names in the first two graphs of https://ecraven.github.io/r7rs-benchmarks/ seem to be consistent, except for the strange combined entry "gambit/gerbil" in the top-9 graph. Can this be split?
Hello!
Version 5 of CHICKEN Scheme was released in November 2018. Do you mind bumping the version used in the benchmark?
I have some patches, but am not allowed to contribute them since there is no license file.
Can I send you a PR to add one? Which license would you want?
Here is a project doing something similar https://github.com/dyu/ffi-overhead
I was wondering about the reasoning behind the current CHICKEN flags (in particular for C5). For example, according to the wiki, -O2 already includes -optimize-leaf-routines and -inline, so specifying these seems redundant.
I was also wondering if there's any particular reason not to compile with -O3, or even -C -O3, which passes the -O3 flag to the C compiler. I'm not entirely certain if or how these would work or break existing tests; the nature of the question is more exploratory.
Quite a number of these tests have a high garbage collection component. It's well known that allocation-heavy benchmarks will run faster with larger heap sizes, and different implementations may be tuned to different heap sizes relative to live data. For consistency, it would be good to tune all implementations to have the same heap size for each benchmark -- i.e. for each benchmark, determine the minimum heap size at which the benchmark runs on any implementation, and then run all implementations at, say, 2.5x that heap size.
For Guile you can do this by setting the GC_INITIAL_HEAP_SIZE and GC_MAXIMUM_HEAP_SIZE environment variables. Say you want to determine the minimum heap size for chudnovsky; then you run GC_INITIAL_HEAP_SIZE=3m GC_MAXIMUM_HEAP_SIZE=3m ./bench guile chudnovsky to try at 3 megabytes, and you vary the 3m until you find the smallest heap size at which the benchmark still runs. You record that size for chudnovsky, then do the same for all the others. For chudnovsky, for example, I find it to be 2700k or so. So, say we run at 2.5x that heap size; then when running the tests you do GC_INITIAL_HEAP_SIZE=6750k GC_MAXIMUM_HEAP_SIZE=6750k ./bench guile chudnovsky. But it's better to set GC_INITIAL_HEAP_SIZE only when running the compiled artifact and not the compiler!
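The bookkeeping for that procedure can be sketched in Python (the helper and its name are hypothetical; the 2.5x factor and the 2700k measurement come from the comment above):

```python
def heap_env(min_kb, factor=2.5):
    # Scale a measured minimum heap size (in KiB) by the chosen
    # factor and render the Boehm-GC environment settings that pin
    # Guile's heap to that size.
    kb = int(min_kb * factor)
    return f"GC_INITIAL_HEAP_SIZE={kb}k GC_MAXIMUM_HEAP_SIZE={kb}k"

# e.g. a measured 2700k minimum for chudnovsky scales to 6750k:
# heap_env(2700) == "GC_INITIAL_HEAP_SIZE=6750k GC_MAXIMUM_HEAP_SIZE=6750k"
```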
Anyway, a thought, just if you were interested :) I will probably do this for Guile at some point for our internal benchmarks.
total-accumulated-runtime shows Chez with a small lead over Gambit, and Gambit with a small lead over Larceny. But tests-finished shows that Chez is only running 45 benchmarks, while Gambit is running 51 and Larceny is running 54. Are Gambit and Larceny taking a hit in that first graph just because they're doing more work?
PS Really, really love this!
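One rough way to adjust for the differing benchmark counts (a hypothetical normalization, not something the published graphs compute) would be average runtime per finished benchmark:

```python
def mean_runtime(total_seconds, tests_finished):
    # Average seconds per completed benchmark.  Still imperfect: it
    # ignores *which* benchmarks each implementation finished, and a
    # slow benchmark that times out drops out of the total entirely,
    # lowering both the numerator and the denominator.
    return total_seconds / tests_finished
```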
Can you please add benchmarks for it? Marking version 6.12 is interesting, because in 7.0 Racket switched to the Chez Scheme 'backend'.
Adding Clojure is out of scope, I guess?
Will Clinger has updated his versions of a few of these benchmarks, see
http://www.larcenists.org/benchmarksAboutR7.html
This patch for fft.scm and paraffins.scm gets your versions to match his.
Brad
patch.txt
firefly:~/programs/r7rs-benchmarks> git diff src/GambitC-prelude.scm
diff --git a/src/GambitC-prelude.scm b/src/GambitC-prelude.scm
index 51ffde0..73f9cdd 100644
--- a/src/GambitC-prelude.scm
+++ b/src/GambitC-prelude.scm
@@ -35,6 +35,23 @@
(define write-string write)
(define (this-scheme-implementation-name) (string-append "gambitc-" (system-version-string)))
+
+(define (string->utf8 s)
+ (with-output-to-u8vector
+ '()
+ (lambda ()
+ (display s))))
+
+(define (utf8->string v)
+ (call-with-input-u8vector
+ v
+ (lambda (p)
+ (list->string (read-all p read-char)))))
+
+(define make-bytevector make-u8vector)
+
+(define bytevector-u8-set! u8vector-set!)
+
;; TODO: load syntax-case here, to get syntax-rules.
;; google says (load "~~/syntax-case"), but that doesn't work on my machine :-/
I'm not sure that these lines in the femtolisp prelude:
+(define utf8->string identity)
+(define string->utf8 identity)
are really true: are UTF-8-encoded strings really just byte vectors in femtolisp? It does make the benchmark go faster, though!
In PR #15, a change was made to ensure all implementations were run in "safe mode". However this was reverted in b196000. Currently the benchmarks compare safe and unsafe implementations. What's the goal here?
My expectation would be that all Schemes should be compiled in such a way that they don't use unsafe optimizations.
r7rs has exact-integer-sqrt.
In Gambit, if you replace square-root with integer-sqrt, and quartic root with (lambda (x) (integer-sqrt (integer-sqrt x))), then the results on my machine go from
+!CSVLINE!+gambitc-v4.8.5,pi:50:500:50:2,.7766382694244385
to
+!CSVLINE!+gambitc-v4.8.5,pi:50:500:50:2,.03497314453125
I.e., it's about 20 times as fast. Perhaps it would be a better benchmark if a similar replacement were made. (In other words, perhaps most of the time is being spent in inefficient implementations of width and root, which may not be where you think the CPU is spending its time.)
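The replacement rests on a simple identity: the floor of the fourth root equals two nested integer square roots. A Python sketch, with math.isqrt playing the role of Gambit's integer-sqrt:

```python
import math

def quartic_root(n):
    # floor(n ** (1/4)) for a non-negative integer n, computed with
    # two nested integer square roots -- the same trick as the
    # suggested (lambda (x) (integer-sqrt (integer-sqrt x))).
    return math.isqrt(math.isqrt(n))

# quartic_root(624) == 4   (since 5**4 == 625)
# quartic_root(625) == 5
```

Staying in exact integer arithmetic avoids both the cost and the rounding hazards of going through inexact square roots on bignums.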
It would be nice to see the startup time of each Scheme. Those that take very long to start up can't be considered for writing Unix-like one-shot tools like cat or grep. From my own trials, some take extremely long to start.
In results.GambitC you find
Testing compiler under GambitC
Including prelude /home/nex/src/r7rs-benchmarks/src/GambitC-prelude.scm
Compiling...
gambitc_comp /tmp/larcenous/GambitC/compiler.scm /tmp/larcenous/GambitC/compiler.exe
{standard input}: Assembler messages:
{standard input}:5355: Warning: end of file not at end of a line; newline inserted
{standard input}:6227: Error: no such instruction: `mo'
{standard input}: Error: open CFI at the end of file; missing .cfi_endproc directive
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.archlinux.org/> for instructions.
+!CSVLINE!+gambitc,compiler,COMPILEERROR
So it appears that it's not a Gambit bug per se, but rather a problem compiling the C file that Gambit produces.
I looked for this because I had no problem with running compiler
on my own Ubuntu box. I don't know why your setup has this problem.
Brad