I can't get this to run in a container. benchmarkme fa

benchmarkme fails about mkl4deb HOT 16 CLOSED

eduardszoecs commented on July 24, 2024

benchmarkme fails

from mkl4deb.

Comments (16)

dmbates commented on July 24, 2024 1

"picky" is in the mind of the beholder. I can't remember the details on the distribution of the eigenvalues of a matrix constructed in this way but it is extremely likely that a 2500 x 2500 cross-product matrix will be computationally singular. That's why solving such very large systems without any regularization is difficult.

eddelbuettel commented on July 24, 2024

It's a good idea to try benchmarkme on this! I did nothing more than you -- fire up the script (I posted). If there is an issue, it is probably an MKL issue.

And I seem to get the same problem:

> benchmark_std(runs=1)
# Programming benchmarks (5 tests):
        3,500,000 Fibonacci numbers calculation (vector calc): 0.437 (sec).
        Grand common divisors of 1,000,000 pairs (recursion): 0.645 (sec).
        Creation of a 3500x3500 Hilbert matrix (matrix calc): 0.122 (sec).
        Creation of a 3000x3000 Toeplitz matrix (loops): 7.09 (sec).
        Escoufier's method on a 60x60 matrix (mixed): 0.559 (sec).
# Matrix calculation benchmarks (5 tests):
        Creation, transp., deformation of a 5000x5000 matrix: 0.435 (sec).
        2500x2500 normal distributed random matrix ^1000: 0.36 (sec).
        Sorting of 7,000,000 random values: 0.53 (sec).
        2500x2500 cross-product matrix (b = a' * a): 0.329 (sec).
Error in solve(crossprod(a), crossprod(a, b)) : 
  the leading minor of order 1404 is not positive definite
Timing stopped at: 1.037 0.009 0.175

@csgillespie Any ideas? That benchmark does not even seem influenced by set.seed().

eddelbuettel commented on July 24, 2024

Poking @csgillespie as first version of previous post had a typo with a # where a @ was needed...

eddelbuettel commented on July 24, 2024

And I did notice that we can switch to RcppZiggurat if installed (which I did). I then get

        2500x2500 cross-product matrix (b = a' * a): 0.324 (sec).
Error in solve(crossprod(a), crossprod(a, b)) : 
  the leading minor of order 1566 is not positive definite
Timing stopped at: 1.166 0.028 0.199
>

so it looks like the MKL is very picky. That is not really an issue for this repo though. The script to install the MKL works...

eddelbuettel commented on July 24, 2024

Thanks so much for piping in. I was thinking about bugging you :) The actual routine appears to be this function using a dgeMatrix from your Matrix package. What is weird is that this is probably "old" code from the benchmark package originally put together by Simon. I am surprised this only bubbles up with MKL though.

eduardszoecs commented on July 24, 2024

Hmm, either a problem with the benchmark or with MKL. What do the Intel folks say (e.g. @emfomenk)? Are alternative benchmarks available?

eddelbuettel commented on July 24, 2024

As Doug said, a 2500x2500 matrix crossproduct may well be singular.

emfomenk commented on July 24, 2024

Hi everyone,

Not sure I know what exactly you call from MKL, but it might happen we have a bug :) But if this is only matrix-matrix multiplication I really doubt the bug is in MKL itself...
A few things to check:

@eddelbuettel, what is the threading model for MKL? MKL supports sequential, OpenMP and TBB runtimes. If OpenMP is used then there are two runtimes: GNU (libgomp.so) and Intel (libiomp5.so). You should not mix the two runtimes in one application. If benchmarkme uses GNU OpenMP, you really want to either set MKL_THREADING_LAYER=GNU, to make mkl_rt pick GNU threading, or do LD_PRELOAD=libiomp5.so (LD_LIBRARY_PATH should contain the path with libiomp5, which lives somewhere in /opt/intel). In the latter case libiomp5 would cover both runtimes.
You can set MKL_VERBOSE=1 to see what functions are called from MKL. That might help at least to understand what shapes and sizes you are using. Not sure that would immediately give me an idea of what goes wrong.
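
As a rough illustration of the two settings, the environment variables could be wrapped in small shell helpers. This is only a sketch: the function names `run_with_gnu_mkl` and `run_with_gnu_mkl_verbose` are made up for this example and are not part of MKL or mkl4deb; only the variable names MKL_THREADING_LAYER and MKL_VERBOSE come from the comment.

```shell
# Hypothetical helper (the name is an assumption): run the given command
# with mkl_rt told to pick the GNU OpenMP threading layer.
run_with_gnu_mkl() {
    # The assignment is a prefix to the launched command, so it is placed
    # in that command's environment.
    MKL_THREADING_LAYER=GNU "$@"
}

# Same idea, additionally asking MKL to report the functions it is called with.
run_with_gnu_mkl_verbose() {
    MKL_THREADING_LAYER=GNU MKL_VERBOSE=1 "$@"
}
```

With these defined, `run_with_gnu_mkl R` would start R the same way as typing `MKL_THREADING_LAYER=GNU R` at the prompt.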

Anyway, standalone reproducer would be super helpful for me to check on my side.

eddelbuettel commented on July 24, 2024

Thanks for coming over here, @emfomenk.

This repo 'merely' contains a script adding MKL to a .deb-based Debian or Ubuntu system. You can see in the rather short script what we do for ldconf: not much.

The particular failing function from the benchmarkme package is here -- the solve() after the two crossprod() calls goes funny. Adding MKL_VERBOSE to the R session I have for this (in Docker) made no difference.

ldd shows nothing pertaining to OpenMP but it may well be dispatched by libmkl_rt.so:

root@c9f8062fbd93:~# ldd /opt/intel/mkl/lib/intel64/libmkl_rt.so
        linux-vdso.so.1 (0x00007ffc625f8000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9876403000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9876049000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f9876c84000)
root@c9f8062fbd93:~#

benchmarkme itself is just a set of R functions; R by default does not turn OpenMP on (but can).

Setting MKL_THREADING_LAYER=GNU before calling R made the difference:

root@c9f8062fbd93:~# MKL_THREADING_LAYER=GNU R

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"              
Copyright (C) 2018 The R Foundation for Statistical Computing   
Platform: x86_64-pc-linux-gnu (64-bit)                         
 
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions. 
Type 'license()' or 'licence()' for distribution details.    

  Natural language support but running in an English locale    

R is a collaborative project with many contributors.    
Type 'contributors()' for more information and      
'citation()' on how to cite R or R packages in publications. 

Type 'demo()' for some demos, 'help()' for on-line help, or  
'help.start()' for an HTML browser interface to help.      
Type 'q()' to quit R. 

> library(benchmarkme)
See https://jumpingrivers.shinyapps.io/benchmarkme/ for a Shiny
interface to the benchmark data.
> benchmark_std(runs=1)
# Programming benchmarks (5 tests):
        3,500,000 Fibonacci numbers calculation (vector calc): 0.426 (sec).
        Grand common divisors of 1,000,000 pairs (recursion): 0.616 (sec).
        Creation of a 3500x3500 Hilbert matrix (matrix calc): 0.125 (sec).
        Creation of a 3000x3000 Toeplitz matrix (loops): 7.28 (sec).
        Escoufier's method on a 60x60 matrix (mixed): 0.572 (sec).
# Matrix calculation benchmarks (5 tests):
        Creation, transp., deformation of a 5000x5000 matrix: 0.43 (sec).
        2500x2500 normal distributed random matrix ^1000: 0.367 (sec).
        Sorting of 7,000,000 random values: 0.535 (sec).
        2500x2500 cross-product matrix (b = a' * a): 0.078 (sec).
        Linear regr. over a 3000x3000 matrix (c = a \ b'): 0.101 (sec).
# Matrix function benchmarks (5 tests):
        Cholesky decomposition of a 3000x3000 matrix: 0.113 (sec).
        Determinant of a 2500x2500 random matrix: 0.084 (sec).
        Eigenvalues of a 640x640 random matrix: 0.186 (sec).
        FFT over 2,500,000 random values: 0.199 (sec).
        Inverse of a 1600x1600 random matrix: 0.072 (sec).
    user system elapsed          test test_group cores
1  0.411  0.016   0.426           fib       prog     0
2  0.584  0.032   0.616           gcd       prog     0
3  0.097  0.028   0.125       hilbert       prog     0
4  7.275  0.008   7.283      toeplitz       prog     0
5  3.292  0.039   0.572     escoufier       prog     0
6  0.406  0.024   0.430         manip matrix_cal     0
7  0.367  0.000   0.367         power matrix_cal     0
8  0.515  0.020   0.535          sort matrix_cal     0
9  0.354  0.056   0.078 cross_product matrix_cal     0
10 0.353  0.024   0.101            lm matrix_cal     0
11 0.316  0.016   0.113      cholesky matrix_fun     0
12 0.308  0.040   0.084   determinant matrix_fun     0
13 1.075  0.004   0.186         eigen matrix_fun     0
14 0.191  0.008   0.199           fft matrix_fun     0
15 0.402  0.004   0.072       inverse matrix_fun     0
>

I will ask @csgillespie to add that environment variable, or to document it. I will document it here too.

Thanks again!

csgillespie commented on July 24, 2024

I know I'm late to the party but two comments to wrap up:

As Dirk mentioned, I just used Simon's benchmarks (http://r.research.att.com/benchmarks/R-benchmark-25.R).
Be wary of comparing with historical benchmarks. Basically, different versions of R use the compiler package in different ways. I intend to update the package soon to make this clear.

emfomenk commented on July 24, 2024

No problem :)

Hmm... It is really strange MKL_VERBOSE doesn't trigger any verbose output. Currently all the functions from BLAS, LAPACK, and DFT should dump the parameters to stdout when MKL_VERBOSE environment variable is set. Really weird.

Regarding ldd. MKL supports 3 linking modes:

static (libmkl_*.a)
dynamic (libmkl_rt.so), rt stands for load dependencies at RunTime
explicit dynamic (libmkl_*.so, except for libmkl_rt.so)

libmkl_rt.so loads required MKL layers at runtime. Depending on environment variables it might load C LP64/C ILP64/GNU Fortran LP64/GNU Fortran ILP64 interface, Intel OpenMP/GNU OpenMP/Intel TBB/sequential threading, and core library (architecture specific). For more information see this page.

If no environment variable is set, the default configuration is C LP64 + Intel OpenMP threading layer. So at run-time MKL will load libiomp5.so and libmkl_intel_thread.so. If your application or its dependencies use GNU OpenMP (e.g. some parts of it are built with gcc and the -fopenmp flag) then the Linux dynamic linker will load the GNU OpenMP RT as well. So at that moment your application will have 2 OpenMP run-times. That typically leads to numerical errors or even crashes.
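
One way to see whether a process has ended up with both run-times is to look at the shared objects it has mapped. The sketch below is an illustration only: the function names are invented for this example, and it assumes the listing (for instance the contents of /proc/<pid>/maps on Linux) is fed in on stdin.

```shell
# Hypothetical check (function names are assumptions): read a listing of
# loaded shared objects on stdin and print each OpenMP runtime found once.
openmp_runtimes_loaded() {
    # "|| true" keeps the pipeline quiet when grep finds no match at all.
    { grep -oE 'lib(gomp|iomp5)\.so' || true; } | sort -u
}

# Exit status 0 when both the GNU and the Intel runtime appear in the
# listing, i.e. the mixed situation described above.
has_mixed_openmp() {
    [ "$(openmp_runtimes_loaded | wc -l)" -eq 2 ]
}
```

For a running R session whose process id is stored in a variable `pid` (an assumption for this example), `has_mixed_openmp < /proc/$pid/maps && echo mixed` would flag the problem.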

eddelbuettel commented on July 24, 2024

Splendid explanation. I am the one building this R version for Debian (and Ubuntu) and we definitely have that enabled in this build, so we surely need the env var to not load it again:

root@11acb27fced5:~# grep -i openmp /etc/R/Makeconf 
DYLIB_LDFLAGS = -shared -fopenmp# $(CFLAGS) $(CPICFLAGS)
MAIN_LDFLAGS = -Wl,--export-dynamic -fopenmp
SHLIB_OPENMP_CFLAGS = -fopenmp
SHLIB_OPENMP_CXXFLAGS = -fopenmp
SHLIB_OPENMP_FCFLAGS = -fopenmp
SHLIB_OPENMP_FFLAGS = -fopenmp
root@11acb27fced5:~#

So to recap MKL_THREADING_LAYER=GNU should protect us, correct?

emfomenk commented on July 24, 2024

So to recap MKL_THREADING_LAYER=GNU should protect us, correct?

Yes, as long as the whole application uses either GNU OpenMP RT or no OpenMP RT at all.

eduardszoecs commented on July 24, 2024

I can confirm that MKL_THREADING_LAYER=GNU works for me. Thanks for your help! The speedup compared to OpenBLAS is mainly visible in eigenvalues and FFT.

eddelbuettel commented on July 24, 2024

Thanks to @Edild for raising this, and to @emfomenk for the excellent follow-up. I added this to both README.md and the actual script.sh.
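
The thread does not show what was added to the script, so the following is only a sketch of one possible way to make the setting persistent. The function name is hypothetical, and which environment file is appropriate on a given system is an assumption left to the caller.

```shell
# Hypothetical helper (not taken from mkl4deb): append
# MKL_THREADING_LAYER=GNU to the environment file given as first argument,
# unless that file already contains a setting for the variable.
persist_mkl_threading_layer() {
    env_file="$1"
    # Create the file if it is missing so grep has something to read.
    touch "$env_file"
    if ! grep -q '^MKL_THREADING_LAYER=' "$env_file"; then
        echo 'MKL_THREADING_LAYER=GNU' >> "$env_file"
    fi
}
```

Calling it twice on the same file leaves a single line, so it should be safe to re-run as part of an install script.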

emfomenk commented on July 24, 2024

Hi,

Thanks for the fix and work you do!
I have a minor comment to the commit though.
Please see here.
