Comments (16)
"picky" is in the mind of the beholder. I can't remember the details on the distribution of the eigenvalues of a matrix constructed in this way but it is extremely likely that a 2500 x 2500 cross-product matrix will be computationally singular. That's why solving such very large systems without any regularization is difficult.
from mkl4deb.
It's a good idea to try benchmarkme on this! I did nothing more than you -- fire up the script (I posted). If there is an issue, it is probably an MKL issue.
And I seem to get the same problem:
> benchmark_std(runs=1)
# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.437 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 0.645 (sec).
Creation of a 3500x3500 Hilbert matrix (matrix calc): 0.122 (sec).
Creation of a 3000x3000 Toeplitz matrix (loops): 7.09 (sec).
Escoufier's method on a 60x60 matrix (mixed): 0.559 (sec).
# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5000x5000 matrix: 0.435 (sec).
2500x2500 normal distributed random matrix ^1000: 0.36 (sec).
Sorting of 7,000,000 random values: 0.53 (sec).
2500x2500 cross-product matrix (b = a' * a): 0.329 (sec).
Error in solve(crossprod(a), crossprod(a, b)) :
the leading minor of order 1404 is not positive definite
Timing stopped at: 1.037 0.009 0.175
@csgillespie Any ideas? That benchmark does not even seem influenced by set.seed().
from mkl4deb.
Poking @csgillespie as first version of previous post had a typo with a # where a @ was needed...
from mkl4deb.
And I did notice that we can switch to RcppZiggurat if installed (which I did). I then get
2500x2500 cross-product matrix (b = a' * a): 0.324 (sec).
Error in solve(crossprod(a), crossprod(a, b)) :
the leading minor of order 1566 is not positive definite
Timing stopped at: 1.166 0.028 0.199
>
so it looks like the MKL is very picky. That is not really an issue for this repo though. The script to install the MKL works...
from mkl4deb.
Thanks so much for piping in. I was thinking about bugging you :) The actual routine appears to be this function using a dgeMatrix from your Matrix package. What is weird is that this is probably "old" code from the benchmark package originally put together by Simon. I am surprised this only bubbles up with MKL though.
from mkl4deb.
Hmm, either a problem with the benchmark or MKL. What do the Intel folks say (e.g. @emfomenk)? Are alternative benchmarks available?
from mkl4deb.
As Doug said, a 2000x2000 matrix crossproduct may well be singular.
from mkl4deb.
Hi everyone,
Not sure I know exactly what you call from MKL, but it might be that we have a bug :) But if this is only matrix-matrix multiplication, I really doubt the bug is in MKL itself...
A few things to check:
- @eddelbuettel, what is the threading model for MKL? MKL supports sequential, OpenMP and TBB runtimes. If OpenMP is used, then there are two runtimes: GNU (libgomp.so) and Intel (libiomp5.so). You should not mix the two runtimes in one application. If benchmarkme uses GNU OpenMP, you really want to either set MKL_THREADING_LAYER=GNU, to make mkl_rt pick GNU threading, or do LD_PRELOAD=libiomp5.so (LD_LIBRARY_PATH should contain the path with libiomp5, which lives somewhere in /opt/intel). In the latter case libiomp5 would cover both runtimes.
- You can set MKL_VERBOSE=1 to see which functions are called from MKL. That might help at least to understand the shapes and sizes you are using. Not sure that would immediately give me an idea of what goes wrong.
Anyway, standalone reproducer would be super helpful for me to check on my side.
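The first suggestion above can be sketched as a small shell wrapper. This is only a minimal sketch: the function name `with_mkl_gnu` is hypothetical, and only the `MKL_THREADING_LAYER` variable and its `GNU` value come from the comment above.

```shell
# Hypothetical helper: run any command with MKL's runtime dispatcher
# asked to use the GNU OpenMP threading layer.
# The variable is set only for the launched command, not for the calling shell.
with_mkl_gnu() {
    env MKL_THREADING_LAYER=GNU "$@"
}

# Example invocation (commented out; assumes R is on the PATH):
# with_mkl_gnu R
```

Scoping the variable to one command keeps the change easy to undo while testing whether the threading layer is the culprit.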
from mkl4deb.
Thanks for coming over here, @emfomenk.
This repo 'merely' contains a script adding MKL to a .deb-based Debian or Ubuntu system. You can see in the rather short script what we do for ldconf: not much.
The particular failing function from the benchmarkme package is here -- the solve() after the two crossprod calls goes funny. Adding MKL_VERBOSE to the R session I have for this (in Docker) made no difference.
ldd shows nothing pertaining to OpenMP but it may well be dispatched by libmkl_rt.so:
root@c9f8062fbd93:~# ldd /opt/intel/mkl/lib/intel64/libmkl_rt.so
linux-vdso.so.1 (0x00007ffc625f8000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f9876403000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9876049000)
/lib64/ld-linux-x86-64.so.2 (0x00007f9876c84000)
root@c9f8062fbd93:~#
benchmarkme itself is just a set of R functions; R by default does not turn OpenMP on (but can).
Setting MKL_THREADING_LAYER=GNU before calling R made the difference:
root@c9f8062fbd93:~# MKL_THREADING_LAYER=GNU R
R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(benchmarkme)
See https://jumpingrivers.shinyapps.io/benchmarkme/ for a Shiny
interface to the benchmark data.
> benchmark_std(runs=1)
# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.426 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 0.616 (sec).
Creation of a 3500x3500 Hilbert matrix (matrix calc): 0.125 (sec).
Creation of a 3000x3000 Toeplitz matrix (loops): 7.28 (sec).
Escoufier's method on a 60x60 matrix (mixed): 0.572 (sec).
# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5000x5000 matrix: 0.43 (sec).
2500x2500 normal distributed random matrix ^1000: 0.367 (sec).
Sorting of 7,000,000 random values: 0.535 (sec).
2500x2500 cross-product matrix (b = a' * a): 0.078 (sec).
Linear regr. over a 3000x3000 matrix (c = a \ b'): 0.101 (sec).
# Matrix function benchmarks (5 tests):
Cholesky decomposition of a 3000x3000 matrix: 0.113 (sec).
Determinant of a 2500x2500 random matrix: 0.084 (sec).
Eigenvalues of a 640x640 random matrix: 0.186 (sec).
FFT over 2,500,000 random values: 0.199 (sec).
Inverse of a 1600x1600 random matrix: 0.072 (sec).
    user system elapsed          test test_group cores
1  0.411  0.016   0.426           fib       prog     0
2  0.584  0.032   0.616           gcd       prog     0
3  0.097  0.028   0.125       hilbert       prog     0
4  7.275  0.008   7.283      toeplitz       prog     0
5  3.292  0.039   0.572     escoufier       prog     0
6  0.406  0.024   0.430         manip matrix_cal     0
7  0.367  0.000   0.367         power matrix_cal     0
8  0.515  0.020   0.535          sort matrix_cal     0
9  0.354  0.056   0.078 cross_product matrix_cal     0
10 0.353  0.024   0.101            lm matrix_cal     0
11 0.316  0.016   0.113      cholesky matrix_fun     0
12 0.308  0.040   0.084   determinant matrix_fun     0
13 1.075  0.004   0.186         eigen matrix_fun     0
14 0.191  0.008   0.199           fft matrix_fun     0
15 0.402  0.004   0.072       inverse matrix_fun     0
>
I will ask @csgillespie to add that environment variable, or to document it. I will document it here too.
Thanks again!
from mkl4deb.
I know I'm late to the party but two comments to wrap up:
- As Dirk mentioned, I just used Simon's benchmarks (http://r.research.att.com/benchmarks/R-benchmark-25.R).
- Be wary of comparing with historical benchmarks. Basically, different versions of R use the compiler package in different ways. I intend to update the package soon to make this clear.
from mkl4deb.
No problem :)
Hmm... It is really strange that MKL_VERBOSE doesn't trigger any verbose output. Currently all the functions from BLAS, LAPACK, and DFT should dump their parameters to stdout when the MKL_VERBOSE environment variable is set. Really weird.
Regarding ldd. MKL supports 3 linking modes:
- static (libmkl_*.a)
- dynamic (libmkl_rt.so), rt stands for load dependencies at RunTime
- explicit dynamic (libmkl_*.so, except for libmkl_rt.so)
libmkl_rt.so loads required MKL layers at runtime. Depending on environment variables it might load C LP64/C ILP64/GNU Fortran LP64/GNU Fortran ILP64 interface, Intel OpenMP/GNU OpenMP/Intel TBB/sequential threading, and core library (architecture specific). For more information see this page.
If no environment variable is set, the default configuration is C LP64 + Intel OpenMP threading layer. So at run-time MKL will load libiomp5.so and libmkl_intel_thread.so. If your application or its dependencies use GNU OpenMP (e.g. some parts of it are built with gcc and the -fopenmp flag), then the Linux dynamic linker will load the GNU OpenMP runtime as well. At that point your application will have two OpenMP runtimes, which typically leads to numerical errors or even crashes.
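One way to check for the two-runtime situation described above is to look at which shared objects a running process has mapped. The following is a rough sketch, assuming a Linux system where `/proc/<pid>/maps` is readable; the function names `openmp_runtimes` and `openmp_report` are hypothetical, and matching on the file names libgomp.so and libiomp5.so is a simplification.

```shell
# List the distinct OpenMP runtime libraries named in a maps file.
# $1: path to a file in /proc/<pid>/maps format.
openmp_runtimes() {
    # grep exits non-zero when nothing matches; treat that as "no runtimes".
    { grep -oE 'lib(gomp|iomp5)\.so' "$1" || true; } | sort -u
}

# Print "mixed" if both GNU and Intel OpenMP runtimes are mapped,
# "single" if only one of them is, and "none" otherwise.
openmp_report() {
    found=$(openmp_runtimes "$1")
    case "$found" in
        *libgomp.so*libiomp5.so*) echo "mixed" ;;   # sort puts libgomp first
        "")                       echo "none" ;;
        *)                        echo "single" ;;
    esac
}

# Example invocation (commented out; replace <pid> with a real process id):
# openmp_report /proc/<pid>/maps
```

Since the layers are loaded at run time, as explained above, the check is probably most informative after the process has already made at least one MKL call.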
from mkl4deb.
Splendid explanation. I am the one building this R version for Debian (and Ubuntu) and we definitely have that enabled in this build, so we surely need the env var to not load it again:
root@11acb27fced5:~# grep -i openmp /etc/R/Makeconf
DYLIB_LDFLAGS = -shared -fopenmp# $(CFLAGS) $(CPICFLAGS)
MAIN_LDFLAGS = -Wl,--export-dynamic -fopenmp
SHLIB_OPENMP_CFLAGS = -fopenmp
SHLIB_OPENMP_CXXFLAGS = -fopenmp
SHLIB_OPENMP_FCFLAGS = -fopenmp
SHLIB_OPENMP_FFLAGS = -fopenmp
root@11acb27fced5:~#
So to recap MKL_THREADING_LAYER=GNU should protect us, correct?
from mkl4deb.
> So to recap MKL_THREADING_LAYER=GNU should protect us, correct?
Yes, as long as the whole application uses either GNU OpenMP RT or no OpenMP RT at all.
from mkl4deb.
I can confirm that MKL_THREADING_LAYER=GNU works for me. Thanks for your help! The speedup compared to openblas is mainly visible in eigenvalues & fft.
from mkl4deb.
Thanks to @Edild for raising this, and to @emfomenk for the excellent follow-up. I added this to both README.md and the actual script.sh.
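The thread does not show what exactly was added to the script. As an illustration only, one way to make the setting persistent is to append it to an environment file without duplicating it on repeated runs. The helper name `persist_env_line` is hypothetical, and using /etc/environment as the target is an assumption, not something confirmed here.

```shell
# Hypothetical helper: append a NAME=value line to a file
# unless that exact line is already present.
# $1: the line to add, $2: the target file.
persist_env_line() {
    line=$1
    file=$2
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example invocation (commented out; needs root, and the path is an assumption):
# persist_env_line "MKL_THREADING_LAYER=GNU" /etc/environment
```

The existence check makes the step safe to rerun, which matters for an install script that may be executed more than once. Whether a given session or container actually reads the chosen file is a separate question worth verifying.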
from mkl4deb.
Hi,
Thanks for the fix and work you do!
I have a minor comment to the commit though.
Please see here.
from mkl4deb.