Comments (6)
I think the issue is that the pqR version did not use OpenBLAS.
from pqr.
Hi. Could you give some more details? Is there a problem getting pqR to link with OpenBLAS, or had you just not done that? If you do link with OpenBLAS, crossprod will be done using it, but %*% will not be done with OpenBLAS unless you set the mat_mult_with_BLAS option to TRUE. Do you know whether OpenBLAS is thread safe? (That is, whether it would be OK to call it from more than one thread. Even if this gives the right answer, if OpenBLAS is itself using multiple threads, it might not be a good idea.) If not, you should configure with --disable-mat-mult-with-BLAS-in-helpers. (Of course, this isn't necessary if you've configured with --disable-helpers.)
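To make the crossprod versus %*% distinction concrete, here is a minimal sketch. It assumes the mat_mult_with_BLAS option is set through options(); that call is an assumption about pqR, and standard R simply stores the value without acting on it. The matrix sizes are arbitrary.

```r
# Assumption: pqR reads this option via options(); standard R ignores it,
# so the line is harmless there.
options(mat_mult_with_BLAS = TRUE)

set.seed(1)
A <- matrix(rnorm(200 * 50), 200, 50)

# crossprod(A) computes t(A) %*% A without forming the transpose explicitly.
cp <- crossprod(A)
mm <- t(A) %*% A

# Both routes should agree up to floating-point rounding.
print(all.equal(cp, mm))
```

The two results are mathematically the same product, so any timing difference between them reflects which multiply routine is used, not the computation itself.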
Hi,
first thank you for working on this project!
Regarding the reported "issue":
When I compiled pqR I thought I compiled it with OpenBLAS, but after reporting this issue I quickly ran my standard R with the default BLAS (not with OpenBLAS as in the benchmark) and realized that the performance is comparable to the "slow" results from pqR. So I checked pqR; turns out I was not using OpenBLAS in pqR.
So the slow crossprod() was due to the BLAS in use and had nothing to do with pqR. (= issue closed)
> Do you know whether OpenBLAS is thread safe?
I would have to check. But I also think that using multi-threaded OpenBLAS in a multi-threaded program might cause problems...
Here is a quick follow-up:
According to https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-multi-threaded it is advisable to limit the number of OpenBLAS threads to 1 if used in a multi-threaded program.
Running pqR with OpenBLAS works and I get the same timings for crossprod() in pqR and std. R:
      n=100 n=200 n=300 n=400 n=500 n=600 n=700 n=800 n=900 n=1000 n=1100
std_R 0.000 0.001 0.003 0.007 0.015 0.026 0.039 0.059  0.08   0.11  0.143
pqR   0.001 0.001 0.003 0.007 0.015 0.025 0.040 0.057  0.08   0.11  0.148

      n=1200 n=1300 n=1400 n=1500 n=1600 n=1700 n=1800 n=1900 n=2000
std_R  0.188  0.235  0.293  0.355  0.435  0.513  0.613  0.713  0.833
pqR    0.184  0.237  0.290  0.358  0.430  0.517  0.609  0.717  0.828
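The benchmark script itself is not shown in this thread, so the following is only a sketch of how timings like these could be collected. It assumes square random matrices and keeps the fastest of a few runs; the helper name time_crossprod and the sizes are hypothetical. To limit OpenBLAS to one thread, the OPENBLAS_NUM_THREADS environment variable can be set to 1 before R is started.

```r
# Hypothetical helper: time crossprod() on a random n x n matrix.
time_crossprod <- function(n, reps = 3) {
  A <- matrix(rnorm(n * n), n, n)
  # Keep the fastest of several runs to reduce timing noise.
  min(replicate(reps, system.time(crossprod(A))[["elapsed"]]))
}

# Small sizes so the sketch finishes quickly; the thread went up to n=2000.
sizes <- c(100, 200, 300)
timings <- sapply(sizes, time_crossprod)
names(timings) <- paste0("n=", sizes)
print(timings)
```

Running the same script under both interpreters with the same BLAS should give comparable numbers if the BLAS dominates the cost.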
Thanks for the follow-up information. Are those timings with only one thread in OpenBLAS?
By the way, if you use OpenBLAS (or any other BLAS, including the one packaged with pqR) for matrix multiplies (with %*%, not crossprod) with helper threads, pqR may still do multiplies in parallel, but will not pipeline the output of a multiply to another operation. So, for example, L <- list(A %*% B, B %*% A) will do two BLAS multiplies in parallel (unless the matrices are small), but A %*% (B %*% C) will not (though it will if you use the pipelined C routines for the multiply).
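The two expressions can be written out as a small runnable sketch. The matrix size is an arbitrary assumption, and in standard R both forms simply run sequentially; the parallel and pipelining behaviour described above is specific to pqR with helper threads.

```r
set.seed(1)
n <- 100
A <- matrix(rnorm(n * n), n, n)
B <- matrix(rnorm(n * n), n, n)
C <- matrix(rnorm(n * n), n, n)

# Two independent products: neither needs the other's result,
# so they are candidates for being computed in parallel.
L <- list(A %*% B, B %*% A)

# Nested product: the outer multiply consumes the inner result,
# so it can only overlap with the inner one through pipelining.
D <- A %*% (B %*% C)

print(sapply(L, dim))
print(dim(D))
```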
It might be interesting to experiment with using threads for both pqR and OpenBLAS, even if they don't advise it. If you have four or more cores, you could try one helper thread and two OpenBLAS threads, which I would think would use at most four cores, if it works as I imagine.
These timings are for OpenBLAS using one thread. On my machine (Intel Q9400, 4 cores @ 2.6 GHz), using 2 threads halves the time; using 3 or 4 threads brings no improvement over 2 threads.
Timings for runs with 1,2,3,4 threads:
structure(list(num = c(100, 200, 300, 400, 500, 600, 700, 800,
900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900,
2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000
), ncpu1 = c(0, 0.001, 0.003, 0.007, 0.015, 0.025, 0.038, 0.057,
0.08, 0.107, 0.143, 0.184, 0.235, 0.29, 0.357, 0.43, 0.513, 0.609,
0.714, 0.83, 0.972, 1.11, 1.254, 1.44, 1.622, 1.804, 2.036, 2.268,
2.516, 2.778), ncpu2 = c(0, 0.001, 0.002, 0.004, 0.009, 0.014,
0.023, 0.032, 0.049, 0.064, 0.082, 0.103, 0.128, 0.157, 0.192,
0.232, 0.274, 0.322, 0.378, 0.439, 0.524, 0.597, 0.66, 0.76,
0.852, 0.944, 1.07, 1.2, 1.319, 1.455), ncpu3 = c(0.001, 0.001,
0.003, 0.005, 0.008, 0.014, 0.022, 0.033, 0.049, 0.063, 0.082,
0.102, 0.128, 0.157, 0.194, 0.23, 0.274, 0.324, 0.377, 0.438,
0.515, 0.588, 0.659, 0.762, 0.853, 0.943, 1.073, 1.19, 1.318,
1.458), ncpu4 = c(0.001, 0.001, 0.003, 0.004, 0.009, 0.015, 0.022,
0.032, 0.049, 0.064, 0.081, 0.104, 0.128, 0.157, 0.192, 0.23,
0.274, 0.323, 0.378, 0.439, 0.515, 0.587, 0.688, 0.767, 0.854,
0.955, 1.074, 1.198, 1.326, 1.454)), .Names = c("num", "ncpu1",
"ncpu2", "ncpu3", "ncpu4"), row.names = c(NA, -30L), class = "data.frame")
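As a rough check of the scaling, the speedup over one thread can be computed directly. This sketch copies only the rows for n = 1000, 2000 and 3000 from the data above; the variable names are illustrative.

```r
# Rows for n = 1000, 2000, 3000, copied from the data frame above.
tm <- data.frame(
  num   = c(1000, 2000, 3000),
  ncpu1 = c(0.107, 0.830, 2.778),
  ncpu2 = c(0.064, 0.439, 1.455),
  ncpu3 = c(0.063, 0.438, 1.458),
  ncpu4 = c(0.064, 0.439, 1.454)
)

# Speedup relative to the single-threaded run, one column per thread count.
speedup <- tm$ncpu1 / tm[, c("ncpu2", "ncpu3", "ncpu4")]
rownames(speedup) <- paste0("n=", tm$num)
print(round(speedup, 2))
```

The ratios for 2, 3 and 4 threads are nearly identical at each size, which matches the observation that going beyond 2 threads brings no further gain on this machine.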