Comments (6)

felixr avatar felixr commented on July 23, 2024

I think the issue is that the pqR version did not use OpenBLAS.

from pqr.

radfordneal avatar radfordneal commented on July 23, 2024

Hi. Could you give some more details? Is there a problem getting pqR to link with OpenBLAS, or had you just not done that? If you do link with OpenBLAS, crossprod will be done using it, but %*% will not be done with OpenBLAS unless you set the mat_mult_with_BLAS option to TRUE. Do you know whether OpenBLAS is thread safe? (That is, whether it would be OK to call it from more than one thread. Even if this gives the right answer, if OpenBLAS is itself using multiple threads, it might not be a good idea.) If not, you should configure with --disable-mat-mult-with-BLAS-in-helpers. (Of course, this isn't necessary if you've configured with --disable-helpers.)
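
For illustration, here is a minimal sketch of turning that option on from within R. It assumes the option is set through options() like any other R option; the matrix names and sizes are made up for the example.

```r
# Assumption: mat_mult_with_BLAS is set via options(), like other R options.
old <- options(mat_mult_with_BLAS = TRUE)

# Hypothetical test matrices
A <- matrix(rnorm(200 * 200), 200, 200)
B <- matrix(rnorm(200 * 200), 200, 200)

P <- A %*% B        # in pqR, this should now be done by the linked BLAS
C <- crossprod(A)   # t(A) %*% A; done by the BLAS regardless of the option

options(old)        # restore the previous setting
```

In standard R the option has no effect, so the snippet simply runs both multiplies as usual.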

felixr avatar felixr commented on July 23, 2024

Hi,
First, thank you for working on this project!

Regarding the reported "issue":
When I compiled pqR I thought I had compiled it with OpenBLAS. After reporting this issue, I ran my standard R with the default BLAS (not with OpenBLAS as in the benchmark) and saw that its performance was comparable to the "slow" results from pqR. So I checked pqR, and it turns out I was not using OpenBLAS there.

So the slow crossprod() was due to the BLAS in use and had nothing to do with pqR. (= issue closed)

> Do you know whether OpenBLAS is thread safe?

I would have to check. But I also think that using a multi-threaded OpenBLAS in a multi-threaded program might cause problems...

felixr avatar felixr commented on July 23, 2024

Here is a quick follow-up:
According to https://github.com/xianyi/OpenBLAS/wiki/faq#wiki-multi-threaded it is advisable to limit the number of OpenBLAS threads to 1 if used in a multi-threaded program.

Running pqR with OpenBLAS works and I get the same timings for crossprod() in pqR and std. R:

      n=100 n=200 n=300 n=400 n=500 n=600 n=700 n=800 n=900 n=1000 n=1100
std_R 0.000 0.001 0.003 0.007 0.015 0.026 0.039 0.059  0.08   0.11  0.143
pqR   0.001 0.001 0.003 0.007 0.015 0.025 0.040 0.057  0.08   0.11  0.148

      n=1200 n=1300 n=1400 n=1500 n=1600 n=1700 n=1800 n=1900 n=2000
std_R  0.188  0.235  0.293  0.355  0.435  0.513  0.613  0.713  0.833
pqR    0.184  0.237  0.290  0.358  0.430  0.517  0.609  0.717  0.828
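
A comparison like the one above could be reproduced with something along these lines. This is only a sketch: the original benchmark script is not shown in this thread, so time_crossprod, the repetition count, and the matrix sizes are hypothetical. To follow the OpenBLAS FAQ advice, the thread count is usually limited by setting the OPENBLAS_NUM_THREADS environment variable to 1 before starting R.

```r
# Hypothetical benchmark helper; not the script used for the table above.
time_crossprod <- function(n, reps = 3) {
  A <- matrix(rnorm(n * n), n, n)
  # Keep the smallest elapsed time over a few repetitions to reduce noise
  min(sapply(seq_len(reps), function(i) {
    system.time(crossprod(A))[["elapsed"]]
  }))
}

sizes <- seq(100, 500, by = 100)
timings <- sapply(sizes, time_crossprod)
names(timings) <- paste0("n=", sizes)
print(round(timings, 3))
```

Running the same script in both standard R and pqR, each linked against the same BLAS, gives two rows that can be combined with rbind() for a side-by-side view.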

radfordneal avatar radfordneal commented on July 23, 2024

Thanks for the follow up information. Are those timings with only one thread in OpenBLAS?

By the way, if you use OpenBLAS (or any other BLAS, including the one packaged with pqR) for matrix multiplies (with %*%, not crossprod) with helper threads, pqR may still do multiplies in parallel, but will not pipeline the output of a multiply to another operation. So, for example, L <- list(A%*%B,B%*%A) will do two BLAS multiplies in parallel (unless the matrices are small), but A %*% (B %*% C) will not (though it will if you use the pipelined C routines for the multiply).

It might be interesting to experiment with using threads for both pqR and OpenBLAS, even if they don't advise it. If you have four or more cores, you could try one helper thread and two OpenBLAS threads, which I would think would use at most four cores, if it works as I imagine.

felixr avatar felixr commented on July 23, 2024

These timings are for OpenBLAS using one thread. On my machine (Intel Q9400, 4 cores @ 2.6 GHz), using 2 threads roughly halves the time; using 3 or 4 threads brings no further improvement over 2 threads.

Timings for runs with 1, 2, 3, and 4 threads:

structure(list(num = c(100, 200, 300, 400, 500, 600, 700, 800, 
900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 
2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, 3000
), ncpu1 = c(0, 0.001, 0.003, 0.007, 0.015, 0.025, 0.038, 0.057, 
0.08, 0.107, 0.143, 0.184, 0.235, 0.29, 0.357, 0.43, 0.513, 0.609, 
0.714, 0.83, 0.972, 1.11, 1.254, 1.44, 1.622, 1.804, 2.036, 2.268, 
2.516, 2.778), ncpu2 = c(0, 0.001, 0.002, 0.004, 0.009, 0.014, 
0.023, 0.032, 0.049, 0.064, 0.082, 0.103, 0.128, 0.157, 0.192, 
0.232, 0.274, 0.322, 0.378, 0.439, 0.524, 0.597, 0.66, 0.76, 
0.852, 0.944, 1.07, 1.2, 1.319, 1.455), ncpu3 = c(0.001, 0.001, 
0.003, 0.005, 0.008, 0.014, 0.022, 0.033, 0.049, 0.063, 0.082, 
0.102, 0.128, 0.157, 0.194, 0.23, 0.274, 0.324, 0.377, 0.438, 
0.515, 0.588, 0.659, 0.762, 0.853, 0.943, 1.073, 1.19, 1.318, 
1.458), ncpu4 = c(0.001, 0.001, 0.003, 0.004, 0.009, 0.015, 0.022, 
0.032, 0.049, 0.064, 0.081, 0.104, 0.128, 0.157, 0.192, 0.23, 
0.274, 0.323, 0.378, 0.439, 0.515, 0.587, 0.688, 0.767, 0.854, 
0.955, 1.074, 1.198, 1.326, 1.454)), .Names = c("num", "ncpu1", 
"ncpu2", "ncpu3", "ncpu4"), row.names = c(NA, -30L), class = "data.frame")
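
To make the comparison easier to read, the speedup over the single-threaded run can be computed directly. The sketch below uses only three rows (n = 1000, 2000, 3000) copied from the data above; the variable names timings and speedup are just for illustration.

```r
# Three rows copied from the timings above (n = 1000, 2000, 3000)
timings <- data.frame(
  num   = c(1000, 2000, 3000),
  ncpu1 = c(0.107, 0.830, 2.778),
  ncpu2 = c(0.064, 0.439, 1.455),
  ncpu3 = c(0.063, 0.438, 1.458),
  ncpu4 = c(0.064, 0.439, 1.454)
)

# Speedup relative to one thread: single-thread time / multi-thread time
speedup <- timings$ncpu1 / timings[, c("ncpu2", "ncpu3", "ncpu4")]
speedup <- cbind(num = timings$num, round(speedup, 2))
print(speedup)
```

For these rows the speedup with 2 threads is between about 1.7 and 1.9, and the columns for 3 and 4 threads are nearly identical to it.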
