Comments (11)
Could you post some numbers on
- how the speed ratio is affected by the sample size,
- how the speed compares for fit() and select()?
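One way to collect the sample-size numbers is sketched below. It assumes two element-wise functions `f` and `g` with signature `double(double)`, for example a Boost-based and a GSL-based density. `time_elementwise`, `speed_ratio_by_size` and `RatioAtSize` are hypothetical helpers and are not part of vinecopulib. For each size, both functions run on the same standard-normal draws.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Seconds spent applying f once to every element of x.
template <class F>
double time_elementwise(F f, const std::vector<double>& x) {
    volatile double sink = 0.0;
    double acc = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (double v : x) acc += f(v);
    sink = acc;  // volatile store: keeps the loop result alive and inside the timed region
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}

struct RatioAtSize {
    std::size_t n;  // sample size
    double ratio;   // time(f) / time(g); NaN if g was too fast to measure
};

// For each n, draw n standard-normal values, time f and g on the same
// inputs and record the ratio of their run times.
template <class F, class G>
std::vector<RatioAtSize> speed_ratio_by_size(F f, G g,
                                             const std::vector<std::size_t>& sizes,
                                             unsigned seed = 42) {
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> draw(0.0, 1.0);
    std::vector<RatioAtSize> out;
    for (std::size_t n : sizes) {
        std::vector<double> x(n);
        for (double& v : x) v = draw(rng);
        const double tf = time_elementwise(f, x);
        const double tg = time_elementwise(g, x);
        out.push_back({n, tg > 0.0 ? tf / tg : std::numeric_limits<double>::quiet_NaN()});
    }
    return out;
}
```

For example, `speed_ratio_by_size(boost_pdf, gsl_pdf, {1000, 100000, 10000000})` gives time(Boost) / time(GSL) at each size. Here `boost_pdf` and `gsl_pdf` are placeholder names for whichever wrappers are being compared.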
from vinecopulib.
Just for reference: with the current version (a783e0d) I get
- dnorm(): 0.014298 (boost) vs 0.005876 (gsl)
- pnorm(): 0.017728 (boost) vs 0.004371 (gsl)
- qnorm(): 0.019062 (boost) vs 0.005081 (gsl)
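A dependency-free baseline built from `<cmath>` alone can serve as a third data point or as an accuracy cross-check in comparisons like the one above. This is only a sketch and is not what vinecopulib, Boost or GSL do internally. The names `dnorm_std`, `pnorm_std` and `qnorm_std` are hypothetical. The quantile starts from the Abramowitz & Stegun 26.2.23 rational approximation (absolute error below 4.5e-4) and then refines it with a few Newton steps on `pnorm_std`.

```cpp
#include <cmath>
#include <limits>

// Standard normal density: exp(-x^2 / 2) / sqrt(2 * pi)
inline double dnorm_std(double x) {
    const double inv_sqrt_2pi = 0.3989422804014327;  // 1 / sqrt(2 * pi)
    return inv_sqrt_2pi * std::exp(-0.5 * x * x);
}

// Standard normal CDF via the complementary error function:
// Phi(x) = erfc(-x / sqrt(2)) / 2, which stays accurate in the lower tail.
inline double pnorm_std(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Standard normal quantile (intended for p well above the subnormal range).
inline double qnorm_std(double p) {
    if (!(p > 0.0 && p < 1.0)) {
        if (p == 0.0) return -std::numeric_limits<double>::infinity();
        if (p == 1.0) return std::numeric_limits<double>::infinity();
        return std::numeric_limits<double>::quiet_NaN();  // out of range or NaN
    }
    if (p > 0.5) return -qnorm_std(1.0 - p);  // 1 - p is exact for p in [0.5, 1]
    // Abramowitz & Stegun 26.2.23 starting value for the lower tail
    const double t = std::sqrt(-2.0 * std::log(p));
    double x = -(t - (2.515517 + t * (0.802853 + t * 0.010328)) /
                     (1.0 + t * (1.432788 + t * (0.189269 + t * 0.001308))));
    // Newton steps on f(x) = Phi(x) - p, whose derivative is the density
    for (int i = 0; i < 4; ++i) {
        const double dx = (pnorm_std(x) - p) / dnorm_std(x);
        x -= dx;
        if (std::fabs(dx) <= 1e-14 * std::fmax(1.0, std::fabs(x))) break;
    }
    return x;
}
```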
Have you compiled the Boost version with the same optimisations that the GSL .so was compiled with? Trying with "-Ofast -march=native", I see Boost being almost twice as fast as GSL (dnorm). Without optimisations, I see results similar to those reported above.
#include <gsl/gsl_randist.h>
// Element-wise standard normal density using GSL
template<typename T> T dnorm_gsl(const T& x)
{
    return x.unaryExpr([](double v) { return gsl_ran_ugaussian_pdf(v); });
}
#include <boost/math/distributions/normal.hpp>
// Element-wise standard normal density using Boost.Math
template<typename T> T dnorm(const T& x)
{
    boost::math::normal std_normal;
    return x.unaryExpr([std_normal](double v) { return boost::math::pdf(std_normal, v); });
}
#include <chrono>
#include <iostream>
#include <string>
// Run `it` once and print the elapsed wall-clock time
template <typename T>
void time(const std::string& label, const T& it)
{
    auto start = std::chrono::high_resolution_clock::now();
    it();
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << label << "Elapsed time: " << elapsed.count() << " s\n";
}
#include <Eigen/Dense>
int main()
{
    // Random values in [-1, 1]; an uninitialised matrix would hold arbitrary data
    Eigen::MatrixXd m = Eigen::MatrixXd::Random(10000, 10000);
    // Capture by reference so the 10000 x 10000 matrix is not copied into each lambda
    time("Boost:", [&m]{ auto a = dnorm(m); });
    time("GSL: ", [&m]{ auto b = dnorm_gsl(m); });
}
Usage:
$ clang++-3.6 -Ofast -march=native -std=c++11 -lgsl -lblas test_boost_vs_gsl.cpp
$ ./a.out
Boost:Elapsed time: 1.10801 s
GSL: Elapsed time: 1.74596 s
$ clang++-3.6 -std=c++11 -lgsl -lblas test_boost_vs_gsl.cpp
$ ./a.out
Boost:Elapsed time: 11.6608 s
GSL: Elapsed time: 4.20882 s
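Single-shot timings like these also include one-off costs, such as page faults on the freshly allocated result or cold caches. A common mitigation is to repeat the work a few times and keep the fastest run. The sketch below uses a hypothetical `best_of` helper that is not part of the benchmark above.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Run `work` `repeats` times and return the fastest wall-clock time in seconds
// (infinity if repeats <= 0). The minimum is less sensitive to one-off costs
// than a single measurement.
template <class F>
double best_of(int repeats, F&& work) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(stop - start).count());
    }
    return best;
}
```

For instance, `best_of(5, [&m]{ auto a = dnorm(m); })` could replace a single `time(...)` call in the benchmark.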
OK, so I changed the release flags (line 8 of compilerDefOpt.cmake) to use -Ofast -march=native instead of -O3:
Boost:Elapsed time: 1.16187 s
GSL: Elapsed time: 0.929747 s
When using the debug flags (line 7 of compilerDefOpt.cmake), namely -g -O0 -DDEBUG -fsanitize=address -fno-omit-frame-pointer, I get:
Boost:Elapsed time: 21.7116 s
GSL: Elapsed time: 4.15891 s
It turns out that:
- I ran all my tests with the debug version.
- Boost is still slower than GSL but it's not that bad.
Re GSL vs. Boost in debug mode: are you linking with a debug version of GSL for the benchmark? Otherwise it's an unfair comparison, as the number crunching might actually happen in optimised GSL code from the precompiled shared library.
Re flags, perhaps it's worth adding -DNDEBUG to CMAKE_CXX_FLAGS_RELEASE as well.
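Earlier in this thread, a debug build was benchmarked by accident. To make that harder, the benchmark could print how it was compiled. The following is a minimal sketch that assumes GCC or Clang. `__OPTIMIZE__` is defined at -O1 and above, and `__FAST_MATH__` is defined by -ffast-math, which -Ofast enables. `NDEBUG` reflects whether the build passes -DNDEBUG. `build_flavour` is a hypothetical helper.

```cpp
#include <string>

// Describe how this translation unit was compiled (GCC/Clang macros), so that
// benchmark output records whether it came from an optimised, assert-free build.
inline std::string build_flavour() {
    std::string s;
#ifdef __OPTIMIZE__
    s += "optimised";
#else
    s += "unoptimised (-O0)";
#endif
#ifdef __FAST_MATH__
    s += ", fast-math";
#endif
#ifdef NDEBUG
    s += ", NDEBUG (asserts off)";
#else
    s += ", asserts on";
#endif
    return s;
}
```

Printing `build_flavour()` next to each timing would have shown the -O0 configuration immediately.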
HTH,
Sylwester
No, GSL is the release version, so it's indeed an unfair comparison. Actually, I was mainly interested in the numbers for our release version (compiled with -O3).
By the way, I tried -Ofast and -march=native, but they do not improve much. Furthermore, when using -march=native, I get the following error in test_bicop_select:
test_bicop_select(1439,0x7fff7a92f000) malloc: *** error for object 0x110dc1020: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Thomas also tried it, and it runs fine on his computer, with speed-ups similar to yours. I have a MacBook Pro running OS X; could that be related?
I suggest following the suggestion in the error message, i.e.:
$ gdb the_failing_binary
(gdb) break malloc_error_break
(gdb) run
(gdb) bt
This should show what's wrong with the deallocation.
HTH
I get:
#0 0x00007fff8a3d1f32 in malloc_error_break () from /usr/lib/system/libsystem_malloc.dylib
#1 0x00007fff8a3c2fd2 in free () from /usr/lib/system/libsystem_malloc.dylib
#2 0x000000010000471a in (anonymous namespace)::ParBicopTest_bicop_select_mle_bic_is_correct_Test<IndepBicop>::TestBody() ()
#3 0x00000001000337ae in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#4 0x000000010001d27e in testing::Test::Run() ()
#5 0x000000010001dd62 in testing::TestInfo::Run() ()
#6 0x000000010001e323 in testing::TestCase::Run() ()
#7 0x00000001000261bb in testing::internal::UnitTestImpl::RunAllTests() ()
#8 0x0000000100033f70 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#9 0x0000000100025d2e in testing::UnitTest::Run() ()
#10 0x0000000100003ab1 in main ()
I noticed that test_bicop_class passes, but the same error arises in test_bicop_parametric when it calls pdf_is_correct (par_to_tau_is_correct still passes)...
Google turned up a somewhat similar bug report, google/sanitizers#70, where LLVM/Clang's sanitizer was blamed (and later reported as fixed). No idea how relevant it is.
I'm not sure -march=native is a good default for the release version anyway, since it prevents building the executables on one machine and running them on another.
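If the speed-up from -march=native turns out to matter, a portable alternative is to build with baseline flags and compile only the hot loop for a newer instruction set, then pick the variant at run time. The following is a minimal sketch for GCC/Clang on x86 using the `__attribute__((target(...)))` and `__builtin_cpu_supports` builtins. The function names and the loop are hypothetical, and other platforms fall back to the generic loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Generic version: compiled with whatever baseline flags the build uses.
static void dnorm_generic(const double* x, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = 0.3989422804014327 * std::exp(-0.5 * x[i] * x[i]);  // exp(-x^2/2) / sqrt(2*pi)
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define DNORM_X86_DISPATCH 1
// Same loop, but only this function may use AVX2 and FMA instructions.
__attribute__((target("avx2,fma")))
static void dnorm_avx2(const double* x, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = 0.3989422804014327 * std::exp(-0.5 * x[i] * x[i]);
}
#endif

// Choose the variant at run time, so one binary still runs on CPUs without AVX2.
std::vector<double> dnorm_dispatch(const std::vector<double>& x) {
    std::vector<double> out(x.size());
#ifdef DNORM_X86_DISPATCH
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        dnorm_avx2(x.data(), out.data(), x.size());
        return out;
    }
#endif
    dnorm_generic(x.data(), out.data(), x.size());
    return out;
}
```

Whether the AVX2 variant is actually faster depends on how the compiler handles `std::exp` inside the loop. The sketch only illustrates the dispatch pattern.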
Also, I found that -O2 -DNDEBUG gives us the same speed as -O3 -DNDEBUG, but smaller executables. I think we should go with -O2 -DNDEBUG for now and revisit the compiler options before our first release (see #24).
Since I think we're happy with Boost now, I'll close this issue.