
Comments (6)

dataPulverizer commented on August 24, 2024

Thanks for bringing this to my attention. I have re-run the benchmarks for Chapel, and the --no-ieee-float flag has made a big difference. I also left the equivalent flag out of the Julia runs, so I am re-running the Julia benchmarks to include it. Having taken a closer look at fast math, I may present results both with and without it.

Please bear with me while I re-run the benchmarks and update the results.

Thank you.

from kernelmatrixbenchmark.

npadmana commented on August 24, 2024

Just to motivate this, here are the original timings on my laptop --

language, kernel, nitems, time
Chapel, DotProduct, 500, 0.0256293
Chapel, DotProduct, 1000, 0.101618
Chapel, DotProduct, 2000, 0.413218
Chapel, Gaussian, 500, 0.02657
Chapel, Gaussian, 1000, 0.105494
Chapel, Gaussian, 2000, 0.42758
Chapel, Polynomial, 500, 0.0277363
Chapel, Polynomial, 1000, 0.110192
Chapel, Polynomial, 2000, 0.443867
Chapel, Exponential, 500, 0.0264003
Chapel, Exponential, 1000, 0.103664
Chapel, Exponential, 2000, 0.418785
Chapel, Log, 500, 0.462164
Chapel, Log, 1000, 1.85764
Chapel, Log, 2000, 7.6242
Chapel, Cauchy, 500, 0.0275407
Chapel, Cauchy, 1000, 0.109992
Chapel, Cauchy, 2000, 0.444677
Chapel, Power, 500, 0.499062
Chapel, Power, 1000, 1.99373
Chapel, Power, 2000, 8.06157
Chapel, Wave, 500, 0.028901
Chapel, Wave, 1000, 0.115344
Chapel, Wave, 2000, 0.467274
Chapel, Sigmoid, 500, 0.028015
Chapel, Sigmoid, 1000, 0.110547
Chapel, Sigmoid, 2000, 0.45463

and with --no-ieee-float --

language, kernel, nitems, time
Chapel, DotProduct, 500, 0.004971
Chapel, DotProduct, 1000, 0.0175267
Chapel, DotProduct, 2000, 0.082295
Chapel, Gaussian, 500, 0.00478333
Chapel, Gaussian, 1000, 0.0184817
Chapel, Gaussian, 2000, 0.0862503
Chapel, Polynomial, 500, 0.00522733
Chapel, Polynomial, 1000, 0.01982
Chapel, Polynomial, 2000, 0.0895983
Chapel, Exponential, 500, 0.00532333
Chapel, Exponential, 1000, 0.0190223
Chapel, Exponential, 2000, 0.086786
Chapel, Log, 500, 0.471886
Chapel, Log, 1000, 1.89644
Chapel, Log, 2000, 8.10626
Chapel, Cauchy, 500, 0.004573
Chapel, Cauchy, 1000, 0.0171683
Chapel, Cauchy, 2000, 0.0855037
Chapel, Power, 500, 0.516937
Chapel, Power, 1000, 2.08933
Chapel, Power, 2000, 8.18946
Chapel, Wave, 500, 0.00571833
Chapel, Wave, 1000, 0.0226087
Chapel, Wave, 2000, 0.102585
Chapel, Sigmoid, 500, 0.00460033
Chapel, Sigmoid, 1000, 0.017633
Chapel, Sigmoid, 2000, 0.0823383


npadmana commented on August 24, 2024

Great -- I look forward to seeing the differences!

It looks like Julia implemented a faster version of log (which would also affect pow) than the standard libm implementation, which might explain why it runs faster on those two benchmarks.


dataPulverizer commented on August 24, 2024

I have now updated the benchmark calculations and the article to show results both with IEEE-compliant math and with fast math, for all the languages and kernels. Thank you for bringing this to my attention.


npadmana commented on August 24, 2024

Thanks very much for the update!

Two very minor comments --

  1. It's my understanding that IEEE floating-point arithmetic is not associative in general. This is why compilers are generally disallowed from reordering arithmetic operations, and why they don't auto-vectorize these loops. Passing the -ffast-math flags allows the compiler to assume associative math (amongst other things), which lets it reorder the code and vectorize.

My understanding of the Julia @simd macro is that it instructs the compiler to vectorize the loop, allowing such math reorderings (and +1 to Julia for making it possible to do this locally, as opposed to via a more global flag).

  2. Except for log and power, I'll note that the Julia results seem to be identical with and without @fastmath, which suggests that the biggest difference in performance is whether or not the core loops get vectorized.

Thanks again!


dataPulverizer commented on August 24, 2024

Two very minor comments --

  1. It's my understanding that IEEE floating-point arithmetic is not associative in general. This is why compilers are generally disallowed from reordering arithmetic operations, and why they don't auto-vectorize these loops. Passing the -ffast-math flags allows the compiler to assume associative math (amongst other things), which lets it reorder the code and vectorize.

Thanks, I have updated this.

  2. Except for log and power, I'll note that the Julia results seem to be identical with and without @fastmath, which suggests that the biggest difference in performance is whether or not the core loops get vectorized.

Yes.

