ktrianta / jacobi-svd-evd
License: MIT License
Currently we measure the performance of our codes on input sizes that are powers of 2.
Experiment with a bigger variety of input sizes. Block size (in blocking codes) should be set accordingly, i.e. such that the input size is a multiple of the block size.
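The constraint that the input size be a multiple of the block size can be sketched as a small helper. The function name `largest_dividing_block` and the idea of an upper limit `max_block` are assumptions made for illustration; they are not taken from the repository.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the project): returns the largest block
   size b <= max_block such that n is a multiple of b.
   Returns 0 for invalid arguments and 1 when no larger divisor fits. */
size_t largest_dividing_block(size_t n, size_t max_block) {
    if (n == 0 || max_block == 0) {
        return 0;
    }
    size_t b = max_block < n ? max_block : n;
    while (n % b != 0) {
        --b; /* terminates at b == 1 at the latest */
    }
    return b;
}
```

For example, with n = 96 and a limit of 32 the helper returns 32, while n = 100 with the same limit yields 25.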
Gather and scatter operations could prove beneficial in cases where we load/store elements from/to non-contiguous memory addresses. AVX2 supports gather (but not scatter) instructions.
More information:
Gather operations in Intel intrinsics guide
Stackoverflow: What do you do without fast gather and scatter in AVX2 instructions?
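As a rough sketch of what a gather could look like here, the code below loads one column of a row-major n x n matrix of doubles, which is a typical non-contiguous access. The function names and the row-major layout are assumptions, and the `target` attribute and `__builtin_cpu_supports` are GCC/Clang features on x86. Whether the gather version actually beats the scalar loop depends on the microarchitecture and would have to be measured.

```c
#include <immintrin.h>
#include <stddef.h>

/* Scalar reference: copy column `col` of a row-major n x n matrix into out. */
void load_column_scalar(const double *A, size_t n, size_t col, double *out) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = A[i * n + col];
    }
}

/* AVX2 sketch: one gather instruction fetches four elements that are
   n doubles apart. Compiled for AVX2 regardless of global flags. */
__attribute__((target("avx2")))
void load_column_gather(const double *A, size_t n, size_t col, double *out) {
    /* Element offsets of four consecutive rows relative to the base pointer. */
    const __m256i idx = _mm256_set_epi64x((long long)(3 * n), (long long)(2 * n),
                                          (long long)n, 0);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* scale = 8 because the indices count doubles (8 bytes each). */
        __m256d v = _mm256_i64gather_pd(A + i * n + col, idx, 8);
        _mm256_storeu_pd(out + i, v);
    }
    for (; i < n; ++i) { /* remainder rows */
        out[i] = A[i * n + col];
    }
}

/* Runtime dispatch: fall back to the scalar loop on CPUs without AVX2. */
void load_column(const double *A, size_t n, size_t col, double *out) {
    if (__builtin_cpu_supports("avx2")) {
        load_column_gather(A, n, col, out);
    } else {
        load_column_scalar(A, n, col, out);
    }
}
```

Since AVX2 has no scatter, the reverse operation (writing a column back) still needs scalar stores or a different data layout.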
We can try to implement the blocked version of SVD without explicitly creating the intermediate matrices, thereby reducing the overall data movement and possibly increasing the performance.
Currently we are compiling our code with gcc. Experiment by using icc and clang.
Another area of experimentation could be to try different optimization flags.
Create an autotuning infrastructure that aids us in finding the optimal block size for our codes and benchmarks.
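A minimal sketch of such a tuner, assuming the blocked code can be called through a function pointer with the block size as a parameter. The `blocked_kernel` type, `autotune_block_size`, and the `count_calls` stand-in are hypothetical names for illustration, and `clock()` is only a coarse timer; a real infrastructure would likely use cycle counters and more careful statistics.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel signature: run the blocked code once on an input of
   size n using block size b. ctx carries whatever data the kernel needs. */
typedef void (*blocked_kernel)(size_t n, size_t b, void *ctx);

/* Times every candidate block size that divides n and returns the fastest.
   Returns 0 if no candidate is valid. */
size_t autotune_block_size(blocked_kernel kernel, void *ctx, size_t n,
                           const size_t *candidates, size_t count, int repeats) {
    size_t best = 0;
    double best_time = 0.0;
    for (size_t c = 0; c < count; ++c) {
        size_t b = candidates[c];
        if (b == 0 || n % b != 0) {
            continue; /* input size must be a multiple of the block size */
        }
        clock_t start = clock();
        for (int r = 0; r < repeats; ++r) {
            kernel(n, b, ctx);
        }
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best == 0 || elapsed < best_time) {
            best = b;
            best_time = elapsed;
        }
    }
    return best;
}

/* Stand-in kernel for trying out the tuner: it only counts its invocations. */
void count_calls(size_t n, size_t b, void *ctx) {
    (void)n;
    (void)b;
    ++*(int *)ctx;
}
```

In practice the kernel argument would wrap one of the blocked SVD or EVD routines, and the result could be cached per input size.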
Lots of compiler warnings are emitted during compilation. To make sure we address these, we can use -Werror to fail the compilation until all of them are fixed.
We need to incorporate git hooks into the project to automatically format committed code, and possibly lint it (maybe with cpplint?).
Blocked algorithms employ block matrix multiplications. To get the highest performance, we need to optimize the block matrix multiplication algorithms as much as possible. Furthermore, block based algorithms also do block matrix additions. We probably need to create a separate version of block matrix multiplication that performs C = C + AB. Consequently, we need two algorithms: C = AB and C = C + AB. The code for both should be pretty much the same.
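One way to keep the two variants nearly identical is to share a single kernel that differs only in how the accumulator is initialized. The sketch below is unoptimized and assumes square b x b blocks stored contiguously in row-major order, with C not aliasing A or B; the function names are hypothetical.

```c
#include <stddef.h>

/* Shared kernel for b x b row-major blocks.
   accumulate == 0 computes C = A * B, otherwise C = C + A * B. */
static void block_mult_kernel(double *C, const double *A, const double *B,
                              size_t b, int accumulate) {
    for (size_t i = 0; i < b; ++i) {
        for (size_t j = 0; j < b; ++j) {
            /* The only difference between the variants is the start value. */
            double sum = accumulate ? C[i * b + j] : 0.0;
            for (size_t k = 0; k < b; ++k) {
                sum += A[i * b + k] * B[k * b + j];
            }
            C[i * b + j] = sum;
        }
    }
}

/* C = A * B */
void block_mult(double *C, const double *A, const double *B, size_t b) {
    block_mult_kernel(C, A, B, b, 0);
}

/* C = C + A * B */
void block_mult_add(double *C, const double *A, const double *B, size_t b) {
    block_mult_kernel(C, A, B, b, 1);
}
```

An optimized version would apply the same idea to the tuned kernel, so that unrolling and vectorization work is done once and reused by both entry points.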
Create roofline models for our codes and compare the expected performance with the one actually achieved.
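For reference, the roofline bound is the minimum of the peak compute rate and the product of memory bandwidth and operational intensity. The helpers below are a small sketch of that formula; the function names and the choice of units (flops per cycle, bytes per cycle, flops per byte) are assumptions, and the actual peak and bandwidth values would have to come from the target machine.

```c
/* Attainable performance in flops/cycle under the roofline model:
   min(peak compute, bandwidth * operational intensity). */
double roofline_bound(double peak_flops_per_cycle,
                      double bandwidth_bytes_per_cycle,
                      double intensity_flops_per_byte) {
    double memory_bound = bandwidth_bytes_per_cycle * intensity_flops_per_byte;
    return memory_bound < peak_flops_per_cycle ? memory_bound
                                               : peak_flops_per_cycle;
}

/* Operational intensity at which a kernel stops being memory bound. */
double roofline_ridge(double peak_flops_per_cycle,
                      double bandwidth_bytes_per_cycle) {
    return peak_flops_per_cycle / bandwidth_bytes_per_cycle;
}

/* Fraction of the roofline bound that a measured run achieves. */
double roofline_efficiency(double measured_flops_per_cycle, double bound) {
    return measured_flops_per_cycle / bound;
}
```

With placeholder values of 16 flops/cycle peak and 8 bytes/cycle bandwidth (not measurements of any real machine), a kernel at 0.5 flops/byte is bounded by 4 flops/cycle, and the ridge point lies at 2 flops/byte.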
Try to unroll and vectorize all possible remaining auxiliary operations.
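As an illustration of the kind of transformation meant here, the sketch below unrolls a dot product by four with independent accumulators. The dot product is only an assumed example of an auxiliary operation, and `dot_unrolled` is a hypothetical name. Note that splitting the sum changes the order of floating-point additions, so results can differ slightly from a straightforward loop.

```c
#include <stddef.h>

/* Dot product unrolled by 4. The four accumulators are independent, which
   shortens the dependency chain and gives the compiler a pattern that is
   easy to map onto vector instructions. */
double dot_unrolled(const double *x, const double *y, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i] * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; ++i) { /* remainder elements */
        s0 += x[i] * y[i];
    }
    return (s0 + s1) + (s2 + s3);
}
```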