sarah-ek / faer-rs
Linear algebra foundation for the Rust programming language
Home Page: https://faer-rs.github.io
License: MIT License
Routines like (bi-)conjugate gradients, MINRES, GMRES, SYMMLQ, LOBPCG, etc. are not part of LAPACK but may fit in faer as well. Would you accept PRs for a faer-iterative crate for large/non-dense problems?
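Independent of faer's API, the kind of routine being proposed can be sketched in a few lines of plain Rust. This is a minimal unpreconditioned conjugate gradient over flat row-major storage; all names here are illustrative, not part of any faer crate:

```rust
// Dense matrix-vector product for an n x n row-major matrix.
fn matvec(a: &[f64], n: usize, x: &[f64]) -> Vec<f64> {
    (0..n)
        .map(|i| (0..n).map(|j| a[i * n + j] * x[j]).sum::<f64>())
        .collect()
}

fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

// Unpreconditioned conjugate gradient for an SPD system A x = b.
fn conjugate_gradient(a: &[f64], b: &[f64], n: usize, tol: f64, max_iter: usize) -> Vec<f64> {
    let mut x = vec![0.0; n];
    let mut r = b.to_vec(); // residual r = b - A * x, with x = 0 initially
    let mut p = r.clone();
    let mut rs_old = dot(&r, &r);
    for _ in 0..max_iter {
        if rs_old.sqrt() < tol {
            break;
        }
        let ap = matvec(a, n, &p);
        let alpha = rs_old / dot(&p, &ap);
        for i in 0..n {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        let rs_new = dot(&r, &r);
        for i in 0..n {
            p[i] = r[i] + (rs_new / rs_old) * p[i];
        }
        rs_old = rs_new;
    }
    x
}
```

A real faer-iterative crate would of course operate on faer's matrix types and support preconditioners, but the control flow is essentially this. For the SPD system [[4, 1], [1, 3]] x = [1, 2], this converges to x = [1/11, 7/11].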
Hi!
I am really impressed by your colossal work on this math kernel!
I am writing a Julia wrapper to benchmark faer against OpenBLAS and MKL.
So far I have only studied the dense matrix-matrix multiplication. My preliminary results show faer approximately 50% slower than OpenBLAS and 25% slower than MKL on an AMD Ryzen 5 7640U on 8 threads.
This is basically my first Rust project and I want to be fair to faer: is this a reasonable dynamic library exposing faer's in-place matrix multiplication through the C ABI?
I am not sure if opening an issue is the right way to ask, but the faer documentation is currently very sparse on how to import external matrices.
use faer::modules::core::mul::matmul;
use faer::{mat, Parallelism};

// in-place: c = a * b
#[no_mangle]
pub unsafe extern "C" fn mult(
    c_ptr: *mut f64,
    c_nrows: u64,
    c_ncols: u64,
    c_row_stride: u64,
    c_col_stride: u64,
    a_ptr: *const f64,
    a_nrows: u64,
    a_ncols: u64,
    a_row_stride: u64,
    a_col_stride: u64,
    b_ptr: *const f64,
    b_nrows: u64,
    b_ncols: u64,
    b_row_stride: u64,
    b_col_stride: u64,
    nthreads: u32,
) {
    assert!(!c_ptr.is_null());
    assert!(!a_ptr.is_null());
    assert!(!b_ptr.is_null());
    let c = unsafe {
        mat::from_raw_parts_mut::<f64>(
            c_ptr,
            c_nrows as usize,
            c_ncols as usize,
            c_row_stride as isize,
            c_col_stride as isize,
        )
    };
    let a = unsafe {
        mat::from_raw_parts::<f64>(
            a_ptr,
            a_nrows as usize,
            a_ncols as usize,
            a_row_stride as isize,
            a_col_stride as isize,
        )
    };
    let b = unsafe {
        mat::from_raw_parts::<f64>(
            b_ptr,
            b_nrows as usize,
            b_ncols as usize,
            b_row_stride as isize,
            b_col_stride as isize,
        )
    };
    matmul(c, a, b, None, 1.0, Parallelism::Rayon(nthreads as usize));
}
Hello,
Just out of curiosity, are you planning on adding support for linear programming that could run on top of the algebraic operations provided by the project? Or do you think that such a thing would be better provided by third parties using faer-rs?
Thanks
The top line of each of the wiki's examples has using faer_core::mat;
or something like that. I think it should be something like use faer::core::mat;
and so forth.
Is your feature request related to a problem? Please describe.
I have an x: Mat<f64> and an n: f64, and I need to update x = n * x.
Describe the solution you'd like
I think it would be convenient to implement the MulAssign trait such that we can do x *= n and it reuses the memory allocated for x.
Describe alternatives you've considered
Maybe it would be good to have a generic map_with method that transforms all the elements of a matrix in place. It could be used to implement this MulAssign trait, and it would probably be useful for other things as well.
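For illustration, here is roughly what the requested impl looks like on a toy matrix type. The Matrix struct below is a hypothetical stand-in for faer's Mat, not its real definition:

```rust
use core::ops::MulAssign;

// Hypothetical toy matrix with flat storage, used only to illustrate the trait.
pub struct Matrix {
    pub data: Vec<f64>,
}

impl MulAssign<f64> for Matrix {
    // Scales every element in place, reusing the existing allocation.
    fn mul_assign(&mut self, n: f64) {
        for v in &mut self.data {
            *v *= n;
        }
    }
}
```

With this impl, `x *= 2.0` on a matrix holding [1.0, 2.0, 3.0] leaves it holding [2.0, 4.0, 6.0] without reallocating.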
Currently my pyO3 bindings that call some rust routines that use faer look something like:
use numpy::{IntoPyArray, PyArray2, PyReadonlyArray2};
use pyo3::{exceptions::PyRuntimeError, pymodule, types::PyModule, PyResult, Python};
use faer::prelude::*;
// ...
#[pymodule]
fn rust_lib<'py>(_py: Python<'py>, m: &'py PyModule) -> PyResult<()> {
    #[pyfn(m)]
    fn rpca<'py>(
        py: Python<'py>,
        a_py: PyReadonlyArray2<'py, f64>,
        n_rank: usize,
        n_iters: usize,
        n_oversamples: usize,
    ) -> &'py PyArray2<f64> {
        let a_ndarray = a_py.as_array();
        let a_faer = a_ndarray.view().into_faer();
        // ... faer-rs math
        ndarray_result.into_pyarray(py)
    }
}
I'm unsure if I am using the best approach.
// ...
#[pyfn(m)]
fn rpca<'py>(
    py: Python<'py>,
    a_py: PyReadonlyArray2<'py, f64>,
    n_rank: usize,
    n_iters: usize,
    n_oversamples: usize,
) -> &'py PyArray2<f64> {
    let a_faer = a_py.into_faer();
    // ... faer-rs math
    faer_result.into_pyarray(py)
}
I was skimming through the high level API PR, and saw the renaming of set_constant to fill_with_constant. It's really a minor thing, but how about naming this method just fill, as analogous to slice::fill? As for fill_with_zero, it does not have a counterpart in std (or I am not aware of one), but fill_zero sounds fine to me. This way, you can leave the fill_with name for the method taking a filling closure, as is also the convention in std (slice::fill_with).
Describe the bug
The Vec<c64>
output of eigenvalues()
for the adjacency matrix of the following
Note: the path graph on
To Reproduce
/* use the smaller unit test linked above
#[test]
fn this_other_tree_has_correct_maximum_eigenvalue() {
    let edges = [(3, 2), (6, 1), (7, 4), (7, 6), (8, 5), (9, 4), (11, 2), (12, 2), (13, 2), (15, 6), (16, 2), (16, 4), (17, 8), (18, 0), (18, 8), (18, 2), (19, 6), (19, 10), (19, 14)];
    let mut a = faer::Mat::zeros(20, 20);
    for (v, u) in edges.iter() {
        a[(*v, *u)] = 1.0;
        a[(*u, *v)] = 1.0;
    }
    let eigs_complex: Vec<faer::complex_native::c64> = a.eigenvalues();
    println!("{eigs_complex:?}");
    let eigs_real = eigs_complex.iter().map(|e| e.re).collect::<Vec<_>>();
    println!("{eigs_real:?}");
    let lambda_1 = *eigs_real
        .iter()
        .max_by(|a, b| a.partial_cmp(b).unwrap())
        .unwrap();
    let correct_lambda_1 = 2.6148611139728866;
    assert!(
        (lambda_1 - correct_lambda_1).abs() < 1e-10,
        "lambda_1 = {lambda_1}, correct_lambda_1 = {correct_lambda_1}",
    );
}
*/
Expected behavior
name: graphenv
# channels:
# - defaults
# dependencies:
# - python
# - networkx
# - matplotlib
# - imageio
# - msgpack-python
# - scipy
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
# Create the graph
edges = [[3, 2], [6, 1], [7, 4], [7, 6], [8, 5], [9, 4], [11, 2], [12, 2], [13, 2], [15, 6], [16, 2], [16, 4], [17, 8], [18, 0], [18, 8], [18, 2], [19, 6], [19, 10], [19, 14]]
graph = nx.Graph()
graph.add_edges_from(edges)
# Get the adjacency matrix eigenvalues
adjacency_matrix = nx.adjacency_matrix(graph).todense()
eigenvalues = np.linalg.eigvals(adjacency_matrix)
# Print the eigenvalues in decreasing order
print("Eigenvalues in decreasing order:")
for eigenvalue in sorted(eigenvalues, reverse=True):
print(eigenvalue)
# Plot the graph
nx.draw(graph, with_labels=True)
plt.show()
Sorry I don't have time currently to try and track and fix the issue in more detail:
Running cargo t
in faer-core
fails non-deterministically with the error:
failures:
---- mul::tests::triangular stdout ----
thread 'mul::tests::triangular' panicked at 'attempt to subtract with overflow', /home/daniel/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-0.12.0/src/gemm.rs:960:35
failures:
mul::tests::triangular
test result: FAILED. 12 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.07s
Is your feature request related to a problem? Please describe.
Normal math operators like +, -, *, == etc. can't be used on sparse matrices. There are currently two bounds that can't be satisfied:
1. the operand types must impl GenericMatrix, which they don't.
2. the operand types must implement MatMul, MatAdd etc. Right now MatMul is implemented for sparse types, but not any of the other operations.
Describe the solution you'd like
Right now, to multiply a sparse matrix with another matrix, you must manually invoke the MatMul implementation like so:
<SparseRowMat<_> as MatMul<DenseCol>>::mat_mul(lhs, rhs.as_ref());
And there's no way to add or subtract sparse matrices. It would be nice to be able to do something like lhs * &rhs and lhs + &rhs like normal. It would also be nice to have Debug and Index implementations for sparse matrices.
Is it possible to make the rayon dependency optional? If so, would you consider accepting a PR that does this?
implement addition for Mat<T>
The traits core::ops::Add<Mat<T>> and core::ops::Add<&Mat<T>> should be implemented for Mat<T> and &Mat<T>. T should satisfy the trait ComplexField for those operations to be implementable. Mat::with_dims could be used, with a closure that takes two elements and adds them.
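As a sketch of the shape of such an impl, here it is on a hypothetical toy matrix type (not faer's actual Mat, which would be generic over ComplexField as described above):

```rust
use core::ops::Add;

// Hypothetical dense matrix with row-major flat storage, for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct Matrix {
    pub nrows: usize,
    pub ncols: usize,
    pub data: Vec<f64>,
}

impl Add<&Matrix> for &Matrix {
    type Output = Matrix;

    // Element-wise addition; panics on dimension mismatch.
    fn add(self, rhs: &Matrix) -> Matrix {
        assert_eq!((self.nrows, self.ncols), (rhs.nrows, rhs.ncols));
        Matrix {
            nrows: self.nrows,
            ncols: self.ncols,
            data: self.data.iter().zip(&rhs.data).map(|(x, y)| x + y).collect(),
        }
    }
}
```

Implementing the trait on &Matrix (rather than only on the owned type) is what lets callers write `&a + &b` without consuming either operand.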
Although the IntoNdarray trait itself depends on the ndarray feature, all of the implementations depend on the nalgebra feature.
The current iteration of faer::polars::polars_to_faer_fxx requires that a slice &[&str] of column names be passed to the function. While there is nothing wrong with this in an absolute sense, it does make for a somewhat awkward workflow when working with polars.
A pattern more aligned with how polars functions is to remove the &[&str] item from the function signature and have the function create a Mat<E> from the entire frame. The function should also error if columns containing invalid data types (such as strings) are passed in.
I saw square matrix sizes up to 1k in the benchmark section. Is it in scope of the project to maintain and optimize larger (50-200k) matrix sizes for multiplication (or other operations)?
Is your feature request related to a problem? Please describe.
It makes sense to have the macros col! and row! in order to be consistent with mat!. This also seems a good opportunity to include a related macro (or function) blockwise!:
blockwise![[A, B], [C, D]];
// ┌───────┬──┐
// │ │ │
// │ A │ B│
// │ │ │
// ├───────┼──┤
// │ C │ D│
// └───────┴──┘
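To make the intended semantics concrete, here is a plain-Rust sketch of what such blockwise assembly does, with row-major flat matrices given as (data, nrows, ncols) tuples. This is purely illustrative; a real macro would operate on faer's matrix types:

```rust
// A block is (data, nrows, ncols), row-major.
type Block<'a> = (&'a [f64], usize, usize);

// Copies a block into the output matrix at offset (r0, c0).
fn copy_block(out: &mut [f64], out_ncols: usize, blk: Block<'_>, r0: usize, c0: usize) {
    let (data, nrows, ncols) = blk;
    for i in 0..nrows {
        for j in 0..ncols {
            out[(r0 + i) * out_ncols + (c0 + j)] = data[i * ncols + j];
        }
    }
}

// Assembles [[A, B], [C, D]] into one matrix, checking block dimensions.
fn blockwise(a: Block<'_>, b: Block<'_>, c: Block<'_>, d: Block<'_>) -> (Vec<f64>, usize, usize) {
    assert_eq!(a.1, b.1); // top blocks: same row count
    assert_eq!(c.1, d.1); // bottom blocks: same row count
    assert_eq!(a.2, c.2); // left blocks: same column count
    assert_eq!(b.2, d.2); // right blocks: same column count
    let (nrows, ncols) = (a.1 + c.1, a.2 + b.2);
    let mut out = vec![0.0; nrows * ncols];
    copy_block(&mut out, ncols, a, 0, 0);
    copy_block(&mut out, ncols, b, 0, a.2);
    copy_block(&mut out, ncols, c, a.1, 0);
    copy_block(&mut out, ncols, d, a.1, a.2);
    (out, nrows, ncols)
}
```

For 1x1 blocks A = [1], B = [2], C = [3], D = [4], this produces the 2x2 matrix [1, 2, 3, 4] in row-major order.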
You do not support no_std, right?
I'm frequently using matrix-diagonal and diagonal-diagonal multiplication (the latter being a component-wise vector multiply). Since populating a full matrix with a diagonal wastes performance, I'm using a modified copy of div_by_s that I found in the faer source.
Does it make sense to flesh out a Diagonal type?
Is your feature request related to a problem? Please describe.
Another maintenance task. Unlike #10, which is geared towards developers knowing what low-level changes happened, this one is geared towards user-facing changes, so the average user knows what changed between releases without much hassle on the maintainers' side.
Describe the solution you'd like
This is one I have to research a bit more. I forgot the means to do this but I know it's possible and seemed rather useful.
You can assign this one to me
In the LAPACK Cholesky factorization routines (e.g. DPOTRF) when it encounters an error it returns the details about where it failed (INFO > 0). Is it possible to add something similar to the faer Cholesky routines?
Is your feature request related to a problem? Please describe.
Especially in debugging, it is useful to be able to inspect the matrix in a non-terminal environment. The CSV format is easily implemented and read from, but has not yet been implemented.
Describe the solution you'd like
Methods for writing and reading matrices, either as implementations for a given type, or as generic functions.
Is your feature request related to a problem? Please describe.
Matrix exponentiation utilizing the standard double-precision algorithm by Al-Mohy and Higham.
Describe the solution you'd like
A simple function that takes in a reference or matrix view (of some sort; not familiar with this library just yet) and yields a Result<_, _> containing the exponentiated matrix if it worked and an error if not.
Describe alternatives you've considered
n/a
Additional context
I have implemented this function for ndarray-linalg, but the maintainers have taken months and still no word. I am using my implementation for research purposes; using ndarray I have an error one-norm of about 20x scipy at 200x200 dense matrices, and an error of machine precision for 1-sparse matrices up to 1024x1024. It might take some time, but I'd be willing to port it over to this library to help move it along. I would like to hear how this should best be done.
implement subtraction for Mat<T>
The traits core::ops::Sub<Mat<T>> and core::ops::Sub<&Mat<T>> should be implemented for Mat<T> and &Mat<T>. T should satisfy the trait ComplexField for those operations to be implementable. Mat::with_dims could be used, with a closure that takes two elements and subtracts them.
Is your feature request related to a problem? Please describe.
It would be great to have an equivalent to DSYRK in faer, to do the operations
C = alpha * A * A^T + beta * C
and
C = alpha * A^T * A + beta * C
This is useful for example when accumulating normal equations.
Describe the solution you'd like
A symmetric rank k routine.
Describe alternatives you've considered
The operation can be done with GEMM, but is significantly slower.
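For reference, the requested operation looks as follows as a naive triple loop over row-major flat storage. A real syrk would exploit symmetry and compute only one triangle, which is where the speedup over GEMM comes from; this sketch is illustrative only, not faer's API:

```rust
// Naive rank-k update: C = alpha * A * A^T + beta * C, with A of size n x k
// and C of size n x n, both row-major.
fn syrk(alpha: f64, a: &[f64], n: usize, k: usize, beta: f64, c: &mut [f64]) {
    for i in 0..n {
        for j in 0..n {
            // (A * A^T)[i, j] = sum over p of A[i, p] * A[j, p]
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * a[j * k + p];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

For A = [[1, 2], [3, 4]] with alpha = 1 and beta = 0, this fills C with A * A^T = [[5, 11], [11, 25]].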
Describe the bug
Running the faer-bench benchmark on 2xAMD EPYC 7V13 64-Core Processor
is surprisingly slow. Both in absolute numbers, as well as in comparing the speedup of faer(par) over faer(seq).
This does not seem to be the case on other, larger AMD server CPUs.
To Reproduce
Just to keep track for myself:
CXXFLAGS="-I/u/drehwald/prog" CXX=g++ cargo +nightly run --release --no-default-features --features faer
Expected behavior
Well, don't be slow.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
➜ ~ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
➜ ~ cargo +nightly --version
cargo 1.71.0-nightly (d0a4cbcee 2023-04-16)
Additional context
Our admin just got back to me: university admins probably won't adjust the perf settings for us, since the machine is too busy and it would be a performance risk. But I got access to two other AMD machines; maybe we can use those for pinning the issue down.
implement the qr decomposition with and without column pivoting, along with the solve/reconstruct/inverse/least_squares functions to complete the decomposition api
the qr module is also needed for the future svd module implementation, since it could be used as a preconditioner.
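As a reference for what the decomposition computes, here is a modified Gram-Schmidt QR on a square row-major matrix in plain Rust. This is illustrative only; a production implementation like the one proposed would use blocked Householder reflections, with or without column pivoting:

```rust
// Modified Gram-Schmidt QR factorization of a square row-major matrix,
// returning (Q, R) with A = Q * R, Q orthonormal, R upper triangular.
fn qr_mgs(a: &[f64], n: usize) -> (Vec<f64>, Vec<f64>) {
    let mut q = a.to_vec(); // columns of q are orthonormalized in place
    let mut r = vec![0.0; n * n];
    for k in 0..n {
        // normalize column k
        let norm = (0..n).map(|i| q[i * n + k] * q[i * n + k]).sum::<f64>().sqrt();
        r[k * n + k] = norm;
        for i in 0..n {
            q[i * n + k] /= norm;
        }
        // remove the component along column k from the remaining columns
        for j in (k + 1)..n {
            let proj = (0..n).map(|i| q[i * n + k] * q[i * n + j]).sum::<f64>();
            r[k * n + j] = proj;
            for i in 0..n {
                q[i * n + j] -= proj * q[i * n + k];
            }
        }
    }
    (q, r)
}
```

For A = [[1, 1], [0, 1]] this returns Q = I and R = A; solve and least_squares then reduce to a triangular solve against R.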
Running cargo bench faer-mt-cplx-llt-512
leads to a segfault on my machine (the corresponding single threaded bench seems to run fine). I get the following traceback:
#0 0x000055c4371ca569 in core::ptr::write<[num_complex::Complex<f64>; 3]> () at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ptr/mod.rs:1354
#1 core::ptr::mut_ptr::{impl#0}::write<[num_complex::Complex<f64>; 3]> () at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ptr/mut_ptr.rs:1449
#2 gemm_common::pack_operands::pack_generic_inner_loop<num_complex::Complex<f64>, 1, 3> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:163
#3 gemm_common::pack_operands::pack_generic<num_complex::Complex<f64>, 1, 3> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:256
#4 gemm_common::pack_operands::pack_rhs::{closure#0}<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:301
#5 gemm_common::simd::x86::{impl#2}::vectorize<gemm_common::pack_operands::pack_rhs::{closure_env#0}<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma>> (f=...)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/simd.rs:50
#6 0x000055c4371c74ef in gemm_common::pack_operands::pack_rhs<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma> (n=<optimized out>, k=8192, dst=..., src=..., src_cs=48, src_rs=255, dst_stride=768)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:299
#7 0x000055c4371c4992 in gemm_common::gemm::gemm_basic_generic::{closure#3}<gemm_common::simd::x86::Fma, num_complex::Complex<f64>, 2, 6, 3, 3, gemm_c64::gemm::f64::fma_cplx::gemm_basic_cplx::{closure_env#0}> (
tid=<optimized out>) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/gemm.rs:395
#8 0x000055c4371f88ea in core::ops::function::impls::{impl#0}::call<(usize), (dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (args=..., self=<optimized out>)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:263
#9 core::ops::function::impls::{impl#1}::call_mut<(usize), &(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (args=..., self=<optimized out>)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:274
#10 core::iter::traits::iterator::Iterator::for_each::call::{closure#0}<usize, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (item=8192)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:849
#11 core::iter::traits::iterator::Iterator::fold<core::ops::range::Range<usize>, (), core::iter::traits::iterator::Iterator::for_each::call::{closure_env#0}<usize, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::
Send + core::marker::Sync)>> (self=..., f=..., init=<optimized out>) at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:2477
#12 core::iter::traits::iterator::Iterator::for_each<core::ops::range::Range<usize>, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (self=..., f=0x7fe657ffd740)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:852
#13 rayon::iter::for_each::{impl#1}::consume_iter<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync), usize, core::ops::range::Range<usize>> (self=..., iter=...)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/for_each.rs:55
#14 rayon::iter::plumbing::Producer::fold_with<rayon::range::IterProducer<usize>, rayon::iter::for_each::ForEachConsumer<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)>> (self=...,
folder=...) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/plumbing/mod.rs:110
#15 rayon::iter::plumbing::bridge_producer_consumer::helper<rayon::range::IterProducer<usize>, rayon::iter::for_each::ForEachConsumer<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)>> (
len=<optimized out>, migrated=<optimized out>, splitter=..., producer=..., consumer=...) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/plumbing/mod.rs:438
OS: arch linux
CPU: AMD Threadripper 1920X
version: current main branch (88dd024)
Will this repository include topics such as Geometric Algebra, e.g. G^n? This would be something I would be interested in helping contribute to.
Wondering if faer is using a SIMD API. If not, maybe SIMD could make faer faster.
Is your feature request related to a problem? Please describe.
This is more of a maintenance task to make it obvious and easy to see what changes have happened over the life of the project.
Describe the solution you'd like
I'm thinking of using an action to generate the Changelog on releases.
Describe alternatives you've considered
Considered GitHub Actions and some other functions for doing it. Decided to stick with native GitHub tooling.
Feel free to assign to me, as I think it would be fun to do.
Is your feature request related to a problem? Please describe.
At present there are tests, but no automated tests are run via CI to ensure bugs aren't introduced as changes are made.
Describe the solution you'd like
A GitHub Action to run the test and output our current test coverage.
Describe alternatives you've considered
None; it seemed natural to use GitHub Actions since we're on GitHub. I believe Travis CI is a viable alternative though.
I have just seen your crate and ran a quick benchmark against intel mkl.
I got the following result
test bench_faer_rs_n ... bench: 33,479,057 ns/iter (+/- 9,121,197)
test bench_faer_rs_t ... bench: 32,854,984 ns/iter (+/- 7,885,723)
test bench_mkl_n ... bench: 35,916,818 ns/iter (+/- 9,305,052)
test bench_mkl_t ... bench: 35,681,294 ns/iter (+/- 10,449,638)
I tried it several times just to make sure it was not a lucky result, or a bug. Having on-par performance with the Intel MKL library is amazing!
Maybe you can confirm yourself and put a comparison with Intel MKL in the documentation, since it is one of the fastest BLAS implementations, if not the fastest.
Is your feature request related to a problem? Please describe.
I have an x: Mat<f64> and a y: MatRef<'_, f64>, and I need to update x = x - y.
Describe the solution you'd like
I think it would be convenient to have an impl SubAssign<Rhs = MatRef> for Mat such that it reuses the memory allocated for x.
Describe alternatives you've considered
I am currently solving my issue with Mat::with_dims to create a new matrix. Implementing these traits could be a bit more performant, and it would definitely be more readable.
Describe the bug
Calling Mat::eigenvalues() may lead to a panic from a failing assertion. To reproduce this, generate a 4096x4096 matrix with random f64 elements in the range 0 to 1 and call the eigenvalues function (see code below).
This will result in the following error:
thread 'main' panicked at /home/sthagel/.cargo/registry/src/index.crates.io-6f17d22bba15001f/faer-0.18.1/src/linalg/evd/hessenberg.rs:816:26:
assertion `left == right` failed: iterators must have the same length
left: 31
right: 32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
To Reproduce
use faer::mat::Mat;
use num::complex::Complex64;
use rand::prelude::*;

fn main() {
    let n = 4096;
    let n_sq = n * n;
    let mat = (0..n_sq)
        .map(|_| {
            let mut rng = rand::thread_rng();
            rng.gen()
        })
        .collect::<Vec<f64>>();
    let results = eigenvals_faer(&mat, n - 2, n);
    println!("{:#?}", results)
}

#[inline]
pub fn eigenvals_faer(mat: &[f64], nev: usize, n: usize) -> Result<Vec<Complex64>, &str> {
    if nev * n == 0 || nev > n - 2 {
        return Err("Invalid parameters!");
    }
    let faer_mat: Mat<f64> = Mat::from_fn(n, n, |i, j| mat[i + n * j]);
    let evs = faer_mat.eigenvalues();
    Ok(evs)
}
Expected behavior
One would expect the calculation to produce the desired eigenvalues.
Desktop (please complete the following information):
Additional info
On smaller matrices (e.g. 40x40 instead of 4096x4096) it works just fine.
First of all, thanks a lot for your work in faer
, it is impressive. It has also been a lot of fun to play around with the library.
Describe the bug
The results of solving a lower triangular system are unexpected.
To Reproduce
The following code reproduces the error. It also shows how the equivalent code with nalgebra
gets the correct solution:
# Cargo.toml
[package]
name = "test"
version = "0.1.0"
edition = "2021"
[dependencies]
faer-core = "0.9.1"
nalgebra = "0.32.2"
// main.rs
const RESPONSE: [f64; 3] = [-18.0, -60.0, -80.0];

// Toeplitz and lower triangular
fn r_element(i: usize, j: usize) -> f64 {
    if i >= j {
        let diff = i - j;
        if diff < RESPONSE.len() {
            RESPONSE[diff]
        } else {
            0.0
        }
    } else {
        0.0
    }
}

fn r_matrix_faer(n: usize) -> faer_core::Mat<f64> {
    faer_core::Mat::with_dims(n, n, |i, j| r_element(i, j))
}

fn r_matrix_nalgebra(n: usize) -> nalgebra::DMatrix<f64> {
    nalgebra::DMatrix::from_fn(n, n, |i, j| r_element(i, j))
}

// Solve for X in Y = R * X
fn solve_x_faer(r: &faer_core::Mat<f64>, y: &mut faer_core::Mat<f64>) {
    faer_core::solve::solve_lower_triangular_in_place_with_conj(
        r.as_ref(),
        faer_core::Conj::No,
        y.as_mut(),
        faer_core::Parallelism::None,
    );
}

// Solve for X in Y = R * X
fn solve_x_nalgebra(r: &nalgebra::DMatrix<f64>, y: &mut nalgebra::DMatrix<f64>) {
    r.solve_lower_triangular_mut(y);
}

fn main() {
    let (i, j) = (400, 1);
    let r = r_matrix_faer(i);
    let mut x = faer_core::Mat::zeros(i, j);
    x.write(0, 0, 150.0);
    let mut y = r.clone() * x;
    solve_x_faer(&r, &mut y); // The values in `y` blow up

    // Now, the same thing with nalgebra
    let r = r_matrix_nalgebra(i);
    let mut x = nalgebra::DMatrix::zeros(i, j);
    x[(0, 0)] = 150.0;
    let mut y = r.clone() * x;
    solve_x_nalgebra(&r, &mut y); // The values in `y` are correct
}
Expected behavior
I would expect the same results as nalgebra.
Specifying or documenting the MSRV somewhere would be helpful for consumers of faer.
This was discussed on discord.
Raising issue as a bookmark for making an initial contribution as suggested by @sarah-ek.
When this matrix
[
[0.0 + 0.0 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I],
[0.0 + 0.0 * I, 0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I],
[0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I],
[2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I],
]
as a Mat<c64> is passed to eigenvalues or complex_eigenvalues, it throws an error.
numpy does compute eigenvalues:
ar = np.array([
[0.0 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j],
[0.0 + 0.0j, 0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j],
[0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j, 0.0 + 0.0j],
[2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j],
])
In [1]: np.linalg.eigvals(ar)
Out[1]:
array([ 1.11022302e-16-1.j, -2.22044605e-16+1.j, 1.11022302e-16-1.j,
-2.22044605e-16+1.j])
The error messages are
Assertion failed at /home/lapeyre/.cargo/registry/src/index.crates.io-6f17d22bba15001f/faer-core-0.13.0/src/lib.rs:2093:13:
assert!( row < this.nrows() )
with expansion:
4 < 4
Several other matrices do give the same eigenvalues that numpy does.
Version: 0.13.1
Arch Linux
It would be great if there were, for any function, a low-level API that exposes the needed workspace, so the function itself performs no allocations.
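The requested pattern, in miniature: the caller provides the output/workspace storage, so repeated calls allocate nothing. Plain Rust and illustrative only; faer's actual low-level routines have their own workspace conventions:

```rust
// Matrix-vector product A * x written into caller-provided storage.
// `a` is n x n row-major; `out` must have length n. No allocation occurs here.
fn matvec_into(a: &[f64], n: usize, x: &[f64], out: &mut [f64]) {
    assert_eq!(out.len(), n);
    for i in 0..n {
        out[i] = (0..n).map(|j| a[i * n + j] * x[j]).sum::<f64>();
    }
}
```

A caller in a hot loop allocates the workspace once up front, e.g. `let mut workspace = vec![0.0; n];`, and then calls `matvec_into` repeatedly with the same buffer.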
The work is impressive. Being competitive with the big guys is nothing short of amazing considering this is a single person show.
Apache Arrow is an aligned chunked array representation of columns. If my matrix is a dense matrix encoded as a chunked array of float64 numbers is it possible to use faer-rs on top of it in a zero-copy way? I assume further constraints (chunksize has to be a multiple of some batch size) will be required. Or will we have to combine the (relatively large) chunks into a continuous aligned array first before feeding it to faer?
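One way to think about it, sketched in plain Rust with Vec<Vec<f64>> standing in for Arrow's chunked buffers (illustrative only): a zero-copy view needs one contiguous, uniformly strided buffer, so a single-chunk column can be borrowed directly, while a multi-chunk column has to be copied into one allocation first.

```rust
use std::borrow::Cow;

// Returns the column data as one contiguous slice: borrowed when it already
// is contiguous (single chunk), copied into a fresh allocation otherwise.
fn as_contiguous(chunks: &[Vec<f64>]) -> Cow<'_, [f64]> {
    if chunks.len() == 1 {
        Cow::Borrowed(&chunks[0])
    } else {
        let mut out = Vec::with_capacity(chunks.iter().map(Vec::len).sum());
        for chunk in chunks {
            out.extend_from_slice(chunk);
        }
        Cow::Owned(out)
    }
}
```

The contiguous slice could then back a column-major matrix view; whether the copy is needed depends entirely on how the chunks were produced, as the question suggests.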
Is your feature request related to a problem? Please describe.
Hi, I ran the benchmarks yesterday on my machine (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz) and did a quick experiment with PGO. The wallclock results are even more impressive and are eventually worth adding to the benchmark section.
Describe the solution you'd like
Make users aware of PGO, link the relevant section in the Rust documentation, and add some additional benchmark estimates.
Additional context
Two examples with least and most improvements. I have seen a consistent improvement for all benchmarks, especially for the parallelized version.
[Benchmark tables comparing timings with PGO vs. without PGO were attached here.]
Is your feature request related to a problem? Please describe.
Hey, this might be an embarrassingly dumb question, but I cannot see a way to add two matrices using + (or subtract using -). The official website does mention that the addition and subtraction operators should work as expected, but when I write a + b for e.g. two Mat<f64> instances, it won't compile. Looking at the faer_core crate I cannot find an implementation of core::ops::Add. What am I missing?
Describe the solution you'd like
It would be great if the + operator would just work for two suitable Mat<E> instances, as well as for one Mat<E> and a suitable MatRef instance. I can elaborate on what I mean by suitable, but roughly: if the entries can be added, then the matrices should be addable (given that the matrix dimensions match).
Describe alternatives you've considered
I could of course do element-wise additions manually, but that would feel so much less ergonomic than having an Add trait implemented. Same for AddAssign.
Additional context
If I missed an obvious way to add matrices, I'd really appreciate a hint. If the Add, AddAssign, Sub, and SubAssign traits are indeed not implemented, I'd be very happy to file a PR because this crate is really awesome. Please let me know and thanks for your work :)
Hi @sarah-ek,
Are there any plans to add high-level APIs for computing eigenvectors and eigenvalues of square, but not necessarily symmetric, matrices?
I really like the ergonomics of your library but it would be nice to just have a functional, simple API for providing matrices and getting out eigenvalues and their associated eigenvectors without having to hand roll some form of iterative/inverse iteration method in each new crate.
Is your feature request related to a problem? Please describe.
Some problems need a Sparse Cholesky solver, such as GraphSlam
Describe the solution you'd like
Implement a Sparse Cholesky Solver
Describe alternatives you've considered
In Rust, I used Russel which wraps SuiteSparse
Describe the bug
For many graph adjacency matrices with small eigenvalues, faer crashes on an i32 overflow when computing eigenvalues as a self-adjoint matrix. The precise line is iter += 1;.
My current workaround to avoid selfadjoint_eigenvalues() is
let eigs: Vec<faer::complex_native::c64> = a.eigenvalues();
To Reproduce
let mat: [[f64; 20]; 20] = [[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]];
let mut matt = faer::Mat::zeros(20, 20);
for i in 0..20 {
    for j in 0..20 {
        matt[(i, j)] = mat[i][j];
    }
}
let eigs = matt.selfadjoint_eigenvalues(faer::Side::Lower);
Expected behavior
Expected eigs
to return a vector of
For example,
# `bad_mat.py`
# run with `python bad_mat.py`
import numpy as np
mat = [[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]]
mat = np.array(mat)
eigs = np.linalg.eig(mat)[0]
print(eigs)
real_parts = [np.real(_) for _ in eigs]
print(real_parts)
''' output:
[ 3.69390168e+00+0.00000000e+00j -3.23272209e+00+0.00000000e+00j
-2.52597629e+00+0.00000000e+00j 2.29766159e+00+0.00000000e+00j
2.19984492e+00+0.00000000e+00j -2.25650189e+00+0.00000000e+00j
-1.98531299e+00+0.00000000e+00j -1.48123145e+00+0.00000000e+00j
-1.10237216e+00+0.00000000e+00j 1.47508155e+00+0.00000000e+00j
-6.62467970e-01+0.00000000e+00j -5.32411898e-01+0.00000000e+00j
-2.28816715e-01+0.00000000e+00j 4.97954067e-01+0.00000000e+00j
7.40821370e-01+0.00000000e+00j 1.10254826e+00+0.00000000e+00j
1.00000000e+00+0.00000000e+00j 1.00000000e+00+0.00000000e+00j
-1.88640048e-17+9.57840098e-17j -1.88640048e-17-9.57840098e-17j]
[3.6939016833383165, -3.2327220859059014, -2.52597628501941, 2.2976615943259593, 2.199844923144114, -2.2565018926765257, -1.9853129913610488, -1.4812314534620308, -1.102372157787216, 1.4750815501253782, -0.6624679696406541, -0.5324118977580056, -0.22881671515516241, 0.49795406722182745, 0.7408213702790264, 1.1025482603313292, 1.0000000000000004, 1.0000000000000004, -1.886400483817679e-17, -1.886400483817679e-17]
'''
Additional context
Feel free to suggest a different method to use!
Since these matrices have a few very small eigenvalues, perhaps it is appropriate to tweak the high-level function to instead calculate a good guess for consider_zero_threshold in the higher-level API. In this instance, the value equals 2.2250738585072014e-308.
pub fn compute_tridiag_real_evd_qr_algorithm<E: RealField>(
    diag: &mut [E],
    offdiag: &mut [E],
    u: Option<MatMut<'_, E>>,
    epsilon: E,
    consider_zero_threshold: E,
) {
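As an illustration of how such a guess could be computed, here is a minimal sketch (the function name `guess_zero_threshold` and the specific heuristic — scaling machine epsilon by the problem size and the largest entry magnitude — are assumptions, not faer's actual behavior):

```rust
// Hypothetical sketch: derive a scale-aware `consider_zero_threshold`
// guess from the tridiagonal data instead of f64::MIN_POSITIVE
// (2.2250738585072014e-308), which treats almost nothing as zero.
fn guess_zero_threshold(diag: &[f64], offdiag: &[f64]) -> f64 {
    // The largest magnitude among the entries sets the scale of the matrix.
    let scale = diag
        .iter()
        .chain(offdiag.iter())
        .fold(0.0_f64, |acc, &x| acc.max(x.abs()));
    // Scale machine epsilon by the problem size and magnitude, so
    // eigenvalues below roughly n * eps * ||A|| are considered zero.
    let n = diag.len() as f64;
    (n * f64::EPSILON * scale).max(f64::MIN_POSITIVE)
}

fn main() {
    let diag = [2.0, -1.0, 3.0];
    let offdiag = [0.5, 0.25];
    let t = guess_zero_threshold(&diag, &offdiag);
    // Tiny relative to the entries, but far above f64::MIN_POSITIVE.
    println!("{t:e}");
}
```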
Is your feature request related to a problem? Please describe.
Faer currently doesn't work with Polars 0.34. We should update it, make sure nothing breaks, and confirm the tests still pass.
Describe the solution you'd like
Update the dependency to polars = { version = "0.34", optional = true, features = ["lazy"] } and see if the existing functions still work.
Describe alternatives you've considered
NA
Additional context
NA
Thanks in advance
Is your feature request related to a problem? Please describe.
Currently none of the matrix multiplication crates support the half-precision (fp16) datatype. There is the half crate (https://crates.io/crates/half) that provides a Rust type for it.
This means that any crate that needs CPU matrix multiplication with the f16 datatype has to implement a matrix multiplication algorithm manually, which can be very slow.
Describe the solution you'd like
It'd be great if faer included configurable support for these data types.
Describe alternatives you've considered
I'm currently using this super naive matmul implementation:
fn naive_gemm<F: num_traits::Float + std::ops::AddAssign, M: Dim, K: Dim, N: Dim>(
    (m, k, n): (M, K, N),
    ap: *const F,
    a_strides: [usize; 2],
    bp: *const F,
    b_strides: [usize; 2],
    cp: *mut F,
    c_strides: [usize; 2],
) {
    for i_m in 0..m.size() {
        for i_k in 0..k.size() {
            for i_n in 0..n.size() {
                unsafe {
                    let a = *ap.add(a_strides[0] * i_m + a_strides[1] * i_k);
                    let b = *bp.add(b_strides[0] * i_k + b_strides[1] * i_n);
                    let c = cp.add(c_strides[0] * i_m + c_strides[1] * i_n);
                    *c += a * b;
                }
            }
        }
    }
}
Which is extremely slow. I'm not sure what the other alternatives are.
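One incremental improvement, sketched below under the assumption of row-major contiguous slices: hoist the `a` element out of the innermost loop and accumulate in f32. For `half::f16` inputs one would widen to f32 buffers up front (`f16::to_f32`), run this, then narrow back (`f16::from_f32`), avoiding per-element conversion in the inner loop. The function name `gemm_f32` is made up for illustration:

```rust
// Sketch: c += a * b for row-major matrices, with the (i, k, j) loop order
// so that rows of `b` and `c` are streamed contiguously, and the `a`
// element hoisted out of the inner loop.
fn gemm_f32(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for p in 0..k {
            // Loaded once, reused across the whole row of b.
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
}

fn main() {
    // [[1, 2], [3, 4]] * [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
    let a = [1.0f32, 2.0, 3.0, 4.0];
    let b = [5.0f32, 6.0, 7.0, 8.0];
    let mut c = [0.0f32; 4];
    gemm_f32(2, 2, 2, &a, &b, &mut c);
    assert_eq!(c, [19.0, 22.0, 43.0, 50.0]);
}
```

This is still far from a tuned kernel (no blocking, no SIMD), but the contiguous inner loop gives the compiler a chance to auto-vectorize.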
Additional context
I'm the author of dfdx. dfdx has both cuda/CPU support, and CUDA does have hardware acceleration for f16. It'd just be nice if f16 matmul on CPU was a bit faster!
I ran cargo run in the faer-bench directory and the result was a panic.
The system is an Arm macOS machine with the newest nightly toolchain.
leifeld@MacStudovonDirk faer-bench % cargo +nightly run --release --no-default-features
Finished release [optimized] target(s) in 0.08s
Running target/release/faer-bench
f32
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 33ns 32ns 142ns - -
8 57ns 60ns 122ns - -
16 206ns 211ns 303ns - -
32 1.1µs 1.1µs 996ns - -
64 8.1µs 8.1µs 6.7µs - -
96 24.8µs 30.9µs 1.3ms - -
128 59.8µs 39.1µs 1.2ms - -
192 195.1µs 93.3µs 1.7ms - -
256 479.7µs 202.1µs 1.9ms - -
384 1.6ms 448.6µs 2.3ms - -
512 3.7ms 1ms 2.6ms - -
640 7.3ms 1.7ms 3ms - -
768 12.5ms 3ms 5.1ms - -
896 19.7ms 4.4ms 6.8ms - -
1024 29.8ms 6.7ms 9.2ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 14ns 14ns 26.3µs - -
8 90ns 90ns 27µs - -
16 378ns 380ns 29.9µs - -
32 1.4µs 1.4µs 35.9µs - -
64 7.2µs 7.2µs 54.1µs - -
96 20.4µs 27.2µs 93.5µs - -
128 44.9µs 40.6µs 131.9µs - -
192 131.7µs 117.5µs 290.3µs - -
256 301.3µs 152.8µs 487.1µs - -
384 913.6µs 450.8µs 996.1µs - -
512 2.2ms 966µs 2.6ms - -
640 4.1ms 1.8ms 2.7ms - -
768 6.7ms 2.6ms 3.9ms - -
896 10.8ms 3.6ms 5.3ms - -
1024 17.7ms 5.5ms 12.6ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 160ns 9.3µs 26.3µs - -
8 468ns 12µs 27.1µs - -
16 1.2µs 16.4µs 29.9µs - -
32 3.4µs 24.1µs 35.9µs - -
64 9.9µs 31.6µs 54.1µs - -
96 23.2µs 39.9µs 89.1µs - -
128 33.9µs 49.9µs 133.3µs - -
192 86.4µs 94.7µs 310.4µs - -
256 153.1µs 143.9µs 507.7µs - -
384 421.7µs 281.3µs 1ms - -
512 907.5µs 492.6µs 2.7ms - -
640 1.6ms 816.2µs 2.7ms - -
768 2.6ms 1ms 3.9ms - -
896 4.1ms 1.5ms 5.4ms - -
1024 6.8ms 2.4ms 12.5ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 43ns 43ns 120ns - -
8 154ns 154ns 354ns - -
16 767ns 766ns 1.2µs - -
32 2.1µs 2.1µs 4.8µs - -
64 6.3µs 6.3µs 15µs - -
96 15.8µs 15.8µs 34.2µs - -
128 22.2µs 22.3µs 428.7µs - -
192 59.7µs 87µs 12.5ms - -
256 106.1µs 126.4µs 3.2ms - -
384 292.4µs 331.5µs 5.3ms - -
512 651.6µs 483.6µs 7ms - -
640 1.1ms 1.1ms 11.4ms - -
768 2ms 1.5ms 19.6ms - -
896 3ms 2ms 1.82s - -
1024 4.7ms 2.5ms 22.3ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 92ns 89ns 154ns - -
8 263ns 238ns 456ns - -
16 895ns 905ns 2µs - -
32 2.9µs 2.9µs 7µs - -
64 11.9µs 12.2µs 22.8µs - -
96 27.2µs 28.2µs 46.3µs - -
128 54µs 53.7µs 88.3µs - -
192 141.9µs 215.8µs 206.9µs - -
256 285.8µs 384.5µs 749.8µs - -
384 801.3µs 919.1µs 1.6ms - -
512 1.8ms 1.7ms 2.4ms - -
640 3.2ms 3.1ms 4.2ms - -
768 5.3ms 4.2ms 5.8ms - -
896 8.2ms 5.8ms 7.7ms - -
1024 12.8ms 7.9ms 9.8ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 160ns 160ns - - -
8 388ns 403ns - - -
16 1.5µs 1.5µs - - -
32 7.9µs 7.8µs - - -
64 53.4µs 53µs - - -
96 174.2µs 173.9µs - - -
128 417.9µs 417µs - - -
192 1.5ms 1.5ms - - -
256 3.5ms 3.5ms - - -
384 11.8ms 11.3ms - - -
512 28.9ms 21.8ms - - -
640 55.6ms 33.2ms - - -
768 98ms 49.2ms - - -
896 158.7ms 72.1ms - - -
1024 242.3ms 103.1ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 102ns 100ns 729ns - -
8 294ns 293ns 1.5µs - -
16 1.1µs 1.2µs 4.7µs - -
32 4.6µs 4.7µs 17.2µs - -
64 25.3µs 25.6µs 65.4µs - -
96 57.7µs 59.4µs 454.9µs - -
128 110.2µs 109.9µs 4.4ms - -
192 298.2µs 297.7µs 37.8ms - -
256 615.9µs 796.9µs 59ms - -
384 1.8ms 2.1ms 118.2ms - -
512 4ms 3.7ms 197.2ms - -
640 6.7ms 5.1ms 301.4ms - -
768 11.2ms 8ms 429.2ms - -
896 17.3ms 11.9ms 523.8ms - -
1024 25.6ms 16.5ms 632.4ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 153ns 140ns - - -
8 414ns 410ns - - -
16 1.5µs 1.6µs - - -
32 6.5µs 6.5µs - - -
64 40.7µs 66.3µs - - -
96 123.3µs 167.8µs - - -
128 266.4µs 327.5µs - - -
192 802.5µs 862.3µs - - -
256 1.8ms 2.3ms - - -
384 5.6ms 6.8ms - - -
512 12.5ms 12.8ms - - -
640 24.6ms 19.9ms - - -
768 41.3ms 28.1ms - - -
896 65.1ms 38.3ms - - -
1024 94.5ms 51.7ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 726ns 14.3µs 412ns - -
8 1.7µs 17.4µs 907ns - -
16 4.4µs 25.3µs 3.9µs - -
32 13µs 51.3µs 12.6µs - -
64 42µs 84.9µs 49.4µs - -
96 100.1µs 141µs 443.4µs - -
128 168.5µs 209.1µs 2.8ms - -
192 430µs 454.3µs 3.1ms - -
256 837.1µs 711.4µs 7.3ms - -
384 2.3ms 1.6ms 17.9ms - -
512 5.3ms 3ms 19.4ms - -
640 9.6ms 5ms 31.4ms - -
768 15.2ms 7ms 34.6ms - -
896 24ms 9.7ms 43.1ms - -
1024 37.6ms 14.3ms 50.6ms - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.1µs 1.1µs 2µs - -
8 5.8µs 21.5µs 5.8µs - -
16 20.3µs 54µs 19.5µs - -
32 72µs 116.1µs 71.6µs - -
64 270.6µs 360.5µs 3.9ms - -
96 664.6µs 971.7µs 6.7ms - -
128 1.3ms 1.7ms 44.4ms - -
192 3.5ms 5.1ms 108.8ms - -
256 7.3ms 9.3ms 173.6ms - -
384 21.4ms 20.8ms 310.9ms - -
512 47.5ms 35.9ms 466.5ms - -
640 85.2ms 54.2ms 616.3ms - -
768 143ms 80ms 830.5ms - -
896 225.4ms 115.2ms 1.07s - -
1024 327.3ms 160.4ms 1.23s - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 32.8µs 32.4µs 644.1µs - -
8 112µs 150.6µs 1.4ms - -
16 318.9µs 456.7µs 8.4ms - -
32 947.4µs 1.2ms 15.5ms - -
64 2.9ms 3.5ms 45.5ms - -
96 5.3ms 5.8ms 61ms - -
128 8.9ms 9ms 100.3ms - -
192 18.7ms 17.5ms 248.2ms - -
256 33.7ms 29.3ms 380.9ms - -
384 75.2ms 54.5ms 657.6ms - -
512 138.8ms 94ms 1.15s - -
640 224.2ms 136.4ms 1.63s - -
768 336.6ms 182.1ms 1.97s - -
896 478.8ms 240.6ms 2.45s - -
1024 656ms 327.2ms 2.68s - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 992ns 890ns 1µs - -
8 2.5µs 2.4µs 3.5µs - -
16 9.2µs 8.7µs 12.1µs - -
32 36.5µs 34.8µs 55.5µs - -
64 161.4µs 194.4µs 421.2µs - -
96 391.3µs 499.3µs 1.1ms - -
128 725.3µs 841.1µs 4.3ms - -
192 1.9ms 2.2ms 22.6ms - -
256 3.7ms 4ms 45.9ms - -
384 10.7ms 12.4ms 114.5ms - -
512 23.2ms 22.7ms 206.6ms - -
640 41.9ms 34.5ms 330.7ms - -
768 69.5ms 51ms 471ms - -
896 107.4ms 70.9ms 657ms - -
1024 156.6ms 92.4ms 888.3ms - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 3.9µs 4.2µs 2.4µs - -
8 11.6µs 13µs 6.9µs - -
16 43.7µs 42.8µs 27.4µs - -
32 174.2µs 188.9µs 123.3µs - -
64 806.5µs 834.3µs 658.5µs - -
96 1.9ms 2ms 4.4ms - -
128 3.2ms 3.4ms 31.6ms - -
192 8.5ms 9ms 100.3ms - -
256 16.4ms 17ms 228.4ms - -
384 38.3ms 47.9ms 592.5ms - -
512 82.8ms 96.1ms 798.3ms - -
640 146.3ms 164.9ms 1.12s - -
768 230.3ms 243.3ms 1.46s - -
896 340.3ms 341.4ms 1.69s - -
1024 540.2ms 539.9ms 2.45s - -
f64
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 32ns 32ns 128ns - -
8 80ns 84ns 171ns - -
16 311ns 315ns 345ns - -
32 2.1µs 2.2µs 1.8µs - -
64 14.8µs 14.8µs 12.1µs - -
96 50.3µs 41.3µs 1.2ms - -
128 124.1µs 69.5µs 1.4ms - -
192 392.9µs 173.1µs 1.5ms - -
256 943.6µs 315.2µs 1.3ms - -
384 3.2ms 889.3µs 2.3ms - -
512 7.5ms 1.9ms 3.7ms - -
640 14.5ms 3.4ms 4.9ms - -
768 25.7ms 5.8ms 8.5ms - -
896 30ms 7.3ms 13.1ms - -
1024 45.1ms 11ms 17.4ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 14ns 14ns 27.9µs - -
8 89ns 90ns 27.7µs - -
16 429ns 431ns 29.8µs - -
32 1.9µs 1.9µs 35.5µs - -
64 11.3µs 11.3µs 54µs - -
96 33.9µs 35.3µs 81.9µs - -
128 75.4µs 63.5µs 126µs - -
192 235µs 149.1µs 264.2µs - -
256 555.4µs 254.1µs 759.9µs - -
384 1.7ms 841.4µs 1.2ms - -
512 4.5ms 1.6ms 3.5ms - -
640 7.8ms 2.8ms 3.4ms - -
768 13.1ms 4.5ms 7.3ms - -
896 20.9ms 6.5ms 7ms - -
1024 35.4ms 10ms 18.3ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 156ns 12.4µs 28µs - -
8 479ns 11.4µs 27.7µs - -
16 1.3µs 17µs 29.9µs - -
32 3.7µs 24.1µs 34.2µs - -
64 11.4µs 33µs 52.2µs - -
96 28.8µs 45.4µs 212.9µs - -
128 45.8µs 65.2µs 334.2µs - -
192 124.9µs 130.4µs 405.3µs - -
256 247µs 226µs 765µs - -
384 712.3µs 395.6µs 1.2ms - -
512 1.8ms 736.7µs 3.4ms - -
640 2.9ms 1.3ms 3.3ms - -
768 4.8ms 2ms 7.2ms - -
896 7.6ms 2.6ms 7.1ms - -
1024 13.5ms 3.9ms 18ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 46ns 46ns 121ns - -
8 146ns 145ns 393ns - -
16 631ns 625ns 1.3µs - -
32 2µs 2µs 5.1µs - -
64 7.3µs 7.3µs 187µs - -
96 18.3µs 18.3µs 1.4ms - -
128 32.2µs 32.4µs 1.3ms - -
192 88.2µs 100.1µs 6ms - -
256 186.8µs 188.5µs 4.8ms - -
384 520.7µs 465µs 8.8ms - -
512 1.3ms 935.6µs 10.4ms - -
640 2.2ms 1.6ms 12.7ms - -
768 3.8ms 2.2ms 20.5ms - -
896 5.7ms 3ms 24.7ms - -
1024 9.5ms 4.6ms 26.2ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 76ns 93ns 150ns - -
8 225ns 213ns 493ns - -
16 740ns 771ns 2.2µs - -
32 3µs 2.9µs 6.6µs - -
64 14.7µs 14.8µs 21.1µs - -
96 37.7µs 37.7µs 46.5µs - -
128 75.7µs 76.2µs 302.8µs - -
192 218.3µs 276.5µs 620.6µs - -
256 474.4µs 524µs 810.8µs - -
384 1.4ms 1.3ms 1.5ms - -
512 3.4ms 2.6ms 3ms - -
640 5.9ms 4.4ms 4.6ms - -
768 9.9ms 6.2ms 6.5ms - -
896 15.5ms 8.5ms 8.7ms - -
1024 24.8ms 12.8ms 11.3ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 119ns 132ns - - -
8 371ns 369ns - - -
16 1.4µs 1.4µs - - -
32 7.7µs 7.7µs - - -
64 53.4µs 53.4µs - - -
96 185.1µs 184.7µs - - -
128 460.1µs 459µs - - -
192 1.7ms 1.7ms - - -
256 4.3ms 4.3ms - - -
384 14.8ms 14ms - - -
512 38.2ms 25.8ms - - -
640 73.1ms 38ms - - -
768 128.8ms 56.5ms - - -
896 207.4ms 80.4ms - - -
1024 306.1ms 119.3ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 106ns 106ns 658ns - -
8 295ns 296ns 1.6µs - -
16 1µs 1µs 4.7µs - -
32 5µs 5µs 18µs - -
64 31.9µs 31.8µs 75.6µs - -
96 79.7µs 79.7µs 526.7µs - -
128 158.8µs 157.9µs 4.7ms - -
192 451.3µs 450.1µs 49.9ms - -
256 963.8µs 1.1ms 72.8ms - -
384 2.9ms 2.9ms 148.4ms - -
512 6.6ms 5.4ms 291.2ms - -
640 12.4ms 8.9ms 337.5ms - -
768 20.8ms 13.7ms 413.4ms - -
896 31.6ms 20.7ms 546.6ms - -
1024 45.6ms 28.3ms 734.9ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 175ns 162ns - - -
8 422ns 436ns - - -
16 1.6µs 1.6µs - - -
32 8µs 7.9µs - - -
64 58.7µs 93.9µs - - -
96 169µs 213.4µs - - -
128 358.2µs 422.9µs - - -
192 1.2ms 1.2ms - - -
256 2.7ms 3ms - - -
384 8.8ms 8.5ms - - -
512 20.5ms 16.5ms - - -
640 39.9ms 27.4ms - - -
768 67.8ms 43.4ms - - -
896 107.9ms 59.4ms - - -
1024 159.2ms 77ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 715ns 12.4µs 406ns - -
8 1.6µs 16.7µs 991ns - -
16 4.2µs 25.4µs 4.1µs - -
32 13.6µs 53.5µs 12.8µs - -
64 51.3µs 96.7µs 56.2µs - -
96 130µs 169µs 450.3µs - -
128 240.2µs 260.5µs 3.3ms - -
192 659.7µs 589.3µs 6.1ms - -
256 1.4ms 1.1ms 7.2ms - -
384 4.1ms 2.4ms 15.8ms - -
512 9.8ms 4.5ms 21.3ms - -
640 17.3ms 7.5ms 28.9ms - -
768 28.7ms 10.8ms 40ms - -
896 44.9ms 15.3ms 52.6ms - -
1024 74.2ms 23.2ms 70.7ms - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.4µs 1.3µs 2.4µs - -
8 7.4µs 23.1µs 7.4µs - -
16 23.3µs 56.8µs 24.9µs - -
32 85.2µs 130.1µs 86.7µs - -
64 374µs 464.2µs 4.1ms - -
96 953µs 1.3ms 7ms - -
128 1.9ms 2.2ms 51.7ms - -
192 5.3ms 6.3ms 135.2ms - -
256 11.4ms 11ms 204.5ms - -
384 34.1ms 27.3ms 312.2ms - -
512 77.3ms 48.9ms 516ms - -
640 144.7ms 78.2ms 688.1ms - -
768 248.5ms 119.6ms 912.3ms - -
896 384.6ms 176.2ms 1.16s - -
1024 560.5ms 249.6ms 1.64s - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 43.6µs 43.7µs 578.8µs - -
8 165.7µs 180.3µs 1.5ms - -
16 471.3µs 573.1µs 7.8ms - -
32 1.4ms 1.5ms 15.2ms - -
64 4.6ms 4.5ms 58.3ms - -
96 9.3ms 8.2ms 62.8ms - -
128 16.1ms 12.8ms 103.5ms - -
192 34ms 24.8ms 237.3ms - -
256 60.9ms 41.3ms 383.9ms - -
384 134.4ms 83.4ms 663.6ms - -
512 248ms 142.1ms 1.05s - -
640 402.7ms 204.7ms 1.52s - -
768 616.6ms 298.7ms 1.81s - -
896 874.1ms 399.3ms 2.44s - -
1024 1.19s 532.8ms 2.87s - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 1.1µs 1µs 1.2µs - -
8 3µs 3.6µs 4.2µs - -
16 12µs 11.8µs 15.5µs - -
32 50.4µs 53.6µs 68µs - -
64 224.6µs 245.4µs 437.8µs - -
96 519.9µs 615.6µs 1.4ms - -
128 989.7µs 1.1ms 6ms - -
192 2.5ms 2.8ms 25.6ms - -
256 5.1ms 4.8ms 51.1ms - -
384 14.5ms 14.8ms 122.3ms - -
512 31.6ms 27.8ms 232.1ms - -
640 58.5ms 44.6ms 370.5ms - -
768 97ms 65.1ms 554ms - -
896 148.8ms 92.1ms 769.1ms - -
1024 218.1ms 122.4ms 1.09s - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 4.2µs 4.3µs 2.5µs - -
8 14.7µs 14µs 8.6µs - -
16 51.5µs 49µs 31.3µs - -
32 228.1µs 225.8µs 146.1µs - -
64 1ms 1ms 822.7µs - -
96 2.8ms 3ms 5.1ms - -
128 5.2ms 5.2ms 25.1ms - -
192 14.2ms 14.4ms 121.7ms - -
256 26.9ms 26.7ms 312.8ms - -
384 69.1ms 78.5ms 624.7ms - -
512 159.2ms 177.8ms 1.02s - -
640 256.7ms 259.4ms 1.17s - -
768 407.6ms 382.6ms 1.69s - -
896 588.6ms 531.4ms 2.03s - -
1024 1.03s 931.6ms 2.75s - -
f128
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 150ns 151ns - - -
8 1.3µs 16.2µs - - -
16 8.1µs 25.4µs - - -
32 60.4µs 40.6µs - - -
64 474.6µs 162.2µs - - -
96 1.6ms 519.4µs - - -
128 3.8ms 961.5µs - - -
192 12.8ms 2.6ms - - -
256 30.7ms 5.5ms - - -
384 103.2ms 16.5ms - - -
512 254.1ms 39.6ms - - -
640 481.4ms 70.9ms - - -
768 866.7ms 133.5ms - - -
896 1.34s 185ms - - -
1024 2.25s 311.7ms - - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 73ns 73ns - - -
8 670ns 669ns - - -
16 5.1µs 24.7µs - - -
32 37.8µs 179.7µs - - -
64 269.6µs 471.8µs - - -
96 892.4µs 568.2µs - - -
128 2.1ms 926.1µs - - -
192 6.9ms 2.3ms - - -
256 16ms 4.4ms - - -
384 53.9ms 12ms - - -
512 126.2ms 25.5ms - - -
640 243.8ms 47.3ms - - -
768 419.6ms 71.8ms - - -
896 662.1ms 118.7ms - - -
1024 1.03s 170.5ms - - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 271ns 16µs - - -
8 965ns 13.8µs - - -
16 4.2µs 35.5µs - - -
32 22µs 85.3µs - - -
64 132.6µs 330.6µs - - -
96 389µs 566.4µs - - -
128 862.3µs 1ms - - -
192 2.7ms 2ms - - -
256 6.1ms 3ms - - -
384 19.6ms 7.6ms - - -
512 46ms 13ms - - -
640 96.4ms 25.3ms - - -
768 155ms 35.2ms - - -
896 235.6ms 54ms - - -
1024 349.3ms 75.2ms - - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 289ns 286ns - - -
8 692ns 696ns - - -
16 2.4µs 2.4µs - - -
32 18.9µs 58.2µs - - -
64 128.9µs 393.8µs - - -
96 368.2µs 844µs - - -
128 858.1µs 1.6ms - - -
192 2.7ms 3.6ms - - -
256 6.1ms 5.2ms - - -
384 20.1ms 12.8ms - - -
512 45.8ms 19ms - - -
640 88.4ms 36.8ms - - -
768 147.9ms 50.1ms - - -
896 231.4ms 69.7ms - - -
1024 348.7ms 89.7ms - - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 214ns 186ns - - -
8 580ns 596ns - - -
16 2.4µs 2.5µs - - -
32 21.7µs 64.4µs - - -
64 173.7µs 369.3µs - - -
96 581.2µs 889.6µs - - -
128 1.3ms 1.5ms - - -
192 4.8ms 3.7ms - - -
256 10.9ms 6.4ms - - -
384 36.4ms 16.3ms - - -
512 85.5ms 33.5ms - - -
640 165.6ms 54ms - - -
768 284.5ms 80.2ms - - -
896 450.1ms 125ms - - -
1024 684.1ms 173.6ms - - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 310ns 314ns - - -
8 1.2µs 1.1µs - - -
16 5.4µs 5.5µs - - -
32 35.5µs 35.6µs - - -
64 255.5µs 256µs - - -
96 843.7µs 840.9µs - - -
128 2ms 2ms - - -
192 6.7ms 6.7ms - - -
256 16.2ms 16.2ms - - -
384 52.2ms 47.8ms - - -
512 125.9ms 76.2ms - - -
640 241.1ms 114.2ms - - -
768 417.1ms 166.8ms - - -
896 661.6ms 232.6ms - - -
1024 996.3ms 326.2ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 698ns 699ns - - -
8 2.3µs 2.3µs - - -
16 10.9µs 10.7µs - - -
32 56.8µs 56.6µs - - -
64 519.8µs 518.4µs - - -
96 1.5ms 1.5ms - - -
128 3.4ms 3.5ms - - -
192 10.7ms 10.8ms - - -
256 24.7ms 18.5ms - - -
384 80.8ms 52ms - - -
512 198.4ms 97.7ms - - -
640 368.2ms 146.3ms - - -
768 624ms 219.9ms - - -
896 980.1ms 318.4ms - - -
1024 1.46s 422.8ms - - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 947ns 939ns - - -
8 3.6µs 3.6µs - - -
16 16.8µs 16.8µs - - -
32 97µs 97.4µs - - -
64 684.6µs 714.8µs - - -
96 2ms 1.9ms - - -
128 4.4ms 4.2ms - - -
192 13.7ms 13.1ms - - -
256 30.7ms 25.1ms - - -
384 99.1ms 49.1ms - - -
512 231.4ms 90ms - - -
640 449.1ms 151.6ms - - -
768 764.5ms 230.5ms - - -
896 1.21s 332.2ms - - -
1024 1.79s 462ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.4µs 15.1µs - - -
8 4.5µs 36.1µs - - -
16 20.4µs 63.4µs - - -
32 110.3µs 196.8µs - - -
64 694.5µs 777.2µs - - -
96 2.1ms 1.7ms - - -
128 4.7ms 3.1ms - - -
192 15.1ms 7.8ms - - -
256 35ms 13.6ms - - -
384 117.1ms 35.4ms - - -
512 267.1ms 64.2ms - - -
640 524.5ms 120.4ms - - -
768 899.5ms 179.5ms - - -
896 1.4s 273.8ms - - -
1024 2.17s 405.6ms - - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 11.6µs 11.7µs - - -
8 60.4µs 96.7µs - - -
16 259.8µs 333.2µs - - -
32 1.1ms 1.1ms - - -
64 5.9ms 5.6ms - - -
96 16.4ms 12.4ms - - -
128 31.5ms 22.2ms - - -
192 89.6ms 51ms - - -
256 194.7ms 89.3ms - - -
384 581.4ms 243.3ms - - -
512 1.32s 464.9ms - - -
640 2.48s 758ms - - -
768 4.13s 1.16s - - -
896 6.45s 1.72s - - -
1024 9.51s 2.27s - - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 1.1ms 2.1ms - - -
8 3.6ms 4.6ms - - -
16 10.2ms 13.2ms - - -
32 32.3ms 33.1ms - - -
64 120.3ms 86.5ms - - -
96 253.1ms 142.4ms - - -
128 444.6ms 216.4ms - - -
192 996.1ms 399.4ms - - -
256 1.8s 645.6ms - - -
384 4.05s 1.24s - - -
512 7.37s 2.1s - - -
640 11.67s 3.04s - - -
768 17.43s 4.6s - - -
896 23.81s 6.02s - - -
1024 32.59s 7.85s - - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 5.4µs 5.8µs - - -
8 21.9µs 25.7µs - - -
16 102µs 99.5µs - - -
32 515.3µs 497.8µs - - -
64 3.1ms 3.1ms - - -
96 8.6ms 7.2ms - - -
128 17.5ms 13.1ms - - -
192 51.1ms 33.1ms - - -
256 108.3ms 65.1ms - - -
384 339.3ms 152.3ms - - -
512 745.4ms 283.9ms - - -
640 1.4s 457.2ms - - -
768 2.36s 702.8ms - - -
896 3.72s 1.03s - - -
1024 5.45s 1.37s - - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 16.2µs 14.7µs - - -
8 68.3µs 101.5µs - - -
16 349.1µs 388µs - - -
32 1.8ms 1.8ms - - -
64 11.4ms 12.8ms - - -
96 46.5ms 60.2ms - - -
128 92.9ms 104.8ms - - -
192 299.1ms 221.5ms - - -
256 593.9ms 418.8ms - - -
384 1.61s 978.8ms - - -
512 3.46s 2s - - -
640 6.26s 3.63s - - -
768 10.17s 5.24s - - -
896 15.31s 7.25s - - -
1024 22.59s 9.99s - - -
c32
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 61ns 61ns 97ns - -
8 317ns 319ns 208ns - -
16 2µs 2µs 659ns - -
32 14.1µs 14.1µs 4µs - -
64 107.4µs 107.4µs 837.1µs - -
96 356µs 123.6µs 1.1ms - -
128 834.9µs 285.1µs 1.3ms - -
192 2.8ms 675.1µs 1.5ms - -
256 6.6ms 1.4ms 2ms - -
384 22.3ms 4.6ms 3.1ms - -
512 52.6ms 10.6ms 6.7ms - -
640 103.6ms 19.3ms 9.9ms - -
768 178.3ms 32.5ms 16.9ms - -
896 282.8ms 50.6ms 23.1ms - -
1024 422.1ms 74.6ms 33.5ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 22ns 22ns 26.8µs - -
8 203ns 202ns 27.6µs - -
16 1.3µs 1.3µs 30.5µs - -
32 8.8µs 8.8µs 40.6µs - -
64 62.7µs 62.7µs 59.5µs - -
96 206µs 132µs 332.9µs - -
128 466.5µs 272µs 432.2µs - -
192 1.5ms 558.7µs 602.8µs - -
256 3.5ms 1.1ms 919.3µs - -
384 11.7ms 2.9ms 1.8ms - -
512 27.4ms 6.2ms 4.5ms - -
640 53.3ms 11.7ms 5ms - -
768 91.6ms 19.5ms 10.2ms - -
896 144.6ms 29.4ms 12ms - -
1024 215.7ms 43.6ms 25.1ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 186ns 15.9µs 27.5µs - -
8 534ns 12.7µs 27.6µs - -
16 1.8µs 16.5µs 30.5µs - -
32 7.1µs 27.2µs 40.7µs - -
64 35.2µs 56.5µs 59.6µs - -
96 101.2µs 146.3µs 85.4µs - -
128 208.8µs 251.4µs 454.6µs - -
192 636µs 576.1µs 597.7µs - -
256 1.4ms 1ms 1.1ms - -
384 4.4ms 1.8ms 1.8ms - -
512 10.1ms 3.1ms 4.5ms - -
640 19.3ms 5.3ms 5ms - -
768 32.8ms 7.9ms 10.2ms - -
896 51.7ms 11.2ms 12ms - -
1024 76.5ms 16.3ms 25ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 45ns 45ns 124ns - -
8 172ns 172ns 397ns - -
16 925ns 925ns 1.3µs - -
32 5.1µs 5.2µs 6.6µs - -
64 30.3µs 30.4µs 176.3µs - -
96 85.9µs 86µs 1.7ms - -
128 192.3µs 192.3µs 1.5ms - -
192 583.1µs 541.4µs 4ms - -
256 1.3ms 934.9µs 5ms - -
384 4.2ms 2.3ms 10ms - -
512 9.8ms 3.8ms 11.5ms - -
640 18.9ms 6.5ms 26.6ms - -
768 32.2ms 9.9ms 20.1ms - -
896 50.8ms 13.8ms 18.9ms - -
1024 76.1ms 19.1ms 28.1ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 105ns 91ns 176ns - -
8 266ns 273ns 566ns - -
16 1µs 1.1µs 2.3µs - -
32 6.5µs 6.6µs 7.8µs - -
64 45.5µs 45.4µs 30.9µs - -
96 145.9µs 145.5µs 74µs - -
128 327.9µs 326.5µs 360.5µs - -
192 1.1ms 863.8µs 756µs - -
256 2.4ms 1.7ms 1ms - -
384 8ms 4.2ms 2.6ms - -
512 18.7ms 8ms 4.7ms - -
640 36.3ms 13.9ms 7.6ms - -
768 62.6ms 20.9ms 10.8ms - -
896 98.6ms 29.8ms 14.7ms - -
1024 146.5ms 43ms 24ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 191ns 185ns - - -
8 573ns 516ns - - -
16 2.2µs 2.3µs - - -
32 12.4µs 12.3µs - - -
64 82.7µs 82.4µs - - -
96 262.9µs 262.7µs - - -
128 609.8µs 609.7µs - - -
192 2ms 2ms - - -
256 4.9ms 4.9ms - - -
384 16.4ms 15.6ms - - -
512 40.7ms 29.4ms - - -
640 77.3ms 41ms - - -
768 134.2ms 63.2ms - - -
896 215.1ms 99.3ms - - -
1024 316.8ms 136.3ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 164ns 164ns 879ns - -
8 487ns 487ns 2.1µs - -
16 2.5µs 2.5µs 6.9µs - -
32 16.1µs 16.1µs 29.1µs - -
64 132.5µs 132.5µs 1ms - -
96 377.1µs 377.3µs 5.2ms - -
128 815.1µs 815.2µs 9.7ms - -
192 2.5ms 2.5ms 47.7ms - -
256 5.7ms 4.7ms 68.9ms - -
384 18.2ms 11.3ms 132.7ms - -
512 41.9ms 21.9ms 176.6ms - -
640 78.9ms 34.2ms 266ms - -
768 134.3ms 52.8ms 350.6ms - -
896 210.9ms 77.2ms 453.1ms - -
1024 313.2ms 109.5ms 544.3ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
thread 'main' panicked at 'cast>SizeMismatch', /Users/leifeld/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bytemuck-1.13.1/src/internal.rs:32:3
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
leifeld@MacStudovonDirk faer-bench %
Implement basic operations (addition, subtraction, matrix multiplication, multiplication by a scalar) for Mat<T>. T should satisfy the trait ComplexField for those operations to be implementable. For the implementation of addition, subtraction, and scalar multiplication, Mat::with_dims could be used, with a closure that takes two elements and adds or subtracts them. For matrix multiplication, we could create a matrix full of zeros with Mat::zeros, and then fill it up with the result of the computation with faer_core::mul::matmul.
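The shape of those impls can be sketched with a toy row-major matrix type. This is not faer's API — the `Mat` struct, `with_dims`, `zeros`, and `at` below are stand-ins so the sketch is self-contained; the real versions would be generic over `T: ComplexField` and delegate multiplication to `mul::matmul`:

```rust
use std::ops::{Add, Mul};

// Toy row-major matrix standing in for faer's Mat<T>.
#[derive(Clone, Debug, PartialEq)]
struct Mat {
    nrows: usize,
    ncols: usize,
    data: Vec<f64>,
}

impl Mat {
    // Analogue of Mat::with_dims: build each element from a closure over (i, j).
    fn with_dims(nrows: usize, ncols: usize, f: impl Fn(usize, usize) -> f64) -> Self {
        let data = (0..nrows * ncols)
            .map(|idx| f(idx / ncols, idx % ncols))
            .collect();
        Mat { nrows, ncols, data }
    }
    fn zeros(nrows: usize, ncols: usize) -> Self {
        Self::with_dims(nrows, ncols, |_, _| 0.0)
    }
    fn at(&self, i: usize, j: usize) -> f64 {
        self.data[i * self.ncols + j]
    }
}

// Addition via an element-wise closure, as described above.
impl Add for &Mat {
    type Output = Mat;
    fn add(self, rhs: &Mat) -> Mat {
        assert_eq!((self.nrows, self.ncols), (rhs.nrows, rhs.ncols));
        Mat::with_dims(self.nrows, self.ncols, |i, j| self.at(i, j) + rhs.at(i, j))
    }
}

// Multiplication: zero-init then accumulate (the part matmul would do in faer).
impl Mul for &Mat {
    type Output = Mat;
    fn mul(self, rhs: &Mat) -> Mat {
        assert_eq!(self.ncols, rhs.nrows);
        let mut out = Mat::zeros(self.nrows, rhs.ncols);
        for i in 0..self.nrows {
            for k in 0..self.ncols {
                for j in 0..rhs.ncols {
                    out.data[i * rhs.ncols + j] += self.at(i, k) * rhs.at(k, j);
                }
            }
        }
        out
    }
}

fn main() {
    let a = Mat::with_dims(2, 2, |i, j| (i + j) as f64);
    let id = Mat::with_dims(2, 2, |i, j| if i == j { 1.0 } else { 0.0 });
    assert_eq!(&id * &a, a); // identity leaves a unchanged
    assert_eq!((&a + &a).at(1, 1), 4.0); // element-wise addition
}
```

Implementing the operators on references (`&Mat`) rather than on `Mat` directly avoids consuming the operands, which matters for a non-Copy matrix type.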