sarah-ek / faer-rs
Linear algebra foundation for the Rust programming language
Home Page: https://faer-rs.github.io
License: MIT License
Routines like (bi-)conjugate gradients, MINRES, GMRES, SYMMLQ, LOBPCG, etc. are not part of LAPACK but may fit in faer as well. Would you accept PRs for a faer-iterative crate for large/non-dense problems?
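Independent of faer's API, the kind of routine being proposed can be sketched in a few lines of plain Rust. This is a minimal unpreconditioned conjugate gradient over flat row-major storage; all names here are illustrative, not part of any faer crate:

```rust
// Dense matrix-vector product for an n x n row-major matrix.
fn matvec(a: &[f64], n: usize, x: &[f64]) -> Vec<f64> {
    (0..n)
        .map(|i| (0..n).map(|j| a[i * n + j] * x[j]).sum::<f64>())
        .collect()
}

fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

// Unpreconditioned conjugate gradient for an SPD system A x = b.
fn conjugate_gradient(a: &[f64], b: &[f64], n: usize, tol: f64, max_iter: usize) -> Vec<f64> {
    let mut x = vec![0.0; n];
    let mut r = b.to_vec(); // residual r = b - A * x, with x = 0 initially
    let mut p = r.clone();
    let mut rs_old = dot(&r, &r);
    for _ in 0..max_iter {
        if rs_old.sqrt() < tol {
            break;
        }
        let ap = matvec(a, n, &p);
        let alpha = rs_old / dot(&p, &ap);
        for i in 0..n {
            x[i] += alpha * p[i];
            r[i] -= alpha * ap[i];
        }
        let rs_new = dot(&r, &r);
        for i in 0..n {
            p[i] = r[i] + (rs_new / rs_old) * p[i];
        }
        rs_old = rs_new;
    }
    x
}
```

A real faer-iterative crate would of course operate on faer's matrix types and support preconditioners, but the control flow is essentially this. For the SPD system [[4, 1], [1, 3]] x = [1, 2], this converges to x = [1/11, 7/11].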
Hi!
I am really impressed by your colossal work on this math kernel!
I am writing a Julia wrapper to benchmark faer against OpenBLAS and MKL.
So far I have only studied the dense matrix-matrix multiplication. My preliminary results show faer approximately 50% slower than OpenBLAS and 25% slower than MKL on an AMD Ryzen 5 7640U on 8 threads.
This is basically my first Rust project and I want to be fair to faer: is this a reasonable dynamic library exposing faer's in-place matrix multiplication through the C ABI?
I am not sure if opening an issue is the right way to ask, but the faer documentation is currently very sparse on how to import external matrices.
use faer::modules::core::mul::matmul;
use faer::{mat, Parallelism};

// in-place: c = a * b
#[no_mangle]
pub unsafe extern "C" fn mult(
    c_ptr: *mut f64,
    c_nrows: u64,
    c_ncols: u64,
    c_row_stride: u64,
    c_col_stride: u64,
    a_ptr: *const f64,
    a_nrows: u64,
    a_ncols: u64,
    a_row_stride: u64,
    a_col_stride: u64,
    b_ptr: *const f64,
    b_nrows: u64,
    b_ncols: u64,
    b_row_stride: u64,
    b_col_stride: u64,
    nthreads: u32,
) {
    assert!(!c_ptr.is_null());
    assert!(!a_ptr.is_null());
    assert!(!b_ptr.is_null());
    let c = unsafe {
        mat::from_raw_parts_mut::<f64>(
            c_ptr,
            c_nrows as usize,
            c_ncols as usize,
            c_row_stride as isize,
            c_col_stride as isize,
        )
    };
    let a = unsafe {
        mat::from_raw_parts::<f64>(
            a_ptr,
            a_nrows as usize,
            a_ncols as usize,
            a_row_stride as isize,
            a_col_stride as isize,
        )
    };
    let b = unsafe {
        mat::from_raw_parts::<f64>(
            b_ptr,
            b_nrows as usize,
            b_ncols as usize,
            b_row_stride as isize,
            b_col_stride as isize,
        )
    };
    matmul(c, a, b, None, 1.0, Parallelism::Rayon(nthreads as usize));
}
Hello,
Just out of curiosity, are you planning on adding support for linear programming that could run on top of the algebraic operations provided by the project? Or do you think that such a thing would be better provided by third parties using faer-rs?
Thanks
The top line of each of the wiki's examples has using faer_core::mat;
or something like that. I think it should be something like use faer::core::mat;
and so forth.
Is your feature request related to a problem? Please describe.
I have an x: Mat<f64> and an n: f64, and I need to update x = n * x.
Describe the solution you'd like
I think it would be convenient to implement the MulAssign trait such that we can do x *= n and it reuses the memory allocated for x.
Describe alternatives you've considered
Maybe it would be good to have a generic map_with method that transforms all the elements of a matrix in place. It could be used to implement this MulAssign trait, and it would probably be useful for other things as well.
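For illustration, here is roughly what the requested impl looks like on a toy matrix type. The Matrix struct below is a hypothetical stand-in for faer's Mat, not its real definition:

```rust
use core::ops::MulAssign;

// Hypothetical toy matrix with flat storage, used only to illustrate the trait.
pub struct Matrix {
    pub data: Vec<f64>,
}

impl MulAssign<f64> for Matrix {
    // Scales every element in place, reusing the existing allocation.
    fn mul_assign(&mut self, n: f64) {
        for v in &mut self.data {
            *v *= n;
        }
    }
}
```

With this impl, `x *= 2.0` on a matrix holding [1.0, 2.0, 3.0] leaves it holding [2.0, 4.0, 6.0] without reallocating.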
Currently my pyO3 bindings that call some rust routines that use faer look something like:
use numpy::{IntoPyArray, PyArray2, PyReadonlyArray2};
use pyo3::{exceptions::PyRuntimeError, pymodule, types::PyModule, PyResult, Python};
use faer::prelude::*;
// ...
#[pymodule]
fn rust_lib<'py>(_py: Python<'py>, m: &'py PyModule) -> PyResult<()> {
    #[pyfn(m)]
    fn rpca<'py>(
        py: Python<'py>,
        a_py: PyReadonlyArray2<'py, f64>,
        n_rank: usize,
        n_iters: usize,
        n_oversamples: usize,
    ) -> &'py PyArray2<f64> {
        let a_ndarray = a_py.as_array();
        let a_faer = a_ndarray.view().into_faer();
        // ... faer-rs math
        ndarray_result.into_pyarray(py)
    }
}
I'm unsure if I am using the best approach.
// ...
#[pyfn(m)]
fn rpca<'py>(
    py: Python<'py>,
    a_py: PyReadonlyArray2<'py, f64>,
    n_rank: usize,
    n_iters: usize,
    n_oversamples: usize,
) -> &'py PyArray2<f64> {
    let a_faer = a_py.into_faer();
    // ... faer-rs math
    faer_result.into_pyarray(py)
}
I was skimming through the high level API PR, and saw the renaming of set_constant to fill_with_constant. It's really a minor thing, but how about naming this method just fill, as analogous to slice::fill? As for fill_with_zero, it does not have a counterpart in std (or I am not aware of one), but fill_zero sounds fine to me. This way, you can leave the fill_with name for the method taking a filling closure, as is also the convention in std (slice::fill_with).
Describe the bug
The Vec<c64>
output of eigenvalues()
for the adjacency matrix of the following
Note: the path graph on
To Reproduce
/* use the smaller unit test linked above
#[test]
fn this_other_tree_has_correct_maximum_eigenvalue() {
    let edges = [(3, 2), (6, 1), (7, 4), (7, 6), (8, 5), (9, 4), (11, 2), (12, 2), (13, 2), (15, 6), (16, 2), (16, 4), (17, 8), (18, 0), (18, 8), (18, 2), (19, 6), (19, 10), (19, 14)];
    let mut a = faer::Mat::zeros(20, 20);
    for (v, u) in edges.iter() {
        a[(*v, *u)] = 1.0;
        a[(*u, *v)] = 1.0;
    }
    let eigs_complex: Vec<faer::complex_native::c64> = a.eigenvalues();
    println!("{eigs_complex:?}");
    let eigs_real = eigs_complex.iter().map(|e| e.re).collect::<Vec<_>>();
    println!("{eigs_real:?}");
    let lambda_1 = *eigs_real
        .iter()
        .max_by(|a, b| a.partial_cmp(b).unwrap())
        .unwrap();
    let correct_lambda_1 = 2.6148611139728866;
    assert!(
        (lambda_1 - correct_lambda_1).abs() < 1e-10,
        "lambda_1 = {lambda_1}, correct_lambda_1 = {correct_lambda_1}",
    );
}
*/
Expected behavior
name: graphenv
# channels:
# - defaults
# dependencies:
# - python
# - networkx
# - matplotlib
# - imageio
# - msgpack-python
# - scipy
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
# Create the graph
edges = [[3, 2], [6, 1], [7, 4], [7, 6], [8, 5], [9, 4], [11, 2], [12, 2], [13, 2], [15, 6], [16, 2], [16, 4], [17, 8], [18, 0], [18, 8], [18, 2], [19, 6], [19, 10], [19, 14]]
graph = nx.Graph()
graph.add_edges_from(edges)
# Get the adjacency matrix eigenvalues
adjacency_matrix = nx.adjacency_matrix(graph).todense()
eigenvalues = np.linalg.eigvals(adjacency_matrix)
# Print the eigenvalues in decreasing order
print("Eigenvalues in decreasing order:")
for eigenvalue in sorted(eigenvalues, reverse=True):
print(eigenvalue)
# Plot the graph
nx.draw(graph, with_labels=True)
plt.show()
Sorry I don't have time currently to try and track and fix the issue in more detail:
Running cargo t
in faer-core
fails non-deterministically with the error:
failures:
---- mul::tests::triangular stdout ----
thread 'mul::tests::triangular' panicked at 'attempt to subtract with overflow', /home/daniel/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-0.12.0/src/gemm.rs:960:35
failures:
mul::tests::triangular
test result: FAILED. 12 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.07s
Is your feature request related to a problem? Please describe.
Normal math operators like +, -, *, == etc. can't be used on sparse matrices. There are currently two bounds that can't be satisfied:
1. the operand types must impl GenericMatrix, which they don't.
2. the operand types must implement MatMul, MatAdd etc. Right now MatMul is implemented for sparse types, but not any of the other operations.
Describe the solution you'd like
Right now, to multiply a sparse matrix with another matrix, you must manually invoke the MatMul implementation like so:
<SparseRowMat<_> as MatMul<DenseCol>>::mat_mul(lhs, rhs.as_ref());
And there's no way to add or subtract sparse matrices. It would be nice to be able to do something like lhs * &rhs and lhs + &rhs like normal. It would also be nice to have Debug and Index implementations for sparse matrices.
Is it possible to make the rayon dependency optional? If so, would you consider accepting a PR that does this?
implement addition for Mat<T>
The traits core::ops::Add<Mat<T>> and core::ops::Add<&Mat<T>> should be implemented for Mat<T> and &Mat<T>. T should satisfy the trait ComplexField for those operations to be implementable. Mat::with_dims could be used, with a closure that takes two elements and adds them.
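As a sketch of the shape of such an impl, here it is on a hypothetical toy matrix type (not faer's actual Mat, which would be generic over ComplexField as described above):

```rust
use core::ops::Add;

// Hypothetical dense matrix with row-major flat storage, for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub struct Matrix {
    pub nrows: usize,
    pub ncols: usize,
    pub data: Vec<f64>,
}

impl Add<&Matrix> for &Matrix {
    type Output = Matrix;

    // Element-wise addition; panics on dimension mismatch.
    fn add(self, rhs: &Matrix) -> Matrix {
        assert_eq!((self.nrows, self.ncols), (rhs.nrows, rhs.ncols));
        Matrix {
            nrows: self.nrows,
            ncols: self.ncols,
            data: self.data.iter().zip(&rhs.data).map(|(x, y)| x + y).collect(),
        }
    }
}
```

Implementing the trait on &Matrix (rather than only on the owned type) is what lets callers write `&a + &b` without consuming either operand.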
Although the IntoNdarray trait itself depends on the ndarray feature, all of the implementations depend on the nalgebra feature.
The current iteration of faer::polars::polars_to_faer_fxx requires that a slice &[&str] of column names be passed to the function. While there is nothing wrong with this in an absolute sense, it does make for a somewhat awkward workflow when working with polars.
A pattern more aligned with how polars functions is to remove the &[&str] item from the function signature and have the function create a Mat<E> from the entire frame. The function should also error if columns containing invalid data types (such as strings) are passed in.
I saw square matrix sizes up to 1k in the benchmark section. Is it in scope of the project to maintain and optimize larger (50-200k) matrix sizes for multiplication (or other operations)?
Is your feature request related to a problem? Please describe.
It makes sense to have the macros col! and row! in order to be consistent with mat!. This also seems a good opportunity to include a related macro (or function) blockwise!:
blockwise![[A, B], [C, D]];
// ┌───────┬──┐
// │ │ │
// │ A │ B│
// │ │ │
// ├───────┼──┤
// │ C │ D│
// └───────┴──┘
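To make the intended semantics concrete, here is a plain-Rust sketch of what such blockwise assembly does, with row-major flat matrices given as (data, nrows, ncols) tuples. This is purely illustrative; a real macro would operate on faer's matrix types:

```rust
// A block is (data, nrows, ncols), row-major.
type Block<'a> = (&'a [f64], usize, usize);

// Copies a block into the output matrix at offset (r0, c0).
fn copy_block(out: &mut [f64], out_ncols: usize, blk: Block<'_>, r0: usize, c0: usize) {
    let (data, nrows, ncols) = blk;
    for i in 0..nrows {
        for j in 0..ncols {
            out[(r0 + i) * out_ncols + (c0 + j)] = data[i * ncols + j];
        }
    }
}

// Assembles [[A, B], [C, D]] into one matrix, checking block dimensions.
fn blockwise(a: Block<'_>, b: Block<'_>, c: Block<'_>, d: Block<'_>) -> (Vec<f64>, usize, usize) {
    assert_eq!(a.1, b.1); // top blocks: same row count
    assert_eq!(c.1, d.1); // bottom blocks: same row count
    assert_eq!(a.2, c.2); // left blocks: same column count
    assert_eq!(b.2, d.2); // right blocks: same column count
    let (nrows, ncols) = (a.1 + c.1, a.2 + b.2);
    let mut out = vec![0.0; nrows * ncols];
    copy_block(&mut out, ncols, a, 0, 0);
    copy_block(&mut out, ncols, b, 0, a.2);
    copy_block(&mut out, ncols, c, a.1, 0);
    copy_block(&mut out, ncols, d, a.1, a.2);
    (out, nrows, ncols)
}
```

For 1x1 blocks A = [1], B = [2], C = [3], D = [4], this produces the 2x2 matrix [1, 2, 3, 4] in row-major order.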
You do not support no_std, right?
I'm frequently using matrix-diagonal and diagonal-diagonal multiplication (the latter being a component-wise vector multiply). Since populating a full matrix with a diagonal wastes performance, I'm using a modified copy of div_by_s that I found in the faer source.
Does it make sense to flesh out a Diagonal type?
Is your feature request related to a problem? Please describe.
Another maintenance task. Unlike #10, which is geared towards developers knowing what low-level changes happened, this one is geared towards user-facing changes, so the average user knows what changed between releases without much hassle on the maintainers' side.
Describe the solution you'd like
This is one I have to research a bit more. I forgot the means to do this but I know it's possible and seemed rather useful.
You can assign this one to me
In the LAPACK Cholesky factorization routines (e.g. DPOTRF) when it encounters an error it returns the details about where it failed (INFO > 0). Is it possible to add something similar to the faer Cholesky routines?
Is your feature request related to a problem? Please describe.
Especially in debugging, it is useful to be able to inspect the matrix in a non-terminal environment. The CSV format is easily implemented and read from, but has not yet been implemented.
Describe the solution you'd like
Methods for writing and reading matrices, either as implementations for a given type, or as generic functions.
Is your feature request related to a problem? Please describe.
Matrix exponentiation utilizing the standard double-precision algorithm by Al-Mohy and Higham.
Describe the solution you'd like
A simple function that takes in a reference or matrix view (of some sort; not familiar with this library just yet) and yields a Result<_, _> containing the exponentiated matrix if it worked and an error if not.
Describe alternatives you've considered
n/a
Additional context
I have implemented this function for ndarray-linalg, but the maintainers have taken months and still no word. I am using my implementation for research purposes; using ndarray I have an error one-norm of about 20x scipy at 200x200 dense matrices, and an error of machine precision for 1-sparse matrices up to 1024x1024. It might take some time, but I'd be willing to port it over to this library to help move it along. I would like to hear how this should best be done.
implement subtraction for Mat<T>
The traits core::ops::Sub<Mat<T>> and core::ops::Sub<&Mat<T>> should be implemented for Mat<T> and &Mat<T>. T should satisfy the trait ComplexField for those operations to be implementable. Mat::with_dims could be used, with a closure that takes two elements and subtracts them.
Is your feature request related to a problem? Please describe.
It would be great to have an equivalent to DSYRK in faer, to do the operations
C = alpha * A * A^T + beta * C
and
C = alpha * A^T * A + beta * C
This is useful for example when accumulating normal equations.
Describe the solution you'd like
A symmetric rank k routine.
Describe alternatives you've considered
The operation can be done with GEMM, but is significantly slower.
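For reference, the requested operation looks as follows as a naive triple loop over row-major flat storage. A real syrk would exploit symmetry and compute only one triangle, which is where the speedup over GEMM comes from; this sketch is illustrative only, not faer's API:

```rust
// Naive rank-k update: C = alpha * A * A^T + beta * C, with A of size n x k
// and C of size n x n, both row-major.
fn syrk(alpha: f64, a: &[f64], n: usize, k: usize, beta: f64, c: &mut [f64]) {
    for i in 0..n {
        for j in 0..n {
            // (A * A^T)[i, j] = sum over p of A[i, p] * A[j, p]
            let mut acc = 0.0;
            for p in 0..k {
                acc += a[i * k + p] * a[j * k + p];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

For A = [[1, 2], [3, 4]] with alpha = 1 and beta = 0, this fills C with A * A^T = [[5, 11], [11, 25]].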
Describe the bug
Running the faer-bench benchmark on 2xAMD EPYC 7V13 64-Core Processor
is surprisingly slow. Both in absolute numbers, as well as in comparing the speedup of faer(par) over faer(seq).
This does not seem to be the case on other, larger AMD server CPUs.
To Reproduce
Just to keep track for myself:
CXXFLAGS="-I/u/drehwald/prog" CXX=g++ cargo +nightly run --release --no-default-features --features faer
Expected behavior
Well, don't be slow.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
➜ ~ g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
➜ ~ cargo +nightly --version
cargo 1.71.0-nightly (d0a4cbcee 2023-04-16)
Additional context
Our admin just got back to me: university admins probably won't adjust the perf settings for us, since the machine is too busy and it would be a performance risk. But I got access to two other AMD machines; maybe we can use those for pinning the issue down.
implement the qr decomposition with and without column pivoting, along with the solve/reconstruct/inverse/least_squares functions to complete the decomposition api
the qr module is also needed for the future svd module implementation, since it could be used as a preconditioner.
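As a reference for what the decomposition computes, here is a modified Gram-Schmidt QR on a square row-major matrix in plain Rust. This is illustrative only; a production implementation like the one proposed would use blocked Householder reflections, with or without column pivoting:

```rust
// Modified Gram-Schmidt QR factorization of a square row-major matrix,
// returning (Q, R) with A = Q * R, Q orthonormal, R upper triangular.
fn qr_mgs(a: &[f64], n: usize) -> (Vec<f64>, Vec<f64>) {
    let mut q = a.to_vec(); // columns of q are orthonormalized in place
    let mut r = vec![0.0; n * n];
    for k in 0..n {
        // normalize column k
        let norm = (0..n).map(|i| q[i * n + k] * q[i * n + k]).sum::<f64>().sqrt();
        r[k * n + k] = norm;
        for i in 0..n {
            q[i * n + k] /= norm;
        }
        // remove the component along column k from the remaining columns
        for j in (k + 1)..n {
            let proj = (0..n).map(|i| q[i * n + k] * q[i * n + j]).sum::<f64>();
            r[k * n + j] = proj;
            for i in 0..n {
                q[i * n + j] -= proj * q[i * n + k];
            }
        }
    }
    (q, r)
}
```

For A = [[1, 1], [0, 1]] this returns Q = I and R = A; solve and least_squares then reduce to a triangular solve against R.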
Running cargo bench faer-mt-cplx-llt-512
leads to a segfault on my machine (the corresponding single threaded bench seems to run fine). I get the following traceback:
#0 0x000055c4371ca569 in core::ptr::write<[num_complex::Complex<f64>; 3]> () at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ptr/mod.rs:1354
#1 core::ptr::mut_ptr::{impl#0}::write<[num_complex::Complex<f64>; 3]> () at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ptr/mut_ptr.rs:1449
#2 gemm_common::pack_operands::pack_generic_inner_loop<num_complex::Complex<f64>, 1, 3> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:163
#3 gemm_common::pack_operands::pack_generic<num_complex::Complex<f64>, 1, 3> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:256
#4 gemm_common::pack_operands::pack_rhs::{closure#0}<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma> () at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:301
#5 gemm_common::simd::x86::{impl#2}::vectorize<gemm_common::pack_operands::pack_rhs::{closure_env#0}<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma>> (f=...)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/simd.rs:50
#6 0x000055c4371c74ef in gemm_common::pack_operands::pack_rhs<num_complex::Complex<f64>, 1, 3, gemm_common::simd::x86::Fma> (n=<optimized out>, k=8192, dst=..., src=..., src_cs=48, src_rs=255, dst_stride=768)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/pack_operands.rs:299
#7 0x000055c4371c4992 in gemm_common::gemm::gemm_basic_generic::{closure#3}<gemm_common::simd::x86::Fma, num_complex::Complex<f64>, 2, 6, 3, 3, gemm_c64::gemm::f64::fma_cplx::gemm_basic_cplx::{closure_env#0}> (
tid=<optimized out>) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/gemm-common-0.15.2/src/gemm.rs:395
#8 0x000055c4371f88ea in core::ops::function::impls::{impl#0}::call<(usize), (dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (args=..., self=<optimized out>)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:263
#9 core::ops::function::impls::{impl#1}::call_mut<(usize), &(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (args=..., self=<optimized out>)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/ops/function.rs:274
#10 core::iter::traits::iterator::Iterator::for_each::call::{closure#0}<usize, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (item=8192)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:849
#11 core::iter::traits::iterator::Iterator::fold<core::ops::range::Range<usize>, (), core::iter::traits::iterator::Iterator::for_each::call::{closure_env#0}<usize, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::
Send + core::marker::Sync)>> (self=..., f=..., init=<optimized out>) at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:2477
#12 core::iter::traits::iterator::Iterator::for_each<core::ops::range::Range<usize>, &&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)> (self=..., f=0x7fe657ffd740)
at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/iter/traits/iterator.rs:852
#13 rayon::iter::for_each::{impl#1}::consume_iter<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync), usize, core::ops::range::Range<usize>> (self=..., iter=...)
at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/for_each.rs:55
#14 rayon::iter::plumbing::Producer::fold_with<rayon::range::IterProducer<usize>, rayon::iter::for_each::ForEachConsumer<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)>> (self=...,
folder=...) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/plumbing/mod.rs:110
#15 rayon::iter::plumbing::bridge_producer_consumer::helper<rayon::range::IterProducer<usize>, rayon::iter::for_each::ForEachConsumer<&(dyn core::ops::function::Fn<(usize), Output=()> + core::marker::Send + core::marker::Sync)>> (
len=<optimized out>, migrated=<optimized out>, splitter=..., producer=..., consumer=...) at /home/adr/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-1.7.0/src/iter/plumbing/mod.rs:438
OS: arch linux
CPU: AMD Threadripper 1920X
version: current main branch (88dd024)
Will this repository include topics such as Geometric Algebra, e.g. G^n? This would be something I would be interested in helping contribute to.
Wondering if faer is using a SIMD API. If not, maybe SIMD could make faer faster.
Is your feature request related to a problem? Please describe.
This is more of a maintenance task to make it obvious and easy to see what changes have happened over the life of the project.
Describe the solution you'd like
I'm thinking of using an action to generate the Changelog on releases.
Describe alternatives you've considered
Considered GitHub Actions and some other functions for doing it. Decided to stick with native GitHub tooling.
Feel free to assign to me, as I think it would be fun to do.
Is your feature request related to a problem? Please describe.
At present there are tests, but no automated tests are run via CI to ensure bugs aren't introduced as changes are made.
Describe the solution you'd like
A GitHub Action to run the test and output our current test coverage.
Describe alternatives you've considered
None; it seemed natural to use GitHub Actions since we're on GitHub. I believe Travis CI is a viable alternative though.
I have just seen your crate and ran a quick benchmark against intel mkl.
I got the following result
test bench_faer_rs_n ... bench: 33,479,057 ns/iter (+/- 9,121,197)
test bench_faer_rs_t ... bench: 32,854,984 ns/iter (+/- 7,885,723)
test bench_mkl_n ... bench: 35,916,818 ns/iter (+/- 9,305,052)
test bench_mkl_t ... bench: 35,681,294 ns/iter (+/- 10,449,638)
I tried it several times just to make sure it was not a lucky result, or a bug. Having on-par performance with the Intel MKL library is amazing!
Maybe you can confirm yourself and put a comparison with Intel MKL in the documentation, since it is one of the fastest BLAS implementations, if not the fastest.
Is your feature request related to a problem? Please describe.
I have an x: Mat<f64> and a y: MatRef<'_, f64>, and I need to update x = x - y.
Describe the solution you'd like
I think it would be convenient to have an impl SubAssign<Rhs = MatRef> for Mat such that it reuses the memory allocated for x.
Describe alternatives you've considered
I am currently solving my issue with Mat::with_dims to create a new matrix. Implementing these traits could be a bit more performant, and it would definitely be more readable.
Describe the bug
Calling Mat::eigenvalues() may lead to a panic from a failing assertion. To reproduce this, generate a 4096x4096 matrix with random f64 elements in the range 0 to 1 and call the eigenvalues function (see code below).
This will result in the following error:
thread 'main' panicked at /home/sthagel/.cargo/registry/src/index.crates.io-6f17d22bba15001f/faer-0.18.1/src/linalg/evd/hessenberg.rs:816:26:
assertion `left == right` failed: iterators must have the same length
left: 31
right: 32
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
To Reproduce
use faer::mat::Mat;
use num::complex::Complex64;
use rand::prelude::*;

fn main() {
    let n = 4096;
    let n_sq = n * n;
    let mat = (0..n_sq)
        .map(|_| {
            let mut rng = rand::thread_rng();
            rng.gen()
        })
        .collect::<Vec<f64>>();
    let results = eigenvals_faer(&mat, n - 2, n);
    println!("{:#?}", results)
}

#[inline]
pub fn eigenvals_faer(mat: &[f64], nev: usize, n: usize) -> Result<Vec<Complex64>, &str> {
    if nev * n == 0 || nev > n - 2 {
        return Err("Invalid parameters!");
    }
    let faer_mat: Mat<f64> = Mat::from_fn(n, n, |i, j| mat[i + n * j]);
    let evs = faer_mat.eigenvalues();
    Ok(evs)
}
Expected behavior
One would expect the calculation to produce the desired eigenvalues.
Desktop (please complete the following information):
Additional info
On smaller matrices (e.g. 40x40 instead of 4096x4096) it works just fine.
First of all, thanks a lot for your work in faer
, it is impressive. It has also been a lot of fun to play around with the library.
Describe the bug
The results of solving a lower triangular system are unexpected.
To Reproduce
The following code reproduces the error. It also shows how the equivalent code with nalgebra
gets the correct solution:
# Cargo.toml
[package]
name = "test"
version = "0.1.0"
edition = "2021"
[dependencies]
faer-core = "0.9.1"
nalgebra = "0.32.2"
// main.rs
const RESPONSE: [f64; 3] = [-18.0, -60.0, -80.0];

// Toeplitz and lower triangular
fn r_element(i: usize, j: usize) -> f64 {
    if i >= j {
        let diff = i - j;
        if diff < RESPONSE.len() {
            RESPONSE[diff]
        } else {
            0.0
        }
    } else {
        0.0
    }
}

fn r_matrix_faer(n: usize) -> faer_core::Mat<f64> {
    faer_core::Mat::with_dims(n, n, |i, j| r_element(i, j))
}

fn r_matrix_nalgebra(n: usize) -> nalgebra::DMatrix<f64> {
    nalgebra::DMatrix::from_fn(n, n, |i, j| r_element(i, j))
}

// Solve for X in Y = R * X
fn solve_x_faer(r: &faer_core::Mat<f64>, y: &mut faer_core::Mat<f64>) {
    faer_core::solve::solve_lower_triangular_in_place_with_conj(
        r.as_ref(),
        faer_core::Conj::No,
        y.as_mut(),
        faer_core::Parallelism::None,
    );
}

// Solve for X in Y = R * X
fn solve_x_nalgebra(r: &nalgebra::DMatrix<f64>, y: &mut nalgebra::DMatrix<f64>) {
    r.solve_lower_triangular_mut(y);
}

fn main() {
    let (i, j) = (400, 1);
    let r = r_matrix_faer(i);
    let mut x = faer_core::Mat::zeros(i, j);
    x.write(0, 0, 150.0);
    let mut y = r.clone() * x;
    solve_x_faer(&r, &mut y); // The values in `y` blow up

    // Now, the same thing with nalgebra
    let r = r_matrix_nalgebra(i);
    let mut x = nalgebra::DMatrix::zeros(i, j);
    x[(0, 0)] = 150.0;
    let mut y = r.clone() * x;
    solve_x_nalgebra(&r, &mut y); // The values in `y` are correct
}
Expected behavior
I would expect the same results as nalgebra.
Specifying or documenting the MSRV somewhere would be helpful for consumers of faer.
This was discussed on discord.
Raising issue as a bookmark for making an initial contribution as suggested by @sarah-ek.
When this matrix
[
[0.0 + 0.0 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I],
[0.0 + 0.0 * I, 0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I],
[0.0 + 0.0 * I, 2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I],
[2.220446049250313e-16 + -1.0000000000000002 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I, 0.0 + 0.0 * I],
]
as a Mat<c64> is passed to eigenvalues or complex_eigenvalues, it throws an error.
numpy does compute eigenvalues:
ar = np.array([
[0.0 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j],
[0.0 + 0.0j, 0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j],
[0.0 + 0.0j, 2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j, 0.0 + 0.0j],
[2.220446049250313e-16 + -1.0000000000000002j, 0.0 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j],
])
In [1]: np.linalg.eigvals(ar)
Out[1]:
array([ 1.11022302e-16-1.j, -2.22044605e-16+1.j, 1.11022302e-16-1.j,
-2.22044605e-16+1.j])
The error messages are
Assertion failed at /home/lapeyre/.cargo/registry/src/index.crates.io-6f17d22bba15001f/faer-core-0.13.0/src/lib.rs:2093:13:
assert!( row < this.nrows() )
with expansion:
4 < 4
Several other matrices do give the same eigenvalues that numpy does.
Version: 0.13.1
Arch Linux
It would be great if there were, for any function, a low-level API that exposes the needed workspace, so the function itself performs no allocations.
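The requested pattern, in miniature: the caller provides the output/workspace storage, so repeated calls allocate nothing. Plain Rust and illustrative only; faer's actual low-level routines have their own workspace conventions:

```rust
// Matrix-vector product A * x written into caller-provided storage.
// `a` is n x n row-major; `out` must have length n. No allocation occurs here.
fn matvec_into(a: &[f64], n: usize, x: &[f64], out: &mut [f64]) {
    assert_eq!(out.len(), n);
    for i in 0..n {
        out[i] = (0..n).map(|j| a[i * n + j] * x[j]).sum::<f64>();
    }
}
```

A caller in a hot loop allocates the workspace once up front, e.g. `let mut workspace = vec![0.0; n];`, and then calls `matvec_into` repeatedly with the same buffer.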
The work is impressive. Being competitive with the big guys is nothing short of amazing considering this is a single person show.
Apache Arrow is an aligned chunked array representation of columns. If my matrix is a dense matrix encoded as a chunked array of float64 numbers is it possible to use faer-rs on top of it in a zero-copy way? I assume further constraints (chunksize has to be a multiple of some batch size) will be required. Or will we have to combine the (relatively large) chunks into a continuous aligned array first before feeding it to faer?
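One way to think about it, sketched in plain Rust with Vec<Vec<f64>> standing in for Arrow's chunked buffers (illustrative only): a zero-copy view needs one contiguous, uniformly strided buffer, so a single-chunk column can be borrowed directly, while a multi-chunk column has to be copied into one allocation first.

```rust
use std::borrow::Cow;

// Returns the column data as one contiguous slice: borrowed when it already
// is contiguous (single chunk), copied into a fresh allocation otherwise.
fn as_contiguous(chunks: &[Vec<f64>]) -> Cow<'_, [f64]> {
    if chunks.len() == 1 {
        Cow::Borrowed(&chunks[0])
    } else {
        let mut out = Vec::with_capacity(chunks.iter().map(Vec::len).sum());
        for chunk in chunks {
            out.extend_from_slice(chunk);
        }
        Cow::Owned(out)
    }
}
```

The contiguous slice could then back a column-major matrix view; whether the copy is needed depends entirely on how the chunks were produced, as the question suggests.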
Is your feature request related to a problem? Please describe.
Hi, I ran the benchmarks yesterday on my machine (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz) and did a quick experiment with PGO. The wallclock results are even more impressive and are eventually worth adding to the benchmark section.
Describe the solution you'd like
Make users aware of PGO, link the relevant section in the Rust documentation, and add some additional benchmark estimates.
Additional context
Two examples with least and most improvements. I have seen a consistent improvement for all benchmarks, especially for the parallelized version.
[Benchmark tables comparing timings with PGO vs. without PGO were attached here.]
Is your feature request related to a problem? Please describe.
Hey, this might be an embarrassingly dumb question, but I cannot see a way to add two matrices using + (or subtract using -). The official website does mention that the addition and subtraction operators should work as expected, but when I write a + b for e.g. two Mat<f64> instances, it won't compile. Looking at the faer_core crate I cannot find an implementation of core::ops::Add. What am I missing?
Describe the solution you'd like
It would be great if the + operator would just work for two suitable Mat<E> instances, as well as for one Mat<E> and a suitable MatRef instance. I can elaborate on what I mean by suitable, but roughly: if the entries can be added, then the matrices should be addable (given that the matrix dimensions match).
Describe alternatives you've considered
I could of course do element-wise additions manually, but that would feel so much less ergonomic than having an Add trait implemented. Same for AddAssign.
Additional context
If I missed an obvious way to add matrices, I'd really appreciate a hint. If the Add, AddAssign, Sub, and SubAssign traits are indeed not implemented, I'd be very happy to file a PR because this crate is really awesome. Please let me know and thanks for your work :)
Hi @sarah-ek,
Are there any plans to add high-level APIs for computing eigenvectors and eigenvalues of square, but not necessarily symmetric, matrices?
I really like the ergonomics of your library but it would be nice to just have a functional, simple API for providing matrices and getting out eigenvalues and their associated eigenvectors without having to hand roll some form of iterative/inverse iteration method in each new crate.
Is your feature request related to a problem? Please describe.
Some problems need a Sparse Cholesky solver, such as GraphSlam
Describe the solution you'd like
Implement a Sparse Cholesky Solver
Describe alternatives you've considered
In Rust, I used Russel which wraps SuiteSparse
Describe the bug
For many graph adjacency matrices with small eigenvalues, faer crashes on an i32 overflow when computing eigenvalues as a self-adjoint matrix. The precise line is iter += 1;.
My current workaround to avoid selfadjoint_eigenvalues() is
let eigs: Vec<faer::complex_native::c64> = a.eigenvalues();
To Reproduce
let mat: [[f64; 20]; 20] = [[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]];
let mut matt = faer::Mat::zeros(20, 20);
for i in 0..20 {
    for j in 0..20 {
        matt[(i, j)] = mat[i][j];
    }
}
let eigs = matt.selfadjoint_eigenvalues(faer::Side::Lower);
Expected behavior
Expected eigs
to return a vector of
For example,
# `bad_mat.py`
# run with `python bad_mat.py`
import numpy as np
mat = [[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]]
mat = np.array(mat)
eigs = np.linalg.eig(mat)[0]
print(eigs)
real_parts = [np.real(_) for _ in eigs]
print(real_parts)
''' output:
[ 3.69390168e+00+0.00000000e+00j -3.23272209e+00+0.00000000e+00j
-2.52597629e+00+0.00000000e+00j 2.29766159e+00+0.00000000e+00j
2.19984492e+00+0.00000000e+00j -2.25650189e+00+0.00000000e+00j
-1.98531299e+00+0.00000000e+00j -1.48123145e+00+0.00000000e+00j
-1.10237216e+00+0.00000000e+00j 1.47508155e+00+0.00000000e+00j
-6.62467970e-01+0.00000000e+00j -5.32411898e-01+0.00000000e+00j
-2.28816715e-01+0.00000000e+00j 4.97954067e-01+0.00000000e+00j
7.40821370e-01+0.00000000e+00j 1.10254826e+00+0.00000000e+00j
1.00000000e+00+0.00000000e+00j 1.00000000e+00+0.00000000e+00j
-1.88640048e-17+9.57840098e-17j -1.88640048e-17-9.57840098e-17j]
[3.6939016833383165, -3.2327220859059014, -2.52597628501941, 2.2976615943259593, 2.199844923144114, -2.2565018926765257, -1.9853129913610488, -1.4812314534620308, -1.102372157787216, 1.4750815501253782, -0.6624679696406541, -0.5324118977580056, -0.22881671515516241, 0.49795406722182745, 0.7408213702790264, 1.1025482603313292, 1.0000000000000004, 1.0000000000000004, -1.886400483817679e-17, -1.886400483817679e-17]
'''
Additional context
Feel free to suggest a different method to use!
Since these matrices have a few very small eigenvalues, perhaps it is appropriate to tweak the high-level function to instead calculate a good guess for consider_zero_threshold in the higher-level API. In this instance, the value equals 2.2250738585072014e-308.
pub fn compute_tridiag_real_evd_qr_algorithm<E: RealField>(
    diag: &mut [E],
    offdiag: &mut [E],
    u: Option<MatMut<'_, E>>,
    epsilon: E,
    consider_zero_threshold: E,
) {
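As an illustration of how such a guess could be computed, here is a minimal sketch (the function name `guess_zero_threshold` and the specific heuristic — scaling machine epsilon by the problem size and the largest entry magnitude — are assumptions, not faer's actual behavior):

```rust
// Hypothetical sketch: derive a scale-aware `consider_zero_threshold`
// guess from the tridiagonal data instead of f64::MIN_POSITIVE
// (2.2250738585072014e-308), which treats almost nothing as zero.
fn guess_zero_threshold(diag: &[f64], offdiag: &[f64]) -> f64 {
    // The largest magnitude among the entries sets the scale of the matrix.
    let scale = diag
        .iter()
        .chain(offdiag.iter())
        .fold(0.0_f64, |acc, &x| acc.max(x.abs()));
    // Scale machine epsilon by the problem size and magnitude, so
    // eigenvalues below roughly n * eps * ||A|| are considered zero.
    let n = diag.len() as f64;
    (n * f64::EPSILON * scale).max(f64::MIN_POSITIVE)
}

fn main() {
    let diag = [2.0, -1.0, 3.0];
    let offdiag = [0.5, 0.25];
    let t = guess_zero_threshold(&diag, &offdiag);
    // Tiny relative to the entries, but far above f64::MIN_POSITIVE.
    println!("{t:e}");
}
```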
Is your feature request related to a problem? Please describe.
Faer currently doesn't work with Polars 0.34. We should update it, make sure nothing breaks, and confirm the tests still pass.
Describe the solution you'd like
Update the dependency to polars = { version = "0.34", optional = true, features = ["lazy"] } and see if the existing functions still work.
Describe alternatives you've considered
NA
Additional context
NA
Thanks in advance
Is your feature request related to a problem? Please describe.
Currently none of the matrix multiplication crates support the half-precision (fp16) datatype. There is the half crate (https://crates.io/crates/half) that provides a Rust type for it.
This means that any crate that needs CPU matrix multiplication with the f16 datatype has to implement a matrix multiplication algorithm manually, which can be very slow.
Describe the solution you'd like
It'd be great if faer included configurable support for these data types.
Describe alternatives you've considered
I'm currently using this super naive matmul implementation:
fn naive_gemm<F: num_traits::Float + std::ops::AddAssign, M: Dim, K: Dim, N: Dim>(
    (m, k, n): (M, K, N),
    ap: *const F,
    a_strides: [usize; 2],
    bp: *const F,
    b_strides: [usize; 2],
    cp: *mut F,
    c_strides: [usize; 2],
) {
    for i_m in 0..m.size() {
        for i_k in 0..k.size() {
            for i_n in 0..n.size() {
                unsafe {
                    let a = *ap.add(a_strides[0] * i_m + a_strides[1] * i_k);
                    let b = *bp.add(b_strides[0] * i_k + b_strides[1] * i_n);
                    let c = cp.add(c_strides[0] * i_m + c_strides[1] * i_n);
                    *c += a * b;
                }
            }
        }
    }
}
Which is extremely slow. I'm not sure what the other alternatives are.
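One incremental improvement, sketched below under the assumption of row-major contiguous slices: hoist the `a` element out of the innermost loop and accumulate in f32. For `half::f16` inputs one would widen to f32 buffers up front (`f16::to_f32`), run this, then narrow back (`f16::from_f32`), avoiding per-element conversion in the inner loop. The function name `gemm_f32` is made up for illustration:

```rust
// Sketch: c += a * b for row-major matrices, with the (i, k, j) loop order
// so that rows of `b` and `c` are streamed contiguously, and the `a`
// element hoisted out of the inner loop.
fn gemm_f32(m: usize, k: usize, n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for p in 0..k {
            // Loaded once, reused across the whole row of b.
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
}

fn main() {
    // [[1, 2], [3, 4]] * [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
    let a = [1.0f32, 2.0, 3.0, 4.0];
    let b = [5.0f32, 6.0, 7.0, 8.0];
    let mut c = [0.0f32; 4];
    gemm_f32(2, 2, 2, &a, &b, &mut c);
    assert_eq!(c, [19.0, 22.0, 43.0, 50.0]);
}
```

This is still far from a tuned kernel (no blocking, no SIMD), but the contiguous inner loop gives the compiler a chance to auto-vectorize.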
Additional context
I'm the author of dfdx. dfdx has both cuda/CPU support, and CUDA does have hardware acceleration for f16. It'd just be nice if f16 matmul on CPU was a bit faster!
I ran cargo run in the faer-bench directory and the result was a panic.
The system is an Arm macOS machine with the newest nightly toolchain.
leifeld@MacStudovonDirk faer-bench % cargo +nightly run --release --no-default-features
Finished release [optimized] target(s) in 0.08s
Running target/release/faer-bench
f32
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 33ns 32ns 142ns - -
8 57ns 60ns 122ns - -
16 206ns 211ns 303ns - -
32 1.1µs 1.1µs 996ns - -
64 8.1µs 8.1µs 6.7µs - -
96 24.8µs 30.9µs 1.3ms - -
128 59.8µs 39.1µs 1.2ms - -
192 195.1µs 93.3µs 1.7ms - -
256 479.7µs 202.1µs 1.9ms - -
384 1.6ms 448.6µs 2.3ms - -
512 3.7ms 1ms 2.6ms - -
640 7.3ms 1.7ms 3ms - -
768 12.5ms 3ms 5.1ms - -
896 19.7ms 4.4ms 6.8ms - -
1024 29.8ms 6.7ms 9.2ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 14ns 14ns 26.3µs - -
8 90ns 90ns 27µs - -
16 378ns 380ns 29.9µs - -
32 1.4µs 1.4µs 35.9µs - -
64 7.2µs 7.2µs 54.1µs - -
96 20.4µs 27.2µs 93.5µs - -
128 44.9µs 40.6µs 131.9µs - -
192 131.7µs 117.5µs 290.3µs - -
256 301.3µs 152.8µs 487.1µs - -
384 913.6µs 450.8µs 996.1µs - -
512 2.2ms 966µs 2.6ms - -
640 4.1ms 1.8ms 2.7ms - -
768 6.7ms 2.6ms 3.9ms - -
896 10.8ms 3.6ms 5.3ms - -
1024 17.7ms 5.5ms 12.6ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 160ns 9.3µs 26.3µs - -
8 468ns 12µs 27.1µs - -
16 1.2µs 16.4µs 29.9µs - -
32 3.4µs 24.1µs 35.9µs - -
64 9.9µs 31.6µs 54.1µs - -
96 23.2µs 39.9µs 89.1µs - -
128 33.9µs 49.9µs 133.3µs - -
192 86.4µs 94.7µs 310.4µs - -
256 153.1µs 143.9µs 507.7µs - -
384 421.7µs 281.3µs 1ms - -
512 907.5µs 492.6µs 2.7ms - -
640 1.6ms 816.2µs 2.7ms - -
768 2.6ms 1ms 3.9ms - -
896 4.1ms 1.5ms 5.4ms - -
1024 6.8ms 2.4ms 12.5ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 43ns 43ns 120ns - -
8 154ns 154ns 354ns - -
16 767ns 766ns 1.2µs - -
32 2.1µs 2.1µs 4.8µs - -
64 6.3µs 6.3µs 15µs - -
96 15.8µs 15.8µs 34.2µs - -
128 22.2µs 22.3µs 428.7µs - -
192 59.7µs 87µs 12.5ms - -
256 106.1µs 126.4µs 3.2ms - -
384 292.4µs 331.5µs 5.3ms - -
512 651.6µs 483.6µs 7ms - -
640 1.1ms 1.1ms 11.4ms - -
768 2ms 1.5ms 19.6ms - -
896 3ms 2ms 1.82s - -
1024 4.7ms 2.5ms 22.3ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 92ns 89ns 154ns - -
8 263ns 238ns 456ns - -
16 895ns 905ns 2µs - -
32 2.9µs 2.9µs 7µs - -
64 11.9µs 12.2µs 22.8µs - -
96 27.2µs 28.2µs 46.3µs - -
128 54µs 53.7µs 88.3µs - -
192 141.9µs 215.8µs 206.9µs - -
256 285.8µs 384.5µs 749.8µs - -
384 801.3µs 919.1µs 1.6ms - -
512 1.8ms 1.7ms 2.4ms - -
640 3.2ms 3.1ms 4.2ms - -
768 5.3ms 4.2ms 5.8ms - -
896 8.2ms 5.8ms 7.7ms - -
1024 12.8ms 7.9ms 9.8ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 160ns 160ns - - -
8 388ns 403ns - - -
16 1.5µs 1.5µs - - -
32 7.9µs 7.8µs - - -
64 53.4µs 53µs - - -
96 174.2µs 173.9µs - - -
128 417.9µs 417µs - - -
192 1.5ms 1.5ms - - -
256 3.5ms 3.5ms - - -
384 11.8ms 11.3ms - - -
512 28.9ms 21.8ms - - -
640 55.6ms 33.2ms - - -
768 98ms 49.2ms - - -
896 158.7ms 72.1ms - - -
1024 242.3ms 103.1ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 102ns 100ns 729ns - -
8 294ns 293ns 1.5µs - -
16 1.1µs 1.2µs 4.7µs - -
32 4.6µs 4.7µs 17.2µs - -
64 25.3µs 25.6µs 65.4µs - -
96 57.7µs 59.4µs 454.9µs - -
128 110.2µs 109.9µs 4.4ms - -
192 298.2µs 297.7µs 37.8ms - -
256 615.9µs 796.9µs 59ms - -
384 1.8ms 2.1ms 118.2ms - -
512 4ms 3.7ms 197.2ms - -
640 6.7ms 5.1ms 301.4ms - -
768 11.2ms 8ms 429.2ms - -
896 17.3ms 11.9ms 523.8ms - -
1024 25.6ms 16.5ms 632.4ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 153ns 140ns - - -
8 414ns 410ns - - -
16 1.5µs 1.6µs - - -
32 6.5µs 6.5µs - - -
64 40.7µs 66.3µs - - -
96 123.3µs 167.8µs - - -
128 266.4µs 327.5µs - - -
192 802.5µs 862.3µs - - -
256 1.8ms 2.3ms - - -
384 5.6ms 6.8ms - - -
512 12.5ms 12.8ms - - -
640 24.6ms 19.9ms - - -
768 41.3ms 28.1ms - - -
896 65.1ms 38.3ms - - -
1024 94.5ms 51.7ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 726ns 14.3µs 412ns - -
8 1.7µs 17.4µs 907ns - -
16 4.4µs 25.3µs 3.9µs - -
32 13µs 51.3µs 12.6µs - -
64 42µs 84.9µs 49.4µs - -
96 100.1µs 141µs 443.4µs - -
128 168.5µs 209.1µs 2.8ms - -
192 430µs 454.3µs 3.1ms - -
256 837.1µs 711.4µs 7.3ms - -
384 2.3ms 1.6ms 17.9ms - -
512 5.3ms 3ms 19.4ms - -
640 9.6ms 5ms 31.4ms - -
768 15.2ms 7ms 34.6ms - -
896 24ms 9.7ms 43.1ms - -
1024 37.6ms 14.3ms 50.6ms - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.1µs 1.1µs 2µs - -
8 5.8µs 21.5µs 5.8µs - -
16 20.3µs 54µs 19.5µs - -
32 72µs 116.1µs 71.6µs - -
64 270.6µs 360.5µs 3.9ms - -
96 664.6µs 971.7µs 6.7ms - -
128 1.3ms 1.7ms 44.4ms - -
192 3.5ms 5.1ms 108.8ms - -
256 7.3ms 9.3ms 173.6ms - -
384 21.4ms 20.8ms 310.9ms - -
512 47.5ms 35.9ms 466.5ms - -
640 85.2ms 54.2ms 616.3ms - -
768 143ms 80ms 830.5ms - -
896 225.4ms 115.2ms 1.07s - -
1024 327.3ms 160.4ms 1.23s - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 32.8µs 32.4µs 644.1µs - -
8 112µs 150.6µs 1.4ms - -
16 318.9µs 456.7µs 8.4ms - -
32 947.4µs 1.2ms 15.5ms - -
64 2.9ms 3.5ms 45.5ms - -
96 5.3ms 5.8ms 61ms - -
128 8.9ms 9ms 100.3ms - -
192 18.7ms 17.5ms 248.2ms - -
256 33.7ms 29.3ms 380.9ms - -
384 75.2ms 54.5ms 657.6ms - -
512 138.8ms 94ms 1.15s - -
640 224.2ms 136.4ms 1.63s - -
768 336.6ms 182.1ms 1.97s - -
896 478.8ms 240.6ms 2.45s - -
1024 656ms 327.2ms 2.68s - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 992ns 890ns 1µs - -
8 2.5µs 2.4µs 3.5µs - -
16 9.2µs 8.7µs 12.1µs - -
32 36.5µs 34.8µs 55.5µs - -
64 161.4µs 194.4µs 421.2µs - -
96 391.3µs 499.3µs 1.1ms - -
128 725.3µs 841.1µs 4.3ms - -
192 1.9ms 2.2ms 22.6ms - -
256 3.7ms 4ms 45.9ms - -
384 10.7ms 12.4ms 114.5ms - -
512 23.2ms 22.7ms 206.6ms - -
640 41.9ms 34.5ms 330.7ms - -
768 69.5ms 51ms 471ms - -
896 107.4ms 70.9ms 657ms - -
1024 156.6ms 92.4ms 888.3ms - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 3.9µs 4.2µs 2.4µs - -
8 11.6µs 13µs 6.9µs - -
16 43.7µs 42.8µs 27.4µs - -
32 174.2µs 188.9µs 123.3µs - -
64 806.5µs 834.3µs 658.5µs - -
96 1.9ms 2ms 4.4ms - -
128 3.2ms 3.4ms 31.6ms - -
192 8.5ms 9ms 100.3ms - -
256 16.4ms 17ms 228.4ms - -
384 38.3ms 47.9ms 592.5ms - -
512 82.8ms 96.1ms 798.3ms - -
640 146.3ms 164.9ms 1.12s - -
768 230.3ms 243.3ms 1.46s - -
896 340.3ms 341.4ms 1.69s - -
1024 540.2ms 539.9ms 2.45s - -
f64
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 32ns 32ns 128ns - -
8 80ns 84ns 171ns - -
16 311ns 315ns 345ns - -
32 2.1µs 2.2µs 1.8µs - -
64 14.8µs 14.8µs 12.1µs - -
96 50.3µs 41.3µs 1.2ms - -
128 124.1µs 69.5µs 1.4ms - -
192 392.9µs 173.1µs 1.5ms - -
256 943.6µs 315.2µs 1.3ms - -
384 3.2ms 889.3µs 2.3ms - -
512 7.5ms 1.9ms 3.7ms - -
640 14.5ms 3.4ms 4.9ms - -
768 25.7ms 5.8ms 8.5ms - -
896 30ms 7.3ms 13.1ms - -
1024 45.1ms 11ms 17.4ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 14ns 14ns 27.9µs - -
8 89ns 90ns 27.7µs - -
16 429ns 431ns 29.8µs - -
32 1.9µs 1.9µs 35.5µs - -
64 11.3µs 11.3µs 54µs - -
96 33.9µs 35.3µs 81.9µs - -
128 75.4µs 63.5µs 126µs - -
192 235µs 149.1µs 264.2µs - -
256 555.4µs 254.1µs 759.9µs - -
384 1.7ms 841.4µs 1.2ms - -
512 4.5ms 1.6ms 3.5ms - -
640 7.8ms 2.8ms 3.4ms - -
768 13.1ms 4.5ms 7.3ms - -
896 20.9ms 6.5ms 7ms - -
1024 35.4ms 10ms 18.3ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 156ns 12.4µs 28µs - -
8 479ns 11.4µs 27.7µs - -
16 1.3µs 17µs 29.9µs - -
32 3.7µs 24.1µs 34.2µs - -
64 11.4µs 33µs 52.2µs - -
96 28.8µs 45.4µs 212.9µs - -
128 45.8µs 65.2µs 334.2µs - -
192 124.9µs 130.4µs 405.3µs - -
256 247µs 226µs 765µs - -
384 712.3µs 395.6µs 1.2ms - -
512 1.8ms 736.7µs 3.4ms - -
640 2.9ms 1.3ms 3.3ms - -
768 4.8ms 2ms 7.2ms - -
896 7.6ms 2.6ms 7.1ms - -
1024 13.5ms 3.9ms 18ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 46ns 46ns 121ns - -
8 146ns 145ns 393ns - -
16 631ns 625ns 1.3µs - -
32 2µs 2µs 5.1µs - -
64 7.3µs 7.3µs 187µs - -
96 18.3µs 18.3µs 1.4ms - -
128 32.2µs 32.4µs 1.3ms - -
192 88.2µs 100.1µs 6ms - -
256 186.8µs 188.5µs 4.8ms - -
384 520.7µs 465µs 8.8ms - -
512 1.3ms 935.6µs 10.4ms - -
640 2.2ms 1.6ms 12.7ms - -
768 3.8ms 2.2ms 20.5ms - -
896 5.7ms 3ms 24.7ms - -
1024 9.5ms 4.6ms 26.2ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 76ns 93ns 150ns - -
8 225ns 213ns 493ns - -
16 740ns 771ns 2.2µs - -
32 3µs 2.9µs 6.6µs - -
64 14.7µs 14.8µs 21.1µs - -
96 37.7µs 37.7µs 46.5µs - -
128 75.7µs 76.2µs 302.8µs - -
192 218.3µs 276.5µs 620.6µs - -
256 474.4µs 524µs 810.8µs - -
384 1.4ms 1.3ms 1.5ms - -
512 3.4ms 2.6ms 3ms - -
640 5.9ms 4.4ms 4.6ms - -
768 9.9ms 6.2ms 6.5ms - -
896 15.5ms 8.5ms 8.7ms - -
1024 24.8ms 12.8ms 11.3ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 119ns 132ns - - -
8 371ns 369ns - - -
16 1.4µs 1.4µs - - -
32 7.7µs 7.7µs - - -
64 53.4µs 53.4µs - - -
96 185.1µs 184.7µs - - -
128 460.1µs 459µs - - -
192 1.7ms 1.7ms - - -
256 4.3ms 4.3ms - - -
384 14.8ms 14ms - - -
512 38.2ms 25.8ms - - -
640 73.1ms 38ms - - -
768 128.8ms 56.5ms - - -
896 207.4ms 80.4ms - - -
1024 306.1ms 119.3ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 106ns 106ns 658ns - -
8 295ns 296ns 1.6µs - -
16 1µs 1µs 4.7µs - -
32 5µs 5µs 18µs - -
64 31.9µs 31.8µs 75.6µs - -
96 79.7µs 79.7µs 526.7µs - -
128 158.8µs 157.9µs 4.7ms - -
192 451.3µs 450.1µs 49.9ms - -
256 963.8µs 1.1ms 72.8ms - -
384 2.9ms 2.9ms 148.4ms - -
512 6.6ms 5.4ms 291.2ms - -
640 12.4ms 8.9ms 337.5ms - -
768 20.8ms 13.7ms 413.4ms - -
896 31.6ms 20.7ms 546.6ms - -
1024 45.6ms 28.3ms 734.9ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 175ns 162ns - - -
8 422ns 436ns - - -
16 1.6µs 1.6µs - - -
32 8µs 7.9µs - - -
64 58.7µs 93.9µs - - -
96 169µs 213.4µs - - -
128 358.2µs 422.9µs - - -
192 1.2ms 1.2ms - - -
256 2.7ms 3ms - - -
384 8.8ms 8.5ms - - -
512 20.5ms 16.5ms - - -
640 39.9ms 27.4ms - - -
768 67.8ms 43.4ms - - -
896 107.9ms 59.4ms - - -
1024 159.2ms 77ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 715ns 12.4µs 406ns - -
8 1.6µs 16.7µs 991ns - -
16 4.2µs 25.4µs 4.1µs - -
32 13.6µs 53.5µs 12.8µs - -
64 51.3µs 96.7µs 56.2µs - -
96 130µs 169µs 450.3µs - -
128 240.2µs 260.5µs 3.3ms - -
192 659.7µs 589.3µs 6.1ms - -
256 1.4ms 1.1ms 7.2ms - -
384 4.1ms 2.4ms 15.8ms - -
512 9.8ms 4.5ms 21.3ms - -
640 17.3ms 7.5ms 28.9ms - -
768 28.7ms 10.8ms 40ms - -
896 44.9ms 15.3ms 52.6ms - -
1024 74.2ms 23.2ms 70.7ms - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.4µs 1.3µs 2.4µs - -
8 7.4µs 23.1µs 7.4µs - -
16 23.3µs 56.8µs 24.9µs - -
32 85.2µs 130.1µs 86.7µs - -
64 374µs 464.2µs 4.1ms - -
96 953µs 1.3ms 7ms - -
128 1.9ms 2.2ms 51.7ms - -
192 5.3ms 6.3ms 135.2ms - -
256 11.4ms 11ms 204.5ms - -
384 34.1ms 27.3ms 312.2ms - -
512 77.3ms 48.9ms 516ms - -
640 144.7ms 78.2ms 688.1ms - -
768 248.5ms 119.6ms 912.3ms - -
896 384.6ms 176.2ms 1.16s - -
1024 560.5ms 249.6ms 1.64s - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 43.6µs 43.7µs 578.8µs - -
8 165.7µs 180.3µs 1.5ms - -
16 471.3µs 573.1µs 7.8ms - -
32 1.4ms 1.5ms 15.2ms - -
64 4.6ms 4.5ms 58.3ms - -
96 9.3ms 8.2ms 62.8ms - -
128 16.1ms 12.8ms 103.5ms - -
192 34ms 24.8ms 237.3ms - -
256 60.9ms 41.3ms 383.9ms - -
384 134.4ms 83.4ms 663.6ms - -
512 248ms 142.1ms 1.05s - -
640 402.7ms 204.7ms 1.52s - -
768 616.6ms 298.7ms 1.81s - -
896 874.1ms 399.3ms 2.44s - -
1024 1.19s 532.8ms 2.87s - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 1.1µs 1µs 1.2µs - -
8 3µs 3.6µs 4.2µs - -
16 12µs 11.8µs 15.5µs - -
32 50.4µs 53.6µs 68µs - -
64 224.6µs 245.4µs 437.8µs - -
96 519.9µs 615.6µs 1.4ms - -
128 989.7µs 1.1ms 6ms - -
192 2.5ms 2.8ms 25.6ms - -
256 5.1ms 4.8ms 51.1ms - -
384 14.5ms 14.8ms 122.3ms - -
512 31.6ms 27.8ms 232.1ms - -
640 58.5ms 44.6ms 370.5ms - -
768 97ms 65.1ms 554ms - -
896 148.8ms 92.1ms 769.1ms - -
1024 218.1ms 122.4ms 1.09s - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 4.2µs 4.3µs 2.5µs - -
8 14.7µs 14µs 8.6µs - -
16 51.5µs 49µs 31.3µs - -
32 228.1µs 225.8µs 146.1µs - -
64 1ms 1ms 822.7µs - -
96 2.8ms 3ms 5.1ms - -
128 5.2ms 5.2ms 25.1ms - -
192 14.2ms 14.4ms 121.7ms - -
256 26.9ms 26.7ms 312.8ms - -
384 69.1ms 78.5ms 624.7ms - -
512 159.2ms 177.8ms 1.02s - -
640 256.7ms 259.4ms 1.17s - -
768 407.6ms 382.6ms 1.69s - -
896 588.6ms 531.4ms 2.03s - -
1024 1.03s 931.6ms 2.75s - -
f128
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 150ns 151ns - - -
8 1.3µs 16.2µs - - -
16 8.1µs 25.4µs - - -
32 60.4µs 40.6µs - - -
64 474.6µs 162.2µs - - -
96 1.6ms 519.4µs - - -
128 3.8ms 961.5µs - - -
192 12.8ms 2.6ms - - -
256 30.7ms 5.5ms - - -
384 103.2ms 16.5ms - - -
512 254.1ms 39.6ms - - -
640 481.4ms 70.9ms - - -
768 866.7ms 133.5ms - - -
896 1.34s 185ms - - -
1024 2.25s 311.7ms - - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 73ns 73ns - - -
8 670ns 669ns - - -
16 5.1µs 24.7µs - - -
32 37.8µs 179.7µs - - -
64 269.6µs 471.8µs - - -
96 892.4µs 568.2µs - - -
128 2.1ms 926.1µs - - -
192 6.9ms 2.3ms - - -
256 16ms 4.4ms - - -
384 53.9ms 12ms - - -
512 126.2ms 25.5ms - - -
640 243.8ms 47.3ms - - -
768 419.6ms 71.8ms - - -
896 662.1ms 118.7ms - - -
1024 1.03s 170.5ms - - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 271ns 16µs - - -
8 965ns 13.8µs - - -
16 4.2µs 35.5µs - - -
32 22µs 85.3µs - - -
64 132.6µs 330.6µs - - -
96 389µs 566.4µs - - -
128 862.3µs 1ms - - -
192 2.7ms 2ms - - -
256 6.1ms 3ms - - -
384 19.6ms 7.6ms - - -
512 46ms 13ms - - -
640 96.4ms 25.3ms - - -
768 155ms 35.2ms - - -
896 235.6ms 54ms - - -
1024 349.3ms 75.2ms - - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 289ns 286ns - - -
8 692ns 696ns - - -
16 2.4µs 2.4µs - - -
32 18.9µs 58.2µs - - -
64 128.9µs 393.8µs - - -
96 368.2µs 844µs - - -
128 858.1µs 1.6ms - - -
192 2.7ms 3.6ms - - -
256 6.1ms 5.2ms - - -
384 20.1ms 12.8ms - - -
512 45.8ms 19ms - - -
640 88.4ms 36.8ms - - -
768 147.9ms 50.1ms - - -
896 231.4ms 69.7ms - - -
1024 348.7ms 89.7ms - - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 214ns 186ns - - -
8 580ns 596ns - - -
16 2.4µs 2.5µs - - -
32 21.7µs 64.4µs - - -
64 173.7µs 369.3µs - - -
96 581.2µs 889.6µs - - -
128 1.3ms 1.5ms - - -
192 4.8ms 3.7ms - - -
256 10.9ms 6.4ms - - -
384 36.4ms 16.3ms - - -
512 85.5ms 33.5ms - - -
640 165.6ms 54ms - - -
768 284.5ms 80.2ms - - -
896 450.1ms 125ms - - -
1024 684.1ms 173.6ms - - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 310ns 314ns - - -
8 1.2µs 1.1µs - - -
16 5.4µs 5.5µs - - -
32 35.5µs 35.6µs - - -
64 255.5µs 256µs - - -
96 843.7µs 840.9µs - - -
128 2ms 2ms - - -
192 6.7ms 6.7ms - - -
256 16.2ms 16.2ms - - -
384 52.2ms 47.8ms - - -
512 125.9ms 76.2ms - - -
640 241.1ms 114.2ms - - -
768 417.1ms 166.8ms - - -
896 661.6ms 232.6ms - - -
1024 996.3ms 326.2ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 698ns 699ns - - -
8 2.3µs 2.3µs - - -
16 10.9µs 10.7µs - - -
32 56.8µs 56.6µs - - -
64 519.8µs 518.4µs - - -
96 1.5ms 1.5ms - - -
128 3.4ms 3.5ms - - -
192 10.7ms 10.8ms - - -
256 24.7ms 18.5ms - - -
384 80.8ms 52ms - - -
512 198.4ms 97.7ms - - -
640 368.2ms 146.3ms - - -
768 624ms 219.9ms - - -
896 980.1ms 318.4ms - - -
1024 1.46s 422.8ms - - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 947ns 939ns - - -
8 3.6µs 3.6µs - - -
16 16.8µs 16.8µs - - -
32 97µs 97.4µs - - -
64 684.6µs 714.8µs - - -
96 2ms 1.9ms - - -
128 4.4ms 4.2ms - - -
192 13.7ms 13.1ms - - -
256 30.7ms 25.1ms - - -
384 99.1ms 49.1ms - - -
512 231.4ms 90ms - - -
640 449.1ms 151.6ms - - -
768 764.5ms 230.5ms - - -
896 1.21s 332.2ms - - -
1024 1.79s 462ms - - -
Computing the inverse of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 1.4µs 15.1µs - - -
8 4.5µs 36.1µs - - -
16 20.4µs 63.4µs - - -
32 110.3µs 196.8µs - - -
64 694.5µs 777.2µs - - -
96 2.1ms 1.7ms - - -
128 4.7ms 3.1ms - - -
192 15.1ms 7.8ms - - -
256 35ms 13.6ms - - -
384 117.1ms 35.4ms - - -
512 267.1ms 64.2ms - - -
640 524.5ms 120.4ms - - -
768 899.5ms 179.5ms - - -
896 1.4s 273.8ms - - -
1024 2.17s 405.6ms - - -
Computing the SVD of a square matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 11.6µs 11.7µs - - -
8 60.4µs 96.7µs - - -
16 259.8µs 333.2µs - - -
32 1.1ms 1.1ms - - -
64 5.9ms 5.6ms - - -
96 16.4ms 12.4ms - - -
128 31.5ms 22.2ms - - -
192 89.6ms 51ms - - -
256 194.7ms 89.3ms - - -
384 581.4ms 243.3ms - - -
512 1.32s 464.9ms - - -
640 2.48s 758ms - - -
768 4.13s 1.16s - - -
896 6.45s 1.72s - - -
1024 9.51s 2.27s - - -
Computing the SVD of a rectangular matrix with shape (4096, n).
n faer faer(par) ndarray nalgebra eigen
4 1.1ms 2.1ms - - -
8 3.6ms 4.6ms - - -
16 10.2ms 13.2ms - - -
32 32.3ms 33.1ms - - -
64 120.3ms 86.5ms - - -
96 253.1ms 142.4ms - - -
128 444.6ms 216.4ms - - -
192 996.1ms 399.4ms - - -
256 1.8s 645.6ms - - -
384 4.05s 1.24s - - -
512 7.37s 2.1s - - -
640 11.67s 3.04s - - -
768 17.43s 4.6s - - -
896 23.81s 6.02s - - -
1024 32.59s 7.85s - - -
Computing the EVD of a hermitian matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 5.4µs 5.8µs - - -
8 21.9µs 25.7µs - - -
16 102µs 99.5µs - - -
32 515.3µs 497.8µs - - -
64 3.1ms 3.1ms - - -
96 8.6ms 7.2ms - - -
128 17.5ms 13.1ms - - -
192 51.1ms 33.1ms - - -
256 108.3ms 65.1ms - - -
384 339.3ms 152.3ms - - -
512 745.4ms 283.9ms - - -
640 1.4s 457.2ms - - -
768 2.36s 702.8ms - - -
896 3.72s 1.03s - - -
1024 5.45s 1.37s - - -
Computing the EVD of a matrix with shape (n, n).
n faer faer(par) ndarray nalgebra eigen
4 16.2µs 14.7µs - - -
8 68.3µs 101.5µs - - -
16 349.1µs 388µs - - -
32 1.8ms 1.8ms - - -
64 11.4ms 12.8ms - - -
96 46.5ms 60.2ms - - -
128 92.9ms 104.8ms - - -
192 299.1ms 221.5ms - - -
256 593.9ms 418.8ms - - -
384 1.61s 978.8ms - - -
512 3.46s 2s - - -
640 6.26s 3.63s - - -
768 10.17s 5.24s - - -
896 15.31s 7.25s - - -
1024 22.59s 9.99s - - -
c32
Multiplication of two square matrices of dimension n.
n faer faer(par) ndarray nalgebra eigen
4 61ns 61ns 97ns - -
8 317ns 319ns 208ns - -
16 2µs 2µs 659ns - -
32 14.1µs 14.1µs 4µs - -
64 107.4µs 107.4µs 837.1µs - -
96 356µs 123.6µs 1.1ms - -
128 834.9µs 285.1µs 1.3ms - -
192 2.8ms 675.1µs 1.5ms - -
256 6.6ms 1.4ms 2ms - -
384 22.3ms 4.6ms 3.1ms - -
512 52.6ms 10.6ms 6.7ms - -
640 103.6ms 19.3ms 9.9ms - -
768 178.3ms 32.5ms 16.9ms - -
896 282.8ms 50.6ms 23.1ms - -
1024 422.1ms 74.6ms 33.5ms - -
Solving AX = B in place where A and B are two square matrices of dimension n, and A is a triangular matrix.
n faer faer(par) ndarray nalgebra eigen
4 22ns 22ns 26.8µs - -
8 203ns 202ns 27.6µs - -
16 1.3µs 1.3µs 30.5µs - -
32 8.8µs 8.8µs 40.6µs - -
64 62.7µs 62.7µs 59.5µs - -
96 206µs 132µs 332.9µs - -
128 466.5µs 272µs 432.2µs - -
192 1.5ms 558.7µs 602.8µs - -
256 3.5ms 1.1ms 919.3µs - -
384 11.7ms 2.9ms 1.8ms - -
512 27.4ms 6.2ms 4.5ms - -
640 53.3ms 11.7ms 5ms - -
768 91.6ms 19.5ms 10.2ms - -
896 144.6ms 29.4ms 12ms - -
1024 215.7ms 43.6ms 25.1ms - -
Computing A^-1 where A is a square triangular matrix with dimension n.
n faer faer(par) ndarray nalgebra eigen
4 186ns 15.9µs 27.5µs - -
8 534ns 12.7µs 27.6µs - -
16 1.8µs 16.5µs 30.5µs - -
32 7.1µs 27.2µs 40.7µs - -
64 35.2µs 56.5µs 59.6µs - -
96 101.2µs 146.3µs 85.4µs - -
128 208.8µs 251.4µs 454.6µs - -
192 636µs 576.1µs 597.7µs - -
256 1.4ms 1ms 1.1ms - -
384 4.4ms 1.8ms 1.8ms - -
512 10.1ms 3.1ms 4.5ms - -
640 19.3ms 5.3ms 5ms - -
768 32.8ms 7.9ms 10.2ms - -
896 51.7ms 11.2ms 12ms - -
1024 76.5ms 16.3ms 25ms - -
Factorizing a square matrix with dimension n as L×L.T, where L is lower triangular.
n faer faer(par) ndarray nalgebra eigen
4 45ns 45ns 124ns - -
8 172ns 172ns 397ns - -
16 925ns 925ns 1.3µs - -
32 5.1µs 5.2µs 6.6µs - -
64 30.3µs 30.4µs 176.3µs - -
96 85.9µs 86µs 1.7ms - -
128 192.3µs 192.3µs 1.5ms - -
192 583.1µs 541.4µs 4ms - -
256 1.3ms 934.9µs 5ms - -
384 4.2ms 2.3ms 10ms - -
512 9.8ms 3.8ms 11.5ms - -
640 18.9ms 6.5ms 26.6ms - -
768 32.2ms 9.9ms 20.1ms - -
896 50.8ms 13.8ms 18.9ms - -
1024 76.1ms 19.1ms 28.1ms - -
Factorizing a square matrix with dimension n as P×L×U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 105ns 91ns 176ns - -
8 266ns 273ns 566ns - -
16 1µs 1.1µs 2.3µs - -
32 6.5µs 6.6µs 7.8µs - -
64 45.5µs 45.4µs 30.9µs - -
96 145.9µs 145.5µs 74µs - -
128 327.9µs 326.5µs 360.5µs - -
192 1.1ms 863.8µs 756µs - -
256 2.4ms 1.7ms 1ms - -
384 8ms 4.2ms 2.6ms - -
512 18.7ms 8ms 4.7ms - -
640 36.3ms 13.9ms 7.6ms - -
768 62.6ms 20.9ms 10.8ms - -
896 98.6ms 29.8ms 14.7ms - -
1024 146.5ms 43ms 24ms - -
Factorizing a square matrix with dimension n as P×L×U×Q.T, where P and Q are permutation matrices, L is unit lower triangular and U is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 191ns 185ns - - -
8 573ns 516ns - - -
16 2.2µs 2.3µs - - -
32 12.4µs 12.3µs - - -
64 82.7µs 82.4µs - - -
96 262.9µs 262.7µs - - -
128 609.8µs 609.7µs - - -
192 2ms 2ms - - -
256 4.9ms 4.9ms - - -
384 16.4ms 15.6ms - - -
512 40.7ms 29.4ms - - -
640 77.3ms 41ms - - -
768 134.2ms 63.2ms - - -
896 215.1ms 99.3ms - - -
1024 316.8ms 136.3ms - - -
Factorizing a square matrix with dimension n as QR, where Q is unitary and R is upper triangular.
n faer faer(par) ndarray nalgebra eigen
4 164ns 164ns 879ns - -
8 487ns 487ns 2.1µs - -
16 2.5µs 2.5µs 6.9µs - -
32 16.1µs 16.1µs 29.1µs - -
64 132.5µs 132.5µs 1ms - -
96 377.1µs 377.3µs 5.2ms - -
128 815.1µs 815.2µs 9.7ms - -
192 2.5ms 2.5ms 47.7ms - -
256 5.7ms 4.7ms 68.9ms - -
384 18.2ms 11.3ms 132.7ms - -
512 41.9ms 21.9ms 176.6ms - -
640 78.9ms 34.2ms 266ms - -
768 134.3ms 52.8ms 350.6ms - -
896 210.9ms 77.2ms 453.1ms - -
1024 313.2ms 109.5ms 544.3ms - -
Factorizing a square matrix with dimension n as QRP, where P is a permutation matrix, Q is unitary and R is upper triangular.
thread 'main' panicked at 'cast>SizeMismatch', /Users/leifeld/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bytemuck-1.13.1/src/internal.rs:32:3
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
leifeld@MacStudovonDirk faer-bench %
Implement basic operations (addition, subtraction, matrix multiplication, multiplication by a scalar) for Mat<T>. T should satisfy the trait ComplexField for those operations to be implementable. For the implementation of addition, subtraction, and scalar multiplication, Mat::with_dims could be used, with a closure that takes two elements and adds or subtracts them. For matrix multiplication, we could create a matrix full of zeros with Mat::zeros, and then fill it up with the result of the computation with faer_core::mul::matmul.
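The shape of those impls can be sketched with a toy row-major matrix type. This is not faer's API — the `Mat` struct, `with_dims`, `zeros`, and `at` below are stand-ins so the sketch is self-contained; the real versions would be generic over `T: ComplexField` and delegate multiplication to `mul::matmul`:

```rust
use std::ops::{Add, Mul};

// Toy row-major matrix standing in for faer's Mat<T>.
#[derive(Clone, Debug, PartialEq)]
struct Mat {
    nrows: usize,
    ncols: usize,
    data: Vec<f64>,
}

impl Mat {
    // Analogue of Mat::with_dims: build each element from a closure over (i, j).
    fn with_dims(nrows: usize, ncols: usize, f: impl Fn(usize, usize) -> f64) -> Self {
        let data = (0..nrows * ncols)
            .map(|idx| f(idx / ncols, idx % ncols))
            .collect();
        Mat { nrows, ncols, data }
    }
    fn zeros(nrows: usize, ncols: usize) -> Self {
        Self::with_dims(nrows, ncols, |_, _| 0.0)
    }
    fn at(&self, i: usize, j: usize) -> f64 {
        self.data[i * self.ncols + j]
    }
}

// Addition via an element-wise closure, as described above.
impl Add for &Mat {
    type Output = Mat;
    fn add(self, rhs: &Mat) -> Mat {
        assert_eq!((self.nrows, self.ncols), (rhs.nrows, rhs.ncols));
        Mat::with_dims(self.nrows, self.ncols, |i, j| self.at(i, j) + rhs.at(i, j))
    }
}

// Multiplication: zero-init then accumulate (the part matmul would do in faer).
impl Mul for &Mat {
    type Output = Mat;
    fn mul(self, rhs: &Mat) -> Mat {
        assert_eq!(self.ncols, rhs.nrows);
        let mut out = Mat::zeros(self.nrows, rhs.ncols);
        for i in 0..self.nrows {
            for k in 0..self.ncols {
                for j in 0..rhs.ncols {
                    out.data[i * rhs.ncols + j] += self.at(i, k) * rhs.at(k, j);
                }
            }
        }
        out
    }
}

fn main() {
    let a = Mat::with_dims(2, 2, |i, j| (i + j) as f64);
    let id = Mat::with_dims(2, 2, |i, j| if i == j { 1.0 } else { 0.0 });
    assert_eq!(&id * &a, a); // identity leaves a unchanged
    assert_eq!((&a + &a).at(1, 1), 4.0); // element-wise addition
}
```

Implementing the operators on references (`&Mat`) rather than on `Mat` directly avoids consuming the operands, which matters for a non-Copy matrix type.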