geostat-framework / pentapy

A Python toolbox for pentadiagonal linear systems
Home Page: https://pentapy.readthedocs.io/
License: MIT License
Very nice work, this is super useful!
I have a quick question, more of a clarification. The documentation says

```python
# create a flattened pentadiagonal matrix
M_flat = (np.random.random((5, size)) - 0.5) * 1e-5
```

Here, I interpret M_flat to represent each of the 5 diagonals of a pentadiagonal matrix. I believe this is similar to a banded structure.
I see that pentapy supports the matrix being input as Full and Flat. The Full option makes sense to me: the numpy array has shape (N_col, M_row). I'm a bit confused about Flat because flat typically refers to an (N, N) -> (N^2, 1) transformation (see https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html). However, the above leads me to believe that Flat is actually just the 5 diagonals, i.e. a numpy array of shape (5, N).
So, this is a long-winded way of asking: is it safe to assume that pentapy supports solving pentadiagonal systems represented in a sparse format, using just the 5 diagonals to represent the operator (and the right-hand side)?
We should simplify our branching structure and use the new default "main" that was suggested by GitHub. Things to take care of:
- update links in index.rst and README.md (hopefully auto-redirect as promised for older docs)
- update CONTRIBUTING.md to refer to main
- set the default branch to main
At the moment the core algorithms are implemented in Cython and Python at the same time:
- py_solver.py
- solver.pyx
Since we are building wheels with pre-compiled code for almost every system, there is no need to provide Python fallback implementations. Another side effect is that the setup will be much cleaner.
This is a neat and well-defined package that provides a big performance improvement over existing Python algorithms for solving linear systems with pentadiagonal matrices, and I found it to work as claimed. My main reservation is the documentation, which I think could be improved somewhat, but if the points I make below are addressed/defended satisfactorily, I would be happy to accept it for publication in JOSS.
The package installed correctly via pip. Dependencies are dealt with by setup.py, but scipy (which is required for some of the algorithms) is not installed automatically and has to be installed manually. Perhaps this could be automatic; see also my comment in the Documentation section below.
For solver=1,2 there is no check for a singular matrix, so you get a ZeroDivisionError: float division. Algorithms 3,4,5 report this situation specifically (as either an error or a warning). If performance is not impacted, could you check for this? Or simply catch numerical errors and re-raise with a more suggestive error message.
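To illustrate the "catch and re-raise" suggestion, here is a hypothetical wrapper (the name `solve_checked` and the dispatch shape are my own, not pentapy's API):

```python
def solve_checked(solver, mat_flat, rhs):
    """Hypothetical wrapper: turn the low-level ZeroDivisionError raised by
    a zero pivot in PTRANS-I/II into a more suggestive error message."""
    try:
        return solver(mat_flat, rhs)
    except ZeroDivisionError as err:
        raise ValueError(
            "matrix appears singular or unsuitable for PTRANS-I/II; "
            "consider solver=3 (scipy.linalg.solve_banded)"
        ) from err


def failing_solver(mat_flat, rhs):
    # stand-in for a solver hitting a zero pivot
    raise ZeroDivisionError("float division")
```

This keeps the hot path unchanged and only costs anything when the solver actually fails.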
I found the solver to work correctly on some basic but non-exhaustive tests of my own: discretising the heat equation both tri- and pentadiagonally and comparing the result to an analytic solution.
I think there should be more tests than are currently implemented in the test suite, and they should include some more realistic examples as well as the random matrix tests. There should also be some basic unit tests of the matrix utilities, e.g. check full->banded->full gives the original matrix.
Documentation on how to run (and extend) the package tests should be provided in the community guidelines, if the author is willing to accept contributions.
My tests agree with the claims in the paper. But the code only supports 5 of the 7 algorithms compared in the graphs in the README/readthedocs, could those graphs be made consistent with the one in the paper?
The README looks a little rushed, e.g. "pentapy is a toolbox to deal with pentadiagonal matrices in Python" is a bit vague and to a large extent the README just duplicates the readthedocs documentation. Personally, I would have a README containing only a statement of intent, authorship and community guidelines, basic installation instructions, and links to detailed documentation and examples (in readthedocs).
Re: community guidelines, I don't see "clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support".
Whilst help(pentapy) provides API-level documentation (some typos/spelling issues here), I think this should also be in the main documentation; the solve and create_full methods are only dealt with in the one example given. Other methods (diag_indices, shift_banded, create_banded) are not mentioned at all.
The Installation and Requirements sections of the README could be merged and made clearer, esp. that scipy is an optional dependency that only some solvers require.
I think the "unique selling point" of this package is a very significant improvement in performance over current implementations/algorithms, and this should be emphasised more in the paper (and the README).
Hi, thanks for the great library. I have a suggestion that could slightly improve performance when using the solvers with numpy array inputs.
In the solve function, you use np.array for the input matrix and right-hand-side array, which creates new arrays of the inputs even if they are already numpy arrays. I know NumPy's and Scipy's solvers do not modify the input arrays by default so they don't require copying, and it doesn't seem like your two solvers modify them either unless you call shift_banded before the solver. Therefore, I would suggest you use np.asarray whenever you do not require calling shift_banded (similar to how you use np.asanyarray within the tools functions), so that a new array is created only if the input is not already an array with the correct dtype.
I modified your 03_perform_simple.py example program and made it into a gist that shows the difference in solver times for the PTRANS-I and PTRANS-II solvers when using np.asarray (https://gist.github.com/derb12/3e2670a509e86051ee9dfe5aa76a0a95). The modified solve function is solve_new. I also included just the time required to call np.array and np.asarray on the inputs. An image of the gist's output is below. For small N (< 10000), you can reduce time by ~5-10%, and for large N (> ~10000) the reduction is ~30%.
If you're interested in making this change, feel free to copy and/or modify any of the code in the gist since it's basically just copies of your code anyway, or I can submit a pull request. You could simplify the code by always using np.asarray and simply calling shift_banded with copy=True, but I wanted to modify as little as possible for the gist.
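The core of the suggestion can be sketched in a few lines (`prepare_inputs` is a hypothetical helper, not pentapy code):

```python
import numpy as np

def prepare_inputs(mat, rhs):
    """Hypothetical helper: np.asarray returns the input object unchanged
    when it is already an ndarray of the requested dtype, whereas np.array
    always allocates and copies."""
    mat = np.asarray(mat, dtype=np.float64)
    rhs = np.asarray(rhs, dtype=np.float64)
    return mat, rhs

a = np.zeros((5, 100))
b = np.zeros(100)
a2, b2 = prepare_inputs(a, b)
# a2/b2 share memory with a/b: no copy was made
```

Lists or arrays of the wrong dtype are still converted, so the input handling stays just as forgiving.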
The special case of a 3x3 matrix is not handled correctly by either algorithm I or algorithm II. They have the access pattern first 2 rows - all rows in between - last 2 rows, but for a 3x3 matrix the first and last 2 rows overlap, because the second row belongs to both. This is an edge case, yet it can happen and leads to wrong memory access. Unfortunately, the tests do not capture this case because they only cover a single size of n=1_000.
I have already fixed this for #11 by falling back to np.linalg.solve for this special case.
The coverage shows 80%, but only because some lines were hit during the unit tests, which only consider one error case and one normal case with n=1_000. I would consider this a bug even though it works, because it basically means that the package is mostly untested despite the coverage. Edge cases are not covered, which led to #24.
On top of that, the most important functionality - the input and error checks - had zero coverage, which led to #23. It might seem tedious and a waste of time to test those, but cutting short here can easily backfire.
Besides, testing whether the x solving Ax = b is correct by looking at the residuals is very risky. A is initialised randomly, but random pentadiagonal matrices are not guaranteed to be well-conditioned. Looking at the residuals can thus be completely meaningless: the solver might work correctly but still run into numerical issues simply because A doesn't want to be solved.
I've written a dedicated function to generate a well-conditioned pentadiagonal A, and as one can see, a lot of logic is required to guarantee good condition numbers. With this, the solution x can be checked, still not via the residuals, but against the solution x_ref obtained by a well-tested standard like the LAPACK partially pivoted banded LU solver wrapped in scipy.linalg.solve_banded.
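One simple way to guarantee good conditioning is strict diagonal dominance. This is only a sketch of the idea, not the actual test utility from the PR (`make_well_conditioned_penta` is a hypothetical name):

```python
import numpy as np
from scipy.linalg import solve_banded

def make_well_conditioned_penta(n, seed=0):
    """Hypothetical sketch: a random pentadiagonal matrix in scipy's (5, n)
    banded storage, made strictly diagonally dominant (per column) so it is
    guaranteed nonsingular and reasonably conditioned."""
    rng = np.random.default_rng(seed)
    ab = rng.uniform(-1.0, 1.0, size=(5, n))
    # zero the unused corners of the banded storage
    ab[0, :2] = 0.0
    ab[1, :1] = 0.0
    ab[3, -1:] = 0.0
    ab[4, -2:] = 0.0
    # make each column's diagonal dominate the off-diagonal absolute sum
    ab[2] = np.abs(ab).sum(axis=0) + 1.0
    return ab

n = 50
ab = make_well_conditioned_penta(n)
b = np.arange(1.0, n + 1)
x_ref = solve_banded((2, 2), ab, b)  # well-tested LAPACK reference solution

# reconstruct the full matrix from the banded storage (ab[2 + i - j, j] = A[i, j])
A = np.zeros((n, n))
for j in range(n):
    for i in range(max(0, j - 2), min(n, j + 3)):
        A[i, j] = ab[2 + i - j, j]
```

With such a matrix, x from pentapy could then be compared elementwise against x_ref instead of relying on residuals.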
I fixed all of this in one go with #11 by completely dropping unittest in favour of pure pytest and leveraging parametrized tests that exercise all relevant combinations with the same test function (including edge cases and all different sorts of input handling). To reduce the load when running the tests, I added pytest-xdist for parallelized tests. Required utility functions are doctested during the pytest runs to ensure that they are also operating correctly.
With this, many different sizes and matrix layouts (full, banded row, banded column) can be tested with just a single test function.
Python 3.8 was released on Oct. 14, 2019.
It is supported by TravisCI and Appveyor.
It is also already implemented in cibuildwheel, but not released yet. See: pypa/cibuildwheel#170.
I think we should wait for cibuildwheel v0.13 and then add py3.8 to the travisCI workflow.
While #27 is open and changes a lot of internal logic, I want to propose another feature for a version upgrade: the computation of the log-determinant. It would fit into #27 very easily with a few lines of code.
If we look at how pentapy solves the system Ax = b, it basically factorises A and transforms b as follows for algorithm I:
A = LU

where L is a lower triangular matrix whose entries will not be discussed here (its lower triangle is dense) and U is the unit upper triangular matrix where
Since the main diagonal of U holds only ones, and the log-determinant of a triangular matrix is simply the sum of the logarithms of its main diagonal elements, this part is 0. So, the determinant of A is completely encoded in L, for which we then also need the main diagonal elements, given that it is triangular as well. Consequently, the same computations as for U apply. But how should this be done? It cannot be computed when it's dense because that would ruin performance. Well, this is where the next point comes in:
zeta = inv(L) @ b

How does that help us? Well, we now need to look at how the update with inv(L) is achieved:

The parts I marked in red help us get the main diagonal of inv(L), because this means that

Combining all of this with the fact that the main diagonal elements of the inverse of a triangular matrix are just the reciprocals of its main diagonal elements, we get
Long story short, that means that if we sum up the logarithms of the
For multiple right-hand sides, the additional load will be vanishing because the factorization and determinant computation only need to be done once.
The error case of
I tested this for algorithm I against np.linalg.slogdet and it works ✅.
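The principle can be sketched with scipy's dense LU as a stand-in (note scipy's convention is mirrored: L has the unit diagonal, so the log-determinant collapses onto the diagonal of U, whereas in pentapy's algorithm I it lives in L):

```python
import numpy as np
from scipy.linalg import lu

# Sketch of the principle only, not pentapy internals:
# for A = P @ L @ U with unit-diagonal L, log|det A| is the
# sum of log-magnitudes of U's diagonal.
rng = np.random.default_rng(42)
A = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)
P, L, U = lu(A)                        # A = P @ L @ U
d = np.diag(U)
logdet = np.sum(np.log(np.abs(d)))     # log of |det A|
sign = np.linalg.det(P) * np.prod(np.sign(d))
```

The result matches np.linalg.slogdet, which returns exactly this (sign, log|det|) pair.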
If we allow the user to set a flag like with_logdet, this could be returned along with the solution or not. If the default is False, the Python API does not change, but we would need to
In chemotools, we rely heavily on the log-determinants for Bayesian regression, but currently we use SciPy's solvers for that. Therefore, this feature would enable us to use pentapy to a far greater extent than we currently do.
1) The determinant can however NOT be used to check whether the matrix is singular. This is a common mistake that comes from math classes where everything was done in exact arithmetic. On a computer with finite precision, this is the worst check one can perform; only the condition number provides true insight here.
Since Python 2 will reach its EOL on Jan 1, 2020, we will drop support for it.
Python 3.4 has already reached its EOL on 2019-03-18, so we are dropping support here, too.
The next minor release (v1.1) will then be Python 3 only.
Since TravisCI supports Windows but only for Python 3 we can then drop the use of Appveyor and stick to one CI system.
At the moment only one dependent variable b is allowed when solving A @ x = b for x.
In other algorithms, multiple dependent variables could be passed, e.g.:
https://numpy.org/devdocs/reference/generated/numpy.linalg.solve.html
Since the solver is implemented in Cython, this could be optimized by using prange.
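For reference, the calling convention np.linalg.solve uses for multiple right-hand sides (one column per RHS) is what pentapy could mirror:

```python
import numpy as np

# Stack the right-hand sides as columns; np.linalg.solve returns
# one solution column per RHS.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
B = np.stack([np.array([1.0, 2.0, 3.0]),
              np.array([0.0, 1.0, 0.0])], axis=1)  # shape (3, 2)
X = np.linalg.solve(A, B)                          # shape (3, 2)
```

Since the factorization is shared across right-hand sides, solving k systems this way is much cheaper than k separate solves.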
Firstly, I think this is a nicely written library and the code was very easy to read.
I do have some comments, some of which I think need to be addressed as part of the review and some that I think might be nice but are not essential.
I will start with the points that I think need to be addressed as part of the review process:
```python
import unittest

import numpy as np
from numpy.testing import assert_almost_equal

import pentapy as pp


class TestMuGammaNonZeroExampleFromPaper(unittest.TestCase):
    """Example taken from https://www.hindawi.com/journals/mpe/2015/232456/"""

    def setUp(self):
        self.mat = np.array([
            [1, 2, 1, 0, 0, 0, 0, 0, 0, 0],
            [3, 2, 2, 5, 0, 0, 0, 0, 0, 0],
            [1, 2, 3, 1, -2, 0, 0, 0, 0, 0],
            [0, 3, 1, -4, 5, 1, 0, 0, 0, 0],
            [0, 0, 1, 2, 5, -7, 5, 0, 0, 0],
            [0, 0, 0, 5, 1, 6, 3, 2, 0, 0],
            [0, 0, 0, 0, 2, 2, 7, -1, 4, 0],
            [0, 0, 0, 0, 0, 2, 1, -1, 4, -3],
            [0, 0, 0, 0, 0, 0, 2, -2, 1, 5],
            [0, 0, 0, 0, 0, 0, 0, -1, 4, 8],
        ])
        self.vec = np.array([8, 33, 8, 24, 29, 98, 99, 17, 57, 108])
        self.expected = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    def test_pp1(self):
        sol = pp.solve(self.mat, self.vec)
        assert_almost_equal(sol, self.expected)

    def test_pp2(self):
        sol = pp.solve(self.mat, self.vec, solver=2)
        assert_almost_equal(sol, self.expected)

    def test_validity(self):
        sol = self.mat @ self.expected.reshape((10, 1))
        assert_almost_equal(sol.reshape((10,)), self.vec)
```
In addition to the above, I suggest some minor issues that I think will improve the code:
```python
import warnings

warnings.warn("Could not import the Cython solver, falling back to Python solver", ImportWarning)
```
Add a Solvers enum that wraps the possible solvers. This will make the code easier to read and more explicit about which solver is which (without having to read the documentation to find the number to use). For example:

```python
from enum import IntEnum


class Solvers(IntEnum):
    PTRANS_I = 1
    PTRANS_II = 2
    LAPACK_BANDED = 3
    SP_SOLVE = 4
    SP_SOLVE_UMF = 5
```
Since IntEnum is a subclass of int, you wouldn't have to change any of your other code.
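A quick self-contained demonstration of that int compatibility (using the same member names as suggested above):

```python
from enum import IntEnum


class Solvers(IntEnum):
    PTRANS_I = 1
    PTRANS_II = 2
    LAPACK_BANDED = 3
    SP_SOLVE = 4
    SP_SOLVE_UMF = 5


# IntEnum members compare equal to plain ints and pass isinstance(int)
# checks, so existing integer-based dispatch keeps working unchanged.
chosen = Solvers.PTRANS_II
```

Callers could thus pass solver=Solvers.PTRANS_II or the bare solver=2 interchangeably.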
I am happy to discuss any of these points, just respond to this issue.
We could provide the solvers penta_solver1 and penta_solver2 for other Cython projects by providing a *.pxd file.
The Cython compiler options in this line

```cython
# cython: language_level=3, boundscheck=True, wraparound=False, cdivision=True
```
are counterproductive when the following checks are made here

```python
try:
    return penta_solver1(mat_flat, rhs)
except ZeroDivisionError:
    warnings.warn("pentapy: PTRANS-I not suitable for input-matrix.")
    return np.full_like(rhs, np.nan)
```
because cdivision=True explicitly prevents the compiled C code from ever raising a ZeroDivisionError. This branch of the code was always dead. This is also indicated by the coverage, which shows that these lines were never hit.
Actually, the code produced nan-values naturally, because a single nan will propagate and contaminate the full solution, but the error handling was never triggered. For the warning to be issued, cdivision=False is required here.
Even though the Cython file will be relatively yellow, there is no need to hesitate when setting cdivision=False. Cython does a very good job when it comes to handling the more elaborate division because it helps the branch predictor by making the zero case the unlikely one. With this, the performance impact is negligible.
I will fix this in the same go as #11.
Requiring the main diagonal of the inverse is not uncommon when solving pentadiagonal systems, e.g., during cross-validation for spline smoothing. Similar to #28, this should be easily doable from the factorization A = LU, where we have:
Since the main diagonal of L is known, this can also be written as A = LDU where
Now, special formulas can be applied for computing selected main diagonal elements of the inverse inv(A), as taken from the master's thesis: Fasllija, E., Parallel computation of the diagonal of the inverse of a sparse matrix, 2017, pp. 8-9. All of this applies in our case, and it would be a minimum effort to add a function that computes this from a factorization.
Given that one can rely solely on U, the fact that L's lower triangle is dense should not pose a problem.
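The A = LDU rewrite itself is easy to verify with scipy's dense LU as a stand-in (scipy's convention is mirrored: its L is unit lower triangular, so the diagonal is pulled out of U instead):

```python
import numpy as np
from scipy.linalg import lu

# Sketch of the A = L D U rewrite, not pentapy internals.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)
P, L, U = lu(A)                  # A = P @ L @ U, L unit lower triangular
d = np.diag(U)
D = np.diag(d)
U_unit = U / d[:, None]          # rescale rows so U_unit has a unit diagonal
# now A = P @ L @ D @ U_unit: both triangular factors have unit diagonals
diag_inv_ref = np.diag(np.linalg.inv(A))   # dense O(n^3) reference value
```

The proposed feature would recover the same diag(inv(A)) vector in O(n) from the banded factors.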