
Comments (14)

jannschu commented on July 3, 2024

My implementation of the API change is more or less finished. I will add some documentation.
I also added math rendering to the docs.

⚠️ I noticed something strange, though. After removing the LM call in tests/basic.rs, the test still passes (on master): the initial point is already good enough, which is probably a mistake. In the new API I added a doctest using the Himmelblau function, and this one fails...

I read a bit through some well-known implementations and a textbook. There are some things to consider changing in the current implementation anyway. I will open an issue for further discussion.

edit: Here is the Himmelblau example with SciPy's LM implementation, which works. The crate implementation fails even for initial values very close to one of the minimizers. It seems there is a serious bug in the crate's implementation.

from scipy.optimize import root
import numpy as np

# Himmelblau system: its roots are exactly the minimizers of
# Himmelblau's function.
def f(x_k):
    x, y = x_k
    return np.array([x ** 2 + y - 11, x + y ** 2 - 7])

# Analytic Jacobian of f.
def jac(x_k):
    x, y = x_k
    return np.array([[2 * x, 1], [1, 2 * y]])

print(root(f, x0=[0.0, 0.0], jac=jac, method='lm'))

jannschu commented on July 3, 2024

Alright, I implemented some changes to the API. The progress can be found here: https://github.com/jannschu/levenberg-marquardt/tree/rework-api.

What is still missing is the ability for the residual and Jacobian computations to fail.

I also extended the documentation. You can build it locally with RUSTDOCFLAGS="--html-in-header katex-header.html" cargo doc --no-deps --open.

I changed the Jacobian to be non-transposed and the residuals to be a one-dimensional vector, for the following reasons:

  1. Computing J^T * J is faster when we have J in column-major order. The same applies to J^T * x.
  2. Having the residuals as a vector simplifies the API in my opinion. The matrix case can be covered by providing an example. Also, residuals of different sizes are now naturally possible.
  3. I will rewrite more or less everything in order to obtain the same implementation as in lmfit, which is a C port of the "classic" MINPACK implementation. It also seems to be identical to the one described in the book "Numerical Optimization" by Nocedal and Wright. Benefits:
    • Faster, because J^T * J + lambda * I is not inverted for every new choice of lambda.
    • lambda is picked in a less heuristic way.
    • There will also be stopping criteria.

That's why I haven't created a pull request yet. I will finish the implementation and then either create a pull request (if you are interested) or publish it as a new crate.

vadixidav commented on July 3, 2024

My bad, accidentally hit the close button.

vadixidav commented on July 3, 2024

Just like the PR, I will prioritize looking at this once I am not busy. I think these suggestions seem good, but I will give a more detailed look later today or tomorrow.

vadixidav commented on July 3, 2024

I think these ideas and changes are all great. Thanks for looking into the test as well. I merged the other PR you put in, but these changes sound pretty big. I will take a look at your branch now.

vadixidav commented on July 3, 2024

One question I have: should LM be able to fail? I ask because if we provide a model to LM and it cannot improve it, it should just give back the original model without any change. That is my general reasoning. Why might it fail?

vadixidav commented on July 3, 2024

Also, I looked over the code you have, and I do think we need to ensure we can pass the Jacobian in pieces. The residuals we could also pass in pieces if necessary.

jannschu commented on July 3, 2024

One question I have: should LM be able to fail? I ask because if we provide a model to LM and it cannot improve it, it should just give back the original model without any change. That is my general reasoning. Why might it fail?

Giving the last parameters back is fine, but knowing that the optimization might have been unsuccessful is essential information. You might want to handle this failure, and if ignoring it is fine, you still can. Yesterday I read this article; for this argument only the first two quotes in it are relevant.

Not being able to improve the parameters is one failure mode, but catastrophic divergence, where everything ends up as "inf" or "nan", is also a possible outcome.

Also, I looked over the code you have, and I do think we need to ensure we can pass the Jacobian in pieces. The residuals we could also pass in pieces if necessary.

The new implementation computes the QR decomposition of the full Jacobian J, so we have to build this matrix at some point. I am aware that computing it in chunks is often more natural, so this is a tradeoff between a more natural Jacobian implementation and extra allocation and copying.

I would rather suggest providing an example in a prominent position that shows how to compute the Jacobian and residuals in chunks.

vadixidav commented on July 3, 2024

@jannschu

I would rather suggest providing an example in a prominent position that shows how to compute the Jacobian and residuals in chunks

Perhaps the way I initially implemented the matrix multiplication is wrong (almost certainly), but if you take the first row of the transposed Jacobian and dot it with the first column of the regular Jacobian, you get the top-left element of the approximate Hessian. Likewise, if you take the first row of the transposed Jacobian and multiply/sum it against each column of the regular Jacobian, you get the first row vector of the approximate Hessian.

This process can be broken down further. It is just the dot product of all pairs of column vectors in the regular Jacobian, forming a symmetric matrix, and each of those dot products can also be done in pieces: compute a chunk of rows of the Jacobian, multiply each column of that chunk (0..M, where M is the number of columns) against all columns of the chunk, and sum over the rows; that produces each row vector (0..M) of a partial approximate Hessian. Accumulating these partial matrices over all chunks produces the final approximate Hessian, as sketched below. This strategy has the advantage that you don't need to have all the rows of the Jacobian in memory, just a few at a time (enough for SIMD optimization).
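
To make the accumulation concrete, here is a minimal numpy sketch (variable names are illustrative, not from the crate) that builds J^T * J from row chunks of the Jacobian and checks the result against the full product:

import numpy as np

rng = np.random.default_rng(0)
m, n, chunk = 1000, 6, 100  # many residuals, few parameters
J = rng.standard_normal((m, n))

# Accumulate the approximate Hessian from row chunks:
# J^T J = sum over chunks c of J_c^T J_c.
jtj = np.zeros((n, n))
for start in range(0, m, chunk):
    J_c = J[start:start + chunk]
    jtj += J_c.T @ J_c

assert np.allclose(jtj, J.T @ J)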

Giving the last parameters back is fine, but knowing that the optimization might have been unsuccessful is essential information. You might want to handle this failure, and if ignoring it is fine, you still can. Yesterday I read this article; for this argument only the first two quotes in it are relevant.

As for error handling, the estimator will not produce a model if there is an error in computing the model, so by the time you run Levenberg-Marquardt you already have a complete model. It is possible that the Jacobian could be NaN, but the model itself might still be valid. My assumption is that the user passes a valid model into LM, so as long as we tell the user why we exited, it is okay to treat that the same as any other termination criterion. Another way to look at it is that if LM fails, the original model is the best solution found so far.

The new implementation computes the QR decomposition of the full Jacobian J, so we have to build this matrix at some point. I am aware that computing it in chunks is often more natural, so this is a tradeoff between a more natural Jacobian implementation and extra allocation and copying.

Why do we need to compute the QR decomp of the full Jacobian? If you look at the formula from Wikipedia here, the Jacobian is first multiplied by its transpose. The damping is then added after that point, but you have a much smaller matrix, and it is only at this point that we would need to compute any kind of decomposition. It might be that there is some alternative way to solve this problem that I am not aware of, but this is the way I learned to solve it.
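
For reference, here is a minimal numpy sketch (names are illustrative) of the normal-equations step described above: form the approximate Hessian, add the damping, and solve for the update:

import numpy as np

def lm_step_normal_equations(J, r, lam):
    # Solve (J^T J + lam * I) delta = -J^T r for the parameter update.
    n = J.shape[1]
    A = J.T @ J + lam * np.eye(n)
    return np.linalg.solve(A, -J.T @ r)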

I can fix up the original API to use the correct approximate Hessian computation and provide tests to show that doing it in pieces is equivalent to multiplying the whole matrices.

jannschu commented on July 3, 2024

Why do we need to compute the QR decomp of the full Jacobian?

As I already mentioned in #4 (comment), point 3, I am working on porting the MINPACK reference implementation to Rust. This is almost done now. The argument for doing it with the QR is also presented in the mentioned book. In short:

In the implementation on master we change lambda, then update J^T * J + lambda * I, and solve the system.
The initial cost for computing J^T * J is O(mn^2), also in the current "block" fashion. Then inverting is O(n^3) every time we change lambda. Doing it with a QR (also at an initial cost of O(mn^2)), we can solve the system for each new lambda in O(n^2).

(Just to stress this: computing the QR of the full Jacobian is as expensive as computing J^T * J.)
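
To illustrate the identity this relies on: (J^T J + lambda * I) x = J^T r is the normal equation of an ordinary least-squares problem in which J is extended by sqrt(lambda) * I rows. Here is a minimal numpy sketch (illustrative only; MINPACK avoids refactoring by updating the QR factors with Givens rotations when lambda changes):

import numpy as np

rng = np.random.default_rng(1)
m, n, lam = 50, 4, 0.1
J = rng.standard_normal((m, n))
r = rng.standard_normal(m)

# Damped normal equations: (J^T J + lam * I) x = J^T r.
x_normal = np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)

# Equivalent stacked least-squares problem:
# minimize ||J x - r||^2 + lam * ||x||^2.
J_aug = np.vstack([J, np.sqrt(lam) * np.eye(n)])
r_aug = np.concatenate([r, np.zeros(n)])
x_lstsq, *_ = np.linalg.lstsq(J_aug, r_aug, rcond=None)

assert np.allclose(x_normal, x_lstsq)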

So, options:

  1. Drop the QR approach (and with it the O(n^2) solves instead of O(n^3)).
  2. Keep the Jacobian chunks, but build the full matrix afterwards.
  3. Implement the QR step in a block fashion as well, to keep the Jacobian chunks.
  4. Drop the Jacobian chunks.

Regarding the SIMD/performance argument: I guess unless your chunks are relatively large, you lose performance.

I am not sure if the memory saving is worth the trouble:

  • You get a more complicated API.
  • You are probably slower (chunks would be too small).
  • Your LM (more specifically QR) implementation is more complex.

How much memory allocation are we talking about?

vadixidav commented on July 3, 2024

@jannschu

Ahh, okay, I understand now. I wasn't aware that LM could be computed in this way, so this is really exciting!

So there are a few main ways we might use this:

  • Lots of cameras and lots of points, performing bundle adjust
    • In this case we would ideally like to compute a sparse matrix (MxN) which is very large
    • We also may want to only do this on one closed loop of cameras to minimize the size of this matrix
    • My current implementation is not focused on this approach as I do not have an implementation of sparse matrices on hand
  • One camera is being optimized with a large number of matches (hundreds of thousands at max)
    • In this case we have a long matrix (MxN) where M is very large (number of points) and N is 6 (the components from se(3))
    • My thought was that the matrix might be so large that this would be prohibitive, but looking at the numbers (6 * 100,000 = 600,000 entries, roughly 4.8 MB as f64), it seems I was wrong about this; it all fits in a few MB of memory
    • This is the case I was specifically targeting with the API, although it could be used for bundle adjust as well

Looking back on this, it seems there is no issue with running out of memory except in the bundle adjust case, so your algorithm should work really well for the single-camera (or few-camera) pose optimization problem. As for the sparse LM problem, we might want to tackle that later as well.

I see where you are going with this now and I am 100% on board 👍. I had some misunderstandings initially. Thank you for your work!

vadixidav commented on July 3, 2024

@jannschu Also, I found where the partial Hessian computation was first recommended: it was recommended by @mpizenberg in #1. @mpizenberg, you might want to weigh in here, but I think @jannschu has the right approach. We should just implement a separate version if we want to do large sparse matrices, right? I don't think we will run out of memory unless we create an arbitrarily large matrix by trying to bundle adjust a whole reconstruction at once, producing a large sparse matrix that can't fit in contiguous memory, so this implementation should just focus on the dense case.

mpizenberg commented on July 3, 2024

If I remember correctly, the reason I was interested in handling residuals individually (or in small groups) was precision when accumulating many small values. This is the strategy used in DSO, for example: https://github.com/JakobEngel/dso/blob/master/src/OptimizationBackend/MatrixAccumulators.h
Otherwise I don't mind handling the Jacobian as one block if it makes other optimizations easier.
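
For context, one standard way to reduce this kind of accumulation error is compensated (Kahan) summation; here is a minimal Python sketch (illustrative only, not the hierarchical accumulator scheme DSO actually uses):

def kahan_sum(values):
    # Compensated summation: carry the low-order bits that each
    # naive addition would otherwise round away.
    total = 0.0
    comp = 0.0
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total

# Naively summing many tiny values onto a large one loses them entirely:
vals = [1e8] + [1e-9] * 100_000
print(sum(vals), kahan_sum(vals))  # 100000000.0 vs approximately 100000000.0001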

vadixidav commented on July 3, 2024

I think we forgot to close this issue, as the refactor @jannschu did is complete. Closing now.
