ddn's People

Contributors

csyanbin, dylan-campbell, fredzzhang, pmorerio, sgould, thorpejosh, zwxu064

ddn's Issues

optimization problem with multiple target variables

I have an optimization problem with two target variables in the following form:

$y_1, y_2 = \argmin_{y_1, y_2} f(x,y_1, y_2)$

where $y_1$ and $y_2$ do not have the same shape. Is it possible to use your PyTorch implementation of DDN for this?
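
For context, the workaround I have been considering is to flatten and concatenate the two targets into a single vector y inside the node and split them again in the objective. Below is a rough, self-contained sketch with a toy objective standing in for my real f, assuming the AbstractDeclarativeNode / DeclarativeLayer interface in ddn/pytorch/node.py (so the method signatures may be slightly off):

import torch
from ddn.pytorch.node import AbstractDeclarativeNode, DeclarativeLayer

class TwoTargetNode(AbstractDeclarativeNode):
    # Toy stand-in for f(x, y1, y2): y1 should match x (shape b x n) and y2
    # should match the row-mean of x (shape b x 1), so the two targets
    # deliberately have different shapes.
    def objective(self, x, y):
        n = x.size(1)
        y1, y2 = y[:, :n], y[:, n:]  # unpack the flat solution
        return (((y1 - x) ** 2).sum(dim=1)
                + ((y2 - x.mean(dim=1, keepdim=True)) ** 2).sum(dim=1))

    def solve(self, x):
        # Closed-form minimizer of the toy objective, packed into one tensor
        y = torch.cat((x, x.mean(dim=1, keepdim=True)), dim=1)
        return y.detach(), None

x = torch.randn(4, 8, requires_grad=True)
y = DeclarativeLayer(TwoTargetNode())(x)
y1, y2 = y[:, :8], y[:, 8:]  # recover the two targets
y1.sum().backward()          # gradients flow back to x through the layer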

Thank you in advance for your answer.

batched operation in pytorch pnp node

Hello,
I'm using the released code of your pnp_node.py. Since my inputs are batched points, each with a different pose, I would like to use this operation:

    # # Alternatively, disentangle batch element optimization:
    # for i in range(p2d.size(0)):
    #     Ki = K[i:(i+1),...] if K is not None else None
    #     theta[i, :] = self._run_optimization(p2d[i:(i+1),...],
    #         p3d[i:(i+1),...], w[i:(i+1),...], Ki, y=theta[i:(i+1),...])

However, I find that the upper-level function does not update the w value. I printed theta.grad to check whether the gradient is computed, and found that theta[i:(i+1),...].grad is None. Perhaps, once the optimization is done, the slice or copy operations do not carry the gradient over. Is there any way to solve this problem?
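
As a sanity check, I also put together the small self-contained toy below (not the real pnp_node.py; it only assumes the AbstractDeclarativeNode / DeclarativeLayer interface in ddn/pytorch/node.py): solve() loops over the batch and writes into slices of y in the same way, and I read the gradient from the leaf tensor w after backward() through the layer, rather than from theta slices inside solve():

import torch
from ddn.pytorch.node import AbstractDeclarativeNode, DeclarativeLayer

class ToyWeightedMean(AbstractDeclarativeNode):
    # y = argmin_u sum_j w_j * (u - x_j)^2, solved per batch element
    def objective(self, x, w, y):
        return (w * (y - x) ** 2).sum(dim=-1)

    def solve(self, x, w):
        y = x.new_zeros(x.size(0), 1)
        for i in range(x.size(0)):  # per-element optimization, slice assignment
            xi, wi = x[i:(i+1), :], w[i:(i+1), :]
            y[i, :] = (wi * xi).sum(dim=-1) / wi.sum(dim=-1)
        return y.detach(), None

x = torch.randn(4, 5)
w = torch.rand(4, 5, requires_grad=True)
y = DeclarativeLayer(ToyWeightedMean())(x, w)
y.sum().backward()
print(w.grad)  # populated on the leaf tensor w, not on slices of y inside solve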

Thank you very much for your advice.

Broadcasting with r and c in optimal transport

Hello, I am currently working on a problem where the cost matrix M of shape (D, D) is fixed, while r and c are batched, with shapes (B, D) and (C, D) respectively. Is there a way to adapt the layer so that it computes the OT loss for every (r, c) pair, i.e. with output of shape (B, C)? Thank you very much.
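
To make the question concrete, here is a rough, untested sketch of what I mean; I am assuming the OptimalTransportLayer in ddn/pytorch/optimal_transport.py takes (M, r, c) with a leading batch dimension and returns the transport plan P, so the exact interface may differ:

# Pair every r[i] with every c[j] by expanding to one combined batch of size
# B*C, reuse the fixed cost matrix M for all pairs, and reshape the result
# back to (B, C). Layer name/signature assumed, see note above.
import torch
from ddn.pytorch.optimal_transport import OptimalTransportLayer

B, C, D = 3, 5, 16
M = torch.rand(D, D)                          # fixed cost matrix
r = torch.softmax(torch.randn(B, D), dim=-1)  # batched row marginals
c = torch.softmax(torch.randn(C, D), dim=-1)  # batched column marginals

M_b = M.expand(B * C, D, D)                              # one M for every pair
r_b = r.unsqueeze(1).expand(B, C, D).reshape(B * C, D)   # r[i] repeated over j
c_b = c.unsqueeze(0).expand(B, C, D).reshape(B * C, D)   # c[j] repeated over i

P = OptimalTransportLayer()(M_b, r_b, c_b)               # (B*C, D, D) plans
loss = (P * M_b).sum(dim=(-2, -1)).view(B, C)            # OT cost per (i, j) pair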

Tips for getting lagrangian derivative to 0?

Hi,

I have an equality-constrained problem which I solve with RANSAC and then refine, in a similar fashion to your PnP node, with an auxiliary objective function (as the RANSAC objective is typically non-differentiable); I approximately enforce the constraints by adding them to the objective only during the refinement. This seems to work well, and my constraints are still fulfilled after refining. However, I still get objective gradients which cannot be resolved exactly by my constraints:

UserWarning: Non-zero Lagrangian gradient at y:
[15.481806 -9.70834 -7.652554 18.65691 3.6125593 11.075308
0.03811455 11.670857 13.675308 ]
fY: [ 2.615292 5.0672874 -7.8673334 57.839783 12.556461 29.84853
-5.591362 1.9208729 -3.0231378]

It can be seen that LY is smaller than fY, but not 0. Have you had any similar experiences? Are there optimization tricks that could be employed here? I should note that my constraints are over-specified and could be reduced, but I am not sure whether that would help.

I suppose the issue could also stem from the fact that my constraints are only approximately satisfied after my optimization, although they are very close to being fulfilled (residuals of about 1e-8).
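
For reference, this is roughly how I reproduce the check myself, outside the node, from the stationarity condition of the Lagrangian (the objective gradient should lie in the row space of the constraint Jacobian); the variable names and numbers below are my own placeholders:

import torch

def lagrangian_residual(fY, hY):
    # fY: (m,) gradient of the objective at the solution y
    # hY: (p, m) Jacobian of the equality constraints at y
    # Stationarity requires fY = hY^T @ lam for some multipliers lam, so the
    # least-squares residual of this system should be ~0 at a true solution.
    lam = torch.linalg.lstsq(hY.t(), fY.unsqueeze(1)).solution
    return fY - (hY.t() @ lam).squeeze(1)

# Random placeholders just to make the snippet run; in practice I take fY and
# hY from autograd at the refined solution y.
fY = torch.randn(9)
hY = torch.randn(4, 9)
print(lagrangian_residual(fY, hY).norm())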

have problems with back-propagation

Hi there, I really appreciate your work and see great potential in it.

I'm trying to embed a DDN layer into my work, and I'm having trouble that has been bothering me for weeks: my implementation of the DDN layer does not seem to backpropagate gradients properly.

In detail, I want to use DDN to perform a least-squares minimization, say $y = \argmin_{u} \| uF - x \|_2$, while allowing the gradient to backpropagate through the layer. If I use a DDN layer to solve this problem, the network does not converge well; but if I compute $y$ with a closed-form solution, there is no such problem. The problem persists even if I initialize the output with the closed-form solution inside the DDN layer. The layer performs well at inference time, so I assume accuracy is not the problem; rather, the back-propagation is: the DDN layer seems to block the gradient from propagating to the network in front of it.

I'm not sure whether I implemented it in the right way. When I implement the solve method, do I have to detach all the input variables? When I call the solve method, do I have to wrap it in torch.no_grad()? And do I have to manually call y.requires_grad_() after y is solved? I have tried it both with and without each of the above, but it did not seem to work properly; I think I must have missed something.
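
For reference, my node is essentially the sketch below (I use the squared norm so the objective is smooth; the interface follows ddn/pytorch/node.py as far as I understand it, and F and x are random placeholders for my real data):

import torch
from ddn.pytorch.node import AbstractDeclarativeNode, DeclarativeLayer

class LeastSquaresNode(AbstractDeclarativeNode):
    def __init__(self, F):
        super().__init__()
        self.F = F  # fixed (k, d) matrix; only x flows in from the network

    def objective(self, x, y):
        # per-sample squared residual ||y F - x||^2, shape (b,)
        return ((y @ self.F - x) ** 2).sum(dim=-1)

    def solve(self, x):
        F = self.F
        with torch.no_grad():                         # inner solve not differentiated
            y = x @ F.t() @ torch.inverse(F @ F.t())  # closed-form minimizer
        return y.detach(), None

F = torch.randn(4, 16)                    # placeholder data
x = torch.randn(8, 16, requires_grad=True)
y = DeclarativeLayer(LeastSquaresNode(F))(x)
y.sum().backward()
print(x.grad is not None)                 # True if gradients reach the input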

Looking forward to your reply.

Computation of fXY extremely slow?

Issue

The for-loop involving ddn/ddn/pytorch/node.py, lines 138 to 145 at commit 1240a69:

for x_split, x_size, n in zip(xs_split, xs_sizes, self.n):
    if isinstance(x_split[0], torch.Tensor) and x_split[0].requires_grad:
        gradient = x_split[0].new_zeros(self.b, n) # bxn
        for i, Bi in enumerate(fXY(x_split)):
            gradient[:, i] = torch.einsum('bm,bm->b', (Bi, u))
        gradients.append(gradient.reshape(x_size))
    else:
        gradients.append(None)
and

ddn/ddn/pytorch/node.py, lines 254 to 256 at commit 1240a69:

fXY = lambda x: (fXiY.detach().squeeze(-1)
                 if fXiY is not None else torch.zeros_like(fY)
                 for fXiY in (self._batch_jacobian(fY, xi) for xi in x))
is extremely slow.

Solution?

Simply replacing these with:

fXY = lambda x: self._batch_jacobian(fY, x)

gradients = []
for x, size in zip(xs, xs_sizes):
    if x.requires_grad:
        gradient = torch.einsum('byx,by->bx', fXY(x), u).reshape(size)
        gradients.append(gradient)
    else:
        gradients.append(None)

Potential Issues with Solution

I am not able to see any issues with this solution, although the constrained nodes would also need some changes to accommodate it. Perhaps there is something I'm missing; please let me know if that's the case.

RuntimeError: cholesky_cuda: For batch 6: U(1,1) is zero, singular U.

๐Ÿ› Bug

The error relates to the Cholesky factorisation in optimal_transport.py at line 144. Adding a small constant eps does not solve the problem, even with eps=1e-1.

Environment

OS: Ubuntu 20.04.1 LTS
PyTorch: 1.7.1
CUDA Toolkit: 10.2.89
Python: 3.7.9

Reproduction

Data: data.zip
Code:

import torch

src = 'data'
data = torch.load(src)
r = data['r']
W = data['W']; H = data['H']
P = data['P']; PdivC = data['PdivC']
# A small constant
eps = data['eps']
# Scale the constant if needed
# eps *= 1000
print(eps.max())

block_11 = torch.cholesky(torch.diag_embed(r[:, 1:H]) - torch.einsum("bij,bkj->bik", P[:, 1:H, 0:W], PdivC) + eps)
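
For completeness, the (untested) variant I would try next adds the jitter to the diagonal only, i.e. eps * I, rather than to every entry, since diagonal jitter is the usual way to regularise a Cholesky factorisation:

# Untested: diagonal jitter (eps * I) instead of adding eps to every entry;
# names match the reproduction script above.
A = torch.diag_embed(r[:, 1:H]) - torch.einsum("bij,bkj->bik", P[:, 1:H, 0:W], PdivC)
I = torch.eye(A.size(-1), dtype=A.dtype, device=A.device)
block_11 = torch.cholesky(A + eps.max() * I)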
