stefanwebb / flowtorch-old

Separating Normalizing Flows code from Pyro and improving API

Home Page: https://flowtorch.ai

License: MIT License

Python 69.22% JavaScript 15.54% CSS 14.86% Batchfile 0.38%
normalizing-flows pytorch bayesian-inference bayesian-statistics probabilistic-graphical-models probabilistic-programming probabilistic-models

flowtorch-old's Issues

TransformedDistribution does not handle sample_shapes with >1 dimension

import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors

d, param = flowtorch.bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)(dist.Independent(dist.Normal(torch.zeros(3), torch.ones(3)), 1))
d.rsample((10, 2)).shape

should return the same [10, 2, 3] shape as dist.Independent(dist.Normal(torch.zeros(3), torch.ones(3)), 1).rsample((10, 2)).shape, but it currently raises a RuntimeError due to a shape mismatch.
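Until this is fixed, one possible workaround (a sketch only; rsample_multi_dim is a hypothetical helper, not part of the flowtorch API) is to flatten the multi-dimensional sample_shape into a single dimension and restore it afterwards:

import torch

def rsample_multi_dim(d, sample_shape):
    # Flatten the requested sample_shape into one dimension, sample, then
    # reshape the result back to sample_shape + event dims.
    sample_shape = torch.Size(sample_shape)
    flat = d.rsample((sample_shape.numel(),))
    return flat.reshape(sample_shape + flat.shape[1:])

# rsample_multi_dim(d, (10, 2)).shape  ->  torch.Size([10, 2, 3]) for the example above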

scalar base_distributions are not well supported

Expected: AffineAutoregressive(DenseAutoregressive) should be able to use a scalar base_distribution

Actual: When using a dist.Normal(0, 1) base distribution, we get the error:

        # Shape the output
>       h = h.reshape(x.size()[:-len(self.input_shape)] + (self.output_multiplier, self.input_dims))
E       RuntimeError: shape '[2, 1]' is invalid for input of size 200

../simplex/simplex/params/dense_autoregressive.py:198: RuntimeError

RC: For scalar distributions both batch_shape and event_shape are torch.Size([]), i.e. length zero, whereas parts of DenseAutoregressive.{_build,_forward} seem to assume at least one of them is non-empty (so that Param.input_shape is non-empty). We should modify this code to handle the special case batch_shape == event_shape == torch.Size([]).
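A minimal, self-contained illustration of the failing reshape and one possible special case (the tensors and sizes below are illustrative; the actual fix belongs inside DenseAutoregressive._forward):

import torch

x = torch.randn(100)          # 100 scalar samples: input_shape == torch.Size([])
h = torch.randn(100, 2)       # network output with output_multiplier == 2, i.e. 200 elements
input_shape = torch.Size([])

# The current code effectively does
#     h.reshape(x.size()[:-len(input_shape)] + (output_multiplier, input_dims))
# but with len(input_shape) == 0 the slice x.size()[:0] is empty, so the target
# shape collapses to (2, 1) and cannot hold 200 elements -- the error above.
# One possible special case: keep the batch dimensions of x explicitly.
h = h.reshape(x.shape + (2, 1))   # torch.Size([100, 2, 1])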

Docstrings for all existing classes

Having converted the classes/methods to use type hints, we should add docstrings to all existing code and verify their output in the generated docs.

RuntimeError using {batch,event}_shape=[] base/target distributions

Using a base/target distribution with batch_shape=[] and event_shape=[] results in a shape-mismatch RuntimeError:

import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors as bijectors
# Lazily instantiated flow plus base and target distributions
flow = bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)

########################
## LOOK HERE ##
base_dist = dist.Normal(0, 1)
target_dist = dist.Normal(5, 1)
#########################


# Instantiate transformed distribution and parameters
new_dist, params = flow(base_dist)
# Training loop
opt = torch.optim.Adam(params.parameters(), lr=1e-3)
for idx in range(501):
    opt.zero_grad()
    # Minimize KL(p || q)
    y = target_dist.sample((1000,))
    loss = -new_dist.log_prob(y).mean()
    if idx % 100 == 0:
        print('epoch', idx, 'loss', loss)
        
    loss.backward()
    opt.step()

sns.relplot(
    data=pd.DataFrame(new_dist.sample((100,)).detach().numpy()),
    x=0, y=1
)
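Until scalar distributions are supported, a hedged workaround for this repro is to give the base and target an explicit size-1 shape (as in the 1D example in the next issue), so batch_shape and event_shape are no longer both empty:

# Workaround sketch: shape-[1] distributions; the rest of the script stays the same.
base_dist = dist.Normal(torch.zeros(1), torch.ones(1))
target_dist = dist.Normal(torch.zeros(1) + 5, torch.ones(1))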

DenseAutoregressive.init_weights causes unstable learning for 1D target distributions

When I run the example on flowtorch.ai with a 1D distribution:

import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors as bijectors
# Lazily instantiated flow plus base and target distributions
flow = bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)
base_dist = dist.Normal(torch.zeros(1), torch.ones(1))
target_dist = dist.Normal(torch.zeros(1)+5, torch.ones(1))
# Instantiate transformed distribution and parameters
new_dist, params = flow(base_dist)
# Training loop
opt = torch.optim.Adam(params.parameters(), lr=1e-3)
for idx in range(501):
    opt.zero_grad()
    # Minimize KL(p || q)
    y = target_dist.sample((1000,))
    loss = -new_dist.log_prob(y).mean()
    if idx % 100 == 0:
        print('epoch', idx, 'loss', loss)
        
    loss.backward()
    opt.step()

sns.relplot(
    data=pd.DataFrame(new_dist.sample((100,)).detach().numpy()),
    x=0, y=1
)

The loss goes to NaN unless the learning rate is set extremely low (1e-15 gives sensible results).

Removing the call to `self._init_weights` in `DenseAutoregressive` resolves the issue and allows a more reasonable `1e-3` learning rate.
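A quick way to check that initialization (rather than the optimizer settings) is at fault, assuming params is a torch.nn.Module as in the training loop above, is to inspect the loss and gradient norms before any optimizer update:

# Sanity-check sketch: with a reasonable initialization, the step-0 loss and
# gradient norms should be finite and moderate in size.
y = target_dist.sample((1000,))
loss = -new_dist.log_prob(y).mean()
loss.backward()
print('initial loss:', loss.item())
for name, p in params.named_parameters():
    grad_norm = p.grad.norm().item() if p.grad is not None else float('nan')
    print(name, 'grad norm:', grad_norm)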

2/10/2021 meeting

@stefanwebb

  • Add user docs on shape information, constraints, and bijector interface (CC @fritzo on PR)
  • Integrate constraint and {forward,reverse}_shape methods from torch transforms; put up PR
  • Finish landing page
  • Reach out to the nflows authors

@feynmanliang

  • Prototype calling Independent to reinterpret all batch dimensions before handing distributions to flowtorch, because (1) it simplifies the shape handling inside bijectors and (2) we do not want multiple independent bijectors for a batch_size > 1 (see the sketch after this list)
  • Reach out to UMNN
  • Pull BatchNorm out from develop
  • Reproduce Table 3 from https://arxiv.org/pdf/1908.09257.pdf; leave some entries missing as starter tasks
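A minimal sketch of the Independent idea from the first @feynmanliang bullet, using only torch.distributions (not the eventual flowtorch API):

import torch
import torch.distributions as dist

# Reinterpret every batch dimension as an event dimension before constructing
# the flow, so a bijector only ever sees a single (possibly multi-dim) event.
base = dist.Normal(torch.zeros(4, 3), torch.ones(4, 3))   # batch_shape=[4, 3]
base = dist.Independent(base, len(base.batch_shape))      # batch_shape=[], event_shape=[4, 3]
print(base.batch_shape, base.event_shape)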

Ability to opt out of caching

Caching is pretty buggy right now and is causing the simplex feature branch of beanmachine to fail CI. Can we hide it behind a feature flag?
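One possible shape for the opt-out, as a sketch only (the flag and helper names below are hypothetical, not the current flowtorch API):

_CACHING_ENABLED = True  # hypothetical module-level feature flag

def set_caching(enabled: bool) -> None:
    global _CACHING_ENABLED
    _CACHING_ENABLED = enabled

def maybe_cached(compute, cache: dict, key):
    # Bypass the cache entirely when the flag is off, so buggy cache hits
    # cannot affect downstream users such as beanmachine.
    if not _CACHING_ENABLED:
        return compute()
    if key not in cache:
        cache[key] = compute()
    return cache[key]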

TransformedDistribution.log_prob does not respect batch_shape

import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors

d = dist.Normal(torch.zeros(2), torch.ones(2))
x = d.sample((10,))
td, _ = flowtorch.bijectors.AffineAutoregressive(flowtorch.params.DenseAutoregressive())(d)


print(d.log_prob(x).shape, td.log_prob(x).shape)

should report identical shapes, but it looks like TransformedDistribution does not respect the distribution's batch_shape
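For reference, a sketch of the expected behaviour (the assertion on the base distribution holds in stock PyTorch; the transformed distribution should match it):

import torch
import torch.distributions as dist

d = dist.Normal(torch.zeros(2), torch.ones(2))   # batch_shape=[2], event_shape=[]
x = d.sample((10,))
assert d.log_prob(x).shape == torch.Size([10, 2])
# td.log_prob(x).shape should also be torch.Size([10, 2]), but currently differs.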

Sign error in TransformedDistribution.log_prob?

When I run

import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist

sns.set_style('darkgrid')

import simplex.bijectors
import simplex.params

class MockParams:
    def __init__(self):
        self.permutation = torch.randperm(1, device='cpu')
        
    def __call__(self, x):
        return torch.zeros(1), torch.log(torch.ones(1)*0.5)

td, params = simplex.bijectors.AffineAutoregressive(lambda input_shape, param_shape: MockParams())(dist.Normal(0, 1))
y = torch.linspace(-10, 10, steps=100).unsqueeze(1)
x = td.bijector._inverse(y, td.params)
sns.relplot(data=pd.DataFrame({
    'y': x.detach().squeeze().numpy(),
    'p_normal': torch.exp(td.base_dist.log_prob(y)).detach().squeeze().numpy(),
    'p_transformed': torch.exp(td.log_prob(y)).detach().squeeze().numpy(),
}).melt(id_vars=['y']), x='y', y='value', hue='variable', kind='line')

I expect that scaling the standard Normal by 0.5 results in a Normal distribution more tightly peaked at zero. The opposite seems to be happening, which suggests a sign error in the log-Jacobian-determinant term of TransformedDistribution.log_prob.
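As a reference point (a sanity check using PyTorch's built-in transforms, not the flowtorch code path), scaling a standard Normal by 0.5 gives Normal(0, 0.5), whose density at zero is roughly twice as large:

import torch
import torch.distributions as dist

base = dist.Normal(0., 1.)
scaled = dist.TransformedDistribution(base, [dist.AffineTransform(loc=0., scale=0.5)])

zero = torch.tensor(0.)
print(base.log_prob(zero).exp())     # ~0.3989
print(scaled.log_prob(zero).exp())   # ~0.7979 -- more tightly peaked at zero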

AffineAutoregressive leaks global state

This static attribute https://github.com/stefanwebb/flowtorch/blob/master/flowtorch/bijectors/affine_autoregressive.py#L18 is a global singleton and leaks state across runs.

To repro unexpected behavior:

  • Train an AffineAutoregressive against a 2D distribution (default_param_fn.permutation gets set to a size 2 permutation)
  • Initialize a new AffineAutoregressive and train it against something that is not 2D; this will fail because default_param_fn.permutation is already set, so the new shape is not used (a minimal illustration follows below).
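A minimal, flowtorch-free illustration of why a class-level default leaks state across instances:

# Generic illustration (not flowtorch code): a mutable class attribute is
# shared by every instance, so the first flow's shape "wins" for all later ones.
class Flow:
    default_permutation = None   # class-level, shared across instances

    def fit(self, dim):
        if Flow.default_permutation is None:
            Flow.default_permutation = list(range(dim))
        return Flow.default_permutation

print(Flow().fit(2))   # [0, 1]
print(Flow().fit(3))   # still [0, 1] -- the size-2 permutation leaked

Making the default a per-instance attribute set in __init__ (rather than a class attribute) would avoid the leak.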

Dividing docs into Reference / Users / Developers

I'd like to divide the documentation into three sections:

  • Reference: the classes/methods extracted from the docstrings
  • Users: examples of how to use the library in practice
  • Developers: a guide on how to contribute code to the library and write new Bijectors, Params, etc.

TransformedDistribution.log_prob return shape incompatible with batch dimension semantics

When I run

import torch
import torch.distributions as dist

import simplex.bijectors
import simplex.params

d, params = simplex.bijectors.AffineAutoregressive(
    simplex.params.DenseAutoregressive())(dist.Normal(0, 1))
d.log_prob(torch.zeros((10, 1))).shape

I expect the log_prob of a (sample_shape=[10], batch_shape=[], event_shape=[1]) sample and distribution to have shape [10].

Actual behavior: Result is of shape [10, 10]

RC: TransformedDistribution.log_prob is broadcasting a summation over a row / column vector
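A minimal illustration of the broadcast that produces the [10, 10] result (illustrative tensors, not the actual flowtorch internals):

import torch

col = torch.zeros(10, 1)   # e.g. per-sample base log_prob with an un-squeezed event dim
row = torch.zeros(10)      # e.g. per-sample log|det J|
print((col + row).shape)          # torch.Size([10, 10]) -- the observed shape
print((col.sum(-1) + row).shape)  # torch.Size([10])     -- the expected shape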

Param.state is never modified, resulting in stale caching in Bijector and in-place modification errors in autograd

Expected: When I call backward() on the result of Bijector.forward after taking an optimizer.step() on the bijector's params, using the same inputs x, I expect a new value of y to be computed from x and the updated params.

Actual: Since Params.state is never incremented, cache invalidation based on state_cache never occurs. When we try to back-propagate through a y whose parameters have already been updated by optim.step(), we get:

>       Variable._execution_engine.run_backward(
            tensors, grad_tensors_, retain_graph, create_graph,
            allow_unreachable=True)  # allow_unreachable flag
E       RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py:130: RuntimeError
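A sketch of the intended version-based invalidation (names are illustrative; only the idea that Params.state is bumped on update and compared by the Bijector follows the issue):

class ParamsSketch:
    def __init__(self):
        self.state = 0

    def notify_updated(self):
        # e.g. called after optimizer.step(), so cached values become stale
        self.state += 1

class BijectorSketch:
    def __init__(self, params):
        self.params = params
        self._cached_state = -1
        self._cached_y = None

    def forward(self, x, compute):
        # Recompute y whenever the parameters changed since the cached call,
        # instead of returning a tensor built from now-modified weights.
        if self._cached_y is None or self._cached_state != self.params.state:
            self._cached_y = compute(x)
            self._cached_state = self.params.state
        return self._cached_y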

`DenseAutoregressive` initialization can be unstable

If you use too many layers or hidden units relative to the input dimension in DenseAutoregressive, the new initialization scheme can make the weights unstable, and the output distribution is then no longer N(0, 1).

I think this is because each row of each weight matrix is normalized by the l_2 norm of the product of the previous weight matrices, and this product can underflow. It shouldn't be a problem in most practical uses of DenseAutoregressive, but we may want to warn the user when this happens and/or solve the problem by working in log-space.
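A small illustration of the underflow and the log-space alternative mentioned above (the sizes are arbitrary):

import torch

norms = torch.full((200,), 1e-3)     # per-layer row norms, all small
print(torch.prod(norms))             # 0.0 in float32 -- the product underflows
print(torch.sum(torch.log(norms)))   # ~-1381.6 -- finite when accumulated in log-space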
