stefanwebb / flowtorch-old

Separating Normalizing Flows code from Pyro and improving API

Home Page: https://flowtorch.ai

License: MIT License

Python 69.22% JavaScript 15.54% CSS 14.86% Batchfile 0.38%
normalizing-flows pytorch bayesian-inference bayesian-statistics probabilistic-graphical-models probabilistic-programming probabilistic-models

flowtorch-old's Issues

TransformedDistribution does not handle sample_shapes with >1 dimension

import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors

d, param = flowtorch.bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)(dist.Independent(dist.Normal(torch.zeros(3), torch.ones(3)), 1))
d.rsample((10, 2)).shape

should return the same [10, 2, 3] shape as dist.Independent(dist.Normal(torch.zeros(3), torch.ones(3)), 1).rsample((10, 2)).shape, but it currently raises a RuntimeError due to a shape mismatch.
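Until this is fixed, one possible workaround (a sketch only; rsample_multi_dim is a hypothetical helper, not part of the flowtorch API) is to flatten the multi-dimensional sample_shape into a single dimension and restore it afterwards:

import torch

def rsample_multi_dim(d, sample_shape):
    # Flatten the requested sample_shape into one dimension, sample, then
    # reshape the result back to sample_shape + event dims.
    sample_shape = torch.Size(sample_shape)
    flat = d.rsample((sample_shape.numel(),))
    return flat.reshape(sample_shape + flat.shape[1:])

# rsample_multi_dim(d, (10, 2)).shape  ->  torch.Size([10, 2, 3]) for the example above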

scalar base_distributions are not well supported

Expected: AffineAutoregressive(DenseAutoregressive) should be able to use a scalar base_distribution

Actual: When using a dist.Normal(0, 1) base distribution, we get the error:

        # Shape the output
>       h = h.reshape(x.size()[:-len(self.input_shape)] + (self.output_multiplier, self.input_dims))
E       RuntimeError: shape '[2, 1]' is invalid for input of size 200

../simplex/simplex/params/dense_autoregressive.py:198: RuntimeError

RC: For scalar distributions both batch_shape and event_shape are torch.Size([]), i.e. length zero, whereas parts of DenseAutoregressive.{_build,_forward} seem to assume at least one of them is non-empty (so that Param.input_shape is non-empty). We should modify this code to handle the special case batch_shape == event_shape == torch.Size([]).
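A minimal, self-contained illustration of the failing reshape and one possible special case (the tensors and sizes below are illustrative; the actual fix belongs inside DenseAutoregressive._forward):

import torch

x = torch.randn(100)          # 100 scalar samples: input_shape == torch.Size([])
h = torch.randn(100, 2)       # network output with output_multiplier == 2, i.e. 200 elements
input_shape = torch.Size([])

# The current code effectively does
#     h.reshape(x.size()[:-len(input_shape)] + (output_multiplier, input_dims))
# but with len(input_shape) == 0 the slice x.size()[:0] is empty, so the target
# shape collapses to (2, 1) and cannot hold 200 elements -- the error above.
# One possible special case: keep the batch dimensions of x explicitly.
h = h.reshape(x.shape + (2, 1))   # torch.Size([100, 2, 1])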

Docstrings for all existing classes

Having converted the classes/methods to use type hints, we should add docstrings to all existing code and verify their output in the generated docs.

RuntimeError using {batch,event}_shape=[] base/target distributions

Using a base/target distribution with batch_shape=[] and event_shape=[] results in a shape-mismatch RuntimeError:

import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors as bijectors
# Lazily instantiated flow plus base and target distributions
flow = bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)

########################
## LOOK HERE ##
base_dist = dist.Normal(0, 1)
target_dist = dist.Normal(5, 1)
#########################


# Instantiate transformed distribution and parameters
new_dist, params = flow(base_dist)
# Training loop
opt = torch.optim.Adam(params.parameters(), lr=1e-3)
for idx in range(501):
    opt.zero_grad()
    # Minimize KL(p || q)
    y = target_dist.sample((1000,))
    loss = -new_dist.log_prob(y).mean()
    if idx % 100 == 0:
        print('epoch', idx, 'loss', loss)
        
    loss.backward()
    opt.step()

sns.relplot(
    data=pd.DataFrame(new_dist.sample((100,)).detach().numpy()),
    x=0, y=1
)
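Until scalar distributions are supported, a hedged workaround for this repro is to give the base and target an explicit size-1 shape (as in the 1D example in the next issue), so batch_shape and event_shape are no longer both empty:

# Workaround sketch: shape-[1] distributions; the rest of the script stays the same.
base_dist = dist.Normal(torch.zeros(1), torch.ones(1))
target_dist = dist.Normal(torch.zeros(1) + 5, torch.ones(1))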

DenseAutoregressive.init_weights causes unstable learning for 1D target distributions

When I run the example on flowtorch.ai with a 1D distribution:

import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors as bijectors
# Lazily instantiated flow plus base and target distributions
flow = bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)
base_dist = dist.Normal(torch.zeros(1), torch.ones(1))
target_dist = dist.Normal(torch.zeros(1)+5, torch.ones(1))
# Instantiate transformed distribution and parameters
new_dist, params = flow(base_dist)
# Training loop
opt = torch.optim.Adam(params.parameters(), lr=1e-3)
for idx in range(501):
    opt.zero_grad()
    # Minimize KL(p || q)
    y = target_dist.sample((1000,))
    loss = -new_dist.log_prob(y).mean()
    if idx % 100 == 0:
        print('epoch', idx, 'loss', loss)
        
    loss.backward()
    opt.step()

sns.relplot(
    data=pd.DataFrame(new_dist.sample((100,)).detach().numpy()),
    x=0, y=1
)

The loss goes to NaN unless the learning rate is set extremely low (1e-15 gives sensible results).

Removing the call to `self._init_weights` in `DenseAutoregressive` resolves the issue and allows a more reasonable `1e-3` learning rate.
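A quick way to check that initialization (rather than the optimizer settings) is at fault, assuming params is a torch.nn.Module as in the training loop above, is to inspect the loss and gradient norms before any optimizer update:

# Sanity-check sketch: with a reasonable initialization, the step-0 loss and
# gradient norms should be finite and moderate in size.
y = target_dist.sample((1000,))
loss = -new_dist.log_prob(y).mean()
loss.backward()
print('initial loss:', loss.item())
for name, p in params.named_parameters():
    grad_norm = p.grad.norm().item() if p.grad is not None else float('nan')
    print(name, 'grad norm:', grad_norm)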

2/10/2021 meeting

@stefanwebb

  • Add user docs on shape information, constraints, and bijector interface (CC @fritzo on PR)
  • Integrate constraint and {forward,reverse}_shape methods from torch transforms; put up PR
  • Finish landing page
  • Reach out to the nflows authors

@feynmanliang

  • Prototype calling Independent to reinterpret all batch dimensions before handing distributions to flowtorch, because (1) it simplifies the shape handling inside bijectors and (2) we do not want multiple independent bijectors for a batch_size > 1 (see the sketch after this list)
  • Reach out to UMNN
  • Pull BatchNorm out from develop
  • Reproduce Table 3 from https://arxiv.org/pdf/1908.09257.pdf; leave some entries missing as starter tasks
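A minimal sketch of the Independent idea from the first @feynmanliang bullet, using only torch.distributions (not the eventual flowtorch API):

import torch
import torch.distributions as dist

# Reinterpret every batch dimension as an event dimension before constructing
# the flow, so a bijector only ever sees a single (possibly multi-dim) event.
base = dist.Normal(torch.zeros(4, 3), torch.ones(4, 3))   # batch_shape=[4, 3]
base = dist.Independent(base, len(base.batch_shape))      # batch_shape=[], event_shape=[4, 3]
print(base.batch_shape, base.event_shape)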

Ability to opt out of caching

Caching is pretty buggy right now and is causing the simplex feature branch of beanmachine to fail CI. Can we hide it behind a feature flag?
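One possible shape for the opt-out, as a sketch only (the flag and helper names below are hypothetical, not the current flowtorch API):

_CACHING_ENABLED = True  # hypothetical module-level feature flag

def set_caching(enabled: bool) -> None:
    global _CACHING_ENABLED
    _CACHING_ENABLED = enabled

def maybe_cached(compute, cache: dict, key):
    # Bypass the cache entirely when the flag is off, so buggy cache hits
    # cannot affect downstream users such as beanmachine.
    if not _CACHING_ENABLED:
        return compute()
    if key not in cache:
        cache[key] = compute()
    return cache[key]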

TransformedDistribution.log_prob does not respect batch_shape

import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors

d = dist.Normal(torch.zeros(2), torch.ones(2))
x = d.sample((10,))
td, _ = flowtorch.bijectors.AffineAutoregressive(flowtorch.params.DenseAutoregressive())(d)


print(d.log_prob(x).shape, td.log_prob(x).shape)

should report identical shapes, but it looks like TransformedDistribution does not respect the distribution's batch_shape
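For reference, a sketch of the expected behaviour (the assertion on the base distribution holds in stock PyTorch; the transformed distribution should match it):

import torch
import torch.distributions as dist

d = dist.Normal(torch.zeros(2), torch.ones(2))   # batch_shape=[2], event_shape=[]
x = d.sample((10,))
assert d.log_prob(x).shape == torch.Size([10, 2])
# td.log_prob(x).shape should also be torch.Size([10, 2]), but currently differs.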

Sign error in TransformedDistribution.log_prob?

When I run

import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist

sns.set_style('darkgrid')

import simplex.bijectors
import simplex.params

class MockParams:
    def __init__(self):
        self.permutation = torch.randperm(1, device='cpu')
        
    def __call__(self, x):
        return torch.zeros(1), torch.log(torch.ones(1)*0.5)

td, params = simplex.bijectors.AffineAutoregressive(lambda input_shape, param_shape: MockParams())(dist.Normal(0, 1))
y = torch.linspace(-10, 10, steps=100).unsqueeze(1)
x = td.bijector._inverse(y, td.params)
sns.relplot(data=pd.DataFrame({
    'y': x.detach().squeeze().numpy(),
    'p_normal': torch.exp(td.base_dist.log_prob(y)).detach().squeeze().numpy(),
    'p_transformed': torch.exp(td.log_prob(y)).detach().squeeze().numpy(),
}).melt(id_vars=['y']), x='y', y='value', hue='variable', kind='line')

I expect that scaling the standard Normal by 0.5 results in a Normal distribution more tightly peaked at zero. The opposite seems to be happening, which suggests a sign error in the log-Jacobian-determinant term of TransformedDistribution.log_prob.
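As a reference point (a sanity check using PyTorch's built-in transforms, not the flowtorch code path), scaling a standard Normal by 0.5 gives Normal(0, 0.5), whose density at zero is roughly twice as large:

import torch
import torch.distributions as dist

base = dist.Normal(0., 1.)
scaled = dist.TransformedDistribution(base, [dist.AffineTransform(loc=0., scale=0.5)])

zero = torch.tensor(0.)
print(base.log_prob(zero).exp())     # ~0.3989
print(scaled.log_prob(zero).exp())   # ~0.7979 -- more tightly peaked at zero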

AffineAutoregressive leaks global state

This static attribute https://github.com/stefanwebb/flowtorch/blob/master/flowtorch/bijectors/affine_autoregressive.py#L18 is a global singleton and leaks state across runs.

To repro unexpected behavior:

  • Train an AffineAutoregressive against a 2D distribution (default_param_fn.permutation gets set to a size 2 permutation)
  • Initialize a new AffineAutoregressive and train it against something that is not 2D; this will fail because default_param_fn.permutation is already set, so the new shape is not used (a minimal illustration follows below).
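A minimal, flowtorch-free illustration of why a class-level default leaks state across instances:

# Generic illustration (not flowtorch code): a mutable class attribute is
# shared by every instance, so the first flow's shape "wins" for all later ones.
class Flow:
    default_permutation = None   # class-level, shared across instances

    def fit(self, dim):
        if Flow.default_permutation is None:
            Flow.default_permutation = list(range(dim))
        return Flow.default_permutation

print(Flow().fit(2))   # [0, 1]
print(Flow().fit(3))   # still [0, 1] -- the size-2 permutation leaked

Making the default a per-instance attribute set in __init__ (rather than a class attribute) would avoid the leak.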

Dividing docs into Reference / Users / Developers

I'd like to divide the documentation into three sections:

  • Reference: the classes/methods extracted from the docstrings
  • Users: examples of how to use the library in practice
  • Developers: a guide on how to contribute code to the library and write new Bijectors, Params, etc.

TransformedDistribution.log_prob return shape incompatible with batch dimension semantics

When I run

import torch
import torch.distributions as dist

import simplex.bijectors
import simplex.params

d, params = simplex.bijectors.AffineAutoregressive(
    simplex.params.DenseAutoregressive())(dist.Normal(0, 1))
d.log_prob(torch.zeros((10, 1))).shape

I expect the log_prob of a (sample_shape=[10], batch_shape=[], event_shape=[1]) sample and distribution to have shape [10].

Actual behavior: Result is of shape [10, 10]

RC: TransformedDistribution.log_prob is broadcasting a summation over a row / column vector
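A minimal illustration of the broadcast that produces the [10, 10] result (illustrative tensors, not the actual flowtorch internals):

import torch

col = torch.zeros(10, 1)   # e.g. per-sample base log_prob with an un-squeezed event dim
row = torch.zeros(10)      # e.g. per-sample log|det J|
print((col + row).shape)          # torch.Size([10, 10]) -- the observed shape
print((col.sum(-1) + row).shape)  # torch.Size([10])     -- the expected shape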

Param.state is never modified, resulting in stale caching in Bijector and in-place modification errors in autograd

Expected: When I call backward() on the result of Bijector.forward after taking an optimizer.step() on the bijector's params, using the same inputs x, I expect a new value of y to be computed from x and the updated params.

Actual: Since Params.state is never incremented, cache invalidation based on state_cache never occurs. When we try to back-propagate through a y whose parameters have already been updated by optim.step(), we get:

>       Variable._execution_engine.run_backward(
            tensors, grad_tensors_, retain_graph, create_graph,
            allow_unreachable=True)  # allow_unreachable flag
E       RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py:130: RuntimeError
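A sketch of the intended version-based invalidation (names are illustrative; only the idea that Params.state is bumped on update and compared by the Bijector follows the issue):

class ParamsSketch:
    def __init__(self):
        self.state = 0

    def notify_updated(self):
        # e.g. called after optimizer.step(), so cached values become stale
        self.state += 1

class BijectorSketch:
    def __init__(self, params):
        self.params = params
        self._cached_state = -1
        self._cached_y = None

    def forward(self, x, compute):
        # Recompute y whenever the parameters changed since the cached call,
        # instead of returning a tensor built from now-modified weights.
        if self._cached_y is None or self._cached_state != self.params.state:
            self._cached_y = compute(x)
            self._cached_state = self.params.state
        return self._cached_y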

`DenseAutoregressive` initialization can be unstable

If you use too many layers or hidden units relative to the input dimension in DenseAutoregressive, the new initialization scheme can make the weights unstable, and the output distribution is then no longer N(0, 1).

I think this is because each row of each weight matrix is normalized by the l_2 norm of the product of the previous weight matrices, and this product can underflow. It shouldn't be a problem in most practical uses of DenseAutoregressive, but we may want to warn the user when this happens and/or solve the problem by working in log-space.
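A small illustration of the underflow and the log-space alternative mentioned above (the sizes are arbitrary):

import torch

norms = torch.full((200,), 1e-3)     # per-layer row norms, all small
print(torch.prod(norms))             # 0.0 in float32 -- the product underflows
print(torch.sum(torch.log(norms)))   # ~-1381.6 -- finite when accumulated in log-space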
