stefanwebb / flowtorch-old
Separating Normalizing Flows code from Pyro and improving API
Home Page: https://flowtorch.ai
License: MIT License
When I run
import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist
sns.set_style('darkgrid')
import simplex.bijectors
import simplex.params

class MockParams:
    def __init__(self):
        self.permutation = torch.randperm(1, device='cpu')

    def __call__(self, x):
        return torch.zeros(1), torch.log(torch.ones(1) * 0.5)

td, params = simplex.bijectors.AffineAutoregressive(
    lambda input_shape, param_shape: MockParams())(dist.Normal(0, 1))
y = torch.linspace(-10, 10, steps=100).unsqueeze(1)
x = td.bijector._inverse(y, td.params)
sns.relplot(data=pd.DataFrame({
    'y': x.detach().squeeze().numpy(),
    'p_normal': torch.exp(td.base_dist.log_prob(y)).detach().squeeze().numpy(),
    'p_transformed': torch.exp(td.log_prob(y)).detach().squeeze().numpy(),
}).melt(id_vars=['y']), x='y', y='value', hue='variable', kind='line')
I expect that scaling the standard Normal by 0.5 results in a Normal distribution more tightly peaked at zero. It seems like the opposite is happening because we have a sign error in the jacobian determinant.
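For reference, a minimal check of what I'd expect under the change-of-variables formula, assuming the forward map here is y = 0.5 * x (so the transformed density should match Normal(0, 0.5)):

import torch
import torch.distributions as dist

# Scaling a standard Normal by 0.5 gives Normal(0, 0.5), which is *more*
# peaked at zero than the base distribution.
y = torch.linspace(-10, 10, steps=100).unsqueeze(1)
expected = dist.Normal(0.0, 0.5).log_prob(y)
# I'd expect torch.allclose(td.log_prob(y), expected) to hold for the snippet
# above; instead the plotted density is wider than the base, consistent with
# the sign of log|det J| being flipped.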
This static attribute https://github.com/stefanwebb/flowtorch/blob/master/flowtorch/bijectors/affine_autoregressive.py#L18 is a global singleton and leaks across runs.
To repro unexpected behavior:
1. Train AffineAutoregressive against a 2D distribution (default_param_fn.permutation gets set to a size-2 permutation).
2. Construct a second AffineAutoregressive and train it against something that is not 2D; this will fail because default_param_fn.permutation is already set, so the new shape is not used.
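A minimal, framework-free sketch of the failure mode and the kind of fix I'd propose (illustrative names, not the real flowtorch classes): a class-level default param fn is shared by every instance, so the permutation sized for the first distribution leaks into later ones; constructing the default inside __init__ gives each bijector its own.

import torch

class DefaultParams:
    def __init__(self):
        self.permutation = None

class LeakyBijector:
    default_param_fn = DefaultParams()  # class attribute: one shared singleton

    def fit(self, dim):
        # The first call fixes the permutation; later calls silently reuse it
        # even when `dim` has changed.
        if self.default_param_fn.permutation is None:
            self.default_param_fn.permutation = torch.randperm(dim)
        return self.default_param_fn.permutation

class FixedBijector:
    def __init__(self):
        self.param_fn = DefaultParams()  # per-instance default: no leakage

print(LeakyBijector().fit(2))  # permutation of size 2
print(LeakyBijector().fit(3))  # still size 2 -- the leak described above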
Expected: When I call backward on the result of Bijector.forward after taking an optimizer.step() on the bijector's params, using the same inputs x, I expect a new value of y to be computed from x and the updated params.
Actual: Since Params.state is never incremented, cache invalidation based on state_cache never occurs. When we try to back-propagate through a y whose parameters have already been updated by optim.step(), we get:
> Variable._execution_engine.run_backward(
tensors, grad_tensors_, retain_graph, create_graph,
allow_unreachable=True) # allow_unreachable flag
E RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py:130: RuntimeError
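One possible shape of the fix, sketched with illustrative stand-in classes (not the real flowtorch API): bump a version counter whenever the parameters change, and rebuild the cached y when the stored counter is stale.

import torch

class Params(torch.nn.Module):
    # Illustrative stand-in for flowtorch's Params, with a version counter.
    def __init__(self):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.ones(1))
        self.state = 0  # should be incremented whenever the parameters change

class CachingBijector:
    # Sketch of state-based cache invalidation (hypothetical names).
    def __init__(self, params):
        self.params = params
        self._cached_x, self._cached_y, self._state_cache = None, None, -1

    def forward(self, x):
        # Recompute when either the input or the parameter version changed.
        if self._cached_x is not x or self._state_cache != self.params.state:
            self._cached_y = x * self.params.scale
            self._cached_x, self._state_cache = x, self.params.state
        return self._cached_y

params = Params()
bij = CachingBijector(params)
opt = torch.optim.SGD(params.parameters(), lr=0.1)

x = torch.randn(3)
bij.forward(x).sum().backward()
opt.step()
params.state += 1                # without this bump the next forward() returns the stale y
bij.forward(x).sum().backward()  # with the bump a fresh y is built, so this succeeds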
import torch
import torch.distributions as dist
import flowtorch

d = dist.Normal(torch.zeros(2), torch.ones(2))
x = d.sample((10,))
td, _ = flowtorch.bijectors.AffineAutoregressive(flowtorch.params.DenseAutoregressive())(d)
print(d.log_prob(x).shape, td.log_prob(x).shape)
should report identical shapes, but it looks like the TransformedDistribution does not respect the base distribution's batch_shape.
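For reference, the shapes I'd expect under the standard torch.distributions conventions (log_prob has shape sample_shape + batch_shape):

import torch
import torch.distributions as dist

d = dist.Normal(torch.zeros(2), torch.ones(2))  # batch_shape=[2], event_shape=[]
x = d.sample((10,))                             # sample_shape=[10]
assert d.log_prob(x).shape == torch.Size([10, 2])
# The transformed distribution should agree:
# assert td.log_prob(x).shape == torch.Size([10, 2])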
When I run
import torch
import torch.distributions as dist
import simplex.bijectors
import simplex.params

d, params = simplex.bijectors.AffineAutoregressive(
    simplex.params.DenseAutoregressive())(dist.Normal(0, 1))
d.log_prob(torch.zeros((10, 1))).shape
I expect the log_prob of a (sample_shape=[10], batch_shape=[], event_shape=[1]) sample+distribution to have shape [10].
Actual behavior: the result has shape [10, 10].
RC: TransformedDistribution.log_prob is broadcasting a summation over a row/column vector.
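A minimal illustration of that suspected root cause in plain torch (no flowtorch involved): combining a [10, 1] column with a [10] vector broadcasts to a [10, 10] matrix instead of staying [10].

import torch

base_log_prob = torch.zeros(10, 1)  # e.g. per-sample log_prob with a trailing event dim
log_det = torch.zeros(10)           # e.g. per-sample log|det J|

print((base_log_prob - log_det).shape)          # torch.Size([10, 10]) -- the observed bug
print((base_log_prob.sum(-1) - log_det).shape)  # torch.Size([10])     -- the expected shape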
Using a batch_shape=[] and event_shape=[] base/target results in a shape mismatch RuntimeError
import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors as bijectors

# Lazily instantiated flow plus base and target distributions
flow = bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)

########################
##     LOOK HERE      ##
base_dist = dist.Normal(0, 1)
target_dist = dist.Normal(5, 1)
########################

# Instantiate transformed distribution and parameters
new_dist, params = flow(base_dist)

# Training loop
opt = torch.optim.Adam(params.parameters(), lr=1e-3)
for idx in range(501):
    opt.zero_grad()
    # Minimize KL(p || q)
    y = target_dist.sample((1000,))
    loss = -new_dist.log_prob(y).mean()
    if idx % 100 == 0:
        print('epoch', idx, 'loss', loss)
    loss.backward()
    opt.step()

sns.relplot(
    data=pd.DataFrame(new_dist.sample((100,)).detach().numpy()),
    x=0, y=1
)
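A hedged workaround until scalar distributions are handled: give the base and target a non-empty shape, e.g. replacing the two lines in the LOOK HERE block with 1-element tensors (note the separate NaN issue with the 1D example further below). Assuming the rest of the snippet is unchanged, this avoids the shape mismatch for me:

base_dist = dist.Normal(torch.zeros(1), torch.ones(1))
target_dist = dist.Normal(torch.zeros(1) + 5, torch.ones(1))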
- {forward,reverse}_shape methods from torch transforms; put up PR
- nflows authors
- Independent to reinterpret all batch dimensions before flowtorch, because (1) it simplifies the shape dance inside bijectors and (2) we do not want multiple independent bijectors for a batch_size > 1 (see the sketch after this list)
- UMNN
- BatchNorm
- out from develop
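As referenced in the Independent item above, a quick reminder of what dist.Independent does to the shapes:

import torch
import torch.distributions as dist

b = dist.Normal(torch.zeros(3), torch.ones(3))  # batch_shape=[3], event_shape=[]
d = dist.Independent(b, 1)                      # batch_shape=[],  event_shape=[3]

print(b.batch_shape, b.event_shape)      # torch.Size([3]) torch.Size([])
print(d.batch_shape, d.event_shape)      # torch.Size([]) torch.Size([3])
print(d.log_prob(torch.zeros(3)).shape)  # torch.Size([]) -- one joint log-prob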
Simple fix to remove obsolete caching code
Having converted the classes/methods to use type hints, we should complete the existing code with docstrings and verify their output in the generated docs
We need to add type hints to the existing code and maintain this habit moving forwards
I ran a git bisect and found f42811c to break some tests in beanmachine which assert on the closeness of variational approximations (in terms of means and K-S statistics). Will add an MVP test to repro.
Caching is pretty buggy right now and causing the simplex feature branch of beanmachine to fail CI. Can we hide it behind a feature flag?
PyTorch distributions already fixed this
BTW can we simply replace these with .input_event_shape and .output_event_shape, or are the complete shapes unknown at the time of Bijector construction?
Expected: AffineAutoregressive(DenseAutoregressive) should be able to use a scalar base_distribution
Actual: When using a dist.Normal(0, 1) base distribution, we get the error:
# Shape the output
> h = h.reshape(x.size()[:-len(self.input_shape)] + (self.output_multiplier, self.input_dims))
E RuntimeError: shape '[2, 1]' is invalid for input of size 200
../simplex/simplex/params/dense_autoregressive.py:198: RuntimeError
RC: In particular, for scalar distributions both batch_shape and event_shape are torch.Size([]), i.e. of length zero, whereas parts of DenseAutoregressive.{_build,_forward} seem to assume at least one of them is non-empty (so that Param.input_shape is non-empty). We should modify this code to handle the special case batch_dim == event_dim == torch.Size([]).
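A sketch of the special case I have in mind (the helper name is hypothetical; the real reshape lives in dense_autoregressive.py around the line quoted above): when both shapes are empty, fall back to a one-dimensional event for the internal reshape.

import torch

def effective_input_shape(batch_shape, event_shape):
    # Hypothetical helper for the shape used internally by DenseAutoregressive.
    shape = torch.Size(batch_shape) + torch.Size(event_shape)
    # Scalar distributions (batch_shape == event_shape == torch.Size([])) would
    # otherwise give a zero-length shape and break the reshape quoted above.
    return shape if len(shape) > 0 else torch.Size([1])

print(effective_input_shape(torch.Size([]), torch.Size([])))   # torch.Size([1])
print(effective_input_shape(torch.Size([]), torch.Size([3])))  # torch.Size([3])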
Our source distributions (https://pypi.org/project/flowtorch/0.0.dev2/#files) don't filter out the docs/website. As a result, the javascript in there is causing a Yarn validation failure internally at FB.
When I run the example on flowtorch.ai with a 1D distribution:
import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist
import flowtorch
import flowtorch.bijectors as bijectors

# Lazily instantiated flow plus base and target distributions
flow = bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)
base_dist = dist.Normal(torch.zeros(1), torch.ones(1))
target_dist = dist.Normal(torch.zeros(1) + 5, torch.ones(1))

# Instantiate transformed distribution and parameters
new_dist, params = flow(base_dist)

# Training loop
opt = torch.optim.Adam(params.parameters(), lr=1e-3)
for idx in range(501):
    opt.zero_grad()
    # Minimize KL(p || q)
    y = target_dist.sample((1000,))
    loss = -new_dist.log_prob(y).mean()
    if idx % 100 == 0:
        print('epoch', idx, 'loss', loss)
    loss.backward()
    opt.step()

sns.relplot(
    data=pd.DataFrame(new_dist.sample((100,)).detach().numpy()),
    x=0, y=1
)
The loss goes to NaN unless the learning rate is set extremely low (1e-15 gives sensible results). Removing the call to self._init_weights in DenseAutoregressive resolves the issue and allows a more reasonable 1e-3 learning rate.
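If we do drop the custom scheme, one option (my suggestion only, not something flowtorch does today) is to reset whatever nn.Linear layers live inside the params module to a standard initialization:

import torch

def reset_linear_layers(module: torch.nn.Module) -> None:
    # Re-initialize every nn.Linear with a standard scheme instead of the
    # custom _init_weights; untested fallback suggestion.
    for m in module.modules():
        if isinstance(m, torch.nn.Linear):
            torch.nn.init.xavier_uniform_(m.weight)
            if m.bias is not None:
                torch.nn.init.zeros_(m.bias)

# reset_linear_layers(params)  # `params` from the training snippet above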
I'd like to divide the documentation into three sections:
When I run
bij.log_abs_det_jacobian(x, y, params)
y2 = bij.forward(x2)
print(bij.log_abs_det_jacobian(x2, y2, params))
I expect to get the jacobian at x2.
Actual behavior: I get the jacobian at x, due to incorrect invalidation of J_cache.
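One way the invalidation could work, sketched with hypothetical names: key the cached value on the identity of the (x, y) pair, so a new x2/y2 always misses the cache.

class JacobianCache:
    # Illustrative sketch: cache log|det J| keyed on the exact (x, y) tensors.
    def __init__(self):
        self._key = (None, None)
        self._value = None

    def get(self, x, y, compute):
        # `is` comparisons: the jacobian computed at the old x can never be
        # returned for a different input pair.
        if self._key[0] is not x or self._key[1] is not y:
            self._value = compute(x, y)
            self._key = (x, y)
        return self._value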
If you use too many layers or hidden units relative to the input dimension in DenseAutoregressive, the new initialization scheme can make the weights unstable, with the output distribution then not being N(0, 1). I think this is because each row of each weight matrix is normalized by the l_2 norm of the product of the previous weights, and this can underflow. It shouldn't be a problem in most practical uses of DenseAutoregressive, but we may want to flag a warning to the user if this happens and/or solve the problem by working in a type of log-space.
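A rough sketch of the log-space idea (illustrative only; I have not checked it against the actual initialization code): compute the log of the l_2 norm with logsumexp so it stays finite even when the norm itself would underflow, and accumulate per-layer scales as sums of logs rather than products.

import torch

def log_l2_norm(v, dim=-1, eps=1e-45):
    # log ||v||_2 = 0.5 * logsumexp(2 * log|v|); finite even when ||v||_2
    # underflows to 0.0 in float32.
    return 0.5 * torch.logsumexp(2.0 * torch.log(v.abs() + eps), dim=dim)

tiny = torch.full((100,), 1e-30)
print(tiny.pow(2).sum().sqrt())  # 0.0 -- the direct norm underflows
print(log_l2_norm(tiny))         # about -66.8, i.e. log(1e-29), still finite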
import torch
import torch.distributions as dist
import flowtorch

d, param = flowtorch.bijectors.AffineAutoregressive(
    flowtorch.params.DenseAutoregressive()
)(dist.Independent(dist.Normal(torch.zeros(3), torch.ones(3)), 1))
d.rsample((10, 2)).shape
should return the same [10, 2, 3] shape as dist.Independent(dist.Normal(torch.zeros(3), torch.ones(3)), 1).rsample((10, 2)).shape, but it currently raises a RuntimeError due to mismatched shapes.
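For reference, the plain torch.distributions behaviour I'd expect the transformed distribution to match:

import torch
import torch.distributions as dist

base = dist.Independent(dist.Normal(torch.zeros(3), torch.ones(3)), 1)
print(base.rsample((10, 2)).shape)  # torch.Size([10, 2, 3])
# The transformed distribution d should produce the same sample shape:
# assert d.rsample((10, 2)).shape == torch.Size([10, 2, 3])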