
nflows's Introduction

nflows


nflows is a comprehensive collection of normalizing flows using PyTorch.

Installation

To install from PyPI:

pip install nflows

Usage

To define a flow:

from nflows import transforms, distributions, flows

# Define an invertible transformation.
transform = transforms.CompositeTransform([
    transforms.MaskedAffineAutoregressiveTransform(features=2, hidden_features=4),
    transforms.RandomPermutation(features=2)
])

# Define a base distribution.
base_distribution = distributions.StandardNormal(shape=[2])


# Combine into a flow.
flow = flows.Flow(transform=transform, distribution=base_distribution)

To evaluate log probabilities of inputs:

log_prob = flow.log_prob(inputs)

To sample from the flow:

samples = flow.sample(num_samples)

Additional examples of the workflow are provided in the examples folder.

Development

To install all the dependencies for development:

pip install -r requirements.txt

Citing nflows

To cite the package:

@software{nflows,
  author       = {Conor Durkan and
                  Artur Bekasov and
                  Iain Murray and
                  George Papamakarios},
  title        = {{nflows}: normalizing flows in {PyTorch}},
  month        = nov,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {v0.14},
  doi          = {10.5281/zenodo.4296287},
  url          = {https://doi.org/10.5281/zenodo.4296287}
}

The version number is intended to be the one from nflows/version.py. The year/month correspond to the date of the release. BibTeX entries for other versions can be found on Zenodo.

If you're using spline-based flows in particular, consider citing the Neural Spline Flows paper: [bibtex].

References

nflows is derived from bayesiains/nsf originally published with

C. Durkan, A. Bekasov, I. Murray, G. Papamakarios, Neural Spline Flows, NeurIPS 2019. [arXiv] [bibtex]

nflows has been used in

Conor Durkan, Iain Murray, George Papamakarios, On Contrastive Learning for Likelihood-free Inference, ICML 2020. [arXiv].

Artur Bekasov, Iain Murray, Ordering Dimensions with Nested Dropout Normalizing Flows. [arXiv].

Tim Dockhorn, James A. Ritchie, Yaoliang Yu, Iain Murray, Density Deconvolution with Normalizing Flows. [arXiv].

nflows is used by the conditional density estimation package pyknos, and in turn the likelihood-free inference framework sbi.

nflows's People

Contributors

alvorithm, arashabzd, arturbekasov, awehenkel, conormdurkan, dennisprangle, dgreenberg, donglin-wang2, francesco-vaselli, imurray, invemichele, jahma, jan-matthis, janfb, johannbrehmer, mdmould, michaeldeistler, milescranmer, mj-will, phinate, rmnrth4


nflows's Issues

UMNN is not included in setup.py

Hi,

When UMNN was added to nflows in #29, umnn was only added to environment.yml. So when running pip install . to install the current master version, it's not installed, and importing any of the related functions fails.

Whilst this is easily fixed by just installing it manually, I think it might be worth adding it either to the dependencies in setup.py or as an optional requirement in extras_require.

Question on shape of `sample` method with added context

I've trained a conditional MAF with two additional context variables, pretty much identical to the two-moon example in the repo. However, I'm having some trouble coercing that context into the sample method.

The docstring for Distribution.sample is below; it states that the output samples have a leading dimension determined by the context:

Returns:
A Tensor containing the samples, with shape [num_samples, ...] if context is None, or
[context_size, num_samples, ...] if context is given.

I'm not clear on why this is the case (I would not expect conditioning on side information to influence the shape of my samples), and I wasn't able to generate correct samples by supplying my two context variables either: Distribution.sample seems to treat a context Tensor like [[y1, y2]] unusually. I was getting bimodal samples from a unimodal learned likelihood, almost as if it were conditioning on [[y1, y1]] and [[y2, y2]] separately and generating samples for each.

Apologies for no example (private data), but I can do any tests, though I suspect my error may be conceptual. Thanks for your time!

Fast sampling for varying context

Hello there, and thank you all for your work on this package; it has been tremendously helpful.

I am opening this issue because I would like to know if there is a way to perform fast generation when each sample has a different context. At present, sampling multiple points from the same context is straightforward, i.e. if y is a vector with six elements then

flow.sample(10000, context=y.view(-1, 6))

is quite fast, but samples are all conditioned on the same 6 context values. I have a vector y of shape (10000, 6) and I would like to sample 10000 new points, each one conditioned on a different set of values of the y array. At the moment the best I could manage was something like:

samples = []
for i in range(0, 10000):
    curr_sample = flow.sample(1, context=y[i].view(-1, 6))
    curr_sample = curr_sample.detach().cpu().numpy()
    curr_sample = np.squeeze(curr_sample, axis=0)
    samples.append(curr_sample)

However, being a bare Python for loop, the process is quite slow (30 minutes for 1e4 samples).
Is there a way to speed up the sampling process? Or am I missing some specific way to pass the arguments to the sample method?
I am more than willing to work on a pull request for this problem if you can provide me with some guidance.
Thanks!

AttributeError: 'int' object has no attribute 'utils'

from keras.utils import to_categorical
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
img_rows, img_cols = 28,28
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_test=x_test.astype('float32')
x_train=x_train.astype('float32')
mean=np.mean(x_train)
std=np.std(x_train)
x_test = (x_test-mean)/std
x_train = (x_train-mean)/std

print("counts of x_train : {}, y_train : {}, x_test : {}, y_test : {}".format(
len(x_train), len(y_train), len(x_test), len(y_test)))
num_classes=10
y_train = k.utils.to_categorical(y_train, num_classes)
y_test = k.utils.to_categorical(y_test, num_classes)
print("counts of x_train : {}, y_train : {}, x_test : {}, y_test : {}".format(
len(x_train), len(y_train), len(x_test), len(y_test)))

AttributeError Traceback (most recent call last)
in
16 len(x_train), len(y_train), len(x_test), len(y_test)))
17 num_classes=10
---> 18 y_train = k.utils.to_categorical(y_train, num_classes)
19 y_test = k.utils.to_categorical(y_test, num_classes)
20 print("counts of x_train : {}, y_train : {}, x_test : {}, y_test : {}".format(

AttributeError: 'int' object has no attribute 'utils'

Context

What exactly is the context, for example in distributions like MADEMoG? I wanted to use this distribution instead of StandardNormal for the MaskedAutoregressiveFlow, but I keep getting the error 'NoneType' object has no attribute 'shape' when I try to sample from this flow, because the first dimension of the samples is set to context.shape[0] and I don't specify a context. Could you provide an example of how to fix this, or of how a context should be provided?

Base of resulting BPD in transformations

Hi,
I am using nflows in my project to construct normalizing flows; it is very easy to use and has helped me prototype my experiments rapidly. However, while going through the source code, I was unable to find where the logabsdet is converted to base 2.
So I would like to ask: are the results from the model in base 2 or base e?
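As far as I can tell from the source, nflows works with natural logarithms throughout (the Gaussian base density and the log-determinants use torch.log), so log_prob returns values in nats and no conversion to base 2 happens inside the library. Bits per dimension can be computed afterwards by dividing the negative log-likelihood by D·ln 2; the helper name below is hypothetical.

```python
import math

def bits_per_dim(log_prob_nats: float, num_dims: int) -> float:
    """Hypothetical helper: convert a log-density in nats to bits per dimension."""
    return -log_prob_nats / (num_dims * math.log(2))

# Example: a D-dimensional standard normal evaluated at the origin.
D = 784
log_prob_at_origin = -0.5 * D * math.log(2 * math.pi)  # in nats
bpd = bits_per_dim(log_prob_at_origin, D)
print(bpd)  # ≈ 1.3257, i.e. log2(2π) / 2
```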

AffineCouplingLayer scale <1.001

Line 225 of coupling.py can only produce a scale in the interval [0.001,1.001] which is very restrictive. The commented out line 224 produces an interval of roughly [0,3] which seems more generally useful. Or alternatively maybe both types of behaviour could be allowed e.g. through extra class initialization arguments.

(Thanks for the very useful + clear package!)
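The two parametrizations can be compared numerically. The sketch assumes the active line computes sigmoid(u + 2) + 1e-3 and the commented-out one computes (softplus(u) + 1e-3).clamp(0, 3), where u is the unconstrained network output; these forms are inferred from the intervals quoted in the issue, so check them against the actual source.

```python
import torch
import torch.nn.functional as F

# A wide sweep of unconstrained outputs u from the coupling network.
u = torch.linspace(-20.0, 20.0, 10_001)

# Assumed current parametrization: bounded to roughly [0.001, 1.001], so it can only shrink.
scale_sigmoid = torch.sigmoid(u + 2) + 1e-3

# Assumed commented-out alternative: can also expand, up to a factor of 3.
scale_softplus = (F.softplus(u) + 1e-3).clamp(0, 3)

print(f"sigmoid:  [{scale_sigmoid.min().item():.4f}, {scale_sigmoid.max().item():.4f}]")
print(f"softplus: [{scale_softplus.min().item():.4f}, {scale_softplus.max().item():.4f}]")
```

Exposing the upper bound (or the choice of squashing function) as a constructor argument, as the issue suggests, would keep the current default while allowing expanding scales.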

Gradients of log probabilities with respect to the inputs?

Dear All,

I wanted to compute gradients of the log probability density with respect to the inputs, but apparently the gradients are not propagated to the inputs and the result is 'None'. Could you suggest a fix or workaround? Thanks!

import torch
import torch.nn.functional as F
from nflows import transforms, distributions, flows

def main():
    flow = flows.MaskedAutoregressiveFlow(
        features=1,
        hidden_features=16,
        num_layers=1,
        num_blocks_per_layer=2,
        use_residual_blocks=False,
        use_random_masks=False,
        use_random_permutations=False,
        activation=F.elu,
        dropout_probability=0.0,
        batch_norm_within_layers=False,
        batch_norm_between_layers=False,
    )

    x = torch.tensor([[1.]])
    p = flow.log_prob(x)
    p.backward()
    print(x.grad)

if __name__ == '__main__':
    main()

Disagreement in log probabilities between log_prob and sample_and_log_prob

I'm seeing an odd behavior when contexts are included. If I compare the log probabilities output by sample_and_log_prob with the log probabilities calculated by log_prob on the samples returned by sample_and_log_prob I get different results:

log_prob: [[-16.1957, -16.7197, -17.4852, -20.2420, -17.6908]]
log_prob_2: [-16.1218, -16.7461, -16.9846, -20.4161, -18.0095]

I think I've tracked it down somewhat to the fact that the noise produced in sample_and_log_prob does not match the reconstruction of the noise in log_prob.

Noise sampled in sample_and_log_prob:

noise in s and l: tensor([[[-1.4298,  2.0084, -0.6241,  0.3967,  0.5529,  0.4732, -0.8063],
         [-0.2945, -1.4018, -0.1627, -0.1684, -1.3888,  0.2485, -0.3683],
         [-0.3361, -0.1749,  0.2021,  0.1011,  0.9791, -2.1958, -0.8109],
         [ 0.1006,  2.5615, -0.0782,  1.9179,  1.6321, -0.7352,  0.5438],
         [ 1.4222, -0.1466, -1.3136,  0.9655, -0.3346,  0.8428,  0.0655]]])

Noise reconstructed in log_prob:

noise in log: tensor([[-1.5227,  1.9867, -0.6582,  0.3719,  0.5637,  0.3799, -0.5922],
        [-0.3344, -1.4428, -0.1877, -0.1901, -1.3203,  0.3617, -0.2444],
        [-0.3513, -0.2244,  0.1518,  0.0817,  1.0594, -1.9549, -0.6088],
        [ 0.2416,  2.5659, -0.1450,  1.9957,  1.6983, -0.4731,  0.6077],
        [ 1.5897, -0.1721, -1.2935,  1.0161, -0.2590,  0.8674,  0.1447]],
       grad_fn=<AddBackward0>)

If I run the same test without contexts there is no discrepancy between the log probabilities.
I've included my test case as well:

import torch 
from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms.autoregressive import MaskedAffineAutoregressiveTransform
from nflows.transforms.permutations import RandomPermutation 
from nflows.transforms.lu import LULinear
from nflows.nn.nets import ResidualNet




#Model definition 
base_dist = StandardNormal(shape=[7])

maf_transforms = []
for i in range(3):
    maf_transforms.append(RandomPermutation(features=7))
    maf_transforms.append(MaskedAffineAutoregressiveTransform(features=7,
                                                              hidden_features = 256, 
                                                              context_features=8))



maf_transform = CompositeTransform(maf_transforms)
maf_flow = Flow(maf_transform, base_dist)


test = torch.FloatTensor([0]*8)

samples, log_prob = maf_flow.sample_and_log_prob(5, context=test.unsqueeze(0))


log_prob_2 = maf_flow.log_prob(samples.squeeze())

num_close = torch.sum(torch.abs(log_prob - log_prob_2) < 1e-2)

print("log_prob:", log_prob)
print("log_prob_2:", log_prob_2)


print("num close:", num_close)

EDIT

Never mind, I see I forgot to pass the context to log_prob. It's tough to calculate the proper log probability without the context.

Flow.log_prob returns positive values

Thank you for this amazing package!

I have an issue with positive outputs from the Flow.log_prob method: during training they can reach values around 5, corresponding to probabilities of p ≈ 150 > 1. Am I misunderstanding something fundamental here and this is possible, or am I using the transforms incorrectly? Below is an example with moons and PiecewiseRationalQuadraticCouplingTransform:

import sklearn.datasets as datasets
import random
import numpy as np

import torch
from torch import nn
from torch import optim

from nflows.flows.base import Flow
from nflows.distributions.normal import StandardNormal
from nflows.transforms.base import CompositeTransform
from nflows.transforms import PiecewiseRationalQuadraticCouplingTransform


class TransformNetFC(nn.Module):
    def __init__(self, num_identity_features: int, num_transform_features: int, hid_dim: int = 8):
        super().__init__()
        self.hidden_channels = hid_dim
        self.fc = nn.Sequential(
            nn.Linear(num_identity_features, hid_dim),
            nn.LeakyReLU(),
            nn.Linear(hid_dim, hid_dim),
            nn.LeakyReLU(),
            nn.Linear(hid_dim, num_transform_features),
        )
        
    def forward(self, identity_split, context=None):
        return self.fc(identity_split)

    
torch.use_deterministic_algorithms(True)
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

base_dist = StandardNormal(shape=[2])

transforms = CompositeTransform([
    PiecewiseRationalQuadraticCouplingTransform(torch.tensor([1, 0]), TransformNetFC, num_bins=32),
    PiecewiseRationalQuadraticCouplingTransform(torch.tensor([0, 1]), TransformNetFC, num_bins=32),
    PiecewiseRationalQuadraticCouplingTransform(torch.tensor([1, 0]), TransformNetFC, num_bins=32),
    PiecewiseRationalQuadraticCouplingTransform(torch.tensor([0, 1]), TransformNetFC, num_bins=32),
])

flow = Flow(transforms, base_dist)
optimizer = optim.Adam(flow.parameters())

flow.train()

for i in range(5000):
    x, y = datasets.make_moons(128, noise=.1)
    x = (torch.tensor(x, dtype=torch.float32) + 2) / 5 # to be in [0, 1]
    optimizer.zero_grad()
    loss = -flow.log_prob(inputs=x).mean()
    loss.backward()
    optimizer.step()
    
    if loss < 0:
        print(f'Prob = {(-loss).exp().item():.2f}, iteration = {i}')
        break

The output is

Prob = 1.04, iteration = 2200

Environment:

  • Ubuntu 18.04 Linux-4.15.0-175-generic-x86_64
  • python 3.7.3
  • torch 1.8.1+cu102
  • nflows 0.14

Thank you!
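Positive values are expected here: log_prob returns the log of a probability *density*, not of a probability, and a density can exceed 1 wherever the mass is concentrated in a small region. The moons data are rescaled by (x + 2) / 5 into a small patch of the unit square; if most of the mass sits in a region of area A < 1, the average density there is about 1/A > 1. A one-dimensional Gaussian shows the same effect with standard-library Python only:

```python
import math

def normal_log_density(x: float, mean: float, std: float) -> float:
    """Log of the N(mean, std^2) density at x, in nats."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

log_p = normal_log_density(0.0, mean=0.0, std=0.1)
print(round(log_p, 2), round(math.exp(log_p), 2))  # 1.38 3.99
```

A mean log-density of about 5 similarly corresponds to a density of about e^5 ≈ 148.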

Actnorm does not register `initialized` as buffer

Hi,

First of all, I'm a huge fan of this code base. Great work.

I stumbled upon a behaviour of actnorm layers that may not be intended.

Issue

When you save a model with actnorm layers, the flag ActNorm.initialized is not part of the state dict (since it's not registered as a buffer). When you then load a model from the state dict and continue training, the scale and shift of the actnorm layers are re-initialized to the mean and standard deviation of the activations.

Expected behaviour

I would expect that saving a state dict, loading the state dict, and continuing training to behave in the same way as training in one go, i.e. without re-initializing the scale and shift parameters. That would also be more consistent with the behaviour of BatchNorm.

Fix

In the ActNorm implementation, replace self.initialized = False with self.register_buffer("initialized", torch.zeros(1, dtype=torch.bool)), and adapt the line self.initialized = True accordingly.

Does that make sense or is this behaviour intended? I'm happy to file a PR if you want me to.

Cheers,
Johann
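A minimal sketch of the proposed fix, on a simplified, hypothetical ActNormSketch module rather than the actual nflows class. Once initialized is a registered buffer, assigning a plain Python bool to it raises a TypeError, so the flag has to be updated in place (for example with fill_).

```python
import torch
from torch import nn


class ActNormSketch(nn.Module):
    """Hypothetical, simplified ActNorm: per-feature scale and shift with data-dependent init."""

    def __init__(self, features: int):
        super().__init__()
        # Registered as a buffer, so the flag is saved in (and restored from) the state dict.
        self.register_buffer("initialized", torch.zeros(1, dtype=torch.bool))
        self.log_scale = nn.Parameter(torch.zeros(features))
        self.shift = nn.Parameter(torch.zeros(features))

    def _initialize(self, inputs: torch.Tensor) -> None:
        # Data-dependent init: make the first batch zero-mean, unit-variance per feature.
        with torch.no_grad():
            std = inputs.std(dim=0)
            mu = (inputs / std).mean(dim=0)
            self.log_scale.copy_(-torch.log(std))
            self.shift.copy_(-mu)
            # In-place update; `self.initialized = True` would raise a TypeError.
            self.initialized.fill_(True)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        if self.training and not self.initialized:
            self._initialize(inputs)
        return torch.exp(self.log_scale) * inputs + self.shift


torch.manual_seed(0)
layer = ActNormSketch(3)
layer(torch.randn(64, 3) * 5 + 2)  # first training batch triggers initialization
state = layer.state_dict()
print(sorted(state))  # ['initialized', 'log_scale', 'shift']

restored = ActNormSketch(3)
restored.load_state_dict(state)
print(bool(restored.initialized))  # True: no re-initialization on the next batch
```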

RuntimeError when splines encounter all-tail inputs

Hi all,

very rarely, I get an error when using spline flows in nflows:

Issue

When a spline transformation encounters inputs that all lie outside the (-tail_bound, tail_bound) range, it throws a RuntimeError. For instance,

import torch
import nflows.transforms

x = torch.tensor([[5.], [-6.]])
trf = nflows.transforms.PiecewiseLinearCDF(shape=(1,), tails="linear", tail_bound=4.0)

trf(x)

gives me

RuntimeError                              Traceback (most recent call last)
<ipython-input-19-39a279c296f6> in <module>
      5 trf = nflows.transforms.PiecewiseLinearCDF(shape=(1,), tails="linear", tail_bound=4.0)
      6 
----> 7 trf(x)

~/anaconda3/envs/flow_processes/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~/anaconda3/envs/flow_processes/lib/python3.8/site-packages/nflows/transforms/nonlinearities.py in forward(self, inputs, context)
    240 
    241     def forward(self, inputs, context=None):
--> 242         return self._spline(inputs, inverse=False)
    243 
    244     def inverse(self, inputs, context=None):

~/anaconda3/envs/flow_processes/lib/python3.8/site-packages/nflows/transforms/nonlinearities.py in _spline(self, inputs, inverse)
    229             )
    230         else:
--> 231             outputs, logabsdet = splines.unconstrained_linear_spline(
    232                 inputs=inputs,
    233                 unnormalized_pdf=unnormalized_pdf,

~/anaconda3/envs/flow_processes/lib/python3.8/site-packages/nflows/transforms/splines/linear.py in unconstrained_linear_spline(inputs, unnormalized_pdf, inverse, tail_bound, tails)
     22         raise RuntimeError("{} tails are not implemented.".format(tails))
     23 
---> 24     outputs[inside_interval_mask], logabsdet[inside_interval_mask] = linear_spline(
     25         inputs=inputs[inside_interval_mask],
     26         unnormalized_pdf=unnormalized_pdf[inside_interval_mask, :],

~/anaconda3/envs/flow_processes/lib/python3.8/site-packages/nflows/transforms/splines/linear.py in linear_spline(inputs, unnormalized_pdf, inverse, left, right, bottom, top)
     42     > Müller et al., Neural Importance Sampling, arXiv:1808.03856, 2018.
     43     """
---> 44     if torch.min(inputs) < left or torch.max(inputs) > right:
     45         raise InputOutsideDomain()
     46 

RuntimeError: operation does not have an identity.

(This was with nflows v0.12 on pypi.)

Fix

This is very simple to fix by just adding a check like if torch.any(inside_interval_mask): before the calls to the spline functions. I would be happy to open a PR if you are interested.

Cheers,
Johann
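The proposed guard can be sketched in plain PyTorch. The names below (toy_spline, apply_with_linear_tails, TAIL_BOUND) are hypothetical: toy_spline is a stand-in for the library's spline that, like linear_spline in the traceback, reduces over its inputs and would therefore fail on an empty selection; the wrapper only calls it when at least one input lies inside the interval.

```python
import torch

TAIL_BOUND = 4.0


def toy_spline(x):
    """Stand-in for the real spline: a monotone map of [-B, B] onto itself (B = TAIL_BOUND).
    Like linear_spline, it calls torch.min/torch.max, which fail on an empty tensor."""
    if torch.min(x) < -TAIL_BOUND or torch.max(x) > TAIL_BOUND:
        raise ValueError("input outside domain")
    y = 0.5 * (x + x ** 3 / TAIL_BOUND ** 2)
    logabsdet = torch.log(0.5 * (1 + 3 * x ** 2 / TAIL_BOUND ** 2))
    return y, logabsdet


def apply_with_linear_tails(inputs, spline_fn, tail_bound=TAIL_BOUND):
    """Hypothetical sketch of the proposed guard, with identity (linear) tails."""
    inside = (inputs >= -tail_bound) & (inputs <= tail_bound)
    outputs = inputs.clone()              # identity outside the interval
    logabsdet = torch.zeros_like(inputs)  # log|d/dx x| = 0 in the tails
    if torch.any(inside):                 # skip the spline when every input is in the tails
        outputs[inside], logabsdet[inside] = spline_fn(inputs[inside])
    return outputs, logabsdet


all_tail = torch.tensor([[5.0], [-6.0]])  # the inputs from the report
out, lad = apply_with_linear_tails(all_tail, toy_spline)  # no RuntimeError

mixed = torch.tensor([[1.0], [5.0]])
out_mixed, lad_mixed = apply_with_linear_tails(mixed, toy_spline)
```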

Flow identity initialization

Hello and thanks for the work on the package,

I am doing some tests with the identity initialization for rational quadratic splines.
When using the new identity init implemented in #65, with the input x = torch.tensor([1, 1e-2, 1e-6, 1e-8, 1e2], dtype=torch.float32), the following is the inverse for the untrained network (which should be initialized as the identity):

# in spline def: enable_identity_init=True

# transform back
flow.transform_to_noise(x.view(-1,1))
tensor([[  1.7013],
        [  1.3796],
        [  1.3739],
        [  1.3739],
        [100.0000]], grad_fn=<AddmmBackward0>)

If instead I manually set the weights of the last layer of the transform network to 0 (as done in the normflows package), I get the identity as expected:

# in spline def: enable_identity_init=False

# in the model def
if init_identity:
    torch.nn.init.constant_(autoregressive_net.final_layer.weight, 0.0)
    torch.nn.init.constant_(
        autoregressive_net.final_layer.bias,
        np.log(np.exp(1 - min_derivative) - 1),
    )

# stuff

# transform back
flow.transform_to_noise(x.view(-1,1))
tensor([[1.0000e+00],
        [1.0000e-02],
        [1.0000e-06],
        [1.0000e-08],
        [1.0000e+02]], grad_fn=<AddmmBackward0>)

I was wondering whether you could help me figure out this difference in behavior.
If this seems potentially useful I can work more than gladly on a pull request.
Best regards,
Francesco
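The bias value in that snippet is the inverse softplus of 1 - min_derivative. Assuming the rational-quadratic spline computes each knot derivative as min_derivative + softplus(unnormalized_derivative), which is how I read nflows' spline code, zero final-layer weights plus this bias make every derivative exactly 1; the same constant bias also gives equal bin widths and heights after the softmax, and a spline with uniform bins and unit derivatives is the identity. A quick numerical check (min_derivative = 1e-3 is assumed to match the library default):

```python
import math

def softplus(x: float) -> float:
    return math.log1p(math.exp(x))

def inverse_softplus(y: float) -> float:
    # Solves softplus(x) = y for x (requires y > 0).
    return math.log(math.expm1(y))

min_derivative = 1e-3  # assumed library default
bias = math.log(math.exp(1 - min_derivative) - 1)  # the expression from the snippet above
derivative = min_derivative + softplus(bias)
print(derivative)  # 1.0 up to floating-point round-off
```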

Use torch.searchsorted instead of our ad-hoc implementation

Hi!

I compared the searchsorted function implemented here, that does torch.sum(inputs[..., None] >= bin_locations, dim=-1) - 1, with the implementation in C++ here -- https://github.com/aliutkus/torchsearchsorted -- and it appears to be a lot slower on CPU at least.

I modified the benchmark.py in torchsearchsorted and just copy-pasted the function from nflows for comparison.
The output was (all on CPU)

Benchmark searchsorted:
- a [5000 x 16]
- v [5000 x 1]
- reporting fastest time of 10 runs
- each run executes searchsorted 100 times

Numpy: 	0.9516626670001642
torchsearchsorted: 	0.009861100999842165
nflows: 	50.19729063499926

i.e. sorting 5000 inputs into 5000 individual sets of 16 bins.

Am I missing something here? If not, it looks like the spline flows could be sped up quite a bit by using torchsearchsorted or something similar?

Cheers.
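Since PyTorch 1.6, torch.searchsorted provides a batched, built-in alternative. With right=True it returns, per row, the number of bin edges less than or equal to each input, which is exactly what the summed comparison quoted above counts. The check below uses the benchmark's shapes; it only verifies that the two agree and does not measure speed.

```python
import torch

def searchsorted_adhoc(bin_locations, inputs):
    # The expression quoted in the issue.
    return torch.sum(inputs[..., None] >= bin_locations, dim=-1) - 1

def searchsorted_builtin(bin_locations, inputs):
    # right=True counts the edges <= each input, matching the >= comparison above.
    return torch.searchsorted(bin_locations, inputs[..., None], right=True).squeeze(-1) - 1

torch.manual_seed(0)
bin_locations = torch.sort(torch.rand(5000, 16), dim=-1).values  # [5000 x 16], sorted per row
inputs = torch.rand(5000)                                         # one value per row

adhoc = searchsorted_adhoc(bin_locations, inputs)
builtin = searchsorted_builtin(bin_locations, inputs)
print(torch.equal(adhoc, builtin))
```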

Checkerboard masking in RealNVP

Hi,

Since the CouplingTransform class in the nflows.transforms.coupling module only supports a 1-D mask that splits data along the channel dimension, I am wondering how I would go about implementing the alternating checkerboard mask from the RealNVP paper. Should I use the MaskedAffineAutoregressiveTransform in nflows.transforms.autoregressive instead? Or are there other methods or classes that I am not aware of?

Also, thank you so much for providing such a clean implementation of flow models in PyTorch!

Sincerely,
Donglin Wang
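One possible workaround, sketched under assumptions: flatten each [C, H, W] image to a vector of length C·H·W and build the checkerboard as a 1-D mask over that vector, since the coupling classes accept a 1-D mask (in the coupling code, positive entries appear to mark the transformed features and the rest pass through). The trade-off is that the conditioner network then sees a flat vector rather than an image, so it would typically be an MLP instead of a CNN. The helper checkerboard_mask is hypothetical.

```python
import torch

def checkerboard_mask(channels: int, height: int, width: int, parity: int = 0) -> torch.Tensor:
    """Hypothetical helper: flattened checkerboard mask for images of shape [C, H, W].

    Pixels with (row + col) % 2 == parity get 1 (to be transformed), the others 0
    (passed through unchanged); every channel uses the same spatial pattern.
    """
    rows = torch.arange(height).view(-1, 1)
    cols = torch.arange(width).view(1, -1)
    pattern = ((rows + cols) % 2 == parity).long()               # [H, W]
    return pattern.expand(channels, height, width).reshape(-1)  # [C * H * W]

# Alternate the parity between consecutive coupling layers.
mask_a = checkerboard_mask(3, 4, 4, parity=0)
mask_b = checkerboard_mask(3, 4, 4, parity=1)
print(mask_a.view(3, 4, 4)[0])
```

Images would then be flattened with images.reshape(batch_size, -1) before entering the flow, and consecutive coupling layers would alternate between mask_a and mask_b.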

Flow for one-dimensional data

Thank you very much for sharing the code. I wonder whether the autoregressive flows (e.g., MAF) can be used to estimate the density of one-dimensional data? If not, which kind of flow model can be used?

Unnecessary dependencies

Hello,

The dependencies matplotlib, tensorboard and tqdm are installed by setup.py (and environment.yml) even though they are not used by nflows. They should be removed.

nflows/setup.py

Lines 22 to 26 in ac0bf43

"matplotlib",
"numpy",
"tensorboard",
"torch",
"tqdm",

It should be noted that matplotlib is imported (but not used) here

from matplotlib import pyplot as plt

RealNVP can't reconstruct samples

Hi!

I was playing with RealNVP and tried to reconstruct samples after a forward and backward pass through the flow. After training the network on moons dataset, I ran the following program

x, label = datasets.make_moons(1000, noise=.05)
x = torch.from_numpy(x.astype(np.float32))

with torch.no_grad():
    z, _ = model._transform(x, context=torch.nn.Identity()(None))
    x_hat, _ = model._transform(z, context=torch.nn.Identity()(None))

The following plots show the distribution of the points after each pass:

[Figure: original x (realnvp_x)]

[Figure: normalized z (realnvp_z)]

[Figure: reconstruction x_hat (realnvp_x_hat)]

The forward pass looks great, since it produces a normal distribution. However, the backward pass doesn't reconstruct the input: x_hat does not resemble x as I expected. Any feedback on what is happening would be of great help. Thanks!

Beginner's question

Hi there,

I am rather new to normalizing flows and recently started looking into your library. I would therefore like to ask the following question:

In the readme, you append MaskedAffineAutoregressiveTransform and then RandomPermutation, while in the two moons example you append ReversePermutation and then MaskedAffineAutoregressiveTransform. Are those two ways equivalent? Which rules does one have to follow when appending transformations?

Best wishes,
Fabio

Reproducing Benchmark Results (MAF)

Hi!
I am trying to reproduce some benchmark results, such as those from the original MAF paper https://arxiv.org/pdf/1705.07057.pdf . For instance, MAF with 5 layers on HEPMASS achieves a test log-likelihood of about -17.70 (i.e. a loss of about 17.70) with a standard deviation of essentially 0. When rerunning the experiments with this library, I get significantly different results (given the tiny standard deviation).
Let me summarize some hyperparameters from the original paper that allow us to rebuild the employed MAF:
I) Model architecture:

  • MAF with 5 layers,
  • The MADE networks have 512 hidden features and 1 or 2 hidden layers,
  • Batch-Norm between flow layers (to be exact, the authors use Batch-Norm between every 2 autoregressive layers according to the paper, see Appendix B),
  • ReLU activations are used,
  • the permutation scheme is reverse permutations,
  • 2 models are trained simultaneously (with 1 and 2 made layers) and the best is being selected based on its validation set performance.

II) Optimization:

  • ADAM optimizer with a learning rate of 1e-4 and a weight decay of 1e-6,
  • batch size of 100,
  • early stopping with a patience set to 30 epochs,

Hence, the essential ingredients for retraining a MAF are the flow instance:
MaskedAutoregressiveFlow(features=D, hidden_features=512, num_layers=5, num_blocks_per_layer=1, use_residual_blocks=False, batch_norm_between_layers=True)
(Note that this implementation of MAF places a Batch-Norm layer between every pair of consecutive autoregressive layers; using Batch-Norm only after every 2 autoregressive layers gives similar results.);
and the optimizer:
optim.Adam(flow.parameters(), lr=1e-4, weight_decay=1e-6).

Data is obtained and preprocessed according to the original implementation https://github.com/gpapamak/maf .

I am wondering if my different results are due to the implementation in nflows or because I missed some architectural details. Besides helping me to reproduce the results, I think it would be beneficial to extend the example section of this repository by some benchmark computations like these. I am willing to help you with this task.

In the following, I present a minimal example that reproduces my results (Running this 3 times, I get final test losses of 17.943886, 18.005745, 17.997137):

import torch
import numpy as np
from datasets import hepmass # I have not included this script here, compare with https://github.com/gpapamak/maf
from torch import optim
from torch.utils.data import DataLoader
from torch.nn import functional as F

import nflows
from nflows.flows import Flow
from nflows.flows import MaskedAutoregressiveFlow

device = "cuda" if torch.cuda.is_available() else "cpu"

num_layers = 5
num_hiddenfeatures1 = 512
num_hiddenfeatures2 = 512
num_blocks1 = 1
num_blocks2 = 2
activation = F.relu

lr = 1e-4 # learning rate
lr_wd = 1e-6 # weight decay
batch_size = 100
num_epochs = 400 # set some upper limit for the number of epochs
num_patience = 30

data_train, data_val, data_test = hepmass.load_data_no_discrete_normalised_as_array(
        "data/hepmass")
D = 21


# batch norm after every layer
flow1 = MaskedAutoregressiveFlow(features=D, hidden_features=num_hiddenfeatures1, num_layers=num_layers, num_blocks_per_layer=num_blocks1, use_residual_blocks=False, batch_norm_between_layers=True).to(device)
flow2 = MaskedAutoregressiveFlow(features=D, hidden_features=num_hiddenfeatures2, num_layers=num_layers, num_blocks_per_layer=num_blocks2, use_residual_blocks=False, batch_norm_between_layers=True).to(device)

optimizer1 = optim.Adam(flow1.parameters(), lr=lr, weight_decay=lr_wd)
optimizer2 = optim.Adam(flow2.parameters(), lr=lr, weight_decay=lr_wd)

train_dataloader = DataLoader(data_train, batch_size=batch_size, shuffle=True)
test_dataloader = DataLoader(data_test, batch_size=batch_size, shuffle=True)
val_dataloader = DataLoader(data_val, batch_size=batch_size, shuffle=True)

# lists that are filled with epoch-averages:
loss_val_list1 = []
loss_trn_list1 = []
loss_val_list2 = []
loss_trn_list2 = []

# define early stopping counters
counter_es1 = 0
counter_es2 = 0

# set best validation loss to infinity before training and update it during training
best_val_loss1 = np.inf
best_val_loss2 = np.inf

# boolean: if early stopping criterion is attained, set to False and stop training
train_model1 = True
train_model2 = True

# training loop:
for e in range(num_epochs):
    loss_train1_thisepoch = []
    loss_train2_thisepoch = []
    for batch in train_dataloader:
        flow1.train()
        flow2.train()
        batch = batch.type(torch.float32).to(device)
        optimizer1.zero_grad()
        optimizer2.zero_grad()

        if train_model1:
            loss1 = -flow1.log_prob(inputs=batch).mean()
            loss1.backward()
            optimizer1.step()
            loss_train1_thisepoch.append(loss1.cpu().detach().numpy())
        if train_model2:
            loss2 = -flow2.log_prob(inputs=batch).mean()
            loss2.backward()
            optimizer2.step()
            loss_train2_thisepoch.append(loss2.cpu().detach().numpy())

    # compute epoch averages:
    if train_model1:
        loss_train1 = np.around(np.mean(loss_train1_thisepoch), decimals=2)
        loss_trn_list1.append(loss_train1)
    if train_model2:
        loss_train2 = np.around(np.mean(loss_train2_thisepoch), decimals=2)
        loss_trn_list2.append(loss_train2)

    # save validation losses of this epoch here:
    loss_val1_thisepoch = []
    loss_val2_thisepoch = []
    flow1.eval()
    flow2.eval()
    for val_batch in val_dataloader:
        val_batch = val_batch.type(torch.float32).to(device)
        if train_model1:
            loss_val1_thisepoch.append(np.around(torch.mean(-flow1.log_prob(val_batch)).cpu().detach().numpy(),decimals=2))
        if train_model2:
            loss_val2_thisepoch.append(np.around(torch.mean(-flow2.log_prob(val_batch)).cpu().detach().numpy(), decimals=2))

    if train_model1:
        loss_val_list1.append(np.mean(loss_val1_thisepoch)) # epoch average
        if np.mean(loss_val1_thisepoch) > best_val_loss1:
            if counter_es1 == num_patience - 1:  # stop training
                train_model1 = False
            print(f'Early Stopping counter (Model 1) {counter_es1 + 1}/{num_patience}')
            counter_es1 += 1
        else:
            counter_es1 = 0  # reset counter
            best_val_loss1 = np.mean(loss_val1_thisepoch)
            # save model
            torch.save(flow1.state_dict(), "model1")

    if train_model2:
        loss_val_list2.append(np.mean(loss_val2_thisepoch)) # epoch average
        if np.mean(loss_val2_thisepoch) > best_val_loss2:
            if counter_es2 == num_patience - 1:  # stop training
                train_model2 = False
            print(f'Early Stopping counter (Model 2) {counter_es2 + 1}/{num_patience}')
            counter_es2 += 1
        else:
            counter_es2 = 0  # reset counter
            best_val_loss2 = np.mean(loss_val2_thisepoch)
            # save model
            torch.save(flow2.state_dict(), "model2")

    if (train_model2 == False) and (train_model1 == False):
        print(f"Training finished after {e+1} Epochs!")
        break

    if train_model1:
        print(
            f'Epoch {e + 1}/{num_epochs} Model1: Train loss = {loss_train1}, Validation loss = {np.mean(loss_val1_thisepoch)}')
    if train_model2:
        print(
            f'Epoch {e + 1}/{num_epochs} Model2: Train loss = {loss_train2}, Validation loss = {np.mean(loss_val2_thisepoch)}')


### load model with minimal validation loss:
if counter_es1 > 0:
    flow1 = MaskedAutoregressiveFlow(features=D, hidden_features=num_hiddenfeatures2, num_layers=num_layers, num_blocks_per_layer=num_blocks1, use_residual_blocks=False, batch_norm_between_layers=True).to(device)
    flow1.load_state_dict(torch.load("model1", map_location=device))
torch.save(flow1.state_dict(), "model1")
if counter_es2 > 0:
    flow2 = MaskedAutoregressiveFlow(features=D, hidden_features=num_hiddenfeatures1, num_layers=num_layers, num_blocks_per_layer=num_blocks2, use_residual_blocks=False, batch_norm_between_layers=True).to(device)
    flow2.load_state_dict(torch.load("model2", map_location=device))
torch.save(flow2.state_dict(), "model2")

print(f"Model 1 (num_blocks={1}, num_hidden={512}) final validation loss: {best_val_loss1}")
print(f"Model 2 (num_blocks={2}, num_hidden={512}) final validation loss: {best_val_loss2}")
if best_val_loss2 > best_val_loss1:
    best_model = 1
else:
    best_model = 2

# print test loss:
flow1.eval()
flow2.eval()
with torch.no_grad():
    loss_test = []
    for test_batch in test_dataloader:
        test_batch = test_batch.type(torch.float32).to(device)
        if best_model == 1:
            loss = -flow1.log_prob(inputs=test_batch).mean()
        else:
            loss = -flow2.log_prob(inputs=test_batch).mean()
        loss_test.append(loss.cpu().detach().numpy())
average_testloss = np.mean(loss_test)

print("vanilla: Final Test loss after {} Epochs: {}".format(e + 1, average_testloss))

Thank you very much!
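The per-model bookkeeping in the script above (best validation loss, patience counter, stop flag, checkpoint on improvement) is written out twice, once per flow. A minimal sketch of factoring it into a helper follows; `EarlyStopping` is a hypothetical name, not an nflows class, and it mirrors the script's rule that a loss equal to the best one still counts as an improvement.

```python
class EarlyStopping:
    """Track the best validation loss; stop after `patience` epochs without improvement."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if the model should be checkpointed."""
        if val_loss <= self.best_loss:  # like the script, ties count as improvements
            self.best_loss = val_loss
            self.counter = 0
            return True
        self.counter += 1
        if self.counter >= self.patience:
            self.should_stop = True
        return False


# Usage with made-up validation losses (one stopper per model in the script above).
stopper = EarlyStopping(patience=3)
checkpoints = []
for epoch, val_loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.95, 0.7]):
    if stopper.step(val_loss):
        checkpoints.append(epoch)  # e.g. torch.save(flow1.state_dict(), "model1")
    if stopper.should_stop:
        break
```

With one `EarlyStopping` per flow, the loop body reduces to a call to `step` and a check of `should_stop`.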

How to construct a flow with a very high-dimensional base distribution

Thanks for your excellent work!

I want to use a normalizing flow to map a high-dimensional distribution (e.g. 128 dimensions) to a 2-D Gaussian distribution. How should I modify the code to do that?

What I want to do is to map the high-dimensional data onto this 2-D plane.

Better README

At the moment the framework is difficult to get started with for newcomers. Without reading the code it's not clear what modules there are and how they're supposed to be used.

To begin with, I think the README could be improved. We should include:

  • An overview of the API: the main classes (Distribution, Transformation, Flow) and how they're related.
  • An overview of implemented features: what flow/layer types are available.
  • A short, didactic snippet of code for creating a simple flow and setting up the optimization.

Requesting Advice on NF Methods

Dear Bayesians,

I am working on a project where I sample a set of n-dimensional points from a Gaussian distribution (of learnt parameters) as follows and then evaluate those points based on a loss function to update model parameters with gradient descent.

mu, std = self.lin_1(z), self.lin_2(z)
eps = torch.Tensor(*img_shape).normal_()
return self.act((eps.cuda() * std) + mu)

I would like to transform the Gaussian distribution for being able to sample those points from a more complex learnt distribution. In other words, the model needs to learn how to best transform points obtained from the Gaussian distribution.

I would be glad if you could suggest the best normalizing flow method (transform) to use given the following scalability requirements (whether or not it is available in this repo). Thank you very much in advance for your suggestion.

  • I am sampling 100K-dimensional points with a batch size of 5K, so scalability is crucial.
  • The method should be memory efficient and fast to train on an RTX-series desktop Nvidia GPU.
  • Ideally, there should be no additional regularization term in my current loss function.
  • Expressiveness of the method is not as important as scalability and robustness in training.

How to put a flow on the GPU?

I use the RealNVP flow from the examples folder and tinkered with it a little bit. It seems that it only works on the CPU.

class FlowModel(Flow):
    def __init__(...):
        transformations = CompositeTransform([...])
        super().__init__(
            transform=CompositeTransform([...]),
            distribution=StandardNormal((3, 32, 32)),
        )

flow = FlowModel()
flow = flow.to(device) # CUDA:0 here
...
flow.transform_to_noise(z) # error...

error:

File "/data/lhb/anaconda2/envs/python36/lib/python3.6/site-packages/nflows/transforms/base.py", line 52, in _cascade
    total_logabsdet += logabsdet
RuntimeError: expected backend CUDA and dtype Float but got backend CPU and dtype Float

I am pretty new to PyTorch but have some experience with TensorFlow. How do I put the flow on the GPU to train it? Thx!!

Array index out of range in "rational_quadratic" file

Hi,
I'm having some issues with this package, in file: "nflows/transforms/splines/rational_quadratic.py"
Line 114: input_bin_widths = widths.gather(-1, bin_idx)[..., 0]
the index bin_idx may be out of range: it takes values in {0, 1, ..., num_bins}, but widths has shape batch * num_bins.

I found a possible reason, in Line 78 and Line 79:

if torch.min(inputs) < left or torch.max(inputs) > right:
    raise InputOutsideDomain()

If some values in inputs are equal to right, they still pass this range check, but such a point lies exactly at the right endpoint of the last bin. The bin index computed in Line 111,
bin_idx = torchutils.searchsorted(cumwidths, inputs)[..., None]
then evaluates to num_bins for that point. (If I instead set inputs -= 1e-5, the corresponding bin_idx becomes num_bins - 1.)
When bin_idx contains a value equal to num_bins, Line 114, input_bin_widths = widths.gather(-1, bin_idx)[..., 0], raises an exception, because widths has shape batch * num_bins.

I think one possible solution is to add bin_idx[bin_idx == num_bins] -= 1 right after bin_idx is computed, that is:

if inverse:
    bin_idx = torchutils.searchsorted(cumheights, inputs)[..., None]
else:
    bin_idx = torchutils.searchsorted(cumwidths, inputs)[..., None]

bin_idx[bin_idx == num_bins] -= 1
input_cumwidths = cumwidths.gather(-1, bin_idx)[..., 0]
input_bin_widths = widths.gather(-1, bin_idx)[..., 0]
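The boundary case can be reproduced with a counting-style search. `searchsorted_count` below is a hypothetical NumPy stand-in written for illustration (it is not torchutils.searchsorted, whose details may differ); it shows an input exactly at `right` landing in bin `num_bins` and the proposed line moving it back into the last bin.

```python
import numpy as np


def searchsorted_count(bin_locations, inputs):
    # Index of the bin whose left edge is <= input: count edges <= input, minus one.
    return np.sum(inputs[..., None] >= bin_locations, axis=-1) - 1


num_bins = 4
cumwidths = np.linspace(0.0, 1.0, num_bins + 1)  # edges [0, .25, .5, .75, 1], left=0, right=1
inputs = np.array([0.0, 0.3, 1.0])              # 1.0 sits exactly on the right boundary

bin_idx = searchsorted_count(cumwidths, inputs)
print(bin_idx)  # [0 1 4]: 4 == num_bins is out of range for widths of length 4

bin_idx[bin_idx == num_bins] -= 1  # the proposed fix
print(bin_idx)  # [0 1 3]
```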

Saving models

I might be overlooking something obvious, but how can I save a trained model/weights during training? Or is there an indirect way to achieve this?

Never mind, I'm a beginner in PyTorch :). Just use torch.save() and torch.load().
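A minimal sketch of that approach: save the `state_dict` (parameters and registered buffers), then rebuild the same architecture and load the weights into it. nflows flows are `torch.nn.Module`s, so the same calls apply to them; `build_model` here is a hypothetical stand-in for your own flow constructor.

```python
import os
import tempfile

import torch
from torch import nn


def build_model():
    # Hypothetical stand-in for your flow constructor; it must be called with the
    # same arguments when rebuilding, or load_state_dict will report mismatches.
    return nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))


model = build_model()
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")

# Save only the state_dict (parameters and registered buffers).
torch.save(model.state_dict(), path)

# Later, e.g. in another script: rebuild the architecture, then load the weights.
restored = build_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()
```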

Is having negative loss okay?

Hi!
My colleague and I are using your package and encountered a strange phenomenon: we are getting negative loss values during training. That said, the loss graph as a whole looks like a normal loss graph (apart from the values).

[image: loss graph]

I am also pasting a code snippet of the initialization and the training:

class NormalizedFlowModel:
    def __init__(self, n_flows, pretrained_path=None, device='cpu', **kwargs):
        self.n_flows = n_flows
        self.net = ResnetAdapter(pretrained_path, device)

        self.latent_dim = kwargs.get('latent_dim', 512)
        self.device = device

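        # Build fresh layers for every step: `[...] * n_flows` would repeat the same
        # module objects, so all steps would share weights and a single permutation.
        layers = []
        for _ in range(n_flows):
            layers.append(transforms.MaskedAffineAutoregressiveTransform(features=self.latent_dim, hidden_features=2 * self.latent_dim))
            layers.append(transforms.RandomPermutation(features=self.latent_dim))
        self.transform = transforms.CompositeTransform(layers)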

        # Set target and q0
        base_distribution = distributions.StandardNormal(shape=[self.latent_dim])

        # Construct flow model
        self.flow = flows.Flow(transform=self.transform, distribution=base_distribution)
        self.flow.to(device)

    def train(self, nf_train_loader, **kwargs):
        n_epochs = kwargs.get('n_epochs', 5)
        lr = kwargs.get('lr', 1e-4)
        weight_decay = kwargs.get('weight_decay', 1e-5)

        optimizer = torch.optim.Adam(self.flow.parameters(), lr=lr, weight_decay=weight_decay)
        loss_list = []

        for epoch in tqdm(range(n_epochs), desc="epoch"):

            self.flow.train()
            self.net.eval()

            for batch_idx, (X, Y) in enumerate(nf_train_loader):
                batch_size = X.shape[0]

                X = X.to(self.device)

                with torch.no_grad():
                    outputs, _, latent = self.net(X)

                loss = -self.flow.log_prob(inputs=latent[-1]).mean()
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                loss_list.append(loss.item())

        return loss_list

We are using the feature map before the FC layer of the ResNet as inputs (that is, latent[-1]).
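Negative values are expected for continuous data: the loss is -log p(x), and a probability density (unlike a probability) can exceed 1, so log p(x) can be positive. For example, a 1-D Gaussian with standard deviation 0.1 has a density of about 3.99 at its mean, and with 512-dimensional latents such per-dimension contributions can add up. A plain-Python check:

```python
import math


def gaussian_log_density(x, mean=0.0, std=1.0):
    """log N(x | mean, std^2) for a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


density = math.exp(gaussian_log_density(0.0, std=0.1))
nll = -gaussian_log_density(0.0, std=0.1)
print(round(density, 2), round(nll, 4))  # 3.99 -1.3836
```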

Max size of conditional variable?

Forgive my naivety, but is there a max size or required size of the conditional(/context) feature for a given normalizing flow?

I have been trying to get a flow working with a large conditional feature vector, but am unable to get it working due to an apparent shape mismatch.

In looking at one of the examples, I noticed that the number of conditional features/context features was equal to twice the number of input features. Is this required, or is there a different way to pass conditional information (without having to compute the full joint distribution over the extra features)?

I thus got my code working by projecting my current conditional feature to this smaller-sized space. Is this standard for autoregressive transform-based flows? Or is there a way to get the flow to work with very large conditional feature vectors?

Here is some example code I am working with:

#Modeling a distribution in a space with `data_features` dimensions
# Thus, our context vector should have twice the dimensions:
hidden_features_transform = data_features * 2

# Create a projection matrix to encode the large conditional feature vector:
context_encoder = nn.Linear(conditional_features, hidden_features_transform)

# Give this encoder to the conditional base distribution
base_dist = ConditionalDiagonalNormal(
        shape=[data_features], 
        context_encoder=context_encoder)

# Create the flow with this many conditional features
transforms = []
for _ in range(num_layers):
    transforms.append(ReversePermutation(features=data_features))
    transforms.append(
        MaskedAffineAutoregressiveTransform(
            features=data_features,
            hidden_features=hidden_features_transform,
            context_features=conditional_features,
            num_blocks=5,
    ))
transform = CompositeTransform(transforms)
flow = Flow(transform, base_dist)

Currently, if I make this hidden_features_transform larger, I get shape mismatch errors, though I would expect this to just be another hidden layer hyperparameter I can change.

Thanks!
Miles

ConditionalDiagonalNormal sampling doesn't work when model is on GPU

Hello,

I am trying to implement a conditional flow model with ConditionalDiagonalNormal as the base distribution. When I try to sample while the model is on the GPU, I get an 'expected device cuda:0 but got device cpu' error. This is because the noise generated by torch.randn is not on the GPU. On the CPU, it works fine.

noise = torch.randn(context_size * num_samples, *self._shape)
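A device-safe version of that line allocates the noise on the same device as the distribution's parameters, for example with `device=means.device` or `torch.randn_like(means)`. In the sketch below, `means` and `log_stds` are stand-ins for the parameters computed from the context; this is not the library's code.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
context_size, num_samples, shape = 4, 10, (3,)

# Stand-ins for the means / log-stds computed from the context (already repeated per sample).
means = torch.zeros(context_size * num_samples, *shape, device=device)
log_stds = torch.zeros_like(means)

# Allocate the noise where the parameters live instead of on the default (CPU) device.
noise = torch.randn(context_size * num_samples, *shape, device=means.device)
# Equivalent shortcut: noise = torch.randn_like(means)
samples = means + torch.exp(log_stds) * noise
```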

Can't import `nflows`, get `No module named 'nflows.distributions'; 'nflows' is not a package`

Hi there,

I just installed the nflows package using pip, tried to run the example in IPython, and got this:

In [1]: from nflows import transforms, distributions, flows
   ...: 
   ...: # Define an invertible transformation.
   ...: transform = transforms.CompositeTransform([
   ...:     transforms.MaskedAffineAutoregressiveTransform(features=2, hidden_features=4),
   ...:     transforms.RandomPermutation(features=2)
   ...: ])
   ...: 
   ...: # Define a base distribution.
   ...: base_distribution = distributions.StandardNormal(shape=[2])
   ...: 
   ...: 
   ...: # Combine into a flow.
   ...: flow = flows.Flow(transform=transform, distribution=base_distribution)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-73851c844ba7> in <module>
----> 1 from nflows import transforms, distributions, flows
      2 
      3 # Define an invertible transformation.
      4 transform = transforms.CompositeTransform([
      5     transforms.MaskedAffineAutoregressiveTransform(features=2, hidden_features=4),

~/HEP_Tools/2HDMSP-1.1.2-Miguel/nflows.py in <module>
      9 from torch import nn, optim
     10 
---> 11 from nflows.distributions.normal import StandardNormal
     12 from nflows.flows.base import Flow
     13 from nflows.transforms.autoregressive import MaskedAffineAutoregressiveTransform

ModuleNotFoundError: No module named 'nflows.distributions'; 'nflows' is not a package

In [2]: 


Is there any other information you need from me to debug this further? Cheers

Further info

$ pip list --user | grep nflows
nflows                   0.14

and package exists in $HOME/.local/lib/python3.9/site-packages/nflows.

Using Manjaro Linux.
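The traceback's second frame is `~/HEP_Tools/2HDMSP-1.1.2-Miguel/nflows.py`, a local file rather than the installed package in site-packages. A file named nflows.py earlier on the import path would shadow the package, which matches the message 'nflows' is not a package. This standard-library sketch reports which file Python resolves for a module name, without importing it:

```python
import importlib.util


def module_origin(name):
    # Return the file Python would load for a top-level module `name`, or None if not found.
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Run from the directory where the error occurs: a path ending in .../nflows.py
# (instead of .../site-packages/nflows/__init__.py) means a local file shadows the package.
print(module_origin("nflows"))
```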

Help with the use of the quadratic splines.

Hello, I have an issue when I try to use the quadratic-spline autoregressive transform: I don't know how to use it with 1-D data. Say my data has shape [batch_size, 24]. How do I train it? I could reshape it to [batch_size, 1, 24] or [batch_size, 24, 1], but I am not sure how this library expects it.

Problem with transforming between distributions

I am trying to use nflows to map between two nontrivial probability distributions p_1, p_2. I first train a flow to learn distribution 1 with

transform_1 = CompositeTransform(transforms_1)
flow_1 = Flow(transform_1, base_dist)

for an array of transform objects and a standard normal base_dist. After training, samples from flow_1 match the distribution of the p_1 data.

I then train a flow to map from p_1 to p_2 with

transform_2 = CompositeTransform(transforms_2)
flow_2 = Flow(transform_2, flow_1)

So I am using flow_1 as the base density for this transformation. After training, samples from flow_2 match the distribution of the p_2 data.

However, if I try to map from p_1 to p_2 by taking a sample from p_1 and running transformed_dat = flow_2.transform_to_noise(p_1), transformed_dat does not match the distribution from p_2.

Is there an additional step in transforming between datasets that I need to carry out?

Possible memory leak (CPU) when using batch norm

Hi,

I'm seeing an unreasonable increase in RAM usage (of order GB) when training a normalising flow with batch norm between layers. I believe this is caused by the computation graph being extended each time the running mean is computed here (this pytorch issue reports a similar problem). This does not appear to be an issue when using batch norm within layers since they use nn.BatchNorm1d.

I will also submit a pull request with minor changes that should fix this (assuming it is indeed a bug).

To reproduce:

Run the following snippet and monitor RAM usage:

import sklearn.datasets as datasets
import torch
from torch import optim

from nflows.flows.realnvp import SimpleRealNVP

flow = SimpleRealNVP(2, 32, 4, 2, batch_norm_between_layers=True)
optimizer = optim.Adam(flow.parameters())

num_iter = 1000
for i in range(num_iter):
    x, y = datasets.make_moons(1024, noise=.1)
    x = torch.tensor(x, dtype=torch.float32)
    optimizer.zero_grad()
    loss = -flow.log_prob(inputs=x).mean()
    loss.backward()
    optimizer.step()
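The usual remedy for this kind of growth is to update running statistics outside autograd, for example under torch.no_grad() or with a detached batch mean, so that each update does not keep the previous batches' graphs alive. A torch-only sketch of that pattern follows; it is not necessarily the change made in the pull request mentioned above, and `weight` is just a stand-in for an upstream learnable layer.

```python
import torch

torch.manual_seed(0)
momentum = 0.05
running_mean = torch.zeros(2)                # running statistic, not a parameter
weight = torch.randn(2, requires_grad=True)  # stand-in for an upstream learnable layer

for _ in range(3):
    x = torch.randn(128, 2) * weight         # activations that carry autograd history
    batch_mean = x.mean(dim=0)
    # Without no_grad, this in-place update would attach each batch's graph to
    # running_mean, so memory could grow with every training step.
    with torch.no_grad():
        running_mean.mul_(1 - momentum).add_(momentum * batch_mean)
```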

Higher dimensional data leads to exploding likelihood

Hey,

while playing around I stumbled upon some behavior that I can't quite explain to myself. I implemented the flow

from nflows import transforms, distributions, flows
from nflows.transforms.autoregressive import MaskedPiecewiseRationalQuadraticAutoregressiveTransform as AutoregRQS

modules = []
for i in range(n_layers):
	modules.extend([AutoregRQS(features=24, num_bins=10, hidden_features=8, tails='linear', tail_bound=5),
	transforms.LULinear(24)])
modules.pop()
transform = transforms.CompositeTransform(modules)
base_distribution = distributions.StandardNormal(shape=[24])
flow = flows.Flow(transform=transform, distribution=base_distribution)

and used the negative log likelihood loss

def nll(batch, model):
    log_prob = -1 * model.log_prob(batch).mean(0)
    return log_prob

to train it on this dataset from the planar_datasets.py:

training_dataset = FourCircles(50000).data
training_dataset = training_dataset.repeat(1, 12)

As you can see, I repeated the dataset on purpose to blow up the dimensionality of the problem. When I keep the dataset in its regular 2-D form, everything works fine. But in this version, after training for 2 epochs (Adam optimizer, lr 0.001, cosine annealing, batch size 64), my loss goes down to as low as -45. It goes even lower if I train longer. That means my likelihood must be > e^45, which should not happen.

The reason might be a conceptual misunderstanding on my side, as I also ran into this issue with a custom-implemented Masked Autoregressive Flow. I hoped it was a bug in my own code and that implementing the model and the log_prob() computation with nflows would eliminate the issue, but unfortunately it did not.

Any ideas how this can be explained?

Cheers :)

EDIT:
I made some more observations that might help clarify this: in such a high-dimensional space, the density of the base distribution is spread over a larger volume, so the likelihood of noise points becomes very small everywhere. This gives the Jacobian determinant a greater influence on the negative log likelihood and ultimately leads to it blowing up.
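One concrete way to see the effect: repeating 2-D data 12 times gives 24-D data that lie on a 2-D subspace, leaving 22 directions with zero variance. A density can concentrate ever more tightly around that subspace, and its log-likelihood then grows without bound. The NumPy sketch below (random 2-D stand-in data instead of FourCircles) checks the rank of the covariance and evaluates a Gaussian with covariance `cov + eps * I` as `eps` shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.normal(size=(5000, 2))  # stand-in for 2-D data such as FourCircles
x24 = np.tile(x2, (1, 12))       # like .repeat(1, 12): 24 columns, only 2 distinct

mean = x24.mean(axis=0)
cov = np.cov(x24, rowvar=False, bias=True)
rank = np.linalg.matrix_rank(cov)  # the data occupy a 2-D subspace of R^24


def avg_gaussian_loglik(x, mean, cov):
    """Average log N(x | mean, cov) over the rows of x."""
    d = x.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    diff = x - mean
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return float(np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))


# Shrinking the variance along the 22 empty directions keeps increasing the likelihood.
logliks = [avg_gaussian_loglik(x24, mean, cov + eps * np.eye(24)) for eps in (1e-2, 1e-4, 1e-6)]
print(rank, logliks)
```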

AttributeError: module 'nflows' has no attribute 'utils'

Hello,

first of all, thanks for sharing your toolbox!

However, I have a minor problem using it. When I try to import the masked autoregressive flow using from nflows.flows import autoregressive as ar, I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-b04b6f7320e6> in <module>()
----> 1 from nflows.flows import autoregressive as ar

5 frames
/usr/local/lib/python3.6/dist-packages/nflows/flows/__init__.py in <module>()
----> 1 from nflows.flows.autoregressive import MaskedAutoregressiveFlow
      2 from nflows.flows.base import Flow
      3 from nflows.flows.realnvp import SimpleRealNVP

/usr/local/lib/python3.6/dist-packages/nflows/flows/autoregressive.py in <module>()
      3 from torch.nn import functional as F
      4 
----> 5 from nflows.distributions.normal import StandardNormal
      6 from nflows.flows.base import Flow
      7 from nflows.transforms.autoregressive import MaskedAffineAutoregressiveTransform

/usr/local/lib/python3.6/dist-packages/nflows/distributions/__init__.py in <module>()
----> 1 from nflows.distributions.base import Distribution, NoMeanException
      2 from nflows.distributions.discrete import ConditionalIndependentBernoulli
      3 from nflows.distributions.mixture import MADEMoG
      4 from nflows.distributions.normal import (
      5     ConditionalDiagonalNormal,

/usr/local/lib/python3.6/dist-packages/nflows/distributions/base.py in <module>()
      4 from torch import nn
      5 
----> 6 from nflows.utils import torchutils
      7 import nflows.utils.typechecks as check
      8 

/usr/local/lib/python3.6/dist-packages/nflows/utils/__init__.py in <module>()
----> 1 from nflows.utils.torchutils import (
      2     cbrt,
      3     create_alternating_binary_mask,
      4     create_mid_split_binary_mask,
      5     create_random_binary_mask,

/usr/local/lib/python3.6/dist-packages/nflows/utils/torchutils.py in <module>()
      1 """Various PyTorch utility functions."""
      2 
----> 3 import nflows.utils.typechecks as check
      4 import numpy as np
      5 import torch

AttributeError: module 'nflows' has no attribute 'utils'

I am using Google Colab and installed the toolbox using !pip3 install nflows. Any help is appreciated. Thank you!

Rational quadratic spline identity init

First of all, thanks for providing this awesome library.

In our applications, it is sometimes beneficial to initialize transforms as the identity.
We usually achieve this by setting all parameters to zero (e.g., shift and logscale in an affine coupling layer).

In your implementation of the rational quadratic spline, we can achieve this by replacing the line

derivatives = min_derivative + F.softplus(unnormalized_derivatives)

by something like

    import numpy as np
    ...
    mean_slope = (top-bottom) / (right-left)
    derivatives = mean_slope * F.softplus(unnormalized_derivatives, beta=np.log(2))
    derivatives =  derivatives.clip(min_derivative, None)

Is this something you would consider changing in your implementation?
Otherwise we can just go ahead and use an adaptation of your rq-spline code in our repo (of course with the appropriate references to your implementation).

Cheers,
Andreas

Tagging @invemichele @jonkhler
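A quick check of the arithmetic behind the proposal, with all unnormalized derivatives at zero: the current parameterization gives min_derivative + softplus(0) = 1e-3 + log 2 ≈ 0.694, whereas softplus with beta = log 2 gives log(2) / log(2) = 1 at zero, so the derivatives equal mean_slope. With equal bin widths and heights and every derivative equal to the bin slope, the rational-quadratic formula reduces to a straight line, which is the identity when the domain is square. The torch sketch below only checks the two derivative values; `mean_slope = 1.0` assumes a square domain.

```python
import math

import torch
import torch.nn.functional as F

u = torch.zeros(5)  # unnormalized derivatives when all parameters are zero
min_derivative = 1e-3

# Current parameterization: min_derivative + softplus(u), i.e. 1e-3 + log(2) at u = 0.
current = min_derivative + F.softplus(u)

# Proposed: mean_slope * softplus(u, beta=log 2), clipped below at min_derivative.
# softplus(0, beta) = log(2) / beta, which is exactly 1 for beta = log(2).
mean_slope = 1.0  # (top - bottom) / (right - left), assuming a square domain
proposed = (mean_slope * F.softplus(u, beta=math.log(2))).clamp(min=min_derivative)

print(current[0].item(), proposed[0].item())
```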

InputOutsideDomain() error raised with cubic spline

I am testing out a few different types of architectures for training normalising flows, and I am running into an unexpected bug.

The first architecture uses a MaskedPiecewiseRationalQuadraticAutoregressiveTransform:

transforms = []

for _ in range(num_layers):
      
        transforms.append(MaskedPiecewiseRationalQuadraticAutoregressiveTransform(features = n_features, hidden_features = 128, num_blocks = 2,  tail_bound=3.5, context_features=1,tails="linear",num_bins = 10))
        transforms.append(ReversePermutation(features=n_features)) 

This architecture trains without issue, and sampling from the base distribution produces good results.

However, when I replace MaskedPiecewiseRationalQuadraticAutoregressiveTransform with MaskedPiecewiseCubicAutoregressiveTransform, i.e.

transforms = []

for _ in range(num_layers):

        transforms.append(MaskedPiecewiseCubicAutoregressiveTransform(features = n_features, hidden_features = 128, num_blocks = 2, context_features=1, num_bins = 10))
        transforms.append(ReversePermutation(features=n_features))     

I get an error

  File "/global/home/users/rrmastandrea/computingML2/lib64/python3.6/site-packages/nflows/transforms/splines/cubic.py", line 85, in cubic_spline
    raise InputOutsideDomain()
nflows.transforms.base.InputOutsideDomain

It seems that both types of flows are implemented similarly in the nflows repo, so I am not sure why changing the type of transform would cause this error to be thrown.
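The two layer lists differ in more than the class: the rational-quadratic layers set tails="linear" and tail_bound=3.5, while the cubic layers pass neither. As far as I can tell, MaskedPiecewiseCubicAutoregressiveTransform has no tails option, so its spline is only defined on a bounded interval ([0, 1] by default) and raises InputOutsideDomain for anything outside it, via the same kind of min/max check quoted in the rational_quadratic issue earlier. A NumPy sketch of that check and of one hypothetical preprocessing step (min-max scaling into the interval):

```python
import numpy as np


def check_domain(inputs, left=0.0, right=1.0):
    # Mirrors the quoted spline check: reject any value outside [left, right].
    return not (inputs.min() < left or inputs.max() > right)


def to_unit_interval(x, margin=1e-3):
    # Hypothetical preprocessing: min-max scale each column into [margin, 1 - margin].
    lo, hi = x.min(axis=0), x.max(axis=0)
    return margin + (1 - 2 * margin) * (x - lo) / (hi - lo)


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 6))  # unbounded stand-in data
scaled = to_unit_interval(data)

print(check_domain(data), check_domain(scaled))  # False True
```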

If my data is 6-dimensional, how can I use this code to process it?

I want to use this code to predict stocks, but my data has shape (n, 6), while the moons example has only 2 dimensions, so I can't do this:

        xline = torch.linspace(-1.5, 2.5, 100)
        yline = torch.linspace(-.75, 1.25, 100)
        xgrid, ygrid = torch.meshgrid(xline, yline)
        xyinput = torch.cat([xgrid.reshape(-1, 3), ygrid.reshape(-1, 3)], dim=1)

        with torch.no_grad():
            zgrid = flow.log_prob(xyinput).exp().reshape(100, 100)

        plt.contourf(xgrid.numpy(), ygrid.numpy(), zgrid.numpy())
        plt.title('iteration {}'.format(i + 1))
        plt.show()
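For 6-D data, `flow.log_prob` expects inputs of shape (n, 6), so a 2-D contour plot can only show a slice: vary two chosen features on a grid and hold the other four fixed. The sketch below is a hypothetical helper (`log_prob_slice` is not part of nflows); it accepts any log-density function, so it is demonstrated with a standard-normal stand-in, and with a trained flow you would pass `flow.log_prob`. Note that a slice is not a marginal density.

```python
import math

import torch


def log_prob_slice(log_prob_fn, dims=(0, 1), fixed=None, num_features=6,
                   xlim=(-3.0, 3.0), ylim=(-3.0, 3.0), n=100):
    """Evaluate a D-dimensional log-density on an n x n grid over two features.

    The remaining features are held at `fixed` (zeros by default, an arbitrary
    choice), so the result is a slice through the density, not a marginal.
    """
    if fixed is None:
        fixed = torch.zeros(num_features)
    xline = torch.linspace(xlim[0], xlim[1], n)
    yline = torch.linspace(ylim[0], ylim[1], n)
    xgrid, ygrid = torch.meshgrid(xline, yline, indexing="ij")
    inputs = fixed.repeat(n * n, 1)  # shape (n * n, num_features)
    inputs[:, dims[0]] = xgrid.reshape(-1)
    inputs[:, dims[1]] = ygrid.reshape(-1)
    with torch.no_grad():
        values = log_prob_fn(inputs).reshape(n, n)
    return xgrid, ygrid, values


def standard_normal_log_prob(x):
    # Stand-in for flow.log_prob: log-density of a 6-D standard normal.
    return -0.5 * (x ** 2).sum(dim=-1) - 0.5 * x.shape[-1] * math.log(2 * math.pi)


xgrid, ygrid, values = log_prob_slice(standard_normal_log_prob)
```

With matplotlib, `plt.contourf(xgrid.numpy(), ygrid.numpy(), values.exp().numpy())` then draws the slice, as in the original example.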

Using SimpleRealNVP on 10 features converges to a different std

Hi!
We are using SimpleRealNVP and encountered a strange phenomenon. The flow converges to a seemingly normal distribution:
[image: distribution plot]

However, when plotting the covariance matrix (which should be the identity I), we get lower variances across the board:
[image: covariance matrix]

When we reduce the number of features, the variance increases; when we increase the number of features to 512, the variance drops further and is significantly closer to 0.

We could not figure out what is causing this problem. Any help would be appreciated :D
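For reference, the diagnostic itself is straightforward to compute: the empirical covariance of the latent codes, which should be close to I if the flow maps the data onto its standard-normal base. In this NumPy sketch the latents are simulated standard-normal draws, a stand-in for `flow.transform_to_noise(data)` converted to NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(20000, 10))  # stand-in for latents from flow.transform_to_noise(data)

cov = np.cov(z, rowvar=False)     # (10, 10) empirical covariance of the latents
print(np.round(np.diag(cov), 2))  # per-feature variances; ideally all close to 1
```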

Instability due to division by scale?

My issue is more mathematical in nature: in the inverse affine transform, one divides by the scale

outputs = (inputs - shift) / scale

The scale is the output of a softplus plus an epsilon, which is 1e-3 by default. If my scale always stays close to that minimum and my network has 5 affine transformations, this is potentially a division by a factor of (1e-3)^5 = 1e-15, and I observe samples of that order of magnitude during training/validation. Apart from increasing the 1e-3, is there a less artificial way to avoid this instability?
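The arithmetic behind the concern: in the inverse direction each affine layer divides by its scale, so five layers at the minimum scale of 1e-3 amplify by (1 / 1e-3)^5 = 1e15. A plain-Python sketch (`invert_affine_stack` is a hypothetical helper, not nflows code):

```python
def invert_affine_stack(x, scales, shifts):
    """Undo a stack of affine layers y = x * scale + shift, last layer first."""
    for scale, shift in zip(reversed(scales), reversed(shifts)):
        x = (x - shift) / scale
    return x


scales = [1e-3] * 5  # every layer sitting at the minimum scale
shifts = [0.0] * 5
result = invert_affine_stack(1.0, scales, shifts)
print(result)  # roughly 1e15
```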

Transforms Summary

Sorry if it is already here, but do you have a summary somewhere of the set of transforms (i.e., bijective mapping layers) that you have implemented? A short list of them, perhaps along with the papers they're from and how to use/stack them in practice, would be quite helpful.

I suppose this is related to issue #3.

This looks like a fantastic repo. Thanks for putting it up!

Citation

Sorry if I missed it, but how would you prefer nflows to be cited?

A bibtex entry for example would be useful. :)

Forward vs. Inverse

Hi,

From my understanding, to sample from a normalizing flow you sample from the base distribution and then push the samples forward through the transformations, but it seems you have decided to use the inverse. I would like the output of my flow to be bounded to [-1, 1], so I apply a tanh transformation at the end of my transforms list. However, I am not getting the behavior I expect. I was hoping you could clarify what forward and inverse mean in the code.

def build_model(num_layers=2, hids=100, dims=2, context_dims=2,
        batch_norm=False, activation=torch.nn.functional.relu, bins = 15, tail=8.0,
        bounds = 9, device = 'cuda'):
    context_net = nn.Sequential(nn.Linear(context_dims, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, 2*dims)
        )
    base_dist = nflows.distributions.ConditionalDiagonalNormal(
        shape=[dims], context_encoder= context_net)

    transforms = []

    def create_net(in_features, out_features):
        return nets.ResidualNet(
            in_features, out_features, context_features=context_dims,
            hidden_features=hids, num_blocks=2,
            use_batch_norm=batch_norm,
            activation=activation)

    for _ in range(num_layers):
        transforms.append(nflows.transforms.ReversePermutation(features=dims))
        if dims > 1:
            mask = nflows.utils.torchutils.create_mid_split_binary_mask(dims)
            transforms.append(
                nflows.transforms.PiecewiseRationalQuadraticCouplingTransform(
                    mask, create_net, tails='linear', num_bins=bins, tail_bound=tail,
                ))
        if dims == 1:
           transforms.append(
                nflows.transforms.MaskedPiecewiseRationalQuadraticAutoregressiveTransform(
                 features=dims,
                 hidden_features=hids,
                 context_features=context_dims,
                 tails='linear',
                 use_batch_norm=batch_norm,
                 num_bins=bins,
                 tail_bound = tail,
                 activation = activation))

    transforms.append(nflows.transforms.Tanh())
    transform = nflows.transforms.CompositeTransform(transforms)

    flow = nflows.flows.Flow(transform, base_dist)
    return flow

Best,
Lucas
