
twostagevae's Issues

A problem with the 'unpickle' function

When I try to run preprocessing on the CIFAR-10 dataset, the following error occurs:
Traceback (most recent call last):
File "preprocess.py", line 190, in
preporcess_cifar10()
File "preprocess.py", line 149, in preporcess_cifar10
x_train = load_cifar10_data('training')
images_array = np.concatenate(images_array, 0)
File "preprocess.py", line 51, in load_cifar10_data
img_dict = unpickle(filename)
NameError: name 'unpickle' is not defined

Am I missing something?
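For reference, the CIFAR-10 website describes an unpickle helper along these lines (the encoding='bytes' argument is required on Python 3); defining it near the top of preprocess.py should resolve the NameError:

```python
import pickle

def unpickle(filename):
    # CIFAR-10 batch files are pickled dicts of numpy arrays; on Python 3
    # they must be loaded with encoding='bytes' (dict keys come back as bytes).
    with open(filename, 'rb') as f:
        return pickle.load(f, encoding='bytes')
```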

Default settings for reproducing the results in your paper

Hi. I really appreciate your work. I have implemented your paper in PyTorch and am now trying to reproduce the results in Table 1. Could you tell me the default settings for CelebA and CIFAR-10 that you used to train the TwoStageVAE? In your paper you refer us to another paper for some of the hyperparameter settings, but not all of them, and I think the description is incomplete.

pre-processing CIFAR-10

I read the discussion in OpenReview on the sensitivity of the FID score to the min-max-normalization of CIFAR-10 images.

The authors thought it was the min-max normalization carried out by imsave that changed the pixel values of the images. However, I find that this is not the case.

In the reference guide of the old version of scipy (https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.misc.imsave.html), there is a warning message:
"This function uses bytescale under the hood to rescale images to use the full (0, 255) range if mode is one of None, 'L', 'P', 'l'. It will also cast data for 2-D images to uint32 for mode=None (which is the default)."

But bytescale() only min-max normalizes the input array if the dtype of the array is not uint8. The CIFAR-10 images for Python are in fact stored as np.uint8 arrays, so the min-max normalization was never executed.

It seems instead to be the lossy JPEG compression (when imsave() saves the PIL.Image object to a .jpg file) that causes the changes in pixel values. See https://stackoverflow.com/questions/21949014/python-image-save-changes-the-data
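A small sketch illustrating the point: uint8 image data round-trips exactly through PNG (lossless), whereas JPEG compression generally alters pixel values.

```python
import os
import tempfile

import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# PNG is lossless: the saved-and-reloaded array is bit-identical.
path = os.path.join(tempfile.mkdtemp(), 'sample.png')
Image.fromarray(img).save(path)
restored = np.asarray(Image.open(path))
assert np.array_equal(img, restored)  # pixel values unchanged
```

Saving the same array to a .jpg file and reloading it would, in general, not satisfy this equality.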

Optimizing gamma to zero and mode collapse

Thank you for your very nice work. It was a very good read.
If you don't mind, I have a few questions:

  1. You argued for the importance of optimizing gamma, and showed that as gamma goes to zero, the VAE reconstructs the same x for any z ~ q(z|x). But do we really want this scenario? Isn't this mode collapse?
  2. If I understood correctly, the above happens because the injected noise is rapidly scaled down by the encoder variance, which goes to zero as gamma goes to zero. I think this means that q(z|x) converges to a delta on some of its dimensions (specifically, the non-superfluous ones). Then how does this retain nonzero measure?
  3. Some works use pixel-wise gammas instead of a scalar one. Does your work generalize easily to this case?
  4. Given the interesting insights from your paper, I now wonder what we should optimize for. What metric should we track, e.g. for early stopping and model selection?
    a) The VAE loss
    b) The expectation (under z ~ q(z|x)) of the reconstruction loss (since you argue that perfect reconstruction happens at the optimum)
    c) The deterministic reconstruction loss (i.e. using the mean of q(z|x))
    d) Waiting until gamma falls below a certain small threshold
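As a small illustration of option (d), one could monitor the learned log-gamma and stop first-stage training once gamma is sufficiently small. This is only a sketch with hypothetical names (loggamma_x follows the repository's naming), not a recommendation from the paper:

```python
import math

def should_stop(loggamma_x, threshold=1e-3):
    # Option (d): stop once the learned gamma = exp(loggamma_x)
    # has fallen below a small fixed threshold.
    return math.exp(loggamma_x) < threshold
```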

FID score calculation and its difference from the TF version

I am trying to understand how to calculate the FID scores reported in the paper using the fid_score.py file. I understand that the score can be computed with the evaluate_fid_score method, but it seems to operate on .npy files.

How different would the FID score be if I instead used the method in fid.py from https://github.com/bioinf-jku/TTUR, which can compute the score from two folders of images?
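One way to sidestep the comparison problem is to load the image folders into arrays yourself, so both FID implementations see identical pixels. A hypothetical helper (the (N, H, W, 3) uint8 layout is an assumption about what evaluate_fid_score expects):

```python
import os

import numpy as np
from PIL import Image

def folder_to_array(folder):
    # Load every image in a folder into one (N, H, W, 3) uint8 array,
    # in sorted filename order, so downstream FID code sees fixed pixels.
    files = sorted(os.listdir(folder))
    return np.stack([
        np.asarray(Image.open(os.path.join(folder, f)).convert('RGB'))
        for f in files
    ])
```

Note that if the folders contain JPEGs, the compression itself can already shift the FID relative to scores computed on raw arrays.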

pre-processing CelebA

Hi, it seems you use a 128 by 128 center crop for CelebA, while Google uses a 160 by 160 center crop in their paper "Are GANs Created Equal?". See the following code from their repository:

image = tf.image.resize_image_with_crop_or_pad(image, 160, 160)
image = tf.image.resize_images(image, [64, 64])

Just wondering if it is a fair comparison? Thanks!
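For concreteness, the two pipelines being compared differ only in the crop size before the 64x64 resize; a PIL stand-in for those TF ops (function name illustrative):

```python
from PIL import Image

def center_crop_resize(img, crop, size=64):
    # Center-crop a PIL image to crop x crop, then resize to size x size.
    # A 128 crop keeps a tighter face region than Google's 160 crop,
    # so the downstream 64x64 inputs cover different fields of view.
    w, h = img.size
    left, top = (w - crop) // 2, (h - crop) // 2
    return img.crop((left, top, left + crop, top + crop)).resize((size, size))
```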

Does reconstruction loss dominate in the 2nd stage VAE?

Hello, I recently read this paper and found it fascinating and very relevant to my work. However, I am a little confused by one aspect of your approach. I understand the intuition that in a standard VAE the reconstruction term dominates, so the model learns a useful latent representation of the encoded data but fails to structure the latent space in a way that allows sampling novel, high-quality data.

My concern is: what prevents the second-stage VAE from falling into this same trap? Isn't it possible that the reconstruction loss on z will dominate in the second stage, so that it can encode and decode samples from the first-stage posterior but fails to shape the second latent space q(u) into a normal distribution, in which case we cannot generate samples of z from the "empirical prior"?
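For context, the two-stage sampling procedure under discussion can be sketched with toy stand-ins for the trained networks (the linear maps and names here are purely illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def second_stage_decoder(u):
    # Stand-in for the learned map u -> z of the second-stage VAE.
    return 2.0 * u + 1.0

def first_stage_decoder(z):
    # Stand-in for the learned map z -> x of the first-stage VAE.
    return 0.5 * z

u = rng.standard_normal(16)   # u ~ N(0, I)
z = second_stage_decoder(u)   # approximate sample from the aggregate posterior q(z)
x = first_stage_decoder(z)    # final generated sample
```

The question above is whether the second stage really delivers z ~ q(z) when u ~ N(0, I), or whether its own prior mismatch recurs.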

Error when building Resnet and Wae models

I get errors when I build the Resnet and Wae models.
For Resnet the error is: assert(scales[-1] == desired_scale)
For Wae the error is: ValueError: Dimensions must be equal, but are 28 and 64 for 'sub_2' (op: 'Sub') with input shapes: [64,28,28,1], [64,64,64,3].

The Infogan model builds without errors.

About finding a sequence of encoders

Hi, thank you for this interesting work! I was trying to read your proof in Appendix E2 and got confused about the design of the encoder networks. Ideally, if the decoder network is linear, i.e. f_\mu_x(z) = Az + b, the true posterior is also Gaussian, with mean (\gamma I + A^T A)^{-1} A^T (x - b), which depends on \gamma. However, the mean of the variational posterior in this paper is f_\mu_z(x), which is independent of \gamma. Is anything wrong here? I was trying to figure this out in the proof that follows, but could not understand why equation (30) holds. Since the variable transformation is z' = (z - z*) / \sqrt{\gamma}, shouldn't z' somehow depend on \gamma? If so, why can z' be cancelled out in the second term of the second equality when taking the limit \gamma \to \infty?
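For reference, the standard linear-Gaussian posterior that the question appeals to, for a decoder x = Az + b + \epsilon with \epsilon ~ N(0, \gamma I) and prior z ~ N(0, I), is:

```latex
% True posterior p(z|x) for a linear decoder x = Az + b + \epsilon,
% \epsilon \sim \mathcal{N}(0, \gamma I), with prior z \sim \mathcal{N}(0, I):
\Sigma_{z|x} = \Bigl(I + \tfrac{1}{\gamma} A^{\top} A\Bigr)^{-1}, \qquad
\mu_{z|x} = \tfrac{1}{\gamma}\,\Sigma_{z|x}\, A^{\top} (x - b)
          = \bigl(\gamma I + A^{\top} A\bigr)^{-1} A^{\top} (x - b)
```

so the dependence of the exact posterior mean on \gamma that the question notes is indeed real.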

Could you please elaborate on your loss function?

I am trying to reimplement your code in PyTorch and would like to understand how your loss function differs from that of a vanilla VAE. In my experience, your KL-divergence formula is not the same as the one in the usual VAE implementation; there seems to be a subtle difference between them. Could you explain this in a little more detail? I also have a question about self.gen_loss1: could you explain it as well?

Another thing worth mentioning: when I optimize the stage-1 network, by the end of training I get negative values for loss_gen1, which I think are related to self.loggamma_x. When I disabled it and kept it constant at 0, the loss values did not become negative.
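For what it's worth, a negative value is expected here once gamma becomes small, because a continuous Gaussian density can exceed 1, making its negative log-likelihood negative. A minimal sketch of such a reconstruction term with a learnable scalar log-gamma (names illustrative, not the repository's exact code):

```python
import numpy as np

def gaussian_nll(x, x_hat, loggamma):
    # -log N(x | x_hat, gamma^2 I), summed over pixels.  With near-perfect
    # reconstructions and small gamma, the density exceeds 1 per pixel,
    # so this sum can legitimately go negative during training.
    gamma_sq = np.exp(2.0 * loggamma)
    return 0.5 * np.sum((x - x_hat) ** 2 / gamma_sq
                        + 2.0 * loggamma
                        + np.log(2.0 * np.pi))
```

Fixing loggamma at 0 pins the per-pixel term at 0.5 * log(2*pi) + 0.5 * (x - x_hat)^2 >= 0, which matches the observation that the loss stays non-negative in that case.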
