twostagevae's Issues
a problem about 'unpickle' function
when I try to run the cifar10 dataset, a problem occurs:

    Traceback (most recent call last):
      File "preprocess.py", line 190, in <module>
        preporcess_cifar10()
      File "preprocess.py", line 149, in preporcess_cifar10
        x_train = load_cifar10_data('training')
        images_array = np.concatenate(images_array, 0)
      File "preprocess.py", line 51, in load_cifar10_data
        img_dict = unpickle(filename)
    NameError: name 'unpickle' is not defined
Am I missing something?
How do I train on a custom dataset?
Hi, how can I train on a custom dataset? I looked at preprocess.py, and it seems I have to modify the code myself...
Default settings for reproducing the results in your paper
Hi. I really appreciate your work. I have implemented your paper in PyTorch and am now trying to reproduce the results in Table 1. Could you tell me the default settings you used to train the TwoStageVAE on CelebA and CIFAR-10? In your paper, you refer us to another paper for some of the hyperparameter settings, but not all of them; I think they are incomplete.
pre-processing CIFAR-10
I read the discussion on OpenReview about the sensitivity of the FID score to the min-max normalization of CIFAR-10 images.
The authors thought it was the min-max normalization carried out by imsave that changes the pixel values of images. However, I find that this is not the case.
In the reference guide of the old version of scipy (https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.misc.imsave.html), there is a warning message:
"This function uses bytescale under the hood to rescale images to use the full (0, 255) range if mode is one of None, 'L', 'P', 'l'. It will also cast data for 2-D images to uint32 for mode=None (which is the default)."
But bytescale() min-max normalizes the input array only if the dtype of the array is not uint8. The CIFAR-10 3-D images for Python are actually stored as np.uint8 arrays, so the min-max normalization was not executed.
It seems to be the compression of the JPG format (when imsave() saves a PIL.Image object to a .jpg file) that causes the changes in pixel values. See https://stackoverflow.com/questions/21949014/python-image-save-changes-the-data
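The lossy round trip is easy to check directly; a minimal sketch using Pillow (not tied to this repo's code):

```python
import io

import numpy as np
from PIL import Image

# A random uint8 image, shaped like a CIFAR-10 sample (32x32x3).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

def roundtrip(arr, fmt):
    # Save to an in-memory buffer in the given format, then reload.
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, format=fmt)
    buf.seek(0)
    return np.asarray(Image.open(buf))

print(np.array_equal(img, roundtrip(img, 'PNG')))   # lossless: True
print(np.array_equal(img, roundtrip(img, 'JPEG')))  # lossy: False
```

Saving the preprocessed images as PNG instead of JPG avoids the pixel-value changes entirely.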
Could you please provide a "requirements.txt" for the Python packages and their versions used in this repository?
Optimizing gamma to zero and mode collapse
Thank you for your very nice work. It was a very good read.
If you don't mind, I have a few questions:
- You argued for the importance of optimizing gamma, and showed that as gamma goes to zero, the VAE reconstructs the same x for any z ~ q(z|x). But do we really want this scenario? Isn't this mode collapse?
- If I understood correctly, the above happens because the injected noise is rapidly scaled down by the encoder variance, which goes to zero as gamma goes to zero. I think this means that q(z|x) converges to a delta on some of its dimensions (specifically, the non-superfluous ones). Then how does this have nonzero measure?
- Some works use pixel-wise gammas instead of a scalar one. Does your work generalize easily to this case?
- Given the interesting insights from your paper, I now wonder what we should optimize for. What metric should we track, e.g., for early stopping or model selection?
a) The VAE loss
b) Expectation (under z ~ q(z|x)) of the reconstruction loss (since you argue that perfect reconstruction happens at the optimum)
c) Deterministic reconstruction loss (i.e., using the mean of q(z|x))
d) Waiting until gamma gets under a certain small threshold
FID score calculation and its difference from the TF version
I am trying to understand how to calculate the FID scores mentioned in the paper using the fid_score.py file. I understand that the score can be calculated using the evaluate_fid_score method, but it seems to operate on .npy files. I want to know how much the FID score would differ if I instead used the method in fid.py from https://github.com/bioinf-jku/TTUR, since it allows calculating the score from two folders of images.
pre-processing CelebA
Hi, it seems you use a 128 by 128 center crop for CelebA, while Google uses a 160 by 160 center crop in their "Are GANs Created Equal?". See the code below from their repository:

    image = tf.image.resize_image_with_crop_or_pad(image, 160, 160)
    image = tf.image.resize_images(image, [64, 64])
Just wondering if it is a fair comparison? Thanks!
Does reconstruction loss dominate in the 2nd stage VAE?
Hello, I recently read this paper and found it fascinating and very relevant to my work. I am a little bit confused by one aspect of your approach, however. I understand the intuition that in a standard VAE, the reconstruction term dominates such that the model learns a useful latent representation of encoded data but fails to structure this space in a way that allows sampling novel, high-quality data.
My concern is: what prevents the second-stage VAE from falling into this same trap? Isn't it possible that the reconstruction loss on z will dominate in the second stage, so that it can encode and decode samples from the stage-one posterior but fail to structure the second latent space q(u) as normally distributed, so that we can't generate samples of z from the "empirical prior"?
The loss becomes negative, and its absolute value keeps growing?
I found a problem while training on my own dataset: after I changed the epoch and lr_epoch parameters, the loss easily becomes negative after about 200 epochs of training...
Values of reported KID Score
Hello!
First of all, thanks for sharing the repo!
I want to ask about the reported KID score values. In the paper available here: https://arxiv.org/pdf/1903.05789.pdf, the KID scores reported in Table 2 are much larger than typical KID values (see, for example, Figure 2 here: https://arxiv.org/pdf/1801.01401.pdf). Did you scale the values? Am I missing something?
Regards,
Szymon
Error when building Resnet and Wae models
I get errors when I build the Resnet and Wae models.
For Resnet the error is: assert(scales[-1] == desired_scale)
For Wae the error is: ValueError: Dimensions must be equal, but are 28 and 64 for 'sub_2' (op: 'Sub') with input shapes: [64,28,28,1], [64,64,64,3].
The Infogan model builds without errors.
preprocess.py line 164 typo: 'preporcess'
About finding a sequence of encoders
Hi, thank you for this interesting work! I was trying to read your proof in Appendix E2 and got confused about the design of the encoder networks. Ideally, if the decoder network is linear, i.e. f_{\mu_x}(z) = Az + b, the true posterior is also Gaussian with mean (\gamma I + A^T A)^{-1} A^T (x - b), which depends on \gamma. However, the mean of the variational posterior in this paper is f_{\mu_z}(x), which is independent of \gamma. Is there anything wrong? I was trying to figure this out in the proof that follows, but couldn't understand why equation (30) holds. Since the variable transformation is z' = (z - z^*) / \sqrt{\gamma}, shouldn't z' somehow depend on \gamma? If so, why can z' be canceled out in the second term of the second equality when taking the limit as \gamma goes to \infty?
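For reference, the linear-Gaussian posterior quoted in the question follows from standard Gaussian conditioning, assuming x | z ~ N(Az + b, \gamma I) and prior z ~ N(0, I):

```latex
\begin{aligned}
p(z \mid x) &\propto p(x \mid z)\, p(z)
  \propto \exp\!\Big(-\tfrac{1}{2\gamma}\,\lVert x - Az - b \rVert^2
    - \tfrac{1}{2}\,\lVert z \rVert^2\Big) \\
&= \mathcal{N}\!\big(z \mid \mu,\, \Sigma\big), \qquad
  \Sigma = \gamma\,(A^\top A + \gamma I)^{-1}, \qquad
  \mu = (A^\top A + \gamma I)^{-1} A^\top (x - b),
\end{aligned}
```

so both the mean and the covariance of the true posterior depend on \gamma, which is what the question is getting at.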
Could you please elaborate on your loss function?
I am trying to reimplement your code in PyTorch, and I need to know the difference between your loss function and the loss function of a vanilla VAE. Based on my experience, your KL-divergence formula is not the same as the one in the regular implementation of a VAE; there is a subtle difference between them. Could you please explain it in a little more detail? I also have a question about self.gen_loss1. Could you please explain it?
Another thing worth mentioning is that when I optimize the stage 1 network, by the end of training I get negative values for loss_gen1, which I think are related to self.loggamma_x. When I disabled it and left it constant at 0, the loss values did not become negative.
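For what it's worth, a Gaussian likelihood with a learnable log-gamma naturally produces negative losses once reconstructions are good, since a negative log-density can go below zero; a small sketch (gaussian_nll is illustrative and only an assumption about the kind of term behind gen_loss1, not the repo's exact code):

```python
import numpy as np

def gaussian_nll(x, x_hat, loggamma):
    # Per-pixel negative log-likelihood of x under N(x_hat, gamma^2),
    # with gamma parameterized via its log (as self.loggamma_x suggests).
    gamma = np.exp(loggamma)
    return 0.5 * ((x - x_hat) / gamma) ** 2 + loggamma + 0.5 * np.log(2.0 * np.pi)

# With near-perfect reconstructions, optimizing gamma drives loggamma far
# below zero, the density exceeds 1, and the NLL (hence the loss) goes negative.
x = np.zeros(4)
print(gaussian_nll(x, x, 0.0).mean())   # 0.5 * log(2*pi), about 0.92 > 0
print(gaussian_nll(x, x, -4.0).mean())  # about -3.08 < 0
```

Fixing loggamma at 0 reduces the term to half the squared error plus a positive constant, which is consistent with the loss no longer going negative in that case.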