taldatech / soft-intro-vae-pytorch

[CVPR 2021 Oral] Official PyTorch implementation of Soft-IntroVAE from the paper "Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders"

License: Apache License 2.0

Python 25.77% Jupyter Notebook 74.23%

cvpr2021 density-estimation image-generation pytorch soft-intro-vae soft-introvae vae vae-pytorch variational-autoencoder

soft-intro-vae-pytorch's Issues

Potential Bugs in the FID Calc?

Hi Daniel @taldatech ,

I found that in the CIFAR-10 experiment, the generation quality of the checkpoint you provided looks like this:

The first two rows are the training data; the following two rows are the reconstructions; the last four rows are the generated samples.

Using the --fid argument, I got a result similar to the one reported in the paper, 4.37. However, the generation quality does not look that good, so I manually recomputed FID using the pytorch_fid repo and got 25.86, which I think may be more reasonable. Similar behavior is observed on other datasets such as CelebA. So I suspect there might be a bug in the FID calculation? Correct me if I am wrong.

Thanks,
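For readers comparing the two numbers: FID is the Fréchet distance between two Gaussians fitted to feature activations, so the result depends on which images, how many of them, and which preprocessing feed the statistics. The sketch below shows only the distance itself, using NumPy and random placeholder activations; the function names are illustrative and this is neither the repository's code nor pytorch_fid's.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr(sqrt(sigma1 @ sigma2)) equals the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise).
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)

def activation_statistics(features):
    """Mean and covariance of an (n_samples, n_features) activation matrix."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

# Placeholder activations; real use would pass Inception features of images.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(2000, 8))
feats_b = rng.normal(loc=0.5, size=(2000, 8))

stats_a = activation_statistics(feats_a)
stats_b = activation_statistics(feats_b)
fid_same = frechet_distance(*stats_a, *stats_a)  # identical statistics
fid_diff = frechet_distance(*stats_a, *stats_b)  # shifted distribution
print(fid_same, fid_diff)
```

When two tools disagree, it may help to confirm that both use the same reference set (train or test split), the same number of generated samples, and the same image resizing before feature extraction.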

System Error

Hello, I am training soft-intro-vae on my own dataset, and these lines of code always raise a SystemError:
if torch.isnan(lossD) or torch.isnan(lossE):
    raise SystemError
Can you explain this?
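The quoted check stops training as soon as either loss becomes NaN, which usually indicates that the optimization has diverged numerically. A bare SystemError carries no detail, so a more descriptive guard can make debugging easier. The helper below is a hypothetical sketch (check_finite is not part of the repository); it works on plain Python floats, which in a PyTorch loop could be obtained with lossD.item() and lossE.item().

```python
import math

def check_finite(losses):
    """Raise a descriptive error if any scalar loss is NaN or infinite.

    `losses` maps a name to a Python float, e.g. {"lossD": lossD.item()}.
    """
    bad = {name: value for name, value in losses.items() if not math.isfinite(value)}
    if bad:
        raise RuntimeError(f"Non-finite loss values: {bad}")

check_finite({"lossD": 0.73, "lossE": 1.42})  # finite values pass silently

try:
    check_finite({"lossD": float("nan"), "lossE": 1.42})
except RuntimeError as err:
    message = str(err)
    print(message)
```

Knowing which loss failed first, and at which iteration, narrows down whether the learning rate, the loss scaling, or the input normalization needs adjusting for a new dataset.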

Reproducing 2d results

What are the 12 command line instructions necessary to reproduce these results?

Recommended Hyper-Params for The Enc-Dec Arch on MNIST

Hi Daniel,

Could you check the recommended hyperparameters for MNIST listed at
https://github.com/taldatech/soft-intro-vae-pytorch/tree/main/soft_intro_vae#recommended-hyperparameters?

I observed some odd behavior, for example:

The training process. kl_real > kl_fake.
Random sampling in the latent space (not perfect)

Thanks.

generate function parameters

Hi,

Can you please tell me what the parameters "lod" and "blend_factor" are in /style_soft_intro_vae/model.py, line 159 (the generate function)?

Thank you
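In progressively grown generators, "lod" usually stands for level of detail (the index of the current resolution, as in the training log line "LOD: 0 - 4x4, blend: 1.000"), and a blend factor fades the newly added higher-resolution output in over the upsampled output of the previous level. The NumPy sketch below illustrates that general idea only; the function names are illustrative and it is an assumption that the repository's generate function follows the same scheme.

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbour 2x upsampling of an (H, W, C) image."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def blend_resolutions(low_res, high_res, blend_factor):
    """Fade from the upsampled low-res output (0.0) to the high-res output (1.0)."""
    return blend_factor * high_res + (1.0 - blend_factor) * upsample2x(low_res)

low = np.zeros((4, 4, 3))   # stand-in for the output at the previous level
high = np.ones((8, 8, 3))   # stand-in for the output at the new level

half = blend_resolutions(low, high, blend_factor=0.5)  # mid-transition
full = blend_resolutions(low, high, blend_factor=1.0)  # transition finished
print(half.shape, half.mean(), full.mean())
```

With blend_factor equal to 1.0 the low-resolution branch contributes nothing, which matches a log entry reporting a finished transition.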

Cannot find the weighted sum of the extracted styles

Hi, I want to interpolate in the w latent space, not in the z latent space.
I think your encoding code returns a latent vector in z space, but I want to do more complex manipulation in w space.
How can I do this?

soft-intro-vae-pytorch/style_soft_intro_vae/make_figures/make_recon_figure_interpolation_2_images.py

Line 126 in 6942a49

z, mu, _ = model.encode(x, layer_count - 1, 1)

One more question: which learnable parameter in your code corresponds to Ci? I want to check the Ci you mentioned in your paper.
Thanks.

Digital-Monsters dataset

Dear Mr. Tal Daniel,
I hope this message finds you well. I am currently studying this project. Could you share the Digital-Monsters dataset with me? Your generosity in sharing it would be greatly appreciated.

Thank you sincerely

Training interrupts on Google Colab notebook

Hey,
first of all, thanks for your great work!

I'm trying to run the code on a Google Colab Machine.
I'm having issues with the output of the make_dataloader() function of TFRecordsDataset when training style-soft-intro-vae.

Training works fine until iteration 210 of the first epoch. Then the notebook suddenly stops without throwing any error.
For debugging, I tried iterating over the batches without executing any code in the for loop. The loop then successfully iterates over all the batches. However, when I try to get a new batches object from make_dataloader(), the notebook exits again without throwing any error.

Does this issue occur for anyone else?

Edit:
Running on an Azure VM with the same setup, but without a notebook, the training works. However, it would be interesting to know what exactly is causing the issue on Google Colab.

Best regards

Image quality deteriorates at final image resolution

First off, amazing work on the soft-intro VAEs! Great results, and awesome job on sharing the code, including tutorials.

I'm reaching out for some assistance, as I've been able to reproduce high-quality images on the FFHQ dataset, but I'm struggling to achieve similar results with my own dataset of 256x256 histopathology images. The output seems to deteriorate significantly after the last resolution step. I'm wondering if you have any suggestions for improving the performance of the VAE on this type of dataset.

Here are some images mid training:

Here, well after the final image resolution step:

Training loss:

I've experimented with the KL and reconstruction weights: BETA_KL in [0.05, 0.2, 0.4] and BETA_REC in [0.05, 0.1, 0.2, 0.4], but this doesn't seem to help much. I also tried lowering the learning rate in the final step, with similar results. Any suggestions to improve the performance?
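To keep such a sweep reproducible, the combinations can be enumerated explicitly rather than edited by hand. The sketch below builds one set of overrides per (BETA_KL, BETA_REC) pair from the values quoted above; the dotted key names are an assumption about how overrides might be passed and are not taken from the repository.

```python
from itertools import product

beta_kl_values = [0.05, 0.2, 0.4]
beta_rec_values = [0.05, 0.1, 0.2, 0.4]

# One dict of overrides per (BETA_KL, BETA_REC) combination.
runs = [
    {"MODEL.BETA_KL": kl, "MODEL.BETA_REC": rec}
    for kl, rec in product(beta_kl_values, beta_rec_values)
]

for index, run in enumerate(runs):
    print(index, run)
```

Logging the run index next to the sample images makes it easier to see whether any combination delays the deterioration at the final resolution.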

Pre-trained model

Hi,

Can you please tell me if there is a pre-trained model available for soft_intro_vae or do we have to train it ourselves? I want to try it on CelebA 256x256 dataset.

Thank you

Question about make_recon_figure_interpolation_2_images.py

Hi, I have managed to get smooth linear interpolation working, but I want to go further.
I want to disentangle the latent space.
For example, I don't want large features and small features to change together; I want to control them separately, similar to StyleGAN2.
How can I do this?
Thanks!

One more thing, I was wondering about this line.

soft-intro-vae-pytorch/style_soft_intro_vae/make_figures/make_recon_figure_interpolation_2_images.py

Line 154 in 6942a49

latents = _latents[0, 0]

Why _latents[0, 0]? I do not understand it.

Inconsistency between an equation and implementation in expELBO?

(Thank you for making the whole code publicly available. I especially like the easy-to-run Colab demo!)

In implementation, exp ELBO term is computed both for D(E(x)) and D(z). However the term is only computed for D(z) in Algorithm 1 in paper. Could you elaborate a bit more about the difference?
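For context, a soft-exponential term of the form (1 / alpha) * exp(alpha * ELBO) can be sketched as below. The function name and the default alpha are illustrative assumptions, and the snippet is not the repository's loss code. The same function could be evaluated on the ELBO of reconstructions D(E(x)) and on the ELBO of samples D(z), which is the difference the question asks about.

```python
import math

def soft_exp_elbo(elbo, alpha=2.0):
    """Soft-exponential term (1 / alpha) * exp(alpha * ELBO)."""
    return math.exp(alpha * elbo) / alpha

# For ELBO <= 0 the term lies in (0, 1 / alpha], so it stays bounded
# even when the ELBO of a poor sample is strongly negative.
values = {elbo: soft_exp_elbo(elbo) for elbo in (0.0, -0.1, -1.0, -10.0)}
for elbo, value in values.items():
    print(elbo, value)
```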

Questions about out-of-Distribution (OOD) Detection

Hi, experts. Nice work on embedding the ELBO into a GAN-like objective, and an excellent analysis of the exponential ELBO.
I would like to understand in more detail how the log-likelihoods are measured across different datasets.
How do you compute the statistics and generate the distributions shown in the following figure?
OOD distribution
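A common way to produce this kind of figure is to compute one score per image (for example an ELBO or log-likelihood estimate) for each dataset and then histogram the scores on shared bins. The sketch below uses random placeholder scores, since the actual per-image values are not given here; the variable names and the AUROC helper are illustrative, not the authors' evaluation code.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder scores; in practice these would be per-image likelihood estimates.
scores_in = rng.normal(loc=-100.0, scale=10.0, size=1000)   # in-distribution
scores_out = rng.normal(loc=-160.0, scale=15.0, size=1000)  # out-of-distribution

# Shared bin edges so the two histograms are directly comparable.
edges = np.histogram_bin_edges(np.concatenate([scores_in, scores_out]), bins=50)
hist_in, _ = np.histogram(scores_in, bins=edges, density=True)
hist_out, _ = np.histogram(scores_out, bins=edges, density=True)

def auroc(pos, neg):
    """Fraction of (pos, neg) pairs in which the in-distribution score is higher."""
    return float(np.mean(pos[:, None] > neg[None, :]))

separation = auroc(scores_in, scores_out)
print(hist_in.shape, hist_out.shape, separation)
```

The two density arrays can be drawn as overlaid bar or step plots, and the AUROC summarizes how well a threshold on the score separates the datasets.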

Request of Pretrained Models

Hi Daniel,

Thanks for your great work. Could you provide a pre-trained model, e.g., on CIFAR-10? I encountered some problems when reproducing the experimental results in the paper.

Couldn't reconstruct when using trained model.

Hi Daniel,

While I was training a model with the train_soft_intro_vae.py script, it reconstructed images well.

However, when I loaded the trained model in JupyterLab and tried to reconstruct images as a check, it generated chaotic output.

Could you tell me how to reconstruct images using the trained model?

Thank you

images (real, recon, fake) while training

images (real ,recon) which the trained model generated

Question about paper's equation

Thanks for your nice work and for letting us use the code.
I have a question about the paper. Since Soft-IntroVAE is a modification of IntroVAE, I read the IntroVAE paper. In Section 3 (Background), I think equation (2) differs from IntroVAE.

This equation is from the IntroVAE paper, where E(x) = D_{KL}(q_φ(z|x)||p(z))

And this equation is from the Soft-IntroVAE paper. I think L_{E_φ}(x, z) is the same as in IntroVAE, but L_{D_θ} differs from the IntroVAE equation by an additional E(x) term.
If you don't mind, could you explain this? Sorry for my bad English.

Sample image question

Hi,
How should I interpret the sample image? I'm training the model with batch size 4, but I can't work out what each column or row represents.
Thanks.

a question

Dear Mr. Tal Daniel
I apologize for taking up your time. I have a question and would like to ask for your help. I trained the soft_intro_vae model using the code you provided, and the images generated during training were very good. However, after training was completed, I used the trained model to generate images and got strange results.
This is the image generated during the training process

This is the image generated by calling the trained model

Is the way I called the trained model incorrect?
I would appreciate your advice on a solution. Thank you very much.

Some Question about smooth interpolation Fig.17

Thank you so much for open-sourcing the code!
I have a question, though. In your paper, you said that you created the following images with smooth interpolation.
How did you do it? Can you share the code?

Thanks Again.
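As a rough illustration of latent interpolation in general (not the authors' figure code, which lives in scripts such as make_recon_figure_interpolation_2_images.py mentioned in another issue), two encoded latent vectors can be linearly blended and each intermediate vector decoded into one frame. The helper name and the random stand-in latents below are assumptions for the sketch.

```python
import numpy as np

def lerp_latents(z_a, z_b, num_steps):
    """Return `num_steps` latent vectors evenly spaced from z_a to z_b (inclusive)."""
    ts = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - ts) * z_a[None, :] + ts * z_b[None, :]

rng = np.random.default_rng(0)
# Stand-ins for the latents of two encoded images.
z_a = rng.normal(size=512)
z_b = rng.normal(size=512)

path = lerp_latents(z_a, z_b, num_steps=8)
# Each row of `path` would then be passed to the decoder to render one frame.
print(path.shape)
```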

Aborted core dumped error

Hi, I tried to start training, but the following error occurred. I installed the requirements, and my parameters are as follows. How can I solve this error?

NAME: ffhq
DATASET:
  PART_COUNT: 1
  SIZE: 85965
  FFHQ_SOURCE: /workspace/data/train_tf/
  PATH: /workspace/data/train_tf/

  PART_COUNT_TEST: 1
  PATH_TEST: /workspace/data/test_tf/

  SAMPLES_PATH: /workspace/data/test/
  STYLE_MIX_PATH: ./style_mixing

  MAX_RESOLUTION_LEVEL: 8
MODEL:
  LATENT_SPACE_SIZE: 512
  LAYER_COUNT: 7
  MAX_CHANNEL_COUNT: 512
  START_CHANNEL_COUNT: 64
  DLATENT_AVG_BETA: 0.995
  MAPPING_LAYERS: 8
  BETA_KL: 0.2
  BETA_REC: 0.1
  BETA_NEG: [2048, 2048, 2048, 1024, 512, 512, 512, 512, 512]
  SCALE: 0.000005
OUTPUT_DIR: ./output
TRAIN:
  BASE_LEARNING_RATE: 0.002
  EPOCHS_PER_LOD: 2
  NUM_VAE: 1
  LEARNING_DECAY_RATE: 0.1
  LEARNING_DECAY_STEPS: []
  TRAIN_EPOCHS: 300

  # 4 8 16 32 64 128 256

  LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32] # If GPU memory ~16GB reduce last number from 32 to 24
  LOD_2_BATCH_4GPU: [ 512, 256, 128, 64, 32, 32, 16, 8, 4 ]
  LOD_2_BATCH_2GPU: [ 128, 128, 128, 64, 32, 16, 8 ]
  LOD_2_BATCH_1GPU: [ 32, 32, 32, 32, 16, 8, 4 ]

  LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]

2021-01-16 14:11:11,875 logger INFO: Running with config:
DATASET:
FFHQ_SOURCE: /workspace/data/train_tf/
FLIP_IMAGES: False
MAX_RESOLUTION_LEVEL: 8
PART_COUNT: 1
PART_COUNT_TEST: 1
PATH: /workspace/data/train_tf/
PATH_TEST: /workspace/data/test_tf/
SAMPLES_PATH: /workspace/data/test/
SIZE: 85965
SIZE_TEST: 25000
STYLE_MIX_PATH: ./style_mixing
MODEL:
BETA_KL: 0.2
BETA_NEG: [2048, 2048, 2048, 1024, 512, 512, 512, 512, 512]
BETA_REC: 0.1
CHANNELS: 3
DLATENT_AVG_BETA: 0.995
ENCODER: EncoderDefault
GENERATOR: GeneratorDefault
LATENT_SPACE_SIZE: 512
LAYER_COUNT: 7
MAPPING_FROM_LATENT: MappingFromLatent
MAPPING_LAYERS: 8
MAPPING_TO_LATENT: MappingToLatent
MAX_CHANNEL_COUNT: 512
SCALE: 5e-06
START_CHANNEL_COUNT: 64
STYLE_MIXING_PROB: 0.9
TRUNCATIOM_CUTOFF: 8
TRUNCATIOM_PSI: 0.7
Z_REGRESSION: False
NAME: ffhq
OUTPUT_DIR: ./output
PPL_CELEBA_ADJUSTMENT: False
TRAIN:
ADAM_BETA_0: 0.0
ADAM_BETA_1: 0.99
BASE_LEARNING_RATE: 0.002
EPOCHS_PER_LOD: 2
LEARNING_DECAY_RATE: 0.1
LEARNING_DECAY_STEPS: []
LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]
LOD_2_BATCH_1GPU: [32, 32, 32, 32, 16, 8, 4]
LOD_2_BATCH_2GPU: [128, 128, 128, 64, 32, 16, 8]
LOD_2_BATCH_4GPU: [512, 256, 128, 64, 32, 32, 16, 8, 4]
LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32]
NUM_VAE: 1
REPORT_FREQ: [100, 80, 60, 30, 20, 10, 10, 5, 5]
SNAPSHOT_FREQ: [300, 300, 300, 100, 50, 30, 20, 20, 10]
TRAIN_EPOCHS: 300
Running on Quadro RTX 8000
2021-01-16 14:11:17,697 logger INFO: Trainable parameters decoder:
23984981
2021-01-16 14:11:17,698 logger INFO: Trainable parameters encoder:
26789952
2021-01-16 14:11:17,700 logger INFO: No checkpoint found. Initializing model from scratch
2021-01-16 14:11:17,700 logger INFO: Starting from epoch: 0
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: # Switching LOD to 0
2021-01-16 14:11:18,534 logger INFO: # Starting transition
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: # Transition ended
2021-01-16 14:11:18,535 logger INFO: ################################################################################
beta negative changed to: 2048
2021-01-16 14:11:18,535 logger INFO: Batch size: 32, Batch size per GPU: 32, LOD: 0 - 4x4, blend: 1.000, dataset size: 85965
Aborted (core dumped)
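To help read the configuration, the per-LOD batch sizes can be paired with the resolution each level trains at. The log line "LOD: 0 - 4x4" and the sequence "4 8 16 32 64 128 256" in the config suggest that LOD 0 is 4x4 and each level doubles the side length; the sketch below relies on that reading and uses the LOD_2_BATCH_1GPU list from the posted config. The helper name is illustrative.

```python
lod_2_batch_1gpu = [32, 32, 32, 32, 16, 8, 4]  # LOD_2_BATCH_1GPU from the config

def lod_to_resolution(lod):
    """LOD 0 is 4x4 and every further level doubles the side length."""
    return 4 * 2 ** lod

schedule = [
    (lod, lod_to_resolution(lod), batch_size)
    for lod, batch_size in enumerate(lod_2_batch_1gpu)
]

for lod, resolution, batch_size in schedule:
    print(f"LOD {lod}: {resolution}x{resolution}, batch size {batch_size}")
```

The first entry matches the last log line before the crash (batch size 32 at 4x4), so the abort happens before or during the very first batch at the lowest resolution, which points toward data loading or the environment rather than GPU memory at high resolution.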
