
taldatech / soft-intro-vae-pytorch


[CVPR 2021 Oral] Official PyTorch implementation of Soft-IntroVAE from the paper "Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders"

License: Apache License 2.0

Python 25.77% Jupyter Notebook 74.23%
cvpr2021 density-estimation image-generation pytorch soft-intro-vae soft-introvae vae vae-pytorch variational-autoencoder

soft-intro-vae-pytorch's People

Contributors: taldatech

soft-intro-vae-pytorch's Issues

Potential Bugs in the FID Calc?

Hi Daniel (@taldatech),

In the CIFAR-10 experiment, the generation quality of the checkpoint you provided looks like this:

[Image: CIFAR-10 training images, reconstructions, and samples from the provided checkpoint]

The first two rows are the training data; the following two rows are the reconstruction; the last four rows are the generation.

Using the --fid argument, I got a result similar to the one reported in the paper (4.37). However, the generation quality does not look that good, so I recomputed the FID manually with the pytorch_fid package and got 25.86, which I think is more plausible. I observed similar behavior on other datasets, such as CelebA. Could there be a bug in the FID calculation? Please correct me if I'm wrong.

Thanks,
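For reference when comparing the two numbers: FID is the Fréchet distance between Gaussians fitted to Inception-network activations of real and generated images, ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), and pytorch_fid computes it end to end with python -m pytorch_fid path/to/real path/to/generated. The score also depends on the number of images and on which reference set is used, so those are worth matching between the two measurements. Below is a minimal NumPy sketch of the distance itself, assuming the activations have already been extracted (random arrays stand in for them here). The trace of the matrix square root is taken from the eigenvalues of Σ_r Σ_g rather than from an explicit matrix square root.

```python
import numpy as np


def activation_stats(acts):
    """Mean and covariance of an (N, D) matrix of network activations."""
    return acts.mean(axis=0), np.cov(acts, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Tr((sigma1 @ sigma2)^{1/2}) equals the sum of the square roots of the
    eigenvalues of sigma1 @ sigma2, which are real and non-negative for
    PSD covariances (up to floating-point round-off).
    """
    diff = mu1 - mu2
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    # Discard tiny imaginary parts and negative round-off before the root.
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


# Usage with stand-in "activations" (not real Inception features).
rng = np.random.default_rng(0)
real = rng.standard_normal((500, 8))
fake = rng.standard_normal((500, 8)) + 0.5
print(frechet_distance(*activation_stats(real), *activation_stats(fake)))
```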

System Error

Hello, I am training soft-intro-vae on my own dataset, and this line of code always raises a SystemError:
if torch.isnan(lossD) or torch.isnan(lossE): raise SystemError
Can you explain this?
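That line is a divergence guard: training is aborted as soon as lossD or lossE (presumably the decoder and encoder losses) becomes NaN, so the SystemError itself carries no further information. A hypothetical, more descriptive check (check_finite_losses is not part of the repository) that also catches infinities and reports which loss failed at which step might look like this; with PyTorch tensors, the values would be passed as lossD.item() and lossE.item().

```python
import math


def check_finite_losses(step, **losses):
    """Raise a descriptive error if any named loss is NaN or infinite.

    Hypothetical helper: `losses` maps a loss name to a Python float.
    """
    bad = {name: value for name, value in losses.items() if not math.isfinite(value)}
    if bad:
        raise FloatingPointError(f"non-finite loss at step {step}: {bad}")


# Usage inside a training loop (values shown are placeholders):
check_finite_losses(10, lossD=1.5, lossE=0.3)
```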

generate function parameters

Hi,

Can you please tell me what the parameters "lod" and "blend_factor" mean in the generate function in /style_soft_intro_vae/model.py (line 159)?

Thank you
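As background, assuming the style-based model follows the progressive-growing convention of ALAE/StyleGAN (consistent with training-log lines such as "LOD: 0 - 4x4, blend: 1.000"): lod (level of detail) selects the output resolution, 4·2^lod pixels per side, and blend_factor linearly fades a newly added resolution in by mixing the new block's output with an upsampled copy of the previous level's output. The sketch below only illustrates that convention with NumPy and nearest-neighbour upsampling; it is not the repository's generate function, and both function names are hypothetical.

```python
import numpy as np


def lod_to_resolution(lod):
    """Assumed convention: LOD 0 -> 4x4, and each level doubles the side length."""
    return 4 * 2 ** lod


def blend_output(new_rgb, prev_rgb, blend_factor):
    """Fade in a new resolution level (hypothetical illustration).

    new_rgb:  (C, H, W) image from the newest block.
    prev_rgb: (C, H/2, W/2) image from the previous level.
    blend_factor = 0 -> only the upsampled previous level; 1 -> only the new level.
    """
    # Nearest-neighbour 2x upsampling of the previous level's image.
    prev_up = prev_rgb.repeat(2, axis=1).repeat(2, axis=2)
    return blend_factor * new_rgb + (1.0 - blend_factor) * prev_up
```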

Can't find the weighted sum of the extracted styles

Hi, I want to interpolate in the w latent space, not in the z latent space.
I think your encoding code returns a latent vector in z space, but I want to do more complex manipulations in w space.
How can I do this?

One more question: what is the learnable parameter Ci in your code? I want to check the Ci you mention in your paper.
Thanks.
[Image]
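Independent of how the w codes are obtained from the encoder, the interpolation step itself is a convex combination evaluated at evenly spaced t. A minimal NumPy sketch, assuming two codes w_a and w_b (hypothetical names) of the same shape are already available:

```python
import numpy as np


def lerp_path(w_a, w_b, num_steps):
    """Return num_steps codes linearly interpolated from w_a to w_b (endpoints included).

    Output shape: (num_steps, *w_a.shape).
    """
    ts = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - t) * w_a + t * w_b for t in ts])


# Usage with placeholder 512-dimensional codes.
path = lerp_path(np.zeros(512), np.ones(512), num_steps=8)
```

Each row would then be passed to the synthesis part of the decoder. In the StyleGAN literature, the Gaussian z space is usually interpolated spherically and the w space linearly.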

Digital-Monsters dataset

Dear Mr. Tal Daniel,
I hope this message finds you well. I am currently studying this project. Could you share the Digital-Monsters dataset with me? I would greatly appreciate it.

Thank you sincerely

Training interrupts on Google Colab notebook

Hey,
first of all, thanks for your great work!

I'm trying to run the code on a Google Colab machine.
I'm having issues with the output of the make_dataloader() function of TFRecordsDataset when training style-soft-intro-vae.

Training works fine until iteration 210 of the first epoch. Then the notebook suddenly stops without throwing any error.
For debugging, I tried iterating over the batches without executing any code in the for loop. The loop then successfully iterates over all the batches. However, when I try to get a new batches object from make_dataloader(), the notebook exits again without throwing any error.

Does this issue occur for anyone else?

Edit:
Running on an Azure VM with the same setup, but without a notebook, the training works. However, it would be interesting to know what exactly causes the issue on Google Colab.

Best regards

Image quality deteriorates at final image resolution

First off, amazing work on the soft-intro VAEs! Great results, and awesome job on sharing the code, including tutorials.

I'm reaching out for some assistance, as I've been able to reproduce high-quality images on the FFHQ dataset, but I'm struggling to achieve similar results with my own dataset of 256x256 histopathology images. The output seems to deteriorate significantly after the last resolution step. I'm wondering if you have any suggestions for improving the performance of the VAE on this type of dataset.

Here are some samples from mid-training:
[Image: sample_95_10]

Here are samples from well after the final resolution step:
[Image: sample_192_10]

Training loss:
[Image: training losses]

I've experimented with the loss hyperparameters: BETA_KL in [0.05, 0.2, 0.4] and BETA_REC in [0.05, 0.1, 0.2, 0.4], but this doesn't seem to help much. I also tried lowering the learning rate in the final step, with similar results. Any suggestions for improving performance?
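For context on what the two swept weights scale: in a β-weighted ELBO, BETA_REC multiplies the reconstruction term and BETA_KL the KL term. Soft-IntroVAE's full objective adds further terms (presumably involving BETA_NEG and SCALE from the config) that are not shown here. A minimal NumPy sketch of just the weighted ELBO part for a diagonal-Gaussian encoder, assuming a summed squared-error reconstruction loss (the repository's exact reconstruction loss and reductions may differ):

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dims, averaged over the batch."""
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return kl.mean()


def weighted_elbo_loss(x, x_rec, mu, logvar, beta_rec, beta_kl):
    """beta_rec * reconstruction + beta_kl * KL (sum-of-squares reconstruction assumed)."""
    # Sum the squared error over all non-batch axes, then average over the batch.
    rec = np.sum((x - x_rec) ** 2, axis=tuple(range(1, x.ndim))).mean()
    return beta_rec * rec + beta_kl * gaussian_kl(mu, logvar)
```

In a plain β-VAE, a larger KL weight pulls the posterior toward the prior (smoother samples, blurrier reconstructions), while a larger reconstruction weight does the opposite.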

Pre-trained model

Hi,

Can you please tell me whether a pre-trained model is available for soft_intro_vae, or do we have to train it ourselves? I want to try it on the CelebA 256x256 dataset.

Thank you

Question about make_recon_figure_interpolation_2_images.py

Hi, I have managed to interpolate linearly and smoothly, but I want to go further.
I want to disentangle the latent space.
For example, I want to control large-scale and small-scale features separately, as in StyleGAN2.
How can I do this?
Thanks!

One more thing: I was wondering about this line.

Why _latents[0, 0]? I did not understand it.
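StyleGAN-like decoders feed a copy of w to every synthesis layer, so controlling large-scale and small-scale features separately amounts to interpolating only some of the per-layer codes (the mechanism behind style mixing). A minimal NumPy sketch, assuming per-layer codes of shape (num_layers, latent_dim) ordered from coarse (low resolution) to fine; the function name and the sizes in the usage are hypothetical. If the latents in the script are likewise stored as (batch, num_layers, latent_dim), which is an assumption, then _latents[0, 0] would be the first layer's code of the first sample.

```python
import numpy as np


def interpolate_layers(w_a, w_b, t, layers):
    """Move only the selected layers of w_a toward w_b; other layers keep w_a.

    w_a, w_b: per-layer codes of shape (num_layers, latent_dim).
    layers:   iterable of layer indices to interpolate.
    """
    w = w_a.copy()
    idx = list(layers)
    w[idx] = (1.0 - t) * w_a[idx] + t * w_b[idx]
    return w


# Usage with hypothetical sizes: change only the first (coarsest) 4 layers.
rng = np.random.default_rng(0)
w_a, w_b = rng.standard_normal((2, 14, 512))
coarse_mix = interpolate_layers(w_a, w_b, 0.5, range(0, 4))
```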

Questions about Out-of-Distribution (OOD) Detection

Hi, experts. Nice work on embedding the ELBO into a GAN-style objective, and excellent analysis of the exponential ELBO.
I would like to understand in more detail how the log-likelihoods are measured on different datasets,
and how the statistics for the following figure are computed and plotted.
[Figure: OOD distribution]
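One common way to build such a comparison, not necessarily the procedure behind this figure, is to score every test image with the model's log-likelihood estimate (or ELBO), plot a histogram of the scores per dataset, and summarize the separation with the AUROC. A dependency-free sketch of the AUROC step, using its Mann-Whitney form (the probability that a random in-distribution score exceeds a random OOD score, with ties counting one half):

```python
def auroc(in_scores, ood_scores):
    """AUROC for separating in-distribution from OOD samples.

    Higher scores are assumed to mean "more in-distribution".
    Returns P(score_in > score_ood) + 0.5 * P(score_in == score_ood).
    """
    wins = 0.0
    for s_in in in_scores:
        for s_ood in ood_scores:
            if s_in > s_ood:
                wins += 1.0
            elif s_in == s_ood:
                wins += 0.5
    return wins / (len(in_scores) * len(ood_scores))
```

The double loop is O(n·m); for large test sets a rank-based implementation (for example scikit-learn's roc_auc_score) is preferable.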

Request of Pretrained Models

Hi Daniel,

Thanks for your great work. Could you provide a pre-trained model, e.g., on CIFAR-10? I encountered some problems when reproducing the experimental results in the paper.

Couldn't reconstruct images when using the trained model

Hi Daniel,

While training a model with the train_soft_intro_vae.py script, it reconstructed images well.

However, when I loaded the trained model in JupyterLab and tried to reconstruct images to check it, it produced chaotic images.

Could you tell me how to reconstruct images using the trained model?

Thank you

Images (real, recon, fake) during training:
[Image]

Images (real, recon) produced by the trained model:
[Image]
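Without the notebook code the cause cannot be pinned down, but two things are worth ruling out when outputs look fine in the training script and chaotic in a notebook. First, inference should run with model.eval() and with weights loaded into a model built with the same settings. Second, the display step must match the tensor layout: PyTorch images are channel-first (C, H, W), whereas matplotlib's imshow expects (H, W, C). A minimal NumPy sketch of that conversion, assuming the decoder outputs floats in [0, 1] (to_displayable is a hypothetical helper):

```python
import numpy as np


def to_displayable(img_chw):
    """Convert a (C, H, W) float image in [0, 1] to an (H, W, C) uint8 array for imshow."""
    img = np.clip(np.asarray(img_chw, dtype=np.float64), 0.0, 1.0)
    img = np.transpose(img, (1, 2, 0))  # channel-first -> channel-last
    return (img * 255.0 + 0.5).astype(np.uint8)  # round to the nearest integer level
```

For a PyTorch tensor x, this would be called as to_displayable(x.detach().cpu().numpy()).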

Question about an equation in the paper

Thanks for your nice work and for sharing the code.
I have a question about the paper. Since Soft-IntroVAE is a modified IntroVAE, I also read the IntroVAE paper. In Section 3 (Background), I think Equation (2) differs from the one in IntroVAE.

[Image]

This equation is from the IntroVAE paper, where E(x) = D_{KL}(q_φ(z|x) || p(z)).
[Image]
And this equation is from the Soft-IntroVAE paper. I think L_{E_φ}(x, z) is the same as in IntroVAE, but L_{D_θ} differs from the IntroVAE equation by an additional E(x) term.
If you don't mind, could you explain this? Sorry for my English.

Sample image question

Hi,
How should I interpret the sample image? I'm training the model with batch size 4, but I can't tell what each column or row represents.
Thanks.

a question

Dear Mr. Tal Daniel,
I apologize for taking up your time. I have a question and would like to ask for your help. I trained the Soft-IntroVAE model using the code you provided, and the images generated during training were very good. However, after training was completed, I used the trained model to generate images and got strange results.
This is an image generated during training:

[Image]

This is an image generated by calling the trained model:

[Images]
Is the method I used to call the trained model incorrect?
I would appreciate your advice on a solution. Thank you very much.

Some questions about the smooth interpolation in Fig. 17

Thank you so much for open source code!
But I have a question. In your paper, you said that you created the following images with smooth interpolation.
How did you do it? Could you share the code?
[Image: Fig. 17]

Thanks again.

Aborted (core dumped) error

Hi, I tried to start training, but the following error occurred. I installed the requirements, and my parameters are as follows. How can I solve this error?

NAME: ffhq
DATASET:
  PART_COUNT: 1
  SIZE: 85965
  FFHQ_SOURCE: /workspace/data/train_tf/
  PATH: /workspace/data/train_tf/

  PART_COUNT_TEST: 1
  PATH_TEST: /workspace/data/test_tf/

  SAMPLES_PATH: /workspace/data/test/
  STYLE_MIX_PATH: ./style_mixing

  MAX_RESOLUTION_LEVEL: 8
MODEL:
  LATENT_SPACE_SIZE: 512
  LAYER_COUNT: 7
  MAX_CHANNEL_COUNT: 512
  START_CHANNEL_COUNT: 64
  DLATENT_AVG_BETA: 0.995
  MAPPING_LAYERS: 8
  BETA_KL: 0.2
  BETA_REC: 0.1
  BETA_NEG: [2048, 2048, 2048, 1024, 512, 512, 512, 512, 512]
  SCALE: 0.000005
OUTPUT_DIR: ./output
TRAIN:
  BASE_LEARNING_RATE: 0.002
  EPOCHS_PER_LOD: 2
  NUM_VAE: 1
  LEARNING_DECAY_RATE: 0.1
  LEARNING_DECAY_STEPS: []
  TRAIN_EPOCHS: 300

  #                    4    8   16   32   64  128  256
  LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32] # If GPU memory ~16GB reduce last number from 32 to 24
  LOD_2_BATCH_4GPU: [ 512, 256, 128, 64, 32, 32, 16, 8, 4 ]
  LOD_2_BATCH_2GPU: [ 128, 128, 128, 64, 32, 16, 8 ]
  LOD_2_BATCH_1GPU: [ 32, 32, 32, 32, 16, 8, 4 ]

  LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]

2021-01-16 14:11:11,875 logger INFO: Running with config:
DATASET:
  FFHQ_SOURCE: /workspace/data/train_tf/
  FLIP_IMAGES: False
  MAX_RESOLUTION_LEVEL: 8
  PART_COUNT: 1
  PART_COUNT_TEST: 1
  PATH: /workspace/data/train_tf/
  PATH_TEST: /workspace/data/test_tf/
  SAMPLES_PATH: /workspace/data/test/
  SIZE: 85965
  SIZE_TEST: 25000
  STYLE_MIX_PATH: ./style_mixing
MODEL:
  BETA_KL: 0.2
  BETA_NEG: [2048, 2048, 2048, 1024, 512, 512, 512, 512, 512]
  BETA_REC: 0.1
  CHANNELS: 3
  DLATENT_AVG_BETA: 0.995
  ENCODER: EncoderDefault
  GENERATOR: GeneratorDefault
  LATENT_SPACE_SIZE: 512
  LAYER_COUNT: 7
  MAPPING_FROM_LATENT: MappingFromLatent
  MAPPING_LAYERS: 8
  MAPPING_TO_LATENT: MappingToLatent
  MAX_CHANNEL_COUNT: 512
  SCALE: 5e-06
  START_CHANNEL_COUNT: 64
  STYLE_MIXING_PROB: 0.9
  TRUNCATIOM_CUTOFF: 8
  TRUNCATIOM_PSI: 0.7
  Z_REGRESSION: False
NAME: ffhq
OUTPUT_DIR: ./output
PPL_CELEBA_ADJUSTMENT: False
TRAIN:
  ADAM_BETA_0: 0.0
  ADAM_BETA_1: 0.99
  BASE_LEARNING_RATE: 0.002
  EPOCHS_PER_LOD: 2
  LEARNING_DECAY_RATE: 0.1
  LEARNING_DECAY_STEPS: []
  LEARNING_RATES: [0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.0015, 0.002, 0.003, 0.003]
  LOD_2_BATCH_1GPU: [32, 32, 32, 32, 16, 8, 4]
  LOD_2_BATCH_2GPU: [128, 128, 128, 64, 32, 16, 8]
  LOD_2_BATCH_4GPU: [512, 256, 128, 64, 32, 32, 16, 8, 4]
  LOD_2_BATCH_8GPU: [512, 256, 128, 64, 32, 32, 32, 32, 32]
  NUM_VAE: 1
  REPORT_FREQ: [100, 80, 60, 30, 20, 10, 10, 5, 5]
  SNAPSHOT_FREQ: [300, 300, 300, 100, 50, 30, 20, 20, 10]
  TRAIN_EPOCHS: 300
Running on Quadro RTX 8000
2021-01-16 14:11:17,697 logger INFO: Trainable parameters decoder:
23984981
2021-01-16 14:11:17,698 logger INFO: Trainable parameters encoder:
26789952
2021-01-16 14:11:17,700 logger INFO: No checkpoint found. Initializing model from scratch
2021-01-16 14:11:17,700 logger INFO: Starting from epoch: 0
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: # Switching LOD to 0
2021-01-16 14:11:18,534 logger INFO: # Starting transition
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: ################################################################################
2021-01-16 14:11:18,534 logger INFO: # Transition ended
2021-01-16 14:11:18,535 logger INFO: ################################################################################
beta negative changed to: 2048
2021-01-16 14:11:18,535 logger INFO: Batch size: 32, Batch size per GPU: 32, LOD: 0 - 4x4, blend: 1.000, dataset size: 85965
Aborted (core dumped)
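The log ends without a Python traceback, which suggests the process was killed by SIGABRT raised in native code (for example a compiled extension or the data-loading library) rather than by a Python exception. Python's standard faulthandler module can print the Python stack at the moment of such a crash, which usually narrows down the failing call. It can be enabled with python -X faulthandler or the PYTHONFAULTHANDLER=1 environment variable, or at the top of the training script:

```python
import faulthandler

# Dump the Python stack of all threads to sys.stderr if the process receives a
# fatal signal such as SIGABRT ("Aborted (core dumped)") or SIGSEGV.
faulthandler.enable(all_threads=True)
```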
