
Tutorial on Variational Autoencoders

Introduction

This code is a supplement to the Tutorial on Variational Autoencoders. It allows you to reproduce the example experiments in the tutorial's later sections.

This code contains two demos. The first is a standard Variational Autoencoder (VAE) for MNIST. The second is a Conditional Variational Autoencoder (CVAE) for reconstructing a digit given only a noisy, binarized column of pixels from the digit's center. For details on the experimental setup, see the paper.

No additional Caffe layers are needed to make a VAE/CVAE work in Caffe. The only requirement is a working Caffe/pycaffe installation. A GPU will make the experiments run faster, but is not necessary (comment out set_mode_gpu() in the Python scripts if you don't have one). On my system (a Titan X), these experiments all complete in about 10 minutes.
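For reference, the GPU/CPU toggle is just a pair of pycaffe calls at the top of each script (a minimal sketch; the device index is a placeholder for your setup):

  import caffe

  # Run on the GPU if one is available; the argument is the GPU id.
  caffe.set_device(0)
  caffe.set_mode_gpu()

  # On a CPU-only machine, comment out the two lines above and use:
  # caffe.set_mode_cpu()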

VAE and CVAE Network Structure

The code will generate a network drawing, but for convenience I've included the rendered drawings here. This is for the VAE:

[Figure: VAE train net]

[Figure: VAE test net]

Here is a side-by-side comparison between the CVAE and the regressor that solves the same problem. Note that both networks have several initial layers that construct the input and output data used to train the network.

[Figure: CVAE train net]

[Figure: CVAE test net]

[Figure: Regressor train net]

[Figure: Regressor test net]

Setup

  1. Install Caffe (see: Caffe installation instructions). Build Caffe and pycaffe. For this README, we'll call the installation path $CAFFE_PATH.

  2. Clone this repo. For this README, we'll call the installation path $TUTORIAL_PATH.

  git clone https://github.com/cdoersch/vae_tutorial.git

  3. Download MNIST using Caffe's pre-packaged downloader, and run create_mnist.sh to create an lmdb:

  cd $CAFFE_PATH/data/mnist/
  ./get_mnist.sh
  cd $CAFFE_PATH/
  ./examples/mnist/create_mnist.sh

  4. Optional: create a symlink for snapshots:

  cd $TUTORIAL_PATH
  ln -s [...] snapshots

Running the VAE

  1. Edit mnist_vae.prototxt and enter the correct "source" path to the training lmdb (line 13).

  2. Run the code. Make sure $CAFFE_PATH/python is on your PYTHONPATH.

  cd $TUTORIAL_PATH
  python mnist_vae.py

Note that the Python script is only required for generating the visualizations; the net can also be trained simply by calling:

  $CAFFE_PATH/build/tools/caffe train --solver=mnist_vae_solver_adam.prototxt
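Equivalently, training can be driven from pycaffe (a minimal sketch; caffe.get_solver reads the solver type, here Adam, from the prototxt):

  import caffe

  caffe.set_mode_gpu()  # or caffe.set_mode_cpu() on a CPU-only machine
  # Instantiate the solver (and its train/test nets) from the prototxt.
  solver = caffe.get_solver('mnist_vae_solver_adam.prototxt')
  solver.solve()  # runs until max_iter as set in the solver prototxt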

Running the CVAE

  1. Edit mnist_cvae.prototxt and enter the correct "source" path for both the training and testing lmdbs (lines 13 and 29).

  2. Run the code. Make sure $CAFFE_PATH/python is on your PYTHONPATH.

  cd $TUTORIAL_PATH
  python mnist_cvae.py

Note that the Python script is only required for generating the visualizations; the net can also be trained simply by calling:

  $CAFFE_PATH/build/tools/caffe train --solver=mnist_cvae_solver_adam.prototxt
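The same pycaffe pattern sketched above for the VAE also works here, pointed at mnist_cvae_solver_adam.prototxt instead.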
  3. Optional: do the same thing for the regressor to see the baseline results. After altering the "source" paths in mnist_regressor.prototxt, run:

  cd $TUTORIAL_PATH
  python mnist_regressor.py


Issues

The kldiv(power) layer

Thank you for this great tutorial and the code. But I have trouble understanding the kldiv(Power) layer in mnist_vae. According to the tutorial, the loss function should contain D[Q(z|X)||P(z)], not exp(D[Q(z|X)||P(z)]). So why do we need this layer?
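For reference, the closed-form KL term in question, for a diagonal Gaussian Q(z|X) = N(\mu, \sigma^2 I) against P(z) = N(0, I), is

  D[Q(z|X) \| P(z)] = \frac{1}{2} \sum_i \left( \sigma_i^2 + \mu_i^2 - \log \sigma_i^2 - 1 \right)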

Determinant in KLD

There is a log of the determinant of the covariance term in the KLD equation; however, I see that you have a sum instead of a product, whereas the determinant of a diagonal matrix would be a product. Could you provide some insight as to why this is?
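For reference, the log turns the determinant's product into a sum: for a diagonal covariance \Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_k^2),

  \log \det \Sigma = \log \prod_{i=1}^{k} \sigma_i^2 = \sum_{i=1}^{k} \log \sigma_i^2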

KL loss compared to Keras

Thank you for the nice tutorial and supporting code. I made a plot (attached) of KL loss vs. iterations for your implementation and for Keras's (blog, code). Could you please provide insight as to why the KL loss for your implementation goes up?

[Plot: KL loss vs. iterations, Caffe vs. Keras]

KL-divergence loss

Hi, I think there may be an issue in the model prototxt with the KL-divergence loss between Q(z|X) and P(z).
In the paper, the KL divergence is given by Equation 7:
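  \mathcal{D}\left[\mathcal{N}(\mu(X), \Sigma(X)) \,\|\, \mathcal{N}(0, I)\right] = \frac{1}{2}\left( \mathrm{tr}(\Sigma(X)) + \mu(X)^{\top}\mu(X) - k - \log\det(\Sigma(X)) \right)

(The issue embedded this equation as an image; it is reproduced here from the tutorial.)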

The first term is the trace of a diagonal matrix and should be the sum of the diagonal elements, e.g. x1 + x2 + x3.
But the model file implements the sum of the squares of the diagonal:

  layer {
    name: "var"
    type: "Eltwise"
    bottom: "sd"
    bottom: "sd"
    top: "var"
    eltwise_param {
      operation: PROD
    }
    include {
      phase: TRAIN
    }
  }

  layer {
    name: "kldiv_plus_half"
    type: "Eltwise"
    bottom: "meansq"
    bottom: "var"
    bottom: "logsd"
    top: "kldiv_plus_half"
    eltwise_param {
      operation: SUM
      coeff: 0.5
      coeff: 0.5
      coeff: -1.0
    }
    include {
      phase: TRAIN
    }
  }

That confuses me a bit.
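For reference, reading off the bottoms and coefficients, and assuming the blob names mean what they say (sd = \sigma, meansq = \mu^2, logsd = \log\sigma), these layers compute, per latent dimension,

  \mathrm{var}_i = \sigma_i^2, \qquad \mathrm{kldiv\_plus\_half}_i = \tfrac{1}{2}\mu_i^2 + \tfrac{1}{2}\sigma_i^2 - \log\sigma_i

which is the per-dimension KL term \tfrac{1}{2}(\mu_i^2 + \sigma_i^2 - \log\sigma_i^2 - 1) plus the constant \tfrac{1}{2} (hence the layer name).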

VAE loss seems to differ in paper and implementation

Hi, thanks for a great tutorial on VAEs!
I have a quick question about the implementation. In the tutorial, the reconstruction loss is L2 (as I thought it should be).
[Screenshot: L2 reconstruction loss from the tutorial]

However, in the Caffe implementation, there is what seems to be an additional cross-entropy reconstruction loss.

[Screenshot: cross-entropy loss layer from the prototxt]

What is the purpose of this loss? Or am I missing something?

I realise cross-entropy loss is often better for less blurry images, but since we parametrize P(X|z) by a Gaussian with mean f(z), I thought the log-likelihood should be proportional to ||X - f(z)||^2.
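For reference, with P(X|z) = N(f(z), \sigma^2 I), the log-likelihood is indeed quadratic in the reconstruction error:

  \log P(X \mid z) = -\frac{\|X - f(z)\|^2}{2\sigma^2} + \mathrm{const}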

Thank you!

Got the error "Check failed: target_blobs.size() == source_layer.blobs_size() (2 vs. 0)"

When I run python mnist_vae.py, I get "Check failed: target_blobs.size() == source_layer.blobs_size() (2 vs. 0) Incompatible number of blobs for layer decode4". The error comes from net = caffe.Net('mnist_vae.prototxt', 'snapshots/mnist_vae_iter_60000.caffemodel', caffe.TEST).
I have checked the code but found nothing wrong. Hoping for your help! 😭
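For reference, the load call from the traceback, written out (paths as in the issue; this assumes training has already produced the snapshot):

  import caffe

  caffe.set_mode_cpu()
  # Load the TEST-phase net and copy in the trained weights.
  net = caffe.Net('mnist_vae.prototxt',
                  'snapshots/mnist_vae_iter_60000.caffemodel',
                  caffe.TEST)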

Understanding input to mu and logsd layers

Hi, thanks for the great tutorial. I have trouble understanding the math. What is the reason to pass encode3 to logsd before the nonlinearity is applied? Why not give encode3neur to both mu and logsd? I would ask if it's a typo, but running the reference prototxt, I can make it converge.

[Screenshot: mu and logsd layers from the prototxt]

I have combined the VAE layers with convolution and deconvolution layers, and am having trouble training MNIST with this new architecture (using Sigmoid neurons instead of ReLU, if that matters).

Confused by loss=nan

I am running into loss=nan and am confused about why it happens with my own data.
I modified the "batch size" from 100 to 1, and correspondingly changed the "DummyData" shape dim from 100 to 1. But I don't know whether I should also modify the Reduction layer's loss_weight. Is that the key factor behind loss=nan?
