rinuboney / ladder

Ladder network is a deep learning algorithm that combines supervised and unsupervised learning.

License: MIT License

Python 100.00%

ladder-network deep-learning-algorithms unsupervised-learning

ladder's Issues

Larger error for MNIST MLP baseline supervised compared to paper's results

Hello,

I could not find anyone who has tried this TF version of the ladder network in a fully supervised setting to compare with the paper's baseline MLP result from Table 1, so I tested that case.

I am getting a larger error when running the script entirely supervised (i.e. loss = corrupted supervised cost only, or all denoising costs set to zero) with num_labels = 60000. I tried two different Adam optimizer settings: (1) the default provided in this TF implementation, with LR decay starting at epoch 15, and (2) parameters matched to the paper's provided code, with LR decay starting at epoch 100, as seen here. After 150 epochs, setting 1 performed better (error decreases to ~10-15% early on but does not improve after that); for setting 2, error is larger throughout and generally >90% after 150 epochs. The paper reports an error of 0.80 (± 0.03)%, so I am a long way off and not sure where the problem could be. I also tried removing all parts of the code dealing with the unlabeled portion, since that is not used here, but had no luck. Any ideas on where the problem may be would be helpful.

Thanks!

incorrect mean cross entropy when batch sizes vary

The cross entropy is divided by batch_size 100 (here and here) but evaluation data is not fed in batches of that size.

Code could be:

-tf.reduce_mean(tf.reduce_sum(outputs*tf.log(y), 1))
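
The difference between the two normalizations can be checked numerically. The following is a minimal sketch in plain Python rather than TensorFlow, using made-up probabilities; the function names `cost_fixed_divisor` and `cost_mean` are hypothetical and only illustrate the two ways of averaging.

```python
import math

def cross_entropy_rows(targets, probs):
    """Per-example cross entropy: -sum_k t_k * log(p_k)."""
    return [-sum(t * math.log(p) for t, p in zip(t_row, p_row))
            for t_row, p_row in zip(targets, probs)]

def cost_fixed_divisor(targets, probs, batch_size=100):
    # Mirrors dividing the summed loss by a hard-coded batch size.
    return sum(cross_entropy_rows(targets, probs)) / batch_size

def cost_mean(targets, probs):
    # Mirrors -tf.reduce_mean(tf.reduce_sum(outputs * tf.log(y), 1)).
    rows = cross_entropy_rows(targets, probs)
    return sum(rows) / len(rows)

# Illustrative data: every example has the same per-example loss, log(2).
one_hot = [1.0, 0.0]
pred = [0.5, 0.5]
small = ([one_hot] * 10, [pred] * 10)
large = ([one_hot] * 1000, [pred] * 1000)

print(cost_mean(*small), cost_mean(*large))                    # equal
print(cost_fixed_divisor(*small), cost_fixed_divisor(*large))  # scale with the input size
```

With the mean, the reported cost does not depend on how many examples are fed at once; with the fixed divisor it grows with the number of examples.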

Normalization & Gamma

I don't understand two points in your code:

if l == L:
  # Convert z and apply softmax for the last layer. (TODO: Only for prediction or if we pass through encoder?)
  h = tf.nn.softmax(weights['gamma'][l-1] * (z+weights['beta'][l-1]))
else:
  h = tf.nn.relu(z + weights['beta'][l-1])  # TODO: No gamma?

Why do you leave out the gamma in the second calculation? In neither Valpola2015 nor Bengio2016 can I see a reason for this. Maybe you can explain.

Second:
def eval_batch_norm():
  # Evaluation batch normalization
  # obtain average mean and variance and use it to normalize the batch
  mean = ewma.average(running_mean[l-1])
  var = ewma.average(running_var[l-1])
  z = batch_normalization(z_pre, mean, var)
  return z

I don't understand why we are using the mean and std that we have accumulated from the labeled data (is this correct?). I thought that for evaluation we use the batch mean and batch std.

One more thing, which I may simply not have looked into closely enough: are we also saving the labeled data in the dictionary under 'unlabeled'?
I ask because in the decoding phase we only use the 'unlabeled' data, but we must also learn from the labeled data (take the reconstruction cost into account), especially if we have only labeled data.
I hope somebody is still active here!

Labelled / unlabelled data split by batch_size=100 during evaluation

During evaluation all the data has labels so the labelled and unlabelled split should reflect this I think. The current split by batch_size is incorrect but ultimately does not have an impact because both the labelled and unlabelled data get treated the same during evaluation.

Previously, though, this was not true: the split would have interacted with the batch normalization code being discussed in #3, which used the evaluation set mean and var to normalize the incorrectly split unlabelled data. None of my results discussed in #3 rely on changing this split; they only changed the evaluation results through changes to the batch normalization paths.

It may be clearer to split the inputs at a different point in the graph outside the encoder so that labelled data gets passed to the encoder with resulting labelled output nodes and then the same thing is done with unlabelled data to create different unlabelled output nodes.

I can attempt this in a pull request if you like.

why you put this line in code

Recurrent implementation

The original article mentions that this kind of model could easily be used in RNNs. I'm trying to implement such a model now, but I wonder which of the connections should be batch-normalized.
This paper (https://arxiv.org/pdf/1603.09025v4.pdf) describes applying batch normalization at several points in the LSTM computation, while another (https://arxiv.org/pdf/1510.01378v1.pdf) says that only the inputs should be normalized. The second approach works for my data; the first does not. Are there any ideas or research on how to effectively transform an LSTM into a Ladder LSTM?

Bug Report

In ladder.py, line 179, you normalized z_est by:
z_est_bn = (z_est[l] - m) / v
m, v are calculated in line 92,
m, v = tf.nn.moments(z_pre_u, axes=[0])
which means that v is variance, not the standard deviation. So line 179 should be modified to: z_est_bn = (z_est[l] - m) / tf.sqrt(v + epsilon)
where epsilon is a small number.
Hope that helps.
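
The effect described in this report can be illustrated with a small plain-Python sketch. It does not use TensorFlow; the `moments` helper is a hypothetical stand-in that returns the mean and population variance, and the input values and `epsilon` are made up for illustration.

```python
import math

def moments(column):
    """Mean and population variance of a list of numbers."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    return mean, var

def normalize_by_variance(column):
    # Divides by the variance, as in (z_est[l] - m) / v.
    m, v = moments(column)
    return [(x - m) / v for x in column]

def normalize_by_std(column, epsilon=1e-10):
    # Divides by the standard deviation, as in the proposed fix.
    m, v = moments(column)
    return [(x - m) / math.sqrt(v + epsilon) for x in column]

values = [0.0, 10.0, 20.0, 30.0]  # illustrative activations, variance 125
var_after_var = moments(normalize_by_variance(values))[1]
var_after_std = moments(normalize_by_std(values))[1]

print(var_after_var)  # about 1/125, not unit variance
print(var_after_std)  # about 1
```

Only the division by the standard deviation yields unit variance; the two versions coincide only when the variance happens to be 1.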

Computation of running average of layer std and mean

learning rate declining faster than straight line

To get a straight line decline, the learning rate update should be:

learning_rate = starter_learning_rate * ratio

Also, I don't think it is good practice to assign a tf.Variable() using =. I think creating a learning rate placeholder and then setting the learning rate in the feed_dict for training is better.
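
A small sketch of the two schedules, under the assumption that the faster-than-linear decline comes from multiplying the already decayed rate by `ratio` on every epoch. The function names and the values for `num_epochs` and `decay_after` are illustrative, not taken from the repository.

```python
def linear_ratio(epoch, num_epochs, decay_after):
    """Fraction of the starting rate left by a straight-line decay at `epoch`."""
    if epoch < decay_after:
        return 1.0
    return max(0.0, (num_epochs - epoch) / (num_epochs - decay_after))

def schedules(starter_learning_rate, num_epochs, decay_after):
    compounded, linear = [], []
    lr = starter_learning_rate
    for epoch in range(1, num_epochs + 1):
        ratio = linear_ratio(epoch, num_epochs, decay_after)
        lr = lr * ratio  # compounding: rescales the already decayed rate
        compounded.append(lr)
        linear.append(starter_learning_rate * ratio)  # straight line from the start value
    return compounded, linear

compounded, linear = schedules(0.02, num_epochs=10, decay_after=5)
for epoch, (c, s) in enumerate(zip(compounded, linear), start=1):
    print(epoch, round(c, 6), round(s, 6))
```

The `linear` values could then be passed to the training step through a learning rate placeholder in the feed_dict, as suggested above, instead of reassigning a variable.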

Definition of prediction cost and correct prediction -- only labelled?

Shouldn't the definitions for prediction cost and correct prediction (see current definitions below) be changed to only include labelled examples?

pred_cost = -tf.reduce_mean(tf.reduce_sum(outputs*tf.log(y), 1))  # cost used for prediction
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(outputs, 1))  # no of correct predictions

Type error!

When I run ladder.py, I get this error:

File "E:\DL_test\semi-supervised\ladder-master\input_data.py", line 158, in __init__
i = indices[y==c][:n_from_each_class]

TypeError: slice indices must be integers or None or have an __index__ method

Could someone help me? Many thanks!
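
One possible cause, offered as an assumption because the report does not show how `n_from_each_class` is computed: under Python 3 the `/` operator always returns a float, and a float cannot be used as a slice bound. The sketch below reproduces the same TypeError with a plain list and made-up values.

```python
n_labeled = 100  # illustrative values, not taken from the report
n_classes = 10
indices = list(range(50))

n_float = n_labeled / n_classes  # 10.0 under Python 3: true division yields a float
n_int = n_labeled // n_classes   # 10: floor division keeps an int

try:
    indices[:n_float]
    slice_error = None
except TypeError as exc:
    slice_error = str(exc)

first_ten = indices[:n_int]  # works with an integer bound

print(slice_error)
print(first_ten)
```

If that is the cause, computing the per-class count with `//` or wrapping it in `int(...)` would avoid the error.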

initialization value 0 should not be multiplied by tf.Variable()

Thanks very much for sharing your work. I was struggling to implement a version of this myself.

This code, however, doesn't match the outcomes in the paper and I think it is because of initializations like:

wi = lambda inits, name: inits * tf.Variable(tf.ones([size]), name=name)
a1 = wi(0., 'a1')

I think the built-in TensorFlow optimizers can only update trainable tf.Variables, so multiplying the tf.Variable by 0 effectively removes it from the optimization (its gradient is always zero), and a very different model is being fit.

You can see this by trying to duplicate the paper results by using the g_gauss function in the decoder and seeing that the d_cost outputs never change.

I think a fix is to move inits into the initialization value when defining a new variable. This way the value of inits will only impact the beginning value. Like this:

 wi = lambda inits, name: tf.Variable(inits * tf.ones([size]), name=name)

Although training is still running, the d_cost values are now updating and accuracy is much higher: 98.65% after 10 epochs.
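
The gradient argument can be sketched with a one-parameter toy problem in plain Python. This is not the repository's code: the squared-error loss, the `target` value and the function names are assumptions chosen only to show why a parameter multiplied by a zero constant never moves.

```python
def train_scaled(inits, target, steps=50, lr=0.1):
    """Parameter used as inits * w, like inits * tf.Variable(tf.ones(...))."""
    w = 1.0
    for _ in range(steps):
        value = inits * w
        grad_w = 2.0 * (value - target) * inits  # chain rule: d(value)/dw = inits
        w -= lr * grad_w
    return inits * w

def train_initialised(inits, target, steps=50, lr=0.1):
    """Parameter initialised to inits, like tf.Variable(inits * tf.ones(...))."""
    w = inits * 1.0
    for _ in range(steps):
        grad_w = 2.0 * (w - target)  # gradient of (w - target)**2
        w -= lr * grad_w
    return w

stuck = train_scaled(0.0, target=3.0)
learned = train_initialised(0.0, target=3.0)

print(stuck)    # stays at 0.0: the gradient is multiplied by inits = 0
print(learned)  # moves towards the target
```

With `inits = 0.` the first form both receives a zero gradient and always evaluates to zero, which would be consistent with the d_cost values never changing.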

The effect of tf.train.ExponentialMovingAverage

I have a little question about the code:
At line 55:
ewma = tf.train.ExponentialMovingAverage(decay=0.99)

My question is:

What is the role of tf.train.ExponentialMovingAverage? My guess is that it helps improve the final classification accuracy on MNIST. I also read the official TensorFlow documentation (https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage), but I don't really understand it. I guess the code would also run correctly without tf.train.ExponentialMovingAverage.

I hope someone can help me answer this question.
Thanks a lot.
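
As a rough illustration of what an exponential moving average does, here is a plain-Python sketch of the update rule given in the TensorFlow documentation, shadow = decay * shadow + (1 - decay) * value. The `MovingAverage` class and the sample values are hypothetical. In this code base the averaged quantities appear to be the per-layer batch mean and variance, which are later read back with ewma.average(...) for evaluation.

```python
class MovingAverage:
    """Keeps a smoothed ("shadow") copy of a value that changes every step."""

    def __init__(self, decay, initial):
        self.decay = decay
        self.shadow = initial

    def apply(self, value):
        # Each new value moves the shadow by only a (1 - decay) fraction.
        self.shadow = self.decay * self.shadow + (1.0 - self.decay) * value
        return self.shadow

# Illustrative noisy per-batch means that fluctuate around 1.0.
batch_means = [0.9, 1.2, 0.8, 1.1, 1.0] * 200

ema = MovingAverage(decay=0.99, initial=batch_means[0])
for m in batch_means:
    ema.apply(m)
smoothed = ema.shadow

print(smoothed)  # close to 1.0 even though single batches vary by up to 0.2
```

The average is not needed for the graph to run, but without it there would be no stable statistics to normalize with when evaluating.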

Intermediate layers do not make use of gamma parameters

Hi there,

First of all thanks for this implementation, I have been making good use of it in my efforts to study the ladder network.

One thing I noticed, in line 130 of the current ladder.py, you have:
h = tf.nn.relu(z + weights["beta"][l-1])

Shouldn't this be:
h = tf.nn.relu(weights['gamma'][l-1] * (z + weights["beta"][l-1]))
(as per equation 10 here)?

Regards,
Liam
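
One property that may be relevant to this question, offered as a sketch of the algebra only and not as a statement of why the code omits gamma: for a positive scale, relu(gamma * x) equals gamma * relu(x), so the scaling can be folded into the weights of the following layer. All values below are made up, and gamma is assumed to be positive.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

z = [-1.5, 0.5, 2.0]     # illustrative normalized pre-activations
beta = [0.1, -0.2, 0.3]
gamma = [2.0, 0.5, 3.0]  # assumed positive
W_next = [[1.0, -2.0, 0.5],
          [0.0, 1.0, 1.0]]

# Variant 1: gamma applied inside the ReLU.
h_gamma = relu([g * (x + b) for g, x, b in zip(gamma, z, beta)])
out_with_gamma = matvec(W_next, h_gamma)

# Variant 2: no gamma in the activation, gamma folded into the next layer's weights.
h_plain = relu([x + b for x, b in zip(z, beta)])
W_folded = [[w * g for w, g in zip(row, gamma)] for row in W_next]
out_folded = matvec(W_folded, h_plain)

print(out_with_gamma)
print(out_folded)
```

Both variants produce the same output of the next linear layer, so for positive gamma the extra parameter adds no expressive power in front of a ReLU that feeds a linear layer. The softmax in the last layer has no such following layer.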

PR to tensorflow

Can you open a WIP PR on Tensorflow?

clean batch normalization should use batch mean and var for training

I think you are using noise_std > 0 to distinguish both the clean path from the corrupted path and training from evaluation. This causes a problem: during evaluation, the batch norm mean and var should always be based on the training example averages, while during training batch norm is meant to introduce regularization via noise by using the batch mean and var.

I changed the code so that update_batch_norm only ran during training on the clean path and always normalized with the mean and var of the batch. Like this:

def update_batch_normalization(batch, mean, var, l):
  if mean is None or var is None:
    mean, var = tf.nn.moments(batch, axes=[0])
  assign_mean = running_mean[l-1].assign(mean)
  assign_var = running_var[l-1].assign(var)
  bn_assigns.append(ewma.apply([running_mean[l-1], running_var[l-1]]))
  with tf.control_dependencies([assign_mean, assign_var]):
    return (batch - mean) / tf.sqrt(var + 1e-12)

I passed a boolean placeholder to the encoder to separate training loops from evaluation loops. Then inside the encoder I used batch_norm to normalize by the running averages outside of training steps.

if training and noise_std == 0.0:
  z = join(update_batch_normalization(z_pre_l, m_l, v_l, l), batch_normalization(z_pre_u, m, v))
elif training:
  z = join(batch_normalization(z_pre_l, m_l, v_l), batch_normalization(z_pre_u, m, v))
else:
  mean = ewma.average(running_mean[l-1])
  var = ewma.average(running_var[l-1])
  z = join(batch_normalization(z_pre_l, mean, var), batch_normalization(z_pre_u, mean, var))

This may still not be completely right since I was making all examples labeled examples. With this and the variable initialization fix I trained with 60k labeled examples down to 0.59% error.
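
The training/evaluation split described above can be sketched in plain Python for a single feature. This is a simplified stand-in, not the repository's code: the `BatchNorm1D` class, its default values and the toy data are assumptions, and it ignores the labelled/unlabelled and clean/corrupted distinctions.

```python
import math

class BatchNorm1D:
    """Single-feature batch norm with separate training and evaluation behaviour."""

    def __init__(self, decay=0.99, epsilon=1e-10):
        self.decay = decay
        self.epsilon = epsilon
        self.running_mean = 0.0
        self.running_var = 1.0

    def __call__(self, batch, training):
        if training:
            # Training: normalize with the statistics of the current batch...
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            # ...and fold them into the running averages.
            self.running_mean = self.decay * self.running_mean + (1 - self.decay) * mean
            self.running_var = self.decay * self.running_var + (1 - self.decay) * var
        else:
            # Evaluation: normalize with the accumulated training statistics.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.epsilon) for x in batch]

bn = BatchNorm1D()
for _ in range(2000):
    train_out = bn([4.0, 6.0], training=True)  # toy batch with mean 5, variance 1

eval_out = bn([7.0], training=False)  # a single example can still be normalized

print(train_out)
print(eval_out)
```

Using the running averages at evaluation time makes the output for an example independent of whichever other examples happen to share its batch.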

ladder for CIFAR-10

Could you kindly help me run your code on the CIFAR-10 data?

z_est_bn = (z_est[l] - m) / v

Why do we need this line in the decoder (line 179), after batch normalization has already been applied?

It seems that the mean and v are applied to the output layer both before and after g_gauss:
u = batch_normalization(u)
z_est[l] = g_gauss(z_c, u, layer_sizes[l])
z_est_bn = (z_est[l] - m) / v

Index out of range in ladder.py

Hello, I have the following issue when running your code:

Traceback (most recent call last):
  File "/home/mblack/stage/ladder-master (1)/ladder.py", line 177, in <module>
    u = tf.matmul(z_est[l+1], weights['V'][l])
IndexError: list index out of range

Can you please tell me what's going wrong?

Error when running ladder.py

I get the following error when running ladder.py. Please advise on how to solve it.

=== Corrupted Encoder ===
Layer 1 : 784 -> 1000
Traceback (most recent call last):
  File "ladder.py", line 138, in <module>
    y_c, corr = encoder(inputs, noise_std)
  File "ladder.py", line 93, in encoder
    m, v = tf.nn.moments(z_pre_u, axes=[0])
  File "/Users/lfc009/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn.py", line 536, in moments
    divisor *= x.get_shape()[d].value
TypeError: unsupported operand type(s) for *=: 'float' and 'NoneType'