rinuboney / ladder
Ladder network is a deep learning algorithm that combines supervised and unsupervised learning.
License: MIT License
Hello,
I could not find any report of this TF version of the ladder network being run fully supervised for comparison with the paper's baseline MLP result in Table 1, so I tested that case myself.
The fully supervised run (i.e. loss = corrupted supervised cost only, or all denoising costs set to zero) with num_labels = 60000 gives a much larger error than the paper reports. I tried two Adam optimizer settings: (1) the defaults provided in this TF implementation, with LR decay starting at epoch 15, and (2) parameters matched to the paper's published code, with LR decay starting at epoch 100, as seen here. After 150 epochs, setting 1 performed better (error decreases to ~10-15% early on but does not improve after that); with setting 2, error is larger throughout and generally >90% after 150 epochs. The paper reports an error of 0.80 (± 0.03)%, so I am a long way off and not sure where the problem could be. I also tried removing all parts of the code that deal with the unlabeled portion, since it is not used here, but that did not help. Any ideas on where the problem may be would be helpful.
Thanks!
I don't understand two points in your code:
if l == L:
    # Convert z and apply softmax for the last layer. (TODO: Only for prediction or if we pass through encoder?)
    h = tf.nn.softmax(weights['gamma'][l-1] * (z + weights['beta'][l-1]))
else:
    h = tf.nn.relu(z + weights['beta'][l-1])  # TODO: No gamma?
Why do you leave out the gamma in the second calculation? In neither Valpola2015 nor Bengio2016 can I see a reason for this. Maybe you can explain.
Second:
def eval_batch_norm():
    # Evaluation batch normalization
    # obtain average mean and variance and use it to normalize the batch
    mean = ewma.average(running_mean[l-1])
    var = ewma.average(running_var[l-1])
    z = batch_normalization(z_pre, mean, var)
    return z
I don't understand why we are using the mean and std that we have accumulated from the labeled data (is this correct?). I thought that for evaluation we use the batch mean and batch std.
Also another thing: I am sure I was just too lazy to understand, but are we also saving the labeled data in the dictionary under 'unlabeled'?
The reason I think this is that in the decoding phase we only use the 'unlabeled' data, but we must also learn from the labeled data (taking the reconstruction cost into account), especially if we have only labeled data.
I hope there is still somebody active here!
During evaluation all the data has labels so the labelled and unlabelled split should reflect this I think. The current split by batch_size is incorrect but ultimately does not have an impact because both the labelled and unlabelled data get treated the same during evaluation.
Previously, though, this was not true, and this split would have coincided with the batch normalization code being discussed in #3 to use evaluation set mean and var for normalization of incorrectly split unlabelled data. None of my results discussed in #3 rely on changing this; they only changed the evaluation results through changes to the batch normalization paths.
It may be clearer to split the inputs at a different point in the graph outside the encoder so that labelled data gets passed to the encoder with resulting labelled output nodes and then the same thing is done with unlabelled data to create different unlabelled output nodes.
I can attempt this in a pull request if you like.
The original article mentions that this kind of model could easily be used in RNNs. I'm trying to implement such a model now, but I wonder which of the connections should be batch-normalized.
This paper (https://arxiv.org/pdf/1603.09025v4.pdf) describes how to apply batch normalization at several points in the LSTM computation. Another (https://arxiv.org/pdf/1510.01378v1.pdf) says that only the inputs should be normalized, and it is this approach, not the first, that works for my data. Are there any ideas or research on how to effectively transform an LSTM into a Ladder LSTM?
In ladder.py, line 179, you normalized z_est by:
z_est_bn = (z_est[l] - m) / v
m, v are calculated in line 92,
m, v = tf.nn.moments(z_pre_u, axes=[0])
which means that v is the variance, not the standard deviation. So line 179 should be modified to:
z_est_bn = (z_est[l] - m) / tf.sqrt(v + epsilon)
where epsilon is a small number.
Hope that helps.
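The difference between dividing by the variance and dividing by the standard deviation can be checked with a small sketch in plain Python. The helper names moments and normalize and the sample values are assumptions for illustration; they mimic, but are not, tf.nn.moments and the repository's code.

```python
import math

def moments(column):
    """Population mean and variance of a list of floats."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    return mean, var

def normalize(column, mean, var, epsilon=1e-10):
    """Standardize using the square root of the variance (the proposed fix)."""
    return [(x - mean) / math.sqrt(var + epsilon) for x in column]

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data
m, v = moments(values)

by_std = normalize(values, m, v)        # divides by sqrt(v + epsilon)
by_var = [(x - m) / v for x in values]  # divides by v, as in the reported line

print(moments(by_std)[1])  # close to 1: unit variance
print(moments(by_var)[1])  # 1 / v: not unit variance unless v == 1
```

Only the first version yields unit variance in general; dividing by v does so only when v happens to equal 1.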
To get a straight line decline, the learning rate update should be:
learning_rate = starter_learning_rate * ratio
Also, I don't think it is good practice to assign a tf.Variable() using =. I think creating a learning rate placeholder and then setting the learning rate in the feed_dict for training is better.
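A minimal sketch of the straight-line schedule in plain Python follows. The function name linear_decay_lr and the parameters num_epochs and decay_after are assumptions for illustration; only starter_learning_rate and ratio come from the post above. The returned value is what one would pass through feed_dict to a learning rate placeholder.

```python
def linear_decay_lr(starter_learning_rate, epoch, num_epochs, decay_after):
    """Learning rate for a 0-based epoch index (hypothetical helper).

    Constant until decay_after, then scaled by a ratio that falls
    linearly towards zero at num_epochs.
    """
    if epoch < decay_after:
        return starter_learning_rate
    ratio = max(0.0, (num_epochs - epoch) / (num_epochs - decay_after))
    # Always scale the *starting* rate, never the already-decayed rate.
    return starter_learning_rate * ratio

schedule = [linear_decay_lr(0.02, e, num_epochs=150, decay_after=15)
            for e in range(150)]
```

Multiplying the current rate by ratio each epoch instead would compound the factors, so the rate would curve downward faster than a straight line.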
Shouldn't the definitions for prediction cost and correct prediction (see current definitions below) be changed to only include labelled examples?
pred_cost = -tf.reduce_mean(tf.reduce_sum(outputs*tf.log(y), 1)) # cost used for prediction
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(outputs, 1)) # no of correct predictions
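As a rough sketch of what restricting these two quantities to labelled examples could look like, here is a plain-Python version. The function labelled_metrics and the boolean mask is_labelled are hypothetical and do not appear in the repository; how labelled rows are actually identified there is not shown in this thread.

```python
import math

def labelled_metrics(probs, one_hot, is_labelled):
    """Cross-entropy and accuracy over labelled rows only (hypothetical helper)."""
    rows = [(p, t) for p, t, keep in zip(probs, one_hot, is_labelled) if keep]
    # Mean over rows of -sum_k t_k * log(p_k), mirroring pred_cost.
    cost = -sum(
        sum(t_k * math.log(p_k) for p_k, t_k in zip(p, t)) for p, t in rows
    ) / len(rows)
    # Fraction of rows where the arg-max of prediction and target agree.
    correct = sum(p.index(max(p)) == t.index(max(t)) for p, t in rows)
    return cost, correct / len(rows)

probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # illustrative softmax outputs
one_hot = [[1, 0], [0, 1], [0, 0]]            # last row has no label
is_labelled = [True, True, False]

cost, accuracy = labelled_metrics(probs, one_hot, is_labelled)
```

The unlabelled third row is skipped entirely, so it affects neither the cost nor the accuracy.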
When I run ladder.py, I get this error:
  File "E:\DL_test\semi-supervised\ladder-master\input_data.py", line 158, in __init__
    i = indices[y==c][:n_from_each_class]
TypeError: slice indices must be integers or None or have an __index__ method
Could someone help me? Many thanks!
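A likely cause, stated here as an assumption because the traceback does not show how n_from_each_class is computed, is that it comes from a / division. Under Python 3 that produces a float, and a float cannot be used as a slice bound. The sketch below reproduces the error with a plain list and made-up values, then shows integer division as one possible fix.

```python
n_labeled = 100  # illustrative values, not taken from the repository
n_classes = 10
indices = list(range(50))

# True division yields a float (10.0) in Python 3.
n_from_each_class = n_labeled / n_classes
try:
    indices[:n_from_each_class]
except TypeError as err:
    message = str(err)

# Floor division keeps the slice bound an int.
n_from_each_class = n_labeled // n_classes
subset = indices[:n_from_each_class]
```

Wrapping the value in int(...) would work as well; which is preferable depends on whether n_labeled is guaranteed to be divisible by n_classes.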
Thanks very much for sharing your work. I was struggling to implement a version of this myself.
This code, however, doesn't match the outcomes in the paper and I think it is because of initializations like:
wi = lambda inits, name: inits * tf.Variable(tf.ones([size]), name=name)
a1 = wi(0., 'a1')
I think the built-in TensorFlow optimizers can only update trainable tf.Variables, so multiplying a tf.Variable by 0 effectively removes it from the optimization, and a very different model is being fit.
You can see this by trying to duplicate the paper's results with the g_gauss function in the decoder and observing that the d_cost outputs never change.
I think a fix is to move inits into the initialization value when defining a new variable. This way the value of inits only affects the starting value. Like this:
wi = lambda inits, name: tf.Variable(inits * tf.ones([size]), name=name)
Although training is still running, the d_cost values are now updating and accuracy is much higher: 98.65% after 10 epochs.
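The effect can be reproduced without TensorFlow. The sketch below runs hand-written gradient descent on a single scalar; the function fit, the quadratic loss and the target value are assumptions for illustration. When inits multiplies the variable from outside, the chain rule scales the gradient by inits, so inits = 0 freezes the parameter.

```python
def fit(inits, scale_outside, target=1.0, lr=0.1, steps=100):
    """Minimize (a - target)**2, where a is built from a variable v.

    scale_outside=True  mimics  inits * Variable(ones)   (reported pattern)
    scale_outside=False mimics  Variable(inits * ones)   (proposed fix)
    """
    v = 1.0 if scale_outside else inits * 1.0
    for _ in range(steps):
        a = inits * v if scale_outside else v
        grad_a = 2.0 * (a - target)
        # Chain rule: da/dv is inits in the first case and 1 in the second.
        grad_v = grad_a * (inits if scale_outside else 1.0)
        v -= lr * grad_v
    return inits * v if scale_outside else v

frozen = fit(0.0, scale_outside=True)    # gradient is always zero
learned = fit(0.0, scale_outside=False)  # moves towards the target
```

With inits = 0 the first form never leaves 0, while the second starts at 0 and converges towards the target.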
I have a little question about the code:
At line 55:
ewma = tf.train.ExponentialMovingAverage(decay=0.99)
My question is:
I hope someone can help me answer this question.
thanks a lot.
Hi there,
First of all thanks for this implementation, I have been making good use of it in my efforts to study the ladder network.
One thing I noticed, in line 130 of the current ladder.py, you have:
h = tf.nn.relu(z + weights["beta"][l-1])
shouldn't this be:
h = tf.nn.relu(weights['gamma'][l-1] * (z + weights["beta"][l-1]))
? (As per equation 10 here)
Regards,
Liam
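One possible reason for the missing gamma, offered as an assumption rather than something confirmed in this thread: ReLU is positively homogeneous, so for gamma > 0 a scale applied inside the ReLU can be folded into the weights of the next layer. The values below (gamma, beta, next_weight) are purely illustrative.

```python
def relu(x):
    return max(0.0, x)

gamma, beta, next_weight = 2.5, -0.3, 0.4  # illustrative values, gamma > 0
inputs = [-2.0, -0.1, 0.0, 0.7, 3.0]

# Scale applied inside the ReLU, followed by the next layer's weight.
with_gamma = [next_weight * relu(gamma * (z + beta)) for z in inputs]

# Same result with gamma absorbed into the next layer's weight.
absorbed = [(next_weight * gamma) * relu(z + beta) for z in inputs]
```

The identity does not hold for softmax, which may be why the last layer keeps an explicit gamma.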
Can you open a WIP PR on Tensorflow?
I think you are using noise_std > 0 to separate both the clean path from the corrupted path and training from eval. This causes a problem: during evaluation, batch norm mean and var should always be based on the training example averages, while during training, batch norm is meant to introduce regularization via noise by using the batch mean and var.
I changed the code so that update_batch_norm only runs during training on the clean path and always normalizes with the mean and var of the batch. Like this:
def update_batch_normalization(batch, mean, var, l):
    # Tensors cannot be tested for truthiness, so compare against None.
    if mean is None or var is None:
        mean, var = tf.nn.moments(batch, axes=[0])
    assign_mean = running_mean[l-1].assign(mean)
    assign_var = running_var[l-1].assign(var)
    bn_assigns.append(ewma.apply([running_mean[l-1], running_var[l-1]]))
    with tf.control_dependencies([assign_mean, assign_var]):
        return (batch - mean) / tf.sqrt(var + 1e-12)
I passed a boolean placeholder to the encoder to separate training loops from evaluation loops. Then inside the encoder I used batch_norm to normalize by the running averages outside of training steps.
if training and noise_std == 0.0:
    z = join(update_batch_normalization(z_pre_l, m_l, v_l, l), batch_normalization(z_pre_u, m, v))
elif training:
    z = join(batch_normalization(z_pre_l, m_l, v_l), batch_normalization(z_pre_u, m, v))
else:
    mean = ewma.average(running_mean[l-1])
    var = ewma.average(running_var[l-1])
    z = join(batch_normalization(z_pre_l, mean, var), batch_normalization(z_pre_u, mean, var))
This may still not be completely right since I was making all examples labeled examples. With this and the variable initialization fix I trained with 60k labeled examples down to 0.59% error.
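The training/evaluation split described above can be sketched in plain Python for a single feature. The class RunningBatchNorm is hypothetical and simplified: it seeds the running averages from the first training batch, which is an assumption and not how tf.train.ExponentialMovingAverage initializes its shadow variables.

```python
import math

class RunningBatchNorm:
    """Batch norm for one feature with running statistics (illustrative)."""

    def __init__(self, decay=0.99, epsilon=1e-10):
        self.decay = decay
        self.epsilon = epsilon
        self.running_mean = None
        self.running_var = None

    def __call__(self, batch, training):
        if training:
            # Training: normalize with the statistics of this batch.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            if self.running_mean is None:
                self.running_mean, self.running_var = mean, var
            else:
                d = self.decay
                self.running_mean = d * self.running_mean + (1 - d) * mean
                self.running_var = d * self.running_var + (1 - d) * var
        else:
            # Evaluation: reuse the averages accumulated during training.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.epsilon) for x in batch]

bn = RunningBatchNorm()
train_out = bn([1.0, 2.0, 3.0], training=True)
eval_out = bn([10.0], training=False)
```

Because evaluation uses the stored averages, even a batch of one example is normalized sensibly; with batch statistics its variance would be zero.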
Could you kindly help me run your code on CIFAR-10 data?
Why do we need this line in the decoder at line 179, after batch normalization has already been applied?
It seems that the mean and v of the output layer are applied both before and after g_gauss:
u = batch_normalization(u)
z_est[l] = g_gauss(z_c, u, layer_sizes[l])
z_est_bn = (z_est[l] - m) / v
Hello, I have the following issue when running your code:
Traceback (most recent call last):
  File "/home/mblack/stage/ladder-master (1)/ladder.py", line 177, in <module>
    u = tf.matmul(z_est[l+1], weights['V'][l])
IndexError: list index out of range
Can you please tell me what's going wrong?
I get the following error when running ladder.py. Please advise on how to solve it.
=== Corrupted Encoder ===
Layer 1 : 784 -> 1000
Traceback (most recent call last):
  File "ladder.py", line 138, in <module>
    y_c, corr = encoder(inputs, noise_std)
  File "ladder.py", line 93, in encoder
    m, v = tf.nn.moments(z_pre_u, axes=[0])
  File "/Users/lfc009/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/nn.py", line 536, in moments
    divisor *= x.get_shape()[d].value
TypeError: unsupported operand type(s) for *=: 'float' and 'NoneType'