
ram's Introduction

A TensorFlow implementation of the recurrent attention model (RAM).

Some known issues with this implementation are discussed here.

Intro to RAM

This is an implementation of the RAM (recurrent attention model) described in [1], using some code from the partial implementation found at [2]. Instead of processing all pixels of the image at once, the model attends to a smaller glimpse window at each time step (the choice of glimpse location is learned) and integrates the information gathered over time to output a classification prediction for the image. Once trained, it is more robust to translation than a standard ConvNet, as demonstrated in [1].
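
As a rough orientation, here is a minimal, self-contained NumPy sketch of that control flow (glimpse, recurrent update, next location, final classification). Every name and number in it is an illustrative placeholder rather than the actual API of ram.py.

```python
import numpy as np

def extract_glimpse(image, loc, size=8):
    """Crop a size x size patch centered at loc = (row, col), clipped to stay inside the image."""
    h, w = image.shape
    r = size // 2
    row = int(np.clip(loc[0], r, h - r))
    col = int(np.clip(loc[1], r, w - r))
    return image[row - r:row + r, col - r:col + r]

def ram_forward(image, n_glimpses=6, hidden_size=16, n_classes=10, rng=None):
    """Toy RAM rollout: attend to a patch, update a recurrent state, pick the next location, classify."""
    rng = rng or np.random.default_rng(0)
    h = np.zeros(hidden_size)
    loc = np.array(image.shape) / 2.0                     # start at the image center
    for _ in range(n_glimpses):
        g = extract_glimpse(image, loc).ravel()           # glimpse network input
        h = np.tanh(0.1 * h + 0.01 * g[:hidden_size])     # toy recurrent (core network) update
        loc = loc + rng.normal(0.0, 2.0, size=2)          # stochastic location policy
    logits = h[:n_classes] + rng.normal(size=n_classes)   # toy action (classification) network
    return int(np.argmax(logits))

print(ram_forward(np.zeros((60, 60))))
```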

For a more detailed description, please refer to the repo [wiki page](https://github.com/QihongL/RAM/wiki)!

Run the RAM

To run the code, simply type python ram.py [simulation name] in the terminal (for example, python ram.py test1). The model parameters are described [here](https://github.com/QihongL/RAM/wiki/Parameter-description) on the RAM wiki page. The simulation name argument is used to create folders that hold the summary log file and the images plotting the model's policy (the plotting part isn't finished yet...).

It should run as long as the directory structure is set up correctly: there should be two folders named "summary" and "chckPts" in the project directory.
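
The script expects those folders to exist before training starts; one minimal way to create them (this helper is not part of the repo) is:

```python
import os

# create the folders ram.py writes its TensorBoard summaries and checkpoints into
for folder in ("summary", "chckPts"):
    os.makedirs(folder, exist_ok=True)
```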

Some results

ram.py implements the RAM. On the 60 x 60 translated MNIST, it converges to about 6% error. Here's a comparison between the model with the value baseline prediction term (purple) and the model without the baseline term (blue); the plot shows the reward and cost over time. In this simulation, both models end up with a similar error (about 6%), which is something that still needs to be understood...
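
For context, the baseline term discussed above enters the objective as a value prediction b that is subtracted from the reward R in the REINFORCE part of the cost (see calc_reward in the code further down). A toy sketch of that hybrid objective, with purely illustrative names and numbers:

```python
import numpy as np

def hybrid_loss(log_p_y, log_p_loc, R, b):
    """Toy version of the RAM objective: a supervised classification term plus a
    REINFORCE term for the glimpse locations, with the baseline b subtracted from
    the reward R to reduce variance. Names are illustrative, not those in ram.py."""
    classification = log_p_y                   # log-likelihood of the correct class
    reinforce = log_p_loc * (R - b)            # policy-gradient term for the locations
    baseline_fit = (R - b) ** 2                # extra term that trains b to track R
    return -(classification + reinforce.sum()) + baseline_fit.sum()

# one example with 6 glimpses, a correct prediction (R = 1) and a baseline of 0.6
print(hybrid_loss(np.log(0.9), np.log(np.full(6, 0.5)), 1.0, np.full(6, 0.6)))
```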

If you find any errors in the code, please let us know. Thanks!

Prerequisites

Python 2.7 or Python 3.3+

TensorFlow

NumPy

Matplotlib

References:

[1] https://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf

[2] https://github.com/seann999/tensorflow_mnist_ram

ram's People

Contributors

jlindsey15, jtkim-kaist, lucasmahieu, qihongl, shreyasramachandran


ram's Issues

ramcounting.py

https://github.com/jlindsey15/RAM/blob/master/oldScripts/ramcounting.py
import tensorflow as tf
import tf_mnist_loader
import matplotlib.pyplot as plt
import numpy as np
import time
import math
import random
from scipy import misc

dataset = tf_mnist_loader.read_data_sets("mnist_data")
save_dir = "save-3scales/"
save_prefix = "save"
start_step = 10000
#load_path = None
load_path = save_dir + save_prefix + str(start_step) + ".ckpt"

# to enable visualization, set draw to True

eval_only = False
animate = True
draw = True

# model parameters

minRadius = 120 # zooms -> minRadius * 2**<depth_level>
sensorBandwidth = 120 # fixed resolution of sensor
depth = 1 # number of zooms
channels = 1 # mnist are grayscale images
totalSensorBandwidth = depth * channels * (sensorBandwidth **2)

# number of units

hg_size = 128 # glimpse
hl_size = 128 # location
g_size = 256 #
cell_size = 256 #
cell_out_size = cell_size #

glimpses = 1 # number of glimpses
n_classes = 10 # cardinality(Y)

batch_size = 10
max_iters = 1000000

mnist_size = [30, 120]

loc_sd = 0.1 # std when setting the location
mean_locs = [] #
sampled_locs = [] # ~N(mean_locs[.], loc_sd)
glimpse_images = [] # to show in window

# initialize the weights to be small random values (uniform in [-0.1, 0.1])

def weight_variable(shape):
    initial = tf.random_uniform(shape, minval=-0.1, maxval=0.1)
    return tf.Variable(initial)

# get local glimpses

def glimpseSensor(img, normLoc):
loc = ((normLoc + 1) / 2) * mnist_size # normLoc coordinates are between -1 and 1
loc = tf.cast(loc, tf.int32)

img = tf.reshape(img, (batch_size, mnist_size[0], mnist_size[1], channels))

zooms = []

# process each image individually
for k in xrange(batch_size):
    imgZooms = []
    one_img = img[k,:,:,:]
    max_radius = minRadius * (2 ** (depth - 1)) 
    offset = 2 * max_radius

    # pad image with zeros
    one_img = tf.image.pad_to_bounding_box(one_img, offset, offset, \
        max_radius * 4 + mnist_size[0], max_radius * 4 + mnist_size[1])

    for i in xrange(depth):
        r = int(minRadius * (2 ** (i - 1)))

        d_raw = 2 * r
        d = tf.constant(d_raw, shape=[1])

        d = tf.tile(d, [2])

        loc_k = loc[k,:]
        adjusted_loc = offset + loc_k - r


        one_img2 = tf.reshape(one_img, (one_img.get_shape()[0].value,\
            one_img.get_shape()[1].value))

        # crop image to (d x d)

        print(d_raw)
        zoom = tf.slice(one_img2, adjusted_loc, d)

        # resize cropped image to (sensorBandwidth x sensorBandwidth)
        zoom = tf.image.resize_bilinear(tf.reshape(zoom, (1, d_raw, d_raw, 1)), (sensorBandwidth, sensorBandwidth))
        zoom = tf.reshape(zoom, (sensorBandwidth, sensorBandwidth))
        imgZooms.append(zoom)

    zooms.append(tf.stack(imgZooms))

zooms = tf.stack(zooms)

glimpse_images.append(zooms)

return zooms

# implements the glimpse network

def get_glimpse(loc):
# get glimpse using the previous location
glimpse_input = glimpseSensor(inputs_placeholder, loc)
glimpse_input = tf.reshape(glimpse_input, (batch_size, totalSensorBandwidth))

# the hidden units that process location & the glimpse
l_hl = weight_variable((2, hl_size))
glimpse_hg = weight_variable((totalSensorBandwidth, hg_size))
hg = tf.nn.relu(tf.matmul(glimpse_input, glimpse_hg))
hl = tf.nn.relu(tf.matmul(loc, l_hl))

# the hidden units that integrates the location & the glimpses
hg_g = weight_variable((hg_size, g_size))
hl_g = weight_variable((hl_size, g_size))
g = tf.nn.relu(tf.matmul(hg, hg_g) + tf.matmul(hl, hl_g) )   # TODO linear layer in Mnih et al. (2014)!
g2 = tf.matmul(g, intrag)
return g2

def get_next_input(output, i):
# the next location is computed by the location network
mean_loc = tf.tanh(tf.matmul(output, h_l_out))
mean_locs.append(mean_loc)

sample_loc = tf.tanh(mean_loc + tf.random_normal(mean_loc.get_shape(), 0, loc_sd))
sample_loc = tf.zeros(mean_loc.get_shape())
sampled_locs.append(sample_loc)

return get_glimpse(sample_loc)

def model():
# initialize the location under unif[-1,1], for all example in the batch
initial_loc = tf.random_uniform((batch_size, 2), minval=-1, maxval=1)
initial_loc = tf.zeros(initial_loc.get_shape())

# get the glimpse using the glimpse network
initial_glimpse = get_glimpse(initial_loc)   

#
# lstm_cell = tf.nn.rnn_cell.LSTMCell(cell_size, g_size, num_proj=cell_out_size)
lstm_cell = tf.nn.rnn_cell.LSTMCell(cell_size, state_is_tuple = True, num_proj=cell_out_size)
initial_state = lstm_cell.zero_state(batch_size, tf.float32)

#
inputs = [initial_glimpse]
inputs.extend([0] * (glimpses - 1))

#
outputs, _ = tf.nn.seq2seq.rnn_decoder(inputs, initial_state, lstm_cell, loop_function=get_next_input)
# get the next location
get_next_input(outputs[-1], 0)

return outputs

def dense_to_one_hot(labels_dense, num_classes=10):
    """Convert class labels from scalars to one-hot vectors."""
    # copied from TensorFlow tutorial
    num_labels = labels_dense.shape[0]
    index_offset = np.arange(num_labels) * num_classes
    labels_one_hot = np.zeros((num_labels, num_classes))
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot

# to use for maximum likelihood with glimpse location

def gaussian_pdf(mean, sample):
    Z = 1.0 / (loc_sd * tf.sqrt(2.0 * math.pi))
    a = -tf.square(sample - mean) / (2.0 * tf.square(loc_sd))
    return Z * tf.exp(a)

def calc_reward(outputs):

outputs_tensor = tf.convert_to_tensor(outputs)
outputs_tensor = tf.transpose(outputs_tensor, perm=[1, 0, 2])
b_weights_batch = tf.tile(b_weights, [10, 1, 1])
b = tf.sigmoid(tf.matmul(outputs_tensor, b_weights_batch))
b = tf.concat(axis=2, values=[b, b])
b = tf.reshape(b, (batch_size, glimpses * 2))
print(b.get_shape())
# consider the action at the last time step
outputs = outputs[-1] # look at ONLY THE END of the sequence
outputs = tf.reshape(outputs, (batch_size, cell_out_size))

# the hidden layer for the action network
h_a_out = weight_variable((cell_out_size, n_classes))
# process its output
p_y = tf.nn.softmax(tf.matmul(outputs, h_a_out))
max_p_y = tf.arg_max(p_y, 1)
# the targets
correct_y = tf.cast(labels_placeholder, tf.int64)

# reward for all examples in the batch
R = tf.cast(tf.equal(max_p_y, correct_y), tf.float32)
reward = tf.reduce_mean(R) # mean reward

#
p_loc = gaussian_pdf(mean_locs, sampled_locs)
p_loc_orig = p_loc
p_loc = tf.reshape(p_loc, (batch_size, glimpses * 2))

print(R)
R = tf.reshape(R, (batch_size, 1))
R = tf.tile(R, [1, glimpses*2])
print(R)
# axis=1: concatenate along the second dimension (per example)
J = tf.concat(axis=1, values=[tf.log(p_y + 1e-5) * onehot_labels_placeholder, tf.log(p_loc + 1e-5) * R])
print(J)
# sum the probability of action and location
J = tf.reduce_sum(J, 1)
print(J)
# average over batch
J = tf.reduce_mean(J, 0)
print(J)
cost = -J
#cost = cost + tf.square(tf.reduce_mean(R - b))

# Adaptive Moment Estimation
# estimate the 1st and the 2nd moment of the gradients
global_step = tf.Variable(0, trainable=False)
lr = tf.train.exponential_decay(1e-3, global_step, 1000, 0.95, staircase=True)
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.minimize(cost)

return cost, reward, max_p_y, correct_y, train_op, b, tf.reduce_mean(b), tf.reduce_mean(R - b), p_loc_orig, p_loc

def evaluate(xtestdata, ytestdata):
    # data = dataset.test
    batches_in_epoch = 2000 // batch_size

    accuracy = 0

    for i in xrange(batches_in_epoch):
        nextX, nextY = next_batch(xtestdata, ytestdata)
        feed_dict = {inputs_placeholder: nextX, labels_placeholder: nextY,
                     onehot_labels_placeholder: dense_to_one_hot(nextY)}
        r = sess.run(reward, feed_dict=feed_dict)
        accuracy += r

    accuracy /= batches_in_epoch

    print("ACCURACY: " + str(accuracy))

def next_batch(xdata, ydata):
    xBatch = []
    yBatch = []
    for i in xrange(batch_size):
        xBatch.append(xdata[i])
        yBatch.append(ydata[i])
    xdata = np.roll(xdata, batch_size)
    ydata = np.roll(ydata, batch_size)

    return np.array(xBatch), np.array(yBatch)

with tf.Graph().as_default():
# the y vector
labels = tf.placeholder("float32", shape=[batch_size, n_classes])
# the input x and yhat
inputs_placeholder = tf.placeholder(tf.float32, shape=(batch_size, mnist_size[0] * mnist_size[1]), name="images")
labels_placeholder = tf.placeholder(tf.float32, shape=(batch_size), name="labels")
onehot_labels_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 10), name="oneHotLabels")
b_placeholder = tf.placeholder(tf.float32, shape=(batch_size, glimpses*2), name="b")

xdata = []
ydata = []
xtestdata = []
ytestdata = []

for i in xrange(1, 11):
    count = str(i).zfill(2)
    for j in xrange(1, 801):
        index = str(j).zfill(5)
        filename = 'counting_data/' + count + 'oneObj' + index + '.jpg'
        image = misc.imread(filename)
        image = np.reshape(image, (30*120))
        xdata.append(image)
        ydata.append(i - 1)
    for j in xrange(801, 1001):
        index = str(j).zfill(5)
        filename = 'counting_data/' + count + 'oneObj' + index + '.jpg'
        image = misc.imread(filename)
        image = np.reshape(image, (30*120))
        xtestdata.append(image)
        ytestdata.append(i - 1)

combined = zip(xdata, ydata)
random.shuffle(combined)

xdata[:], ydata[:] = zip(*combined)





#
h_l_out = tf.ones((cell_out_size, 2))
loc_mean = weight_variable((batch_size, glimpses, 2))
intrag = weight_variable((g_size, g_size))
b_weights = weight_variable((1, g_size, 1))
'''
bias_1 = weight_variable(())
bias_2 = weight_variable(())
bias_3 = weight_variable(())
bias_4 = weight_variable(())
bias_5 = weight_variable(())
bias_6 = weight_variable(())
bias_7 = weight_variable(())

'''
# query the model output
outputs = model()

# convert list of tensors to one big tensor
sampled_locs = tf.concat(axis=0, values=sampled_locs)
sampled_locs = tf.reshape(sampled_locs, (batch_size, glimpses, 2))
mean_locs = tf.concat(axis=0, values=mean_locs)
mean_locs = tf.reshape(mean_locs, (batch_size, glimpses, 2))
glimpse_images = tf.concat(axis=0, values=glimpse_images)

#
cost, reward, predicted_labels, correct_labels, train_op, b, avg_b, rminusb, p_loc_orig, p_loc = calc_reward(outputs)

tf.summary.scalar("reward", reward)
tf.summary.scalar("cost", cost)
summary_op = tf.summary.merge_all()

sess = tf.Session()
saver = tf.train.Saver()
b_fetched = np.zeros((batch_size, glimpses*2))

# ckpt = tf.train.get_checkpoint_state(save_dir)
# if load_path is not None and ckpt and ckpt.model_checkpoint_path:
#     try:
#         saver.restore(sess, load_path)
#         print("LOADED CHECKPOINT")
#     except:
#         print("FAILED TO LOAD CHECKPOINT")
#         exit()
# else:
init = tf.global_variables_initializer()
sess.run(init)

if eval_only:        
    evaluate(xtestdata, ytestdata)
else:
    summary_writer = tf.summary.FileWriter("summary", graph=sess.graph)

    if draw:
        fig = plt.figure()
        txt = fig.suptitle("-", fontsize=36, fontweight='bold') 
        plt.ion()
        plt.show()   
        plt.subplots_adjust(top=0.7)
        plotImgs = []

    # training
    for step in xrange(start_step + 1, max_iters):
        start_time = time.time()

        # get the next batch of examples
        #nextX, nextY = dataset.train.next_batch(batch_size)
        nextX, nextY = next_batch(xdata, ydata)
        feed_dict = {inputs_placeholder: nextX, labels_placeholder: nextY, onehot_labels_placeholder: dense_to_one_hot(nextY), b_placeholder: b_fetched}
        fetches = [train_op, cost, reward, predicted_labels, correct_labels, glimpse_images, b, avg_b, rminusb, p_loc_orig, p_loc, mean_locs]
        # feed them to the model
        results = sess.run(fetches, feed_dict=feed_dict)
        _, cost_fetched, reward_fetched, prediction_labels_fetched,\
            correct_labels_fetched, f_glimpse_images_fetched, b_fetched, avg_b_fetched, rminusb_fetched, p_loc_orig_fetched, p_loc_fetched, mean_locs_fetched = results

        duration = time.time() - start_time

        if step % 20 == 0:
            if step % 1000 == 0:
                saver.save(sess, save_dir + save_prefix + str(step) + ".ckpt")
                if step % 1000 == 0:
                    evaluate(xtestdata, ytestdata)


            ##### DRAW WINDOW ################

            f_glimpse_images = np.reshape(f_glimpse_images_fetched, (glimpses + 1, batch_size, depth, sensorBandwidth, sensorBandwidth)) #steps, THEN batch

            if draw:
                if animate:
                    fillList = False
                    if len(plotImgs) == 0:
                        fillList = True

                    # display first in mini-batch
                    for y in xrange(glimpses):
                        txt.set_text('FINAL PREDICTION: %i\nTRUTH: %i\nSTEP: %i/%i'
                            % (prediction_labels_fetched[0] + 1, correct_labels_fetched[0]+1, (y + 1), glimpses))

                        for x in xrange(depth):
                            plt.subplot(depth, 1, x + 1)
                            if fillList:
                                plotImg = plt.imshow(f_glimpse_images[y, 0, x], cmap=plt.get_cmap('gray'),
                                                     interpolation="nearest")
                                plotImg.autoscale()                                
                                plotImgs.append(plotImg)
                            else:
                                plotImgs[x].set_data(f_glimpse_images[y, 0, x])
                                plotImgs[x].autoscale()  

                        fillList = False

                        fig.canvas.draw()
                        time.sleep(0.1)
                        plt.pause(0.0001) 
                else:
                    txt.set_text('PREDICTION: %i\nTRUTH: %i' % (prediction_labels_fetched[0] + 1, correct_labels_fetched[0] + 1))
                    for x in xrange(depth):
                        for y in xrange(glimpses):
                            plt.subplot(depth, glimpses, x * glimpses + y + 1)
                            plt.imshow(f_glimpse_images[y, 0, x], cmap=plt.get_cmap('gray'),
                                       interpolation="nearest")

                    plt.draw()
                    time.sleep(0.05)
                    plt.pause(0.0001)  

            ################################

            print('Step %d: cost = %.5f reward = %.5f (%.3f sec) b = %.5f R-b = %.5f' % (step, cost_fetched, reward_fetched, duration, avg_b_fetched, rminusb_fetched))
            '''
            print('real b: ' )
            print(b_fetched)
            print('p_loc orig: ')
            print(p_loc_orig_fetched)
            print('p_loc: ')
            print(p_loc_fetched)
            '''
            summary_str = sess.run(summary_op, feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, step)

sess.close()

Mobile phone ram enlarge

/*
 * Copyright (C) 2015-2016 Willi Ye [email protected]
 *
 * This file is part of Kernel Adiutor.
 *
 * Kernel Adiutor is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Kernel Adiutor is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Kernel Adiutor. If not, see http://www.gnu.org/licenses/.
 */
package com.grarak.kerneladiutor.fragments.kernel;

import android.text.InputType;

import com.grarak.kerneladiutor.R;
import com.grarak.kerneladiutor.fragments.ApplyOnBootFragment;
import com.grarak.kerneladiutor.fragments.recyclerview.RecyclerViewFragment;
import com.grarak.kerneladiutor.utils.kernel.vm.VM;
import com.grarak.kerneladiutor.utils.kernel.vm.ZRAM;
import com.grarak.kerneladiutor.utils.kernel.vm.ZSwap;
import com.grarak.kerneladiutor.views.recyclerview.CardView;
import com.grarak.kerneladiutor.views.recyclerview.GenericSelectView;
import com.grarak.kerneladiutor.views.recyclerview.RecyclerViewItem;
import com.grarak.kerneladiutor.views.recyclerview.SeekBarView;
import com.grarak.kerneladiutor.views.recyclerview.SwitchView;
import com.grarak.kerneladiutor.views.recyclerview.TitleView;

import java.util.ArrayList;
import java.util.List;

/**
 * Created by willi on 29.06.16.
 */
public class VMFragment extends RecyclerViewFragment {

    private List<GenericSelectView> mVMs = new ArrayList<>();

    @Override
    protected void init() {
        super.init();

        addViewPagerFragment(ApplyOnBootFragment.newInstance(this));
    }

    @Override
    protected void addItems(List<RecyclerViewItem> items) {
        mVMs.clear();
        for (int i = 0; i < VM.size(); i++) {
            if (VM.exists(i)) {
                GenericSelectView vm = new GenericSelectView();
                vm.setSummary(VM.getName(i));
                vm.setValue(VM.getValue(i));
                vm.setValueRaw(vm.getValue());
                vm.setInputType(InputType.TYPE_CLASS_NUMBER);

                final int position = i;
                vm.setOnGenericValueListener((genericSelectView, value) -> {
                    VM.setValue(value, position, getActivity());
                    genericSelectView.setValue(value);
                    refreshVMs();
                });

                items.add(vm);
                mVMs.add(vm);
            }
        }

        if (ZRAM.supported()) {
            zramInit(items);
        }
        zswapInit(items);
    }

    private void zramInit(List<RecyclerViewItem> items) {
        TitleView zramTitle = new TitleView();
        zramTitle.setText(getString(R.string.zram));
        items.add(zramTitle);

        SeekBarView ram = new SeekBarView();
        ram.setTitle(getString(R.string.disksize));
        ram.setSummary(getString(R.string.disksize_summary));
        ram.setUnit(getString(R.string.mb));
        ram.setMax(400);
        ram.setOffset(16);
        ram.setProgress(ZRAM.getDisksize() / 16);
        ram.setOnSeekBarListener(new SeekBarView.OnSeekBarListener() {
            @Override
            public void onStop(SeekBarView seekBarView, int position, String value) {
                ZRAM.setDisksize(position * 16, getActivity());
            }

            @Override
            public void onMove(SeekBarView seekBarView, int position, String value) {
            }
        });

        items.add(ram);
    }

    private void zswapInit(List<RecyclerViewItem> items) {
    CardView zswapCard = new CardView();
    zswapCard.setTitle(getString(R.string.zswap));

     if (ZSwap.hasEnable()) {
         SwitchView zswap = new SwitchView();
         zswap.setTitle(getString(R.string.zswap));
         zswap.setSummary(getString(R.string.zswap_summary));
         zswap.setChecked(ZSwap.isEnabled());
         zswap.addOnSwitchListener((switchView, isChecked)
                 -> ZSwap.enable(isChecked, getActivity()));
    
         zswapCard.addItem(zswap);
     }
    
     if (ZSwap.hasMaxPoolPercent()) {
         SeekBarView maxPoolPercent = new SeekBarView();
         maxPoolPercent.setTitle(getString(R.string.memory_pool));
         maxPoolPercent.setSummary(getString(R.string.memory_pool_summary));
         maxPoolPercent.setUnit("%");
         maxPoolPercent.setMax(50);
         maxPoolPercent.setProgress(ZSwap.getMaxPoolPercent());
         maxPoolPercent.setOnSeekBarListener(new SeekBarView.OnSeekBarListener() {
             @Override
             public void onStop(SeekBarView seekBarView, int position, String value) {
                 ZSwap.setMaxPoolPercent(position, getActivity());
             }
    
             @Override
             public void onMove(SeekBarView seekBarView, int position, String value) {
             }
         });
    
         zswapCard.addItem(maxPoolPercent);
     }
    
     if (ZSwap.hasMaxCompressionRatio()) {
         SeekBarView maxCompressionRatio = new SeekBarView();
         maxCompressionRatio.setTitle(getString(R.string.maximum_compression_ratio));
         maxCompressionRatio.setSummary(getString(R.string.maximum_compression_ratio_summary));
         maxCompressionRatio.setUnit("%");
         maxCompressionRatio.setProgress(ZSwap.getMaxCompressionRatio());
         maxCompressionRatio.setOnSeekBarListener(new SeekBarView.OnSeekBarListener() {
             @Override
             public void onStop(SeekBarView seekBarView, int position, String value) {
                 ZSwap.setMaxCompressionRatio(position, getActivity());
             }
    
             @Override
             public void onMove(SeekBarView seekBarView, int position, String value) {
             }
         });
    
         zswapCard.addItem(maxCompressionRatio);
     }
    
     if (zswapCard.size() > 0) {
         items.add(zswapCard);
     }
    

    }

    private void refreshVMs() {
        getHandler().postDelayed(() -> {
            for (int i = 0; i < mVMs.size(); i++) {
                mVMs.get(i).setValue(VM.getValue(i));
                mVMs.get(i).setValueRaw(mVMs.get(i).getValue());
            }
        }, 250);
    }

}

Ram+enlargement

import tensorflow as tf
import tf_mnist_loader
import matplotlib.pyplot as plt
import numpy as np
import time
import random
import sys
import os

xrange = range

dataset = tf_mnist_loader.read_data_sets("mnist_data")
save_dir = "chckPts/"
save_prefix = "save"
summaryFolderName = "summary/"

if len(sys.argv) == 2:
    simulationName = str(sys.argv[1])
    print("Simulation name = " + simulationName)
    summaryFolderName = summaryFolderName + simulationName + "/"
    saveImgs = True
    imgsFolderName = "imgs/" + simulationName + "/"
    if os.path.isdir(summaryFolderName) == False:
        os.mkdir(summaryFolderName)
    # if os.path.isdir(imgsFolderName) == False:
    #     os.mkdir(imgsFolderName)
else:
    saveImgs = False
    print("Testing... image files will not be saved.")

start_step = 0
#load_path = None
load_path = save_dir + save_prefix + str(start_step) + ".ckpt"

# to enable visualization, set draw to True

eval_only = False
draw = False
animate = False

# conditions

translateMnist = 1
eyeCentered = 0

preTraining = 0
preTraining_epoch = 20000
drawReconsturction = 0

# about translation

MNIST_SIZE = 28
translated_img_size = 60 # side length of the picture

fixed_learning_rate = 0.001

if translateMnist:
    print("TRANSLATED MNIST")
    img_size = translated_img_size
    depth = 3  # number of zooms
    sensorBandwidth = 12
    minRadius = 8  # zooms -> minRadius * 2**<depth_level>

    initLr = 1e-3
    lr_min = 1e-4
    lrDecayRate = .999
    lrDecayFreq = 200
    momentumValue = .9
    batch_size = 64

else:
    print("CENTERED MNIST")
    img_size = MNIST_SIZE
    depth = 1  # number of zooms
    sensorBandwidth = 8
    minRadius = 4  # zooms -> minRadius * 2**<depth_level>

    initLr = 1e-3
    lrDecayRate = .99
    lrDecayFreq = 200
    momentumValue = .9
    batch_size = 20

# model parameters

channels = 1 # mnist are grayscale images
totalSensorBandwidth = depth * channels * (sensorBandwidth **2)
nGlimpses = 6 # number of glimpses
loc_sd = 0.22 # std when setting the location

# network units

hg_size = 128 #
hl_size = 128 #
g_size = 256 #
cell_size = 256 #
cell_out_size = cell_size #

# parameters about the training examples

n_classes = 10 # card(Y)

# training parameters

max_iters = 1000000
SMALL_NUM = 1e-10

# resource preallocation

mean_locs = [] # expectation of locations
sampled_locs = [] # sampled locations ~N(mean_locs[.], loc_sd)
baselines = [] # baseline, the value prediction
glimpse_images = [] # to show in window

# initialize the weights to be small random values (uniform in [-0.1, 0.1])

def weight_variable(shape, myname, train):
    initial = tf.random_uniform(shape, minval=-0.1, maxval=0.1)
    return tf.Variable(initial, name=myname, trainable=train)

# get local glimpses

def glimpseSensor(img, normLoc):
loc = tf.round(((normLoc + 1) / 2.0) * img_size) # normLoc coordinates are between -1 and 1
loc = tf.cast(loc, tf.int32)

img = tf.reshape(img, (batch_size, img_size, img_size, channels))

# process each image individually
zooms = []
for k in range(batch_size):
    imgZooms = []
    one_img = img[k,:,:,:]
    max_radius = minRadius * (2 ** (depth - 1))
    offset = 2 * max_radius

    # pad image with zeros
    one_img = tf.image.pad_to_bounding_box(one_img, offset, offset, \
                                           max_radius * 4 + img_size, max_radius * 4 + img_size)

    for i in range(depth):
        r = int(minRadius * (2 ** (i)))

        d_raw = 2 * r
        d = tf.constant(d_raw, shape=[1])
        d = tf.tile(d, [2])
        loc_k = loc[k,:]
        adjusted_loc = offset + loc_k - r
        one_img2 = tf.reshape(one_img, (one_img.get_shape()[0].value, one_img.get_shape()[1].value))

        # crop image to (d x d)
        zoom = tf.slice(one_img2, adjusted_loc, d)

        # resize cropped image to (sensorBandwidth x sensorBandwidth)
        zoom = tf.image.resize_bilinear(tf.reshape(zoom, (1, d_raw, d_raw, 1)), (sensorBandwidth, sensorBandwidth))
        zoom = tf.reshape(zoom, (sensorBandwidth, sensorBandwidth))
        imgZooms.append(zoom)

    zooms.append(tf.stack(imgZooms))

zooms = tf.stack(zooms)

glimpse_images.append(zooms)

return zooms

# implements the input network

def get_glimpse(loc):
    # get input using the previous location
    glimpse_input = glimpseSensor(inputs_placeholder, loc)
    glimpse_input = tf.reshape(glimpse_input, (batch_size, totalSensorBandwidth))

    # the hidden units that process location & the input
    act_glimpse_hidden = tf.nn.relu(tf.matmul(glimpse_input, Wg_g_h) + Bg_g_h)
    act_loc_hidden = tf.nn.relu(tf.matmul(loc, Wg_l_h) + Bg_l_h)

    # the hidden units that integrate the location & the glimpses
    glimpseFeature1 = tf.nn.relu(tf.matmul(act_glimpse_hidden, Wg_hg_gf1) + tf.matmul(act_loc_hidden, Wg_hl_gf1) + Bg_hlhg_gf1)
    # glimpseFeature2 = tf.matmul(glimpseFeature1, Wg_gf1_gf2) + Bg_gf1_gf2
    return glimpseFeature1

def get_next_input(output):
# the next location is computed by the location network
core_net_out = tf.stop_gradient(output)

# baseline = tf.sigmoid(tf.matmul(core_net_out, Wb_h_b) + Bb_h_b)
baseline = tf.sigmoid(tf.matmul(core_net_out, Wb_h_b) + Bb_h_b)
baselines.append(baseline)

# compute the next location, then impose noise
if eyeCentered:
    # add the last sampled glimpse location
    # TODO max(-1, min(1, u + N(output, sigma) + prevLoc))
    mean_loc = tf.maximum(-1.0, tf.minimum(1.0, tf.matmul(core_net_out, Wl_h_l) + sampled_locs[-1] ))
else:
    # mean_loc = tf.clip_by_value(tf.matmul(core_net_out, Wl_h_l) + Bl_h_l, -1, 1)
    mean_loc = tf.matmul(core_net_out, Wl_h_l) + Bl_h_l
    mean_loc = tf.clip_by_value(mean_loc, -1, 1)
# mean_loc = tf.stop_gradient(mean_loc)
mean_locs.append(mean_loc)

# add noise
# sample_loc = tf.tanh(mean_loc + tf.random_normal(mean_loc.get_shape(), 0, loc_sd))
sample_loc = tf.maximum(-1.0, tf.minimum(1.0, mean_loc + tf.random_normal(mean_loc.get_shape(), 0, loc_sd)))

# don't propagate through the locations
sample_loc = tf.stop_gradient(sample_loc)
sampled_locs.append(sample_loc)

return get_glimpse(sample_loc)

def affineTransform(x, output_dim):
    """
    affine transformation Wx+b
    assumes x.shape = (batch_size, num_features)
    """
    w = tf.get_variable("w", [x.get_shape()[1], output_dim])
    b = tf.get_variable("b", [output_dim], initializer=tf.constant_initializer(0.0))
    return tf.matmul(x, w) + b

def model():

# initialize the location under unif[-1,1], for all example in the batch
initial_loc = tf.random_uniform((batch_size, 2), minval=-1, maxval=1)
mean_locs.append(initial_loc)

# initial_loc = tf.tanh(initial_loc + tf.random_normal(initial_loc.get_shape(), 0, loc_sd))
initial_loc = tf.clip_by_value(initial_loc + tf.random_normal(initial_loc.get_shape(), 0, loc_sd), -1, 1)

sampled_locs.append(initial_loc)

# get the input using the input network
initial_glimpse = get_glimpse(initial_loc)

# set up the recurrent structure
inputs = [0] * nGlimpses
outputs = [0] * nGlimpses
glimpse = initial_glimpse
REUSE = None
for t in range(nGlimpses):
    if t == 0:  # initialize the hidden state to be the zero vector
        hiddenState_prev = tf.zeros((batch_size, cell_size))
    else:
        hiddenState_prev = outputs[t-1]

    # forward prop
    with tf.variable_scope("coreNetwork", reuse=REUSE):
        # the next hidden state is a function of the previous hidden state and the current glimpse
        hiddenState = tf.nn.relu(affineTransform(hiddenState_prev, cell_size) + (tf.matmul(glimpse, Wc_g_h) + Bc_g_h))

    # save the current glimpse and the hidden state
    inputs[t] = glimpse
    outputs[t] = hiddenState
    # get the next input glimpse
    if t != nGlimpses -1:
        glimpse = get_next_input(hiddenState)
    else:
        first_hiddenState = tf.stop_gradient(hiddenState)
        # baseline = tf.sigmoid(tf.matmul(first_hiddenState, Wb_h_b) + Bb_h_b)
        baseline = tf.sigmoid(tf.matmul(first_hiddenState, Wb_h_b) + Bb_h_b)
        baselines.append(baseline)
    REUSE = True  # share variables for later recurrence

return outputs

def dense_to_one_hot(labels_dense, num_classes=10):
    """Convert class labels from scalars to one-hot vectors."""
    # copied from TensorFlow tutorial
    num_labels = labels_dense.shape[0]
    index_offset = np.arange(num_labels) * num_classes
    labels_one_hot = np.zeros((num_labels, num_classes))
    labels_one_hot.flat[index_offset + labels_dense.ravel()] = 1
    return labels_one_hot

# to use for maximum likelihood with input location

def gaussian_pdf(mean, sample):
    Z = 1.0 / (loc_sd * tf.sqrt(2.0 * np.pi))
    a = -tf.square(sample - mean) / (2.0 * tf.square(loc_sd))
    return Z * tf.exp(a)

def calc_reward(outputs):

# consider the action at the last time step
outputs = outputs[-1] # look at ONLY THE END of the sequence
outputs = tf.reshape(outputs, (batch_size, cell_out_size))

# get the baseline
b = tf.stack(baselines)
b = tf.concat(axis=2, values=[b, b])
b = tf.reshape(b, (batch_size, (nGlimpses) * 2))
no_grad_b = tf.stop_gradient(b)

# get the action(classification)
p_y = tf.nn.softmax(tf.matmul(outputs, Wa_h_a) + Ba_h_a)
max_p_y = tf.arg_max(p_y, 1)
correct_y = tf.cast(labels_placeholder, tf.int64)

# reward for all examples in the batch
R = tf.cast(tf.equal(max_p_y, correct_y), tf.float32)
reward = tf.reduce_mean(R) # mean reward
R = tf.reshape(R, (batch_size, 1))
R = tf.tile(R, [1, (nGlimpses)*2])

# get the location

p_loc = gaussian_pdf(mean_locs, sampled_locs)
# p_loc = tf.tanh(p_loc)

p_loc_orig = p_loc
p_loc = tf.reshape(p_loc, (batch_size, (nGlimpses) * 2))

# define the cost function
J = tf.concat(axis=1, values=[tf.log(p_y + SMALL_NUM) * (onehot_labels_placeholder), tf.log(p_loc + SMALL_NUM) * (R - no_grad_b)])
J = tf.reduce_sum(J, 1)
J = J - tf.reduce_sum(tf.square(R - b), 1)
J = tf.reduce_mean(J, 0)
cost = -J
var_list = tf.trainable_variables()
grads = tf.gradients(cost, var_list)
grads, _ = tf.clip_by_global_norm(grads, 0.5)
# define the optimizer
# lr_max = tf.maximum(lr, lr_min)
optimizer = tf.train.AdamOptimizer(lr)
# optimizer = tf.train.MomentumOptimizer(lr, momentumValue)
# train_op = optimizer.minimize(cost, global_step)
train_op = optimizer.apply_gradients(zip(grads, var_list), global_step=global_step)

return cost, reward, max_p_y, correct_y, train_op, b, tf.reduce_mean(b), tf.reduce_mean(R - b), lr

def preTrain(outputs):
    lr_r = 1e-3
    # consider the action at the last time step
    outputs = outputs[-1]  # look at ONLY THE END of the sequence
    outputs = tf.reshape(outputs, (batch_size, cell_out_size))
    # if preTraining:
    reconstruction = tf.sigmoid(tf.matmul(outputs, Wr_h_r) + Br_h_r)
    reconstructionCost = tf.reduce_mean(tf.square(inputs_placeholder - reconstruction))

    train_op_r = tf.train.RMSPropOptimizer(lr_r).minimize(reconstructionCost)
    return reconstructionCost, reconstruction, train_op_r

def evaluate():
    data = dataset.test
    batches_in_epoch = len(data._images) // batch_size
    accuracy = 0

    for i in range(batches_in_epoch):
        nextX, nextY = dataset.test.next_batch(batch_size)
        if translateMnist:
            nextX, _ = convertTranslated(nextX, MNIST_SIZE, img_size)
        feed_dict = {inputs_placeholder: nextX, labels_placeholder: nextY,
                     onehot_labels_placeholder: dense_to_one_hot(nextY)}
        r = sess.run(reward, feed_dict=feed_dict)
        accuracy += r

    accuracy /= batches_in_epoch
    print(("ACCURACY: " + str(accuracy)))

def convertTranslated(images, initImgSize, finalImgSize):
    size_diff = finalImgSize - initImgSize
    newimages = np.zeros([batch_size, finalImgSize * finalImgSize])
    imgCoord = np.zeros([batch_size, 2])
    for k in range(batch_size):
        image = images[k, :]
        image = np.reshape(image, (initImgSize, initImgSize))
        # generate and save random coordinates
        randX = random.randint(0, size_diff)
        randY = random.randint(0, size_diff)
        imgCoord[k, :] = np.array([randX, randY])
        # padding
        image = np.lib.pad(image, ((randX, size_diff - randX), (randY, size_diff - randY)), 'constant', constant_values=(0))
        newimages[k, :] = np.reshape(image, (finalImgSize * finalImgSize))

    return newimages, imgCoord

def toMnistCoordinates(coordinate_tanh):
    '''
    Transform coordinate in [-1,1] to mnist
    :param coordinate_tanh: vector in [-1,1] x [-1,1]
    :return: vector in the corresponding mnist coordinate
    '''
    return np.round(((coordinate_tanh + 1) / 2.0) * img_size)

def variable_summaries(var, name):
    """Attach a lot of summaries to a Tensor."""
    with tf.name_scope('param_summaries'):
        mean = tf.reduce_mean(var)
        tf.summary.scalar('param_mean/' + name, mean)
        with tf.name_scope('param_stddev'):
            stddev = tf.sqrt(tf.reduce_sum(tf.square(var - mean)))
        tf.summary.scalar('param_sttdev/' + name, stddev)
        tf.summary.scalar('param_max/' + name, tf.reduce_max(var))
        tf.summary.scalar('param_min/' + name, tf.reduce_min(var))
        tf.summary.histogram(name, var)

def plotWholeImg(img, img_size, sampled_locs_fetched):
    plt.imshow(np.reshape(img, [img_size, img_size]),
               cmap=plt.get_cmap('gray'), interpolation="nearest")

    plt.ylim((img_size - 1, 0))
    plt.xlim((0, img_size - 1))

    # transform the coordinate to mnist map
    sampled_locs_mnist_fetched = toMnistCoordinates(sampled_locs_fetched)
    # visualize the trace of successive nGlimpses (note that x and y coordinates are "flipped")
    plt.plot(sampled_locs_mnist_fetched[0, :, 1], sampled_locs_mnist_fetched[0, :, 0], '-o',
             color='lawngreen')
    plt.plot(sampled_locs_mnist_fetched[0, -1, 1], sampled_locs_mnist_fetched[0, -1, 0], 'o',
             color='red')

with tf.device('/gpu:1'), tf.Graph().as_default():

    # set the learning rate
    global_step = tf.Variable(0, trainable=False)
    lr = tf.train.exponential_decay(initLr, global_step, lrDecayFreq, lrDecayRate, staircase=True)

    # preallocate x, y, baseline
    labels = tf.placeholder("float32", shape=[batch_size, n_classes])
    labels_placeholder = tf.placeholder(tf.float32, shape=(batch_size), name="labels_raw")
    onehot_labels_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 10), name="labels_onehot")
    inputs_placeholder = tf.placeholder(tf.float32, shape=(batch_size, img_size * img_size), name="images")

    # declare the model parameters; the naming rule is:
    # the 1st capital letter: weights or bias (W = weights, B = bias)
    # the 2nd lowercase letter: the network (e.g., g = glimpse network)
    # the 3rd and 4th letter(s): the input-output mapping, which is spelled out in the variable name argument

    Wg_l_h = weight_variable((2, hl_size), "glimpseNet_wts_location_hidden", True)
    Bg_l_h = weight_variable((1,hl_size), "glimpseNet_bias_location_hidden", True)

    Wg_g_h = weight_variable((totalSensorBandwidth, hg_size), "glimpseNet_wts_glimpse_hidden", True)
    Bg_g_h = weight_variable((1,hg_size), "glimpseNet_bias_glimpse_hidden", True)

    Wg_hg_gf1 = weight_variable((hg_size, g_size), "glimpseNet_wts_hiddenGlimpse_glimpseFeature1", True)
    Wg_hl_gf1 = weight_variable((hl_size, g_size), "glimpseNet_wts_hiddenLocation_glimpseFeature1", True)
    Bg_hlhg_gf1 = weight_variable((1,g_size), "glimpseNet_bias_hGlimpse_hLocs_glimpseFeature1", True)

    Wc_g_h = weight_variable((cell_size, g_size), "coreNet_wts_glimpse_hidden", True)
    Bc_g_h = weight_variable((1,g_size), "coreNet_bias_glimpse_hidden", True)

    Wr_h_r = weight_variable((cell_out_size, img_size**2), "reconstructionNet_wts_hidden_action", True)
    Br_h_r = weight_variable((1, img_size**2), "reconstructionNet_bias_hidden_action", True)

    Wb_h_b = weight_variable((g_size, 1), "baselineNet_wts_hiddenState_baseline", True)
    Bb_h_b = weight_variable((1,1), "baselineNet_bias_hiddenState_baseline", True)

    Wl_h_l = weight_variable((cell_out_size, 2), "locationNet_wts_hidden_location", True)
    Bl_h_l = weight_variable((1, 2), "locationNet_bias_hidden_location", True)

    Wa_h_a = weight_variable((cell_out_size, n_classes), "actionNet_wts_hidden_action", True)
    Ba_h_a = weight_variable((1,n_classes),  "actionNet_bias_hidden_action", True)

    # query the model output
    outputs = model()

    # convert list of tensors to one big tensor
    sampled_locs = tf.concat(axis=0, values=sampled_locs)
    sampled_locs = tf.reshape(sampled_locs, (nGlimpses, batch_size, 2))
    sampled_locs = tf.transpose(sampled_locs, [1, 0, 2])
    mean_locs = tf.concat(axis=0, values=mean_locs)
    mean_locs = tf.reshape(mean_locs, (nGlimpses, batch_size, 2))
    mean_locs = tf.transpose(mean_locs, [1, 0, 2])
    glimpse_images = tf.concat(axis=0, values=glimpse_images)

    # compute the reward
    reconstructionCost, reconstruction, train_op_r = preTrain(outputs)
    cost, reward, predicted_labels, correct_labels, train_op, b, avg_b, rminusb, lr = calc_reward(outputs)

    # tensorboard visualization for the parameters
    variable_summaries(Wg_l_h, "glimpseNet_wts_location_hidden")
    variable_summaries(Bg_l_h, "glimpseNet_bias_location_hidden")
    variable_summaries(Wg_g_h, "glimpseNet_wts_glimpse_hidden")
    variable_summaries(Bg_g_h, "glimpseNet_bias_glimpse_hidden")
    variable_summaries(Wg_hg_gf1, "glimpseNet_wts_hiddenGlimpse_glimpseFeature1")
    variable_summaries(Wg_hl_gf1, "glimpseNet_wts_hiddenLocation_glimpseFeature1")
    variable_summaries(Bg_hlhg_gf1, "glimpseNet_bias_hGlimpse_hLocs_glimpseFeature1")

    variable_summaries(Wc_g_h, "coreNet_wts_glimpse_hidden")
    variable_summaries(Bc_g_h, "coreNet_bias_glimpse_hidden")

    variable_summaries(Wb_h_b, "baselineNet_wts_hiddenState_baseline")
    variable_summaries(Bb_h_b, "baselineNet_bias_hiddenState_baseline")

    variable_summaries(Wl_h_l, "locationNet_wts_hidden_location")

    variable_summaries(Wa_h_a, 'actionNet_wts_hidden_action')
    variable_summaries(Ba_h_a, 'actionNet_bias_hidden_action')

    # tensorboard visualization for the performance metrics
    tf.summary.scalar("reconstructionCost", reconstructionCost)
    tf.summary.scalar("reward", reward)
    tf.summary.scalar("cost", cost)
    tf.summary.scalar("mean(b)", avg_b)
    tf.summary.scalar("mean(R - b)", rminusb)
    summary_op = tf.summary.merge_all()


    ####################################### START RUNNING THE MODEL #######################################

    sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=False)
    sess_config.gpu_options.allow_growth = True
    sess = tf.Session(config=sess_config)

    saver = tf.train.Saver()
    b_fetched = np.zeros((batch_size, (nGlimpses)*2))

    init = tf.global_variables_initializer()
    sess.run(init)

    if eval_only:
        evaluate()
    else:
        summary_writer = tf.summary.FileWriter(summaryFolderName, graph=sess.graph)

        if draw:
            fig = plt.figure(1)
            txt = fig.suptitle("-", fontsize=36, fontweight='bold')
            plt.ion()
            plt.show()
            plt.subplots_adjust(top=0.7)
            plotImgs = []

        if drawReconsturction:
            fig = plt.figure(2)
            txt = fig.suptitle("-", fontsize=36, fontweight='bold')
            plt.ion()
            plt.show()

        if preTraining:
            for epoch_r in range(1,preTraining_epoch):
                nextX, _ = dataset.train.next_batch(batch_size)
                nextX_orig = nextX
                if translateMnist:
                    nextX, _ = convertTranslated(nextX, MNIST_SIZE, img_size)

                fetches_r = [reconstructionCost, reconstruction, train_op_r]

                reconstructionCost_fetched, reconstruction_fetched, train_op_r_fetched = sess.run(fetches_r, feed_dict={inputs_placeholder: nextX})

                if epoch_r % 20 == 0:
                    print(('Step %d: reconstructionCost = %.5f' % (epoch_r, reconstructionCost_fetched)))
                    if epoch_r % 100 == 0:
                        if drawReconsturction:
                            fig = plt.figure(2)

                            plt.subplot(1, 2, 1)
                            plt.imshow(np.reshape(nextX[0, :], [img_size, img_size]),
                                       cmap=plt.get_cmap('gray'), interpolation="nearest")
                            plt.ylim((img_size - 1, 0))
                            plt.xlim((0, img_size - 1))

                            plt.subplot(1, 2, 2)
                            plt.imshow(np.reshape(reconstruction_fetched[0, :], [img_size, img_size]),
                                       cmap=plt.get_cmap('gray'), interpolation="nearest")
                            plt.ylim((img_size - 1, 0))
                            plt.xlim((0, img_size - 1))
                            plt.draw()
                            plt.pause(0.0001)
                            # plt.show()


        # training
        for epoch in range(start_step + 1, max_iters):
            start_time = time.time()

            # get the next batch of examples
            nextX, nextY = dataset.train.next_batch(batch_size)
            nextX_orig = nextX
            if translateMnist:
                nextX, nextX_coord = convertTranslated(nextX, MNIST_SIZE, img_size)

            feed_dict = {inputs_placeholder: nextX, labels_placeholder: nextY, \
                         onehot_labels_placeholder: dense_to_one_hot(nextY)}

            fetches = [train_op, cost, reward, predicted_labels, correct_labels, glimpse_images, avg_b, rminusb, \
                       mean_locs, sampled_locs, lr]
            # feed them to the model
            results = sess.run(fetches, feed_dict=feed_dict)

            _, cost_fetched, reward_fetched, prediction_labels_fetched, correct_labels_fetched, glimpse_images_fetched, \
            avg_b_fetched, rminusb_fetched, mean_locs_fetched, sampled_locs_fetched, lr_fetched = results


            duration = time.time() - start_time

            if epoch % 20 == 0:
                print(('Step %d: cost = %.5f reward = %.5f (%.3f sec) b = %.5f R-b = %.5f, LR = %.5f'
                      % (epoch, cost_fetched, reward_fetched, duration, avg_b_fetched, rminusb_fetched, lr_fetched)))
                summary_str = sess.run(summary_op, feed_dict=feed_dict)
                summary_writer.add_summary(summary_str, epoch)
                # if saveImgs:
                #     plt.savefig(imgsFolderName + simulationName + '_ep%.6d.png' % (epoch))

                if epoch % 5000 == 0:
                    saver.save(sess, save_dir + save_prefix + str(epoch) + ".ckpt")
                    evaluate()

                ##### DRAW WINDOW ################
                f_glimpse_images = np.reshape(glimpse_images_fetched, \
                                              (nGlimpses, batch_size, depth, sensorBandwidth, sensorBandwidth))

                if draw:
                    if animate:
                        fillList = False
                        if len(plotImgs) == 0:
                            fillList = True

                        # display the first image in the mini-batch
                        nCols = depth+1
                        plt.subplot2grid((depth, nCols), (0, 1), rowspan=depth, colspan=depth)
                        # display the entire image
                        plotWholeImg(nextX[0, :], img_size, sampled_locs_fetched)

                        # display the glimpses
                        for y in range(nGlimpses):
                            txt.set_text('Epoch: %.6d \nPrediction: %i -- Truth: %i\nStep: %i/%i'
                                         % (epoch, prediction_labels_fetched[0], correct_labels_fetched[0], (y + 1), nGlimpses))

                            for x in range(depth):
                                plt.subplot(depth, nCols, 1 + nCols * x)
                                if fillList:
                                    plotImg = plt.imshow(f_glimpse_images[y, 0, x], cmap=plt.get_cmap('gray'),
                                                         interpolation="nearest")
                                    plotImg.autoscale()
                                    plotImgs.append(plotImg)
                                else:
                                    plotImgs[x].set_data(f_glimpse_images[y, 0, x])
                                    plotImgs[x].autoscale()
                            fillList = False

                            # fig.canvas.draw()
                            time.sleep(0.1)
                            plt.pause(0.00005)

                    else:
                        txt.set_text('PREDICTION: %i\nTRUTH: %i' % (prediction_labels_fetched[0], correct_labels_fetched[0]))
                        for x in range(depth):
                            for y in range(nGlimpses):
                                plt.subplot(depth, nGlimpses, x * nGlimpses + y + 1)
                                plt.imshow(f_glimpse_images[y, 0, x], cmap=plt.get_cmap('gray'), interpolation="nearest")


Question about the gradient and pretrain

At first, in the function calc_reward, when you calculate J you use p_loc, which is made from mean_locs and sampled_locs, but both mean_locs and sampled_locs are stop_gradients. So I think tf.log(p_loc + SMALL_NUM) * (R - no_grad_b) is of no use when calculating the gradients.
Also, why does this need pretraining? I never found this method in the paper.

Thanks for releasing your code. Can you resolve my doubts? Also, have you finished this experiment on the 100 x 100 translated cluttered MNIST data? If you have, please @ me. Thanks.
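
For readers who are unsure what the question refers to: gradients do not flow through tf.stop_gradient, so a loss term built only out of stopped tensors cannot move any parameters. A small standalone TF1-style example (not code from this repo):

```python
import tensorflow as tf

x = tf.Variable(2.0)
y_stopped = tf.stop_gradient(x * 3.0)   # gradient path to x is cut here
y_flowing = x * 3.0

grad_stopped = tf.gradients(y_stopped, [x])[0]   # None: no path from y_stopped back to x
grad_flowing = tf.gradients(y_flowing, [x])[0]   # dy/dx = 3.0

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(grad_stopped)               # None
    print(sess.run(grad_flowing))     # 3.0
```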

Api.js

https://github.com/symbiosdotwiki/disks/blob/main/public/api.js

{
  "name": "disks",
  "version": "4.1.2",
  "description": "A bunch of disks that are compatible with the \"whistlegraph/system\" repo.",
  "main": "blank.js",
  "scripts": {
    "dev": "http-server public -S -C ssl-dev/server.crt -K ssl-dev/server.key -p 8081 --cors"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/whistlegraph/disks.git"
  },
  "author": "Jeffrey Alan Scudder",
  "license": "ISC",
  "bugs": {
    "url": "https://github.com/whistlegraph/disks/issues"
  },
  "homepage": "https://github.com/whistlegraph/disks#readme"
}
console.log("API LOADED");

// TODO: How to best publish the API for myself?

// Make a file here that gets loaded by a disk but then replaced in start
// with all the actual functions!
// export { sound: "Sound", pen, screen, color, clear, Form, TRIANGLE, camera, fomations, docs };

Question about calculating R

Thank you very much for releasing the code; it helps me a lot.
I read the code carefully and have some questions about the function calc_reward.
1. In line 289, you average the rewards in a minibatch, so for each sample in this batch the reward will not be 0 or 1 but a value between 0 and 1. I am curious why you do this.
2. In line 291, you reshape the reward to (batch_size, 2*nGlimpses). Why shouldn't the shape of the reward be (batch_size, nGlimpses)? What is the motivation for the factor of 2?

Why do pad_to_bounding_box in ram.py?

As the title says, I found that there is a tf.image.pad_to_bounding_box call that zero-pads the image. However, I did not find this step in the paper. What does it mean? I would also like to know the meaning of max_radius.
Thank you.
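
One plausible reading of the padding step (stated as an assumption, not the authors' explanation): it guarantees that a glimpse of the largest radius can always be sliced out, even when the sampled location lands on the image border, and max_radius is simply that largest radius. An illustrative NumPy sketch, with the numbers matching the translated-MNIST settings used in ram.py:

```python
import numpy as np

minRadius, depth, img_size = 8, 3, 60
max_radius = minRadius * (2 ** (depth - 1))    # radius of the largest (most zoomed-out) glimpse
offset = 2 * max_radius

img = np.ones((img_size, img_size))
padded = np.pad(img, offset, mode='constant')  # zero padding, like pad_to_bounding_box

loc = np.array([0, 0])                         # glimpse centered on the image corner
r = max_radius
top_left = offset + loc - r                    # stays non-negative thanks to the padding
crop = padded[top_left[0]:top_left[0] + 2 * r, top_left[1]:top_left[1] + 2 * r]
print(crop.shape)                              # (64, 64): the crop never leaves the padded image
```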

Is the stop_gradient problem solved?

Hello!

I want to confirm whether the stop_gradient problem has been solved or not.

I found that there is an intrinsic mathematical problem in your implementation, and after I fixed that part it works (with the location network training, the fixed implementation finally reaches high accuracy), so I want to share both the math and the implementation with you.

Please contact me - [email protected]

Thx!
