yunjey / show-attend-and-tell

TensorFlow Implementation of "Show, Attend and Tell"

License: MIT License

Languages: Jupyter Notebook 99.23%, Python 0.76%, Shell 0.01%
Topics: tensorflow, image-captioning, show-attend-and-tell, attention-mechanism, mscoco-image-dataset

show-attend-and-tell's Introduction

Show, Attend and Tell

Update (December 2, 2016): TensorFlow implementation of "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", which introduces an attention-based image caption generator. The model shifts its attention to the relevant part of the image while it generates each word.




References

Author's original Theano code: https://github.com/kelvinxu/arctic-captions

Another TensorFlow implementation: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow


Getting Started

Prerequisites

First, clone this repo and the coco-caption repo (which provides pycocoevalcap) into the same directory.

$ git clone https://github.com/yunjey/show-attend-and-tell-tensorflow.git
$ git clone https://github.com/tylin/coco-caption.git

This code is written in Python 2.7 and requires TensorFlow 1.2. In addition, you need to install a few more packages to process the MSCOCO data set. I have provided a script to download the MSCOCO image dataset and the VGGNet19 model. Downloading the data may take several hours depending on your network speed. Run the commands below; the images will be downloaded into the image/ directory and the VGGNet19 model into the data/ directory.

$ cd show-attend-and-tell-tensorflow
$ pip install -r requirements.txt
$ chmod +x ./download.sh
$ ./download.sh

To feed the images to the VGGNet, you should resize the MSCOCO images to a fixed size of 224x224. Run the command below; the resized images will be stored in the image/train2014_resized/ and image/val2014_resized/ directories.

$ python resize.py
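
For reference, a minimal sketch of what the resize step can look like, assuming PIL/Pillow; the directory names mirror the README, but the exact logic of resize.py is an assumption:

import os
from PIL import Image

def resize_images(src_dir, dst_dir, size=(224, 224)):
    # VGGNet19 expects fixed-size 224x224 RGB inputs.
    if not os.path.exists(dst_dir):
        os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        with Image.open(os.path.join(src_dir, name)) as img:
            img.convert('RGB').resize(size, Image.ANTIALIAS).save(os.path.join(dst_dir, name))

resize_images('./image/train2014', './image/train2014_resized')
resize_images('./image/val2014', './image/val2014_resized')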

Before training the model, you have to preprocess the MSCOCO caption dataset. To generate the caption dataset and the image feature vectors, run the command below.

$ python prepro.py
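
As a rough sketch of what the preprocessing produces (the shapes are taken from the issue logs further down; the exact code in prepro.py is an assumption): VGGNet19's conv5_3 layer maps each 224x224 image to a 14x14x512 grid, which is flattened into 196 annotation vectors of 512 dimensions each.

import numpy as np

conv5_3 = np.zeros((1, 14, 14, 512), dtype=np.float32)  # stand-in for the conv5_3 output of one image
features = conv5_3.reshape(-1, 196, 512)                # 14 * 14 = 196 spatial locations
print(features.shape)                                   # (1, 196, 512)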

Train the model

To train the image captioning model, run the command below.

$ python train.py

(Optional) TensorBoard visualization

I have provided TensorBoard visualization for real-time debugging. Open a new terminal, run the command below, and open http://localhost:6005/ in your web browser.

$ tensorboard --logdir='./log' --port=6005 

Evaluate the model

To generate captions, visualize attention weights and evaluate the model, please see evaluate_model.ipynb.


Results


Training data

(1) Generated caption: A plane flying in the sky with a landing gear down.


(2) Generated caption: A giraffe and two zebra standing in the field.


Validation data

(1) Generated caption: A large elephant standing in a dry grass field.


(2) Generated caption: A baby elephant standing on top of a dirt field.


Test data

(1) Generated caption: A plane flying over a body of water.


(2) Generated caption: A zebra standing in the grass near a tree.


show-attend-and-tell's People

Contributors: yunjey

show-attend-and-tell's Issues

MemoryError in prepro.py

Loaded ./data/train/train.annotations.pkl..
Traceback (most recent call last):
  File "prepro.py", line 212, in <module>
    main()
  File "prepro.py", line 195, in main
    all_feats = np.ndarray([n_examples, 196, 512], dtype=np.float32)
MemoryError

How can I reduce the number of training examples?
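
A hedged sketch of one way to do this, mirroring a workaround quoted in a later issue rather than an official fix (the annotations path follows the log above): subsample the caption annotations before feature extraction so prepro.py allocates a smaller array.

import pandas as pd

annotations = pd.read_pickle('./data/train/train.annotations.pkl')
cutoff = int(0.1 * len(annotations))  # keep 10% of the captions; the fraction is arbitrary
pd.to_pickle(annotations[:cutoff], './data/train/train.annotations.pkl')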

train.py index error

I get the following error when executing train.py. Does somebody know what this is about?

Traceback (most recent call last):
  File "train.py", line 25, in <module>
    main()
  File "train.py", line 8, in main
    data = load_coco_data(data_path='./data', split='train')
  File "/Users/jaideepsingh/Desktop/new/show-attend-and-tell-tensorflow/core/utils.py", line 13, in load_coco_data
    data['features'] = hickle.load(os.path.join(data_path, '%s.features.hkl' %split))
  File "/Users/jaideepsingh/Library/Python/2.7/lib/python/site-packages/hickle.py", line 625, in load
    return py_container[0][0]
IndexError: list index out of range

Hard attention

Did you implement the hard-attention part? Do you have any reference for it?

Is the data file missing?

Could you please provide the "data" files via Google Drive or any other place where we can download them?

And which model did you use to produce these features?

Thanks.

_decode_lstm ctx2out

The documentation indicates that ctx2out is in Eq. (2). I can't find a corresponding ctx2out term in the paper. Can you give me some insight into why we need it?
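
For context, the paper computes the word distribution with a "deep output" layer; ctx2out appears to correspond to the L_z z_t term below, which feeds the attended context into the output alongside the hidden state and the previous word (this mapping is my reading, not the author's):

p(\mathbf{y}_t \mid \mathbf{a}, \mathbf{y}_1^{t-1}) \propto \exp\left( \mathbf{L}_o \left( \mathbf{E}\mathbf{y}_{t-1} + \mathbf{L}_h \mathbf{h}_t + \mathbf{L}_z \hat{\mathbf{z}}_t \right) \right)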

ValueError in train.py

work@lab-server03:~/ljz/show-attend-and-tell-master$ python train.py
image_idxs <type 'numpy.ndarray'> (399998,) int32
file_names <type 'numpy.ndarray'> (82783,) <U55
word_to_idx <type 'dict'> 23110
features <type 'numpy.ndarray'> (82783, 196, 512) float32
captions <type 'numpy.ndarray'> (399998, 17) int32
Elapse time: 198.26
image_idxs <type 'numpy.ndarray'> (19589,) int32
file_names <type 'numpy.ndarray'> (4052,) <U51
features <type 'numpy.ndarray'> (4052, 196, 512) float32
captions <type 'numpy.ndarray'> (19589, 17) int32
Elapse time: 3.67
Traceback (most recent call last):
  File "train.py", line 25, in <module>
    main()
  File "train.py", line 22, in main
    solver.train()
  File "/home/work/ljz/show-attend-and-tell-master/core/solver.py", line 86, in train
    train_op = optimizer.apply_gradients(grads_and_vars=grads_and_vars)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 412, in apply_gradients
    self._create_slots(var_list)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adam.py", line 119, in _create_slots
    self._zeros_slot(v, "m", self._name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 656, in _zeros_slot
    named_slots[var] = slot_creator.create_zeros_slot(var, op_name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 123, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 101, in create_slot
    return _create_slot_var(primary, val, '')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 55, in _create_slot_var
    slot = variable_scope.get_variable(scope, initializer=val, trainable=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 988, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 890, in get_variable
    custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 348, in get_variable
    validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 333, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 657, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable conv_featuresbatch_norm/beta/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

vector serialized

Dear author,
I have a question: the LSTM's input should be a sequence of vectors, but the CNN's output is a feature map with no sequential order. How is the CNN output turned into the sequence that is fed into the LSTM, and how is this process implemented?
I will appreciate it if you answer my question, and I am looking forward to your early reply.
From: Kobe20
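
For what it's worth, a hedged sketch of the soft-attention idea from the paper (not the repo's exact code): the CNN output is not turned into a sequence at all; at every timestep the 196 feature vectors are collapsed into a single context vector by attention weights, and that context, concatenated with the previous word's embedding, is the LSTM input.

import numpy as np

features = np.random.rand(196, 512).astype(np.float32)  # flattened conv5_3 grid of one image
alpha = np.full(196, 1.0 / 196, dtype=np.float32)       # attention weights; sum to 1, recomputed from h_{t-1} each step
context = alpha.dot(features)                           # (512,) weighted average over the 196 locations
# `context` is concatenated with the previous word embedding and fed to the LSTM at step t.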

error when running prepro.py

When I run prepro.py, it fails. Details:
Traceback (most recent call last):
  File "prepro.py", line 212, in <module>
    main()
  File "prepro.py", line 185, in main
    vggnet.build()
  File "/Users/hp/show-attend-and-tell-tensorflow/core/vggnet.py", line 59, in build
    self.build_params()
  File "/Users/hp/show-attend-and-tell-tensorflow/core/vggnet.py", line 19, in build_params
    model = scipy.io.loadmat(self.vgg_path)
  File "/Users/hp/miniconda2/lib/python2.7/site-packages/scipy/io/matlab/mio.py", line 136, in loadmat
    matfile_dict = MR.get_variables(variable_names)
  File "/Users/hp/miniconda2/lib/python2.7/site-packages/scipy/io/matlab/mio5.py", line 272, in get_variables
    hdr, next_position = self.read_var_header()
  File "/Users/hp/miniconda2/lib/python2.7/site-packages/scipy/io/matlab/mio5.py", line 231, in read_var_header
    raise TypeError('Expecting miMATRIX type here, got %d' % mdtype)
TypeError: Expecting miMATRIX type here, got 1902171734

Error running evaluate_model.ipynb

Maybe you can help me run your excellent project.

After following all steps of the README, an error occurs on the last step:

To generate captions, visualize attention weights and evaluate the model, please see evaluate_model.ipynb

When trying to run solver.test(data, split='val'), an IPython notebook error message appears:


ValueError: Variable conv_featuresbatch_norm/beta already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 208, in variable
    caching_device=caching_device)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 177, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 244, in model_variable
    caching_device=caching_device, device=device)

My guess is that something is wrong with my TensorFlow installation, which is weird because I tested basic usage in IPython and everything works fine there.

Do you see what is causing my problem?
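
A hedged observation, not the author's answer: in a notebook, re-running the cell that builds the model creates the same variables a second time in the default graph, which raises exactly this error. Resetting the graph (or restarting the kernel) before rebuilding is the usual workaround:

import tensorflow as tf

tf.reset_default_graph()  # discard previously created variables before building the model again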

Pretrained model

Could you share a pretrained model checkpoint that can be directly used for generation/evaluation?

Cannot extract features with vggnet

While trying to extract features, I came across these errors:

1. When I set the batch_size to 100:

Traceback (most recent call last):
  File "prepare_features.py", line 53, in <module>
    main(sys.argv[1:])
  File "prepare_features.py", line 40, in main
    imread(x, mode='RGB'), image_batch_file)).astype(np.float32)
ValueError: setting an array element with a sequence

2. When I set the batch_size to 1 because of the above error:

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value conv1_1/w
[[Node: conv1_1/w/read = IdentityT=DT_FLOAT, _class=["loc:@conv1_1/w"], _device="/job:localhost/replica:0/task:0/gpu:0"]]
[[Node: conv5_3/BiasAdd/_3 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_9_conv5_3/BiasAdd", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]

Issue in download.sh

Please change line 6:

unzip captions_train-val2014.zip -d data/

to:

unzip data/captions_train-val2014.zip -d data/

Essentially, the data/ prefix is missing.

About the Doubly stochastic regularization

Hello,
I'm a little surprised by this code:

if self.alpha_c > 0:
    alphas = tf.transpose(tf.pack(alpha_list), (1, 0, 2))     # (N, T, L)
    alphas_all = tf.reduce_sum(alphas, 1)                     # (N, L)
    alpha_reg = self.alpha_c * tf.reduce_sum((16./196 - alphas_all) ** 2)

I don't understand the calculation of alpha_reg. As I understand it, alphas_all is the sum of the alphas over all timesteps for each spatial location. In the paper, we want this to be close to 1; shouldn't it then be something like:

alpha_reg = self.alpha_c * tf.reduce_sum(tf.abs(1-alphas_all))    

Thanks in advance for the clarification.
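
For reference, the doubly stochastic penalty as written in the paper pushes the total attention mass at each location i toward 1, so the question is why the code targets 16/196 instead:

\lambda \sum_{i}^{L} \left( 1 - \sum_{t}^{T} \alpha_{t,i} \right)^{2}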

MemoryError in prepro.py

This process requires far too much RAM.
Could you please show or upload code that splits the preprocessing in prepro.py into smaller chunks?

Error while running evaluate_model

Hello yunjey, thanks a lot for writing such well-structured code.

I got an error while running the evaluate_model code. At solver.test(data, split='val'), the code throws the following error: Assign requires shapes of both tensors to match. lhs shape= [1500] rhs shape= [1024]

Here I have the following doubts:

1. The hidden state size in training and testing is not the same (1024 vs. 1500). When I changed the hidden state size to 1024, I got a similar error.

2. The testing model path './model/lstm3/model-18' does not exist; should it be './model/lstm/model-18' or './model/lstm/model-10', as used while training the model?

So far I have done the following steps:

1. Obtained VGG features for 10 percent of the training data by adding the lines below in prepro.py at line 150:
   train_cutoff = int(0.1 * len(train_dataset))
   print 'Finished processing caption data'
   save_pickle(train_dataset[:train_cutoff], 'data/train/train.annotations.pkl')
2. Trained the model for 20 epochs with a VGG feature vector of size [196, 512], word embedding size 512, hidden state size 1024, trained model path 'model/lstm/', and test_model 'model/lstm/model-10', as done in the code.
3. Converted evaluate_model.ipynb into evaluate_model.py and ran it with a VGG feature vector of size [196, 512], word embedding size 512, hidden state size 1500, trained model path 'model/lstm/', and test_model './model/lstm3/model-18', as done in the code.

Thanks in advance for your support.

Variable conv_featuresbatch_norm/beta/RMSProp/ does not exist

Hey, I encountered a problem when I run python train.py. The full error is:
ValueError: Variable conv_featuresbatch_norm/beta/RMSProp/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

It occurred at this line (87): train_op = optimizer.apply_gradients(grads_and_vars=grads_and_vars)
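
For what it's worth, a hedged sketch of the usual cause and fix (not the author's answer): solver.py flips the whole scope to reuse with tf.get_variable_scope().reuse_variables() before the optimizer creates its slot variables, and slot creation fails under reuse. Building the train op first avoids it:

import tensorflow as tf

w = tf.get_variable('w', initializer=1.0)  # stand-in for the model's variables
loss = w * w                               # stand-in for model.build_model()

# Build the train op while the scope still allows creating new variables;
# apply_gradients creates the RMSProp slot variables here.
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001)
train_op = optimizer.apply_gradients(optimizer.compute_gradients(loss))

# Only switch the scope to reuse afterwards (e.g. for building the sampler).
tf.get_variable_scope().reuse_variables()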

Evaluate error

Hello yunjey, thanks a lot for your nice work! I got a problem when I run train.py: at the line scores = evaluate(data_path='./data', split='val', get_scores=True), an error was raised (the attached screenshot did not survive scraping).
I'm new to this field; I'll very much appreciate it if you can give some advice. Thank you!

Preprocess the image

Thank you very much for sharing your code. I was wondering whether you preprocess the images anywhere, e.g. subtracting the mean pixel values?

Trained

Can you provide the already-trained models?

How to fix train.py on TensorFlow 1.0

When I run train.py, it can't find rnn_cell, so I used tf.contrib.rnn instead. I am then given TypeError: Expected int32, got list containing Tensors of type '_Message' instead. I don't know how to fix it.

Which version of TF is in your env?

My env is tf 1.0.0, and it has so many variable-sharing problems.
I changed the code to use variable scopes, but it still can't work...

The error info looks like:
ValueError: Variable lstm/basic_lstm_cell/weights already exists ...

Also, tf.get_variable_scope().reuse_variables() on line 78 of solver.py changes the whole variable scope to reuse status; is that what you intended?

Thanks!

problems with prepro.py

My RAM is not big enough to run this code, so I'm trying to split the work into smaller batches,
and I'm having trouble with the size of train.features.hkl.
It should be getting bigger as the process goes on, but the size is not changing; it stays fixed at 4 MB.
It seems hickle.dump is overwriting the file, so I tried the mode='a' option, but I got the error message below.

RuntimeError: Unable to create link (Name already exists)

How can I append to an already existing file?
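
A hedged sketch of a chunked alternative using h5py directly (hickle stores HDF5 under the hood, but its append semantics vary by version; the file name and shapes follow the issue logs, and extract_features is a hypothetical stand-in for the VGG forward pass):

import h5py
import numpy as np

n_examples, batch_size = 82783, 100
with h5py.File('./data/train/train.features.h5', 'w') as f:
    dset = f.create_dataset('features', shape=(n_examples, 196, 512), dtype='float32')
    for start in range(0, n_examples, batch_size):
        end = min(start + batch_size, n_examples)
        feats = np.zeros((end - start, 196, 512), dtype=np.float32)  # replace with extract_features(batch)
        dset[start:end] = feats  # each batch is written straight to disk; nothing accumulates in RAM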

Problem in evaluate_model.ipynb

When I run evaluate_model.ipynb, it just shows:
InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1500] rhs shape= [1024]
[[Node: save/Assign_8 = Assign[T=DT_FLOAT, _class=["loc:@initial_lstm/b_h"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](initial_lstm/b_h, save/restore_slice_8/_33)]]
[[Node: save/restore_slice_15/_4 = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_7_save/restore_slice_15", _device="/job:localhost/replica:0/task:0/cpu:0"]]

what's your GPU memory?

It seems that you tried to load all the training data into a NumPy array, which causes a memory error in my case. (It works on my 3x Titan X 12 GB machine.)

what's your GPU memory?

In prepro.py, line 31: caption_data.sort_values(by='image_id', inplace=True)

~/show-attend-and-tell-tensorflow# python prepro.py
Traceback (most recent call last):
  File "prepro.py", line 212, in <module>
    main()
  File "prepro.py", line 138, in main
    max_length=max_length)
  File "prepro.py", line 31, in _process_caption_data
    caption_data.sort_values(by='image_id', inplace=True)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1815, in __getattr__
    (type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'sort_values'

In prepro.py, line 31:
caption_data.sort_values(by='image_id', inplace=True)
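
A hedged note: sort_values arrived in pandas 0.17, so this AttributeError points at an older pandas. Upgrading pandas should fix it; alternatively, a compatibility shim can fall back to the pre-0.17 spelling:

import pandas as pd

caption_data = pd.DataFrame({'image_id': [2, 1], 'caption': ['b', 'a']})  # toy stand-in
try:
    caption_data.sort_values(by='image_id', inplace=True)
except AttributeError:  # pandas < 0.17
    caption_data.sort(columns='image_id', inplace=True)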

Trouble understanding alpha penalty in loss

I'm having trouble understanding the magic constant 16./196 in https://github.com/yunjey/show-attend-and-tell/blob/master/core/model.py#L176.

In the paper, this value is 1, meaning they try to force the alphas at each location to sum to 1 so that every part of the image is attended to. I don't understand this value of roughly 8%, and looking at the notebook after training, it doesn't seem like only a small part of the image is attended to. Additionally, the fraction corresponds to T/L with the default values, but I cannot extract meaning from that.

Anyone got a clue?
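
One possible reading, offered as an assumption rather than the author's rationale: the per-step weights sum to 1, so the total attention mass over T steps is T, and a perfectly uniform spread gives each of the L locations mass T/L; with the defaults T = 16 and L = 196 that is exactly the constant in the code:

\sum_{i=1}^{L} \alpha_{t,i} = 1 \;\Rightarrow\; \sum_{t=1}^{T} \sum_{i=1}^{L} \alpha_{t,i} = T, \qquad \text{uniform: } \sum_{t=1}^{T} \alpha_{t,i} = \frac{T}{L} = \frac{16}{196} \approx 0.082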

Can we use a pre-trained embedding instead of training from scratch?

Hello, at the very beginning of the input, a word embedding is trained from scratch. Can we use a pre-trained embedding instead of training it from scratch?

And for decoding, the code goes from h to the K-word softmax directly. Should we add an intermediate layer, such as an embedding, between them? Or share the same embedding as the input?

the download.sh script error:

unzip captions_train-val2014.zip -d data/

should be:

unzip ./data/captions_train-val2014.zip -d data/

Not a big deal, anyway.

train.py TypeError

TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

It traced back to the line:

line 168, in build_model:
    _, (c, h) = lstm_cell(inputs=tf.concat(1, [x[:,t,:], context]), state=[c, h])
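
This is the TensorFlow 1.0 API change: tf.concat's arguments swapped from (concat_dim, values) to (values, axis), which is what produces the '_Message' TypeError here and in the TF 1.0 issues above. A hedged sketch of the adjusted call (the shapes are assumptions):

import tensorflow as tf

x_t = tf.placeholder(tf.float32, [None, 512])      # word embedding at step t
context = tf.placeholder(tf.float32, [None, 512])  # attended context vector

# TF < 1.0: tf.concat(1, [x_t, context])
# TF >= 1.0: values first, axis second
inputs = tf.concat([x_t, context], axis=1)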

Variable scope issues

Hi,

I am trying to use your code base for an image captioning problem.
I have TensorFlow 0.11 installed.
I am getting variable scope issues in solver.py and model.py when I run train.py.
Any help would be much appreciated.

MemoryError raised when I run prepro.py

I tensorflow/core/common_runtime/gpu/gpu_device.cc:944] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8095
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.27GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1034] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
Loaded ./data/train/train.annotations.pkl..

Traceback (most recent call last):
  File "prepro.py", line 234, in <module>
    main()
  File "prepro.py", line 209, in main
    all_feats = np.ndarray([n_examples, 196, 512], dtype=np.float32)
MemoryError

I printed n_examples: there are about 80k images, so the array holds 82783 x 196 x 512 = 8,307,439,616 float32 values; at 4 bytes each, that is about 33 GB of memory.
Is something wrong?

My 1080 has 7.27 GB of free memory, so I changed n_examples to 20000 and everything works.

Bleu result

In the paper, their final Bleu-1 result is 70.7.
I trained this model and the final Bleu score is about 66.5.
Do you see the same behaviour? What could be the problem?

InvalidArgumentError of evaluate_model in show-attend-and-tell

Can anyone help me solve this problem?
In evaluate_model.py, when I run solver.test(data, split='val'), I get the error below:

INFO:tensorflow:Restoring parameters from ./model/lstm/model-19


InvalidArgumentError Traceback (most recent call last)
in ()
----> 1 solver.test(data, split='val')

/home/guanlida/show-attend-and-tell-tensorflow/core/solver.pyc in test(self, data, split, attention_visualization, save_sampled_captions)
195 features_batch, image_files = sample_coco_minibatch(data, self.batch_size)
196 feed_dict = { self.model.features: features_batch }
--> 197 alps, bts, sam_cap = sess.run([alphas, betas, sampled_captions], feed_dict) # (N, max_len, L), (N, max_len)
198 decoded = decode_captions(sam_cap, self.model.idx_to_word)
199

/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
787 try:
788 result = self._run(None, fetches, feed_dict, options_ptr,
--> 789 run_metadata_ptr)
790 if run_metadata:
791 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
995 if final_fetches or final_targets:
996 results = self._do_run(handle, final_targets, final_fetches,
--> 997 feed_dict_string, options, run_metadata)
998 else:
999 results = []

/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1130 if handle is None:
1131 return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1132 target_list, options, run_metadata)
1133 else:
1134 return self._do_call(_prun_fn, self._session, handle, feed_dict,

/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1150 except KeyError:
1151 pass
-> 1152 raise type(e)(node_def, op, message)
1153
1154 def _extend_graph(self):

InvalidArgumentError: transpose expects a vector of size 1. But input(1) is a vector of size 2
[[Node: transpose_1 = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](Squeeze, transpose_1/perm)]]

Caused by op u'transpose_1', defined at:
File "/home/guanlida/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/home/guanlida/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/ipykernel_launcher.py", line 16, in
app.launch_new_instance()
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 477, in start
ioloop.IOLoop.instance().start()
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
handler(stream, idents, msg)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2827, in run_ast_nodes
if self.run_code(code, result):
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
solver.test(data, split='val')
File "core/solver.py", line 188, in test
alphas, betas, sampled_captions = self.model.build_sampler(max_len=20) # (N, max_len, L), (N, max_len)
File "core/model.py", line 216, in build_sampler
betas = tf.transpose(tf.squeeze(beta_list), (1, 0)) # (N, T)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1285, in transpose
ret = gen_array_ops.transpose(a, perm, name=name)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3658, in transpose
result = _op_def_lib.apply_op("Transpose", x=x, perm=perm, name=name)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/guanlida/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): transpose expects a vector of size 1. But input(1) is a vector of size 2
[[Node: transpose_1 = Transpose[T=DT_FLOAT, Tperm=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](Squeeze, transpose_1/perm)]]

In this program, when I tried using the whole MSCOCO dataset, I got a memory error on my GPU (NVIDIA Titan Xp), so I only used 10% of the training data and set the batch_size from 128 to 1 in train.py. I completed train.py and got the model in the lstm folder. But when I finally wanted to run evaluate_model, I changed dim_hidden from 1500 to 1024, and in In[4]:

solver = CaptioningSolver(model, data, data, n_epochs=15, batch_size=1, update_rule='adam', learning_rate=0.0025, print_every=2000, save_every=1, image_path='./image/val2014_resized', pretrained_model=None, model_path='./model/lstm', test_model='./model/lstm/model-19', print_bleu=False, log_path='./log/')

I changed the batch size to 1 and the test_model from lstm3 to lstm (because there is no folder named lstm3). Those are all my changes, and I think that when I changed the parameters, the shape or dimension of some tensor changed, which is why I got this error.

The loss keeps increasing even when learning_rate is set to 0

I am using the code to reproduce the results, but the loss increases. I wanted to debug what is going on and set the learning rate to 0, but the loss keeps increasing and explodes even after many epochs. What could be a possible reason for this?

IOError in train.py

When I run train.py, I get a problem that says:

Traceback (most recent call last):
  File "train.py", line 25, in <module>
    main()
  File "train.py", line 8, in main
    data = load_coco_data(data_path='./data', split='train')
  File "/home/sxm/bishe/tensorflow-git/show-attend-and-tell-tensorflow/core/utils.py", line 13, in load_coco_data
    data['features'] = hickle.load(os.path.join(data_path, '%s.features.hkl' %split))
  File "/home/sxm/Tensorflow/local/lib/python2.7/site-packages/hickle.py", line 616, in load
    h5f = file_opener(fileobj)
  File "/home/sxm/Tensorflow/local/lib/python2.7/site-packages/hickle.py", line 154, in file_opener
    h5f = h5.File(filename, mode)
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/files.py", line 207, in __init__
    fid = make_fid(name, mode, userblock_size, fapl)
  File "/usr/lib/python2.7/dist-packages/h5py/_hl/files.py", line 79, in make_fid
    fid = h5f.open(name, h5f.ACC_RDONLY, fapl=fapl)
  File "h5f.pyx", line 71, in h5py.h5f.open (h5py/h5f.c:1806)
IOError: unable to open file (File accessibilty: Unable to open file)

Does anybody have the same problem? I don't know how to fix it...

error in train.py

WARNING:tensorflow:<tensorflow.python.ops.rnn_cell.BasicLSTMCell object at 0x7fbc2a5b0850>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
Traceback (most recent call last):
  File "train.py", line 25, in <module>
    main()
  File "train.py", line 22, in main
    solver.train()
  File "/media/s/F606389C06386033/show-attend-and-tell-tensorflow/core/solver.py", line 77, in train
    loss = self.model.build_model()
  File "/media/s/F606389C06386033/show-attend-and-tell-tensorflow/core/model.py", line 168, in build_model
    _, (c, h) = lstm_cell(inputs=tf.concat(1, [x[:,t,:], context]), state=[c, h])
  File "/home/s/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell.py", line 305, in __call__
    concat = _linear([inputs, h], 4 * self._num_units, True)
  File "/home/s/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell.py", line 886, in _linear
    raise ValueError("Linear is expecting 2D arguments: %s" % str(shapes))
ValueError: Linear is expecting 2D arguments: [[None, 1024], [2, None, 1024]]
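
The [2, None, 1024] shape suggests the [c, h] pair is being packed into a single tensor because the cell runs in the deprecated concatenated-state mode flagged by the warning above. A hedged sketch of the usual fix (the shapes are assumptions):

import tensorflow as tf

# With state_is_tuple=True the cell accepts (c, h) as a pair instead of
# packing them into one rank-3 tensor.
lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=1024, state_is_tuple=True)

x_t = tf.placeholder(tf.float32, [None, 512])      # word embedding at step t
context = tf.placeholder(tf.float32, [None, 512])  # attended context vector
c = tf.placeholder(tf.float32, [None, 1024])
h = tf.placeholder(tf.float32, [None, 1024])

_, (c_next, h_next) = lstm_cell(
    inputs=tf.concat([x_t, context], axis=1),      # TF >= 1.0 argument order
    state=(c, h))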
