chenxinpeng / s2vt

TensorFlow implementation of the paper: Sequence to Sequence: Video to Text

s2vt's Introduction

S2VT: Sequence to Sequence: Video to Text

Note

This repository is not being actively maintained due to lack of time and interest. My sincerest apologies to the open source community for allowing this project to stagnate. I hope it was useful for some of you as a jumping-off point.

Acknowledgement

This code is adapted from jazzsaxmafia's implementation, with several problems in the original fixed.

Requirement

Tensorflow 0.12
Keras

How to use my code

First, download the MSVD dataset and extract the video features:

$ python extract_feats.py

After this operation, you should split the features into two parts:

train_features
test_features
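
The README does not say how the split is made, so the following is only a minimal sketch. It assumes Python 3, that extract_feats.py wrote one .npy file per clip into a single directory, and that a random split is acceptable; the function name `split_features`, the `test_fraction` value and the seed are all hypothetical choices, not the author's.

```python
import os
import random
import shutil


def split_features(feat_dir, train_dir, test_dir, test_fraction=0.3, seed=0):
    """Copy the .npy feature files in feat_dir into train_dir and test_dir.

    test_fraction and seed are illustrative defaults (assumptions),
    not values taken from the repository.
    """
    files = sorted(f for f in os.listdir(feat_dir) if f.endswith(".npy"))
    random.Random(seed).shuffle(files)  # deterministic shuffle for a repeatable split
    n_test = int(len(files) * test_fraction)

    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)

    for i, name in enumerate(files):
        dest = test_dir if i < n_test else train_dir
        shutil.copy(os.path.join(feat_dir, name), os.path.join(dest, name))

    # Number of files placed in (train, test)
    return len(files) - n_test, n_test
```

Calling `split_features("features", "train_features", "test_features")` would fill the two directories named above; the source directory name `features` is again an assumption.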

Second, train the model:

$ CUDA_VISIBLE_DEVICES=0 ipython

Then, inside the IPython session:

>>> import model_rgb
>>> model_rgb.train()

Before training, set the training parameters and directory paths in model_rgb.py.

Third, test the model. Choose a trained model, then run:

>>> import model_rgb
>>> model_rgb.test()

After testing, a text file named "S2VT_results.txt" will be generated.

Last, evaluate results with COCO

We evaluate the generated captions with the coco-caption tools.

Run the shell script get_coco_tools.sh to download the COCO tools:

$ ./get_coco_tools.sh

Next, generate the reference JSON file from the ground-truth CSV file:

$ python create_reference.py

Then generate the results JSON file from S2VT_results.txt:

$ python create_result_json.py
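
For readers curious what such a conversion involves, here is a rough sketch, not the contents of create_result_json.py. The coco-caption tools read results as a JSON list of objects with "image_id" and "caption" keys. The layout of S2VT_results.txt is not documented on this page, so the tab-separated "video id, caption" format and the function name `results_to_coco_json` are assumptions.

```python
import json


def results_to_coco_json(txt_path, json_path):
    """Convert 'video_id<TAB>caption' lines into a COCO-style results list.

    The input layout is an assumption; adapt the split to the real file.
    """
    results = []
    with open(txt_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            video_id, caption = line.split("\t", 1)
            results.append({"image_id": video_id, "caption": caption})

    with open(json_path, "w") as f:
        json.dump(results, f)
    return results
```

The ids written here must match the ids used in the reference JSON file, otherwise the evaluation tools cannot pair captions with references.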

Finally, you can evaluate the generation results:

$ python eval.py

Results

Model	METEOR
S2VT (ICCV 2015)
- RGB (VGG)	29.2
- Optical Flow (AlexNet)	24.3
Our model
- RGB (VGG)	28.1
- Optical Flow (AlexNet)	23.3

Attention

Please feel free to ask me if you have questions.
I have only committed the RGB part of my code; you can modify it to use optical flow features.

s2vt's Issues

How long does training take?

One epoch took 3600 s for me. How long is your training time?
I am looking forward to anyone's answer. Thank you!

Rationale behind using stacked LSTM

Hello,

I apologise in advance if this is not the right forum for this question, but since you were able to get good results, I thought you might be able to help me.

I am confused about why the architecture uses stacked LSTMs. The paper does not explain this very clearly (or I may have missed the finer details). Since the inputs are just padding, I do not see the reason for the stacked LSTM layer. Could you point me in the right direction?

Thanks!

Can you upload the features file?

A few questions about this project

Hi, I am a student who has just started with video description. I am sorry, I have several questions about s2vt.
1. How can I get the MSVD dataset? I downloaded it from the Internet, but it only contains an Excel document and I cannot find any videos.
2. How can I get the VGG model? I see that it is in your home/chenx../caffe/models/...
I would be very glad to receive your response.
Sorry, I am a beginner.

Time taken to run the code

Hello,

Thank you for making this code available. Quick question -

How long does it take to run the scripts on a CPU:

time taken to run extract_RGB_feats.py on the MSVD dataset?

and then to train and test the model?

Hope to hear back from someone soon.

Do you need to subtract the mean from the input images/video frames?

Hi,

Thanks for sharing.
Don't you need to subtract the mean from the input video frames when training the model with the VGG 16-layer model as initial weights? I have not seen that step in your implementation, and I am wondering why.

Thanks.
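
For reference, mean subtraction for a Caffe-style VGG-16 usually looks like the sketch below. Whether the repository's feature extraction already does this internally has not been checked here. The per-channel BGR means are the values commonly distributed with the Caffe VGG-16 release and are an assumption in this context; the function name `subtract_mean` is hypothetical.

```python
import numpy as np

# Per-channel BGR means commonly used with the Caffe VGG-16 model (assumed here).
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def subtract_mean(frames_bgr):
    """Subtract the per-channel mean from a batch of frames.

    frames_bgr: array of shape (n_frames, height, width, 3),
    BGR channel order, pixel values in the range 0-255.
    """
    # Broadcasting applies the 3-element mean along the last (channel) axis.
    return frames_bgr.astype(np.float32) - VGG_MEAN_BGR
```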

How to import caffe?

I got this error

Traceback (most recent call last):
File "extract_RGB_feats.py", line 9, in <module>
from cnn_util import *
File "cnn_util.py", line 4, in <module>
import caffe
File "/content/drive/My Drive/S2VT/caffe/python/caffe/__init__.py", line 1, in <module>
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
File "/content/drive/My Drive/S2VT/caffe/python/caffe/pycaffe.py", line 13, in <module>
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver,
ImportError: No module named _caffe

How can I download the MSVD videos, not just the CSV file?

@chenxinpeng Thank you for sharing your work. It is great work. I would appreciate it if you could tell me how to download the MSVD videos. I only got the CSV file after downloading from www.microsoft.com.
I would also like to know the format of the video file names: for example, "mv89psg6zh4.avi" or "mv89psg6zh4_33_46.avi"?
I look forward to your reply.

In training getting IndexError: too many indices for array

for ind, row in enumerate(current_caption_masks):
    row[:nonzeros[ind]] = 1  # <-- IndexError: too many indices for array
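
One plausible cause, offered as a guess since the issue shows only these two lines: under Python 3, `map(...)` returns an iterator, and `np.array(map(...))` wraps that iterator in a 0-dimensional object array, so `nonzeros[ind]` fails with exactly this message. The sketch below assumes that `nonzeros` counts the nonzero word ids per caption plus one extra step; the function name `build_caption_masks` is hypothetical.

```python
import numpy as np


def build_caption_masks(caption_matrix):
    """Return a 0/1 mask covering each row's nonzero word ids plus one step.

    The '+ 1' (one extra step after the last word) is an assumption
    about what the training script intends.
    """
    # A list comprehension yields a real 1-D array; np.array(map(...)) would not
    # under Python 3.
    nonzeros = np.array([int((row != 0).sum()) + 1 for row in caption_matrix])

    masks = np.zeros(caption_matrix.shape, dtype=np.float32)
    for ind, row in enumerate(masks):
        row[:nonzeros[ind]] = 1  # row is a view, so this writes into masks
    return masks
```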

Download MSVD

I only got the .csv file of the MSVD dataset. How can I obtain the videos listed in it and use them with extract_RGB_feats.py?

test error

I have a big problem with test(). When I ran test(), no sentences were generated. I found that video_feat.shape[1] was always 0 rather than 80 (n_frame_step). How can I fix this?

optical flow

How can I use optical flow features? Please tell me.

error model_rgb.train()

ValueError Traceback (most recent call last)
in ()
----> 1 model_rgb.train()

/home/jyuan/software/S2VT-master/model_rgb.py in train()
288 with tf.variable_scope(tf.get_variable_scope(), reuse=False):
289 saver = tf.train.Saver(max_to_keep=100, write_version=1)
--> 290 train_op = tf.train.AdamOptimizer(learning_rate).minimize(tf_loss)
291 tf.global_variables_initializer().run()
292

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.pyc in minimize(self, loss, global_step, var_list, gate_gradients, aggregation_method, colocate_gradients_with_ops, name, grad_loss)
323
324 return self.apply_gradients(grads_and_vars, global_step=global_step,
--> 325 name=name)
326
327 def compute_gradients(self, loss, var_list=None,

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.pyc in apply_gradients(self, grads_and_vars, global_step, name)
444 ([str(v) for _, _, v in converted_grads_and_vars],))
445 with ops.control_dependencies(None):
--> 446 self._create_slots([_get_variable_for(v) for v in var_list])
447 update_ops = []
448 with ops.name_scope(name, self._name) as name:

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/adam.pyc in _create_slots(self, var_list)
126 # Create slots for the first and second moments.
127 for v in var_list:
--> 128 self._zeros_slot(v, "m", self._name)
129 self._zeros_slot(v, "v", self._name)
130

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.pyc in _zeros_slot(self, var, slot_name, op_name)
764 named_slots = self._slot_dict(slot_name)
765 if _var_key(var) not in named_slots:
--> 766 named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
767 return named_slots[_var_key(var)]

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.pyc in create_zeros_slot(primary, name, dtype, colocate_with_primary)
172 return create_slot_with_initializer(
173 primary, initializer, slot_shape, dtype, name,
--> 174 colocate_with_primary=colocate_with_primary)
175 else:
176 val = array_ops.zeros(slot_shape, dtype=dtype)

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.pyc in create_slot_with_initializer(primary, initializer, shape, dtype, name, colocate_with_primary)
144 with ops.colocate_with(primary):
145 return _create_slot_var(primary, initializer, "", validate_shape, shape,
--> 146 dtype)
147 else:
148 return _create_slot_var(primary, initializer, "", validate_shape, shape,

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.pyc in _create_slot_var(primary, val, scope, validate_shape, shape, dtype)
64 use_resource=_is_resource(primary),
65 shape=shape, dtype=dtype,
---> 66 validate_shape=validate_shape)
67 variable_scope.get_variable_scope().set_partitioner(current_partitioner)
68

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter)
1063 collections=collections, caching_device=caching_device,
1064 partitioner=partitioner, validate_shape=validate_shape,
-> 1065 use_resource=use_resource, custom_getter=custom_getter)
1066 get_variable_or_local_docstring = (
1067 """%s

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter)
960 collections=collections, caching_device=caching_device,
961 partitioner=partitioner, validate_shape=validate_shape,
--> 962 use_resource=use_resource, custom_getter=custom_getter)
963
964 def _get_partitioned_variable(self,

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter)
365 reuse=reuse, trainable=trainable, collections=collections,
366 caching_device=caching_device, partitioner=partitioner,
--> 367 validate_shape=validate_shape, use_resource=use_resource)
368
369 def _get_partitioned_variable(

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource)
350 trainable=trainable, collections=collections,
351 caching_device=caching_device, validate_shape=validate_shape,
--> 352 use_resource=use_resource)
353
354 if custom_getter is not None:

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.pyc in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource)
680 raise ValueError("Variable %s does not exist, or was not created with "
681 "tf.get_variable(). Did you mean to set reuse=None in "
--> 682 "VarScope?" % name)
683 if not shape.is_fully_defined() and not initializing_from_value:
684 raise ValueError("Shape of a new variable (%s) must be fully defined, "

ValueError: Variable Wemb/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

How about GPU?

I could run this code with the CPU build of TensorFlow, but training takes a very long time. So I installed the GPU build and tried to run model_RGB.py again, but there were many problems. The biggest one is ResourceExhaustedError: OOM when allocating tensor with shape[3000,4000].
Is this code meant for CPU only? Can it not simply be run in a GPU environment?
Thank you for your reply! I am new to video description.

KeyError: 'video_path' in Model_RGB

I guess this issue has happened to many people, but I was not able to fix it with the solutions provided. Please help.
I am currently using Google Colab to run the file Model_RGB.py.

KeyError: 'video_path'

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
5 frames
in ()
452
453 if __name__ == "__main__":
--> 454 main()

in main()
448
449 def main():
--> 450 train()
451
452

in train()
247 train_data = get_video_train_data(video_train_data_path, video_train_feat_path)
248 train_captions = train_data['Description'].values
--> 249 test_data = get_video_test_data(video_test_data_path, video_test_feat_path)
250 test_captions = test_data['Description'].values
251

in get_video_test_data(video_data_path, video_feat_path)
199 video_data = video_data[video_data['Description'].map(lambda x: isinstance(x, str))]
200
--> 201 unique_filenames = sorted(video_data['video_path'].unique())
202 test_data = video_data[video_data['video_path'].map(lambda x: x in unique_filenames)]
203 return test_data

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]

/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'video_path'

[Deprecation warning] in rnn_cell.BasicLSTMCell

When I run model_RGB.py I get this warning:

WARNING:tensorflow:<tensorflow.python.ops.rnn_cell.BasicLSTMCell object at 0x7f16020b08d0>: Using a concatenated state is slower and will soon be deprecated. Use state_is_tuple=True.
After that, the entire process is killed.

I am using TensorFlow 0.12.0 with Python 2.7.

Where do I download the MSVD CSV files?

error when running model_RGB.train()

In [2]: model_RGB.train()

TypeError Traceback (most recent call last)
in ()
----> 1 model_RGB.train()

/home2/xzhe3946/S2VT/model_RGB.py in train()
247
248 def train():
--> 249 train_data = get_video_train_data(video_train_data_path, video_train_feat_path)
250 train_captions = train_data['Description'].values
251 test_data = get_video_test_data(video_test_data_path, video_test_feat_path)

/home2/xzhe3946/S2VT/model_RGB.py in get_video_train_data(video_data_path, video_feat_path)
184 def get_video_train_data(video_data_path, video_feat_path):
185 video_data = pd.read_csv(video_data_path, sep=',')
--> 186 video_data = video_data[video_data['Language'] == 'English']
187 video_data['video_path'] = video_data.apply(lambda row: row['VideoID']+'_'+str(int(row['Start']))+'_'+str(int(row['End']))+'.avi.npy', axis=1)
188 video_data['video_path'] = video_data['video_path'].map(lambda x: os.path.join(video_feat_path, x))

/usr/lib/python2.7/dist-packages/pandas/core/ops.pyc in wrapper(self, other)
574 # mask out the invalids
575 if mask.any():
--> 576 res[mask] = masker
577
578 return res

/usr/lib/python2.7/dist-packages/pandas/core/series.pyc in __setitem__(self, key, value)
633 key = _check_bool_indexer(self.index, key)
634 try:
--> 635 self.where(~key, value, inplace=True)
636 return
637 except (InvalidIndexError):

/usr/lib/python2.7/dist-packages/pandas/core/generic.pyc in where(self, cond, other, inplace, axis, level, try_cast, raise_on_error)
3024
3025 if inplace:
-> 3026 cond = -(cond.fillna(True).astype(bool))
3027 else:
3028 cond = cond.fillna(False).astype(bool)

/usr/lib/python2.7/dist-packages/pandas/core/series.pyc in __neg__(self)
999 # inversion
1000 def __neg__(self):
-> 1001 arr = operator.neg(self.values)
1002 return self._constructor(arr, self.index).__finalize__(self)
1003

TypeError: The numpy boolean negative, the - operator, is not supported, use the ~ operator or the logical_not function instead.
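
The failing frame is inside pandas itself, which suggests (as an inference, not something the issue confirms) that the installed pandas is older than the NumPy it runs on. The underlying NumPy behaviour can be seen in isolation with a few lines, assuming a NumPy version that rejects unary minus on boolean arrays, as the error message indicates:

```python
import numpy as np

mask = np.array([True, False, True])

# Supported ways to invert a boolean array.
inverted = ~mask
same = np.logical_not(mask)

# Unary minus on a boolean array is what the traceback above trips over.
try:
    -mask
    unary_minus_ok = True
except TypeError:
    unary_minus_ok = False
```

If this reading is right, aligning the pandas and NumPy versions is more likely to help than editing model_RGB.py.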

Unable to reproduce the results mentioned

I am unable to reproduce the result. Can you tell me at which epoch you reached a METEOR of 28%? I have trained for about 1000 epochs and most of the captions are "a man is playing a guitar".

Error running model_RGB.train()

Thanks for your detailed work.
I got an error running the code. Does that mean that there is something wrong with my video dataset?

In [2]: model_RGB.train()

KeyError Traceback (most recent call last)
in ()
----> 1 model_RGB.train()

/home/binwang/Documents/S2VT/model_RGB.py in train()
247
248 def train():
--> 249 train_data = get_video_train_data(video_train_data_path, video_train_feat_path)
250 train_captions = train_data['Description'].values
251 test_data = get_video_test_data(video_test_data_path, video_test_feat_path)

/home/binwang/Documents/S2VT/model_RGB.py in get_video_train_data(video_data_path, video_feat_path)
190 video_data = video_data[video_data['Description'].map(lambda x: isinstance(x, str))]
191
--> 192 unique_filenames = sorted(video_data['video_path'].unique())
193 train_data = video_data[video_data['video_path'].map(lambda x: x in unique_filenames)]
194 return train_data

/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in __getitem__(self, key)
1967 return self._getitem_multilevel(key)
1968 else:
-> 1969 return self._getitem_column(key)
1970
1971 def _getitem_column(self, key):

/usr/lib/python2.7/dist-packages/pandas/core/frame.pyc in _getitem_column(self, key)
1974 # get column
1975 if self.columns.is_unique:
-> 1976 return self._get_item_cache(key)
1977
1978 # duplicate columns & possible reduce dimensionality

/usr/lib/python2.7/dist-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
1089 res = cache.get(item)
1090 if res is None:
-> 1091 values = self._data.get(item)
1092 res = self._box_item_values(item, values)
1093 cache[item] = res

/usr/lib/python2.7/dist-packages/pandas/core/internals.pyc in get(self, item, fastpath)
3209
3210 if not isnull(item):
-> 3211 loc = self.items.get_loc(item)
3212 else:
3213 indexer = np.arange(len(self.items))[isnull(self.items)]

/usr/lib/python2.7/dist-packages/pandas/core/index.pyc in get_loc(self, key, method, tolerance)
1757 'backfill or nearest lookups')
1758 key = _values_from_object(key)
-> 1759 return self._engine.get_loc(key)
1760
1761 indexer = self.get_indexer([key], method=method,

/usr/lib/python2.7/dist-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)()

/usr/lib/python2.7/dist-packages/pandas/index.so in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)()

/usr/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)()

/usr/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)()

KeyError: 'video_path'
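
The tracebacks do not show why the 'video_path' column is missing. One possibility, stated here as an assumption, is that the data frame is already empty when the column is assigned (for example because the CSV filter matched nothing or no feature files were found). The sketch below builds the column in a way that also works on an empty frame and fails with a clearer message. It assumes Python 3, the column names VideoID, Start and End seen in the traceback, and underscore-joined file names; the function name `add_feature_paths` is hypothetical.

```python
import os

import pandas as pd


def add_feature_paths(video_data, video_feat_path):
    """Attach a 'video_path' column and keep only rows whose feature file exists."""
    video_data = video_data.copy()

    # Assumed naming scheme: <VideoID>_<Start>_<End>.avi.npy
    names = (video_data["VideoID"].astype(str) + "_"
             + video_data["Start"].astype(int).astype(str) + "_"
             + video_data["End"].astype(int).astype(str) + ".avi.npy")
    video_data["video_path"] = [os.path.join(video_feat_path, n) for n in names]

    exists = video_data["video_path"].map(os.path.exists)
    if not exists.any():
        # Fail early with a readable message instead of a later KeyError.
        raise FileNotFoundError("no feature files found under %r" % video_feat_path)
    return video_data[exists]
```

If this raises, the feature directory path or the file naming is the first thing to check.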

What version of Caffe should I install?

It's not listed under requirements, but it's needed.

TypeError: 'map' object is not subscriptable

File "model_RGB.py", line 327, in train
current_feats[ind][:len(current_feats_vals[ind])] = feat
TypeError: 'map' object is not subscriptable

I can't fix it. Can anyone help?
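
This error is typical of running Python 2 code under Python 3, where `map(...)` returns an iterator that cannot be indexed. Assuming `current_feats_vals` is produced by a `map` call (the issue does not show that line), wrapping it in `list(...)` is the usual remedy. A self-contained sketch of the padding step, with the hypothetical name `pad_features`:

```python
import numpy as np


def pad_features(feature_arrays, n_steps, dim):
    """Stack variable-length (n_frames, dim) arrays into (batch, n_steps, dim).

    Shorter clips are zero-padded; longer clips are truncated to n_steps.
    """
    feats_vals = list(feature_arrays)  # materialize a map/generator so it can be indexed
    feats = np.zeros((len(feats_vals), n_steps, dim), dtype=np.float32)
    for ind, feat in enumerate(feats_vals):
        feat = feat[:n_steps]
        feats[ind, :len(feat)] = feat
    return feats
```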
