karpathy / neuraltalk2

Efficient Image Captioning code in Torch, runs on GPU

Lua 30.47% Shell 0.20% Python 4.26% HTML 0.70% Jupyter Notebook 64.37%

neuraltalk2's Introduction

NeuralTalk2

Update (September 22, 2016): The Google Brain team has released the image captioning model of Vinyals et al. (2015). The core model is very similar to NeuralTalk2 (a CNN followed by RNN), but the Google release should work significantly better as a result of better CNN, some tricks, and more careful engineering. Find it under im2txt repo in tensorflow. I'll leave this code base up for educational purposes and as a Torch implementation.

Recurrent Neural Network captions your images. Now much faster and better than the original NeuralTalk. Compared to the original NeuralTalk this implementation is batched, uses Torch, runs on a GPU, and supports CNN finetuning. All of these together result in quite a large increase in training speed for the Language Model (~100x), but overall not as much because we also have to forward a VGGNet. However, overall very good models can be trained in 2-3 days, and they show a much better performance.

This is an early code release that works great but is slightly hastily released and probably requires some code reading of inline comments (which I tried to be quite good with in general). I will be improving it over time but wanted to push the code out there because I promised it to too many people.

This current code (and the pretrained model) gets ~0.9 CIDEr, which would place it around spot #8 on the codalab leaderboard. I will submit the actual result soon.

You can find a few more example results on the demo page. These results will improve a bit more once the last few bells and whistles are in place (e.g. beam search, ensembling, reranking).

There's also a fun video by @kcimc, where he runs a neuraltalk2 pretrained model in real time on his laptop during a walk in Amsterdam.

Requirements

For evaluation only

This code is written in Lua and requires Torch. If you're on Ubuntu, installing Torch in your home directory may look something like:

$ curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd ~/torch; 
$ ./install.sh      # and enter "yes" at the end to modify your bashrc
$ source ~/.bashrc

See the Torch installation documentation for more details. After Torch is installed we need to get a few more packages using LuaRocks (which already came with the Torch install). In particular:

$ luarocks install nn
$ luarocks install nngraph 
$ luarocks install image

We're also going to need the cjson library so that we can load/save json files. Follow their download link and then look under their section 2.4 for easy luarocks install.

If you'd like to run on an NVIDIA GPU using CUDA (which you really, really want to if you're training a model, since we're using a VGGNet), you'll of course need a GPU, and you will have to install the CUDA Toolkit. Then get the cutorch and cunn packages:

$ luarocks install cutorch
$ luarocks install cunn

If you'd like to use the cudnn backend (the pretrained checkpoint does), you also have to install cudnn. First follow the link to NVIDIA website, register with them and download the cudnn library. Then make sure you adjust your LD_LIBRARY_PATH to point to the lib64 folder that contains the library (e.g. libcudnn.so.7.0.64). Then git clone the cudnn.torch repo, cd inside and do luarocks make cudnn-scm-1.rockspec to build the Torch bindings.

For training

If you'd like to train your models you will need loadcaffe, since we are using the VGGNet. First, make sure you follow their instructions to install protobuf and everything else (e.g. sudo apt-get install libprotobuf-dev protobuf-compiler), and then install via luarocks:

luarocks install loadcaffe

Finally, you will also need to install torch-hdf5, and h5py, since we will be using hdf5 files to store the preprocessed data.

Phew! Quite a few dependencies, sorry no easy way around it :\

I just want to caption images

In this case you want to run the evaluation script on a pretrained model checkpoint. I trained a decent one on the MS COCO dataset that you can run on your images. The pretrained checkpoint can be downloaded here: pretrained checkpoint link (600MB). It's large because it contains the weights of a finetuned VGGNet. Now place all your images of interest into a folder, e.g. blah, and run the eval script:

$ th eval.lua -model /path/to/model -image_folder /path/to/image/directory -num_images 10

This tells the eval script to run up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing -batch_size (default = 1). Use -num_images -1 to process all images. The eval script will create a vis.json file inside the vis folder, which can then be visualized with the provided HTML interface:

$ cd vis
$ python -m SimpleHTTPServer    # Python 2; on Python 3 use: python3 -m http.server

Now visit localhost:8000 in your browser and you should see your predicted captions.

You can see an example visualization demo page here.

Running in Docker. If you'd like to avoid dependency nightmares, running the codebase from Docker might be a good option. There is one (third-party) docker repo here.

"I only have CPU". Okay, in that case download the cpu model checkpoint. Make sure you run the eval script with -gpuid -1 to tell the script to run on CPU. On my machine it takes a bit less than 1 second per image to caption in CPU mode.

Beam Search. Beam search is enabled by default because it increases the performance of the search for argmax decoding sequence. However, this is a little more expensive, so if you'd like to evaluate images faster, but at a cost of performance, use -beam_size 1. For example, in one of my experiments beam size 2 gives CIDEr 0.922, and beam size 1 gives CIDEr 0.886.
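To make the trade-off concrete, here is a minimal sketch of beam search in Python over a toy next-word table. It is not the implementation used in this repo: the names beam_search and toy_step, the tokens, and the probabilities are all made up for illustration. With beam size 1 (greedy decoding) the search commits to the locally best first word and ends up with a less probable sentence than beam size 2 finds.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_size, max_len):
    """Return (log_prob, tokens) of the best sequence found by beam search.

    step_fn(tokens) must return a dict {next_token: probability}.
    """
    beams = [(0.0, [start_token])]  # (cumulative log-prob, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            for tok, p in step_fn(seq).items():
                if p > 0:
                    candidates.append((logp + math.log(p), seq + [tok]))
        # Keep only the beam_size most probable extensions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for logp, seq in candidates[:beam_size]:
            if seq[-1] == end_token:
                finished.append((logp, seq))
            else:
                beams.append((logp, seq))
        if not beams:
            break
    finished.extend(beams)  # sequences cut off at max_len still compete
    return max(finished, key=lambda c: c[0])

# Hypothetical toy "language model": the next word depends only on the last word.
TOY_TABLE = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "the": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def toy_step(tokens):
    return TOY_TABLE[tokens[-1]]

greedy = beam_search(toy_step, "<s>", "</s>", beam_size=1, max_len=5)
wider = beam_search(toy_step, "<s>", "</s>", beam_size=2, max_len=5)
print(greedy[1])  # ['<s>', 'a', 'cat', '</s>'], probability 0.30
print(wider[1])   # ['<s>', 'the', 'cat', '</s>'], probability 0.36
```

The cost grows roughly linearly with the beam size, since every kept prefix needs its own forward step through the language model.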

Running on MSCOCO images. If you train on MSCOCO (see how below), you will have generated preprocessed MSCOCO images, which you can use directly in the eval script. In this case simply leave out the -image_folder option and instead pass -input_h5 and -input_json pointing to your preprocessed files. This will make more sense once you read the section below :)

Running a live demo. With OpenCV 3 installed you can caption a video stream from a camera in real time. Follow the instructions in torch-opencv to install it and run videocaptioning.lua similarly to eval.lua. Note that only the central crop will be captioned.

I'd like to train my own network on MS COCO

Great, first we need to do some preprocessing. Head over to the coco/ folder and run the IPython notebook to download the dataset and do some very simple preprocessing. The notebook will combine the train/val data together and create a very simple and small json file that contains a large list of image paths, and raw captions for each image, of the form:

[{ "file_path": "path/img.jpg", "captions": ["a caption", "a second caption", ...] }, ...]

Once we have this, we're ready to invoke the prepro.py script, which will read all of this in and create a dataset (an hdf5 file and a json file) ready for consumption in the Lua code. For example, for MS COCO we can run the prepro file as follows:

$ python prepro.py --input_json coco/coco_raw.json --num_val 5000 --num_test 5000 --images_root coco/images --word_count_threshold 5 --output_json coco/cocotalk.json --output_h5 coco/cocotalk.h5

This tells the script to read in all the data (the images and the captions), allocate 5000 images each to the val and test splits, and map all words that occur <= 5 times to a special UNK token. The resulting json and h5 files are about 30GB and contain everything we want to know about the dataset.
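The word-count thresholding described above can be sketched as follows. This is a simplified illustration, not the code in prepro.py; build_vocab and replace_rare are hypothetical helper names, and the tiny caption list is made up.

```python
from collections import Counter

def build_vocab(captions, count_threshold):
    """Keep words seen more than count_threshold times; add UNK if any word is dropped."""
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > count_threshold)
    if any(n <= count_threshold for n in counts.values()):
        vocab.append("UNK")
    return vocab

def replace_rare(captions, vocab):
    """Map every out-of-vocabulary word to the UNK token."""
    known = set(vocab)
    return [[w if w in known else "UNK" for w in caption] for caption in captions]

captions = [["a", "cat"], ["a", "dog"], ["a", "cat", "sits"]]
vocab = build_vocab(captions, count_threshold=1)
print(vocab)                          # ['a', 'cat', 'UNK']
print(replace_rare(captions, vocab))  # [['a', 'cat'], ['a', 'UNK'], ['a', 'cat', 'UNK']]
```

A higher threshold shrinks the vocabulary (and the output layer of the language model) at the cost of more UNK tokens in the training captions.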

Warning: the prepro script will fail with the default MSCOCO data because one of their images is corrupted. See this issue for the fix; it involves manually replacing one image in the dataset.

The last thing we need is the VGG-16 Caffe checkpoint, (under Models section, "16-layer model" bullet point). Put the two files (the prototxt configuration file and the proto binary of weights) somewhere (e.g. a model directory), and we're ready to train!

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json

The train script will take over, and start dumping checkpoints into the folder specified by checkpoint_path (default = current folder). You also have to point the train script to the VGGNet protos (see the options inside train.lua).

If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to validation cross entropy loss, use the -language_eval 1 option, but don't forget to download the coco-caption code into the coco-caption directory.

A few notes on training. To give you an idea, with the default settings one epoch of MS COCO images is about 7500 iterations. 1 epoch of training (with no finetuning - notice this is the default) takes about 1 hour and results in validation loss ~2.7 and CIDEr score of ~0.4. By iteration 70,000 CIDEr climbs up to about 0.6 (validation loss at about 2.5) and then will top out at a bit below 0.7 CIDEr. After that additional improvements are only possible by turning on CNN finetuning. I like to do the training in stages, where I first train with no finetuning, and then restart the train script with -finetune_cnn_after 0 to start finetuning right away, and using -start_from flag to continue from the previous model checkpoint. You'll see your score rise up to about 0.9 CIDEr over ~2 days or so (on MS COCO).

I'd like to train on my own data

No problem, create a json file in the exact same form as before, describing your JPG files:

[{ "file_path": "path/img.jpg", "captions": ["a caption", "a similar caption" ...] }, ...]

and invoke the prepro.py script to preprocess all the images and data into an hdf5 file and a json file. Then invoke train.lua (see detailed options inside code).
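As a rough illustration, the json file in the form shown above could be written with a few lines of Python. The helper name make_raw_json, the output filename, and the example paths and captions are all placeholders, not part of this repo.

```python
import json

def make_raw_json(entries, out_path):
    """Write (file_path, captions) pairs in the list-of-dicts form shown above."""
    records = [{"file_path": path, "captions": list(caps)} for path, caps in entries]
    with open(out_path, "w") as f:
        json.dump(records, f)
    return records

if __name__ == "__main__":
    # Placeholder data: replace with your own image paths and captions.
    my_data = [
        ("path/img1.jpg", ["a caption", "a similar caption"]),
        ("path/img2.jpg", ["another caption"]),
    ]
    make_raw_json(my_data, "my_raw.json")
```

The resulting file can then be passed to prepro.py via --input_json, with --images_root pointing at the folder the file_path entries are relative to.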

I'd like to distribute my GPU trained checkpoints for CPU

Use the script convert_checkpoint_gpu_to_cpu.lua to convert your GPU checkpoints to be usable on CPU. See inline documentation for why this separate script is needed. For example:

th convert_checkpoint_gpu_to_cpu.lua gpu_checkpoint.t7

This writes the file gpu_checkpoint.t7_cpu.t7, which you can now run with -gpuid -1 in the eval script.

License

BSD License.

Acknowledgements

Parts of this code were written in collaboration with my labmate Justin Johnson.

I'm very grateful for NVIDIA's support in providing GPUs that made this work possible.

I'm also very grateful to the maintainers of Torch for maintaining a wonderful deep learning library.


neuraltalk2's Issues

prepro.py crashes on Unicode captions

Expanding my tag/image dataset further from Danbooru, my preprocessing step began to crash with the error:

Traceback (most recent call last):
  File "prepro.py", line 241, in <module>
    main(params)
  File "prepro.py", line 162, in main
    prepro_captions(imgs)
  File "prepro.py", line 43, in prepro_captions
    txt = str(s).lower().translate(None, string.punctuation).strip().split()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd7' in position 21: ordinal not in range(128)

While no useful information is printed about which tag/JSON entry caused the problem, my guess is that one of the tags has some Unicode in it (probably a Japanese word or emoji) and neuraltalk2/prepro.py, like char-rnn, makes an ASCII-only assumption.

Using the first suggestion I found on StackOverflow, I tried tossing in some sort of iconv-like conversion step which renders Unicode in a longer ASCII form (I think that's what it does, anyway):

@@ -34,13 +34,13 @@ import numpy as np
 from scipy.misc import imread, imresize

 def prepro_captions(imgs):
-  
+
   # preprocess all the captions
   print 'example processed tokens:'
   for i,img in enumerate(imgs):
     img['processed_tokens'] = []
     for j,s in enumerate(img['captions']):
-      txt = str(s).lower().translate(None, string.punctuation).strip().split()
+      txt = s.encode('ascii', errors='backslashreplace').lower().translate(None, string.punctuation).strip().split()
       img['processed_tokens'].append(txt)
       if i < 10 and j == 0: print txt

Seems to work. Maybe some version of that could be added?
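For reference, a rough Python 3 counterpart of the patched line might look like the sketch below (the patch above is Python 2, where translate(None, ...) is valid). tokenize_caption is a hypothetical helper name, not a function in prepro.py.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def tokenize_caption(s):
    """Lowercase, escape non-ASCII characters, strip punctuation, split on whitespace."""
    # Non-ASCII characters become backslash escapes, e.g. U+00D7 -> \xd7.
    ascii_text = s.encode("ascii", errors="backslashreplace").decode("ascii")
    return ascii_text.lower().translate(_PUNCT_TABLE).strip().split()

print(tokenize_caption("A Caption, here!"))   # ['a', 'caption', 'here']
print(tokenize_caption("2girls \u00d7 1boy"))  # ['2girls', 'xd7', '1boy']
```

Note that the backslash of each escape is itself punctuation and gets stripped, so a character such as U+00D7 ends up as the token "xd7" rather than disappearing.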

System took control over my computer, and wants world domination

Should I reboot it?

size mismatch, m1: [1 x 512], m2: [25088 x 4096]

When running code on image:

http://cs.stanford.edu/people/karpathy/neuraltalk2/imgs/img1.jpg

(from demo website)
with a CPU-pretrained model from

http://cs.stanford.edu/people/karpathy/neuraltalk2/checkpoint_v1_cpu.zip

the following error shows up.

th eval.lua -gpuid -1 -model model_id1-501-1448236541.t7_cpu.t7 -image_folder img/ -num_images -1
DataLoaderRaw loading images from folder:   img/    
listing all images in directory img/    
Image added: img/img1.jpg   
DataLoaderRaw found 1 images    
constructing clones inside the LanguageModel    
/Users/tt/torch/install/bin/luajit: /Users/tt/torch/install/share/lua/5.1/nn/Linear.lua:53: size mismatch, m1: [1 x 512], m2: [25088 x 4096] at /tmp/luarocks_torch-scm-1-8011/torch7/lib/TH/generic/THTensorMath.c:706
stack traceback:
    [C]: in function 'addmm'
    /Users/tt/torch/install/share/lua/5.1/nn/Linear.lua:53: in function 'updateOutput'
    /Users/tt/torch/install/share/lua/5.1/nn/Sequential.lua:29: in function 'forward'
    eval.lua:121: in function 'eval_split'
    eval.lua:173: in main chunk
    [C]: in function 'dofile'
    ...s/tt/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x0102587340

I'm on Mac OS X 10.10.5.

It seems that there is some mismatch between the training weights and output propagated through the net.
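As a sanity check on the numbers in the error, the 25088 in m2 matches the flattened output of the last VGG-16 pooling layer for a 224x224 input (512 channels, five 2x2 poolings). The sketch below only does that arithmetic; vgg16_fc6_input_size is a hypothetical helper, and it does not diagnose why the tensor reaching the linear layer has 512 values instead.

```python
def vgg16_fc6_input_size(image_side=224, channels=512, num_pools=5):
    """Length of the flattened feature map that the first fully connected layer expects."""
    side = image_side
    for _ in range(num_pools):
        side //= 2  # each 2x2 max-pool with stride 2 halves the spatial size
    return channels * side * side

print(vgg16_fc6_input_size())  # 25088 = 512 * 7 * 7
```

So the weights look like an ordinary VGG-16 first fully connected layer, while the input of size [1 x 512] is 49 times smaller than expected, which would be consistent with the feature map not being flattened as a whole.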

Out of memory error when finetuning is enabled

Hello,

I am experiencing an error when I try to train a model with finetuning either from a previously saved checkpoint or from scratch.

What I did:

Stage 1 (success)

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json -checkpoint_path chkp -language_eval 1

This trained a model on COCO with no finetuning. I got cider ~0.6 after 150k+ iters so I saved the checkpoint elsewhere (e.g. saved_checkpoints/coco_initial.t7) to continue with finetuning.

Stage 2 (failure)

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json -checkpoint_path chkp -language_eval 1 -finetune_cnn_after 0 -start_from saved_checkpoints/coco_initial.t7

Tried to enable finetuning and load the previously saved checkpoint, but what I got was:

wrote json checkpoint to chkp/model_id.json
/home/cybernaut/torch/install/bin/luajit: ./misc/optim_updates.lua:65: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-6112/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
[C]: in function 'new'
./misc/optim_updates.lua:65: in function 'adam'
train.lua:387: in main chunk
[C]: in function 'dofile'
...naut/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk

[C]: at 0x00405e40

The GPU I used: GeForce GTX 860M with 4Gb of memory
Installed Cuda7 and cudnn R3
Installed all prementioned torch packages in readme of neuraltalk2

Any hints or ideas, would be much appreciated!
Thank you!
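A back-of-the-envelope estimate suggests why a 4 GB card may be tight here. The traceback fails inside adam while allocating a new tensor, and once finetuning is on the optimizer also has to hold state for the CNN weights. The sketch below counts the parameters of a standard VGG-16 up to fc7 and multiplies by four float32 copies (weights, gradients, and two Adam moment buffers). The assumptions are labeled in the comments: that the network is cut after fc7, that everything is float32, and that no buffers are shared. Activations and the language model are not counted.

```python
# (in_channels, out_channels) of the 13 3x3 convolution layers in VGG-16.
CONV_LAYERS = [(3, 64), (64, 64), (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]
# fc6 and fc7; assumption: the 1000-way classifier layer (fc8) is not used.
FC_LAYERS = [(25088, 4096), (4096, 4096)]

def count_params():
    """Weights plus biases of all listed layers."""
    conv = sum(cin * cout * 3 * 3 + cout for cin, cout in CONV_LAYERS)
    fc = sum(nin * nout + nout for nin, nout in FC_LAYERS)
    return conv + fc

def training_state_bytes(num_params, bytes_per_value=4, copies=4):
    """copies = weights + gradients + two Adam moment buffers (assumed float32)."""
    return num_params * bytes_per_value * copies

n = count_params()
print(n)                                   # 134260544
print(training_state_bytes(n) / 2.0**30)   # about 2.0 GiB
```

Under these assumptions roughly half of the card is taken before any activations are stored, so reducing the batch size is one thing that might help.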

Live camera feed from lua-camera not working

Hi, I'm using the CPU model from your README and lua-camera to get the image from the camera instead of reading an image from the folder. This method always gives me
"a view of a UNK UNK in a cloudy sky" as the caption.
I did this by:

1. Initializing at the start of the program

   require 'camera'
   cam = image.Camera(0)

2. And making the following change to the eval_split function

 -- local data = loader:getBatch{batch_size = opt.batch_size, split = split, seq_per_img = opt. seq_per_img} #commented to prevent file load from folder
 img = cam:forward()
 data={images=nil,infos=nil}
 data.images=torch.ByteTensor(1, 3, img:size(2), img:size(3))
 data.images[1]=img
 ... --some lines in between
 for k=1,#sents do
   print(sents[k]) --removed everything else
 end
 break --since im using only 1 frame
 end --end loop

3. Loop the function

while true do
    loss, split_predictions, lang_stats = eval_split(opt.split, {num_images = opt.num_images})
end

However, the following method works (it gives me slightly relevant captions):

1. Loop the function by adding the following line

while true do
  local folder="/home/mithul/torch/projects/neuraltalk2/cam"
  local imfile=folder.."/image.jpeg"
  os.execute("streamer -f jpeg -s 1280x720 -o "..imfile)
    load_data()
    loss, split_predictions, lang_stats = eval_split(opt.split, {num_images = opt.num_images})
    print('loss: ', loss)
end

2. Run eval.lua with "/home/mithul/torch/projects/neuraltalk2/cam" as the image folder

However, this method causes the camera to be switched on and off by the OS for each image, causing a delay, and it also re-initializes the loader for every frame.

I would like to know why the first method does not work while the second method works.

Error when processing training images

Edit: See @susemeee's comment below (the image COCO_train2014_000000167126.jpg is corrupted, and you can download a replacement at https://msvocds.blob.core.windows.net/images/262993_z.jpg)

I was trying to run prepro.py but eventually ran into an issue in scipy's pilutil package (see below).

I've installed all dependencies, run the coco_preprocess.ipynb, and downloaded train2014.zip + val2014.zip and extracted them into coco/images.

Am I missing something?

$ python prepro.py --input_json coco/coco_raw.json --num_val 5000 --num_test 5000 --images_root coco/images --word_count_threshold 5 --output_json coco/cocotalk.json --output_h5 coco/cocotalk.h5
parsed input parameters:
{
  "output_json": "coco/cocotalk.json",
  "images_root": "coco/images",
  "input_json": "coco/coco_raw.json",
  "word_count_threshold": 5,
  "max_length": 16,
  "output_h5": "coco/cocotalk.h5",
  "num_test": 5000,
  "num_val": 5000
}
example processed tokens:
['a', 'woman', 'riding', 'a', 'bike', 'down', 'a', 'bike', 'trail']
... lots of info deleted for brevity ...
inserting the special UNK token
assigned 5000 to val, 5000 to test.
encoded captions to array of size  (616767, 16)
processing 0/123287 (0.00% done)
... lots of percentages deleted for brevity ...
processing 60000/123287 (48.67% done)
Traceback (most recent call last):
  File "prepro.py", line 236, in <module>
    main(params)
  File "prepro.py", line 186, in main
    Ir = imresize(I, (256,256))
  File "/usr/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 424, in imresize
    im = toimage(arr, mode=mode)
  File "/usr/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 234, in toimage
    raise ValueError("'arr' does not have a suitable array shape for "
ValueError: 'arr' does not have a suitable array shape for any mode.

Getting error CUDNN_STATUS_NOT_INITIALIZED

Hi,

Followed the guide to the T, but when trying to launch the eval with GPU checkpoint, I'm getting the following error:

ubuntu@ip-172-31-12-54:~/neuraltalk2$ th eval.lua -model ../models/model_id1-501-1448236541.t7 -image_folder ../samples/
DataLoaderRaw loading images from folder:       ../samples/
listing all images in directory ../samples/
DataLoaderRaw found 4 images
constructing clones inside the LanguageModel
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/cudnn/init.lua:45: Error in CuDNN: CUDNN_STATUS_NOT_INITIALIZED
stack traceback:
        [C]: in function 'error'
        /home/ubuntu/torch/install/share/lua/5.1/cudnn/init.lua:45: in function 'getHandle'
        /home/ubuntu/torch/install/share/lua/5.1/cudnn/init.lua:53: in function 'errcheck'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:39: in function 'resetWeightDescriptors'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:338: in function 'updateOutput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        eval.lua:121: in function 'eval_split'
        eval.lua:173: in main chunk
        [C]: in function 'dofile'
        ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670

Any idea?

libcudnn.dylib not found

Hello,

I've tried to make the image description work on Mac OSX ElCapitan, but libcudnn.dylib is still not found, whereas LD_LIBRARY_PATH is defined :

➜  neuraltalk2 git:(master) ✗ th eval.lua -model ./model_id1-501-1448236541.t7 -image_folder ./pic -num_images 17
/Users/olivier/torch/install/share/lua/5.1/cudnn/ffi.lua:574: dlopen(libcudnn.dylib, 5): image not found    
/Users/olivier/torch/install/bin/luajit: /Users/olivier/torch/install/share/lua/5.1/trepl/init.lua:383: /Users/olivier/torch/install/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
    [C]: in function 'error'
    /Users/olivier/torch/install/share/lua/5.1/trepl/init.lua:383: in function 'require'
    eval.lua:60: in main chunk
    [C]: in function 'dofile'
    ...vier/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x01079cfbd0
➜  neuraltalk2 git:(master) ✗ echo $LD_LIBRARY_PATH                                                                 
/Developpements/torch/cuda/lib:/Users/olivier/torch/install/lib
➜  neuraltalk2 git:(master) ✗ ls -l /Developpements/torch/cuda/lib 
total 233008
-rwxr-xr-x@ 1 olivier  staff  60047144 20 nov 12:04 libcudnn.4.dylib
lrwxr-xr-x@ 1 olivier  staff        16 23 nov 03:30 libcudnn.dylib -> libcudnn.4.dylib
-rw-r--r--@ 1 olivier  staff  59245464 20 nov 12:04 libcudnn_static.a
➜  neuraltalk2 git:(master) ✗ ls -l /Users/olivier/torch/install/lib
total 56720
-rwxr-xr-x  1 olivier  staff   1898340 14 déc 11:58 libTH.dylib
-rwxr-xr-x  1 olivier  staff  26156832 14 déc 15:40 libTHC.dylib
-rwxr-xr-x  1 olivier  staff     35692 14 déc 11:58 libluaT.dylib
-rwxr-xr-x  1 olivier  staff    651772 14 déc 11:58 libluajit.dylib
-rwxr-xr-x  1 olivier  staff    118084 14 déc 12:00 libqlua.dylib
-rwxr-xr-x  1 olivier  staff    170428 14 déc 12:00 libqtlua.dylib
drwxr-xr-x  3 olivier  staff       102 14 déc 11:58 lua
drwxr-xr-x  3 olivier  staff       102 14 déc 11:58 luarocks

Is there any more setup to do ? Or any missing file ?

Thanks.

Training freezes and high use of virtual memory (ubuntu)

Hi!

I installed all the dependencies and I'm able to use the pretrained MS COCO network. As a first try to train my own network I created a dataset with only one image. I'm running on a g2.2xlarge instance on AWS (it has an NVIDIA GPU) and installed CUDA, cuDNN and everything else that was in the readme file. What happens when I run the "th train.lua .." command is that a luajit process starts to use 100% of one CPU core and 36.7G of virtual memory. This process also can't be killed. I just have to reboot the machine from the AWS console. Is this normal? I expected the training process to be fast given the small number of images in my dataset.

Thanks for the help and congrats! The results I've seen until here are very impressive!

Readme claims the ipython notebook downloads the ms coco dataset, but it does not.

Possibly add instructions about how to put the dataset in the appropriate folder?

Bug on ARM hardware

Hi,

I am trying to run this code on a Raspberry Pi 2, as an evaluation of building smart cameras on limited hardware. Everything goes well at installation, though some of the python dependencies have to be apt-get installed instead of pip installed, but it "seems" to be OK.
Note the installation takes ~10hrs, so I was really eager to see the processing. Unfortunately, I run into:

scozannet@ubuntu:~/neural-networks/neuraltalk2$ th eval.lua -model ../model_id1-501-1448236541.t7_cpu.t7 -image_folder ../images -num_images 10 -batch_size 1 -gpuid -1
/home/scozannet/neural-networks/torch/install/bin/luajit: ...ural-networks/torch/install/share/lua/5.1/torch/File.lua:289: table index is nil
stack traceback:
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:289: in function 'readObject'
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:272: in function 'readObject'
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:319: in function 'load'
eval.lua:69: in main chunk
[C]: in function 'dofile'
...orks/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000cff9

Any idea of what I am doing wrong? I have tried postfixing the image folder with /, changing number of images, batch size...
My image set is a copy of your demo images, stored as .jpg.

Let me know if you need anything to help find out the problem.
Many thanks in advance for your help,

Getting identical sequence all the time...

Hi Andrej,
I am training this model using an arbitrary dataset and arbitrary features. What I did is remove the CNN layer and use a [Linear(4096, D); ReLU()] instead.

But for the language model (lm.sample()) on different features, I always end up with exactly the same sequences. So my question is: is this normal for the first several thousand iterations, or did I overlook something?

Thanks so much!

"attempt to concatenate local 'ext' (a nil value)"

After I finally got it to work this morning, now there is an error again and it refuses to work. Here's what I get:

$ th eval.lua -model /Users/username/Documents/NeuralTalk2/checkpoint_v1.t7_cpu.t7 -image_folder /Users/username/Documents/NeuralTalk2/images -num_images -1 -gpuid -1
DataLoaderRaw loading images from folder:   /Users/username/Documents/NeuralTalk2/images  
listing all images in directory /Users/username/Documents/NeuralTalk2/images  
DataLoaderRaw found 21 images 
constructing clones inside the LanguageModel  
/Users/username/torch/install/bin/luajit: /Users/username/torch/install/share/lua/5.1/image/init.lua:337: attempt to concatenate local 'ext' (a nil value)
stack traceback:
  /Users/username/torch/install/share/lua/5.1/image/init.lua:337: in function 'load'
  ./misc/DataLoaderRaw.lua:74: in function 'getBatch'
  eval.lua:115: in function 'eval_split'
  eval.lua:169: in main chunk
  [C]: in function 'dofile'
  ...username/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
  [C]: at 0x010bdd1f10

The strange thing (at least to me) is that it claims it finds 21 images, when there are actually only 20 in the images folder. Does anybody know what's wrong here?
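One guess that would fit both symptoms (one extra "image" and a nil extension) is a stray entry in the folder without an image extension, for example a hidden file such as .DS_Store on macOS. This is not confirmed anywhere in this report. The sketch below lists such entries; non_image_entries is a hypothetical helper and IMAGE_EXTS is an assumed extension list, not the one used by the loader.

```python
import os

# Assumed set of image extensions; adjust to whatever your loader accepts.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".ppm"}

def non_image_entries(folder):
    """Names in folder (hidden files included) that lack a known image extension."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.splitext(name)[1].lower() not in IMAGE_EXTS
    )

if __name__ == "__main__":
    import sys
    for name in non_image_entries(sys.argv[1]):
        print(name)
```

os.listdir includes dot-files that a graphical file browser usually hides, which is why the count seen by a script can differ from the count seen in the folder window.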

Dependency issues. loadcaffe, protobuf, gcc etc.

Hi there.
Now, I am very inexperienced in all this shell/bash business, but I tried to get neuraltalk to work.
After I thought I had installed all the dependencies that I needed, I tried to run the provided command $ th eval.lua -model /path/to/model -image_folder /path/to/image/directory -num_images 10 , but before anything happens, I immediately get the following error in my Terminal.

/Users/username/torch/install/bin/luajit: cannot open eval.lua: No such file or directory
stack traceback:
    [C]: in function 'dofile'
    ...username/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010a015f10

I was able to activate torch in the terminal and also to use luarocks to install the dependencies, so I thought I was doing something right, but now I have no idea at all what I did wrong here. Is there a way for you to tell where I went wrong in the installation process (and explain it in idiot-proof terminology, if possible ;) )?

Thanks!

OS X 10.8.5

Error when processing training images - No such file or directory IOError

Hi.

I have installed all the dependencies. I was trying to run prepro.py as mentioned in the documentation. I ran into an issue which I believe is different from issue #4 mentioned in the documentation.

Here are the contents of my coco folder after running the ipython tutorial:

>>> ls coco/*
coco/captions_train-val2014.zip  coco/coco_preprocess.ipynb  coco/coco_raw.json  coco/cocotalk.h5

coco/annotations:
captions_train2014.json  captions_val2014.json

coco/images:
captions_train2014.json  captions_val2014.json

When I run prepro.py I get the following error:

parsed input parameters:
{
  "output_json": "coco/cocotalk.json", 
  "images_root": "coco/images", 
  "input_json": "coco/coco_raw.json", 
  "word_count_threshold": 5, 
  "max_length": 16, 
  "output_h5": "coco/cocotalk.h5", 
  "num_test": 5000, 
  "num_val": 5000
}
example processed tokens:
['a', 'woman', 'riding', 'a', 'bike', 'down', 'a', 'bike', 'trail']
...
top words and their counts:
(1019751, 'a')
(224731, 'on')
...
(35371, 'woman')
total words: 6447836
number of bad words: 20059/29625 = 67.71%
number of words in vocab would be 9566
number of UNKs: 34543/6447836 = 0.54%
max length sentence in raw data:  49
sentence length distribution (count, number of words):
 0:          0   0.000000%
 1:          0   0.000000%
 ... 
 49:          4   0.000649%
inserting the special UNK token
assigned 5000 to val, 5000 to test.
encoded captions to array of size  (616767, 16)
Traceback (most recent call last):
  File "prepro.py", line 240, in <module>
    main(params)
  File "prepro.py", line 185, in main
    I = imread(os.path.join(params['images_root'], img['file_path']))
  File "/usr/local/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 154, in imread
    im = Image.open(name)
  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1955, in open
    fp = __builtin__.open(fp, "rb")
IOError: [Errno 2] No such file or directory: u'coco/images/train2014/COCO_train2014_000000152328.jpg'

Could someone please help me out? Am I missing something here?

Thanks
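The directory listing above shows only caption json files under coco/images and no train2014/ folder, which matches the IOError. A quick way to check this before starting a long preprocessing run is sketched below; missing_images is a hypothetical helper, and it only assumes the raw json has the file_path field described in the README.

```python
import json
import os

def missing_images(raw_json_path, images_root):
    """Return every file_path from the raw json that does not exist under images_root."""
    with open(raw_json_path) as f:
        records = json.load(f)
    return [
        r["file_path"] for r in records
        if not os.path.isfile(os.path.join(images_root, r["file_path"]))
    ]

if __name__ == "__main__":
    missing = missing_images("coco/coco_raw.json", "coco/images")
    print("%d image files are missing" % len(missing))
    for path in missing[:10]:
        print(path)
```

If every entry is reported missing, the image archives were most likely never extracted into the images root.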

read error: read 0 blocks instead of 1

I ran this line of code to predict caption on some images.
th eval.lua -model /home/ubuntu/neuraltalk2/model/ -image_folder ./images/ -num_images 1
Using AWS server configured with Torch and Caffe.
Tried with mentioning the name of the model also.
/home/ubuntu/torch-distro/install/bin/luajit: ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:194: read error: read 0 blocks instead of 1 at /home/ubuntu/torch-distro/pkg/torch/lib/TH/THDiskFile.c:302
stack traceback:
[C]: in function 'readInt'
...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:194: in function 'readObject'
...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:294: in function 'load'
eval.lua:68: in main chunk
[C]: in function 'dofile'
...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670

Getting this error and have little idea of what this means.

Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED

Hi,
I have followed all the steps and installed all the dependencies required for neuraltalk2.
While running th eval.lua, I get:

DataLoaderRaw loading images from folder: /home/anilil/caffe/examples/images/
listing all images in directory /home/anilil/caffe/examples/images/
DataLoaderRaw found 3 images
constructing clones inside the LanguageModel
/home/anilil/torch/install/bin/luajit: /home/anilil/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED
stack traceback:
[C]: in function 'error'
/home/anilil/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:385: in function 'updateOutput'
/home/anilil/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
eval.lua:121: in function 'eval_split'
eval.lua:173: in main chunk
[C]: in function 'dofile'
...ilil/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

I already tried #19 and #20.

I have also installed the latest version of cuDNN.

Issue when running on Jetson TX1

Good afternoon,

I'm trying to get Neuraltalk to run on a Jetson TX1.

I successfully managed to install Torch and all other dependencies listed in the main page, however I get this error when trying to run the command:
th eval.lua -model /path/to/model -image_folder /path/to/image/directory -num_images 10

(of course all paths have been replaced with the correct path, I'm using the model provided)

/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:317: table index is nil
stack traceback:
/usr/local/share/lua/5.1/torch/File.lua:317: in function 'readObject'
/usr/local/share/lua/5.1/nn/Module.lua:154: in function 'read'
/usr/local/share/lua/5.1/torch/File.lua:298: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:347: in function 'load'
eval.lua:69: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000d055

Any help would be welcome

Regards

train.lua error , cuda runtime error (2) : out of memory

Hi, all
I tried to train the network on MSCOCO. I downloaded the dataset and then ran prepro.py, which produced cocotalk.h5 and cocotalk.json under the ./coco folder.

Then I tried to run the script:

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json

And I get the following error message:

/home/liuchang/torch/install/bin/luajit: ./misc/optim_updates.lua:65: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-6827/cutorch/lib/THC/generic/THCStorage.cu:40
stack traceback:
[C]: in function 'new'
./misc/optim_updates.lua:65: in function 'adam'
/home/liuchang/neuraltalk2/train.lua:375: in main chunk
[C]: in function 'dofile'
...hang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406620

My config:

OS: Ubuntu 14.04, 64-bit
GPU: GeForce GTX 745, 4 GB
CUDA: 7.0
cuDNN: cudnn-7.0-linux-x64-v3.0-prod.tgz

I used ZeroBrane to debug train.lua step by step, and the error occurred at line 387, when my GPU memory was 99% occupied.

I wonder if my GPU memory is too small to train on MSCOCO?
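
The traceback ends inside the adam function of misc/optim_updates.lua while a new tensor is being allocated. Adam keeps two running moment estimates per parameter, so each trainable float32 parameter costs roughly four copies in memory (weight, gradient, two moments). A rough back-of-the-envelope sketch follows; it ignores activations and library workspace, so it is only a lower bound. The helper name and the example parameter count are assumptions for illustration, not values read from train.lua:

```python
BYTES_PER_FLOAT32 = 4


def adam_param_memory_bytes(num_params):
    """Bytes for float32 weights, gradients and Adam's two moment buffers."""
    copies = 4  # weights + gradients + first moment + second moment
    return num_params * BYTES_PER_FLOAT32 * copies


if __name__ == '__main__':
    # 138 million is the commonly cited parameter count of VGG-16; it is
    # used here purely as an illustrative figure.
    n = 138 * 10 ** 6
    print('%.2f GiB' % (adam_param_memory_bytes(n) / float(2 ** 30)))
```

Comparing such an estimate with the memory reported by the GPU driver gives a first idea of whether a smaller batch size alone can help.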

Error when running eval.lua ("attempt to index field 'tensorOutput' (a nil value)")

Hello,

I've just downloaded the project out of curiosity, and followed the setup instructions. But note that I have no idea of what I'm doing :)

I'm running into a problem when running:
$ th eval.lua -model ./model_id1-501-1448236541.t7_cpu.t7 -image_folder ./sample-images -num_images 1 -gpuid -1

Here's my setup:
Machine: MacBook Pro, late 2011, 2.4 GHz Intel Core i5, OS X El Capitan 10.11.1
Model: I'm using the provided cpu checkpoint model in the README.
Image in folder: http://25.media.tumblr.com/Jjkybd3nSab3wr6cd1T33jjw_500.jpg

Error output:

$ th eval.lua -model ./model_id1-501-1448236541.t7_cpu.t7 -gpuid -1 -image_folder ./sample-images
DataLoaderRaw loading images from folder:       ./sample-images
listing all images in directory ./sample-images
DataLoaderRaw found 1 images
constructing clones inside the LanguageModel
/Users/***/torch/install/bin/luajit: /Users/***/torch/install/share/lua/5.1/nn/Identity.lua:13: attempt to index field 'tensorOutput' (a nil value)
stack traceback:
        /Users/***/torch/install/share/lua/5.1/nn/Identity.lua:13: in function 'func'
        /Users/***/torch/install/share/lua/5.1/nngraph/gmodule.lua:252: in function 'neteval'
        /Users/***/torch/install/share/lua/5.1/nngraph/gmodule.lua:287: in function 'forward'
        ./misc/LanguageModel.lua:266: in function 'sample'
        eval.lua:134: in function 'eval_split'
        eval.lua:169: in main chunk
        [C]: in function 'dofile'
        ...***/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x0102a00bd0

I've also partially run the tests in test_language_model.lua, and they pass, although I've commented out the cudaApiForwardTest.

Let me know if there is anything else I can provide.

Adding notes on memory requirements?

I'm going to test this on a real computer tomorrow, but testing today on the 2GB GPU on my laptop I get an out of memory error with the 600MB pre-trained model.

I tried shutting everything else down in hope that 2GB was almost enough to run the model, but it doesn't seem to help (or even change the error message).

I tried running on the CPU using combinations of -gpuid -1 and -backend nn, but I get different errors. Here are all the errors, in order:

kyle@kyle ~/D/L/neuraltalk2 (master)> th eval.lua -model models/checkpoint_v1.t7 -image_folder images/
DataLoaderRaw loading images from folder:   images/ 
listing all images in directory images/ 
DataLoaderRaw found 8 images    
/Users/kyle/torch/install/bin/luajit: ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:99: cuda runtime error (2) : out of memory at /Users/kyle/torch/extra/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
    [C]: in function 'resizeAs'
    ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:99: in function 'createIODescriptors'
    ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:339: in function 'updateOutput'
    /Users/kyle/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    eval.lua:115: in function 'eval_split'
    eval.lua:163: in main chunk
    [C]: in function 'dofile'
    ...kyle/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010b4892d0
kyle@kyle ~/D/L/neuraltalk2 (master) [1]> th eval.lua -backend nn -model models/checkpoint_v1.t7 -image_folder images/
/Users/kyle/torch/install/bin/luajit: /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:262: unknown Torch class <cudnn.SpatialConvolution>
stack traceback:
    [C]: in function 'error'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:262: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:64: in main chunk
    [C]: in function 'dofile'
    ...kyle/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010f3862d0
kyle@kyle ~/D/L/neuraltalk2 (master) [1]> th eval.lua -gpuid -1 -model models/checkpoint_v1.t7 -image_folder images/
/Users/kyle/torch/install/bin/luajit: /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:262: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:262: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:64: in main chunk
    [C]: in function 'dofile'
    ...kyle/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010e5a42d0

Using OpenCL instead of CUDA

Hi!

I have a Mac with an AMD video card and I can't use CUDA. Is there any possibility of porting the project to OpenCL, which works with Intel, AMD and NVIDIA cards?

val_images_use option error

In the description, it says -1 = all, but it's not considered in the code.
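
One way the documented "-1 = all" behaviour could be honoured is to resolve the option against the size of the validation split before the evaluation loop starts. The sketch below is written in Python for illustration only (the project's training code is Lua), and resolve_val_images_use is a hypothetical name:

```python
def resolve_val_images_use(val_images_use, num_val_images):
    """Number of validation images to evaluate; a negative value means all."""
    if val_images_use < 0:
        return num_val_images
    # Never ask for more images than the split actually contains.
    return min(val_images_use, num_val_images)


if __name__ == '__main__':
    print(resolve_val_images_use(-1, 5000))
```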

"Camera Dropped Frame"

I am using text-to-speech on NeuralTalk2 and do not need the "Camera Dropped Frame" text, so I am trying to locate that string. However, a grep of both the library on GitHub and the copy on my machine did not find it, which makes me think this may be a language thing. How do I remove this string? @karpathy

COCO images and Torch's image reader

I tried to run your pretrained model on the COCO validation set. It didn't work, and I figured out that some images in COCO are PNG files, although they have a .jpg extension. This confuses Torch's image reader, which reports: not a JPEG file. OpenCV doesn't have this problem because it detects the image format using the header, not the filename.

Did you encounter this when working on COCO data? If so, what processing did you do?
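
Header-based detection, as described above for OpenCV, can be sketched in a few lines by comparing the first bytes of each file with the well-known PNG and JPEG signatures. The function names are hypothetical, and the sketch only flags mismatches; it does not convert anything:

```python
import os

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'
JPEG_SIGNATURE = b'\xff\xd8\xff'


def sniff_image_format(header):
    """Guess the format from the leading bytes; None if unrecognised."""
    if header.startswith(PNG_SIGNATURE):
        return 'png'
    if header.startswith(JPEG_SIGNATURE):
        return 'jpeg'
    return None


def extension_matches_content(path):
    """True if the file extension agrees with the detected format."""
    with open(path, 'rb') as f:
        detected = sniff_image_format(f.read(8))
    ext = os.path.splitext(path)[1].lower()
    expected = {'.jpg': 'jpeg', '.jpeg': 'jpeg', '.png': 'png'}.get(ext)
    return detected is not None and detected == expected
```

Running extension_matches_content over the image folder would list the files that need re-encoding or renaming before Torch's reader sees them.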

libcudnn.so* not found

Thanks for this great code. I tried to follow your README instructions as religiously as possible. When I tried running the eval.lua script, I came up with

User@User:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/share/lua/5.1/cudnn/ffi.lua:574: libcudnn.so: cannot open shared object file: No such file or directory
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
eval.lua:59: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

But I have libcudnn.so* files installed as locate libcudnn gives

/home/User/Documents/cuda/lib64/libcudnn.so
/home/User/Documents/cuda/lib64/libcudnn.so.7.0
/home/User/Documents/cuda/lib64/libcudnn.so.7.0.64
/home/User/Documents/cuda/lib64/libcudnn_static.a

So I export this path to my LD_LIBRARY_PATH as in

export LD_LIBRARY_PATH=/home/User/cuda:${LD_LIBRARY_PATH}

When I echo $LD_LIBRARY_PATH, I get

/home/User/cuda:/home/User/catkin_ws/devel/lib:/home/User/cuda/lib64:/home/User/cuda/lib64/home/User/catkin_ws/devel/lib:/home/User/catkin_ws/devel/lib/x86_64-linux-gnu:/opt/ros/indigo/lib/x86_64-linux-gnu:/usr/local/cuda-7.0/lib64:/opt/ros/indigo/lib

It appears libcudnn is now in the LD_LIBRARY_PATH. However, running again the eval script still produces

User@User:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/share/lua/5.1/cudnn/ffi.lua:574: libcudnn.so: cannot open shared object file: No such file or directory
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
eval.lua:59: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

I'm sorry for the bother but would appreciate any help.
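
Note that the locate output above lists the library under /home/User/Documents/cuda/lib64, while the exported and echoed entries are /home/User/cuda and /home/User/cuda/lib64, without Documents. A small sketch that reports which entries of a colon-separated search path actually contain a libcudnn.so* file; dirs_with_cudnn is a hypothetical helper, not part of Torch or cuDNN:

```python
import glob
import os


def dirs_with_cudnn(ld_library_path):
    """Return the path entries that contain at least one libcudnn.so* file."""
    hits = []
    for entry in ld_library_path.split(':'):
        # Skip empty entries produced by leading, trailing or doubled colons.
        if entry and glob.glob(os.path.join(entry, 'libcudnn.so*')):
            hits.append(entry)
    return hits


if __name__ == '__main__':
    print(dirs_with_cudnn(os.environ.get('LD_LIBRARY_PATH', '')))
```

An empty list would suggest that none of the exported directories holds the library, even though locate finds it elsewhere on disk.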

Distributing training on several hosts

Does anyone know if it's possible to distribute the training on several hosts to reduce the time to run it?

I'm thinking Google Vision cannot run on a single host, and they must aggregate models from too many images. As a consequence, they must have means to grow a model from several simultaneous training systems. Could this work be ported to scale out?

Thanks,

Training on SBU dataset

Hi all, I noticed the current implementation was trained on the MS-COCO dataset, whose captions are limited in size. I wonder if anyone has tried to train on bigger datasets, such as the SBU dataset (http://tlberg.cs.unc.edu/vicente/sbucaptions/). How should I initialize the network settings in this case? Thanks!

Car model? Motorcycle or truck? Alpr ?

Is it possible to train it to recognize the car model, for example whether a car is a Ferrari or a Fiat?

Can i run neuraltalk2 MacbookAir without CUDA?...

First, I tried to use CUDA, but CUDA does not work on my MacBook.
The following problem occurs:
gimboseog-ui-MacBook-Air:neuraltalk2 JewelryKIM$ th eval.lua
/Users/JewelryKIM/torch/install/bin/luajit: ...rs/JewelryKIM/torch/install/share/lua/5.1/trepl/init.lua:383: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /tmp/luarocks_cutorch-scm-1-3737/cutorch/lib/THC/THCGeneral.c:16
stack traceback:
[C]: in function 'error'
...rs/JewelryKIM/torch/install/share/lua/5.1/trepl/init.lua:383: in function 'require'
eval.lua:58: in main chunk
[C]: in function 'dofile'
...yKIM/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010adeebc0

I have installed CUDA version 7.5 but have not deleted the existing CUDA installation.

cannot open eval.lua: No such file or directory

I've followed the setup till the cjson part (since I am only using CPU). However when I ran the following command I got the error. I am on Mac OSX 10.9.5

command
$ th eval.lua -model model_id1-501-1448236541.t7_cpu.t7 -image_folder ./img -num_images 1 -gpuid -1

error
../torch/install/bin/luajit: cannot open eval.lua: No such file or directory
stack traceback:
[C]: in function 'dofile'
../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0105f2a1d0

Can anyone point me in the right direction? Thanks ;-)

Out of memory training with 6G memory GPU

Hi,

I got "out of memory" error when training with a Titan Z GPU which has 6GB memory:
th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json

The error messages are:
/blah/torch/install/bin/luajit: .../torch/install/share/lua/5.1/trepl/init.lua:363: /tmp/luarocks_cutorch-scm-1-8337/cutorch/lib/THC/THCTensorRandom.cu(20) : cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8337/cutorch/lib/THC/THCGeneral.c:241
stack traceback:
[C]: in function 'error'
.../torch/install/share/lua/5.1/trepl/init.lua:363: in function 'require'
train.lua:79: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670

I've tried reducing batch_size to 1 and also reducing rnn_size and input_encoding_size, but none of them helped.

Any idea what I may have missed? Many thanks.

Getting error on CPU only mode

I followed the whole tutorial for installing the dependencies. I can't install cutorch and cunn because I don't have an NVIDIA card. I'm on OS X 10.11.1.

I downloaded the model and tried to use it like this:

th eval.lua -gpuid -1 -model model_id1-501-1448236541.t7_cpu.t7

But I get the following error:

DataLoader loading json file:   /scr/r6/karpathy/cocotalk.json
/Users/miguel/torch/install/bin/luajit: ./misc/utils.lua:17: attempt to index local 'file' (a nil value)
stack traceback:
    ./misc/utils.lua:17: in function 'read_json'
    ./misc/DataLoader.lua:10: in function '__init'
    /Users/miguel/torch/install/share/lua/5.1/torch/init.lua:91: in function </Users/miguel/torch/install/share/lua/5.1/torch/init.lua:87>
    [C]: in function 'DataLoader'
    eval.lua:84: in main chunk
    [C]: in function 'dofile'
    ...guel/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x01035bdbd0

What can I do?

RGB order?

I'm wondering if there is an RGB->BGR conversion in the code?
It looks to me as if the images only go through cropping and RGB mean subtraction in prepro.py.
Then the images are fed into VGGNet.

I guess this won't make too much difference to the results, but would it be better to follow the usual workflow, i.e. add the BGR conversion?

Thanks,
-Licheng
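
For reference, the conversion being asked about is just a reversal of the channel axis. The sketch below assumes a channels-first image stored as a list of three 2-D planes [R, G, B]; it is an illustration of the operation, not a statement about what this code base does, and rgb_to_bgr is a hypothetical name:

```python
def rgb_to_bgr(planes):
    """Reverse the channel order of a channels-first image [R, G, B]."""
    if len(planes) != 3:
        raise ValueError('expected exactly 3 channel planes')
    # Slicing returns a new outer list; the planes themselves are not copied.
    return planes[::-1]


if __name__ == '__main__':
    red, green, blue = [[255]], [[128]], [[0]]
    print(rgb_to_bgr([red, green, blue]))
```

If the per-channel mean is subtracted as well, the mean values have to be reordered in the same way as the pixel planes.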

Image Sizes: 256 to 224 with augmentation?

I am trying to figure out why you resize the images to 256 and then perform the 'augmentation step' in `net_utils.prepro` to take a random region of this image. Is there an inherent reason not to downsample to 224 in the first place?
If the subregion aspect is important (to reduce overfitting, I guess?), it should be performed before the downsampling.

I am working on a PR to make it more flexible for other sized images (particularly smaller ones) and will post it once this aspect is clear.
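
The resize-then-crop scheme described in the question can be sketched as follows. With a 256-pixel source and a 224-pixel window there are 33 possible offsets per axis, which is where the variation comes from; resizing straight to 224 would leave exactly one. The function names and the list-of-lists image layout are assumptions for illustration and are not taken from net_utils:

```python
import random


def random_crop_offsets(src_size=256, crop_size=224, rng=random):
    """Pick the top-left corner of a crop_size window inside src_size."""
    if crop_size > src_size:
        raise ValueError('crop larger than image')
    max_offset = src_size - crop_size
    # randint is inclusive at both ends, so offsets range over 0..max_offset.
    return rng.randint(0, max_offset), rng.randint(0, max_offset)


def crop(rows, x, y, size):
    """Crop a 2-D plane (list of rows) to size x size at column x, row y."""
    return [row[x:x + size] for row in rows[y:y + size]]


if __name__ == '__main__':
    x, y = random_crop_offsets()
    print('crop window starts at column %d, row %d' % (x, y))
```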

option -start_from in train.lua : what's the expected behaviour

Hi,

I have trained a model on the MSCOCO without any fine tuning. Then I wanted to add the fine tuning, so I did:

th train.lua -input_h5 /data/training/cocotalk.h5 \
-input_json /data/training/cocotalk.json \
-cnn_proto /data/training/VGG_ILSVRC_16_layers_deploy.prototxt \
-cnn_model /data/training/VGG_ILSVRC_16_layers.caffemodel \
-checkpoint_path /data/model \
-finetune_cnn_after 0 \
-start_from /data/model/model_id.t7 \
-batch_size 4

(small batch size as I am running with 4GB of VRAM only)

As a result, a new file model_id.json was created and keeps growing (I am now at about 80k iterations).
How am I supposed to combine this new file with the original t7 file? Is this the expected behaviour?

Many thanks in advance,

CNN Feature Extraction Automatically Runs on Multiple GPUS?

It seems that the CNN feature extraction of images runs automatically on multiple GPUs, although I specify -gpuid 0. But the language model indeed runs on the specified GPU.

Can it work in another direction (from text to image)?

cuda runtime error (2) : out of memory at THCStorage.cu

Hi All,
Ten days ago I managed to run the neuraltalk2 eval. Yesterday (29.12) I reinstalled Torch and its dependencies, such as cutorch. Since then I get the following error message when I try to run eval with the same parameters:

constructing clones inside the LanguageModel
/home/aron/torch/install/bin/luajit: /home/aron/torch/install/share/lua/5.1/torch/File.lua:298: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2709/cutorch/lib/THC/generic/THCStorage.cu:40
stack traceback:
[C]: in function 'read'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:298: in function </home/aron/torch/install/share/lua/5.1/torch/File.lua:212>
[C]: in function 'read'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:298: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:300: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
...
/home/aron/torch/install/share/lua/5.1/torch/File.lua:300: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/nngraph/gmodule.lua:402: in function 'read'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:298: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/nn/Module.lua:108: in function 'clone'
./misc/LanguageModel.lua:51: in function 'createClones'
eval.lua:96: in main chunk
[C]: in function 'dofile'
...aron/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

link missing

http://cs.stanford.edu/people/karpathy/neuraltalk2/demo.html gives a 404

The same as "attempt to concatenate local 'ext' (a nil value)" #11

Hi, I have exactly the same issue as #11, even though I am running the latest code, in which this issue has been patched. (I have checked #12 as well, and there is no problem with my "DataLoaderRaw.lua".)
OS X 10.9.5, MacBook Pro Retina, 15-inch, Late 2013 (no GPU). It seems to be caused by invisible files, yet there are no invisible files in the /img/ directory where the target images are stored.

I have been trying to get "neuraltalk2" running and it seems to be getting close, so could somebody please point out the cause of this problem?

[The error is below (exactly the same as in #11)]

$ th eval.lua -model /Users/usrname/neuraltalk2/model_id1-501-1448236541.t7_cpu.t7 -image_folder /Users/usrname/neuraltalk2/img/ -num_images 10 -gpuid -1
DataLoaderRaw loading images from folder: /Users/usrname/neuraltalk2/img/
listing all images in directory /Users/usrname/neuraltalk2/img/
DataLoaderRaw found 14 images
constructing clones inside the LanguageModel
/Users/usrname/torch/install/bin/luajit: /Users/usrname/torch/install/share/lua/5.1/image/init.lua:346: attempt to concatenate local 'ext' (a nil value)
stack traceback:
/Users/usrname/torch/install/share/lua/5.1/image/init.lua:346: in function 'load'
./misc/DataLoaderRaw.lua:82: in function 'getBatch'
eval.lua:116: in function 'eval_split'
eval.lua:173: in main chunk
[C]: in function 'dofile'
...s/usrname/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x010e9f77b0

[/img/ ls -la]
$ ls -la
total 2088
drwxr-xr-x 14 usrname staff 476 12 5 16:38 .
drwxr-xr-x 19 usrname staff 646 12 5 16:38 ..
-rw-r--r--@ 1 usrname staff 102769 12 4 23:01 1.jpg
-rw-r--r--@ 1 usrname staff 112326 12 4 22:58 10.jpg
-rw-r--r--@ 1 usrname staff 158805 12 4 22:57 11.jpg
-rw-r--r--@ 1 usrname staff 29750 12 4 22:58 12.jpg
-rw-r--r--@ 1 usrname staff 47949 12 4 22:57 2.jpg
-rw-r--r--@ 1 usrname staff 52914 12 4 22:58 3.jpg
-rw-r--r--@ 1 usrname staff 35022 12 4 22:57 4.jpg
-rw-r--r--@ 1 usrname staff 141824 12 4 22:56 5.jpg
-rw-r--r--@ 1 usrname staff 128698 12 4 22:56 6.jpg
-rw-r--r--@ 1 usrname staff 185112 12 4 22:59 7.jpg
-rw-r--r--@ 1 usrname staff 29393 12 4 22:59 8.jpg
-rw-r--r--@ 1 usrname staff 18712 12 4 22:55 9.jpg

Inaccurate captioning of images

Hello,

Thank you for the excellent code and guide to run the code.
I was successful in running the eval.lua on a set of 12 images. But apart from a couple of images the prediction of caption on the images was inaccurate.
Why would this be happening? Also could I use another model? Could you point me to a better or more comprehensive model that would help increase the accuracy?

Thanks!

Another "out of memory" issue when reading the pretrained model

In #3 a memory error was solved by adding the option -batch_size 1, but this did not work for me on a 16 GB iMac with CUDA installed (see output).

The problem already arises when reading in the pretrained model:

require 'nn';
require 'cudnn';
require 'cunn';
net = torch.load('./models/checkpoint_v1.t7', 'binary')

/usr/local/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8052/cutorch/lib/THC/THCGeneral.c:510
stack traceback:
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function </usr/local/share/lua/5.1/torch/File.lua:190>
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    [string "net = torch.load('checkpoint_v1.t7', 'binary'..."]:1: in main chunk
    [C]: in function 'xpcall'
    /usr/local/share/lua/5.1/itorch/main.lua:179: in function </usr/local/share/lua/5.1/itorch/main.lua:143>
    /usr/local/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
    /usr/local/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
    /usr/local/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
    /usr/local/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
    /usr/local/share/lua/5.1/itorch/main.lua:350: in main chunk
    [C]: in function 'require'

Any ideas?

$ th eval.lua -model ./models/checkpoint_v1.t7 -image_folder ./images
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8052/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function </usr/local/share/lua/5.1/torch/File.lua:190>
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:63: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0101364c10

$ th eval.lua -backend nn -model models/checkpoint_v1.t7 -image_folder images/
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:262: unknown Torch class <cudnn.SpatialConvolution>
stack traceback:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/torch/File.lua:262: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:63: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010edaec10

$ th eval.lua -gpuid -1 -model models/checkpoint_v1.t7 -image_folder images/
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:262: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/torch/File.lua:262: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:63: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0105d7dc10

$ th eval.lua -model ./models/checkpoint_v1.t7 -image_folder ./images -batch_size 1
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8052/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function </usr/local/share/lua/5.1/torch/File.lua:190>
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:63: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010845ec10

File names / file paths to the .json file?

Hi there,
would it be possible to add a "JSON only" mode? For my purposes I only really need the .json file instead of the entire html structure. And I guess that version would be the more (or most?) basic outcome for such a script.

What I imagine:

Run the shell command using something like eval-json.lua instead of eval.lua.
Images don't get renamed and don't get copied.
A .json file will be placed in the images folder alongside the images.
In the .json file an entry could look like: {"caption":"my resulting caption","image_file":"myFilename.jpg"}

Is this feasible / desirable? For me it would make sense as it would be a more generic result of the script.
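
The entry format proposed above is easy to produce once captions and original file names are available as pairs. A minimal sketch of the serialisation step only; captions_to_json is a hypothetical helper and no such mode exists in eval.lua according to this thread:

```python
import json


def captions_to_json(results):
    """Serialise (image_file, caption) pairs in the format proposed above."""
    entries = [{'caption': caption, 'image_file': image_file}
               for image_file, caption in results]
    # sort_keys keeps the key order stable across runs and Python versions.
    return json.dumps(entries, sort_keys=True)


if __name__ == '__main__':
    print(captions_to_json([('myFilename.jpg', 'my resulting caption')]))
```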

Checkpoint crash

Hi Andrej,

I'm trying to use NeuralTalk2 on my cpu (and so with the cpu checkpoint) using the following command:
th eval.lua -model model_id1-501-1448236541.t7_cpu.t7 -image_folder ../frames/ -num_images 10 -gpuid -1

It throws the following error:
ldb must be >= MAX(K,1): ldb=0 K=768BLAS error: Parameter number 11 passed to cblas_sgemm had an invalid value

On backtracing, I've narrowed down the source of the error to this line in eval.lua (line 134):
local seq = protos.lm:sample(feats, sample_opts)

Any pointers why this might be crashing?

Thanks a lot!

Error while running eval.lua

While evaluating images,

th eval.lua -model model_id1-501-1448236541.t7 -image_folder images/ -num_images -1

I get this error,

/home/ubuntu/torch-distro/install/bin/luajit: ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:277: unknown object
stack traceback:
    [C]: in function 'error'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:277: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:257: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:257: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:294: in function 'load'
    eval.lua:68: in main chunk
    [C]: in function 'dofile'
    ...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

I have little idea how to solve this. Any help would be great!

Thank you NeuralTalk2

We just came back this weekend from PennApps XIII (the largest college hackathon in the U.S.), where we won the “Most Innovative Use of Embedded Systems” prize, beating over 1500 other student hackers!

VIA is a Visual Impairment Assistant, which aims to bring context back to the visually impaired, as distance information alone isn’t enough. We used NeuralTalk2 for the long-range context component (and duly credited the source code and explained that we did not write the neural engine ourselves).

Check out the project and our video demo at this Devpost link!

http://devpost.com/software/via-visual-impairment-assistant-17pg9d

Very excited to be one of the few winning teams in the biggest hackathon in the States.

Issues with CPU based training

Running this on a MacBook whose GPU has 2GB of RAM produced out-of-memory errors:

/torch/install/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2702/cutorch/lib/THC/THCStorage.cu:44

So I attempted to run it on the CPU instead, as I have 16GB of system RAM available, but when running the training with the following options I get errors:

th train.lua -gpuid -1 -backend nn -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json
DataLoader loading json file: coco/cocotalk.json
vocab size is 9567
DataLoader loading h5 file: coco/cocotalk.h5
read 123287 images of size 3x256x256
max sequence length in data is 16
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 1:9: Message type "caffe.NetParameter" has no field named "require".
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded model/VGG_ILSVRC_16_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
converting first layer conv filters from BGR to RGB...
/torch/install/bin/luajit: bad argument #2 to '?' (too many indices provided at /torch/pkg/torch/generic/Tensor.c:929)

What am I doing wrong? Is it possible to train this model on a MacBook with a 2GB GPU, or on the CPU with 16GB of system memory available?
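
For context, the last log line before the crash announces the BGR-to-RGB conversion of the first convolution layer's filters, which the log reports as conv1_1: 64 3 3 3. The sketch below shows that kind of channel swap in plain Python on nested lists. It is an illustration only, not the Torch code in this repository, and the helper name swap_bgr_to_rgb is hypothetical. Indexing with four indices, as done here, only works when the weights really are stored as a 4D array; whether a different weight layout under the nn backend is what triggers the "too many indices" error is an assumption the log does not confirm.

```python
def swap_bgr_to_rgb(filters):
    """Return a copy of an [out][in][kh][kw] filter bank with input
    channels 0 and 2 swapped (BGR order becomes RGB order).

    Hypothetical helper for illustration; it expects exactly 3 input channels.
    """
    swapped = []
    for out_filter in filters:
        if len(out_filter) != 3:
            raise ValueError("expected 3 input channels, got %d" % len(out_filter))
        b, g, r = out_filter
        # Copy each kernel row so the input filter bank is not modified.
        swapped.append([
            [row[:] for row in r],
            [row[:] for row in g],
            [row[:] for row in b],
        ])
    return swapped
```

Applying the swap twice returns the original ordering, which makes for a quick sanity check on any implementation of this step.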

Instructions have problems

The instructions say to do the following:

luarocks install nn
luarocks install nngraph
luarocks install image

However, I cannot get the above packages through luarocks.

What should I do?

Thanks!

Dockerfile & image available for those interested

I have created a Dockerfile for amd64 (and am working on the ARM version). It's available at https://github.com/SaMnCo/docker-neuraltalk2
It's at a really early stage and only does the captioning, but if I see interest I'll add features.

The image is available on the Docker Hub (or will be once it's built) at https://hub.docker.com/r/samnco/neuraltalk2/

Hope you'll like it, thanks again for this amazing piece of work :)
