This project forked from linjieyangsc/densecap

Dense captioning with joint inference and visual context

License: Other

Dense Captioning with Joint Inference and Visual Context

This repo contains the released code for the dense image captioning models described in the CVPR 2017 paper:

 @InProceedings{CVPR17,
  author       = "Linjie Yang and Kevin Tang and Jianchao Yang and Li-Jia Li",
  title        = "Dense Captioning with Joint Inference and Visual Context",
  booktitle    = "IEEE Conference on Computer Vision and Pattern Recognition (CVPR)",
  month        = "Jul",
  year         = "2017"
}

All code is provided for research purposes only and without any warranty. Any commercial use requires our consent. If you use the code in your research, please cite the paper above. Our code is adapted from the popular Faster-RCNN repo written by Ross Girshick, which is built on the open-source deep learning framework Caffe. The evaluation code is adapted from the COCO captioning evaluation code.

Compiling

Compile Caffe

Please follow the official Caffe installation guide. CUDA 7.5+ and cuDNN 5.0+ are supported. Tested on Ubuntu 14.04.

Compile local libraries

cd lib
make

Demo

Download the official sample model here. This model is the Twin-LSTM with late context fusion (fused by summation) described in the paper. To test the model, run the following command from the library root folder.

python ./lib/tools/demo.py --image [IMAGE_PATH] --gpu [GPU_ID] --net [MODEL_PATH]

The script generates a folder named "demo" in the library root. Inside the "demo" folder is an HTML page showing the predicted results.

Training

Data preparation

For model training, you will need to download the Visual Genome dataset from the Visual Genome website; either version 1.0 or 1.2 is fine. Download the pre-trained VGG16 model from link. Modify the data paths in models/dense_cap/preprocess.py and run it from the library root to generate the training/validation/testing data.

Start training

Run models/dense_cap/dense_cap_train.sh to start training. For example, to train a model with joint inference and visual context (late fusion, feature summation) on Visual Genome 1.0:

./models/dense_cap/dense_cap_train.sh [GPU_ID] visual_genome late_fusion_sum [VGG_MODEL_PATH] 

Training typically takes 3 days to finish. Note that, due to limitations of Python, multi-GPU training is not available for this library.

This library provides only the Twin-LSTM structure for joint inference and late fusion for context fusion, with three different fusion operators: summation, multiplication, and concatenation. Other structures described in the paper can be implemented by adapting the existing code.
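The three fusion operators can be illustrated with a minimal plain-Python sketch. This is not the repository's implementation (the models are defined as Caffe layers); the function name `fuse` and the toy feature vectors are hypothetical. It only shows how a local (region) feature and a global (context) feature would be combined under each operator.

```python
# Illustrative sketch only: plain-Python versions of the three late-fusion
# operators named above. The function and variable names are hypothetical
# and do not come from the repository.

def fuse(local_feat, global_feat, op="sum"):
    """Combine a local (region) feature with a global (context) feature."""
    if op == "concat":
        # Concatenation keeps both vectors; the output length is the sum
        # of the two input lengths.
        return list(local_feat) + list(global_feat)
    if len(local_feat) != len(global_feat):
        raise ValueError("sum and mul require equal-length features")
    if op == "sum":
        # Element-wise summation.
        return [a + b for a, b in zip(local_feat, global_feat)]
    if op == "mul":
        # Element-wise multiplication.
        return [a * b for a, b in zip(local_feat, global_feat)]
    raise ValueError("unknown fusion operator: %r" % (op,))


local_feat = [1.0, 2.0, 3.0]
global_feat = [0.5, 0.5, 2.0]
print(fuse(local_feat, global_feat, "sum"))     # [1.5, 2.5, 5.0]
print(fuse(local_feat, global_feat, "mul"))     # [0.5, 1.0, 6.0]
print(fuse(local_feat, global_feat, "concat"))  # [1.0, 2.0, 3.0, 0.5, 0.5, 2.0]
```

Summation and multiplication keep the feature dimension unchanged, while concatenation doubles it, so any layer consuming the fused feature has to expect a different input size in the concatenation case.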

Evaluation

Modify models/dense_cap/dense_cap_test.sh according to the model you want to test. For example, if you want to test the provided sample model, it will look like this:

GPU_ID=0
NET_FINAL=models/dense_cap/dense_cap_late_fusion_sum.caffemodel
TEST_IMDB="vg_1.0_test"
PT_DIR="dense_cap"
time ./lib/tools/test_net.py --gpu ${GPU_ID} \
  --def_feature models/${PT_DIR}/vgg_region_global_feature.prototxt \
  --def_recurrent models/${PT_DIR}/test_cap_pred_context.prototxt \
  --def_embed models/${PT_DIR}/test_word_embedding.prototxt \
  --net ${NET_FINAL} \
  --imdb ${TEST_IMDB} \
  --cfg models/${PT_DIR}/dense_cap.yml \

The sample model achieves an mAP of around 9.05. Apart from the model path (NET_FINAL), the only thing you should change is def_recurrent, which should be models/${PT_DIR}/test_cap_pred_no_context.prototxt for models without context information and models/${PT_DIR}/test_cap_pred_context.prototxt for models with context fusion. To test late fusion models with other fusion operators, modify test_cap_pred_context.prototxt: change the "local_global_fusion" layer to element-wise multiplication or concatenation accordingly.

To visualize the results, add --vis to the end of the above script. It generates an HTML page for each image, visualizing the results, under the folder output/dense_cap/${TEST_IMDB}/vis.
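The rule for choosing def_recurrent can be sketched as a small Python helper that assembles the same command line. The helper `build_test_command` and its parameters are hypothetical and not part of the repository; it uses only the flags and file names quoted in this README.

```python
# Hypothetical helper (not part of the repository): assembles the
# test_net.py command line from the settings described above. Only flags
# and file names that appear in this README are used.

def build_test_command(net_final, use_context=True, gpu_id=0,
                       test_imdb="vg_1.0_test", pt_dir="dense_cap",
                       vis=False):
    # def_recurrent is the only prototxt that differs between models
    # with and without context fusion.
    if use_context:
        recurrent = "test_cap_pred_context.prototxt"
    else:
        recurrent = "test_cap_pred_no_context.prototxt"
    cmd = [
        "./lib/tools/test_net.py",
        "--gpu", str(gpu_id),
        "--def_feature", "models/%s/vgg_region_global_feature.prototxt" % pt_dir,
        "--def_recurrent", "models/%s/%s" % (pt_dir, recurrent),
        "--def_embed", "models/%s/test_word_embedding.prototxt" % pt_dir,
        "--net", net_final,
        "--imdb", test_imdb,
        "--cfg", "models/%s/dense_cap.yml" % pt_dir,
    ]
    if vis:
        # Optional flag that writes per-image HTML visualizations.
        cmd.append("--vis")
    return cmd


cmd = build_test_command(
    "models/dense_cap/dense_cap_late_fusion_sum.caffemodel", vis=True)
print(" ".join(cmd))
```

The returned list could be passed to `subprocess.run` from the library root; building it as a list rather than a single string avoids shell-quoting problems with paths that contain spaces.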

Contact

If you have any questions regarding the repo, please send email to Linjie Yang ([email protected]).
