
Joint Event Detection and Description in Continuous Video Streams

Code released by Huijuan Xu (Boston University).

Introduction

We present the Joint Event Detection and Description Network (JEDDi-Net), which solves the dense captioning task in an end-to-end fashion. Our model continuously encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and transcribes the event proposals into captions, taking both visual and language context into account.
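The proposal stage is described above only in outline. As a rough illustration of how variable-length temporal events can be proposed from a downsampled 3D-conv feature map, the sketch below enumerates multi-scale anchor segments. It is a generic anchor-based example, not code from this repository; `temporal_anchors`, the stride, the scales, and the 768-frame buffer are all hypothetical.

```python
from typing import List, Tuple


def temporal_anchors(num_positions: int, stride: int, scales: List[int],
                     num_frames: int) -> List[Tuple[float, float]]:
    """Enumerate (start, end) anchor segments, in frames, over a temporal feature map.

    Feature position i is assumed to summarize frames [i * stride, (i + 1) * stride).
    For every position, one anchor per scale is centered on that span, covers
    scale * stride frames, and is clipped to [0, num_frames].
    """
    anchors: List[Tuple[float, float]] = []
    for i in range(num_positions):
        center = (i + 0.5) * stride
        for scale in scales:
            half = scale * stride / 2.0
            start = max(0.0, center - half)
            end = min(float(num_frames), center + half)
            anchors.append((start, end))
    return anchors


# Hypothetical setting: a 768-frame buffer downsampled 8x in time gives 96 positions.
anchors = temporal_anchors(num_positions=96, stride=8, scales=[1, 2, 4], num_frames=768)
print(len(anchors))  # 288 candidate segments (96 positions x 3 scales)
```

In an anchor-based design, a small head would then score each segment and regress its boundaries; only the enumeration step is shown here.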

License

JEDDi-Net is released under the MIT License (refer to the LICENSE file for details).

Citing JEDDi-Net

If you find JEDDi-Net useful in your research, please consider citing:

@article{xu2019joint,
  title={Joint Event Detection and Description in Continuous Video Streams},
  author={Xu, Huijuan and Li, Boyang and Ramanishka, Vasili and Sigal, Leonid and Saenko, Kate},
  journal={2019 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2019}
}

Contents

  1. Installation
  2. Preparation
  3. Training
  4. Testing

Installation:

  1. Clone the JEDDi-Net repository.

    git clone --recursive git@github.com:VisionLearningGroup/JEDDi-Net.git
  2. Build Caffe3d with pycaffe (see: Caffe installation instructions).

    Note: Caffe must be built with Python support!

    cd ./caffe3d
    # If you have all of the requirements installed and your Makefile.config in place, simply do:
    make -j8 && make pycaffe
  3. Build the JEDDi-Net lib folder (from the JEDDi-Net root folder).

    cd ./lib
    make
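Because Caffe must be built with Python support, a quick import check can confirm that `make pycaffe` succeeded. This is a minimal sketch that assumes caffe3d keeps stock Caffe's layout, where the `caffe` Python package lives under `caffe3d/python`; the helper names are hypothetical and not part of the repository.

```python
import importlib
import os
import sys


def add_pycaffe_to_path(repo_root: str = ".") -> str:
    """Prepend <repo_root>/caffe3d/python to sys.path and return that directory.

    Assumes caffe3d follows stock Caffe, where `make pycaffe` builds the
    `caffe` package under python/.
    """
    path = os.path.abspath(os.path.join(repo_root, "caffe3d", "python"))
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


def pycaffe_available() -> bool:
    """Return True if the `caffe` Python module can be imported."""
    try:
        importlib.import_module("caffe")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Run from the JEDDi-Net root folder after building caffe3d.
    print("pycaffe path:", add_pycaffe_to_path())
    print("pycaffe importable:", pycaffe_available())
```

Exporting `PYTHONPATH=<path-to>/caffe3d/python:$PYTHONPATH` in the shell achieves the same effect for every script.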

Preparation:

  1. Download the ground-truth annotations and videos of the ActivityNet Captions dataset.

  2. Extract frames from the downloaded videos at 25 fps.

  3. Generate the pickle data for training and testing the JEDDi-Net model.

    cd ./preprocess
    # generate training data
    python generate_train_roidb_sorted.py
    # generate validation data
    python generate_val_roidb.py  
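Steps 1 and 2 can be sketched in Python as below. The sketch assumes the ActivityNet Captions annotation JSON layout (each video id maps to `duration`, `timestamps` in seconds, and `sentences`) and the 25 fps rate from step 2. The helper names, the rounding convention, and the frame file-name pattern are hypothetical; the repository's `generate_*_roidb*.py` scripts remain the supported way to build the training and validation data.

```python
import json
from typing import Dict, List, Tuple

FPS = 25  # frame rate used for frame extraction in step 2


def seconds_to_frame(t: float, fps: int = FPS) -> int:
    """Map a timestamp in seconds to a 0-based frame index at `fps`.

    Python's round() rounds exact halves to even; the repository's own
    scripts may use a different convention (e.g. floor).
    """
    return int(round(t * fps))


def parse_captions(data: dict, fps: int = FPS) -> Dict[str, List[Tuple[int, int, str]]]:
    """Convert ActivityNet Captions-style annotations into per-video
    (start_frame, end_frame, sentence) triples, clipped to the video length."""
    events: Dict[str, List[Tuple[int, int, str]]] = {}
    for video_id, ann in data.items():
        last = seconds_to_frame(ann["duration"], fps)
        triples = []
        for (start, end), sentence in zip(ann["timestamps"], ann["sentences"]):
            s = min(seconds_to_frame(start, fps), last)
            e = min(seconds_to_frame(end, fps), last)
            triples.append((s, e, sentence.strip()))
        events[video_id] = triples
    return events


def load_captions(path: str, fps: int = FPS) -> Dict[str, List[Tuple[int, int, str]]]:
    """Read an annotation JSON file (e.g. the training split) and parse it."""
    with open(path) as f:
        return parse_captions(json.load(f), fps)


def ffmpeg_extract_cmd(video_path: str, out_dir: str, fps: int = FPS) -> List[str]:
    """Build an ffmpeg argv that samples `video_path` at `fps` frames per second.

    The output file-name pattern is a hypothetical choice; match whatever the
    preprocess scripts expect.
    """
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}",
            f"{out_dir}/image_%05d.jpg"]
```

To extract frames, create the output directory first and run the command with `subprocess.run(ffmpeg_extract_cmd(video, out_dir), check=True)`. ffmpeg numbers image outputs from 1 by default, so under this naming frame index 0 corresponds to `image_00001.jpg`.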

Training:

  1. Download the separately trained segment proposal network (SPN) and captioning models to ./pretrain/.

  2. In the JEDDi-Net root folder, run:

    bash ./experiments/denseCap_jeddiNet_end2end/script_train.sh

Testing:

  1. Download a sample JEDDi-Net model to ./snapshot/.

    A JEDDi-Net model trained on the ActivityNet Captions dataset is provided here: caffemodel.

    The provided JEDDi-Net model achieves a METEOR score of about 8.58% on the validation set.

  2. In the JEDDi-Net root folder, generate the prediction log file on the validation set.

    bash ./experiments/denseCap_jeddiNet_end2end/test/script_test.sh 
  3. Generate the results.json file from the prediction log file.

    cd ./experiments/denseCap_jeddiNet_end2end/test/
    bash bash.sh
  4. Run the evaluation code to obtain the evaluation results.
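For reference, the sketch below shows one way to assemble predictions into a results.json-style file. The prediction log format is not documented here, so the sketch starts from already-parsed `(start_sec, end_sec, sentence)` tuples; the field names follow the layout used by the public ActivityNet Captions dense-captioning evaluation as far as known, and should be checked against the evaluation code. `build_results` and `write_results` are hypothetical helpers; `bash.sh` remains the supported path.

```python
import json
from typing import Dict, Iterable, Optional, Tuple

Event = Tuple[float, float, str]  # (start_sec, end_sec, sentence)


def build_results(preds: Dict[str, Iterable[Event]],
                  external_data: Optional[dict] = None) -> dict:
    """Assemble dense-captioning predictions into a submission-style dict.

    Assumed layout: {"version", "results": {video_id: [{"sentence", "timestamp"}]},
    "external_data"}.
    """
    results = {
        video_id: [{"sentence": sentence, "timestamp": [float(start), float(end)]}
                   for start, end, sentence in events]
        for video_id, events in preds.items()
    }
    return {
        "version": "VERSION 1.0",
        "results": results,
        # Describe any external data (e.g. pretrained backbones) truthfully here.
        "external_data": external_data or {"used": False, "details": ""},
    }


def write_results(preds: Dict[str, Iterable[Event]], path: str = "results.json") -> None:
    """Serialize the submission-style dict to `path` as JSON."""
    with open(path, "w") as f:
        json.dump(build_results(preds), f)
```

Keeping timestamps in seconds (rather than frame indices) matches the ground-truth annotation files, so predictions and references can be compared directly.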
