Giter Site home page Giter Site logo

omcnn_2clstm's Introduction

OM-CNN+2C-LSTM for video salinecy prediction

The model of " DeepVS: A Deep Learning Based Video Saliency Prediction Approach " at ECCV2018.

Abstract

Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that there exists a temporal correlation of human attention with a smooth saliency transition across video frames. Therefore, we develop a two-layer convolutional long short-term memory (2C-LSTM) network in our DNN-based method, using the extracted features of OM-CNN as the input. Consequently, the inter-frame saliency maps of videos can be generated, which consider the transition of attention across video frames. Finally, the experimental results show that our method advances the state-of-the-art in video saliency prediction.

Publication

The extended pre-print version of our work is published in ECCV2018, one can cite with the Bibtex code:

@InProceedings{Jiang_2018_ECCV,
author = {Jiang, Lai and Xu, Mai and Liu, Tie and Qiao, Minglang and Wang, Zulin},
title = {DeepVS: A Deep Learning Based Video Saliency Prediction Approach},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
} 

Models

The whole architecture consists of two parts: OM-CNN and 2C-LSTM, which is shown below. The pre-trained model has already been uploaded to Google drive and BaiduYun.
For running the demo, please the model should be decompressed to the directory of ./model/pretrain/.

OM-CNN

2C-LSTM

Database

As introduced in our paper, our model is trained by our newly-established eye-tracking database, LEDOV, which is also available at Dropbox and BaiduYun

Usage

This model is implemented by tensorflow-gpu 1.0.0, and the detail of our computational environment is listed in 'requirement.txt'. Just run 'TestDemo.py' to see the saliency prediction results on a test video.

Visual Results

Some visual results of our model and ground-truth. visualresult

Contact

If any question, please contact [email protected] (or [email protected]), or use public issues section of this repository.

License

This code is distributed under MIT LICENSE.

Supplementary material

Link

omcnn_2clstm's People

Contributors

remega avatar mayanksingh1998 avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.