
OM-CNN + 2C-LSTM for video saliency prediction

The model from "DeepVS: A Deep Learning Based Video Saliency Prediction Approach" (ECCV 2018).

Abstract

Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that there exists a temporal correlation of human attention with a smooth saliency transition across video frames. Therefore, we develop a two-layer convolutional long short-term memory (2C-LSTM) network in our DNN-based method, using the extracted features of OM-CNN as the input. Consequently, the inter-frame saliency maps of videos can be generated, which consider the transition of attention across video frames. Finally, the experimental results show that our method advances the state-of-the-art in video saliency prediction.

Publication

The extended version of our work was published at ECCV 2018; one can cite it with the following BibTeX entry:

@InProceedings{Jiang_2018_ECCV,
author = {Jiang, Lai and Xu, Mai and Liu, Tie and Qiao, Minglang and Wang, Zulin},
title = {DeepVS: A Deep Learning Based Video Saliency Prediction Approach},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
} 

Models

The whole architecture consists of two parts, OM-CNN and 2C-LSTM, as shown below. The pre-trained model has already been uploaded to Google Drive and BaiduYun.
To run the demo, the model must be decompressed into the directory ./model/pretrain/ (a quick way to verify this is sketched below).
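If the checkpoint is missing or only partially decompressed, restoring it fails with a NotFoundError (see the Issues section below). Here is a minimal sketch, not part of the repository, for verifying the checkpoint before running the demo; it assumes only that the files live under ./model/pretrain/:

import tensorflow as tf  # tensorflow-gpu 1.x, as pinned in requirement.txt

MODEL_DIR = './model/pretrain/'

# latest_checkpoint returns None unless the 'checkpoint' index file and the
# matching .data-*/.index shards were all decompressed into MODEL_DIR.
ckpt = tf.train.latest_checkpoint(MODEL_DIR)
if ckpt is None:
    print('No usable checkpoint under %s; decompress the download first.' % MODEL_DIR)
else:
    print('Found checkpoint prefix: %s' % ckpt)
    # Inspect the stored variables without building the graph.
    reader = tf.train.NewCheckpointReader(ckpt)
    for name in sorted(reader.get_variable_to_shape_map())[:10]:
        print(name)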

OM-CNN

2C-LSTM
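To give a concrete picture of what a two-layer convolutional LSTM computes, here is a rough TF 1.x sketch that stacks two tf.contrib.rnn.ConvLSTMCell cells over a sequence of feature maps. The sizes are illustrative stand-ins for the OM-CNN features, and this is not the repository's actual implementation (ConvLSTMCell entered tf.contrib.rnn in releases after the pinned 1.0.0):

import tensorflow as tf  # TF 1.x; tf.contrib.rnn.ConvLSTMCell needs >= 1.3

# Illustrative sizes standing in for OM-CNN's spatio-temporal features:
# a batch of 16-step sequences of 28x28 maps with 64 channels.
BATCH, STEPS, H, W, C = 4, 16, 28, 28, 64
features = tf.placeholder(tf.float32, [BATCH, STEPS, H, W, C])

# Two stacked 2-D convolutional LSTM cells with 3x3 kernels.
cell1 = tf.contrib.rnn.ConvLSTMCell(
    conv_ndims=2, input_shape=[H, W, C], output_channels=32,
    kernel_shape=[3, 3], name='conv_lstm_1')
cell2 = tf.contrib.rnn.ConvLSTMCell(
    conv_ndims=2, input_shape=[H, W, 32], output_channels=32,
    kernel_shape=[3, 3], name='conv_lstm_2')
stack = tf.contrib.rnn.MultiRNNCell([cell1, cell2])

# Unroll over time; outputs has shape [BATCH, STEPS, H, W, 32].
outputs, _ = tf.nn.dynamic_rnn(stack, features, dtype=tf.float32)

# A 1x1 convolution per frame squashes the features to one saliency channel,
# yielding an inter-frame saliency map for every step of the sequence.
flat = tf.reshape(outputs, [-1, H, W, 32])
saliency = tf.layers.conv2d(flat, filters=1, kernel_size=1,
                            activation=tf.nn.sigmoid)
saliency = tf.reshape(saliency, [BATCH, STEPS, H, W, 1])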

Database

As introduced in our paper, our model is trained on our newly established eye-tracking database, LEDOV, which is also available on Dropbox and BaiduYun.

Usage

This model is implemented in tensorflow-gpu 1.0.0; the details of our computational environment are listed in 'requirement.txt'. Simply run 'TestDemo.py' to see the saliency prediction results on a test video.

Visual Results

Some visual results of our model compared with the ground truth are shown below. [Figure: visual results]

Ablation

[Figure: ablation study results]

To do

Our DeepVS 2.0 will be released soon.

Contact

If you have any questions, please contact [email protected] (or [email protected]), or use the public Issues section of this repository.

License

This code is distributed under the MIT license.

Supplementary material

Link


Issues

The output doesn't match the input.

I found that the output saliency map has 16 fewer frames than the input. If my input is 192 frames, then the output is 176 frames. I looked through the paper and didn't find any helpful information, so I want to ask: is that expected?

cannot write the results

When I run TestDemo.py, I cannot obtain any output:

New video: animal_alpaca01 with 0 frames and size of (0, 0)
Total time for this video 0.000004
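A frame count of zero like this usually means OpenCV failed to decode the video file at all (a codec or path problem), so there is nothing for the demo to write. A minimal probe, not part of the repository, that reproduces the summary line and isolates the decoding step (assumes OpenCV 3+; the path is hypothetical):

import cv2  # opencv-python, 3.x API

def probe_video(path):
    # Print a per-video summary in the same format as the demo's log line.
    # A frame count of 0 here means OpenCV could not decode the file.
    cap = cv2.VideoCapture(path)
    frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    cap.release()
    print('New video: %s with %d frames and size of (%d, %d)'
          % (path, frames, width, height))

probe_video('./video/animal_alpaca01.avi')  # hypothetical input path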

NotFoundError

NotFoundError (see above for traceback): ./model/pretrain/LSTMconv_prefinal_loss05_dp075_075MC100-200000.data-00000-of-00001
[[Node: save/RestoreV2_5 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_5/tensor_names, save/RestoreV2_5/shape_and_slices)]]
[[Node: save/RestoreV2_112/_39 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_890_save/RestoreV2_112", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]]]
The error is as above, with tensorflow 1.0.0 and python 2.7.
How can I solve this?

Directions to train OM-CNN correctly

Congrats on the excellent work, and thank you for the well-documented public release of the dataset!

Could you please provide the training code? It would help in understanding and reproducing some of the useful steps mentioned in your paper, such as CB dropout.

hi

I cannot run TestDemo.py because it requires TensorFlow 1.0. I installed TensorFlow 2.9, ran TestDemo.py, and the error is:
AttributeError: module 'tensorflow' has no attribute 'contrib'
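tf.contrib was removed in TensorFlow 2.x, so this error is expected under 2.9; the demo needs the 1.x environment pinned in requirement.txt. A small guard, not part of the repository, that one could put at the top of the script to fail early with a clearer message:

import tensorflow as tf

# tf.contrib only exists in TensorFlow 1.x; fail early with a clear message
# instead of an AttributeError deep inside the graph-building code.
if not tf.__version__.startswith('1.'):
    raise ImportError('TestDemo.py needs TensorFlow 1.x (got %s); '
                      'see requirement.txt.' % tf.__version__)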

Training stage

Dear author,

Thank you very much for sharing your work; I am very interested in it. I would greatly appreciate it if you could share your training demo, because I am not sure about the training stages of OM-CNN and 2C-LSTM in the code. Thank you so much.

Best regards.

How did you fine-tune SalGAN to LEDOV?

Thank you very much for publishing this work and sharing the models and dataset. I am one of the authors of SalGAN [28], and I was wondering how you fine-tuned SalGAN to this dataset. We developed it in Lasagne quite a long time ago now, and it is not obvious how to fine-tune it.

Could you please provide some details about how you performed this domain adaptation?

Congrats for the nice work :)
