wenguanwang / dhf1k

Revisiting Video Saliency: A Large-scale Benchmark and a New Model (CVPR18, PAMI19)

MATLAB 100.00%
saliency attention-mechanism salient-object-detection fixation saliency-prediction visual-attention cvpr2018 cvpr cvpr18

dhf1k's People

Contributors

wenguanwang


dhf1k's Issues

Annotations for only 26 videos

After downloading the dataset, the annotation folder has annotations for only 26 videos. How can I get the annotations for the remaining videos?

Testing setting of Hollywood2 dataset

Did you use all of the fixation points when training and testing on the Hollywood2 dataset?
Or did you filter out some points (e.g., points at the image edges)?
Also, did you use all 884 test videos when testing on Hollywood2?
How did you sync the fixation points with the video?

I am asking because I want to replicate your results on the Hollywood2 dataset.
Could you provide more detailed information about your setup?
(I divided the videos using shot boundaries, as you mentioned.)

Discrepancy between exportdata_train and DHF1K fixation maps?

Hi, thanks for the nice dataset.
I want to recreate the fixation maps using the raw gaze records in the exportdata_train folder released for DHF1K.

However, the fixation maps obtained using the record_mapping.m script and the raw data from the exportdata_train folder do not match the ones released with DHF1K.

For example:

  1. 0001.png: the fixation map for the first frame of 001.AVI, copied from annotation/0001/fixation/0001.png.

  2. 0001_regenerated.png: the fixation map I regenerated using the files from the exportdata_train folder.

I used the record_mapping.m file after specifying the appropriate paths and modifying lines 22 and 24.

Could you please help me understand what I might be missing?

For your reference, here is my copy of the record_mapping.m file:

%This script maps the raw fixation records onto the corresponding fixation maps.
screen_res_x = 1440;
screen_res_y = 900;

parent_dir = 'GIVE PATH TO PARENT DIRECTORY';

datasetFile1 = 'movie';   % unused in this copy
datasetFile = 'video';
gazeFile = 'exportdata_train';

videoFiles = dir(fullfile('./', datasetFile));
videoNUM = length(videoFiles)-2;   % skip '.' and '..'
rate = 30;                         % video frame rate (fps)

full_vid_dir = [parent_dir, datasetFile, '/'];

for videonum = 1:700
    videofolder = videoFiles(videonum+2).name
    vidObj = VideoReader([full_vid_dir, videofolder]);
    options.infolder = fullfile('./', datasetFile, videofolder, 'images');
    % No need to read the full video: VideoReader already provides the
    % dimensions and the number of frames.
    % Cache all frames in memory
    %[data.frames,names,video_res_y,video_res_x,nframe] = readAllFrames(options);
    nframe = vidObj.NumberOfFrames;
    video_res_x = vidObj.Width;
    video_res_y = vidObj.Height;
    a = video_res_x/screen_res_x;              % screen -> video scale (video shown at full screen width)
    b = (screen_res_y - video_res_y/a)/2;      % vertical letterbox offset in screen pixels
    all_fixation = zeros(video_res_y, video_res_x, nframe);
    for person = 1:17
        % modified the following line to match the video naming format
        txtloc = fullfile(parent_dir, gazeFile, sprintf('P%02d',person), [sprintf('P%02d_Trail',person), sprintf('%03d.txt',videonum)]);
        if exist(txtloc, 'file')
            % modified the following line to match the txt file format
            [time,model,trialnum,diax,diay,x_screen,y_screen,event] = textread(txtloc,'%f%s%f%f%f%f%f%s','headerlines',1);
            if size(time,1)
                time = time - time(1);
                event = cellfun(@(x) x(1), event);
                for index = 1:nframe
                    % keep fixation samples ('F') whose timestamp falls within frame 'index'
                    eff = find(((index-1) < rate*time/1000000) & (rate*time/1000000 < index) & event=='F');
                    x_stimulus = int32(a*x_screen(eff));
                    y_stimulus = int32(a*(y_screen(eff)-b));
                    t = x_stimulus<=0 | x_stimulus>=video_res_x | y_stimulus<=0 | y_stimulus>=video_res_y;
                    all_fixation(y_stimulus(~t), x_stimulus(~t), index) = 1;
                end
            end
        end
    end
    % Note: this copy only fills all_fixation in memory; it never writes the maps to disk.
end
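
For anyone comparing against the released maps, here is a rough Python sketch of the same screen-to-video coordinate mapping performed by the a and b variables above (my own translation under the stated assumptions, not code from the repository; the 1440x900 screen resolution comes from record_mapping.m):

import numpy as np

SCREEN_W, SCREEN_H = 1440, 900  # stimulus display resolution used in record_mapping.m

def screen_to_video(x_screen, y_screen, video_w, video_h):
    # The video is displayed at full screen width and vertically centred, so
    # the mapping is a uniform scale (a) plus a vertical letterbox offset (b).
    a = video_w / SCREEN_W                 # screen -> video scale factor
    b = (SCREEN_H - video_h / a) / 2.0     # letterbox height in screen pixels
    x_vid = np.rint(a * np.asarray(x_screen, dtype=float)).astype(np.int32)
    y_vid = np.rint(a * (np.asarray(y_screen, dtype=float) - b)).astype(np.int32)
    # Drop samples outside the frame (MATLAB indices are 1-based, so subtract
    # one before indexing a 0-based NumPy array).
    valid = (x_vid > 0) & (x_vid < video_w) & (y_vid > 0) & (y_vid < video_h)
    return x_vid[valid], y_vid[valid]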

Port generate_frame.m to Python

import os
import cv2
from tqdm import tqdm

base_dir = '/home/simplew/dataset/sod/DHF1K'

video_dir = os.path.join(base_dir, 'video')

movies = [mov for mov in os.listdir(video_dir) if mov.endswith('.AVI')]
for movie in tqdm(movies):
    # e.g. 601.AVI -> annotation/0601/images
    image_dir = os.path.join(base_dir, 'annotation', '0' + movie[:-4], 'images')
    os.makedirs(image_dir, exist_ok=True)

    # use opencv
    # cap = cv2.VideoCapture(f"{video_dir}/{movie}")
    # numFrames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # for k in range(numFrames):
    #     ret, frame = cap.read()
    #     cv2.imwrite(f"{image_dir}/{k+1:04}.png", frame)
    # cap.release()

    # ffmpeg (Ubuntu); %04d starts at 0001, matching the released frame naming
    command = f'ffmpeg -i "{video_dir}/{movie}" "{image_dir}/%04d.png"'
    os.system(command)

Dataset license

Hi,

Thanks for the great work. Could you please provide a license for the dataset?

question about testing AUC-shuffled

When using the evaluation code in this package, the shuffled AUC score is much lower than the one reported in the paper on the UCF dataset. I was wondering whether there is anything wrong with the evaluation code, or whether I missed some important details.
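
For reference, shuffled AUC treats the saliency values at the true fixations as positives and the saliency values at fixation locations borrowed from other frames or videos as negatives, so the score is quite sensitive to how those "other" fixations are sampled. A rough NumPy sketch of the idea (my own approximation, not the repository's implementation; the number of splits and the threshold step are assumptions):

import numpy as np

def auc_shuffled(sal_map, fix_points, other_fix_points, n_splits=100, step=0.1, seed=0):
    # fix_points / other_fix_points: (N, 2) integer arrays of (row, col) fixation pixels.
    rng = np.random.default_rng(seed)
    sal = (sal_map - sal_map.min()) / (sal_map.max() - sal_map.min() + 1e-12)
    pos = sal[fix_points[:, 0], fix_points[:, 1]]            # saliency at true fixations
    thresholds = np.arange(0.0, 1.0 + step, step)
    aucs = []
    for _ in range(n_splits):
        idx = rng.integers(0, len(other_fix_points), size=len(pos))
        neg = sal[other_fix_points[idx, 0], other_fix_points[idx, 1]]
        tpr = np.array([(pos >= t).mean() for t in thresholds])
        fpr = np.array([(neg >= t).mean() for t in thresholds])
        aucs.append(np.trapz(tpr[::-1], fpr[::-1]))          # area under the ROC curve
    return float(np.mean(aucs))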

How are ground truth saliency maps generated from recorded fixations?

In your data collection you gather a set of discrete fixation maps (P in the paper). From these, continuous saliency maps (Q in the paper) are generated. I found no details about how this is done; could you elaborate? I would guess that it involves Gaussians centered on the fixation locations. I am interested in the exact parameters, how you combine fixations from different test subjects, and so on.

Thanks again for providing the dataset!
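
For reference, continuous maps are usually obtained by blurring the binary fixation map with a Gaussian. A minimal sketch of that step is below; the kernel width is an assumption, not the paper's parameter, and the combination across subjects is a simple union of fixated pixels:

import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixation_map, sigma_px=30.0):
    # fixation_map: binary map P with the fixations of all subjects OR-ed together.
    # sigma_px is an assumed kernel width, often chosen to approximate roughly
    # one degree of visual angle for the recording setup.
    sal = gaussian_filter(fixation_map.astype(np.float64), sigma=sigma_px)
    if sal.max() > 0:
        sal /= sal.max()   # normalise to [0, 1] to get the continuous map Q
    return sal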

Attributes for first 700 videos

Many thanks for your great work!
As far as I can see, DHF1k_attribute.xlsx only provides data for the 300 test videos. Could you also provide this kind of attribute data for the first 700 videos?
That would save me a lot of work and would be highly appreciated!

Is the audio presented to the viewer during fixation collection?

Hi, thanks for collecting such a valuable dataset!
Several things I want to clarify with you:

  1. I noticed the videos come with audio. Could the viewers hear it during the data collection?
  2. As for the 1000 video clips, are they the complete clips you downloaded directly from YouTube, or did you randomly cut some of them from the raw videos?

Many thanks!

UCF download link?

You mentioned in a previous issue:

Hi, all, the data of Hollywood-2 and UCF have been uploaded.

The code (ACLNet) and dataset (DHF1K with raw gaze records; UCF-sports is newly added!) can be downloaded from:

Google disk:https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ

Baidu pan: https://pan.baidu.com/s/110NIlwRIiEOTyqRwYdDnVg

The Hollywood-2 (74.6G) can be downloaded from:

Google disk:https://drive.google.com/open?id=1vfRKJloNSIczYEOVjB4zMK8r0k4VJuWk

Originally posted by @wenguanwang in #2 (comment)

Is there also a link for the UCF-sports dataset?

Regarding saliency metric (especially CC)

Hi,

First of all, thank you for providing such a nice dataset and code for the evaluation metrics. I would like to evaluate my saliency results using the five metrics discussed in the paper.

I found that the linear correlation coefficient can take both positive and negative values in [-1, 1]. So, when you report the CC score, did you take the absolute value of each per-frame CC before averaging over frames and clips? Your MATLAB code does not do that, although doing so would make sense to me.

Look forward to your reply.
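
For reference, the usual signed definition of CC is sketched below; as noted above, the released MATLAB metric keeps the sign rather than taking an absolute value:

import numpy as np

def cc(saliency, ground_truth, eps=1e-12):
    # Linear correlation coefficient between a predicted saliency map and a
    # continuous ground-truth map: signed, in [-1, 1], no absolute value.
    s = (saliency - saliency.mean()) / (saliency.std() + eps)
    g = (ground_truth - ground_truth.mean()) / (ground_truth.std() + eps)
    return float((s * g).mean())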

questions about the paths and files

Hi, thank you for your dataset and the source code. I want to replicate this work with your code, but I am confused about the paths in config.py and about what kind of data is used to train the model. In your paper, Revisiting Video Saliency: A Large-scale Benchmark and a New Model, you said that you used the static dataset SALICON to train the attention module, and in your code there are several paths. Could you tell me:

  • which paths are for the video dataset and which are for SALICON? Do you mean that frames_path holds the frames extracted from the videos, and imgs_path holds the SALICON data?
  • do I need to extract all the frames from the videos myself?

The related code is as follows:

# path of training videos
videos_train_paths = ['D:/code/attention/DHF1K/training/']
# path of validation videos
videos_val_paths = ['D:/code/attention/DHF1K/val/']
videos_test_path = 'D:/code/attention/DHF1K/testing/'

# path of training maps
maps_path = '/maps/'
# path of training fixation maps
fixs_path = '/fixation/maps/'

frames_path = '/images/'

# path of training images
imgs_path = 'D:/code/attention/staticimages/training/'

Thank you.
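
For what it is worth, here is how I read those relative sub-paths; this layout is my assumption based on the config values above, not something confirmed by the authors:

# Assumed layout: the relative sub-paths from config.py hang off each
# DHF1K training-video folder, while imgs_path points at static SALICON images.
videos_train_path = 'D:/code/attention/DHF1K/training/'   # from config.py
video_id = '0001'                                         # hypothetical clip folder

frames_dir = videos_train_path + video_id + '/images/'          # frames_path
maps_dir   = videos_train_path + video_id + '/maps/'            # maps_path
fixs_dir   = videos_train_path + video_id + '/fixation/maps/'   # fixs_path

print(frames_dir, maps_dir, fixs_dir)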

package version

Which versions of the Python packages did you use for ACLNet (the attentive CNN-LSTM network)? Thanks a lot.

Absent annotations

Thank you for providing the dataset.
After unarchiving the annotations, I found data only for the first 700 videos. Is this intended?

The loss is nan.

Hi, I'm really interested in your work. I used your training code ('ACL_full') to train on my data, but during training the loss always becomes NaN after several iterations:
53/100 [==============>...............] - ETA: 59s - loss: nan - time_distributed_15_loss: nan - time_distributed_16_loss: nan

I have tried base learning rates from 1e-4 down to 1e-12, but the result is the same.

Do you know of any possible solutions?

And what does the 'imgs_path' ('staticimages') in config.py mean?

Thanks very much!
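
Not an official answer, but a frequent cause of NaN in saliency losses that contain a KL-divergence term is taking the log of zero when a predicted map vanishes. A generic, numerically stabilised Keras sketch of such a term is below; this is only an illustration under that assumption, not ACLNet's exact loss:

from keras import backend as K

def kld_stable(y_true, y_pred, eps=1e-7):
    # Normalise both maps to sum to 1 and keep every log argument strictly
    # positive; an exact zero in y_pred otherwise drives the loss to NaN.
    axes = list(range(1, K.ndim(y_pred)))              # all axes except the batch axis
    y_true = y_true / (K.sum(y_true, axis=axes, keepdims=True) + eps)
    y_pred = y_pred / (K.sum(y_pred, axis=axes, keepdims=True) + eps)
    return K.sum(y_true * K.log(y_true / (y_pred + eps) + eps), axis=axes)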

About ACL.h5

Hi! Many thanks for your great work!

The ACL.h5 file could not be opened when running the program.
Is it possible that this file is corrupt?
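
A quick way to check whether the download itself is truncated (assuming h5py is installed; the filename is whatever you saved the weights as):

import h5py

try:
    with h5py.File('ACL.h5', 'r') as f:               # path to the downloaded weights
        print('Top-level groups:', list(f.keys()))
except OSError as err:
    print('File looks truncated or corrupt; try re-downloading it:', err)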
