wenguanwang / dhf1k

Revisiting Video Saliency: A Large-scale Benchmark and a New Model (CVPR18, PAMI19)

MATLAB 100.00%
saliency attention-mechanism salient-object-detection fixation saliency-prediction visual-attention cvpr2018 cvpr cvpr18

dhf1k's People

Contributors

wenguanwang


dhf1k's Issues

Annotations for only 26 videos

After downloading the dataset, the annotation folder has annotations for only 26 videos. How can I get the annotations for the remaining videos?

Testing setting of Hollywood2 dataset

Did you use all of the fixation points when training and testing on the Hollywood2 dataset?
Or did you filter out some points (e.g., points at the image edges)?
Also, did you use all 884 test videos when testing on Hollywood2?
How did you sync the fixation points with the video?

I am asking because I want to replicate your results on the Hollywood2 dataset.
Could you provide more detailed information about your setup?
(I divided the videos using shot boundaries, as you mentioned.)

Discrepancy between exportdata_train and DHF1K fixation maps?

Hi, thanks for the nice dataset.
I want to recreate the fixation maps using the raw gaze records in the exportdata_train folder released for DHF1K.

However, the fixation maps obtained using the record_mapping.m script and the raw data from the exportdata_train folder do not match the ones released with DHF1K.

For example:

  1. 0001.png: the fixation map for the first frame of 001.AVI, copied from annotation/0001/fixation/0001.png.

  2. 0001_regenerated.png: the fixation map I regenerated using the files from the exportdata_train folder.

I used the record_mapping.m file after specifying the appropriate paths and modifying lines 22 and 24.

Could you please help me understand what I might be missing?

For your reference, here is my copy of the record_mapping.m file:

%This script maps the raw fixation records onto the corresponding fixation maps.
screen_res_x = 1440;
screen_res_y = 900;

parent_dir = 'GIVE PATH TO PARENT DIRECTORY';

datasetFile1 = 'movie';   % unused in this copy
datasetFile = 'video';
gazeFile = 'exportdata_train';

videoFiles = dir(fullfile('./', datasetFile));
videoNUM = length(videoFiles)-2;   % skip '.' and '..'
rate = 30;                         % video frame rate (fps)

full_vid_dir = [parent_dir, datasetFile, '/'];

for videonum = 1:700
    videofolder = videoFiles(videonum+2).name
    vidObj = VideoReader([full_vid_dir, videofolder]);
    options.infolder = fullfile('./', datasetFile, videofolder, 'images');
    % No need to read the full video: VideoReader already provides the
    % dimensions and the number of frames.
    % Cache all frames in memory
    %[data.frames,names,video_res_y,video_res_x,nframe] = readAllFrames(options);
    nframe = vidObj.NumberOfFrames;
    video_res_x = vidObj.Width;
    video_res_y = vidObj.Height;
    a = video_res_x/screen_res_x;              % screen -> video scale (video shown at full screen width)
    b = (screen_res_y - video_res_y/a)/2;      % vertical letterbox offset in screen pixels
    all_fixation = zeros(video_res_y, video_res_x, nframe);
    for person = 1:17
        % modified the following line to match the video naming format
        txtloc = fullfile(parent_dir, gazeFile, sprintf('P%02d',person), [sprintf('P%02d_Trail',person), sprintf('%03d.txt',videonum)]);
        if exist(txtloc, 'file')
            % modified the following line to match the txt file format
            [time,model,trialnum,diax,diay,x_screen,y_screen,event] = textread(txtloc,'%f%s%f%f%f%f%f%s','headerlines',1);
            if size(time,1)
                time = time - time(1);
                event = cellfun(@(x) x(1), event);
                for index = 1:nframe
                    % keep fixation samples ('F') whose timestamp falls within frame 'index'
                    eff = find(((index-1) < rate*time/1000000) & (rate*time/1000000 < index) & event=='F');
                    x_stimulus = int32(a*x_screen(eff));
                    y_stimulus = int32(a*(y_screen(eff)-b));
                    t = x_stimulus<=0 | x_stimulus>=video_res_x | y_stimulus<=0 | y_stimulus>=video_res_y;
                    all_fixation(y_stimulus(~t), x_stimulus(~t), index) = 1;
                end
            end
        end
    end
    % Note: this copy only fills all_fixation in memory; it never writes the maps to disk.
end
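
For anyone comparing against the released maps, here is a rough Python sketch of the same screen-to-video coordinate mapping performed by the a and b variables above (my own translation under the stated assumptions, not code from the repository; the 1440x900 screen resolution comes from record_mapping.m):

import numpy as np

SCREEN_W, SCREEN_H = 1440, 900  # stimulus display resolution used in record_mapping.m

def screen_to_video(x_screen, y_screen, video_w, video_h):
    # The video is displayed at full screen width and vertically centred, so
    # the mapping is a uniform scale (a) plus a vertical letterbox offset (b).
    a = video_w / SCREEN_W                 # screen -> video scale factor
    b = (SCREEN_H - video_h / a) / 2.0     # letterbox height in screen pixels
    x_vid = np.rint(a * np.asarray(x_screen, dtype=float)).astype(np.int32)
    y_vid = np.rint(a * (np.asarray(y_screen, dtype=float) - b)).astype(np.int32)
    # Drop samples outside the frame (MATLAB indices are 1-based, so subtract
    # one before indexing a 0-based NumPy array).
    valid = (x_vid > 0) & (x_vid < video_w) & (y_vid > 0) & (y_vid < video_h)
    return x_vid[valid], y_vid[valid]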

Port generate_frame.m to Python

import os
import cv2
from tqdm import tqdm

base_dir = '/home/simplew/dataset/sod/DHF1K'

video_dir = os.path.join(base_dir, 'video')

movies = [mov for mov in os.listdir(video_dir) if mov.endswith('.AVI')]
for movie in tqdm(movies):
    # e.g. 601.AVI -> annotation/0601/images
    image_dir = os.path.join(base_dir, 'annotation', '0' + movie[:-4], 'images')
    os.makedirs(image_dir, exist_ok=True)

    # use opencv
    # cap = cv2.VideoCapture(f"{video_dir}/{movie}")
    # numFrames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # for k in range(numFrames):
    #     ret, frame = cap.read()
    #     cv2.imwrite(f"{image_dir}/{k+1:04}.png", frame)
    # cap.release()

    # ffmpeg (Ubuntu); %04d starts at 0001, matching the released frame naming
    command = f'ffmpeg -i "{video_dir}/{movie}" "{image_dir}/%04d.png"'
    os.system(command)

Dataset license

Hi,

Thanks for the great work. Could you please provide a license for the dataset?

question about testing AUC-shuffled

When using the evaluation code in this package, the shuffled AUC score is much lower than the one reported in the paper on the UCF dataset. I was wondering whether there is anything wrong with the evaluation code, or whether I missed some important details.
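
For reference, shuffled AUC treats the saliency values at the true fixations as positives and the saliency values at fixation locations borrowed from other frames or videos as negatives, so the score is quite sensitive to how those "other" fixations are sampled. A rough NumPy sketch of the idea (my own approximation, not the repository's implementation; the number of splits and the threshold step are assumptions):

import numpy as np

def auc_shuffled(sal_map, fix_points, other_fix_points, n_splits=100, step=0.1, seed=0):
    # fix_points / other_fix_points: (N, 2) integer arrays of (row, col) fixation pixels.
    rng = np.random.default_rng(seed)
    sal = (sal_map - sal_map.min()) / (sal_map.max() - sal_map.min() + 1e-12)
    pos = sal[fix_points[:, 0], fix_points[:, 1]]            # saliency at true fixations
    thresholds = np.arange(0.0, 1.0 + step, step)
    aucs = []
    for _ in range(n_splits):
        idx = rng.integers(0, len(other_fix_points), size=len(pos))
        neg = sal[other_fix_points[idx, 0], other_fix_points[idx, 1]]
        tpr = np.array([(pos >= t).mean() for t in thresholds])
        fpr = np.array([(neg >= t).mean() for t in thresholds])
        aucs.append(np.trapz(tpr[::-1], fpr[::-1]))          # area under the ROC curve
    return float(np.mean(aucs))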

How are ground truth saliency maps generated from recorded fixations?

In your data collection you gather a set of discrete fixation maps (P in the paper). From these, continuous saliency maps (Q in the paper) are generated. I found no details about how this is done; could you elaborate? I would guess that it involves Gaussians centered on the fixation locations. I am interested in the exact parameters, how you combine fixations from different test subjects, and so on.

Thanks again for providing the dataset!
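
For reference, continuous maps are usually obtained by blurring the binary fixation map with a Gaussian. A minimal sketch of that step is below; the kernel width is an assumption, not the paper's parameter, and the combination across subjects is a simple union of fixated pixels:

import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixation_map, sigma_px=30.0):
    # fixation_map: binary map P with the fixations of all subjects OR-ed together.
    # sigma_px is an assumed kernel width, often chosen to approximate roughly
    # one degree of visual angle for the recording setup.
    sal = gaussian_filter(fixation_map.astype(np.float64), sigma=sigma_px)
    if sal.max() > 0:
        sal /= sal.max()   # normalise to [0, 1] to get the continuous map Q
    return sal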

Attributes for first 700 videos

Many thanks for your great work!
As far as I can see, DHF1k_attribute.xlsx only provides data for the 300 test videos. Could you also provide this kind of attribute data for the first 700 videos?
That would save me a lot of work and would be highly appreciated!

Is the audio presented to the viewer during fixation collection?

Hi, thanks for collecting such a valuable dataset!
Several things I want to clarify with you:

  1. I noticed the videos come with audio. Could the viewers hear it during the data collection?
  2. As for the 1000 video clips, are they the complete clips you downloaded directly from YouTube, or did you randomly cut some of them from the raw videos?

Many thanks!

UCF download link?

You mentioned in a previous issue:

Hi, all, the data of Hollywood-2 and UCF have been uploaded.

The code (ACLNet) and dataset (DHF1K with raw gaze records; UCF-sports is newly added!) can be downloaded from:

Google disk:https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ

Baidu pan: https://pan.baidu.com/s/110NIlwRIiEOTyqRwYdDnVg

The Hollywood-2 (74.6G) can be downloaded from:

Google disk:https://drive.google.com/open?id=1vfRKJloNSIczYEOVjB4zMK8r0k4VJuWk

Originally posted by @wenguanwang in #2 (comment)

Is there also a link for the UCF-sports dataset?

Regarding saliency metric (especially CC)

Hi,

First of all, thank you for providing such a nice dataset and code for the evaluation metrics. I would like to evaluate my saliency results using the five metrics discussed in the paper.

I found that the linear correlation coefficient can take both positive and negative values in [-1, 1]. So, when you report the CC score, did you take the absolute value of each per-frame CC before averaging over frames and clips? Your MATLAB code does not do that, although doing so would make sense to me.

Look forward to your reply.
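
For reference, the usual signed definition of CC is sketched below; as noted above, the released MATLAB metric keeps the sign rather than taking an absolute value:

import numpy as np

def cc(saliency, ground_truth, eps=1e-12):
    # Linear correlation coefficient between a predicted saliency map and a
    # continuous ground-truth map: signed, in [-1, 1], no absolute value.
    s = (saliency - saliency.mean()) / (saliency.std() + eps)
    g = (ground_truth - ground_truth.mean()) / (ground_truth.std() + eps)
    return float((s * g).mean())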

questions about the paths and files

Hi, thank you for your dataset and the source code. I want to replicate this work with your code, but I am confused about the paths in config.py and about what kind of data is used to train the model. In your paper, Revisiting Video Saliency: A Large-scale Benchmark and a New Model, you said that you used the static dataset SALICON to train the attention module, and in your code there are several paths. Could you tell me:

  • which paths are for the video dataset and which are for SALICON? Do you mean that frames_path holds the frames extracted from the videos, and imgs_path holds the SALICON data?
  • do I need to extract all the frames from the videos myself?

The related code is as follows:

# path of training videos
videos_train_paths = ['D:/code/attention/DHF1K/training/']
# path of validation videos
videos_val_paths = ['D:/code/attention/DHF1K/val/']
videos_test_path = 'D:/code/attention/DHF1K/testing/'

# path of training maps
maps_path = '/maps/'
# path of training fixation maps
fixs_path = '/fixation/maps/'

frames_path = '/images/'

# path of training images
imgs_path = 'D:/code/attention/staticimages/training/'

Thank you.
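
For what it is worth, here is how I read those relative sub-paths; this layout is my assumption based on the config values above, not something confirmed by the authors:

# Assumed layout: the relative sub-paths from config.py hang off each
# DHF1K training-video folder, while imgs_path points at static SALICON images.
videos_train_path = 'D:/code/attention/DHF1K/training/'   # from config.py
video_id = '0001'                                         # hypothetical clip folder

frames_dir = videos_train_path + video_id + '/images/'          # frames_path
maps_dir   = videos_train_path + video_id + '/maps/'            # maps_path
fixs_dir   = videos_train_path + video_id + '/fixation/maps/'   # fixs_path

print(frames_dir, maps_dir, fixs_dir)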

package version

Which versions of the Python packages did you use for ACLNet (the attentive CNN-LSTM network)? Thanks a lot.

Absent annotations

Thank you for providing the dataset.
After unarchiving the annotations, I found data only for the first 700 videos. Is this intended?

The loss is nan.

Hi, I'm really interested in your work. I used your training code ('ACL_full') to train on my data, but during training the loss always becomes NaN after several iterations:
53/100 [==============>...............] - ETA: 59s - loss: nan - time_distributed_15_loss: nan - time_distributed_16_loss: nan

I have tried base learning rates from 1e-4 down to 1e-12, but the result is the same.

Do you know of any possible solutions?

And what does the 'imgs_path' ('staticimages') in config.py mean?

Thanks very much!
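
Not an official answer, but a frequent cause of NaN in saliency losses that contain a KL-divergence term is taking the log of zero when a predicted map vanishes. A generic, numerically stabilised Keras sketch of such a term is below; this is only an illustration under that assumption, not ACLNet's exact loss:

from keras import backend as K

def kld_stable(y_true, y_pred, eps=1e-7):
    # Normalise both maps to sum to 1 and keep every log argument strictly
    # positive; an exact zero in y_pred otherwise drives the loss to NaN.
    axes = list(range(1, K.ndim(y_pred)))              # all axes except the batch axis
    y_true = y_true / (K.sum(y_true, axis=axes, keepdims=True) + eps)
    y_pred = y_pred / (K.sum(y_pred, axis=axes, keepdims=True) + eps)
    return K.sum(y_true * K.log(y_true / (y_pred + eps) + eps), axis=axes)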

About ACL.h5

Hi! Many thanks for your great work!

The ACL.h5 file could not be opened when running the program.
Is it possible that this file is corrupt?
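
A quick way to check whether the download itself is truncated (assuming h5py is installed; the filename is whatever you saved the weights as):

import h5py

try:
    with h5py.File('ACL.h5', 'r') as f:               # path to the downloaded weights
        print('Top-level groups:', list(f.keys()))
except OSError as err:
    print('File looks truncated or corrupt; try re-downloading it:', err)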
