wenguanwang / dhf1k
Revisiting Video Saliency: A Large-scale Benchmark and a New Model (CVPR18, PAMI19)
I checked that the fixation maps of DHF1K are blurred with a Gaussian kernel of width 30.
Did you use the same width for the Hollywood2 and UCFSports datasets?
After downloading the dataset, the annotation folder contains annotations for only 26 videos. How can I get the annotations for the remaining videos?
Did you use all of the fixation points when training and testing on the Hollywood2 dataset, or did you filter some out (e.g., points at the image edge)?
Also, did you use all 884 test videos when testing on Hollywood2?
How did you sync the fixation points with the video?
I am asking because I want to replicate your results on the Hollywood2 dataset.
Could you provide more detailed information about your setup?
(I divided the videos using shot boundaries, as you mentioned.)
Hi, thanks for the nice dataset.
I want to recreate the fixation maps using the raw gaze records in the exportdata_train folder released for DHF1K.
However, the fixation maps obtained with the record_mapping.m script and the raw data from the exportdata_train folder do not match the ones released in DHF1K.
For example:
I used the record_mapping.m file after specifying appropriate paths and modifying line 22 and line 24.
Could you please help me understand what I might be missing?
For your reference, here is my copy of record_mapping.m file:
% This script maps the raw fixation records onto the corresponding video
% frames to build per-frame fixation maps.
screen_res_x = 1440;   % resolution of the display used during eye tracking
screen_res_y = 900;
parent_dir = 'GIVE PATH TO PARENT DIRECTORY';
datasetFile1 = 'movie';
datasetFile = 'video';
gazeFile = 'exportdata_train';
videoFiles = dir(fullfile('./', datasetFile));
videoNUM = length(videoFiles) - 2;   % skip '.' and '..'
rate = 30;                           % video frame rate (fps)
full_vid_dir = [parent_dir, datasetFile, '/'];

for videonum = 1:700
    videofolder = videoFiles(videonum+2).name
    vidObj = VideoReader([full_vid_dir, videofolder]);
    options.infolder = fullfile('./', datasetFile, videofolder, 'images');
    % No need to read the full video; VideoReader already gives the
    % dimensions and duration.
    % Cache all frames in memory:
    % [data.frames, names, video_res_y, video_res_x, nframe] = readAllFrames(options);
    nframe = vidObj.NumberOfFrames;
    video_res_x = vidObj.Width;
    video_res_y = vidObj.Height;
    % The video was shown full-width on the screen: a is the screen-to-video
    % scale factor and b is the vertical letterbox offset (in screen pixels).
    a = video_res_x / screen_res_x;
    b = (screen_res_y - video_res_y/a) / 2;
    all_fixation = zeros(video_res_y, video_res_x, nframe);
    for person = 1:17
        % Modified the following line to match the video naming format.
        txtloc = fullfile(parent_dir, gazeFile, sprintf('P%02d', person), ...
            [sprintf('P%02d_Trail', person), sprintf('%03d.txt', videonum)]);
        if exist(txtloc, 'file')
            % Modified the following line to match the txt file format.
            [time, model, trialnum, diax, diay, x_screen, y_screen, event] = ...
                textread(txtloc, '%f%s%f%f%f%f%f%s', 'headerlines', 1);
            if size(time, 1)
                time = time - time(1);             % timestamps relative to the first sample (microseconds)
                event = cellfun(@(x) x(1), event); % keep the event type character ('F' = fixation)
                for index = 1:nframe
                    % Select fixation samples whose timestamps fall within this frame.
                    eff = find(((index-1) < rate*time/1000000) & (rate*time/1000000 < index) & event == 'F');
                    % Map screen coordinates to video coordinates.
                    x_stimulus = int32(a * x_screen(eff));
                    y_stimulus = int32(a * (y_screen(eff) - b));
                    % Discard samples falling outside the video frame.
                    t = x_stimulus<=0 | x_stimulus>=video_res_x | y_stimulus<=0 | y_stimulus>=video_res_y;
                    all_fixation(y_stimulus(~t), x_stimulus(~t), index) = 1;
                end
            end
        end
    end
end
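For anyone cross-checking the geometry outside MATLAB, here is a rough Python sketch of the same screen-to-video mapping (the 1440x900 screen resolution and 30 fps rate come from the script above; the function name and exact rounding are my own, so treat the details as assumptions):

import numpy as np

def map_gaze_to_frames(time_us, x_screen, y_screen, events,
                       video_w, video_h, n_frames,
                       screen_w=1440, screen_h=900, fps=30):
    """Rough Python equivalent of the mapping in record_mapping.m above."""
    fix_maps = np.zeros((n_frames, video_h, video_w), dtype=np.uint8)
    t = np.asarray(time_us, dtype=np.float64)
    t = t - t[0]                                # timestamps relative to the first sample (microseconds)
    a = video_w / screen_w                      # screen -> video scale factor
    b = (screen_h - video_h / a) / 2.0          # vertical letterbox offset in screen pixels
    frame_idx = np.floor(fps * t / 1e6).astype(int)  # frame that each gaze sample falls into
    for xs, ys, ev, fi in zip(x_screen, y_screen, events, frame_idx):
        if ev != 'F' or not (0 <= fi < n_frames):    # keep fixation samples on valid frames only
            continue
        x = int(round(a * xs))
        y = int(round(a * (ys - b)))
        if 0 < x < video_w and 0 < y < video_h:      # drop gaze points outside the video area
            fix_maps[fi, y, x] = 1
    return fix_maps

This is only a sanity-check sketch; the released record_mapping.m remains the reference implementation.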
May I ask whether the training code is available?
import os
import cv2
from tqdm import tqdm

base_dir = '/home/simplew/dataset/sod/DHF1K'
video_dir = os.path.join(base_dir, 'video')
movies = [mov for mov in os.listdir(video_dir) if mov.endswith('.AVI')]

for movie in tqdm(movies):
    image_dir = os.path.join(base_dir, 'annotation', '0' + movie[:-4], 'images')
    os.makedirs(image_dir, exist_ok=True)

    # use opencv
    # cap = cv2.VideoCapture(f"{video_dir}/{movie}")
    # numFrames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # for k in range(numFrames):
    #     ret, frame = cap.read()
    #     cv2.imwrite(f"{image_dir}/{k+1:04}.png", frame)
    # cap.release()

    # ubuntu (ffmpeg)
    command = f"ffmpeg -i {video_dir}/{movie} {image_dir}/%04d.png"
    os.system(command)
Can you provide the pre-trained model?
Hi,
Thanks for the great work. Could you please provide a license for the dataset?
When using the evaluation code in this package, the AUC-shuffled score is much lower than the one reported in the paper on the UCF dataset. I was wondering whether there is anything wrong with the evaluation code, or whether I missed some important details.
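For context, shuffled AUC differs from standard AUC in that the negatives are drawn from fixation locations of other frames/videos rather than sampled uniformly, which penalizes center bias. The following is a generic sketch of the metric (not the evaluation code shipped with this repository); function and parameter names are assumptions:

import numpy as np

def auc_shuffled(saliency, fixations, other_fixations, n_splits=100, seed=0):
    """Generic shuffled-AUC sketch. `fixations` and `other_fixations` are binary
    maps of the same shape as `saliency`; `other_fixations` pools fixation
    locations taken from other frames/videos."""
    rng = np.random.default_rng(seed)
    s = saliency.astype(np.float64)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)   # normalize to [0, 1]
    pos = s[fixations > 0]                            # saliency at true fixation points
    neg_pool = s[other_fixations > 0]                 # saliency at locations fixated elsewhere
    aucs = []
    for _ in range(n_splits):
        neg = rng.choice(neg_pool, size=len(pos), replace=True)
        # AUC = P(positive outranks a shuffled negative), counting ties as 0.5
        auc = np.mean(pos[:, None] > neg[None, :]) + 0.5 * np.mean(pos[:, None] == neg[None, :])
        aucs.append(auc)
    return float(np.mean(aucs))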
In your data collection you gather a set of discrete fixation maps (P in the paper), and from these, continuous saliency maps (Q in the paper) are generated. I found no details about how this is done; could you elaborate? I would guess it involves Gaussians centered on the fixation locations. I am interested in the exact parameters, how you combine fixations from different test subjects, and so on.
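A common way to turn a pooled binary fixation map into a continuous saliency map is to blur it with a Gaussian; the sketch below illustrates the idea, but the exact sigma (and whether it matches the kernel width of 30 mentioned earlier in this page) is an assumption, not the authors' confirmed setting:

import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixation_map, sigma=30):
    """Blur a binary fixation map (fixations pooled over all subjects) into a
    continuous saliency map. The sigma value here is an assumption."""
    sal = gaussian_filter(fixation_map.astype(np.float64), sigma=sigma)
    if sal.max() > 0:
        sal /= sal.max()   # normalize to [0, 1]
    return sal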
Thanks again for providing the dataset!
Many thanks for your great work!
As far as I can see, DHF1k_attribute.xlsx only provides data for the 300 test videos. Could you also provide this kind of attribute data for the first 700 videos?
That would save me a lot of work and would be highly appreciated!
Hi, thanks for collecting such a valuable dataset!
There are several things I would like to clarify with you:
Many thanks!
You mentioned in a previous issue:
Hi, all, the data of Hollywood-2 and UCF have been uploaded.
The code (ACLNet) and dataset (DHF1K with raw gaze records; UCF-sports is newly added!) can be downloaded from:
Google Drive: https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ
Baidu Pan: https://pan.baidu.com/s/110NIlwRIiEOTyqRwYdDnVg
The Hollywood-2 (74.6G) can be downloaded from:
Google Drive: https://drive.google.com/open?id=1vfRKJloNSIczYEOVjB4zMK8r0k4VJuWk
Originally posted by @wenguanwang in #2 (comment)
Is there also a link for the UCF-sports dataset?
Hi,
First of all, thank you for providing such a nice dataset and code for the evaluation metrics. I would like to evaluate my saliency results using the five metrics discussed in the paper.
I found that the linear cross-correlation (CC) can take both positive and negative values in [-1, 1]. So, when you report the CC score, did you take the absolute value of each frame's CC before averaging over frames and clips? Your MATLAB code doesn't do that, but doing so would make sense to me.
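For reference, a generic computation of the CC metric between a predicted saliency map and a ground-truth map (a sketch, not the repository's MATLAB implementation) looks like this:

import numpy as np

def cc_score(pred, gt):
    """Linear correlation coefficient between two saliency maps; lies in [-1, 1]."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)   # z-score the prediction
    g = (gt - gt.mean()) / (gt.std() + 1e-12)         # z-score the ground truth
    return float(np.mean(p * g))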
Look forward to your reply.
Hi, thank you for your dataset and the source code. I want to replicate this work with your code, but I am confused about the paths in config.py and want to know what kinds of data were used to train the model. In your paper, Revisiting Video Saliency: A Large-scale Benchmark and a New Model, you said that you used the static dataset SALICON to train the attention module, and your code contains several paths. Could you tell me:
The related code is as follows:
# path of training videos
videos_train_paths = ['D:/code/attention/DHF1K/training/']
# path of validation videos
videos_val_paths = ['D:/code/attention/DHF1K/val/']
videos_test_path = 'D:/code/attention/DHF1K/testing/'
# path of training maps
maps_path = '/maps/'
# path of training fixation maps
fixs_path = '/fixation/maps/'
frames_path = '/images/'
# path of training images
imgs_path = 'D:/code/attention/staticimages/training/'
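For what it's worth, my reading of those comments is that each training video folder is expected to contain the frames, continuous maps, and binary fixation maps as subfolders. The sketch below is only my guess at how the paths compose; the folder and file names are assumptions, not taken from the code:

import os

# Assumed per-video layout under videos_train_paths (my guess, not confirmed):
#   <training>/<video_id>/images/           -- input frames            (frames_path)
#   <training>/<video_id>/maps/             -- continuous saliency maps (maps_path)
#   <training>/<video_id>/fixation/maps/    -- binary fixation maps     (fixs_path)
video_dir = os.path.join('D:/code/attention/DHF1K/training/', '0001')
frame   = os.path.join(video_dir, 'images', '0001.png')
sal_map = os.path.join(video_dir, 'maps', '0001.png')
fix_map = os.path.join(video_dir, 'fixation', 'maps', '0001.png')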
Thank you.
Which versions of the Python packages did you use for ACLNet (the Attentive CNN-LSTM Network)? Thanks a lot.
Thank you for providing the dataset.
After I unarchived the annotations, there is data only for the first 700 videos. Is this intended?
Hi, I'm really interested in your work, and I used your training code ('ACL_full') to train on my own data. However, during training the loss always becomes NaN after several iterations:
53/100 [==============>...............] - ETA: 59s - loss: nan - time_distributed_15_loss: nan - time_distributed_16_loss: nan
I have tuned the base learning rate from 1e-4 to 1e-12, but the results are the same.
Do you know of any solutions?
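In case it helps others who hit this: one frequent cause of NaN with saliency losses is a log(0) or division by zero when the maps are normalized. Below is a minimal epsilon-stabilized KL-divergence sketch, assuming a Keras backend and 4D (batch, h, w, 1) tensors; this is not the exact ACLNet loss, just an illustration of the guard.

from keras import backend as K

EPS = K.epsilon()  # small constant guarding against log(0) / division by zero

def stable_kl_divergence(y_true, y_pred):
    """KL divergence between ground-truth and predicted saliency maps, each
    normalized to sum to 1 per sample. A sketch, not the exact ACLNet loss."""
    p = y_true / (K.sum(y_true, axis=[1, 2, 3], keepdims=True) + EPS)
    q = y_pred / (K.sum(y_pred, axis=[1, 2, 3], keepdims=True) + EPS)
    return K.sum(p * K.log((p + EPS) / (q + EPS)), axis=[1, 2, 3])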
And what does the 'imgs_path' ('staticimages') in config.py mean?
Thanks very much!
Hi! Many thanks for your great work!
The ACL.h5 file could not be opened when running the program.
Is it possible that this file is corrupt?