kaiyangzhou / pytorch-vsumm-reinforce
Unsupervised video summarization with deep reinforcement learning (AAAI'18)
License: MIT License
I tried to get change points using the KTS code,
but I couldn't get proper change points.
If anyone has obtained change points using KTS, could you please help me?
Hi @KaiyangZhou,
1/ What's the value of m? (m = Bernoulli(probs))
2/ What is the value of the probability Pt when the action At equals 1?
I have to use the TVSum50 dataset for a video summarization task. The original videos have a frame rate of 30 fps, and each frame is assigned an importance score from 1 to 5. I have to downsample the videos to 3 fps, but I don't understand how that will affect the importance scores. Can anyone help me here, please?
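One way to think about the effect on the labels, as a minimal sketch: if the video goes from 30 fps to 3 fps by keeping every 10th frame, the per-frame importance scores can be subsampled with the same stride, or averaged over each 10-frame window. The function name `downsample_scores` and its parameters are hypothetical; they are not part of this repository or of the TVSum tooling.

```python
def downsample_scores(scores, src_fps=30, dst_fps=3, reduce="pick"):
    """Align per-frame scores with a temporally downsampled video.

    Hypothetical helper: assumes the video was downsampled by keeping
    every (src_fps // dst_fps)-th frame, starting from frame 0.
    """
    if src_fps % dst_fps != 0:
        raise ValueError("src_fps must be a multiple of dst_fps")
    step = src_fps // dst_fps
    if reduce == "pick":
        # Keep the score of each frame that survives downsampling.
        return list(scores[::step])
    if reduce == "mean":
        # Average the scores of each window of `step` original frames.
        windows = [scores[i:i + step] for i in range(0, len(scores), step)]
        return [sum(w) / len(w) for w in windows]
    raise ValueError("reduce must be 'pick' or 'mean'")


# 20 frames at 30 fps: ten frames scored 1 followed by ten frames scored 5.
frame_scores = [1] * 10 + [5] * 10
picked = downsample_scores(frame_scores, reduce="pick")
averaged = downsample_scores(frame_scores, reduce="mean")
```

With either option the score range (1 to 5) is unchanged; only the number of scored frames drops by a factor of 10.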
My output summary videos are a bit fast. Because of this, they skip lots of important parts of the input video. I want to extend the length of my summary video so that it includes the appropriate important information. Thanks in advance.
The paper describes a stopping criterion: "For all our models, we stop training after K consecutive epochs with descending summarization F-score on the validation set. We set K = 5". However, I cannot find any trace of this strategy in your code. Moreover, I don't see a validation set in your code. Can you explain?
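For reference, the quoted criterion could be sketched as follows. This is only an illustration of the rule as worded in the quote, not code from the repository; `evaluate_epoch` is a hypothetical callback that trains for one epoch and returns the validation F-score.

```python
def train_with_early_stopping(evaluate_epoch, max_epochs=60, patience=5):
    """Stop after `patience` consecutive epochs of decreasing F-score.

    evaluate_epoch(epoch) is a hypothetical callback returning the
    validation F-score measured after training that epoch.
    """
    history = []
    previous = None
    declines = 0  # number of consecutive epochs with a lower F-score
    for epoch in range(max_epochs):
        f_score = evaluate_epoch(epoch)
        history.append(f_score)
        if previous is not None and f_score < previous:
            declines += 1
        else:
            declines = 0
        previous = f_score
        if declines >= patience:
            break
    return len(history), history


# Toy validation curve: rises once, then falls for five epochs in a row.
curve = [0.30, 0.40, 0.39, 0.38, 0.37, 0.36, 0.35, 0.50]
epochs_run, seen = train_with_early_stopping(lambda e: curve[e], max_epochs=len(curve))
```

In the toy run, training stops after 7 epochs, before the last value of the curve is ever evaluated.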
Hi @KaiyangZhou,
I wanted to know how I can find the following features to generate a summary for a custom video:
Please let me know!
I use Python 3 and PyTorch 1.4, so I changed the print statements and range calls accordingly and then ran the project directly. The results are unstable: on the test data, the score varies a lot from one test video to another, and this happens with different train/test combinations. Is this normal? Did I miss something, or is it a version problem or something else?
I think the 5-fold cross validation (5FCV) may have a problem: it is not standard K-fold cross validation.
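For comparison, a standard K-fold split partitions the videos so that every video appears in exactly one test fold. The sketch below is an assumption about what "standard" means here, not the repository's split script; the dictionary keys `train_keys` and `test_keys` and the video names are illustrative.

```python
import random


def kfold_splits(keys, k=5, seed=0):
    """Standard K-fold: each key lands in exactly one test fold."""
    keys = list(keys)
    random.Random(seed).shuffle(keys)
    # Deal the shuffled keys into k disjoint folds.
    folds = [keys[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [key for j, fold in enumerate(folds) if j != i for key in fold]
        splits.append({"train_keys": train, "test_keys": test})
    return splits


video_keys = ["video_%d" % i for i in range(1, 26)]
splits = kfold_splits(video_keys, k=5)
all_test = [key for split in splits for key in split["test_keys"]]
```

If each split were instead drawn independently at random, some videos could appear in several test sets and others in none, which is the difference the issue points at.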
In the paper, the approach is described as fully unsupervised. But I don't understand why gt_score (ground truth score) is used in your dataset. As far as I have learnt, gt_score (human annotation) is used for supervised approaches.
I have trained and tested with your dataset h5 files, and now I just want to test my own video file, like 'my_video.mp4'.
How can I transform it into an h5 file so that I can use your command "python main.py -d datasets/my_own_video.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results"?
In the paper, you also describe and analyze a supervised version of the framework. Will the code for the supervised version also be made available?
Thank you.
Hello,
I want to create both supervised and unsupervised models on my own. However, I couldn't see any options or find the related parts of the code to switch between them. Could you please point me to the relevant places in the code if possible? Or do you plan to release that part?
Thanks.
How do I extract key_frame from the dataset?
When I try to run the "Visualize summary" step, I run into a problem: I can't extract key_frame from the dataset. How should I do this step?
@KaiyangZhou @yrwangxd
When will if num_picks == 0: in reward.py be executed? It seems impossible for it to equal 0 because of this code: num_picks = len(pick_idxs) if pick_idxs.ndimension() > 0 else 1.
@KaiyangZhou @yrwangxd @SinDongHwan
Thanks for your excellent work~
Could you please provide the pre-trained model, which can be helpful for other new datasets?
Why do we downsample the videos in the dataset to 2 fps, i.e. choose every 15th frame from each video?
pytorch-vsumm-reinforce/main.py
Line 164 in fdd03be
Why does TVSum use avg but SumMe use max?
Thank you very much.
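To make the question concrete, here is a small sketch of what the two options compute when a machine summary is compared against several user summaries. The function names are hypothetical and the summaries are toy binary vectors; this does not explain why each dataset uses its setting, only what the setting changes.

```python
def f_score(machine, user):
    """F-score between two binary frame-selection vectors."""
    overlap = sum(1 for m, u in zip(machine, user) if m and u)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(machine)
    recall = overlap / sum(user)
    return 2 * precision * recall / (precision + recall)


def evaluate(machine, user_summaries, metric="avg"):
    """Aggregate per-user F-scores by mean ('avg') or best match ('max')."""
    scores = [f_score(machine, user) for user in user_summaries]
    if metric == "avg":
        return sum(scores) / len(scores)
    if metric == "max":
        return max(scores)
    raise ValueError("metric must be 'avg' or 'max'")


machine_summary = [1, 1, 0, 0]
users = [[1, 1, 0, 0], [0, 0, 1, 1]]  # two annotators who disagree completely
avg_score = evaluate(machine_summary, users, metric="avg")
max_score = evaluate(machine_summary, users, metric="max")
```

With annotators who disagree, max rewards matching any one of them, while avg rewards agreeing with the group as a whole.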
While summarizing videos from the given datasets, I cannot see any folder where frames are saved, and it throws an error:
error: C:\ci\opencv_1512688052760\work\modules\imgproc\src\resize.cpp:3289: error: (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize
The network is not able to pick the frames because the folder does not exist, and it throws this error. Can you please help?
Hi @KaiyangZhou
What is the relation between the 0/1 knapsack problem and the summary video?
I saw that the 0/1 knapsack problem is used in the evaluation step of the implementation. Why? I would like a detailed explanation of this.
Thank you in advance.
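As a rough illustration of the connection: a summary has a length budget (15% of the video length is a common choice in this line of work), each shot has a length (its weight) and an importance score (its value), and picking the set of shots with the highest total score that still fits the budget is a 0/1 knapsack problem. The sketch below uses a textbook dynamic program; the function name and the toy numbers are assumptions, not the repository's code.

```python
def knapsack_select(values, weights, capacity):
    """Return indices of items maximizing total value within capacity."""
    n = len(values)
    # best[i][c] = best total value using the first i items with budget c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        value, weight = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if weight <= c and best[i - 1][c - weight] + value > best[i][c]:
                best[i][c] = best[i - 1][c - weight] + value  # option 2: take it
    # Walk back through the table to recover which items were taken.
    picked, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            picked.append(i - 1)
            c -= weights[i - 1]
    return sorted(picked)


shot_scores = [0.9, 0.2, 0.6, 0.4, 0.1]   # toy per-shot importance
shot_lengths = [50, 30, 40, 20, 260]      # toy shot lengths in frames
budget = sum(shot_lengths) * 15 // 100    # 15% of 400 frames = 60 frames
selected = knapsack_select(shot_scores, shot_lengths, budget)
```

Here the single best shot (index 0, 50 frames) is not selected, because shots 2 and 3 together fit the 60-frame budget and have a higher combined score.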
According to the PyTorch documentation, the input of an LSTM is (seq_len, batch, dim),
but in main.py the input of the LSTM is (batch, seq_len, dim).
I guess there is a mistake.
I'm getting an unexpected error while training the model. Can somebody please help resolve this?
Traceback (most recent call last):
File "main.py", line 205, in <module>
main()
File "main.py", line 121, in main
probs = model(seq) # output shape (1, seq_len, 1)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/root/models.py", line 19, in forward
h, _ = self.rnn(x)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 175, in forward
hx = torch.autograd.Variable(input.data.new(self.num_layers *
File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 407, in data
raise RuntimeError('cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?')
RuntimeError: cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?
When trying to use 'resume' functionality, you get an error:
Variable 'start_epoch' referenced before assignment
This comes from line 97 in main.py, where start_epoch is defined only if the --resume option is not used.
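A minimal sketch of the usual fix for this kind of error is to bind the name before the branch, so it exists whether or not a checkpoint is loaded. The helper `resolve_start_epoch`, the `load_checkpoint` callback, and the `"epoch"` field are all hypothetical; the repository's checkpoints may not store an epoch at all.

```python
def resolve_start_epoch(resume_path, load_checkpoint):
    """Return the epoch to start from, with a default that is always bound.

    load_checkpoint is a hypothetical callback that returns a dict-like
    checkpoint; the 'epoch' field is an assumption for illustration.
    """
    start_epoch = 0  # defined up front, so the name exists on every path
    if resume_path:
        checkpoint = load_checkpoint(resume_path)
        # Fall back to 0 when the checkpoint does not record an epoch.
        start_epoch = checkpoint.get("epoch", 0)
    return start_epoch


fresh = resolve_start_epoch("", None)
resumed = resolve_start_epoch("model.pth.tar", lambda path: {"epoch": 7})
no_epoch = resolve_start_epoch("model.pth.tar", lambda path: {})
```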
wget http://www.eecs.qmul.ac.uk/~kz303/vsumm-reinforce/datasets.tar.gz returns 404 Not Found. Can you provide the file?
Hello, I cannot open your link to get the dataset. Can you send the dataset to my email?
My email: [email protected]
I want to use my own dataset, but it has no features or labels, only some video clips. How do I extract features before constructing the h5 file, and what should I do with user_summary, gts_score and gtsummary when constructing the h5 file?
I tried to use "https://github.com/SinDongHwan/pytorch-vsumm-reinforce/blob/master/utils/generate_dataset.py" to extract features from another dataset with Python 2.7 (as recommended). But cv2 attributes like cv2.cv.CV_CAP_PROP_FPS do not work there, so the frames are not extracted. When I tried Python 3.6, I could extract the frames, but the weave library (which is needed to find change points) is not supported, and I could not find a replacement for it in 3.6. Please advise if you have created an H5 file from your own dataset.
Given a video, not from one of the training datasets, how can I apply the model to it?
In the paper, you mention how to train the DSNsup. But there is so little information about how to train the DR-DSNsup, can you specify it?
Thank you.
Does anyone have a link for the original videos?
I have trained the model on my custom dataset for 1000 epochs. But I noticed that the value for every epoch is close to 0.49 and hardly varies across the 1000 epochs. For example, the first epoch gives 0.49 and the 1000th epoch gives 0.4998, so there is no change or improvement. Thanks in advance.
I used a large feature matrix of shape (11585, 1000) as input. I want to generate the result.h5 file by getting scores from the model. I also set the default dimension parameter to 1000 to match my feature dimension. An error occurs at h, _ = self.rnn(x) in the model file: a CUDNN_STATUS_BAD_PARAM error is raised. How can I overcome that?
Can someone please help me find the dataset. The given link isn't working.
Hello, I really appreciate your work. I have run your code, but I found that the probs from the last fc layer are always close to each other. For example, they are always around 0.5. Why don't they approach 0 or 1? I look forward to your reply!
as the title, can I control the ratio?
Can anyone please tell me how are the frames being saved to .jpg images?
when I run the command:
python summary2video.py -p log/summe-split0/result.h5 -d video_frames/ -i 0 --fps 30 --save-dir log --save-name summary.mp4
this occurred:
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
Traceback (most recent call last):
File "summary2video.py", line 43, in <module>
frm2video(args.frm_dir, summary, vid_writer)
File "summary2video.py", line 27, in frm2video
frm = cv2.resize(frm, (args.width, args.height))
cv2.error: OpenCV(3.4.3) /io/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
I generated the video frames with ffmpeg, and the frame file names follow the format the code requires. Why does this error occur? Can you help me?
The model does not converge on other datasets. Do you have any advice?
My settings are epochs=60, backbone=resnet50, lr=0.00001.
Hi, as the title says, there's a key called user_summary in the dataset eccv16_dataset_tvsum_google_pool5.h5.
I am wondering how to convert the 20 annotations originally provided in TVSum into those 20 binary vectors.
Thank you.
I have trained and tested the model but am not able to visualize the result. In summary2video.py a TypeError pops up: TypeError: 'KeysViewHDF5' object is not subscriptable.
I have a folder named Videoframes containing several jpg images labelled 000001.jpg and so on.
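This error is typical of Python 3, where keys() returns a view object rather than a list, and h5py follows the same convention as dict. The sketch below uses a plain dict as a stand-in for an open h5py file so that it runs without the result file; the helper `key_at` and the toy data are hypothetical.

```python
def key_at(container, index):
    """Return the index-th key of a dict-like object.

    Converting the keys view to a list first makes it subscriptable.
    """
    return list(container.keys())[index]


# Plain dict used as a stand-in for an open h5py.File (an assumption made
# so the example is self-contained).
results = {"video_1": [0, 1, 1], "video_2": [1, 0, 0]}

try:
    results.keys()[0]  # the failing pattern: indexing a keys view directly
    error_message = None
except TypeError as exc:
    error_message = str(exc)

first_key = key_at(results, 0)
```

Applied to the script, the same idea would be to wrap the keys() call in list(...) before indexing into it.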
Please explain how to create the user summary and save it in the h5 file. I am able to create an h5 file for my own video dataset, but I have no idea how to fill the user_summary key. Please help.
How can I build the supervised version of DR-DSN? Can anyone provide a code implementation of it? Thanks in advance.
I tried a lot of methods to calculate xcorr, but I can't get the correct result. Can you tell me?
Looks like this implementation is yielding worse results compared to the paper and the theano implementation. What could be the reason? and any ideas on how to fix this?
Thank You!
Hi,
According to most PyTorch implementations of the REINFORCE algorithm, the policy gradient loss should sum the log_probs over the trajectory (sum over t=1...T) instead of computing the mean. In the paper, this is correctly summed in equations 8/9/10. The only mean is over the N episodes. I believe this is a mistake in the code only.
pytorch-vsumm-reinforce/main.py
Line 131 in fdd03be
Should be
expected_reward = log_probs.sum() * (reward - baselines[key])
My assumption is that the authors wanted to average instead of sum because videos have different lengths.
Please, tell me if I am wrong. Thanks!
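To make the difference concrete, here is a small sketch in plain Python (no autograd) of the two reductions. The function `surrogate` and the numbers are illustrative assumptions. Since the mean is just the sum divided by T, the two versions differ by a factor of 1/T for each video, which changes how much long videos contribute relative to short ones.

```python
def surrogate(log_probs, reward, baseline, reduce="sum"):
    """REINFORCE surrogate term for one episode of T actions."""
    advantage = reward - baseline
    total = sum(log_probs)
    if reduce == "mean":
        total = total / len(log_probs)  # same as the sum scaled by 1/T
    elif reduce != "sum":
        raise ValueError("reduce must be 'sum' or 'mean'")
    return total * advantage


episode_log_probs = [-0.5, -1.0, -1.5, -1.0]  # toy log-probabilities, T = 4
summed = surrogate(episode_log_probs, reward=1.0, baseline=0.5, reduce="sum")
averaged = surrogate(episode_log_probs, reward=1.0, baseline=0.5, reduce="mean")
```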
Hi ,
Can you shed some light on this?
Can the author give us the code for, or a link about, feature extraction?
When Bernoulli sampling returns zero or one frame, training crashes.
First, if Bernoulli sampling selects exactly one frame, this error occurs: TypeError: len() of a 0-d tensor. This happens because line 17 in rewards.py, pick_idxs = _actions.squeeze().nonzero().squeeze(), returns a tensor of dimension 0, so when len is called on line 18, num_picks = len(pick_idxs), the error is thrown.
Example to reproduce the error:
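import torch
from torch.distributions import Bernoulli
# Probabilities chosen so that exactly one frame is always sampled
m = Bernoulli(torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0]))
actions = m.sample()
# nonzero() gives shape (1, 1); the final squeeze() collapses it to a 0-d tensor
pick_idxs = actions.squeeze().nonzero().squeeze()
print(len(pick_idxs))  # raises TypeError: len() of a 0-d tensor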
Second, when zero or one frame is selected, the return value of the compute_reward function should be a 0-d (scalar) tensor; otherwise line 132 in main.py will produce a size mismatch error. So reward = torch.tensor([0.]) on lines 22 and 31 should be replaced with reward = torch.tensor(0.).
Can somebody confirm these bugs? (So I can maybe commit the fix?)
pytorch version: 0.4.0
python version: 2.7.12