kaiyangzhou / pytorch-vsumm-reinforce

Unsupervised video summarization with deep reinforcement learning (AAAI'18)

License: MIT License

Python 99.09% Shell 0.91%

computer-vision deep-learning machine-learning policy-network reinforcement-learning unsupervised-learning video-summarization

pytorch-vsumm-reinforce's Issues

The model does not converge on other datasets.

The model does not converge on other datasets. Do you have any advice?
My settings are epochs=60, backbone=resnet50, lr=0.00001.

Resume parameter doesn't work

When trying to use the 'resume' functionality, you get an error:

Variable 'start_epoch' referenced before assignment

This comes from line 97 in main.py: start_epoch is defined only if the --resume option is not used.
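
One way to avoid this class of error is to give start_epoch a default before the resume branch. The following is only a minimal sketch, not the repository's code: the helper `resolve_start_epoch` and the `"epoch"` key in the checkpoint are hypothetical, and the actual checkpoint may not store an epoch at all.

```python
def resolve_start_epoch(checkpoint=None):
    """Return the epoch to start from.

    `checkpoint` is a hypothetical dict loaded from a saved model file.
    """
    start_epoch = 0  # always defined, whether or not we resume
    if checkpoint is not None:
        # Assumption: the checkpoint may store the last finished epoch under
        # an "epoch" key; if it does not, training restarts from epoch 0.
        start_epoch = checkpoint.get("epoch", 0)
    return start_epoch


print(resolve_start_epoch())                # fresh run
print(resolve_start_epoch({"epoch": 30}))   # resumed run
```

The point is only that the variable is assigned on every code path before it is read.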

Evaluation metric

pytorch-vsumm-reinforce/main.py

Line 164 in fdd03be

eval_metric = 'avg' if args.metric == 'tvsum' else 'max'

Why does tvsum use avg while summe uses max?

Thank you very much.
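
For readers unfamiliar with the two settings, the sketch below shows what an 'avg' and a 'max' aggregation over several user summaries would compute. It is an illustration under assumptions, not the repository's evaluation code: `f_score` and `evaluate` are hypothetical helpers, and summaries are assumed to be binary per-frame lists. It does not explain why each dataset uses its metric.

```python
def f_score(machine, user):
    """F-score between two binary per-frame summaries of equal length."""
    overlap = sum(m and u for m, u in zip(machine, user))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(machine)
    recall = overlap / sum(user)
    return 2 * precision * recall / (precision + recall)


def evaluate(machine, user_summaries, eval_metric="avg"):
    """Aggregate the F-scores against every user summary."""
    scores = [f_score(machine, user) for user in user_summaries]
    if eval_metric == "avg":
        return sum(scores) / len(scores)
    if eval_metric == "max":
        return max(scores)
    raise ValueError("eval_metric must be 'avg' or 'max'")


machine = [1, 1, 0, 0]
users = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(evaluate(machine, users, "avg"))  # mean over both users
print(evaluate(machine, users, "max"))  # best-matching user only
```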

Pre-trained model

Thanks for your excellent work~

Could you please provide the pre-trained model, which can be helpful for other new datasets?

How to create a user summary for a custom video dataset and save it in an h5py file?

Please explain how to create the user summary and save it in the h5py file. I am able to create an h5py file for my own video dataset, but I have no idea how to fill the user summary key. Please help.

CUDNN_STATUS_BAD_PARAM when trying to generate the result.h5 file

I used a large feature matrix of shape (11585, 1000) as input. I want to generate the result.h5 file by getting scores from the model. I also changed the default dimension parameter to 1000 to match my feature dimension. The error occurs at h, _ = self.rnn(x) in the model file, which raises CUDNN_STATUS_BAD_PARAM. How can I overcome this?

Does this method extract 1 frame out of every 7 frames to represent them?

As the title says. Can I control the ratio?

Not able to create H5 file for another video dataset

I tried to use "https://github.com/SinDongHwan/pytorch-vsumm-reinforce/blob/master/utils/generate_dataset.py" to extract features from another dataset with Python 2.7 (as recommended). However, cv2 attributes such as cv2.cv.CV_CAP_PROP_FPS do not work, so the frames are not extracted. With Python 3.6 I could extract the frames, but the weave library (which is needed to find the change points) is not supported, and I could not find a replacement for it in 3.6. Please advise if you have created an H5 file from your own dataset.

How to extract key frames from the dataset?

When I try to run the "Visualize summary" step, I run into a problem: I cannot extract the key frames from the dataset. How should I do this step?
@KaiyangZhou @yrwangxd @SinDongHwan

The dataset cannot be downloaded. Can you provide it?

wget http://www.eecs.qmul.ac.uk/~kz303/vsumm-reinforce/datasets.tar.gz returns 404 Not Found. Can you provide the dataset?

Frame-wise importance scores while downsampling a video

I have to use the TVSum50 dataset for a video summarization task. The original videos have a frame rate of 30 fps, and each frame is assigned an importance score from 1 to 5. I have to downsample the videos to 3 fps, but I do not understand how that will affect the importance scores. Can anyone help me here, please?
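
Two simple options are to keep the score of each retained frame, or to average the scores of the frames each retained frame stands for. The sketch below illustrates both under stated assumptions: `downsample_scores` is a hypothetical helper, the source rate is assumed to be an integer multiple of the target rate, and neither option is claimed to be what the dataset authors did.

```python
def downsample_scores(scores, src_fps=30, dst_fps=3, mode="pick"):
    """Reduce per-frame scores from src_fps to dst_fps.

    mode="pick": keep the score of every `step`-th frame.
    mode="mean": average the scores inside each window of `step` frames.
    """
    if dst_fps <= 0 or src_fps % dst_fps != 0:
        raise ValueError("src_fps must be a positive multiple of dst_fps")
    step = src_fps // dst_fps  # 30 fps -> 3 fps keeps every 10th frame
    if mode == "pick":
        return scores[::step]
    if mode == "mean":
        windows = [scores[i:i + step] for i in range(0, len(scores), step)]
        return [sum(w) / len(w) for w in windows]
    raise ValueError("mode must be 'pick' or 'mean'")


scores = list(range(20))  # 20 frames at 30 fps, dummy scores
print(downsample_scores(scores, mode="pick"))
print(downsample_scores(scores, mode="mean"))
```

Picking keeps scores on the original 1 to 5 scale as annotated; averaging smooths them, which may or may not be desirable for a given task.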

The input of the LSTM

In PyTorch, the input of an LSTM is (seq_len, batch, dim),
but in main.py the input of the LSTM is (batch, seq_len, dim).
I guess there is a mistake.

How can I get change points using KTS?

I tried to get change points using the KTS code,
but I could not get proper change points.

If anyone has obtained change points using KTS, could you please help me?

How to develop DR-DSNsup (supervised version of the RL model)?

How can I build the supervised version of DR-DSN? Can anyone provide a code implementation of it? Thanks in advance.

How to test my own custom video?

I have trained and tested with your dataset h5 files. Now I just want to test my own video file, e.g. 'my_video.mp4'.
How can I transform it into an h5 file so that I can use your command "python main.py -d datasets/my_own_video.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results"?

Original videos

Does anyone have a link for the original videos?

Hello, I cannot find the dataset

Hello, I cannot open your link to find the dataset. Can you send the dataset to my email?
My email: [email protected]

Does gt_score (human annotation) in the dataset make it supervised?

In the paper, the approach is called fully unsupervised, but I don't understand why gt_score (ground truth score) is used in your dataset. As far as I know, gt_score (human annotation) is used for supervised approaches.

Summary not generating!!!

I have trained and tested the model but am not able to visualize the result. summary2video.py raises TypeError: 'KeysViewHDF5' object is not subscriptable.

I have a folder named Videoframes containing several jpg images named 000001.jpg and so on.
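
Under Python 3, the keys() of an h5py file is a view object that, like a dict view, cannot be indexed directly; converting it to a list first avoids the TypeError. The sketch below uses a plain dict as a stand-in for the opened result file, so the names `results`, `video_1` and `video_2` are illustrative only.

```python
# A plain dict stands in for an opened h5py.File here; its keys() view
# behaves the same way with respect to indexing.
results = {"video_1": [0.2, 0.9], "video_2": [0.7, 0.1]}

keys = results.keys()
try:
    keys[0]  # indexing a keys view raises TypeError
except TypeError as err:
    print("indexing the view fails:", err)

key = list(keys)[0]  # materialise the view, then index
print(key)
```

In a script that indexes the keys by position, wrapping the call as list(h5_file.keys())[idx] is the usual adjustment.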

Supervised version of the model

In the paper, you also describe and analyze a supervised version of the framework. Will the code for the supervised version also be made available?

Thank you.

Bernoulli distribution

Hi @KaiyangZhou,

1/ What is the value of m (m = Bernoulli(probs))?
2/ What is the value of the probability Pt when the action At equals 1?
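
As background for both questions: in torch.distributions, Bernoulli(probs) is a distribution object rather than a number; m.sample() draws a 0/1 action per frame, and m.log_prob(actions) gives the log-probability of the drawn actions. The pure-Python sketch below mimics that behaviour without PyTorch; `sample_actions` and `log_prob` are hypothetical stand-ins, not library functions.

```python
import math
import random


def sample_actions(probs, rng):
    """Draw a_t = 1 with probability p_t, else 0, for every frame t."""
    return [1 if rng.random() < p else 0 for p in probs]


def log_prob(probs, actions):
    """log p_t where a_t = 1, and log(1 - p_t) where a_t = 0."""
    return [math.log(p) if a == 1 else math.log(1.0 - p)
            for p, a in zip(probs, actions)]


rng = random.Random(0)          # seeded only to make the run repeatable
probs = [0.9, 0.5, 0.1]         # per-frame selection probabilities
actions = sample_actions(probs, rng)
print(actions)
print(log_prob(probs, actions))
```

So when the action for a frame is 1, the probability of that action is simply the network output p_t for that frame.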

How can I train the DR-DSNsup model?

In the paper, you mention how to train DSNsup, but there is very little information about how to train DR-DSNsup. Can you elaborate?

Thank you.

5-fold cross validation

I think the 5-fold cross validation (5FCV) may have a problem: it is not the standard K-fold cross validation.
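
For comparison, standard K-fold cross validation partitions the videos into K disjoint folds and uses each fold as the test set exactly once. The sketch below shows that scheme only; `k_fold_splits` is a hypothetical helper, and the `train_keys` / `test_keys` names are assumed here for illustration rather than taken from the repository's split files.

```python
def k_fold_splits(keys, k=5):
    """Partition `keys` into k disjoint folds; each fold is the test set once."""
    folds = [keys[i::k] for i in range(k)]  # round-robin assignment
    splits = []
    for i in range(k):
        test = folds[i]
        train = [key for j, fold in enumerate(folds) if j != i for key in fold]
        splits.append({"train_keys": train, "test_keys": test})
    return splits


keys = ["video_%d" % i for i in range(1, 11)]
for split in k_fold_splits(keys, k=5):
    print(split["test_keys"])
```

With this scheme every video appears in a test set exactly once, which independently drawn random train/test splits do not guarantee.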

Extracting image features for videos

Why do all the epochs stay close to 0.49, without varying or improving at all?

I trained the model on my custom dataset for 1000 epochs, but the value stays close to 0.49 in every epoch and does not vary at all. For example, the first epoch gives 0.49 and the 1000th epoch gives 0.4998; there is no change or improvement. Thanks in advance.

Cannot find Dataset

Can someone please help me find the dataset? The given link isn't working.

Unsupervised learning with my own data

I want to use my own dataset, but it has no features or labels, only some video clips. How do I extract features before constructing the h5 file, and what should I do with user_summary, gts_score and gtsummary when constructing the h5 file?

How to extend the length of my summarized video?

My output summary videos are a bit too fast, so they skip many important parts of the input video. I want to make the summary longer so that it includes the important information. Thanks in advance.

Why is the output probability always around 0.5?

Hello, I really appreciate your work. I have run your code, but I found that the probabilities from the last fc layer are always close to each other, for example always around 0.5. Why do they not approach 0 or 1? I look forward to your reply!

Unexpected error

I'm getting an unexpected error while training the model. Can somebody please help resolve this?
Traceback (most recent call last):
  File "main.py", line 205, in <module>
    main()
  File "main.py", line 121, in main
    probs = model(seq) # output shape (1, seq_len, 1)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/models.py", line 19, in forward
    h, _ = self.rnn(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 175, in forward
    hx = torch.autograd.Variable(input.data.new(self.num_layers *
  File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 407, in data
    raise RuntimeError('cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?')
RuntimeError: cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?

How can I find out where the frames are saved?

When summarizing videos from the given datasets, I cannot see any folder where the frames are saved, and it throws an error:
error: C:\ci\opencv_1512688052760\work\modules\imgproc\src\resize.cpp:3289: error: (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize

The network is not able to pick up the frames because the folder does not exist, and it throws this error. Can you please help?
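
cv2.imread returns None instead of raising when a file cannot be read, and passing that None to cv2.resize fails with an assertion about an empty size, so this error usually points to missing or misnamed frame files. The sketch below checks for missing frames before any image is loaded. It is a hedged illustration: `frame_path` and `missing_frames` are hypothetical helpers, and the 1-based, six-digit naming (000001.jpg, 000002.jpg, ...) is an assumption about how the frames were exported.

```python
import os
import tempfile


def frame_path(frm_dir, frame_idx):
    # Assumption: frames are named 000001.jpg, 000002.jpg, ... (1-based).
    return os.path.join(frm_dir, "%06d.jpg" % (frame_idx + 1))


def missing_frames(frm_dir, summary):
    """Indices of selected frames whose image file does not exist."""
    return [idx for idx, keep in enumerate(summary)
            if keep and not os.path.isfile(frame_path(frm_dir, idx))]


# Small self-contained demo: only the first frame exists on disk.
with tempfile.TemporaryDirectory() as frm_dir:
    open(os.path.join(frm_dir, "000001.jpg"), "wb").close()
    missing = missing_frames(frm_dir, [1, 0, 1])
    print(missing)
```

Running such a check on the real frame directory before writing the video shows whether frames still need to be extracted from the original videos.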

Mean instead of sum when computing the `expected_reward` by episode

Hi,
According to most PyTorch implementations of the REINFORCE algorithm, the policy gradient loss should sum the log_probs over the trajectory (sum over t=1...T) instead of computing the mean. In the paper, this is correctly summed in equations 8/9/10. The only mean is over the N episodes. I believe this is a mistake in the code only.

pytorch-vsumm-reinforce/main.py

Line 131 in fdd03be

expected_reward = log_probs.mean() * (reward - baselines[key])

Should be

expected_reward = log_probs.sum() * (reward - baselines[key])

My assumption is that the authors averaged instead of summing because videos have different lengths.

Please, tell me if I am wrong. Thanks!
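
To make the difference concrete: for a video with T frames, the mean-based term equals the sum-based term divided by T, so the two objectives differ only by a per-video scale factor. The numbers below are made up for illustration and are not taken from the repository.

```python
log_probs = [-0.1, -0.7, -0.3, -0.9]   # illustrative per-frame log-probabilities
advantage = 0.5                         # stands for reward - baseline

sum_objective = sum(log_probs) * advantage
mean_objective = (sum(log_probs) / len(log_probs)) * advantage

# The sum-based term is exactly T times the mean-based term.
T = len(log_probs)
print(sum_objective, mean_objective, T * mean_objective)
```

With the mean, every video contributes a gradient of comparable magnitude regardless of its length; with the sum, longer videos weigh more, which matches the equations cited above.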

Stopping criteria

The paper describes a stopping criterion: "For all our models, we stop training after K consecutive epochs with descending summarization F-score on the validation set. We set K = 5". However, I cannot find any trace of this strategy in your code. Moreover, I don't see a validation set in your code. Can you explain?
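
The quoted criterion could be checked as in the sketch below, which stops once the validation F-score has dropped for K consecutive epochs. This is one possible reading of the sentence, not code from the repository; `should_stop` and the list of per-epoch validation scores are hypothetical.

```python
def should_stop(val_f_scores, patience=5):
    """True once each of the last `patience` epochs scored lower than the one before it."""
    if len(val_f_scores) <= patience:
        return False  # not enough history yet
    recent = val_f_scores[-(patience + 1):]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


history = [0.40, 0.39, 0.38, 0.37, 0.36, 0.35]   # five consecutive drops
print(should_stop(history, patience=5))
```

Using it requires holding out part of the training videos as a validation set and appending one F-score per epoch.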

Results worse than the original implementation

It looks like this implementation yields worse results than the paper and the Theano implementation. What could be the reason? Any ideas on how to fix this?

Thank You!

How to extract key frames from the dataset?

the following arguments are required

How was user_summary (binary vectors) generated?

Hi, as the title says, there is a key called user_summary in the dataset eccv16_dataset_tvsum_google_pool5.h5.

I am wondering how to convert the 20 annotations originally provided in TVSum into those 20 binary vectors.

Thank you.

How to extract image features for videos in this paper?

Can the author give us the code or a link for feature extraction?

0/1 Knapsack problem

Hi @KaiyangZhou
What is the relation between the 0/1 knapsack problem and the video summary?
I saw that the 0/1 knapsack problem is used in the evaluation step of the implementation. Why? I would like a detailed explanation of this.
Thank you in advance.
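
As general background, shot-based summarization can be cast as a 0/1 knapsack: each shot has a value (its importance score) and a weight (its length in frames), and the summary must fit within a length budget. The sketch below solves that selection with dynamic programming. It is an illustration under assumptions: `knapsack` is a hypothetical helper, the shot scores and lengths are made up, and the 15% budget is assumed here as a commonly used limit rather than confirmed from the source.

```python
def knapsack(values, weights, capacity):
    """Return indices of items maximising total value within `capacity`."""
    n = len(values)
    # best[i][c]: best value using the first i items with capacity c
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip item i-1
            if weights[i - 1] <= c:
                candidate = best[i - 1][c - weights[i - 1]] + values[i - 1]
                if candidate > best[i][c]:
                    best[i][c] = candidate  # take item i-1
    # Trace back which items were taken.
    picks, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            picks.append(i - 1)
            c -= weights[i - 1]
    return sorted(picks)


shot_scores = [0.9, 0.4, 0.8, 0.3]    # made-up importance per shot
shot_lengths = [60, 30, 50, 40]       # made-up shot lengths in frames
total_frames = 600
budget = total_frames * 15 // 100     # assumed 15% length limit
selected = knapsack(shot_scores, shot_lengths, budget)
print(selected)
```

The frames of the selected shots are then marked 1 in the binary summary and all other frames 0.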

Question about reward function.

When will if num_picks == 0: in rewards.py be executed? It seems impossible for num_picks to equal 0 because of this code: num_picks = len(pick_idxs) if pick_idxs.ndimension() > 0 else 1.

How can I use the model to summarize a custom video?

Given a video, not from one of the training datasets, how can I apply the model to it?

Handling one or zero frame selected bug

When Bernoulli sampling returns zero or one frame, training crashes.

First, if Bernoulli sampling selects exactly one frame, the error TypeError: len() of a 0-d tensor occurs. This happens because line 17 in rewards.py, pick_idxs = _actions.squeeze().nonzero().squeeze(), returns a 0-dimensional tensor, and when len is called on it later on line 18, num_picks = len(pick_idxs), the error is thrown.

Example to reproduce the error:

import torch
from torch.distributions import Bernoulli

m = Bernoulli(torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0]))
actions = m.sample()

pick_idxs = actions.squeeze().nonzero().squeeze()
print(len(pick_idxs))

Second, when zero or one frame is selected, the compute_reward function should return a 0-dimensional (scalar) tensor; otherwise line 132 in main.py produces a size mismatch error. So on lines 22 and 31, reward = torch.tensor([0.]) should be replaced with reward = torch.tensor(0.).

Can somebody confirm these bugs? (So I can maybe commit the fix?)

pytorch version: 0.4.0
python version: 2.7.12

Find features, change points, num_frames and positions for custom test video

Hi @KaiyangZhou,

I wanted to know how I can find the following features to generate a summary for a custom video:

Features (for finding seq and probs)
Change points (cps)
Number of frames (num_frames)
Number of frames per seg (nfps)
Positions

Please let me know!

The project is unstable. Has anyone downloaded and run it properly?

I use Python 3 and PyTorch 1.4, so I changed the print statements to the print function and xrange to range, then ran the project directly. The results are unstable: on the test data, performance varies considerably between test videos, and different train/test combinations commonly show the same behaviour. Is this normal? Am I missing something, or is it a version problem or something else?

Frame downsampling for the dataset

Why do we downsample the videos in the dataset to 2 fps, i.e. choose every 15th frame from each video?

Video numbering for the TVSum dataset, and FPS for the dataset

Hi,

1): The raw TVSum dataset contains videos with random-looking names (not numbers), while you have used video_x naming in your processed dataset (google_tvsum dataset). Where can I find this mapping?
2): What fps did you use when converting the videos into the frames that are fed to GoogLeNet for the processed dataset? (I am asking this in order to generate the summary from raw frames in summary2video.)
It is using an fps of about 30, so I am unable to understand that part.

Can you shed some light on this?

How to calculate xcorr?

I tried many methods to calculate xcorr, but I cannot get the correct result. Can you tell me how?

Saving of frames

Can anyone please tell me how the frames are saved as .jpg images?

Supervised Learning Extension

Hello,

I want to create both supervised and unsupervised models on my own. However, I could not see any options or find the relevant parts of the code to switch between them. Could you please point me to the relevant places in the code, if possible? Or do you plan to release that part?

Thanks.

Video cannot be generated

When I run the command:
python summary2video.py -p log/summe-split0/result.h5 -d video_frames/ -i 0 --fps 30 --save-dir log --save-name summary.mp4
this occurred:
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
Traceback (most recent call last):
  File "summary2video.py", line 43, in <module>
    frm2video(args.frm_dir, summary, vid_writer)
  File "summary2video.py", line 27, in frm2video
    frm = cv2.resize(frm, (args.width, args.height))
cv2.error: OpenCV(3.4.3) /io/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
I generated the video frames with ffmpeg, and the file names follow the format the code requires. Why does this error occur? Can you help me?
