kaiyangzhou / pytorch-vsumm-reinforce

Unsupervised video summarization with deep reinforcement learning (AAAI'18)

License: MIT License

Python 99.09% Shell 0.91%
computer-vision deep-learning machine-learning policy-network reinforcement-learning unsupervised-learning video-summarization

pytorch-vsumm-reinforce's Issues

Resume parameter doesn't work

When trying to use 'resume' functionality, you get an error:

Variable 'start_epoch' referenced before assignment

This comes from line 97 in main.py: start_epoch is defined only when the --resume option is not used.
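One possible way to avoid the unassigned variable is to give start_epoch a default before the resume branch. This is only a sketch: the helper name resolve_start_epoch, the load_checkpoint callable, and the "epoch" checkpoint key are hypothetical and are not taken from main.py.

```python
def resolve_start_epoch(resume_path, load_checkpoint):
    """Return the epoch to start from, whether or not a checkpoint is resumed.

    resume_path and load_checkpoint are hypothetical stand-ins for the
    --resume argument and whatever function loads the checkpoint.
    """
    start_epoch = 0  # always defined, even when resuming
    if resume_path:
        checkpoint = load_checkpoint(resume_path)
        # Assumption: the checkpoint may or may not store an "epoch" entry.
        if isinstance(checkpoint, dict):
            start_epoch = checkpoint.get("epoch", 0)
    return start_epoch
```

With this shape, the training loop can use start_epoch unconditionally, because it is assigned on every code path.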

Pre-trained model

Thanks for your excellent work.

Could you please provide the pre-trained model? It would be helpful for working with other, new datasets.

CUDNN_STATUS_BAD_PARAM when trying to generate result.h5 file

My input is a large feature matrix of shape (11585, 1000). I want to generate the result.h5 file by getting scores from the model, and I have set the input dimension parameter to 1000 to match my feature dimension. The error occurs at h, _ = self.rnn(x) in the model file, which raises CUDNN_STATUS_BAD_PARAM. How can I overcome this?

Not able to create H5 file for another video dataset

I tried to use "https://github.com/SinDongHwan/pytorch-vsumm-reinforce/blob/master/utils/generate_dataset.py" to extract features from another dataset with Python 2.7 (as recommended), but cv2 attributes such as cv2.cv.CV_CAP_PROP_FPS do not work, so the frames are not extracted. With Python 3.6 I could extract the frames, but the weave library is not supported there, and I could not find a replacement for it (we need it to find change points). If anyone has created an H5 file from their own dataset, please advise.

Frame-wise importance scores while downsampling a video

I have to use the TVSum50 dataset for a video summarization task. The original videos have a frame rate of 30 fps, and each frame is assigned an importance score from 1 to 5. I have to downsample the videos to 3 fps, but I do not understand how that will affect the importance scores. Can anyone help me here, please?
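Two common options when going from 30 fps to 3 fps are to keep the score of every 10th frame, or to average the scores in each block of 10 frames. The sketch below shows both; the function name downsample_scores and its parameters are made up for illustration, and which option suits TVSum50 is a judgment call rather than something stated by the authors.

```python
def downsample_scores(scores, src_fps=30, dst_fps=3, mode="pick"):
    """Downsample per-frame importance scores to a lower frame rate.

    mode="pick": keep the score of the first frame in each block.
    mode="mean": average the scores inside each block.
    """
    if src_fps % dst_fps != 0:
        raise ValueError("src_fps must be a multiple of dst_fps")
    step = src_fps // dst_fps  # 10 source frames per kept frame for 30 -> 3
    if mode == "pick":
        return list(scores[::step])
    if mode == "mean":
        blocks = [scores[i:i + step] for i in range(0, len(scores), step)]
        return [sum(block) / len(block) for block in blocks]
    raise ValueError("mode must be 'pick' or 'mean'")
```

Either way, the score range (1 to 5) is unchanged; only the number of scored frames shrinks by the same factor as the frame rate.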

The input of the LSTM

According to the PyTorch documentation, the input of an LSTM has shape (seq_len, batch, dim),
but in main.py the input of the LSTM has shape (batch, seq_len, dim).
I guess there is a mistake.

How can I get change points using KTS?

I tried to get change points using the KTS code,
but I couldn't get proper change points.

If someone has obtained change points using KTS, could you please help me?

How to test my own custom video?

I have trained and tested with your dataset h5 files. Now I just want to test my own video file, such as 'my_video.mp4'.
How can I convert it into an h5 file so that I can run your code with "python main.py -d datasets/my_own_video.h5 -s datasets/summe_splits.json -m summe --gpu 0 --save-dir log/summe-split0 --split-id 0 --evaluate --resume path_to_your_model.pth.tar --verbose --save-results"?

Does gt_score (human annotation) in the dataset make it supervised?

In the paper, the approach is described as fully unsupervised, but I don't understand why gt_score (ground truth score) is included in your dataset. As far as I know, gt_score (human annotation) is used for supervised approaches.

Summary not generating!!!

I have trained and tested the model but am not able to visualize the result. summary2video.py fails with TypeError: 'KeysViewHDF5' object is not subscriptable.

I have a folder named Videoframes containing several jpg images named 000001.jpg and so on.
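In Python 3, a keys view cannot be indexed directly, which is what this TypeError reports; converting the view to a list first is the usual workaround. The sketch below uses a plain dict as a stand-in for the opened result.h5 file (an assumption, so that it runs without h5py), and the helper name nth_key is hypothetical.

```python
def nth_key(mapping, index):
    """Return the index-th key of a mapping whose .keys() is a view.

    Works for a dict and, by the same mechanism, for other mapping-like
    objects that return a non-subscriptable keys view in Python 3.
    """
    keys = list(mapping.keys())  # materialise the view so it can be indexed
    return keys[index]


# Stand-in for the result file: video names mapped to their stored data.
results = {"video_1": "...", "video_2": "..."}
first_video = nth_key(results, 0)
```

The same list(...) wrapping would apply wherever the script indexes the keys of the opened file.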

Supervised version of the model

In the paper, you also describe and analyze a supervised version of the framework. Will the code for the supervised version also be made available?

Thank you.

How can I train the DR-DSNsup model?

In the paper, you mention how to train the DSNsup. But there is so little information about how to train the DR-DSNsup, can you specify it?

Thank you.

5-fold cross validation

I find that the 5-fold cross validation (5FCV) may have a problem: it is not the standard K-fold cross validation.
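For reference, standard K-fold cross validation partitions the videos into K disjoint folds, and each fold serves as the test set exactly once. A minimal sketch follows; the function name k_fold_splits is hypothetical, and the "train_keys" / "test_keys" field names are assumed here to mirror the split files rather than confirmed from them.

```python
import random


def k_fold_splits(keys, k=5, seed=None):
    """Build K train/test splits in which every key is tested exactly once."""
    keys = list(keys)
    if seed is not None:
        random.Random(seed).shuffle(keys)  # optional reproducible shuffle
    # Deal the keys round-robin into k disjoint folds.
    folds = [keys[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test_keys = folds[i]
        train_keys = [key for j, fold in enumerate(folds) if j != i for key in fold]
        splits.append({"train_keys": train_keys, "test_keys": test_keys})
    return splits
```

This differs from drawing K independent random train/test partitions, where a video can appear in several test sets or in none.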

Cannot find Dataset

Can someone please help me find the dataset? The given link isn't working.

Unsupervised learning with my own data

I want to use my own dataset, but it has no features or labels, only some video clips. How do I extract features before constructing the h5 file, and what should I do with user_summary, gts_score and gtsummary when constructing the h5 file?

How to extend my summarized video length?

My output summarized videos play a bit too fast, and as a result they skip many important parts of the input video. I want to extend the length of the summarized video so that it includes the important information. Thanks in advance.

Why is the output probability always around 0.5?

Hello, I really appreciate your work. I have run your code, but I found that the probabilities output by the last fc layer are always close to each other; for example, they are always around 0.5. Why don't they approach 0 or 1? I look forward to your reply!

Unexpected error

I'm getting an unexpected error while training the model. Can somebody please help resolve this?
Traceback (most recent call last):
  File "main.py", line 205, in <module>
    main()
  File "main.py", line 121, in main
    probs = model(seq) # output shape (1, seq_len, 1)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/models.py", line 19, in forward
    h, _ = self.rnn(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 175, in forward
    hx = torch.autograd.Variable(input.data.new(self.num_layers *
  File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 407, in data
    raise RuntimeError('cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?')
RuntimeError: cannot call .data on a torch.Tensor: did you intend to use autograd.Variable?

How can I find out where the frames are saved?

While summarizing videos from the given datasets, I cannot find any folder where the frames are saved, and the following error is thrown:
error: C:\ci\opencv_1512688052760\work\modules\imgproc\src\resize.cpp:3289: error: (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize

The frames cannot be picked up because there is no existing folder, and this error is thrown. Can you please help?

Mean instead of sum when computing the `expected_reward` by episode

Hi,
According to most PyTorch implementations of the REINFORCE algorithm, the policy gradient loss should sum the log_probs over the trajectory (sum over t=1...T) instead of computing their mean. In the paper, this is correctly summed in equations 8/9/10; the only mean is over the N episodes. I believe the mistake is in the code only.

expected_reward = log_probs.mean() * (reward - baselines[key])

Should be

expected_reward = log_probs.sum() * (reward - baselines[key]) 

My assumption is that the authors chose to average rather than sum because videos have different lengths.

Please, tell me if I am wrong. Thanks!
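To make the difference concrete, the sketch below computes both variants in plain Python. It only illustrates the arithmetic (the sum equals T times the mean, so the two differ by a factor of the sequence length); the function signature and the reduce parameter are hypothetical and not the repository's code.

```python
import math


def expected_reward(log_probs, reward, baseline, reduce="sum"):
    """Weight the trajectory log-probability by the advantage.

    reduce="sum" follows the REINFORCE form (sum over t = 1..T);
    reduce="mean" divides that sum by the trajectory length T.
    """
    advantage = reward - baseline
    total = sum(log_probs)
    if reduce == "mean":
        total = total / len(log_probs)
    elif reduce != "sum":
        raise ValueError("reduce must be 'sum' or 'mean'")
    return total * advantage


# Two trajectories with identical per-step probabilities but different lengths.
short_video = [math.log(0.5)] * 10
long_video = [math.log(0.5)] * 1000
```

With the mean, both trajectories contribute the same magnitude; with the sum, the longer one contributes 100 times more, which is the length dependence the issue speculates the authors wanted to avoid.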

Stopping criteria

In the paper, you describe the stopping criterion: "For all our models, we stop training after K consecutive epochs with descending summarization F-score on the validation set. We set K = 5". However, I cannot find this strategy anywhere in your code. Moreover, I don't see a validation set in your code. Could you explain?
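The quoted rule can be read as: stop once the validation F-score has dropped for K epochs in a row. A minimal sketch of that reading follows; the function name should_stop and the patience parameter are hypothetical, and this is not code from the repository.

```python
def should_stop(val_fscores, patience=5):
    """Return True after `patience` consecutive epochs of descending F-score.

    val_fscores holds one validation F-score per finished epoch, oldest first.
    """
    if len(val_fscores) <= patience:
        return False  # need patience + 1 values to observe `patience` drops
    recent = val_fscores[-(patience + 1):]
    # Every epoch in the window must score strictly lower than the one before.
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))
```

Calling should_stop at the end of each epoch with the history so far would implement the K = 5 criterion, provided a validation split is held out from the training keys.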

How was user_summary (binary vectors) generated?

Hi, as the title says, there is a key called user_summary in the dataset eccv16_dataset_tvsum_google_pool5.h5.

I am wondering how the 20 annotations originally provided in TVSum are converted into those 20 binary vectors.

Thank you.

0/1 Knapsack problem

Hi @KaiyangZhou
What is the relation between the 0/1 knapsack problem and the summary video?
I saw that the 0/1 knapsack problem is used in the evaluation step of the implementation. Why? I would appreciate a detailed explanation.
Thank you in advance.
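A common formulation in video summarization treats each shot as a knapsack item: its value is the shot's importance score, its weight is its length in frames, and the capacity is a length budget for the summary. The sketch below solves that selection with a standard dynamic program. The names select_shots and knapsack and the 0.15 default proportion are assumptions made for illustration, not a description of the repository's exact code.

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by dynamic programming; returns indices of chosen items."""
    n = len(values)
    # best[i][c]: best total value using the first i items within capacity c.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if weights[i - 1] <= c:
                candidate = best[i - 1][c - weights[i - 1]] + values[i - 1]
                if candidate > best[i][c]:
                    best[i][c] = candidate
    # Walk back through the table to recover which items were taken.
    picks, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            picks.append(i - 1)
            c -= weights[i - 1]
    return sorted(picks)


def select_shots(shot_scores, shot_lengths, proportion=0.15):
    """Pick the shots that maximise total score within a length budget."""
    capacity = int(sum(shot_lengths) * proportion)
    return knapsack(shot_scores, shot_lengths, capacity)
```

Under this reading, the knapsack step turns frame-level or shot-level scores into a summary of bounded duration, so that different methods are compared at the same summary length.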

Handling one or zero frame selected bug

When Bernoulli sampling returns zero or one frame, training crashes.

First, if Bernoulli sampling selects exactly one frame, the error TypeError: len() of a 0-d tensor occurs. Line 17 in rewards.py, pick_idxs = _actions.squeeze().nonzero().squeeze(), returns a 0-dimensional tensor, so the call num_picks = len(pick_idxs) on line 18 throws.

Example to reproduce the error:

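import torch
from torch.distributions import Bernoulli

# Probabilities of exactly 0 or 1 make the sample deterministic: only index 2 is picked.
m = Bernoulli(torch.tensor([0.0, 0.0, 1.0, 0.0, 0.0]))
actions = m.sample()

# With a single pick, the final squeeze() collapses the result to a 0-d tensor.
pick_idxs = actions.squeeze().nonzero().squeeze()
print(len(pick_idxs))  # raises TypeError: len() of a 0-d tensor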

Second, when zero or one frame is selected, compute_reward should return a 0-dimensional (scalar) tensor; otherwise line 132 in main.py produces a size mismatch error. So reward = torch.tensor([0.]) on lines 22 and 31 should be replaced with reward = torch.tensor(0.).

Can somebody confirm these bugs? (So I can maybe commit the fix?)

pytorch version: 0.4.0
python version: 2.7.12

The project is unstable. Has anyone downloaded and run it properly?

I use Python 3 and PyTorch 1.4, so I updated the print calls and the range calls accordingly and then ran the project directly. The results are unstable: on the test data, performance varies a lot from one test video to another, and this happens across different train/test combinations. Is this normal? Did I miss something, or is it a version problem or something else?

Video numbering for the TVSum dataset, and FPS for the dataset

Hi,

  1. The raw TVSum dataset contains videos with random-looking names (not numbers), while you use video_x naming in your processed dataset (the google_tvsum dataset). Where can I find this mapping?
  2. What fps did you use when converting the videos into the frames that are fed to GoogLeNet for the processed dataset? (I am asking because I want to generate the summary from raw frames in summary2video.) It uses an fps of about 30, so I am unable to understand that part.

Can you shed some light on this?

How to calculate xcorr?

I tried many methods to calculate xcorr, but I can't get the correct result. Can you tell me how?

Saving of frames

Can anyone please tell me how the frames are saved as .jpg images?

Supervised Learning Extension

Hello,

I want to create both supervised and unsupervised models on my own. However, I couldn't find any option, or the relevant parts of the code, for switching between them. Could you please point me to the relevant places in the code, if possible? Or do you plan to release that part?

Thanks.

Problem: video can't be generated

When I run the command:
python summary2video.py -p log/summe-split0/result.h5 -d video_frames/ -i 0 --fps 30 --save-dir log --save-name summary.mp4
this occurs:
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
Traceback (most recent call last):
  File "summary2video.py", line 43, in <module>
    frm2video(args.frm_dir, summary, vid_writer)
  File "summary2video.py", line 27, in frm2video
    frm = cv2.resize(frm, (args.width, args.height))
cv2.error: OpenCV(3.4.3) /io/opencv/modules/imgproc/src/resize.cpp:4044: error: (-215:Assertion failed) !ssize.empty() in function 'resize'
I generated the video frames with ffmpeg, and named them in the format the code requires. Why does this error occur? Can you help me?
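cv2.imread returns None when it cannot read a file, and passing None to cv2.resize triggers this kind of empty-size assertion, so a missing or misnamed frame is a likely cause. The sketch below lists the frame files a summary would need but that are absent from the frame directory. The six-digit, 1-based naming (000001.jpg) is an assumption based on the frame names mentioned in another issue, and the function name missing_frames is hypothetical.

```python
import os


def missing_frames(frm_dir, summary):
    """List frame files selected by the summary that do not exist on disk.

    summary is a binary sequence with one entry per frame (1 = keep).
    Assumes frames are named 000001.jpg, 000002.jpg, ... (1-based).
    """
    missing = []
    for idx, keep in enumerate(summary):
        if keep != 1:
            continue
        name = str(idx + 1).zfill(6) + ".jpg"
        if not os.path.isfile(os.path.join(frm_dir, name)):
            missing.append(name)
    return missing
```

Running such a check against the directory passed with -d before building the video would show whether the frame numbering or the path is the problem.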
