tibhannover / msva
Deep learning model for supervised video summarization called Multi Source Visual Attention (MSVA)
License: MIT License
After trying to reproduce your code, I encountered two errors. At Line 314:

train_val_loss_score.append([loss, np.mean(avg_loss[:, 0]), val_fscore, test_loss, video_scores, kt, sp])

Appending loss and test_loss (both torch.cuda.Tensor objects) raises "can't convert cuda tensor to numpy array" when the list is later converted to a numpy array at Line 538.
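A minimal sketch of a fix, assuming loss and test_loss are scalar tensors: convert them with .item() before appending, so only plain Python floats reach numpy (the call works the same for CPU and CUDA tensors):

```python
import numpy as np
import torch

# Stand-ins for the scalar loss tensors (CUDA in the original run; .item()
# copies the value out of the tensor as a plain Python float either way).
loss = torch.tensor(0.25)
test_loss = torch.tensor(0.30)

scores = []
scores.append([loss.item(), test_loss.item()])  # floats, not tensors

arr = np.array(scores)  # now safe: no tensors left in the list
print(arr.shape)        # -> (1, 2)
```

An equivalent option for non-scalar tensors would be .detach().cpu().numpy().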
According to the current version of ortools, the code should be:

from ortools.algorithms.python import knapsack_solver
osolver = knapsack_solver.KnapsackSolver(
    knapsack_solver.SolverType.KNAPSACK_DYNAMIC_PROGRAMMING_SOLVER,
    'test')

def knapsack_ortools(values, weights, items, capacity):
    # the solver expects integers, so scale the float values up first
    scale = 1000
    values = (np.array(values) * scale).astype(np.int64)  # np.int was removed in NumPy 1.24
    weights = np.array(weights).astype(np.int64)
    osolver.init(values.tolist(), [weights.tolist()], [int(capacity)])
    osolver.solve()
    packed_items = [x for x in range(0, len(weights))
                    if osolver.best_solution_contains(x)]
    return packed_items
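If upgrading ortools is not an option, the same selection can be sketched with a plain dynamic-programming 0/1 knapsack (standard library only; the function name knapsack_dp and its integer-weight interface are my own, mirroring knapsack_ortools above):

```python
def knapsack_dp(values, weights, capacity):
    """0/1 knapsack via dynamic programming; returns indices of chosen items."""
    n = len(values)
    best = [0] * (capacity + 1)                        # best[w] = max value at capacity w
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i in range(n):
        for w in range(capacity, weights[i] - 1, -1):  # downward: each item used at most once
            cand = best[w - weights[i]] + values[i]
            if cand > best[w]:
                best[w] = cand
                keep[i][w] = True
    packed, w = [], capacity                           # backtrack the kept items
    for i in range(n - 1, -1, -1):
        if keep[i][w]:
            packed.append(i)
            w -= weights[i]
    return sorted(packed)

print(knapsack_dp([60, 100, 120], [10, 20, 30], 50))  # -> [1, 2]
```

This runs in O(n * capacity) time, which is fine for shot-selection problems of this size.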
Thanks 😺
Hello,
While reproducing the results, I'm unable to get the Spearman and Kendall's Tau scores that you reported in the paper; my numbers deviate a lot from the reported ones.
Can you please provide the actual code for computing it?
As per lines 430 and 431 in commit dad26a6.
Thanks in advance 😃 .
Hi, I noticed that the last step of the self-attention calculation doesn't seem right:
att_weights_ = nn.functional.softmax(logits, dim=-1)
weights = self.dropout(att_weights_)
y = torch.matmul(V.transpose(1,0), weights).transpose(1,0)
So here the softmax probability is calculated along dim -1, which is the column direction.
But then the weighted sum is taken along the row direction, according to this line:
y = torch.matmul(V.transpose(1,0), weights).transpose(1,0)
I think we should do something like this instead:
y = torch.matmul(weights, V)
What do you think? I'd be happy to be corrected if I'm wrong.
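To see the discrepancy concretely, here is a small numpy sketch (shapes assumed: weights is an (n, n) row-softmaxed attention matrix, V is (n, d)). The repo's expression reduces to weights.T @ V, which sums over the axis that softmax did not normalize:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3
V = rng.standard_normal((n, d))
logits = rng.standard_normal((n, n))
# softmax along dim -1, as in the repo
weights = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

y_original = (V.T @ weights).T   # what the repo computes: equals weights.T @ V
y_proposed = weights @ V         # weighted sum along the softmax-normalized axis

assert np.allclose(y_original, weights.T @ V)   # algebraic identity
assert not np.allclose(y_original, y_proposed)  # the two generally differ
```

The two forms only coincide when weights happens to be symmetric, which a row-wise softmax does not guarantee.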
I can't reproduce the results. I ran the code, and the Spearman's and Kendall's Tau coefficients on TVSum are 0.5849 and 0.6403 respectively, which are much higher than the reported results.
Hi, I tried to replicate part of your project, specifically the motion feature extraction. I couldn't, because the checkpoint referenced at https://github.com/deepmind/kinetics-i3d/tree/master/data/checkpoints/rgb_imagenet is out of date. Thanks in advance!
Hello,
Thank you for making the code open-source. :)
I am trying to reproduce your results, and while looking into the code, it seems there are two types of split files for the non-overlapping splits: random and ordered. Could you tell me which F1 score is reported in the paper?
Best,
Noga
Could you help me with how to extract features for the datasets? (I also want to train on my own dataset.)
Hello.
I tried to train the model (I didn't change anything) and got a 60.45 F-score (intermediate_tvsum_non_overlapping_split_avg).
But the paper states that the F-score is 61.5.
Why did the score come out lower?
Do I need to change other parameter values?
Hi, I am trying to extract the object features using your code in https://github.com/VideoAnalysis/EDUVSUM/tree/master/src.
According to your paper, you use GoogLeNet trained on ImageNet, and I assume you extract features with the "modelInceptionV3" model as in the code. However, the feature shape of inceptionv3_feature = modelInceptionV3.predict(frmRz299) is (8, 8, 2048). I changed the model initialization to
modelInceptionV3 = InceptionV3(weights='imagenet', pooling='avg', include_top=False)
to get a 2048-dimensional feature vector.
However, the object feature vector length is 1024 in the MSVA code, and the feature values from the extraction code are quite different from those in the MSVA data: the former can be larger than 1, while the latter seem to be normalized to the [0, 1] range.
Have I missed something?
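For what it's worth, pooling='avg' in Keras simply global-average-pools the spatial grid, so the (8, 8, 2048) map collapses to a 2048-vector; a numpy sketch of just that step (the random array stands in for the real feature map):

```python
import numpy as np

# Stand-in for InceptionV3's last convolutional feature map
fmap = np.random.rand(8, 8, 2048)

# What pooling='avg' does: average over the two spatial axes
feat = fmap.mean(axis=(0, 1))

print(feat.shape)  # -> (2048,)
```

This explains the 2048 length, but not the 1024-dimensional, [0, 1]-normalized features in the MSVA data, so the question about a missing projection or normalization step stands.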