tibhannover / msva
Deep learning model for supervised video summarization called Multi Source Visual Attention (MSVA)
License: MIT License
After trying to reproduce your code, I encountered two errors. At Line 314:

train_val_loss_score.append([loss, np.mean(avg_loss[:, 0]), val_fscore, test_loss, video_scores, kt, sp])

Appending loss and test_loss (both torch.cuda.Tensor objects) raises "can't convert cuda tensor to numpy array" when the list is later converted to a numpy array at Line 538.
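A minimal sketch of a fix, assuming loss and test_loss are scalar tensors: convert them with .item() before appending, so only plain Python floats reach numpy (the call works the same for CPU and CUDA tensors):

```python
import numpy as np
import torch

# Stand-ins for the scalar loss tensors (CUDA in the original run; .item()
# copies the value out of the tensor as a plain Python float either way).
loss = torch.tensor(0.25)
test_loss = torch.tensor(0.30)

scores = []
scores.append([loss.item(), test_loss.item()])  # floats, not tensors

arr = np.array(scores)  # now safe: no tensors left in the list
print(arr.shape)        # -> (1, 2)
```

An equivalent option for non-scalar tensors would be .detach().cpu().numpy().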
According to the current version of ortools, the code should be:

from ortools.algorithms.python import knapsack_solver
osolver = knapsack_solver.KnapsackSolver(
    knapsack_solver.SolverType.KNAPSACK_DYNAMIC_PROGRAMMING_SOLVER,
    'test')

def knapsack_ortools(values, weights, items, capacity):
    # the solver expects integers, so scale the float values up first
    scale = 1000
    values = (np.array(values) * scale).astype(np.int64)  # np.int was removed in NumPy 1.24
    weights = np.array(weights).astype(np.int64)
    osolver.init(values.tolist(), [weights.tolist()], [int(capacity)])
    osolver.solve()
    packed_items = [x for x in range(0, len(weights))
                    if osolver.best_solution_contains(x)]
    return packed_items
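If upgrading ortools is not an option, the same selection can be sketched with a plain dynamic-programming 0/1 knapsack (standard library only; the function name knapsack_dp and its integer-weight interface are my own, mirroring knapsack_ortools above):

```python
def knapsack_dp(values, weights, capacity):
    """0/1 knapsack via dynamic programming; returns indices of chosen items."""
    n = len(values)
    best = [0] * (capacity + 1)                        # best[w] = max value at capacity w
    keep = [[False] * (capacity + 1) for _ in range(n)]
    for i in range(n):
        for w in range(capacity, weights[i] - 1, -1):  # downward: each item used at most once
            cand = best[w - weights[i]] + values[i]
            if cand > best[w]:
                best[w] = cand
                keep[i][w] = True
    packed, w = [], capacity                           # backtrack the kept items
    for i in range(n - 1, -1, -1):
        if keep[i][w]:
            packed.append(i)
            w -= weights[i]
    return sorted(packed)

print(knapsack_dp([60, 100, 120], [10, 20, 30], 50))  # -> [1, 2]
```

This runs in O(n * capacity) time, which is fine for shot-selection problems of this size.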
Thanks 😺
Hello,
While reproducing the results, I'm unable to get the Spearman and Kendall's Tau scores that you reported in the paper; my numbers deviate a lot from the reported ones.
Can you please provide the actual code for computing it?
As per lines 430 and 431 in commit dad26a6.
Thanks in advance 😃 .
Hi, I noticed that the last step of the self-attention calculation doesn't seem right:
att_weights_ = nn.functional.softmax(logits, dim=-1)
weights = self.dropout(att_weights_)
y = torch.matmul(V.transpose(1,0), weights).transpose(1,0)
So here the softmax probability is calculated along dim -1, which is the column direction.
But then the weighted sum is taken along the row direction, according to this line:
y = torch.matmul(V.transpose(1,0), weights).transpose(1,0)
I think we should do something like this instead:
y = torch.matmul(weights, V)
What do you think? I'd be happy to be corrected if I'm wrong.
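To see the discrepancy concretely, here is a small numpy sketch (shapes assumed: weights is an (n, n) row-softmaxed attention matrix, V is (n, d)). The repo's expression reduces to weights.T @ V, which sums over the axis that softmax did not normalize:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3
V = rng.standard_normal((n, d))
logits = rng.standard_normal((n, n))
# softmax along dim -1, as in the repo
weights = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

y_original = (V.T @ weights).T   # what the repo computes: equals weights.T @ V
y_proposed = weights @ V         # weighted sum along the softmax-normalized axis

assert np.allclose(y_original, weights.T @ V)   # algebraic identity
assert not np.allclose(y_original, y_proposed)  # the two generally differ
```

The two forms only coincide when weights happens to be symmetric, which a row-wise softmax does not guarantee.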
I can't reproduce the results. I ran the code, and the Spearman's and Kendall's Tau coefficients on TVSum are 0.5849 and 0.6403 respectively, which are much higher than the reported results.
Hi, I tried to replicate part of your project, specifically the motion feature extraction. I couldn't, because the checkpoint referenced at https://github.com/deepmind/kinetics-i3d/tree/master/data/checkpoints/rgb_imagenet is out of date. Thanks in advance!
Hello,
Thank you for making the code open-source. :)
I am trying to reproduce your results, and while looking into the code, it seems there are two types of split files for the non-overlapping splits: random and ordered. Could you tell me which F1 score is reported in the paper?
Best,
Noga
Could you help me with how to extract features for the datasets? (I also want to train on my own dataset.)
Hello.
I tried to train the model (I didn't change anything) and got a 60.45 F-score (intermediate_tvsum_non_overlapping_split_avg).
But the paper states that the F-score is 61.5.
Why did the score come out lower?
Do I need to change other parameter values?
Hi, I am trying to extract the object features using your code in https://github.com/VideoAnalysis/EDUVSUM/tree/master/src.
According to your paper, you use GoogLeNet trained on ImageNet, and I assume you extract features with the "modelInceptionV3" model as in the code. However, the feature shape of inceptionv3_feature = modelInceptionV3.predict(frmRz299) is (8, 8, 2048). I changed the model initialization to
modelInceptionV3 = InceptionV3(weights='imagenet', pooling='avg', include_top=False)
to get a 2048-dimensional feature vector.
However, the object feature vector length is 1024 in the MSVA code, and the feature values from the extraction code are quite different from those in the MSVA data: the former can be larger than 1, while the latter seem to be normalized to the [0, 1] range.
Have I missed something?
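For what it's worth, pooling='avg' in Keras simply global-average-pools the spatial grid, so the (8, 8, 2048) map collapses to a 2048-vector; a numpy sketch of just that step (the random array stands in for the real feature map):

```python
import numpy as np

# Stand-in for InceptionV3's last convolutional feature map
fmap = np.random.rand(8, 8, 2048)

# What pooling='avg' does: average over the two spatial axes
feat = fmap.mean(axis=(0, 1))

print(feat.shape)  # -> (2048,)
```

This explains the 2048 length, but not the 1024-dimensional, [0, 1]-normalized features in the MSVA data, so the question about a missing projection or normalization step stands.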