tibhannover / msva

A deep learning model for supervised video summarization, called Multi-Source Visual Attention (MSVA)

License: MIT License

Python 100.00%
attention-mechanism visual-attention supervised-video-summarization motion-features

msva's People

Contributors

junaid112, sherzod-hakimov


msva's Issues

Request for changes in `train.py` and `knapsack.py`

Thank you, MSVA team, for making the code public and, most importantly, for making the splits public.


While trying to reproduce your results, I encountered two errors. The first is at Line 314:

train_val_loss_score.append([loss,np.mean(avg_loss[:, 0]),val_fscore,test_loss, video_scores,kt,sp])

Appending loss and test_loss (both torch.cuda.Tensor objects) raises "can't convert cuda tensor to numpy array" when they are converted to a NumPy array at Line 538.
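A minimal sketch of the workaround we used (our own assumption; it simply moves the two losses to host scalars before appending):

# Hypothetical fix for Line 314: store plain Python floats instead of
# torch.cuda.Tensor objects, so the np.array(...) call at Line 538 works.
train_val_loss_score.append([
    loss.detach().cpu().item(),       # host scalar instead of a CUDA tensor
    np.mean(avg_loss[:, 0]),
    val_fscore,
    test_loss.detach().cpu().item(),  # same treatment for test_loss
    video_scores, kt, sp,
])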

Additional info:

The second error is in `knapsack.py`. With the current version of ortools, the code should be:

import numpy as np
from ortools.algorithms.python import knapsack_solver

# In current ortools, the solver type enum lives on knapsack_solver.SolverType.
osolver = knapsack_solver.KnapsackSolver(
    knapsack_solver.SolverType.KNAPSACK_DYNAMIC_PROGRAMMING_SOLVER,
    'test')

def knapsack_ortools(values, weights, items, capacity):
    # items is unused; kept for the original call signature.
    # The solver expects integer profits and weights, so scale the float values.
    scale = 1000
    values = (np.array(values) * scale).astype(int)  # np.int was removed in NumPy 1.24
    weights = np.array(weights).astype(int)
    osolver.init(values.tolist(), [weights.tolist()], [capacity])
    osolver.solve()
    packed_items = [x for x in range(0, len(weights))
                    if osolver.best_solution_contains(x)]
    return packed_items
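For reference, a quick call with toy numbers (hypothetical values, just to show the expected inputs and output):

# Three shots with importance scores and frame counts; capacity of 50 frames.
selected = knapsack_ortools(
    values=[0.9, 0.4, 0.7],   # per-shot importance scores
    weights=[30, 20, 25],     # per-shot lengths (frames)
    items=3,
    capacity=50)
print(selected)  # [0, 1]: the highest-value subset that fits the capacity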

Thanks 😺

Not able to reproduce Spearman and Kendall tau as reported in the paper

Hello,

While reproducing the results, I'm unable to get the Spearman and Kendall tau scores you reported in the paper; my values deviate a lot from the reported ones.

Could you please provide the actual code used to compute them?

As per lines 430 and 431 (shown below)

MSVA/train.py, lines 430 to 431 (commit dad26a6):

pS=spearmanr(y_pred2,y_true2)[0]
kT=kendalltau(rankdata(-np.array(y_true2)), rankdata(-np.array(y_pred2)))[0]
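For completeness, here is the per-video evaluation sketch we tried (our own assumption, following the common protocol of averaging rank correlations over test videos; this may not match your exact code):

import numpy as np
from scipy.stats import kendalltau, rankdata, spearmanr

def rank_corr(y_pred, y_true):
    # Rank correlation for a single video, mirroring train.py lines 430-431.
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    p_s = spearmanr(y_pred, y_true)[0]
    k_t = kendalltau(rankdata(-y_true), rankdata(-y_pred))[0]
    return p_s, k_t

def mean_rank_corr(scores_by_video):
    # scores_by_video: hypothetical dict mapping video id ->
    # (predicted_scores, ground_truth_scores); returns dataset-level means.
    pairs = [rank_corr(pred, true) for pred, true in scores_by_video.values()]
    return tuple(np.mean(pairs, axis=0))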

We tried several different combinations but were unable to get close to the values you report.
Please guide us!

Thanks in advance 😃 .

Issue regarding the last step of self attention (weighted sum step)

Hi, I noticed that the last step of the self-attention calculation doesn't seem right:

att_weights_ = nn.functional.softmax(logits, dim=-1)       
weights = self.dropout(att_weights_)     
y = torch.matmul(V.transpose(1,0), weights).transpose(1,0)

Here the softmax probability is computed along dim -1, i.e. over the columns of each row, so every row of weights sums to 1. But the weighted sum is then taken along the rows, according to this line:

y = torch.matmul(V.transpose(1,0), weights).transpose(1,0)

I think it should instead be something like this:

y = torch.matmul(weights,V)

What do you think? I hope I'm the one who is mistaken; a small check is below.
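For reference, a small numerical check (my own sketch, with toy shapes) showing that the two formulations differ and that only `torch.matmul(weights, V)` uses the row-normalized probabilities:

import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 4)   # toy attention logits (seq_len x seq_len)
V = torch.randn(4, 8)        # toy value matrix (seq_len x d_v)

weights = nn.functional.softmax(logits, dim=-1)
print(weights.sum(dim=-1))   # each row sums to 1

y_repo = torch.matmul(V.transpose(1, 0), weights).transpose(1, 0)
y_fix = torch.matmul(weights, V)

# The repo's expression is algebraically weights.T @ V, not weights @ V:
print(torch.allclose(y_repo, weights.transpose(1, 0) @ V))  # True
print(torch.allclose(y_repo, y_fix))                        # False in general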

Which F1-Score is reported in the paper?

Hello,

Thank you for making the code open-source. :)

I am trying to reproduce your results, and while looking into the code, it seems there are two types of split files for the non-overlapping splits: random and ordered. Could you tell me which F1 score is reported in the paper?

Best,
Noga

extract object features

Hi, I am trying to extract the object features using your code in https://github.com/VideoAnalysis/EDUVSUM/tree/master/src.

According to your paper, you use GoogLeNet trained on ImageNet. I assume you extract features with the `modelInceptionV3` model, as in the code. However, the feature shape of `inceptionv3_feature = modelInceptionV3.predict(frmRz299)` is (8, 8, 2048). I tried changing the model initialization to `modelInceptionV3 = InceptionV3(weights='imagenet', pooling='avg', include_top=False)` to get a 2048-dimensional feature vector.
However, the object feature vector length is 1024 in the MSVA code, and the feature values from the extraction code are quite different from those in the MSVA code: the former can be larger than 1, while the latter appear to be normalized to the [0, 1] range.

Have I missed something?
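For reference, this is the extraction sketch I am using (my own code; the average pooling and the min-max normalization at the end are assumptions on my side, not necessarily your pipeline):

import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# Global-average-pooled InceptionV3: one 2048-d vector per frame.
model = InceptionV3(weights='imagenet', pooling='avg', include_top=False)

def extract_object_features(frames_299):
    # frames_299: array of shape (n_frames, 299, 299, 3), RGB in [0, 255]
    x = preprocess_input(np.asarray(frames_299, dtype=np.float32))
    feats = model.predict(x)  # shape: (n_frames, 2048)
    # Assumption: min-max normalize per video to match the [0, 1] range
    # observed in the released MSVA features.
    return (feats - feats.min()) / (feats.max() - feats.min() + 1e-8)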
