Explaining Video Summarization Based on the Focus of Attention

PyTorch Implementation of our Attention-based Method for Explainable Video Summarization [Paper] [DOI] [Cite]

  • From "Explaining Video Summarization Based on the Focus of Attention", Proc. of the IEEE Int. Symposium on Multimedia (ISM), Dec. 2022.
  • Written by Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris and Ioannis Patras
  • This software can be used for studying our method for producing attention-based explanations for the outcomes of the CA-SUM video summarization model, and for reproducing the experimental results reported in our paper.

Main dependencies

Developed, checked and verified on an Ubuntu 20.04.5 PC with an NVIDIA RTX 2080Ti GPU and an i5-11600K CPU. Main packages required:

Python     PyTorch    CUDA Version    cuDNN Version    NumPy     H5py
3.8(.8)    1.7.1      11.0            8005             1.20.2    2.10.0
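
A minimal environment sketch with pip is shown below; the exact PyTorch wheel matching CUDA 11.0 should be selected according to the official PyTorch installation instructions for your system.

pip install torch==1.7.1 numpy==1.20.2 h5py==2.10.0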

Data

Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available within the data folder. The GoogleNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao and the h5 files were obtained from Kaiyang Zhou. These files have the following structure:

/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /gtscore                  1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
    /user_summary             2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_frame_per_seg          1D-array with shape (num_segments), indicates number of frames in each segment
    /n_frames                 number of frames in original video
    /picks                    positions of sub-sampled frames in original video
    /n_steps                  number of sub-sampled frames
    /gtsummary                1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
    /video_name (optional)    original video name, only available for SumMe dataset
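
To inspect this structure programmatically, the following minimal sketch reads one of the h5 files with h5py; the file name used below is a placeholder and should be replaced with the actual file inside the data folder.

    import h5py

    # Placeholder path; point this to the actual h5 file in the data folder.
    with h5py.File('data/SumMe/summe_dataset.h5', 'r') as hdf:
        for key in hdf.keys():                       # one group per video
            video = hdf[key]
            features = video['features'][...]        # (n_steps, feature-dimension)
            gtscore = video['gtscore'][...]          # (n_steps,)
            n_frames = int(video['n_frames'][()])    # frames in the original video
            print(key, features.shape, gtscore.shape, n_frames)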

Original videos and annotations for each dataset are also available on the dataset providers' webpages.

Running an experiment

To run an experiment using one of the aforementioned datasets and considering all of its randomly-generated splits (stored in the JSON file included in the data/splits directory), execute the following command:

python model/main.py --dataset 'dataset_name' --replacement_method 'repl_function_name' --replaced_fragments 'set_of_repl_fragments' --visual_mask 'mask_name'

where dataset_name refers to the name of the used dataset, repl_function_name refers to the replacement function applied to fragments of the input data, set_of_repl_fragments refers to the amount of replaced fragments of the input data, and mask_name refers to the type of mask used for replacing fragments of the input data.
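
For example, the following command runs an experiment on SumMe using the default configuration ('slice-out' replacement applied to a batch of fragments, with a black-frame visual mask):

python model/main.py --dataset 'SumMe' --replacement_method 'slice-out' --replaced_fragments 'batch' --visual_mask 'black-frame'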

After executing the above command, you get the results for each data split, as well as the overall results computed by averaging the obtained scores across the data splits. The overall results correspond to the ones reported in Table 1 of our paper for the different replacement functions. Please note that the results when fragments' replacement is based on "Randomization" might differ slightly from the reported ones, as we did not use a fixed seed value in our experiments.

Configurations

Setup for the experimental evaluation:

  • In main.py, specify the path to the pretrained models of the CA-SUM network for video summarization.
  • In data_loader.py, specify the paths to the h5 file of the used dataset, and the JSON file containing data about the used data splits.

Arguments in configs.py:

Parameter name        Description                          Default Value    Options
dataset               Used dataset in experiments.         'SumMe'          'SumMe', 'TVSum'
replacement_method    Applied replacement function.        'slice-out'      'slice-out', 'input-mask', 'random', 'attention-mask'
replaced_fragments    Amount of replaced fragments.        'batch'          'batch', 'single'
visual_mask           Visual mask used for replacement.    'black-frame'    'black-frame', 'white-frame'
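
For reference, the arguments in the table above could be defined with argparse roughly as in the sketch below; this is illustrative only, and the actual configs.py in the repository may differ in structure and include additional options.

    import argparse

    def get_config():
        # Arguments mirroring the table above, with the same defaults and options.
        parser = argparse.ArgumentParser()
        parser.add_argument('--dataset', type=str, default='SumMe',
                            choices=['SumMe', 'TVSum'],
                            help='Used dataset in experiments.')
        parser.add_argument('--replacement_method', type=str, default='slice-out',
                            choices=['slice-out', 'input-mask', 'random', 'attention-mask'],
                            help='Applied replacement function.')
        parser.add_argument('--replaced_fragments', type=str, default='batch',
                            choices=['batch', 'single'],
                            help='Amount of replaced fragments.')
        parser.add_argument('--visual_mask', type=str, default='black-frame',
                            choices=['black-frame', 'white-frame'],
                            help='Visual mask used for replacement.')
        return parser.parse_args()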

Citation

If you find our work, code, or pretrained models useful in your work, please cite the following publication:

E. Apostolidis, G. Balaouras, V. Mezaris, I. Patras, "Explaining Video Summarization Based on the Focus of Attention", Proc. IEEE Int. Symposium on Multimedia (ISM), Dec. 2022.

BibTeX:

@INPROCEEDINGS{9666088,
    author    = {Apostolidis, Evlampios and Balaouras, Georgios and Mezaris, Vasileios and Patras, Ioannis},
    title     = {Explaining Video Summarization Based on the Focus of Attention},
    booktitle = {2022 IEEE International Symposium on Multimedia (ISM)},
    month     = {December},
    year      = {2022}
}

License

Copyright (c) 2022, Evlampios Apostolidis, Georgios Balaouras, Vasileios Mezaris, Ioannis Patras / CERTH-ITI. All rights reserved. This code is provided for academic, non-commercial use only. Redistribution and use in source and binary forms, with or without modification, are permitted for academic non-commercial use provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation provided with the distribution.

This software is provided by the authors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the authors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.

Acknowledgement

This work was supported by the EU Horizon 2020 programme under grant agreement H2020-951911 AI4Media.
