This project forked from alibaba-mmai-research/tadaconv

[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.

Home Page: https://tadaconv-iclr2022.github.io

License: Apache License 2.0

TAda! TAdaConv for Video Understanding

This repository provides the official PyTorch implementation of the following papers on video classification and temporal localization. For details on each paper, refer to its project folder.

Video/Action Classification

Self-supervised video representation learning

Temporal Action Localization

About

This repository is released as part of EssentialMC2, the video understanding project from DAMO Academy. EssentialMC2 provides industry-level solutions to video understanding problems, including representation learning, relation reasoning, and open-set lifelong learning.

Latest

[2022-02] TAda2D features for action localization released.

[2022-01] TAdaConv accepted to ICLR 2022.

[2021-10] Code and models released.

Guidelines

Installation, data preparation and running

Using this repo involves three steps: installation, data preparation, and running. See GUIDELINES.md for details.

Using TAdaConv2d in your video backbone

To use TAdaConv2d in your video backbone, follow these steps:

# 1. copy tada_branch somewhere in your project 
#    and import TAdaConv2d, RouteFuncMLP
import torch.nn as nn
from tada_branch import TAdaConv2d, RouteFuncMLP

class Model(nn.Module):
  def __init__(self):
    super().__init__()  # required before registering submodules

    ...

    # 2. define tadaconv and the route func in your model
    self.conv_rf = RouteFuncMLP(
                c_in=64,            # number of input filters
                ratio=4,            # reduction ratio for MLP
                kernels=[3,3],      # list of temporal kernel sizes
    )
    self.conv = TAdaConv2d(
                in_channels     = 64,
                out_channels    = 64,
                kernel_size     = [1, 3, 3], # usually the temporal kernel size is fixed to be 1
                stride          = [1, 1, 1], # usually the temporal stride is fixed to be 1
                padding         = [0, 1, 1], # usually the temporal padding is fixed to be 0
                bias            = False,
                cal_dim         = "cin"
            )

    ...

  def forward(self, x):

    ...
    
    # 3. replace 'x = self.conv(x)' with the following line
    x = self.conv(x, self.conv_rf(x))

    ...

Model Zoo

Dataset  Architecture  Depth  #Frames  Acc@1  Acc@5  Checkpoint                        Config
SSV2     TAda2D        R50    8        64.0   88.0   [google drive][baidu(code:dlil)]  tada2d_8f.yaml
SSV2     TAda2D        R50    16       65.6   89.1   [google drive][baidu(code:f857)]  tada2d_16f.yaml
K400     TAda2D        R50    8 x 8    76.7   92.6   [google drive][baidu(code:p06d)]  tada2d_8x8.yaml
K400     TAda2D        R50    16 x 5   77.4   93.1   [google drive][baidu(code:6k8h)]  tada2d_16x5.yaml

More pre-trained models are listed in MODEL_ZOO.md.

Feature Zoo

We include strong features for action localization on HACS and Epic-Kitchens-100 in our FEATURE_ZOO.md.

Contributors

This codebase is written and maintained by Ziyuan Huang, Zhiwu Qing and Xiang Wang.

Citations

If you find our codebase useful, please consider citing the respective works :).

@inproceedings{huang2021tada,
  title={TAda! Temporally-Adaptive Convolutions for Video Understanding},
  author={Huang, Ziyuan and Zhang, Shiwei and Pan, Liang and Qing, Zhiwu and Tang, Mingqian and Liu, Ziwei and Ang Jr, Marcelo H},
  booktitle={{ICLR}},
  year={2022}
}
@inproceedings{mosi2021,
  title={Self-supervised motion learning from static images},
  author={Huang, Ziyuan and Zhang, Shiwei and Jiang, Jianwen and Tang, Mingqian and Jin, Rong and Ang, Marcelo H},
  booktitle={{CVPR}},
  pages={1276--1285},
  year={2021}
}
@article{huang2021towards,
  title={Towards training stronger video vision transformers for epic-kitchens-100 action recognition},
  author={Huang, Ziyuan and Qing, Zhiwu and Wang, Xiang and Feng, Yutong and Zhang, Shiwei and Jiang, Jianwen and Xia, Zhurong and Tang, Mingqian and Sang, Nong and Ang Jr, Marcelo H},
  journal={arXiv preprint arXiv:2106.05058},
  year={2021}
}
@article{qing2021stronger,
  title={A Stronger Baseline for Ego-Centric Action Detection},
  author={Qing, Zhiwu and Huang, Ziyuan and Wang, Xiang and Feng, Yutong and Zhang, Shiwei and Jiang, Jianwen and Tang, Mingqian and Gao, Changxin and Ang Jr, Marcelo H and Sang, Nong},
  journal={arXiv preprint arXiv:2106.06942},
  year={2021}
}
