Large-scale Robustness Analysis of Video Action Recognition Models (CVPR 2023)

This repository is for Large-scale Robustness Analysis of Video Action Recognition Models by Madeline Chantry, Naman Biyani, Prudvi Kamtam, Shruti Vyas, Hamid Palangi, Vibhav Vineet, and Yogesh Singh Rawat.

Introduction

In this work we perform a robustness analysis of existing state-of-the-art (SOTA) models for video action recognition. We mainly focus on robustness against distribution shifts due to real-world perturbations instead of adversarial perturbations. In order to see if training on these distribution shifts improves performance, we trained on a variety of perturbations (discussed in the paper) for UCF101. We additionally trained on a SOTA augmentation method PixMix. This code is used for training and evaluation of these models with these different perturbations.

Setup

Environment

In order to install the correct packages, we have provided an environment.yml for conda and a requirements.txt for pip. If you are going to use conda, first edit the prefix filepath at the last line of environment.yml to where your local conda environments are stored. More information can be found in Managing Conda Environments.

Data

The datasets required for this repository are UCF101 and SomethingSomethingv2 (SSv2). Annotations for both datasets are located in data/ucf101 and data/ssv2.

The bulk of the perturbation code is in core. Perturbation code is found in core/video_perturb.py. Dataset annotations are in data/.

Usage

Below is an example script with annotations to explain each variable. The models that are currently supported and their respective frames are:

Model	Model Var	Num Frames
ResNet50	resnet50	8
I3D	i3d_r50	8
X3D	x3d	16
MViT	mvit	16
R3D	r2d1	8
SlowFast	slowfast_r50	32

PRETRAIN_PTH=torchhub  # This is if you are using the default weights from pytorchvideo, otherwise use own path
PRETRAIN_DATASET=kinetics400  # This is to set up the original linear classifier size
TYPE=ft  # This lets code know you are not training from scratch but loading pre-trained weights

TRAIN_DATASET=ssv2 # this is the dataset that you training/finetuning on, currently either ucf101 or ssv2
TRAIN_PATH=/home/schiappa/VideoRobustnessTraining/data/ssv2/train.csv  # Corresponding annotations
TEST_PATH=/home/schiappa/VideoRobustnessTraining/data/ssv2/test.csv  # Corresponding test annotations
ROOT_DIR=/home/c3-0/datasets/somethingsomething/videos/  # The root directory of the raw videos

MODEL=resnet50 # The model architecture you would like to train/test on
echo "Training resnet50 on SSv2"

# The outputpath is where all the results and weights will be stored.
OUTPUT_PATH=/home/schiappa/VideoRobustnessTraining/output/${MODEL}/${TYPE}_${TRAIN_DATASET}/${TRAIN_P}_3/save_last.pth

# This will run the script
python main.py $MODEL  \
                --pretrain_pth $PRETRAIN_PTH \ 
                --train_dataset $TRAIN_DATASET \
                --test_dataset $TRAIN_DATASET \
                --pretrain_dataset $PRETRAIN_DATASET \
                --train_path $TRAIN_PATH \
                --test_path $TEST_PATH \
                --root_dir $ROOT_DIR \
                --do_train \  # Add this flag if training
                --epochs 50 \
                --train_batch_size 32

# Iterate through test perturbations for evaluation
TEST_PERTURBATIONS="
temporal
spatial
pixmix
mixed
compression
defocus_blur
gaussian_noise
shot_noise
impulse_noise
rotate
static_rotate
speckle_noise
sampling
reverse_sampling
jumble
box_jumble
freeze
translate
"
PRETRAIN_DATASET=ucf101

# Now trained the model, the pretrained path is different and the pre-train dataset is the dataset trained on
PRETRAIN_PTH=/home/schiappa/VideoRobustnessTraining/output/${MODEL}/${TYPE}_${TRAIN_DATASET}/None_None/save_last.pth
for TEST_P in $TEST_PERTURBATIONS
do
    echo "Running testing for model ${PRETRAIN_PTH} for perturbation ${TEST_P}"
    python /home/schiappa/VideoRobustnessTraining/main.py $MODEL  \
                                                    --pretrain_pth $PRETRAIN_PTH \
                                                    --train_dataset $TRAIN_DATASET \
                                                    --test_dataset $TRAIN_DATASET \
                                                    --pretrain_dataset $TRAIN_DATASET \
                                                    --train_path $TRAIN_PATH \
                                                    --test_path $TEST_PATH \
                                                    --root_dir $ROOT_DIR \
                                                    --test_perturbation $TEST_P \
                                                    --test_severity 3
done

Pretrained Models

Model	Mixed $(\gamma^r)$	PixMix $(\gamma^r)$	Spatial $(\gamma^r)$	Temporal $(\gamma^r)$
MViT	0.59	0.29	0.08	0.21
R2D1	1.02	0.37	0.86	1.16
ResNet50	1.06	1.11	1.16	1.12
X3D	1.12	0.98	1.25	0.9

In this case, we treat the model trained on pertrubed data as "clean": $\gamma^r_p = 1 - (A^{f}_p - A^{f_p}_p)/ A^{f_p}_p$

UCF101-DS

To use UCF101-DS, first download the dataset from the UCF webpage

Citation

@inproceedings{robustness2022large,
title={Large-scale Robustness Analysis of Video Action Recognition Models},
author={Schiappa, Madeline C and Biyani, Naman and Kamtam, Prudvi and Vyas, Shruti and Palangi, Hamid and Vineet, Vibhav and Rawat, Yogesh},
booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}

maddy12 / actionrecognitionrobustnesseval Goto Github PK

actionrecognitionrobustnesseval's Introduction

Large-scale Robustness Analysis of Video Action Recognition Models (CVPR 2023)

Introduction

Setup

Environment

Data

Contents

Usage

Pretrained Models

UCF101-DS

Citation

actionrecognitionrobustnesseval's People

Contributors

Stargazers

Watchers

Forkers

actionrecognitionrobustnesseval's Issues

Recommend Projects

Recommend Topics

Recommend Org