This repository contains PyTorch code for analyzing different input modalities (for example, optical flow) for action recognition on the Kinetics dataset.
Models: I3D, 3D-ResNet, 3D-DenseNet, 3D-ResNeXt
Datasets: Kinetics, PHAV
Clone and install:
git clone https://github.com/MKowal2/multimodal_action.git
cd multimodal_action
pip install -r requirements.txt
python setup.py install
Requirements:
- Python 3.5+
- NumPy (developed with 1.15.0)
- PyTorch >= 1.0.0
- PIL (optional)
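
The requirements listed above can be checked before installing. The following is a minimal sketch, not part of this repository: the helper names (`parse_version`, `installed_version`, `check_environment`) are hypothetical, and it assumes each package exposes a `__version__` attribute, which NumPy, PyTorch, and Pillow do.

```python
import re
import sys
from importlib import import_module

MIN_PYTHON = (3, 5)    # "Python 3.5+" from the requirements list
MIN_TORCH = (1, 0, 0)  # "PyTorch >= 1.0.0" from the requirements list


def parse_version(text):
    """Turn a version string such as '1.0.0+cu117' into a tuple of ints."""
    parts = []
    # Drop any local build suffix after '+', then read leading digits per field.
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(module_name):
    """Return the module's __version__ string, or None if it is not importable."""
    try:
        module = import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_environment(python_version, versions):
    """Return a list of problems; an empty list means the requirements are met."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.5+ is required")
    if versions.get("numpy") is None:
        problems.append("NumPy is not installed")
    torch_version = versions.get("torch")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < MIN_TORCH:
        problems.append("PyTorch >= 1.0.0 is required, found " + torch_version)
    # PIL is optional, so its absence is not reported as a problem.
    return problems


if __name__ == "__main__":
    found = {name: installed_version(name) for name in ("numpy", "torch", "PIL")}
    for problem in check_environment(sys.version_info, found):
        print(problem)
```

Running the script prints one line per unmet requirement and nothing when the environment is sufficient.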
Train ResNet-50 from scratch on Kinetics using the flow modality:
python train.py --dataset=kinetics --multi_modal --Flow --model=resnet --video_path=/home/Datasets/KINETICS/kinetics400 --annotation_path=/home/Datasets/KINETICS/kinetics400.json --model_depth=50 --spatial_size=112 --sample_duration=32 --optimizer=SGD --learning_rate=0.01
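
When launching several runs (for example, with different depths or learning rates), it can help to assemble the command above programmatically. The following is a hedged sketch: `build_train_command` is a hypothetical helper, not part of this repository, and it uses only the flags that appear in the command above.

```python
import shlex


def build_train_command(video_path, annotation_path, model_depth=50,
                        spatial_size=112, sample_duration=32,
                        optimizer="SGD", learning_rate=0.01):
    """Return the training command shown above as an argument list."""
    return [
        "python", "train.py",
        "--dataset=kinetics",
        "--multi_modal",
        "--Flow",
        "--model=resnet",
        "--video_path={}".format(video_path),
        "--annotation_path={}".format(annotation_path),
        "--model_depth={}".format(model_depth),
        "--spatial_size={}".format(spatial_size),
        "--sample_duration={}".format(sample_duration),
        "--optimizer={}".format(optimizer),
        "--learning_rate={}".format(learning_rate),
    ]


def as_shell_string(command):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in command)


command = build_train_command(
    video_path="/home/Datasets/KINETICS/kinetics400",
    annotation_path="/home/Datasets/KINETICS/kinetics400.json",
)
print(as_shell_string(command))
```

The argument list can be passed to `subprocess.run(command, check=True)` from the repository root to start training.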
- The models and general folder structure are based on this code: https://github.com/tomrunia/PyTorchConv3D
- Carreira and Zisserman - "Quo Vadis, Action Recognition?" (CVPR, 2017)
- de Souza et al. - "Procedural Generation of Videos to Train Deep Action Recognition Networks" (CVPR, 2017)
- Hara et al. - "Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?" (CVPR, 2018)