This repository contains the PyTorch implementation of the CRF structure for multi-label video classification. It uses I3D pre-trained models as base classifiers (I3D is reported in the paper "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset" by Joao Carreira and Andrew Zisserman).
This code is based on DeepMind's Kinetics-I3D and on AJ Piergiovanni's PyTorch implementation of the I3D pipeline.
This code was developed with Python 3.6 and PyTorch 0.4.0. It requires tensorboard_logger and h5py.
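Before training, it can help to confirm that the packages listed above are importable in the active environment. The following is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and the check only tests that each package can be found, not that its version matches the one used for development.

```python
import importlib.util
import sys

# Packages named in this README; "torch" is the import name for PyTorch.
REQUIRED = ["torch", "tensorboard_logger", "h5py"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```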
This pipeline uses DeepMind's I3D models pretrained on ImageNet and Kinetics (see Kinetics-I3D for details). They are provided as rgb_imagenet.pt and flow_imagenet.pt in the models/ directory.
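A quick way to verify that both pretrained files are in place before launching a run is sketched below. It assumes the script is started from the repository root, so that models/ resolves correctly; missing_checkpoints is a hypothetical helper, not a function from this code base.

```python
from pathlib import Path

# File names taken from this README.
PRETRAINED_FILES = ["rgb_imagenet.pt", "flow_imagenet.pt"]


def missing_checkpoints(models_dir="models"):
    """Return the pretrained file names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in PRETRAINED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing pretrained models:", ", ".join(missing))
    else:
        print("Both pretrained models were found in models/.")
```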
The base model can be trained using the following command:
python train_i3d.py -dataset 'charades' -mode 'flow' -save_model 'path_to_saving_directory' -root_train 'path_to_flow_training_data' -train_split 'path_to_train_charades.json' -root_eval 'path_to_flow_evaluation_data' -eval_split 'path_to_test_charades.json' -snippets 64 -batch_size 4 -batch_size_eval 4 -saving_steps 5000 -num_steps_per_update 1 -num_classes 157 -init_lr 0.1 -use_cls True
The -dataset option is either 'charades' or 'multithumos'; the -mode option is either 'flow' or 'rgb'.
To add the semi-CRF structure, add '-crf True' and set the desired regularization value with '-reg_crf', as follows:
python train_i3d.py -dataset 'charades' -mode 'rgb' -save_model 'path_to_saving_directory' -root_train 'path_to_rgb_training_data' -train_split 'path_to_train_charades.json' -root_eval 'path_to_rgb_evaluation_data' -eval_split 'path_to_test_charades.json' -snippets 64 -batch_size 4 -batch_size_eval 4 -saving_steps 5000 -num_steps_per_update 1 -num_classes 157 -init_lr 0.1 -use_cls True -crf True -reg_crf 1e-4
To add the fully-CRF structure, add '-conditional_crf True' in addition to '-crf True' and set the desired regularization value with '-reg_crf', as follows:
python train_i3d.py -mode 'rgb' -save_model 'path_to_saving_directory' -root_train 'path_to_rgb_training_data' -train_split 'path_to_train_thumos.json' -root_eval 'path_to_rgb_evaluation_data' -eval_split 'path_to_test_thumos.json' -snippets 64 -batch_size 4 -batch_size_eval 4 -saving_steps 5000 -num_steps_per_update 1 -num_classes 65 -init_lr 0.1 -crf True -use_cls True -conditional_crf True -reg_crf 1e-3
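Since the training commands above differ only in a few flags, they can also be assembled from Python, for example to try several -reg_crf values in turn. The sketch below is an illustration under stated assumptions: build_train_command is a hypothetical helper, the paths are the same placeholders used in the commands above, and train_i3d.py is assumed to be in the current directory. Only flags that appear in this README are used.

```python
import shlex
import sys


def build_train_command(reg_crf, mode="rgb", conditional=False,
                        save_dir="path_to_saving_directory"):
    """Assemble the argument list for train_i3d.py (hypothetical helper)."""
    # Flags and values mirror the Charades semi-CRF command shown above.
    args = {
        "-dataset": "charades",
        "-mode": mode,
        "-save_model": save_dir,
        "-root_train": f"path_to_{mode}_training_data",
        "-train_split": "path_to_train_charades.json",
        "-root_eval": f"path_to_{mode}_evaluation_data",
        "-eval_split": "path_to_test_charades.json",
        "-snippets": 64,
        "-batch_size": 4,
        "-batch_size_eval": 4,
        "-saving_steps": 5000,
        "-num_steps_per_update": 1,
        "-num_classes": 157,
        "-init_lr": 0.1,
        "-use_cls": "True",
        "-crf": "True",
        "-reg_crf": reg_crf,
    }
    cmd = [sys.executable, "train_i3d.py"]
    for flag, value in args.items():
        cmd += [flag, str(value)]
    if conditional:
        # The fully-CRF variant adds this flag on top of -crf True.
        cmd += ["-conditional_crf", "True"]
    return cmd


if __name__ == "__main__":
    for reg in (1e-4, 1e-3):
        cmd = build_train_command(reg)
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to launch it.
        print(" ".join(shlex.quote(part) for part in cmd))
```

Building the command as a list rather than a single string avoids quoting problems when a path contains spaces.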
A trained single-stream model can be evaluated from a saved checkpoint using the following command:
python eval_i3d.py -dataset 'charades' -mode 'rgb' -save_model 'path_to_saving_directory' -root_eval 'path_to_rgb_evaluation_data' -eval_split 'path_to_test_charades.json' -snippets 64 -batch_size_eval 1 -num_classes 157 -crf True -eval_checkpoint 750000
To evaluate the RGB and flow streams together, use:
python eval_i3d_2_streams.py -dataset 'charades' -save_model_rgb 'path_to_rgb_saving_directory' -save_model_flow 'path_to_flow_saving_directory' -root_eval_rgb 'path_to_rgb_test_data' -root_eval_flow 'path_to_flow_test_data' -eval_split 'path_to_test_charades.json' -snippets 64 -batch_size_eval 1 -crf True -num_classes 157 -eval_checkpoint_rgb 500000 -eval_checkpoint_flow 500000
To visualize logged events in TensorBoard, use:
tensorboard --logdir=path_to_saving_directory/tensorboard_logger
For 2-stream events (logged after running eval_i3d_2_streams.py), use:
tensorboard --logdir=path_to_rgb_saving_directory/tensorboard_logger_2_streams