This is a simple implementation of the Pose-based CNN Features for Action Recognition (P-CNN) algorithm. You can find detailed information on the project website.
Unlike the original algorithm, this implementation uses poses estimated by Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, and optical flow computed with FlowNet. Tested on the HMDB51 and UCF101 datasets, it achieves 27.19% and 51.23% accuracy respectively.
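As background, P-CNN builds a video-level descriptor by pooling per-frame CNN features over time (the paper uses min/max aggregation, plus aggregation of temporal differences). The sketch below shows only the basic static min/max step; the feature dimension and frame count are illustrative, not taken from this repo.

```python
import numpy as np

def aggregate_pcnn(frame_feats):
    """Min/max-pool per-frame CNN descriptors into one video descriptor.

    frame_feats: (T, D) array, one D-dim CNN feature per frame.
    Returns a (2*D,) vector: [min over frames, max over frames].
    """
    fmin = frame_feats.min(axis=0)
    fmax = frame_feats.max(axis=0)
    return np.concatenate([fmin, fmax])

# Illustrative example: 30 frames of hypothetical 4096-dim fc features.
feats = np.random.rand(30, 4096).astype(np.float32)
video_desc = aggregate_pcnn(feats)
print(video_desc.shape)  # (8192,)
```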
- Download the pre-trained CNN weights from the P-CNN project website.
- Run PCNN.py to extract the feature vectors for classification.
- Run linearSVM.py to classify actions with a linear SVM.
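The classification step amounts to fitting a linear SVM on the extracted video descriptors. A minimal sketch with scikit-learn is shown below; the random arrays stand in for the descriptors saved by PCNN.py and their action labels, and the L2-normalization is a common (assumed) preprocessing choice, not necessarily what linearSVM.py does.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import normalize

# Hypothetical stand-in data: in the demo, X_* would be P-CNN video
# descriptors from PCNN.py and y_* the corresponding action labels.
rng = np.random.RandomState(0)
X_train = rng.rand(100, 512)
y_train = rng.randint(0, 5, 100)
X_test = rng.rand(20, 512)
y_test = rng.randint(0, 5, 20)

# L2-normalize descriptors before the linear SVM (assumed preprocessing).
X_train = normalize(X_train)
X_test = normalize(X_test)

clf = LinearSVC(C=1.0)
clf.fit(X_train, y_train)
acc = (clf.predict(X_test) == y_test).mean()
print(f"accuracy: {acc:.4f}")
```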
Note: you will need to make some minor changes to adapt this demo to your own environment. The flow and pose features are pre-extracted for each frame so that a Dataloader can feed the data to the networks efficiently; update the data folder paths and the data-loading code accordingly.
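To illustrate the pre-extracted-feature pattern, here is a minimal Dataset-style loader. The directory layout (`root/<video_id>/frame_XXXX.npy`) is a hypothetical example, not this repo's actual structure; a `torch.utils.data.Dataset` would wrap the same `__len__`/`__getitem__` pair so a DataLoader can batch it.

```python
import os
import tempfile
import numpy as np

class FrameFeatureDataset:
    """Loads pre-extracted per-frame features (flow or pose) from disk.

    Assumed layout, one .npy feature file per frame:
        root/<video_id>/frame_0000.npy
    """
    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, vid, fname)
            for vid in os.listdir(root)
            for fname in os.listdir(os.path.join(root, vid))
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        return np.load(self.paths[i])

# Tiny demo with a fake pre-extracted feature file.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "video_001"))
np.save(os.path.join(root, "video_001", "frame_0000.npy"), np.zeros(8))
ds = FrameFeatureDataset(root)
print(len(ds), ds[0].shape)  # 1 (8,)
```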
Feel free to contact me if you have any questions.