Spatial Temporal Graph Convolutional Networks (ST-GCN)

A graph convolutional network for skeleton based action recognition.

Introduction

This repository holds the codebase, dataset and models for the paper

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition Sijie Yan, Yuanjun Xiong and Dahua Lin, AAAI 2018.

[Arxiv Preprint]

Prerequisites

Our codebase is based on Python. There are a few dependencies needed to run the code. The major Python libraries we use are

  • PyTorch
  • NumPy
  • Other Python libraries, which can be installed with pip install -r requirements.txt

Data Preparation

We experimented on two skeleton-based action recognition datasets: NTU RGB+D and Kinetics-skeleton.

NTU RGB+D

NTU RGB+D can be downloaded from their website. Only the 3D skeletons (5.8GB) modality is required in our experiments. After downloading, use this command to build the database for training or evaluation:

python tools/ntu_gendata.py --data_path <path to nturgbd>

where <path to nturgbd> points to the 3D skeletons modality of the NTU RGB+D dataset you downloaded, for example data/NTU-RGB-D/nturgb+d_skeletons.

Kinetics-skeleton

Kinetics is a video-based dataset for action recognition that provides only raw video clips without skeleton data. To obtain the joint locations, we first resized all videos to a resolution of 340x256 and converted the frame rate to 30 fps. Then we extracted skeletons from each frame in Kinetics with OpenPose. The extracted skeleton data, which we call Kinetics-skeleton (7.5GB), can be downloaded directly from here.
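
A resize and frame-rate conversion of this kind can be reproduced with ffmpeg, for example (an illustrative invocation, not the exact script we used):

ffmpeg -i input.mp4 -vf scale=340:256 -r 30 output.mp4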

It is highly recommended to store the data on an SSD rather than an HDD for efficiency.

Testing Pretrained Models

Get trained models

We provide the pretrained model weights of our ST-GCN and of the baseline model Temporal-Conv [1]. The model weights can be downloaded by running the script

bash tools/get_models.sh

The downloaded models will be stored under ./model.

Evaluation

Once the datasets and models are ready, we can start the evaluation.

To evaluate the ST-GCN model pretrained on Kinetics-skeleton, run

python main.py --config config/st_gcn/kinetics-skeleton/test.yaml

For cross-view evaluation in NTU RGB+D, run

python main.py --config config/st_gcn/nturgbd-cross-view/test.yaml

For cross-subject evaluation in NTU RGB+D, run

python main.py --config config/st_gcn/nturgbd-cross-subject/test.yaml

Similarly, the configuration files for testing the baseline models can be found under ./config/baseline.

To speed up evaluation with multi-GPU inference, or to reduce memory cost by modifying the batch size, set --test-batch-size and --device like:

python main.py --config <config file> --test-batch-size <batch size> --device <gpu0> <gpu1> ...

Results

The expected Top-1 accuracies of the provided models are shown here:

Model           Kinetics-skeleton (%)   NTU RGB+D Cross View (%)   NTU RGB+D Cross Subject (%)
Baseline [1]    20.3                    83.1                       74.3
ST-GCN (Ours)   30.6                    88.9                       80.7

[1] Kim, T. S., and Reiter, A. 2017. Interpretable 3D human action analysis with temporal convolutional networks. In BNMW CVPRW.

Training

To train a new ST-GCN model, run

python main.py --config config/st_gcn/<dataset>/train.yaml [--work-dir <work folder>]

where <dataset> must be nturgbd-cross-view, nturgbd-cross-subject or kinetics-skeleton, depending on the dataset you want to use. The training results, including model weights, configurations and logging files, will be saved under ./work_dir by default, or under <work folder> if you specify one.

You can modify training parameters such as work-dir, batch-size, step, base_lr and device on the command line or in the configuration files. The order of priority is: command line > config file > default parameter. For more information, use main.py -h. An example is shown below.
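
For example, a command that overrides the configuration file might look like the following (flag spellings are assumed to match the parameter names above):

python main.py --config config/st_gcn/kinetics-skeleton/train.yaml --batch-size 32 --device 0 --work-dir ./work_dir/my_run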

Finally, your custom model can be evaluated with the same command as mentioned above:

python main.py --config config/st_gcn/<dataset>/test.yaml --weights <path to model weights>

Citation

Please cite the following paper if you use this repository in your research.

@inproceedings{stgcn2018aaai,
  title     = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition},
  author    = {Sijie Yan and Yuanjun Xiong and Dahua Lin},
  booktitle = {AAAI},
  year      = {2018},
}

Contact

For any questions, feel free to contact:

Sijie Yan     : [email protected]
Yuanjun Xiong : [email protected]

Issues

Custom dataset with fewer junctions

Hello, I am currently rerunning the training and evaluation on Kinetics-skeleton; however, I want to use this repository on my custom dataset, which has fewer interest points and junctions. Can you advise me on what I need to pay attention to? Going through the code, it seems I will need to redefine the junctions of the body as a graph, for example.
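
A minimal sketch of what that redefinition could look like, modeled on the repository's Graph class; the joint count and bone list below are purely illustrative placeholders for your own keypoint layout:

import numpy as np

# Hypothetical 5-keypoint skeleton; these indices and bones are placeholders.
num_node = 5
self_link = [(i, i) for i in range(num_node)]
neighbor_link = [(0, 1), (1, 2), (1, 3), (1, 4)]

# Build and row-normalize the adjacency matrix fed to the graph convolution.
A = np.zeros((num_node, num_node))
for i, j in self_link + neighbor_link:
    A[i, j] = 1
    A[j, i] = 1
A = A / A.sum(axis=1, keepdims=True)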

st-gcn 1.0 train accuracy problem

@zhujiagang thank you for sharing your experiences at https://github.com/yysijie/st-gcn/issues/23. I want to train the ST-GCN model on the NTU RGB+D dataset. I used the old version of the code with default parameters, but after more than 80 epochs of training I got poor results. Then I modified the learning rate according to your paper, but the accuracy is still poor (Top1: 1.67%, Top5: 8.27%). As a new student in this field, I am puzzled as to whether I missed some details. I hope for your reply. Thanks a lot.

[ Sat Dec 22 18:59:51 2018 ] Training epoch: 80
[ Sat Dec 22 19:00:03 2018 ] Batch(0/589) done. Loss: 4.1821 lr:0.001000
[ Sat Dec 22 19:03:31 2018 ] Batch(100/589) done. Loss: 4.1529 lr:0.001000
[ Sat Dec 22 19:06:57 2018 ] Batch(200/589) done. Loss: 4.1635 lr:0.001000
[ Sat Dec 22 19:10:27 2018 ] Batch(300/589) done. Loss: 4.1190 lr:0.001000
[ Sat Dec 22 19:13:55 2018 ] Batch(400/589) done. Loss: 4.1429 lr:0.001000
[ Sat Dec 22 19:17:22 2018 ] Batch(500/589) done. Loss: 4.1611 lr:0.001000
[ Sat Dec 22 19:20:24 2018 ] Mean training loss: 4.1409.
[ Sat Dec 22 19:20:24 2018 ] Time consumption: [Data]01%, [Network]99%
[ Sat Dec 22 19:20:24 2018 ] Eval epoch: 80
[ Sat Dec 22 19:24:07 2018 ] Mean test loss of 296 batches: 4.106204550008516.
[ Sat Dec 22 19:24:08 2018 ] Top1: 1.26%
[ Sat Dec 22 19:24:08 2018 ] Top5: 9.44%

After modifying the learning rate according to your paper:

[ Tue Dec 25 06:20:27 2018 ] Training epoch: 20
[ Tue Dec 25 06:20:48 2018 ] Batch(0/589) done. Loss: 4.5612 lr:0.001000
[ Tue Dec 25 06:25:36 2018 ] Batch(100/589) done. Loss: 4.8982 lr:0.001000
[ Tue Dec 25 06:30:21 2018 ] Batch(200/589) done. Loss: 4.5673 lr:0.001000
[ Tue Dec 25 06:35:04 2018 ] Batch(300/589) done. Loss: 4.6968 lr:0.001000
[ Tue Dec 25 06:39:46 2018 ] Batch(400/589) done. Loss: 4.5363 lr:0.001000
[ Tue Dec 25 06:44:30 2018 ] Batch(500/589) done. Loss: 4.4839 lr:0.001000
[ Tue Dec 25 06:48:41 2018 ] Mean training loss: 4.6989.
[ Tue Dec 25 06:48:41 2018 ] Time consumption: [Data]01%, [Network]99%
[ Tue Dec 25 06:48:41 2018 ] Eval epoch: 20
[ Tue Dec 25 06:53:54 2018 ] Mean test loss of 296 batches: 4.28557998747439.
[ Tue Dec 25 06:53:54 2018 ] Top1: 1.52%
[ Tue Dec 25 06:53:55 2018 ] Top5: 8.95%

Trouble

Hello,

I'm just starting out in action recognition and I have a few questions.

First, I just wish to draw a skeleton from the NTU RGB+D dataset in Python (I have already downloaded the dataset).

Then, is it possible to make argparse work on Python 3.7? I installed all the packages except this one, and when I run your first command, it doesn't work.

Thank you for your answer,

Robin
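
A minimal sketch for the drawing part, assuming the joints of one frame have already been parsed from a .skeleton file into a NumPy array (the edge list here is a placeholder, not the full 25-joint NTU topology):

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (needed on older matplotlib)

def draw_skeleton(joints, edges):
    # joints: (V, 3) array of x, y, z coordinates for one frame;
    # edges: list of (i, j) joint-index pairs for the bones.
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    for i, j in edges:
        ax.plot([joints[i, 0], joints[j, 0]],
                [joints[i, 1], joints[j, 1]],
                [joints[i, 2], joints[j, 2]])
    ax.scatter(joints[:, 0], joints[:, 1], joints[:, 2])
    plt.show()

# Placeholder 4-bone edge list; replace with the real NTU skeleton topology.
# draw_skeleton(frame_joints, [(0, 1), (1, 2), (2, 3), (3, 4)])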

Environment details to repeat training/evaluation

I am trying to repeat the evaluation on Kinetics-skeleton, and I have been getting errors which I believe might be related to version conflicts. For example:

loss_value.append(loss.data[0])
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

I fixed it, then I got another one:

Traceback (most recent call last):
  File "main.py", line 426, in <module>
    processor.start()
  File "main.py", line 388, in start
    epoch=0, save_score=self.arg.save_score, loader_name=['test'])
  File "main.py", line 352, in eval
    ln, len(self.data_loader[ln]), np.mean(loss_value)))
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 2957, in mean
    out=out, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/_methods.py", line 80, in _mean
    ret = ret.dtype.type(ret / rcount)
AttributeError: 'torch.dtype' object has no attribute 'type'

etc.

Can you please describe the environment you ran this code under, e.g. the pytorch and torchvision versions? I am assuming you are using Python 2.7? Since it will not be supported for much longer, and since there are many version conflicts involving pytorch, CUDA, etc., I wonder if it is possible to make it work with Python 3 easily?

Update: For anyone wondering, I made it work with Python 2.7 and pytorch 0.4.0.
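
For anyone hitting the same two errors on PyTorch >= 0.4, the fixes suggested by the tracebacks look like this (a self-contained sketch, not a patch from the repository):

import numpy as np
import torch

loss = torch.tensor(4.18)        # stand-in for the training loss tensor
loss_value = []

# PyTorch >= 0.4 removed 0-dim tensor indexing; use .item() instead of loss.data[0]
loss_value.append(loss.item())

# Because .item() yields plain Python floats, np.mean no longer hits the
# "'torch.dtype' object has no attribute 'type'" error
print(np.mean(loss_value))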

Both validation (part of training) and test are done with val data?

In the config file for Kinetics-skeleton, training uses batches from the validation dataset during its eval steps, and the test step uses the validation dataset in its entirety.

Shouldn't the validation set be different from the final test set? Isn't the training using the validation steps to optimize training via the val_loss? Or is the training independent of this val_loss? Doesn't it use the validation dataset for any kind of regularization, or to fine-tune the model hyperparameters?

Lower performance with Kinetics skeleton

I am trying to repeat the tests on the Kinetics-skeleton dataset; however, it seems the loss does not really go down, and even when it does, it is quite jumpy and keeps rising again. Is there any difference from the original paper's implementation that might affect performance? Even after the 26th epoch the accuracy is only 18%, and the loss goes back and forth between 3.5 and 4.7.

I had first tested with my own custom dataset and got very low (and quite random) results even after the 200th epoch (it never really converged), so I wanted to rerun the tests on Kinetics in full to see whether I could reach the benchmark performance, but it seems not; I wonder what the reason might be.

3D dataset read in / middle formatted data and label files provided

Thanks for sharing your implementation. I was wondering if you could share the files below; or just a snippet that shows their format would also work:
data_path = "./data/NTU-RGB-D/xview/val_data.npy"
label_path = "./data/NTU-RGB-D/xview/val_label.pkl"

I am working with my own custom dataset. I used the format of the 2D Kinetics-skeleton dataset, so I have JSON files similar to OpenPose's output. Now I want to use the 3D version with my custom dataset, but the NTU RGB+D dataset is huge and its format is very complicated. I see that the tools create an intermediate format which is then read in (the files above). I want to bypass all this format reading and conversion, and figure out how to read in a 3D dataset in a way similar to the Kinetics dataset, meaning only the X, Y, Z coordinates of 5 joints (2 actors per frame) and the probability of the joint node coordinates. Can you share these files so that I can use this intermediate format instead of all the conversions required for the NTU dataset?
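
Not a substitute for the shared files, but the intermediate format can be inspected like this; the (N, 3, 300, 25, 2) shape below reflects the NTU defaults assumed from tools/ntu_gendata.py and is worth verifying against your checkout:

import pickle
import numpy as np

# Inspect the intermediate files produced by tools/ntu_gendata.py.
data = np.load("./data/NTU-RGB-D/xview/val_data.npy")
print(data.shape)  # expected (N, 3, 300, 25, 2): N samples; x, y, z channels;
                   # 300 padded frames; 25 joints; up to 2 bodies

with open("./data/NTU-RGB-D/xview/val_label.pkl", "rb") as f:
    sample_names, labels = pickle.load(f)
print(sample_names[0], labels[0])  # a sample name and an integer class label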

how to make my own dataset?

Hi, if I want to create my own dataset for recognizing specific actions, what should I do? I downloaded the Kinetics-skeleton dataset, but it is very large. I changed the training batch size; after running for a period of time, I got a memory-overflow error. I only have one GPU.
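
Regarding the memory overflow, lowering the batch size (see Training above) is usually enough for a single GPU. For a custom dataset, one option is to mimic the per-clip JSON layout of Kinetics-skeleton instead of downloading it in full; a sketch of that layout, with placeholder values and a hypothetical filename:

import json

# Hypothetical clip: 18 OpenPose-style joints; "pose" interleaves the x, y
# coordinates of each joint and "score" holds per-joint confidences.
clip = {
    "data": [
        {
            "frame_index": 1,
            "skeleton": [
                {"pose": [0.5, 0.3] * 18, "score": [0.9] * 18},
            ],
        },
        # ...one entry per frame, one skeleton dict per detected person
    ],
    "label": "my_action",   # placeholder class name
    "label_index": 0,       # placeholder class index
}

with open("my_clip.json", "w") as f:   # hypothetical filename
    json.dump(clip, f)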
