movenet's Introduction

MoveNet

Unofficial implementation of MoveNet from Google. This repo borrows heavily from CenterNet and TorchVision.

See my other repo lee-man/movenet-pytorch for inference only.

Update

I removed the redundant code for other datasets and tasks, simplified the codebase, and added support for fine-tuning MoveNet on a customized dataset (named Active here).

Prepare the datasets

Move the images and annotations into data and name the folder Active. The annotations should be in COCO format. If you annotated the images in MPII format, you can use convert_active_to_coco.py and convert_mpii_to_coco.py to convert them.
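
For reference, a minimal COCO-style keypoint annotation file has roughly the following structure (a sketch only; the file name and values are placeholders, field names follow the standard COCO keypoints schema, and the keypoints are flattened [x, y, visibility] triplets for the 17 COCO joints):

# Sketch of a COCO-format keypoint annotation; values and file names are placeholders.
active_annotations = {
    "images": [
        {"id": 1, "file_name": "frame_0001.jpg", "width": 1920, "height": 1080},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "bbox": [640.0, 220.0, 410.0, 760.0],  # [x, y, width, height]
            "num_keypoints": 17,
            # 17 * 3 values: x, y, visibility (0 = unlabeled, 1 = occluded, 2 = visible)
            "keypoints": [850, 260, 2, 865, 250, 2, ...],
        },
    ],
    "categories": [
        {"id": 1, "name": "person", "keypoints": ["nose", "left_eye", "right_eye", ...]},
    ],
}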

Run training code

cd src
python main.py single_pose --exp_id yoga_movenet --dataset active --arch movenet --batch_size 24  --lr 5e-4 --gpus 0 --num_epochs 50 --lr_step 30 --num_workers 4 --load_model ../models/movenet.pth

Run evaluation code

cd src
python test.py single_pose --exp_id yoga_movenet --dataset active --arch movenet --resume

To directly test the pre-trained MoveNet, run:

cd src
python test.py single_pose --exp_id movenet --dataset active --arch movenet --load_model ../models/movenet.pth

Run demo code

  1. For the fine-tuned model, move the checkpoint to the models directory and run:
    cd src
    python demo.py single_pose --dataset active --arch movenet --demo ../images/active --load_model ../models/${MODEL} --K 1 --gpus -1 --debug 2
  2. For the original MoveNet, run:
    cd src
    python demo.py single_pose --dataset active --arch movenet --demo ../images --load_model ../models/movenet.pth --K 1 --gpus -1 --debug 2

Below is the original README from CenterNet. It's an excellent work and I really like it.

Objects as Points

Object detection, 3D detection, and pose estimation using center point detection:

Objects as Points,
Xingyi Zhou, Dequan Wang, Philipp Krähenbühl,
arXiv technical report (arXiv 1904.07850)

Contact: [email protected]. Any questions or discussions are welcome!

Updates

  • (June, 2020) We released a state-of-the-art Lidar-based 3D detection and tracking framework CenterPoint.
  • (April, 2020) We released a state-of-the-art (multi-category-/ pose-/ 3d-) tracking extension CenterTrack.

Abstract

Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point -- the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center point based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding box in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real-time.
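
To make the center-point idea concrete, here is a minimal sketch of the decoding step as described above: a 3x3 max-pool keeps only local maxima of the heatmap (which is why no NMS is needed), the top-K peaks are taken as object centers, and the size and offset predictions are gathered at those locations. Tensor names and shapes are illustrative; this is not this repo's exact implementation.

import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, reg, k=100):
    # heatmap: (B, C, H, W) class heatmaps after sigmoid
    # wh:      (B, 2, H, W) predicted box width/height at each location
    # reg:     (B, 2, H, W) sub-pixel offsets of the center point
    b, c, h, w = heatmap.shape

    # Keep only local maxima: a 3x3 max-pool stands in for NMS.
    peaks = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    heatmap = heatmap * (peaks == heatmap).float()

    # Take the top-K peaks over all classes and locations.
    scores, inds = torch.topk(heatmap.view(b, -1), k)
    classes = torch.div(inds, h * w, rounding_mode='floor')
    pix = inds % (h * w)                                   # flattened spatial index
    ys = torch.div(pix, w, rounding_mode='floor').float()
    xs = (pix % w).float()

    # Gather the size and offset predictions at the peak locations.
    def gather(t):
        return t.view(b, 2, -1).gather(2, pix.unsqueeze(1).expand(-1, 2, -1))
    sizes, offsets = gather(wh), gather(reg)

    cx, cy = xs + offsets[:, 0], ys + offsets[:, 1]
    boxes = torch.stack([cx - sizes[:, 0] / 2, cy - sizes[:, 1] / 2,
                         cx + sizes[:, 0] / 2, cy + sizes[:, 1] / 2], dim=2)
    return boxes, scores, classes   # boxes are in output-stride (heatmap) coordinates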

Highlights

  • Simple: One-sentence method summary: use a keypoint detection technique to detect the bounding box center point and regress all other object properties, such as bounding box size, 3D information, and pose.

  • Versatile: The same framework works for object detection, 3d bounding box estimation, and multi-person pose estimation with minor modification.

  • Fast: The whole process runs in a single network feed-forward pass. No NMS post-processing is needed. Our DLA-34 model runs at 52 FPS with 37.4 COCO AP.

  • Strong: Our best single model achieves 45.1 AP on COCO test-dev.

  • Easy to use: We provide a user-friendly testing API and webcam demos.

Main results

Object Detection on COCO validation

Backbone      | AP / FPS   | Flip AP / FPS | Multi-scale AP / FPS
Hourglass-104 | 40.3 / 14  | 42.2 / 7.8    | 45.1 / 1.4
DLA-34        | 37.4 / 52  | 39.2 / 28     | 41.7 / 4
ResNet-101    | 34.6 / 45  | 36.2 / 25     | 39.3 / 4
ResNet-18     | 28.1 / 142 | 30.0 / 71     | 33.2 / 12

Keypoint detection on COCO validation

Backbone      | AP   | FPS
Hourglass-104 | 64.0 | 6.6
DLA-34        | 58.9 | 23

3D bounding box detection on KITTI validation

Backbone | FPS | AP-E | AP-M | AP-H | AOS-E | AOS-M | AOS-H | BEV-E | BEV-M | BEV-H
DLA-34   | 32  | 96.9 | 87.8 | 79.2 | 93.9  | 84.3  | 75.7  | 34.0  | 30.5  | 26.8

All models and details are available in our Model zoo.

Installation

Please refer to INSTALL.md for installation instructions.

Use CenterNet

We support demos for images, image folders, videos, and webcam.

First, download the models (by default, ctdet_coco_dla_2x for detection and multi_pose_dla_3x for human pose estimation) from the Model zoo and put them in CenterNet_ROOT/models/.

For object detection on images or video, run:

python demo.py ctdet --demo /path/to/image/or/folder/or/video --load_model ../models/ctdet_coco_dla_2x.pth

We provide example images in CenterNet_ROOT/images/ (from Detectron). If set up correctly, the detected bounding boxes will be visualized on these images.

For the webcam demo, run:

python demo.py ctdet --demo webcam --load_model ../models/ctdet_coco_dla_2x.pth

Similarly, for human pose estimation, run:

python demo.py multi_pose --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/multi_pose_dla_3x.pth

The results for the example images will show the estimated poses overlaid on each image.

You can add --debug 2 to visualize the heatmap outputs, and --flip_test to enable flip testing.

To use CenterNet in your own project, you can:

import sys
CENTERNET_PATH = '/path/to/CenterNet/src/lib/'  # make CenterNet's lib importable
sys.path.insert(0, CENTERNET_PATH)

from detectors.detector_factory import detector_factory
from opts import opts

MODEL_PATH = '/path/to/model'
TASK = 'ctdet'  # or 'multi_pose' for human pose estimation
opt = opts().init('{} --load_model {}'.format(TASK, MODEL_PATH).split(' '))
detector = detector_factory[opt.task](opt)

img = '/path/to/your/image'  # an image path or an image array (e.g. from cv2.imread)
ret = detector.run(img)['results']

ret will be a Python dict: {category_id : [[x1, y1, x2, y2, score], ...], }
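
For the ctdet task, for example, detections above a score threshold can be pulled out of ret like this (the threshold value is a placeholder):

# ret maps category_id -> list of [x1, y1, x2, y2, score] detections.
score_threshold = 0.3  # hypothetical confidence threshold
for category_id, detections in ret.items():
    for x1, y1, x2, y2, score in detections:
        if score >= score_threshold:
            print('class %d: box=(%.1f, %.1f, %.1f, %.1f) score=%.2f'
                  % (category_id, x1, y1, x2, y2, score))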

Benchmark Evaluation and Training

After installation, follow the instructions in DATA.md to set up the datasets. Then check GETTING_STARTED.md to reproduce the results in the paper. We provide scripts for all the experiments in the experiments folder.

Develop

If you are interested in training CenterNet on a new dataset, using CenterNet for a new task, or using a new network architecture with CenterNet, please refer to DEVELOP.md. Also feel free to send us emails for discussion or suggestions.

Third-party resources

License

CenterNet itself is released under the MIT License (refer to the LICENSE file for details). Portions of the code are borrowed from human-pose-estimation.pytorch (image transform, resnet), CornerNet (hourglassnet, loss functions), dla (DLA network), DCNv2 (deformable convolutions), tf-faster-rcnn (Pascal VOC evaluation) and kitti_eval (KITTI dataset evaluation). Please refer to the original License of these projects (See NOTICE).

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@inproceedings{zhou2019objects,
  title={Objects as Points},
  author={Zhou, Xingyi and Wang, Dequan and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={arXiv preprint arXiv:1904.07850},
  year={2019}
}

movenet's People

Contributors

cansik, flymin, lee-man, rachellyy

movenet's Issues

Single-person pose with the COCO dataset

Hello, I'd like to ask a question: can the original COCO dataset be used directly for single-person pose training?
In src/lib/datasets/dataset/coco_hp.py, self.max_objs = 32, which as I understand it is not single-person.

A question about model training

For the single-person model built with get_pose_net(), the backbone is MobileNetV2, with "pretrained=False" and the default froze_backbone set to True. Wouldn't this cause the backbone to be randomly initialized and then never updated during training?

Is multi-person pose detection supported?

Hello, when I try to run the model it throws an exception saying multi-person keypoint detection is not supported. How can I modify it to add support? Or if it is not supported for now, could you explain why? Thanks.

Let's discuss

Hi author, I've recently been trying to reproduce MoveNet in PyTorch myself; perhaps we can discuss it together.

I am working from the structure of the TFLite model. There are a few things I'm still not clear about, so let me list them first:

  1. After the model's heads there are many complex ops; from inspection I can roughly tell what they do (argmax, for example). My first question is whether these ops need to be written into the PyTorch network structure. If so, we would presumably have to implement backward passes for unsupported ops like argmax, and exporting to ONNX would likely run into unsupported ops as well. So my current plan is to end the network at the convolutions of the four heads and implement the rest in post-processing code. I'm not sure whether this affects the results.
  2. How should the ground-truth data be constructed, or in other words how should the loss be computed? If all the post-processing were written into the network, it seems we could simply compute a regression loss on the output keypoint coordinates, but that is clearly not realistic; considering that aux losses and multi-task training both improve accuracy, the intermediate outputs should be used. The question is which ones. Right now I compute an MSE loss between the raw conv outputs and the corresponding generated ground truth, but it doesn't seem to help; I suspect the intermediate layers need further processing (see the heatmap sketch below).
  3. The meaning of some constants in the model is unclear. For example, there is a 48x48 matrix in the op named mul_2 whose values I cannot trace; they look like per-pixel weights. Another example: for the keypoint coordinates obtained from the center point plus the regressed offsets, a horizontally increasing matrix is subtracted from x and a vertically increasing matrix from y; given the squaring that follows, I think the intent is to make the value smallest at the keypoint location, but what is the purpose? To emphasize the keypoint values? A constant 1.8 is added afterwards, which I guess prevents division by zero, but how was 1.8 chosen? There should be a theoretical lower bound, e.g. based on what remains after subtracting the increasing matrices.

Those are the questions I have for now; I hope we can discuss them. There is very little related discussion online: no paper, nothing on Zhihu, just the official introductory blog post, which makes this hard to work on.
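
For reference on question 2 above, CenterNet-style training (which this repo builds on) constructs the heatmap ground truth by splatting a small Gaussian at each keypoint's location on the low-resolution output grid and penalizing the predicted heatmap against it. Below is a minimal sketch of that idea; the function names and the radius/sigma choice are illustrative, not this repo's exact code.

import numpy as np

def gaussian2d(radius, sigma):
    # (2*radius+1) x (2*radius+1) Gaussian kernel peaking at 1 in the centre.
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return np.exp(-(x * x + y * y) / (2 * sigma * sigma))

def draw_keypoint(heatmap, cx, cy, radius=2):
    # Splat a Gaussian peak for one keypoint onto a (H, W) heatmap, in place.
    h, w = heatmap.shape
    g = gaussian2d(radius, sigma=(2 * radius + 1) / 6.0)
    left, right = min(cx, radius), min(w - cx, radius + 1)
    top, bottom = min(cy, radius), min(h - cy, radius + 1)
    patch = heatmap[cy - top:cy + bottom, cx - left:cx + right]
    np.maximum(patch, g[radius - top:radius + bottom, radius - left:radius + right], out=patch)
    return heatmap

# Example: one keypoint at grid cell (x=12, y=20) on a 48x48 output heatmap.
hm = np.zeros((48, 48), dtype=np.float32)
draw_keypoint(hm, cx=12, cy=20)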

Can the trained model be used in the official Google Repo?

Great work on this repo! I have 2 questions.

  1. Can the model be converted to a compatible format that can be used in the original repository?
  2. How long does training take, and how many images should be in the training set to yield accurate results?

My goal is to retrain a MoveNet model and use it for inference in the browser with TFLite.js. As I understand it, Google only provides examples of how to use the models they provide, but does not provide the training code.

I look forward to hearing from you.

Active download?

Hi, thanks for putting this together. How do we go about downloading the Active dataset?

kpt_coordinate and kpt_offset_yx question

Hi,
Thank you for your great work on this repo!

I succeeded in running inference on my example images, but have a few questions regarding retraining.
I want to retrain MoveNet on a custom dataset (with 26 joints) and, as a first step, am trying to overfit on a few samples.
The loss is decreasing, but for some reason I could not overfit yet and display the results correctly.

Could you please kindly explain this line:

kpt_coordinate = (kpt_offset_yx + kpt_coordinate) * (1 / size)

Do I get it right that the coordinate is a value on the 64x64 output grid and the offset should be +-2, which we will add to the final coordinate (grid index * 4)? If this is correct, I am not sure I understand how this line reflects it.

Thanks a lot!
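
One possible reading of that line (my interpretation under stated assumptions, not a confirmed answer): kpt_coordinate is the integer peak index on the output heatmap, kpt_offset_yx is the sub-cell refinement gathered at that peak, and dividing by the heatmap size maps the result into normalized [0, 1] coordinates rather than multiplying by the output stride. Assuming a 64x64 heatmap and a 256x256 input (matching the stride of 4 mentioned above), a worked example:

# Hypothetical worked example for one axis of one keypoint.
size = 64               # output heatmap resolution (assumed)
kpt_coordinate = 10.0   # integer grid index of the heatmap peak
kpt_offset_yx = 0.3     # sub-cell offset gathered at that peak

normalized = (kpt_offset_yx + kpt_coordinate) * (1 / size)  # ~0.161, in [0, 1]
pixel = normalized * 256                                    # ~41.2 px on a 256x256 input
print(normalized, pixel)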

Hi, is movenet.pth trained, or ported from TF?

I tried to run it, but it seems the default config doesn't fit the pretrained weights under models:

Creating model...
Skip loading parameter backbone.body.6.conv.2.weight, required shapetorch.Size([56, 336, 1, 1]), loaded shapetorch.Size([32, 192, 1, 1]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.16.conv.2.weight, required shapetorch.Size([280, 1680, 1, 1]), loaded shapetorch.Size([160, 960, 1, 1]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.11.conv.1.0.bias, required shapetorch.Size([672]), loaded shapetorch.Size([384]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.5.conv.2.bias, required shapetorch.Size([56]), loaded shapetorch.Size([32]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.16.conv.2.bias, required shapetorch.Size([280]), loaded shapetorch.Size([160]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.13.conv.2.weight, required shapetorch.Size([168, 1008, 1, 1]), loaded shapetorch.Size([96, 576, 1, 1]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.3.conv.2.weight, required shapetorch.Size([40, 240, 1, 1]), loaded shapetorch.Size([24, 144, 1, 1]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.

Any ideas?

training error when loading the pth model

Hello, Lee. Thanks for your amazing work. I've got the following errors when I tried to train the model:

Creating model...
Skip loading parameter backbone.body.6.conv.2.weight, required shapetorch.Size([56, 336, 1, 1]), loaded shapetorch.Size([32, 192, 1, 1]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.16.conv.2.weight, required shapetorch.Size([280, 1680, 1, 1]), loaded shapetorch.Size([160, 960, 1, 1]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.11.conv.1.0.bias, required shapetorch.Size([672]), loaded shapetorch.Size([384]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.5.conv.2.bias, required shapetorch.Size([56]), loaded shapetorch.Size([32]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
Skip loading parameter backbone.body.16.conv.2.bias, required shapetorch.Size([280]), loaded shapetorch.Size([160]). If you see this, your model does not fully load the pre-trained weight. Please make sure you have correctly specified --arch xxx or set the correct --num_classes for your own dataset.
...
The conda env is CUDA 10.2, torch 1.9.0, and torchvision 0.10.0. It seems the shapes do not match?
Any suggestions? Thanks.

Is this a bug?

def get_affine_transform(center,
                         scale,
                         rot,
                         output_size,
                         shift=np.array([0, 0], dtype=np.float32),
                         inv=0):
    if not isinstance(scale, np.ndarray) and not isinstance(scale, list):
        scale = np.array([scale, scale], dtype=np.float32)

    scale_tmp = scale
    src_w = scale_tmp[0]
    dst_w = output_size[0]
    dst_h = output_size[1]

    rot_rad = np.pi * rot / 180
    src_dir = get_dir([0, src_w * -0.5], rot_rad)   # shouldn't this be get_dir([src_w * -0.5, 0], rot_rad)??
    dst_dir = np.array([0, dst_w * -0.5], np.float32)  # shouldn't this be np.array([dst_w * -0.5, 0], np.float32)??

    src = np.zeros((3, 2), dtype=np.float32)
    dst = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = center + scale_tmp * shift
    src[1, :] = center + src_dir + scale_tmp * shift
    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5], np.float32) + dst_dir

    src[2:, :] = get_3rd_point(src[0, :], src[1, :])
    dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :])

    if inv:
        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))

    return trans
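
For context, here is a hypothetical usage sketch of the function above, mirroring how CenterNet's pipeline typically feeds the transform into cv2.warpAffine; the image path and the 512x512 output size are placeholders.

import cv2
import numpy as np

# Hypothetical usage of get_affine_transform defined above.
img = cv2.imread('/path/to/image.jpg')
h, w = img.shape[:2]
center = np.array([w / 2.0, h / 2.0], dtype=np.float32)  # crop around the image center
scale = float(max(h, w))                                  # scalar, expanded to [s, s] inside
trans = get_affine_transform(center, scale, rot=0, output_size=[512, 512])
inp = cv2.warpAffine(img, trans, (512, 512), flags=cv2.INTER_LINEAR)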

Question about the batch_size setting

Hi author. In movenet/experiments/single_pose_movenet_coco.sh, I see you set batch_size=24 with 4 GPUs. When I train with batch_size=24, GPU memory usage is very low. If it's convenient, could you tell me what batch_size was actually used in training, and the corresponding learning rate?

training convergence and loss value

Thanks for the good implementation material.
I am currently implementing MoveNet Lightning with TensorFlow and training on a custom dataset.
(single pose setting)
However, it does not train well, and it does not even overfit on a small amount of data.

The training dataset I have configured has only one object per image.
However, the object occupies a relatively small proportion of the image (there is a lot of background);
specifically, the images are 1920x1080, most objects are located in the center, and there is a lot of empty space to the left and right of the object.

First, I wonder whether you had any problems with poor convergence during training.
Also, I am curious what value the loss converges to during training.

About model performance after training

I modified some lines so it can now load the Lightning MoveNet model, and I tried your training code using only the COCO dataset.

training samples: 64115
validation samples: 5000

When I fine-tune the published Lightning MoveNet, the loss looks like:
--->train|loss 4.476185 | hm_loss 1.625346 | hp_loss 1.318379 | hm_hp_loss 1.305676 | hp_offset_loss 0.226784 |
--->valid| loss 3.907756 | hm_loss 1.076794 | hp_loss 0.682829 | hm_hp_loss 2.044044 | hp_offset_loss 0.104088 |

When I train the Lightning MoveNet from scratch, the loss looks like:
--->train|loss 8.478377 | hm_loss 2.583637 | hp_loss 2.864822 | hm_hp_loss 2.782381 | hp_offset_loss 0.247538 |
--->valid| loss 7.517864 | hm_loss 1.706246 | hp_loss 1.409458 | hm_hp_loss 4.288480 | hp_offset_loss 0.113680 |
Is this good?

Then I tried showing the demo results with both models; the model trained from scratch cannot even detect keypoints.

Structure of Training Images

Hi Lee,

What is the expected file structure for the training images, the labels, and the "reference" between the two, to put into the Active folder?

I have COCO annotations for many consecutive frames in a .json, and the images separately, with the only reference between the two being the file path/name. I have attached a JSON file for a single frame for clarity.

Help would be greatly appreciated!

Thanks and best regards.
Julian
kkp_single_frame.txt
