Single-Stage Keypoint-based Category-level Object Pose Estimation from an RGB Image (ICRA 2022)

License: Other

Topics: deep-learning, object-pose-estimation, object-pose-tracking, pytorch, rgb

centerpose's Introduction

CenterPose

Overview

This repository is the official implementation of the paper Single-Stage Keypoint-based Category-level Object Pose Estimation from an RGB Image by Lin et al., ICRA 2022 (full citation below). For videos, please visit the CenterPose project site.

In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation, which operates on unknown object instances within a known category using a single RGB image input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative 3D bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, a single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark of real images, outperforming state-of-the-art methods for 3D IoU metric (27.6% higher than the single-stage approach of MobilePose and 7.1% higher than the related two-stage approach). The algorithm runs at 15 fps on an NVIDIA GTX 1080Ti GPU.

Tracking option

We also extend CenterPose to the tracking problem (CenterPoseTrack) as described in the paper Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence with Uncertainty Estimation by Lin et al., ICRA 2022 (full citation below). For videos, please visit the CenterPoseTrack project site.

We propose a single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category. Our method takes as input the previous and current frame from a monocular RGB video, as well as predictions from the previous frame, to predict the bounding cuboid and 6-DoF pose (up to scale). Internally, a deep network predicts distributions over object keypoints (vertices of the bounding cuboid) in image coordinates, after which a novel probabilistic filtering process integrates across estimates before computing the final pose using PnP. Our framework allows the system to take previous uncertainties into consideration when predicting the current frame, resulting in predictions that are more accurate and stable than single frame methods. Extensive experiments show that our method outperforms existing approaches on the challenging Objectron benchmark of annotated object videos. We also demonstrate the usability of our work in an augmented reality setting. The algorithm runs at 10 fps on an NVIDIA GTX 1080Ti GPU.
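
For intuition only, the final PnP step can be sketched as follows with OpenCV. This is a minimal illustration, not the repository's implementation; the keypoint ordering, the intrinsic matrix K, and the handling of the relative dimensions are assumptions.

    # Minimal PnP sketch (illustrative only, not the repository's implementation).
    # Assumes 8 predicted 2D cuboid vertices, relative box dimensions (w, h, d)
    # known only up to scale, and a known camera intrinsic matrix K.
    import numpy as np
    import cv2

    def cuboid_vertices(rel_dims):
        """8 corners of an axis-aligned box centered at the origin (object frame)."""
        w, h, d = rel_dims
        return np.array([[sx * w / 2, sy * h / 2, sz * d / 2]
                         for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                        dtype=np.float64)

    def pose_from_keypoints(kps_2d, rel_dims, K):
        """Recover rotation and translation (translation only up to the unknown scale)."""
        obj_pts = cuboid_vertices(rel_dims)              # 8 x 3, object frame
        img_pts = np.asarray(kps_2d, dtype=np.float64)   # 8 x 2, pixel coordinates
        ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None,
                                      flags=cv2.SOLVEPNP_EPNP)
        R, _ = cv2.Rodrigues(rvec)
        return ok, R, tvec

The 2D-3D correspondences must follow the same vertex ordering the network uses, and the recovered translation is meaningful only up to the unknown metric scale of the cuboid.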

Installation

The code was tested on Ubuntu 16.04 with Anaconda Python 3.6 and PyTorch 1.1.0. Newer versions should also work, possibly with small differences in accuracy. An NVIDIA GPU is required for both training and testing.


NOTE

For hardware-accelerated ROS2 inference support, please visit Isaac ROS CenterPose, which has been tested with ROS2 Foxy on Jetson AGX Xavier/JetPack 4.6 and on x86/Ubuntu 20.04 with RTX3060i.


  1. Clone this repo:

    CenterPose_ROOT=/path/to/clone/CenterPose
    git clone https://github.com/NVlabs/CenterPose.git $CenterPose_ROOT
    
  2. Create an Anaconda environment (or use your own virtual environment):

    conda create -n CenterPose python=3.6
    conda activate CenterPose
    pip install -r requirements.txt
    conda install -c conda-forge eigenpy
    
  3. Compile the deformable convolutional layer (a quick sanity check of the compiled extension is sketched after this list)

    git submodule init
    git submodule update
    cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
    ./make.sh
    

    [Optional] If you want to use a higher version of PyTorch, you need to download the latest version of DCNv2 and compile the library.

    git submodule set-url src/lib/models/networks/DCNv2 https://github.com/jinfagang/DCNv2_latest.git
    git submodule sync
    git submodule update --init --recursive --remote
    cd $CenterPose_ROOT/src/lib/models/networks/DCNv2
    ./make.sh
    
  4. Download our CenterPose pre-trained models and move all the .pth files to $CenterPose_ROOT/models/CenterPose/. Similarly, download our CenterPoseTrack pre-trained models and move all the .pth files to $CenterPose_ROOT/models/CenterPoseTrack/. We currently provide models for 9 categories: bike, book, bottle, camera, cereal_box, chair, cup, laptop, and shoe.

  5. Prepare training/testing data

    We save all the training/testing data under $CenterPose_ROOT/data/.

    For the Objectron dataset, we created our own data pre-processor to extract the data for training/testing. Refer to the data directory for more details.
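
Once the installation steps are done, the DCNv2 extension compiled in step 3 can be sanity-checked with a small forward pass. The snippet below is a rough sketch modeled on the DCNv2 test script; the exact constructor arguments may differ between DCNv2 versions.

    # Rough sanity check of the compiled DCNv2 extension (arguments follow the
    # DCNv2 test script and may vary between DCNv2 versions).
    import torch
    from dcn_v2 import DCN  # run from inside $CenterPose_ROOT/src/lib/models/networks/DCNv2

    layer = DCN(64, 64, kernel_size=(3, 3), stride=1, padding=1,
                deformable_groups=2).cuda()
    x = torch.randn(2, 64, 32, 32).cuda()
    print(layer(x).shape)  # expected: torch.Size([2, 64, 32, 32])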

Demo

We provide demos for images, videos, webcam input, and image folders. See $CenterPose_ROOT/images/CenterPose for sample images.

For category-level 6-DoF object estimation on images/video/image folders, run:

cd $CenterPose_ROOT/src
python demo.py --demo /path/to/image/or/folder/or/video --arch dlav1_34 --load_model ../path/to/model

Similarly, for category-level 6-DoF object tracking, run:

cd $CenterPose_ROOT/src
python demo.py --demo /path/to/folder/or/video --arch dla_34 --load_model ../path/to/model --tracking_task

You can also enable --debug 2 to display more intermediate outputs or --debug 4 to save all the intermediate and final outputs.

For the webcam demo (you may want to specify the camera intrinsics via --cam_intrinsic), run:

cd $CenterPose_ROOT/src
python demo.py --demo webcam --arch dlav1_34 --load_model ../path/to/model

Similarly, for tracking, run:

cd $CenterPose_ROOT/src
python demo.py --demo webcam --arch dla_34 --load_model ../path/to/model --tracking_task

Training

We follow the approach of CenterNet for training the DLA network, reducing the learning rate by 10x after epochs 90 and 120 and stopping after 140 epochs. Similarly, for CenterPoseTrack, we train the DLA network, reducing the learning rate by 10x after epochs 6 and 10 and stopping after 15 epochs.
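
As a reference for this schedule, here is a minimal sketch using PyTorch's MultiStepLR; the model, optimizer, and base learning rate below are placeholders, not the repository's settings.

    # Sketch of the schedule described above: drop the learning rate by 10x
    # after epochs 90 and 120, and stop after 140 epochs.
    # (Model, optimizer, and base LR are placeholders, not the repo's settings.)
    import torch

    model = torch.nn.Linear(10, 10)
    optimizer = torch.optim.Adam(model.parameters(), lr=1.25e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[90, 120], gamma=0.1)

    for epoch in range(140):
        # ... one training epoch: forward, loss, backward, optimizer.step() ...
        scheduler.step()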

For debugging purposes, you can set all the local training parameters directly in the $CenterPose_ROOT/src/main_CenterPose.py script (and, for CenterPoseTrack, in $CenterPose_ROOT/src/main_CenterPoseTrack.py). You can also pass them on the command line instead. More options are listed in $CenterPose_ROOT/src/lib/opts.py.

To start a new training job, simply do the following, which will use default parameter settings:

cd $CenterPose_ROOT/src
python main_CenterPose.py

The result will be saved in $CenterPose_ROOT/exp/object_pose/$dataset_$category_$arch_$time, e.g., objectron_bike_dlav1_34_2021-02-27-15-33.

You can then use TensorBoard to visualize the training process:

cd $path/to/folder
tensorboard --logdir=logs --host=XX.XX.XX.XX

Evaluation

We evaluate our method on the Objectron dataset; please refer to the objectron_eval directory for more details.

Citation

Please cite the following if you use this repository in your publications:

@inproceedings{lin2022icra:centerpose,
  title={Single-Stage Keypoint-based Category-level Object Pose Estimation from an {RGB} Image},
  author={Lin, Yunzhi and Tremblay, Jonathan and Tyree, Stephen and Vela, Patricio A. and Birchfield, Stan},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2022}
}

@inproceedings{lin2022icra:centerposetrack,
  title={Keypoint-Based Category-Level Object Pose Tracking from an {RGB} Sequence with Uncertainty Estimation},
  author={Lin, Yunzhi and Tremblay, Jonathan and Tyree, Stephen and Vela, Patricio A. and Birchfield, Stan},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2022}
}

License

CenterPose is licensed under the NVIDIA Source Code License - Non-commercial.

centerpose's People

Contributors

hemalshahnv, nv-jeff, sbirchfield, uio96


centerpose's Issues

Question about obj_scale_loss

Hello, thank you for the nice work!
I have a question about obj_scale_loss.
Why do you use different forms of the scale loss in the training and validation phases?
Specifically, in the training phase:

    obj_scale_loss += self.crit_reg(output['scale'], batch['reg_mask'],
                                    batch['ind'], batch['scale']) / opt.num_stacks

    loss = torch.abs(target * mask - pred * mask).sum(dim=(2, 3))

and in the validation phase:

    # Calculate relative loss only on validation phase
    obj_scale_loss += self.crit_reg(output['scale'], batch['reg_mask'],
                                    batch['ind'], batch['scale'], relative_loss=True) / opt.num_stacks

    target_rmzero = target.clone()
    target_rmzero[target_rmzero == 0] = 1e-06
    loss = torch.abs((1 * mask - pred * mask) / target_rmzero).sum(dim=(2, 3))

torch.abs(target * mask - pred * mask) and torch.abs((1 * mask - pred * mask) / target_rmzero) do not produce the same values.
I want to know the meaning of the "relative loss" in the validation phase and why it is only used in the validation phase.
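
For reference, the two quoted expressions can be compared side by side on toy tensors (the shapes and values below are illustrative, not taken from the training pipeline); they indeed give different values:

    # Toy comparison of the two scale-loss forms quoted above (illustrative shapes).
    import torch

    pred   = torch.tensor([[[[1.0, 2.0]]]])   # predicted relative dimensions
    target = torch.tensor([[[[2.0, 4.0]]]])   # ground-truth relative dimensions
    mask   = torch.ones_like(target)

    # Training form: absolute L1 difference.
    abs_loss = torch.abs(target * mask - pred * mask).sum(dim=(2, 3))

    # Validation form as quoted: difference from 1, divided by the target magnitude.
    target_rmzero = target.clone()
    target_rmzero[target_rmzero == 0] = 1e-06
    rel_loss = torch.abs((1 * mask - pred * mask) / target_rmzero).sum(dim=(2, 3))

    print(abs_loss)  # tensor([[3.]])
    print(rel_loss)  # tensor([[0.2500]])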

I find some suspicious code in decode process

In decode.py, line 188:

    mask_2 = (mask_2 == 7).float().expand(batch, num_joints, K, 2)

But when I debug this code, the values are all False, since mask_2 is a bool tensor (also, my PyTorch version is 2.0.0; maybe it is correct in a lower version).
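
For context, in recent PyTorch versions a bool tensor compared against the integer 7 can never be True, since bool values promote to 0/1 before the comparison; a standalone reproduction (not the repository's code):

    # A bool tensor compared against 7 is always False in recent PyTorch,
    # because True/False promote to 1/0 before the comparison.
    import torch

    mask_2 = torch.tensor([True, False, True])
    print(mask_2 == 7)        # tensor([False, False, False])
    print(mask_2.int() == 1)  # tensor([ True, False,  True])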

eval problem

Could you please provide a detailed procedure for evaluating the model? I tested it according to the code you provided, but it does not output any results. Thanks.

Run with CPU

Is there a way to run the demo with CPU only?
I am having many errors getting DCNv2 working with CUDA.

Way to generate category labels for CenterPose

Hello,

Thanks for your work. Is there a way to generate category labels in CenterPose as you do for DOPE during instance segmentation? Since there can be multiple categories, it is confusing if the network does not generate a label for each category.

Thanks in advance.

RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

(CenterPose) dell1804@dell1804-G3-3590:~/center_pose_ws/CenterPose/src$ python demo.py --demo ../data/book.jpg --arch dlav1_34 --load_model ../models/CenterPose/book_v1_140.pth
/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/sklearn/utils/linear_assignment_.py:22: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
FutureWarning)
Fix size testing.
training chunk_sizes: [1]
The output will be saved to /home/dell1804/center_pose_ws/CenterPose/src/lib/../../exp/object_pose/default
heads {'hm': 1, 'wh': 2, 'hps': 16, 'reg': 2, 'hm_hp': 8, 'hp_offset': 2, 'scale': 3}
Creating model...
loaded ../models/CenterPose/book_v1_140.pth, epoch 140
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
Traceback (most recent call last):
  File "demo.py", line 156, in <module>
    demo(opt, meta)
  File "demo.py", line 83, in demo
    ret = detector.run(image_name, meta_inp=meta)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/detectors/base_detector.py", line 474, in run
    images, self.pre_images, pre_hms, pre_hm_hp, pre_inds, return_time=True)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/detectors/object_pose.py", line 135, in process
    output = self.model(images, pre_images, pre_hms, pre_hm_hp)[-1]
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 531, in forward
    x = self.dla_up(x)
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 441, in forward
    ida(layers, len(layers) - i - 2, len(layers))
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 415, in forward
    layers[i] = upsample(project(layers[i]))
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 387, in forward
    x = self.conv(x)
  File "/home/dell1804/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/DCNv2/dcn_v2.py", line 128, in forward
    self.deformable_groups)
  File "/home/dell1804/center_pose_ws/CenterPose/src/lib/models/networks/DCNv2/dcn_v2.py", line 31, in forward
    ctx.deformable_groups)
RuntimeError: cublas runtime error : the GPU program failed to execute at /pytorch/aten/src/THC/THCBlas.cu:425

(CenterPose) dell1804@dell1804-G3-3590:~/center_pose_ws/CenterPose/src$ nvidia-smi
/usr/bin/nvidia-modprobe: unrecognized option: "-s"

ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help for usage information.

/usr/bin/nvidia-modprobe: unrecognized option: "-s"

ERROR: Invalid commandline, please run /usr/bin/nvidia-modprobe --help for usage information.

Sun Oct 9 11:10:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 50C P8 2W / N/A | 1083MiB / 3911MiB | 14% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1165 G /usr/lib/xorg/Xorg 226MiB |
| 0 N/A N/A 1846 G /usr/bin/gnome-shell 50MiB |
| 0 N/A N/A 3778 G ...428520904353170423,131072 72MiB |
| 0 N/A N/A 24592 C python 727MiB |
+-----------------------------------------------------------------------------+

Python 3.6.15 | packaged by conda-forge | (default, Dec 3 2021, 18:49:41)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> torch.__version__
'1.1.0'

Can't download your pre-trained models

Hi, thanks for your code, but I have a problem downloading your pre-trained models. I tried different computers, but it does not work. I want to ask whether there is a problem on your side, because a few days ago my friend successfully downloaded the pre-trained models.

How to obtain the GT rotation matrix and euler angles labels of each object in Objectron?

Thank you for the released code. Recently, I have been working on a project using the Objectron dataset, and I need to obtain the GT rotation matrix and Euler angle labels of each object. The labels provided by Objectron contain possible candidates for the GT rotation matrix (either camera.transform or camera.view_matrix). This is also explained in a related official issue in Objectron.

Then, I used camera.view_matrix multiplied by object.rotation for each instance as the GT rotation matrix and wanted to visualize the orientation of each object, which requires converting the GT rotation matrix into the corresponding Euler angles. Below is my code.

    from scipy.spatial.transform import Rotation
    rot_mat_2 = np.transpose(rot_mat)  # rot_mat is the GT rotation matrix
    euler_angles = Rotation.from_matrix(rot_mat_2).as_euler("zxy", degrees=True)  # is this order right?
    [roll, pitch, yaw] = euler_angles

I plotted the three Euler angles for further checking. However, the orientation is not always visually correct. Some examples are shown below.

Positive / correct examples: (images omitted)

Negative / wrong examples: (images omitted)

I'm not sure whether this is my fault or whether the GT orientation labels have large noise from the human annotators. I noticed that you show correct visualizations of object orientation via Euler angles in your paper, but I cannot find the corresponding code in this repo. Could you please share your processing steps? Thank you very much.
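
One generic sanity check worth running before the conversion (a suggestion, not something from this repository): verify that the composed matrix is actually a valid rotation, since a non-orthonormal or left-handed matrix would yield misleading Euler angles.

    # Generic check: a valid rotation matrix is orthonormal with determinant +1.
    import numpy as np

    def is_rotation_matrix(R, tol=1e-5):
        return (np.allclose(R @ R.T, np.eye(3), atol=tol)
                and np.isclose(np.linalg.det(R), 1.0, atol=tol))

    # Example with the composed GT rotation from above (rot_mat assumed to be 3x3):
    # print(is_rotation_matrix(rot_mat))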

visualize 2D keypoints

I was wondering if there is a way to visualize the 2D keypoints during inference?
Thanks!

Number of needed RGB images for a custom object in my own custom dataset

Could you please suggest what would be a sensible number of frames for training CenterPose for a single custom class (in my own custom dataset)?

I have synthetic images, and each image may contain up to 3 types (3 different CAD types) of the same custom class.

I would like to know if you have suggestions on what would be a reasonable number of images for training purposes to get sensible results on real-world data of the same class.

Thanks a lot for your impressive work.

Hard to run eval_video_official.py & some training questions

Hi, I encountered some problems when using the code.
I followed data/Readme.md, downloaded the chair data, and then preprocessed it. After that, I got nothing inside the output folder (e.g., outf_all) except another empty folder named 'chair_train'.

I got a 'bug_lists.txt' file, which seems to include all of the related 'chair' category video names after preprocessing.

I see there is also a 'bug_list.txt' in the 'label' folder. What do you mean by 'bug_lists'?

Please help me figure out what went wrong. Thank you very much!

Training of own dataset

Hi,

I have tested this solution using the Objectron dataset and it works well. I would like to test CenterPoseTrack on my own dataset and would like to know how we can go about annotating our own dataset. Do you have some examples of training data generation?

Thanks

Kelvin

Question about scale factor on translation vector

Hello, thanks for this awesome work!

I have a question related to the scale factor: from your paper it's crystal clear to me that it's possible to recover just the relative size of the estimated 3D bounding box.

However, I was wondering whether the translation vector of the 6-DoF pose is estimated with absolute scale (given the correct intrinsic parameters) or not.

I remain at your disposal for further clarification about my question.

Question about experiments

Great work and a nicely documented repository.

First, is there a plan to update DCNv2 to the latest version so that it also works with the newest PyTorch?

Second, I have been wondering whether there is any follow-up work planned using the NOCS dataset.

RuntimeError: CUDA error: no kernel image is available for execution on the device

(CenterPose) root@eunseon-ASUS:~/CenterPose/src# python demo.py --demo /root/CenterPose/images/CenterPose/chair/00000.png --arch dlav1_34 --load_model ../models/CenterPose/chair_v1_140.pth 
/root/anaconda3/envs/CenterPose/lib/python3.6/site-packages/sklearn/utils/linear_assignment_.py:22: FutureWarning: The linear_assignment_ module is deprecated in 0.21 and will be removed from 0.23. Use scipy.optimize.linear_sum_assignment instead.
  FutureWarning)
Fix size testing.
training chunk_sizes: [1]
The output will be saved to  /root/CenterPose/src/lib/../../exp/object_pose/default
heads {'hm': 1, 'wh': 2, 'hps': 16, 'reg': 2, 'hm_hp': 8, 'hp_offset': 2, 'scale': 3}
Creating model...
Downloading: "http://dl.yf.io/dla/models/imagenet/dla34-ba72cf86.pth" to /root/.cache/torch/checkpoints/dla34-ba72cf86.pth
100%|################################################| 63228658/63228658 [00:39<00:00, 1594112.61it/s]
loaded ../models/CenterPose/chair_v1_140.pth, epoch 140
  THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=8 : invalid device function
Traceback (most recent call last):
  File "demo.py", line 156, in <module>
    demo(opt, meta)
  File "demo.py", line 83, in demo
    ret = detector.run(image_name, meta_inp=meta)
  File "/root/CenterPose/src/lib/detectors/base_detector.py", line 474, in run
    images, self.pre_images, pre_hms, pre_hm_hp, pre_inds, return_time=True)
  File "/root/CenterPose/src/lib/detectors/object_pose.py", line 135, in process
    output = self.model(images, pre_images, pre_hms, pre_hm_hp)[-1]
  File "/root/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 528, in forward
    x = self.base(x)
  File "/root/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/CenterPose/src/lib/models/networks/pose_dla_dcn.py", line 312, in forward
    x = self.base_layer(x)
  File "/root/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/root/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 99, in forward
    return F.relu(input, inplace=self.inplace)
  File "/root/anaconda3/envs/CenterPose/lib/python3.6/site-packages/torch/nn/functional.py", line 941, in relu
    result = torch.relu_(input)
RuntimeError: CUDA error: no kernel image is available for execution on the device

I am using CUDA 10.0, torch==1.11.0, torchvision==0.12.0.

Question about loss function

I noticed that you use 2D keypoints and relative cuboid dimensions for supervision. Could I also use the 6-DoF pose for supervision?
Would this 6-DoF loss backpropagate correctly? Does the PnP algorithm affect backpropagation?
Thanks for your reply!
