
hggd's Introduction

Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes

RA-L 2023

Official code for the paper "Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes"

Framework

(framework overview figure)

Requirements

  • Python >= 3.8
  • PyTorch >= 1.10
  • pytorch3d
  • numpy==1.23.5
  • pandas
  • cupoch
  • numba
  • grasp_nms
  • matplotlib
  • open3d
  • opencv-python
  • scikit-image
  • tensorboardX
  • torchsummary
  • tqdm
  • transforms3d
  • trimesh
  • autolab_core
  • cvxopt

Installation

This code has been tested on Ubuntu 20.04 with CUDA 11.1/11.3/11.6, Python 3.8/3.9, and PyTorch 1.11.0/1.12.0.

Get the code.

git clone https://github.com/THU-VCLab/HGGD.git

Create and activate a new Conda environment.

conda create -n hggd python=3.8
conda activate hggd
cd HGGD

Please install PyTorch and pytorch3d manually; the versions must match your CUDA toolkit.

# pytorch-1.11.0
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
# pytorch3d (the wheel index below is built for py38 + cu113 + pytorch 1.11.0 and must match the versions installed above)
pip install fvcore
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html

Install other packages via Pip.

pip install -r requirements.txt
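
Optionally, you can sanity-check the environment before moving on. The one-liner below simply prints the installed PyTorch, CUDA, and pytorch3d versions; it is only a quick check and not part of the official setup.

# prints torch version, CUDA version used to build torch, and pytorch3d version
python -c "import torch, pytorch3d; print(torch.__version__, torch.version.cuda, pytorch3d.__version__)"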

Usage

Checkpoint

Checkpoints (realsense/kinect) can be downloaded from Tsinghua Cloud

Preprocessed Dataset

Preprocessed datasets (realsense.7z/kinect.7z) can be downloaded from Tsinghua Cloud

They contain converted and refined grasp poses for each image in the GraspNet dataset.

Train

Training code has been released; please refer to the training script.

Typical hyperparameters:

batch-size # batch size, default: 4
step-cnt # step number for gradient accumulation, actual_batch_size = batch_size * step_cnt, default: 2
lr # learning rate, default: 1e-2
anchor-num # spatial rotation anchor number, default: 7
anchor-k # in-plane rotation anchor number, default: 6
anchor-w # grasp width anchor size, default: 50
anchor-z # grasp depth anchor size, default: 20
all-points-num # point cloud downsample number, default: 25600
group-num # number of points sampled per local region (farthest point sampling), default: 512
center-num # sampled local center/region number, default: 128
noise # point cloud noise scale, default: 0
ratio # grasp attributes prediction downsample ratio, default: 8
grid-size # grid size for our grid-based center sampling, default: 8
scene-l & scene-r # scene range, train: 0~100, seen: 100~130, similar: 130~160, novel: 160~190
input-w & input-h # downsampled input image size, should be 640x360
loc-a & reg-b & cls-c & offset-d # loss multipliers, default: 1, 5, 1, 1
epochs # training epoch number, default: 15
num-workers # dataloader worker number, default: 4
save-freq # checkpoint saving frequency, default: 1
optim # optimizer, default: 'adamw'
dataset-path # our preprocessed dataset path (read grasp poses)
scene-path  # original graspnet dataset path (read images)
joint-trainning # whether to jointly train the two parts of the network (note: the flag name keeps the "trainning" typo as it appears in the code)
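
For reference, a training run with the defaults above might be launched as sketched below. The entry-point name (train_graspnet.py) and the exact flag spellings are assumptions inferred from the flag list; the released training script is the authoritative command.

# NOTE: script name and flag spellings assumed; see the repository's training script for the actual command
# effective batch size = batch-size * step-cnt = 4 * 2 = 8
python train_graspnet.py \
    --batch-size 4 --step-cnt 2 --lr 1e-2 --optim adamw --epochs 15 \
    --anchor-num 7 --anchor-k 6 --anchor-w 50 --anchor-z 20 \
    --all-points-num 25600 --center-num 128 --group-num 512 \
    --grid-size 8 --ratio 8 --scene-l 0 --scene-r 100 \
    --dataset-path /path/to/preprocessed_dataset \
    --scene-path /path/to/graspnet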

Test

Download and unzip our preprocessed datasets (for convenience). Alternatively, you can remove the unnecessary parts of our test code and read images directly from the original graspnet dataset API.

Run the test code (reads RGB and depth images from the GraspNet dataset and evaluates grasps).

bash test_graspnet.sh

Attention: if you want to change the camera, remember to also change the camera setting in config.py.

Typical hyperparameters:

center-num # sampled local center/region number; a higher number yields more regions and grasps but slower speed, default: 48
grid-size # grid size for our grid-based center sampling, higher number means sparser centers, default: 8
ratio # grasp attributes prediction downsample ratio, default: 8
anchor-k # classification anchor number for grasp in-plane rotation, default: 6
anchor-w # regression anchor size for grasp width, default: 50
anchor-z # regression anchor size for grasp depth, default: 20
all-points-num # downsampled point cloud point number, default: 25600
group-num # local region point cloud point number, default: 512
local-k # grasp detection number in each local region, default: 10
scene-l & scene-r # scene range, train: 0~100, seen: 100~130, similar: 130~160, novel: 160~190
input-h & input-w # downsampled input image size, should be 640x360 (width x height)
local-thres & heatmap-thres # heatmap and grasp score filter threshold, set to 0.01 in our settings
dataset-path # our preprocessed dataset path (read grasp poses)
scene-path # original graspnet dataset path (read images)
num-workers # eval worker number
dump-dir # detected grasp poses dumped path (used in later evaluation)
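
As a concrete example, evaluating the 30 seen scenes with the defaults above might look like the sketch below. The flag spellings are assumed from the list; test_graspnet.sh in the repository is the authoritative command, and the checkpoint argument is omitted here because its flag is not listed above.

# NOTE: flag spellings assumed from the list above; see test_graspnet.sh for the actual arguments
python test_graspnet.py \
    --center-num 48 --grid-size 8 --ratio 8 \
    --anchor-k 6 --anchor-w 50 --anchor-z 20 \
    --all-points-num 25600 --group-num 512 --local-k 10 \
    --heatmap-thres 0.01 --local-thres 0.01 \
    --scene-l 100 --scene-r 130 \
    --dataset-path /path/to/preprocessed_dataset \
    --scene-path /path/to/graspnet \
    --dump-dir ./pred_grasps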

Demo

Run the demo code (reads RGB and depth images from file and predicts grasps).

bash demo.sh

Typical hyperparameters:

center-num # sampled local center/region number; a higher number yields more regions and grasps but slower speed, default: 48
grid-size # grid size for our grid-based center sampling, higher number means sparser centers, default: 8
all-points-num # downsampled point cloud point number, default: 25600
group-num # local region point cloud point number, default: 512
local-k # grasp detection number in each local region, default: 10
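
For example, a demo run with the defaults above might look like the sketch below. The flag spellings are assumed from the list; demo.sh shows the actual arguments, and the checkpoint and RGB/depth image paths also need to be supplied as done there.

# NOTE: flag spellings assumed; checkpoint and image paths omitted, see demo.sh
python demo.py \
    --center-num 48 --grid-size 8 \
    --all-points-num 25600 --group-num 512 --local-k 10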

Results

Attention: HGGD detects grasps only from heatmap guidance, without any workspace mask (adopted in Graspness) or object/foreground segmentation method (adopted in Scale-balanced Grasp). It may be useful to add some of this prior information to get better results.

Evaluation results on RealSense camera:

          Seen    Similar  Novel
In paper  59.36   51.20    22.17
In repo   64.45   53.59    24.59

Evaluation results on Kinect camera:

          Seen    Similar  Novel
In paper  60.26   48.59    18.43
In repo   61.17   47.02    19.37

Citation

Please cite our paper in your publications if it helps your research:

@article{chen2023efficient,
  title={Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes},
  author={Chen, Siang and Tang, Wei and Xie, Pengwei and Yang, Wenming and Wang, Guijin},
  journal={IEEE Robotics and Automation Letters},
  year={2023},
  publisher={IEEE}
}


hggd's Issues

How to generate grasp labels

Dear author,

I want to train HGGD for another dataset. I want to know how to generate grasp labels from Graspnet-1billion dataset. Would you like to share your code for grasp generation?

Thank you very much.

Best regards,
Shiyu

Modifications to graspnetAPI in Your Repo

Hi,

I'm exploring the version of graspnetAPI in your repo for my project. Could you briefly outline the main changes you've made compared to the original version? Specifically interested in any new features or major alterations.

Thanks for your work and looking forward to your reply.

Best,
Rui

Some objects not detected

Dear authors,

Thanks for your great work. This code shows significant improvement compared with GraspNet-1Billion. However, I am confused about why some objects are not detected. For example, the scissors in the demo are not detected for grasping. Is this because the scissors are not in the training dataset? How can I fix this issue?
Screenshot from 2024-02-24 17-09-30

Thank you very much.

Best regards,
Shiyu Li

question about the dataset

Hello,
Thanks for your work.
I downloaded the dataset from the link https://cloud.tsinghua.edu.cn/d/e3edfc2c8b114513b7eb/, and there are 189 scenes in this zip file. However, I notice that your paper mentions training requires 500 scenes.
My question is: can a model trained on a dataset of only 189 scenes reach the performance reported in the paper?

Problem when running test_graspnet.py

In your code, test_graspnet.py is split into two functions: def inference() and def evaluate().
In the log, I can see the info produced by the inference function:
{test_graspnet.py:140} INFO - Using saved anchors
{test_graspnet.py:382} INFO - Time stats:
{test_graspnet.py:383} INFO - Total: 34.181 ms
{test_graspnet.py:386} INFO - 2d: 7.364 ms data: 16.780 ms 6d: 5.255 ms colli:2.982 ms nms: 1.802 ms

But I never get the info from evaluate():
logging.info(f'Scene: {args.scene_l} ~ {args.scene_r}')
logging.info(f'colli == {colli}')
logging.info(f'ap == {ap}')
logging.info(f'ap0.8 == {aps[3]}')
logging.info(f'ap0.4 == {aps[1]}')

None of this information appears in the log.
Also, while testing scenes 100-130, the process gets stuck at
score list == [ 0.2 0.4 0.4 0.4 0.2 -1. 0.6 0.2 -1. 0.2 0.2 -1. 0.4 0.2 0.4 -1. 0.6 0.2 0.2 -1. 0.4 -1. 0.4 0.4 -1. 0.4 0.2 0.4 0.4 0.4 0.6 0.2 0.4 -1. ]
colli == 0.23529411764705882
Mean Accuracy for scene:0128 ann:0255 = 66.559
here, and then nothing moves forward.
I tried switching 100-130 to 130-160,
but it also gets stuck, at scene:0158 ann:0255.
I don't know where the problem is or why this happens.

some questions

Hello!
First of all, thank you very much for providing the code for us to learn from.
I have a few questions:

If I want to use HGGD to grasp new objects, can I directly use the checkpoint you provide, or do I need to build my own dataset and train on it?
How did you build your own dataset (for example, what tools did you use)?

Inquiry on Real-Time Grasp Prediction Methodology

I recently came across your work, and I must commend you on the groundbreaking results and the innovative approach your team has adopted. The videos showcasing real-time grasp prediction were particularly inspiring.

Would you be willing to share more insights or any advice on replicating such real-time grasp prediction capabilities?

Hyper-params for getting to work with 480x640 images?

Hey, thanks for releasing the code for the project!

I have one question, what do I need to change if I want to work with 480x640 images?

For now, in order to make them compatible with your framework out-of-the-box, I upscale the images to 720x1280, send them to the point-cloud helper to get the view_points and then call the networks, as you do in your demo.

My issue is that the final grasp predictions are going to be expressed in the upscaled pointcloud coordinates. So if I visualize the pointcloud I get from the resized 720x1280 images I get nice and tight grasps
Selection_525
but if I visualize them together with the original resolution pointcloud it is not (obviously).

I tried to manually post-process the translation component of the proposed grasps, e.g. multiplying by 0.5 to fit the resolution change, but it still doesn't work.

Is it possible to make your demo.py run for image resolution (480, 640)? Following your implementation in demo.py, I made the following changes:

  • Changed input_h, input_w in the arguments to half my desired size (240, 320), as from what I understand that's what is needed there (it is currently 360, 640 for 720x1280 images).
  • Changed the PointCloudHelper class to replace the (1280, 720) constants with (640, 480). In particular I changed the line ymap, xmap = np.meshgrid(np.arange(480), np.arange(640)) and the line factor = 640 / self.output_shape[0].

If I run the demo with this setting, I get the following error:

Traceback (most recent call last):
  File "grasp_proposal_test.py", line 353, in <module>
    pred_gg = inference(view_points,
  File "grasp_proposal_test.py", line 149, in inference
    pred_2d, perpoint_features = anchornet(x)
  File "/home/p300488/miniconda3/envs/hggd/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/p300488/6d_grasp_ws/src/hggd_grasp_server/src/inference/models/anchornet.py", line 213, in forward
    x = layer(x + xs[self.depth - i])
RuntimeError: The size of tensor a (29) must match the size of tensor b (30) at non-singleton dimension 3

I assume there may be some other hyperparameters that need to be modified, or some other change needed to make it run at my desired resolution? Please let me know if there is an easy fix for this! Thank you!

How to apply this to a UR3e

Hello, how can I apply the predicted results on a UR3e robot? Could you provide some references?

Only grasping pose detection on the edge of the scene

Screenshot from 2024-05-07 16-24-08

Hello, HGGD researchers,

Firstly, I'd like to express my gratitude for providing excellent research and a user-friendly implementation.

When I tried detecting grasp poses in my environment using the provided demo, I encountered an issue where the poses were only detected on the outer edges, as shown in the figure.
I would appreciate any feedback on this matter.

Thank you.

Meaning of grasp labels

Dear author,

I saw there are some grasp annotations in the label files. What is the meaning of translation_d, rotation_d and width_d?
{"numgrasp": 146, "translation_d": 6.7442166274596925e-09, "rotation_d": 5.012725293206674e-05, "width_d": 0.0028016255560493948}

Are they used in the training process?

Best regards,
Shiyu Li

Generate preprocessed grasp poses

Hi, I've seen that you use a different set of grasp labels generated from the original GraspNet-1Billion dataset. You provide the download link, but I didn't find how you originally generated them. Could you tell me how to generate them from scratch?

model training

Hello, I have reviewed your paper and code, and I feel that you have done a great job. Could you please share your training code? I would like to see the details of your model training. Thank you.

Regarding your new article: Rethinking 6-Dof Grasp Detection: A Flexible Framework for High-Quality Grasping

Hello author,
I recently read your latest article, Rethinking 6-Dof Grasp Detection: A Flexible Framework for High-Quality Grasping. I have a question about why the reproduced GS results differ between the two papers: did you enable collision detection processing when running the tests in the new paper? Also, will you open-source the code for your latest paper?
I would be grateful for your reply!
