ut-austin-rpl / giga Goto Github PK

Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

License: MIT License

Python 68.67% C 8.80% Mako 4.07% C++ 12.86% Cython 5.60%

robot-manipulation grasping robot-learning robotics affordance 3d-reconstruction

giga's People

Contributors

Stargazers

Watchers

giga's Issues

Training issue

I follow the command python scripts/train_giga.py --dataset /path/to/new/data --dataset_raw /path/to/raw/data , and change the path of data(pack).
At the beginning of training, I got the msg
import libmesh failed
import libkdtree failed
import utils failed

I still can continue to train.
But I got an error msg after finishing the first epoch

I don't know what happened

visualize grasp afforance

May I ask what software did you use to visualize the grasp afforance?
colored like this。。。

visual

Hello, how to the get these figures?

why process "pos" in dataset like this?

Why process "pos" in dataset like the line:

GIGA/src/vgn/dataset_voxel.py

Line 35 in d67c438

pos = pos / self.size - 0.5

Why the "self.size" in

GIGA/src/vgn/simulation.py

Line 37 in d67c438

self.size = 6 * self.gripper.finger_depth

define like this?

Why the do not use "pos" in " x, y, z" instead of "i, j k" in following lines?

GIGA/scripts/construct_dataset_parallel.py

Line 73 in d67c438

df = df.rename(columns={"x": "i", "y": "j", "z": "k"})

[Question] How to get the 3d reconstruction at inference time?

Hi, thank you very much for releasing this project!

I have a quick question regarding inference with GIGA. I see that I can predict grasps by using the VGNImplicit class and calling its __call__ function. When visualize is set to true, that function returns grasps, scores, toc, composed_scene, see here:

GIGA/src/vgn/detection_implicit.py

Line 83 in d67c438

return grasps, scores, toc, composed_scene

But from my understanding, the returned composed_scene is just the input tsdf + the added grasps, it does not contain the 3d reconstruction. Is that correct?

So my question is, how do I get the 3d reconstruction? To me it seems that the p_tsdf argument and the self.decoder_tsdf are relevant here, but I am not sure how to use them.

GIGA/src/vgn/ConvONets/conv_onet/models/__init__.py

Line 64 in d67c438

tsdf = self.decoder_tsdf(p_tsdf, c, **kwargs)

Assuming this is the right way to get the 3d reconstruction, what should I pass as p_tsdf? And then, how do I process and visualize the output of decoder_tsdf?

Thank you very much in advance, I am looking forward to hearing from you!

ValueError: file_type not specified!

Hi,
When I run sim_grasp_multiple.py and add the parameter --vis for simulation grasping, the following problem occurred:

(giga) wang@wang-U:~/github/GIGA$ python scripts/sim_grasp_multiple.py --num-view 1 --object-set pile/test --scene pile --num-rounds 100 --sideview --add-noise dex --force --best --model /home/wang/github/GIGA/data/models/giga_pile.pt --type giga --result-path /home/wang/github/GIGA/data/result --vis

pybullet build time: May 28 2020 16:37:34
Loading [giga] model from /home/wang/github/GIGA/data/models/giga_pile.pt
0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
File "scripts/sim_grasp_multiple.py", line 123, in
main(args)
File "scripts/sim_grasp_multiple.py", line 38, in main
success_rate, declutter_rate = clutter_removal.run(
File "/home/wang/github/GIGA/src/vgn/experiments/clutter_removal.py", line 89, in run
logger.log_mesh(scene_mesh, visual_mesh, f'round_{round_id:03d}trial{trial_id:03d}')
File "/home/wang/github/GIGA/src/vgn/experiments/clutter_removal.py", line 177, in log_mesh
aff_mesh.export(self.mesh_dir / (name + "_aff.obj"))
File "/home/wang/anaconda3/envs/giga/lib/python3.8/site-packages/trimesh/scene/scene.py", line 842, in export
return export.export_scene(
File "/home/wang/anaconda3/envs/giga/lib/python3.8/site-packages/trimesh/exchange/export.py", line 210, in export_scene
raise ValueError('file_type not specified!')
ValueError: file_type not specified!

What is the reason?
Thank you！ ^_^

Visualizing on custom dataset

What is the coordinate system / reference you take for adding the grasp visual in grasp2mesh()?

I tweaked your code a bit to run it on my dataset, but the gripper is way farther from the object mesh.
I ran the pre-trained model and got the best grasp params (affordance, rotation, width and center). Here is the code I used to visualize:

Code:

# Loading custom data mesh
mesh = trimesh.load('mydata/processed/meshes/map_{}.obj'.format(scene_id),force='scene')
grasp = Grasp(Transform(ori, pos, width)

finger_depth = 0.05
color = np.array([0, 250, 0, 180]).astype(np.uint8)
radius = 0.1 * finger_depth
w, d = grasp.width, finger_depth
pose = grasp.pose * Transform(Rotation.identity(), [0.0, -w / 2, d / 2])
scale = [radius, radius, d]
left_finger = trimesh.creation.cylinder(radius,
                                        d,
                                        transform=pose.as_matrix())
scene.add_geometry(left_finger, 'left_finger')

# right finger
pose = grasp.pose * Transform(Rotation.identity(), [0.0, w / 2, d / 2])
scale = [radius, radius, d]
right_finger = trimesh.creation.cylinder(radius,
                                            d,
                                            transform=pose.as_matrix())
scene.add_geometry(right_finger, 'right_finger')

# wrist
pose = grasp.pose * Transform(Rotation.identity(), [0.0, 0.0, -d / 4])
scale = [radius, radius, d / 2]
wrist = trimesh.creation.cylinder(radius,
                                    d / 2,
                                    transform=pose.as_matrix())
scene.add_geometry(wrist, 'wrist')

# palm
pose = grasp.pose * Transform(
    Rotation.from_rotvec(np.pi / 2 * np.r_[1.0, 0.0, 0.0]),
    [0.0, 0.0, 0.0])
scale = [radius, radius, w]
palm = trimesh.creation.cylinder(radius, w, transform=pose.as_matrix())
scene.add_geometry(palm, 'palm')
scene.add_geometry(trimesh.creation.axis())

Did you consider trying to avoid using the grasp data on the wrong voxels?

Hi, thanks for your great work!
I wonder if the training dataset includes those grasp data pairing with those ambiguous TSDF voxels, since you fuse the TSDF only by one single-view depth image, which definitely includes the ambiguous voxels in the occluded area. Intuitively, when you take as input those voxels with corresponding grasp data on them, this seems to affect the performance because of the ambiguous dataset.
Have you ever considered this, or already utilized a few tricks in this work to avoid this? It's important to me.
Thanks a lot.

How to use GIGA on real robot?

Hello! Thanks for your excellent work! I would like to use GIGA on real robot grasping, could you please provide relevant files (.py, yaml, launch...)?

Segmentation fault (core dumped)

Hi,thank you for you repo!
However, I encountered an error when running sim_grasp_multiple.py.
As shown in the figure

Could you give me some suggestions？

GIGA weights

Dear @yukezhu @Steve-Tod
Can you publicpretrained weights of GIGA and GIGA_AFF ?

how to query at a higher resolution of 60×60×60

Hi @Steve-Tod , I have two questions:

how to query at a higher resolution of 60x60x60? (I have tried directly set the resolution as 60 in the detection_implicit.py, but qual_vol[valid_voxels == False] = 0.0 gives me an error that the size of valid voxels doesn't match the size of qual_vol)
Without sideview, the performances decrease a lot. Do you have any explanations?

Thanks!

Training speed is too slow

Hi,I training one epoch on pile scene,and estimated time to spend more than 10 hours.

Here is my computer configuration：
pytorch 1.7.0 py3.8_cuda10.2.89_cudnn7.6.5_0
GPU:GeForce RTX 2070SUPER
CPU:Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Memory:16G
Hard Disk Drive:1T HDD

Here are some screenshots:

I found that GPU utilization was erratic and was 0 for a long time, and CPU utilization was not very high. So I presume it's because the io speed of the HDD hard disk drive is too slow.

I hope you can help me find the problem, thanks!

No module named vgn

when I try to run the command:

python scripts/sim_grasp_multiple.py --num-view 1 --object-set (packed/test | pile/test) --scene （packed ｜ pile) --num-rounds 100 --sideview --add-noise dex --force --best --model /path/to/model --type (vgn | giga | giga_aff) --result-path /path/to/result

it shows an error that no module named 'vgn', it seems like sim_grasp_multiple.py fails to import src/vgn or any file under vgn, I'm wondering how to fix it.

Re-implentation in real world

Hello! Thanks for your brilliant work.
I am implementing GIGA in the real-world setting based on the VGN code. However, the process doesn't go smoothly. I met the following problems:
(1) Can the checkpoints provided in the repo be used directly in the real-world setting? Or do we need to retrain the model with a different setting?
(2) I found that the generated grasps were not of high quality. Lots of them caused collisions with the object. I checked the point cloud collected by the camera and it is noisy. Does it matter? Do I need to do some post-processing? I use a realsense camera to re-implement it.
Depthmap from simulation

Depthmap from realsense camera

(3) For some grasps, the robot will collide with the object (or table) before it approaches the pre-grasp pose. What planner did you use to avoid it?

If you can give me some suggestions on it, I would really appreciate that.

Train GIGA

When I trained the GIGA, I have this problem

File "/home/wzh/GIGA/src/vgn/dataset_voxel.py", line 98, in read_occ
path_idx = torch.randint(high=len(occ_paths), size=(1,), dtype=int).item()
RuntimeError: random_ expects 'from' to be less than 'to', but got from=0 >= to=0

I don't know what cause this error.
Maybe I fail to save the valid occupancy data?

The program that generates data gets stuck in the first loop

When I run the program that generate-data as instructed in the Readme.md, it got stuck on the first loop. Then I added print to locate where the program was stuck, and found it at the following location:

generate_data.py -- "tsdf = create_tsdf(..."
perception.py -- "tsdf.integrate(depth_img[i], instrinsic, extrinsic)"

Furthermore, in the integrate function,

"self._volume.integrate(rgbd, intrinsic, extrinsic)"

I don't know why I'm stuck here, has anyone else encountered this problem?

Replenish:

My running environment is Ubuntu18.04+python3.8
This section is very similar to vgn, but vgn is not stuck, even if their input parameters at 'create_tsdf' are the same.

question about GIGA(HR)

hi, first of all, thanks for your contribution in the community!
My question is about some detail of randomly query grasp sample given a trained model.

In VGN, after we sample grasp candidates, we have to mask out those voxles which distance to the nearest surface is smaller than the finger depth. This is because we can not train sample in such case during self-supervised learning, so we can not inlcude this configuration into model.

It seems like we have to use TSDF information, which is a sparse information, has a shape of 1 * 40 * 40 * 40 in the experiment. I feel like in GIGA, we still needs this section for grasp detection, how can we gather the TSDF information of candidates if we sample the candidate randomly?

Visualization of data generation

Hi
May I ask how to visualize the process of self-supervised data generation just like you show on your project website?

Visualizing input TSDFs and grasp

Hi,

I'm trying to obtain:

TSDFs: A 3D visualization of the input TSDFs.
Grasp affordance: The visuals of the grasp affordance are not getting logged under the data/experiments/<>/meshes folder. There are no files with <name>_aff.obj> getting logged/saved (although the scenes are still getting saved as <name>_aff.obj> .

Please let me know what I can do achieve the above.

Can't log _aff.obj when running sim_grasp_multiple.py

Hi,
Thanks for your contribution to the community!

Q1. When I try to run the sim_grasp_multiple.py use your command in readme.md( I add --vis already), the system can only log _scene.obj while can not log _aff.obj.

def log_mesh(self, scene_mesh, aff_mesh, name):
aff_mesh.export(self.mesh_dir / (name + "_aff.obj"), file_type='obj')
scene_mesh.export(self.mesh_dir / (name + "_scene.obj"), file_type='obj')

The function to log scene and aff mesh is shown as above, I try to print aff_mesh and it shows like:

<trimesh.Scene(len(geometry)=3)>

which means it has object, but I can not log it. I'm wondering why?

Q2. when we log the info, there's two folder, one is meshes which contains meshes, the other one is a folder named as 'scenes', contain some .npz, I'm wondering what is the difference between the meshes we get and the .npz file which is the tsdf, is that just different representation of the same scene?

Q3. how can I visualize the grasp like the amazing visualization in your paper, with green and blue grippers for success and fail predictions.

Thanks for your time, very appreciate about your contribution again!

The time to generate the training set

Hello, may I ask some questions about datasets? Does the dataset used to train the model contain one million positive samples and one million negative samples? How much time does it take to generate these datasets in total?

Scene Descriptor

Do you have the scene descriptor in the pre-generated dataset provided? Such that we can reconstruct the scene while leverage the annotated grasping pose? Thanks!

problem about 'Save occupancy data'

Hi all,
I ran python scripts/save_occ_data_parallel.py /path/to/raw/data 100000 2 --num-proc 40 to generate occupancy data with the pre-generated data provided by you. I found this line in python scripts/save_occ_data_parallel.py encounter *** AttributeError: 'str' object has no attribute 'copy'. The environmental setting is aligned with the description in README.md

Just wonder whether there is any solution for this issue.
Any help appreciated :)

Installation error: " LINK : fatal error LNK1181: can not open input file“m.lib” "

Hello! I just began my robotics research which is very related to yours' . I am trying to follow your work in Github.
When I carried out the step 4 of Installation, firstly it said the cl.exe could not be found, so I installed Visual Studio 2017 and set the path of environment variables, like following:
env variable "path":

env variable "LIB":

env variable "INCLUDE":

But an error still occurred: LINK : fatal error LNK1181: can not open input file“m.lib” . The specific output in the command line:

I thought it might still caused by libpath or something, but I could not fix it after trials. I think it will definitely affect the following steps of installation or data generation so it is annoying.
Do you have some ideas or suggestions to fix it? Thanks a lot!

About NVIDIA Driver in WSL2

I have tried to install the program in wsl2 using ubuntu. If I use command "nvidia-smi' in linux, I can get a result:

But when I try to install cuda from nvidia by
wget https://developer.download.nvidia.com/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
sudo sh cuda_11.5.0_495.29.05_linux.run
It will warn like that:

Thus is it necessary to download and install the nvidia driver for linux 64-bit in nvidia website? I tried to do that, but I can not install it locally in wsl2-ubuntu system. It reminded that there is not an NVIDIA GPU in this system. Thanks!

New dataset

Hi,

I would like to use the pre-trained GIGA model on a custom dataset.
Also I would like to know what these are:

inputs: conditioning input
pos: sampled points
How to obtain occupancy points and occupancy values for my dataset
(I have tried to use RGB-D images -> ScalableTSDFs using Open3D)

Corruption or core dumped

hi, when i run the command:
python scripts/construct_dataset_parallel.py --num-proc 40 --single-view --add-noise dex /path/to/raw/data /path/to/new/data
An error like this:

and i change the num-proc into 1
An error like this:

libmesh failed!

In this step:"python scripts/save_occ_data_parallel.py /home/wzh/GIGA-main/data/raw_data 100000 2 --num-proc 40"
I have this error:
import libmesh failed!
Total jobs: 0, CPU num: 40

but I have installed libmesh 0.0.6

Some confusion in the paper

Hi,

I have a question about the paper.

In the section IV.C, when you discuss about Table I, you said:

Next, we compare the results of GIGA-Aff with GIGA. In the pile scenario, the gain from geometry supervision is relatively small (around 2% grasp success rate). However, in the packed scenario, GIGA outperforms GIGA-Aff by a large margin of around 5%. We believe this is due to the different characteristics of these two scenarios. From Figure 3, we can see that in the packed scene, some tall objects standing in the workspace would occlude the objects behind them and the oc�cluded objects are partially visible. We hypothesize that in this case, the geometrically-aware feature representation learned via geometry supervision facilitates the model to predict grasps on partially visible objects. Such occlusion is, however, less frequent in pile scenarios.

To summarize, you think pile scenarios has less occlusion than packed scenario.

However, in the same section, when you discuss about Fig 4, you said:

The last two rows show the affordance landscape and top grasps for two pile scenes. We see that baselines without the multi-task training of 3D reconstruction tend to generate failed or no grasp, whereas GIGA produces more diverse and accurate grasps due to the learned geometrically�aware representations.

It feels like you are saying that the reason why we do not have this property in packed scenario is because piled scenario needs geometrically aware feature more, which means it has more occlusions.

I'm very confused, why you have opposite conclusion for this? Also, if GIGA helps to have better grasp prediction in pile senario, like what you said in the anaylse of Fig 4, why the quantitive result in Table 1 does not have a significant improvement for GSR and DR from GIGA-AFF to GIGA in pile senario?

Thanks for your time and contribution again!

Questions about experimental results

Hi，

The results I got with the model you gave are different from those in your paper, and the results are different under different devices. I want to know if you have the same problem at that time. Is this because the test objects are randomly selected and randomly placed? Is the test scene generated each time not fixed, so sometimes the experimental results are different？

Looking forward to your answer. Thank you！

Any help appreciated :)

ut-austin-rpl / giga Goto Github PK

giga's People

Contributors

Stargazers

Watchers

Forkers

giga's Issues

Recommend Projects

Recommend Topics

Recommend Org