ut-austin-rpl / giga
Official PyTorch implementation of Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations
License: MIT License
I followed the command python scripts/train_giga.py --dataset /path/to/new/data --dataset_raw /path/to/raw/data and changed the data paths (packed).
At the beginning of training I got these messages:
import libmesh failed
import libkdtree failed
import utils failed
Training can still continue, but I got an error message after finishing the first epoch. I don't know what happened.
Hi, thank you very much for releasing this project!
I have a quick question regarding inference with GIGA. I see that I can predict grasps by using the VGNImplicit class and calling its __call__ function. When visualize is set to true, that function returns grasps, scores, toc, composed_scene; see here:
GIGA/src/vgn/detection_implicit.py
Line 83 in d67c438
But from my understanding, the returned composed_scene is just the input TSDF plus the added grasps; it does not contain the 3D reconstruction. Is that correct?
So my question is: how do I get the 3D reconstruction? It seems to me that the p_tsdf argument and self.decoder_tsdf are relevant here, but I am not sure how to use them. Assuming this is the right way to get the 3D reconstruction, what should I pass as p_tsdf? And then, how do I process and visualize the output of decoder_tsdf?
Thank you very much in advance, I am looking forward to hearing from you!
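To make the question concrete, here is the kind of dense query grid I imagine passing as p_tsdf. This is just a sketch: the resolution and the normalized coordinate range are my guesses, not the repo's documented API.

```python
import numpy as np

# Sketch: p_tsdf as a batch of 3D query points for the TSDF decoder,
# laid out as a dense regular grid over the (assumed) normalized workspace.
res = 40  # assumed resolution, matching the 40^3 input TSDF grid
coords = np.linspace(-0.5, 0.5, res)  # assumed normalized coordinate range
grid = np.stack(np.meshgrid(coords, coords, coords, indexing="ij"), axis=-1)
p_tsdf = grid.reshape(1, -1, 3).astype(np.float32)  # (batch, num_points, 3)
print(p_tsdf.shape)  # (1, 64000, 3)
```

The decoder output at these points could then presumably be reshaped back to (res, res, res) and turned into a mesh with marching cubes, but I would appreciate confirmation.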
Hi,
When I run sim_grasp_multiple.py and add the parameter --vis for simulation grasping, the following problem occurred:
(giga) wang@wang-U:~/github/GIGA$ python scripts/sim_grasp_multiple.py --num-view 1 --object-set pile/test --scene pile --num-rounds 100 --sideview --add-noise dex --force --best --model /home/wang/github/GIGA/data/models/giga_pile.pt --type giga --result-path /home/wang/github/GIGA/data/result --vis
pybullet build time: May 28 2020 16:37:34
Loading [giga] model from /home/wang/github/GIGA/data/models/giga_pile.pt
0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
File "scripts/sim_grasp_multiple.py", line 123, in <module>
main(args)
File "scripts/sim_grasp_multiple.py", line 38, in main
success_rate, declutter_rate = clutter_removal.run(
File "/home/wang/github/GIGA/src/vgn/experiments/clutter_removal.py", line 89, in run
logger.log_mesh(scene_mesh, visual_mesh, f'round_{round_id:03d}trial{trial_id:03d}')
File "/home/wang/github/GIGA/src/vgn/experiments/clutter_removal.py", line 177, in log_mesh
aff_mesh.export(self.mesh_dir / (name + "_aff.obj"))
File "/home/wang/anaconda3/envs/giga/lib/python3.8/site-packages/trimesh/scene/scene.py", line 842, in export
return export.export_scene(
File "/home/wang/anaconda3/envs/giga/lib/python3.8/site-packages/trimesh/exchange/export.py", line 210, in export_scene
raise ValueError('file_type not specified!')
ValueError: file_type not specified!
What is the reason?
Thank you! ^_^
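For reference, my current guess at the cause (an assumption, since I haven't traced trimesh internals): log_mesh passes a pathlib.Path to export(), and this trimesh version may only infer the file type from plain string paths.

```python
from pathlib import Path

# Sketch of the suspected issue: the export target is a pathlib.Path,
# built the same way the traceback shows.
mesh_dir = Path("data/experiments/meshes")
name = "round_000_trial_000"
target = mesh_dir / (name + "_aff.obj")
# Possible workarounds (untested assumptions on my side):
#   aff_mesh.export(str(target))               # convert the Path to a str
#   aff_mesh.export(target, file_type="obj")   # or pass file_type explicitly
print(target.suffix)  # ".obj"
```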
What coordinate system / reference frame do you use when adding the grasp visual in grasp2mesh()?
I tweaked your code a bit to run it on my dataset, but the gripper ends up far away from the object mesh.
I ran the pre-trained model and got the best grasp parameters (affordance, rotation, width and center). Here is the code I used to visualize:
# Assumes vgn is on the PYTHONPATH; scene_id, ori, pos, width come from the prediction
import numpy as np
import trimesh
from vgn.grasp import Grasp
from vgn.utils.transform import Rotation, Transform

# Loading custom data mesh as a trimesh.Scene
scene = trimesh.load('mydata/processed/meshes/map_{}.obj'.format(scene_id), force='scene')
# width is an argument of Grasp, not of Transform
grasp = Grasp(Transform(ori, pos), width)
finger_depth = 0.05
color = np.array([0, 250, 0, 180]).astype(np.uint8)
radius = 0.1 * finger_depth
w, d = grasp.width, finger_depth
# left finger
pose = grasp.pose * Transform(Rotation.identity(), [0.0, -w / 2, d / 2])
left_finger = trimesh.creation.cylinder(radius, d, transform=pose.as_matrix())
scene.add_geometry(left_finger, 'left_finger')
# right finger
pose = grasp.pose * Transform(Rotation.identity(), [0.0, w / 2, d / 2])
right_finger = trimesh.creation.cylinder(radius, d, transform=pose.as_matrix())
scene.add_geometry(right_finger, 'right_finger')
# wrist
pose = grasp.pose * Transform(Rotation.identity(), [0.0, 0.0, -d / 4])
wrist = trimesh.creation.cylinder(radius, d / 2, transform=pose.as_matrix())
scene.add_geometry(wrist, 'wrist')
# palm
pose = grasp.pose * Transform(Rotation.from_rotvec(np.pi / 2 * np.r_[1.0, 0.0, 0.0]), [0.0, 0.0, 0.0])
palm = trimesh.creation.cylinder(radius, w, transform=pose.as_matrix())
scene.add_geometry(palm, 'palm')
scene.add_geometry(trimesh.creation.axis())
Hi, thanks for your great work!
I wonder whether the training dataset includes grasp data paired with ambiguous TSDF voxels, since you fuse the TSDF from only a single-view depth image, which necessarily contains ambiguous voxels in the occluded areas. Intuitively, taking those voxels as input together with the grasp labels on them seems like it could hurt performance because of the ambiguous data.
Have you considered this, or did you already use some tricks in this work to avoid it? It's important to me.
Thanks a lot.
Hello! Thanks for your excellent work! I would like to use GIGA on real robot grasping, could you please provide relevant files (.py, yaml, launch...)?
Dear @yukezhu @Steve-Tod,
Could you publish the pretrained weights of GIGA and GIGA_AFF?
Hi @Steve-Tod, I have two questions:
In detection_implicit.py, qual_vol[valid_voxels == False] = 0.0 gives me an error that the size of valid_voxels doesn't match the size of qual_vol.
Thanks!
Hi, training one epoch on the pile scene is estimated to take more than 10 hours.
Here is my computer configuration:
PyTorch: 1.7.0 py3.8_cuda10.2.89_cudnn7.6.5_0
GPU: GeForce RTX 2070 SUPER
CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Memory: 16 GB
Hard disk: 1 TB HDD
I found that GPU utilization was erratic and stayed at 0 for long stretches, while CPU utilization was not very high either, so I presume the I/O speed of the HDD is too slow.
I hope you can help me find the problem, thanks!
When I try to run the command:
python scripts/sim_grasp_multiple.py --num-view 1 --object-set (packed/test | pile/test) --scene (packed | pile) --num-rounds 100 --sideview --add-noise dex --force --best --model /path/to/model --type (vgn | giga | giga_aff) --result-path /path/to/result
it shows an error: No module named 'vgn'. It seems sim_grasp_multiple.py fails to import src/vgn or any file under vgn; I'm wondering how to fix it.
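In case it helps, this is the kind of workaround I'm considering (the checkout path is an assumption; adjust it to your own clone):

```shell
# Hypothetical fix: put the repo's src/ directory on PYTHONPATH so
# `import vgn` resolves when running the scripts directly.
GIGA_ROOT="$HOME/github/GIGA"               # assumed checkout location
export PYTHONPATH="$GIGA_ROOT/src:$PYTHONPATH"
echo "$PYTHONPATH" | grep -o "GIGA/src" | head -n 1
```

Alternatively, if the repo ships a setup.py, running pip install -e . from the repo root should have the same effect.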
Hello! Thanks for your brilliant work.
I am implementing GIGA in a real-world setting based on the VGN code. However, the process hasn't gone smoothly; I ran into the following problems:
(1) Can the checkpoints provided in the repo be used directly in the real-world setting, or do we need to retrain the model with a different setting?
(2) I found that the generated grasps were not of high quality; many of them caused collisions with the object. I checked the point cloud collected by the camera and it is noisy. Does that matter? Do I need to do some post-processing? I use a RealSense camera for the re-implementation.
Depthmap from simulation
Depthmap from realsense camera
(3) For some grasps, the robot will collide with the object (or table) before it approaches the pre-grasp pose. What planner did you use to avoid it?
If you can give me some suggestions on it, I would really appreciate that.
When I trained GIGA, I hit this problem:
File "/home/wzh/GIGA/src/vgn/dataset_voxel.py", line 98, in read_occ
path_idx = torch.randint(high=len(occ_paths), size=(1,), dtype=int).item()
RuntimeError: random_ expects 'from' to be less than 'to', but got from=0 >= to=0
I don't know what causes this error. Maybe I failed to save valid occupancy data?
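For what it's worth, here is the sanity check I plan to run. The directory layout is my guess, not the repo's documented structure; the point is that the RuntimeError fires exactly when occ_paths is empty, so torch.randint(high=0, ...) has no valid range.

```python
from pathlib import Path

# Sketch of a guard: confirm occupancy files exist before sampling one,
# since len(occ_paths) == 0 makes torch.randint raise the reported error.
occ_dir = Path("data/raw_data/occ")  # assumed output dir of save_occ_data_parallel.py
occ_paths = sorted(occ_dir.glob("*.npz"))
if not occ_paths:
    print("no occupancy files found; rerun scripts/save_occ_data_parallel.py")
else:
    print(f"found {len(occ_paths)} occupancy files")
```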
When I run the data-generation program as instructed in the README.md, it gets stuck on the first loop. I added prints to locate where the program was stuck, and found it at the following locations:
generate_data.py -- "tsdf = create_tsdf(..."
perception.py -- "tsdf.integrate(depth_img[i], intrinsic, extrinsic)"
Furthermore, inside the integrate function:
"self._volume.integrate(rgbd, intrinsic, extrinsic)"
I don't know why it gets stuck here; has anyone else encountered this problem?
Addendum:
Hi, first of all, thanks for your contribution to the community!
My question is about a detail of randomly querying grasp samples given a trained model.
In VGN, after we sample grasp candidates, we have to mask out voxels whose distance to the nearest surface is smaller than the finger depth. This is because such samples cannot be generated during self-supervised learning, so this configuration is never included in the model's training data.
It seems we have to use the TSDF information, which is sparse and has shape 1 x 40 x 40 x 40 in the experiments. I feel that GIGA still needs this masking step for grasp detection; how can we gather the TSDF information for candidates if we sample them randomly?
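To illustrate what I mean, here is a toy numpy version of the masking. The thresholds and tensor layout are my assumptions, not VGN's exact code; it only shows the pattern of zeroing quality at voxels that fail a TSDF-based validity test.

```python
import numpy as np

# Toy sketch: zero out predicted grasp quality at voxels whose TSDF value
# suggests the nearest surface is too close (or the voxel is unobserved).
tsdf_vol = np.random.rand(1, 40, 40, 40).astype(np.float32)
qual_vol = np.random.rand(1, 40, 40, 40).astype(np.float32)

outside = tsdf_vol > 0.5                                   # assumed free space
inside = np.logical_and(tsdf_vol > 1e-3, tsdf_vol < 0.5)   # assumed near-surface band
valid_voxels = np.logical_or(outside, inside)
qual_vol[~valid_voxels] = 0.0
print(qual_vol.shape)
```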
Hi
May I ask how to visualize the process of self-supervised data generation just like you show on your project website?
Hi,
I'm trying to obtain the
data/experiments/<>/meshes
folder. There are no files named <name>_aff.obj getting logged/saved (although the scenes are still getting saved as <name>_scene.obj). Please let me know what I can do to achieve this.
Hi,
Thanks for your contribution to the community!
Q1. When I run sim_grasp_multiple.py with your command from the README.md (with --vis already added), the system only logs _scene.obj and cannot log _aff.obj.
def log_mesh(self, scene_mesh, aff_mesh, name):
aff_mesh.export(self.mesh_dir / (name + "_aff.obj"), file_type='obj')
scene_mesh.export(self.mesh_dir / (name + "_scene.obj"), file_type='obj')
The function that logs the scene and affordance meshes is shown above. Printing aff_mesh gives:
<trimesh.Scene(len(geometry)=3)>
so it does contain geometry, but I still cannot log it. I'm wondering why.
Q2. When we log the info there are two folders: one is 'meshes', which contains meshes; the other, named 'scenes', contains some .npz files. I'm wondering what the difference is between the meshes we get and the .npz files holding the TSDF: are they just different representations of the same scene?
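To make Q2 concrete, here is my current understanding as a sketch. The "grid" key and the shape are my assumptions about the .npz layout: the scenes/*.npz files would store the raw TSDF voxel grid, while meshes/ holds surfaces extracted from it, i.e. two views of the same scene.

```python
import numpy as np

# Stand-in TSDF grid, saved/loaded the way I assume the 'scenes' folder works.
voxel_grid = np.zeros((1, 40, 40, 40), dtype=np.float32)
np.savez_compressed("scene_demo.npz", grid=voxel_grid)
loaded = np.load("scene_demo.npz")["grid"]
print(loaded.shape)  # (1, 40, 40, 40)
```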
Q3. How can I visualize grasps like the amazing visualizations in your paper, with green and blue grippers for successful and failed predictions?
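For Q3, this is the kind of color mapping I have in mind (the threshold and exact RGBA values are my assumptions, based only on the green/blue convention in the figures):

```python
import numpy as np

# Toy sketch: map a grasp's predicted quality score to an RGBA color,
# green for predicted successes and blue for predicted failures.
def score_to_color(score, threshold=0.5):
    if score >= threshold:
        return np.array([0, 250, 0, 180], dtype=np.uint8)   # green: success
    return np.array([0, 0, 250, 180], dtype=np.uint8)       # blue: failure

print(score_to_color(0.9))
```

These colors could then be applied to the gripper geometry, e.g. via mesh.visual.face_colors in trimesh, but I'd love to hear how you actually rendered the paper figures.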
Thanks for your time, and I really appreciate your contribution again!
Hello, may I ask some questions about the datasets? Does the dataset used to train the model contain one million positive samples and one million negative samples? How much time did it take to generate these datasets in total?
Do you provide the scene descriptors in the pre-generated dataset, so that we can reconstruct the scene while leveraging the annotated grasp poses? Thanks!
Hi all,
I ran python scripts/save_occ_data_parallel.py /path/to/raw/data 100000 2 --num-proc 40 to generate occupancy data from the pre-generated data you provide. This line in scripts/save_occ_data_parallel.py encounters *** AttributeError: 'str' object has no attribute 'copy'. My environment matches the description in README.md.
Just wondering whether there is a solution for this issue.
Any help appreciated :)
Hello! I just began robotics research that is closely related to yours, and I am trying to follow your work on GitHub.
When I carried out step 4 of the installation, it first said that cl.exe could not be found, so I installed Visual Studio 2017 and set the environment variables as follows:
env variable "path":
env variable "LIB":
env variable "INCLUDE":
But an error still occurred: LINK : fatal error LNK1181: cannot open input file "m.lib". The specific output on the command line:
I thought it might still be caused by the lib path or something similar, but I could not fix it after several trials. It will definitely affect the following installation and data-generation steps, so it is blocking me. Do you have any ideas or suggestions to fix it? Thanks a lot!
I have tried to install the program in WSL2 using Ubuntu. If I use the command nvidia-smi in Linux, I get a result:
But when I try to install CUDA from NVIDIA via
wget https://developer.download.nvidia.com/compute/cuda/11.5.0/local_installers/cuda_11.5.0_495.29.05_linux.run
sudo sh cuda_11.5.0_495.29.05_linux.run
it warns like this:
So is it necessary to download and install the NVIDIA driver for Linux 64-bit from the NVIDIA website? I tried to do that, but I cannot install it locally in the WSL2 Ubuntu system; it reports that there is no NVIDIA GPU in this system. Thanks!
Hi,
I would like to use the pre-trained GIGA model on a custom dataset.
Also I would like to know what these are:
inputs: conditioning input
pos: sampled points
In this step: python scripts/save_occ_data_parallel.py /home/wzh/GIGA-main/data/raw_data 100000 2 --num-proc 40
I get this error:
import libmesh failed!
Total jobs: 0, CPU num: 40
but I have installed libmesh 0.0.6.
Hi,
I have a question about the paper.
In the section IV.C, when you discuss about Table I, you said:
Next, we compare the results of GIGA-Aff with GIGA. In the pile scenario, the gain from geometry supervision is relatively small (around 2% grasp success rate). However, in the packed scenario, GIGA outperforms GIGA-Aff by a large margin of around 5%. We believe this is due to the different characteristics of these two scenarios. From Figure 3, we can see that in the packed scene, some tall objects standing in the workspace would occlude the objects behind them and the occluded objects are partially visible. We hypothesize that in this case, the geometrically-aware feature representation learned via geometry supervision facilitates the model to predict grasps on partially visible objects. Such occlusion is, however, less frequent in pile scenarios.
To summarize, you think pile scenarios have less occlusion than packed scenarios.
However, in the same section, when you discuss about Fig 4, you said:
The last two rows show the affordance landscape and top grasps for two pile scenes. We see that baselines without the multi-task training of 3D reconstruction tend to generate failed or no grasp, whereas GIGA produces more diverse and accurate grasps due to the learned geometrically-aware representations.
It feels like you are saying that the reason we do not see this property in the packed scenario is that the pile scenario needs geometrically-aware features more, which would mean it has more occlusion.
I'm confused about why you reach opposite conclusions here. Also, if GIGA helps produce better grasp predictions in the pile scenario, as in the analysis of Fig. 4, why do the quantitative results in Table 1 not show a significant improvement in GSR and DR from GIGA-Aff to GIGA in the pile scenario?
Thanks for your time and contribution again!
Hi,
The results I got with the model you released differ from those in your paper, and they also differ across devices. I want to know whether you saw the same problem at the time. Is this because the test objects are randomly selected and randomly placed, so the test scene is not fixed each time and the experimental results can vary?
Looking forward to your answer. Thank you!
Are giga, vgn and giga_aff trained for the same number of epochs? Are they all trained for 20 epochs?
It would be very useful to have the training data produced by the generate_data_parallel.py script available for download, for both the pile and packed cases.
I appreciate this may be a large amount of storage and therefore difficult to host, so there is no expectation, of course! But it would save people from running the costly data-generation process locally in order to experiment with training.
By naively following the installation instructions, I'm finding that this line hangs when running with multiprocessing, but not when running in the main thread.
Would it be possible to list the full conda environment you used, e.g. by running conda list -e > requirements.txt? For example, I would be interested to know which versions of Python, Open3D and multiprocessing you used.
Any help appreciated :)