facebookresearch / 3d-vision-and-touch

When told to understand the shape of a new object, the most instinctual approach is to pick it up and inspect it with your hand and eyes in tandem. Here, touch provides high fidelity localized information while vision provides complementary global context. However, in 3D shape reconstruction, the complementary fusion of visual and haptic modalities remains largely unexplored. In this paper, we study this problem and present an effective chart-based approach to fusing vision and touch, which leverages advances in graph convolutional networks. To do so, we introduce a dataset of simulated touch and vision signals from the interaction between a robotic hand and a large array of 3D objects. Our results show that (1) leveraging both vision and touch signals consistently improves single-modality baselines, especially when the object is occluded by the hand touching it; (2) our approach outperforms alternative modality fusion methods and strongly benefits from the proposed chart-based structure; (3) reconstruction quality boosts with the number of grasps provided; and (4) the touch information not only enhances the reconstruction at the touch site but also extrapolates to its local neighborhood.

License: Other

Python 94.50% C++ 1.56% Cuda 2.52% Shell 1.42%

3d-vision-and-touch's Introduction

Companion code for E.J. Smith, et al.: 3D Shape Reconstruction from Vision and Touch.

This repository contains a code base and dataset for learning to fuse vision and touch signals from the grasp interaction between a simulated robotic hand and a 3D object for 3D shape reconstruction. The code comes with pre-defined train/valid/test splits over the released dataset, pretrained models, and training and evaluation scripts. This code base uses a subset of the ABC Dataset (released under the MIT License) instead of the dataset listed in the paper due to licensing issues. We apologize for the discrepancy; no data could have been released otherwise. We have provided updated reconstruction accuracies for the new dataset below.

If you find this code useful in your research, please consider citing with the following BibTeX entry:

@misc{VisionTouch,
  author  = {Edward J. Smith and Roberto Calandra and Adriana Romero and Georgia Gkioxari and David Meger and Jitendra Malik and Michal Drozdzal},
  title   = {3D Shape Reconstruction from Vision and Touch},
  year    = {2020},
  journal = {arXiv:1911.05063},
}

Installation

This code uses PyTorch and PyTorch3D. I recommend installing both by following the installation instructions found here.

  • Install other dependencies:
$ pip install -r requirements.txt
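
Once everything is installed, a quick import check can confirm that PyTorch and PyTorch3D are visible to Python and that a CUDA device is available. This is a minimal sanity-check sketch, not part of the repository:

import torch
import pytorch3d

# Print versions and confirm a CUDA device is visible, since the training and
# sheet-optimization scripts in this repository use the GPU.
print("PyTorch:", torch.__version__)
print("PyTorch3D:", pytorch3d.__version__)
print("CUDA available:", torch.cuda.is_available())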

Dataset

To download the dataset, call the following; keep in mind this will take some time to download and unpack:

$ bash download_data.sh

This is released under an MIT License.

Training

Touch Chart Prediction

To train a model to predict touch charts, i.e. local geometry at each touch site, first move into the touch chart directory:

$ cd touch_charts

To begin training call:

$ python recon.py --exp_type <exp_type> --exp_id <exp_id> 

where <exp_type> and <exp_id> are the experiment type and id you wish to specify. There are a number of other arguments for changing the default parameters of this training; call with --help to view them.
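
For orientation, the experiment flags are typically wired through argparse, along the lines of the sketch below. This is illustrative only; the exact arguments defined in recon.py are best inspected with --help.

import argparse

# Illustrative sketch only; recon.py may define additional or differently named arguments.
parser = argparse.ArgumentParser(description="touch chart training (sketch)")
parser.add_argument("--exp_type", type=str, help="experiment type, used to group related runs")
parser.add_argument("--exp_id", type=str, help="experiment id, used to name this particular run")
parser.add_argument("--eval", action="store_true", help="run evaluation rather than training")
args = parser.parse_args()

# Checkpoints land under experiments/checkpoint/<exp_type>/<exp_id>/
print(f"checkpoints will go to experiments/checkpoint/{args.exp_type}/{args.exp_id}/")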

Checkpoints will be saved under a directory "experiments/checkpoint/<exp_type>/<exp_id>/", specified by --exp_type and --exp_id.

To check training progress with Tensorboard:

$ tensorboard --logdir=experiments/tensorboard/<exp_type>/  --port=6006

The training above only predicts a point cloud for each haptic signal. To optimize a mesh sheet to match this predicted point cloud and produce a predicted touch chart at every touch site, call the following:

$ mv data/sheets data/pretrained_sheets
$ python produce_sheets.py --save_directory experiments/checkpoint/<exp_type>/<exp_id>/encoder_touch

where <exp_type> and <exp_id> are the same settings used during training. The first command moves aside the premade sheets produced using the pretrained model; if you would like to use the premade sheets instead, simply skip these steps. By default the script uses the provided pretrained model to perform this optimization. Regardless of the model used, this will take some time to complete; if you would like to use slurm to produce these sheets, the submit.py file can be called instead.
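
For intuition, optimizing a sheet against a predicted point cloud amounts to minimizing a Chamfer distance between the sheet's vertices and the cloud. The sketch below shows the general idea with PyTorch3D; it is a simplified illustration with placeholder shapes and settings, not the exact procedure implemented in produce_sheets.py.

import torch
from pytorch3d.loss import chamfer_distance

# Placeholder tensors for illustration: a predicted touch point cloud and an
# initial sheet of free vertices (the shapes and initialization are made up here).
target_points = torch.rand(1, 400, 3, device="cuda")                           # (1, P, 3)
sheet_verts = (0.01 * torch.randn(1, 100, 3, device="cuda")).requires_grad_()  # (1, V, 3)

optimizer = torch.optim.Adam([sheet_verts], lr=1e-3)
for step in range(500):
    optimizer.zero_grad()
    # Symmetric Chamfer distance between the sheet vertices and the target cloud.
    loss, _ = chamfer_distance(sheet_verts, target_points)
    loss.backward()
    optimizer.step()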

Global Prediction

To train a model to deform vision charts around touch charts and produce a full surface prediction, first move into the vision chart directory:

$ cd vision_charts

To begin training call:

$ python recon.py --exp_type <exp_type> --exp_id <exp_id> 

where <exp_type> and <exp_id> are the experiment type and id you wish to specify. There are a number of other arguments for changing the default parameters of this training; call with --help to view them.

Checkpoints will be saved under a directory "experiments/checkpoint/<exp_type>/<exp_id>/", specified by --exp_type and --exp_id.

To check training progress with Tensorboard:

$ tensorboard --logdir=experiments/tensorboard/<exp_type>/  --port=6006

The same level of hyperparameter search used in the paper can be reproduced using slurm and the submit.py file located in the same folder.

Evaluation

Touch Chart Prediction

To evaluate the touch chart prediction, call the following from the touch chart directory:

$ python recon.py --eval --exp_type <exp_type> --exp_id <exp_id> 

where <exp_type> and <exp_id> are the experiment type and id specified during training.

Global Prediction

To evaluate the global prediction, call the following from the vision chart directory:

$ python recon.py --eval --exp_type <exp_type> --exp_id <exp_id> 

where <exp_type> and <exp_id> are the experiment type and id specified during training.

Pretrained Models

If you wish to download the pretrained models, please call the following:

$ bash prepare_models.sh

To produce touch charts using the pretrained model call:

$ cd touch_charts
$ python produce_sheets.py 

As this is a time-intensive procedure, if you would like to use slurm to produce these sheets, the submit.py file can be called instead. However, premade sheets are also provided in the dataset.

To test the pretrained models on reconstructing objects from different input modalities, call:

$ cd vision_charts 
$ python recon.py --pretrained <model> --eval

where <model> is one of ['empty', 'touch', 'touch_unoccluded', 'touch_occluded', 'unoccluded', 'occluded'].

The following table highlights the reconstruction accuracies of these models on the test set:

                   No Input   Touch   Occluded   Unoccluded   Touch + Occluded   Touch + Unoccluded
Chamfer Distance     26.888   6.926      2.936        2.844              2.406                2.468
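
For reference, Chamfer Distance is the symmetric average of nearest-neighbor distances between the predicted and ground-truth point clouds; lower is better. The exact sampling and scaling behind the numbers above are defined by the repository's evaluation code, but a minimal plain-PyTorch version of the metric looks like this:

import torch

def chamfer(pred, gt):
    # pred: (N, 3) predicted points, gt: (M, 3) ground-truth points.
    # Squared pairwise distances, then average the nearest-neighbor distance in
    # both directions. (Whether squared distances are used and how the result is
    # scaled for the table above is determined by the repository's own code.)
    d = torch.cdist(pred, gt) ** 2
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()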

License

See LICENSE for details.

3d-vision-and-touch's People

Contributors

edwardsmith1884

3d-vision-and-touch's Issues

Question about original plane

Hi, I have a further question about the model.
During the initialization of the model, a plane is defined by:

width = .0218 - 0.00539
y_z = torch.arange(dim).cuda().view(dim, 1).expand(dim, dim).float()
y_z = torch.stack((y_z, y_z.permute(1, 0))).permute(1, 2, 0)
plane = torch.cat((torch.zeros(dim, dim, 1).cuda(), y_z), dim=-1)
self.orig_plane = (plane / float(dim) - .5) * width

Based on my understanding, the prediction of the model is based on the deformation of this plane. May I ask what the meaning of "width" is? Is it a parameter related to your simulator?
Thanks so much for your help!
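
Regarding the snippet above: it builds a dim x dim grid of points with a constant x coordinate and evenly spaced y and z coordinates, centered and scaled so that the sheet spans roughly `width` in each direction. Below is a commented restatement of the same code; the annotations are one reading of it, not official documentation, and `dim` is a placeholder value.

import torch

dim = 100                          # grid resolution; placeholder value for illustration
width = .0218 - 0.00539            # physical extent of the sheet
# Row indices 0..dim-1 broadcast to a (dim, dim) grid
y_z = torch.arange(dim).cuda().view(dim, 1).expand(dim, dim).float()
# Pair each grid cell with its transpose to get (y, z) coordinates per cell
y_z = torch.stack((y_z, y_z.permute(1, 0))).permute(1, 2, 0)
# Prepend a zero channel so every point is (0, y, z) before normalization
plane = torch.cat((torch.zeros(dim, dim, 1).cuda(), y_z), dim=-1)
# Map indices to [-0.5, 0.5) and scale by the physical width (x becomes a constant -width/2)
orig_plane = (plane / float(dim) - .5) * width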

how to use the dataset?

Excuse me, I'd like to know how to use the ABC dataset for training. I don't know what exp_type and exp_id are. Are they the names of directories?

data collection

Hello,
I'm very interested in your work. Could you provide the code used for collecting the data?

Import

import models
import utils
import data_loaders

I am getting the below error in recon.py:

Import "models" could not be resolved Pyright(reportMissingImports)

Could you please help me with this error?

Questions about loss & data

Hi! Thanks for the exciting work.
Recently I have been trying to combine the model from this work with a new dataset that only contains .obj mesh files with visual & tactile readings, and I have a few questions:

  1. Why do you multiply the chamfer loss by a "loss_coeff" during training and testing? What is the meaning of this coefficient?
  2. In your dataset, there is an item in scene_info called "samples", which I understand to be the local point cloud. May I ask how you obtained this point cloud? Is it possible to sample it from the .obj mesh file (similar to the generation of global point clouds using the "batch_sample" function)?
  3. What is the unit of your dataset and loss? Is it centimeters or something else?

Thank you!
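
Regarding question 2, sampling a point cloud directly from an .obj mesh is straightforward with PyTorch3D. The sketch below is a generic approach and is not necessarily identical to the repository's batch_sample function; "object.obj" is a placeholder path.

import torch
from pytorch3d.io import load_objs_as_meshes
from pytorch3d.ops import sample_points_from_meshes

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# "object.obj" is a placeholder path to any mesh file
mesh = load_objs_as_meshes(["object.obj"], device=device)
# Uniformly sample 4000 points from the mesh surface; returns a (1, 4000, 3) tensor
points = sample_points_from_meshes(mesh, num_samples=4000)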

subprocess.CalledProcessError and UnicodeDecodeError

D:\Anaconda\envs\Tensorflow115\lib\site-packages\torch\utils\cpp_extension.py:190: UserWarning: Error checking compiler version for cl: 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte
warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
Traceback (most recent call last):
File "D:\Anaconda\envs\Tensorflow115\lib\site-packages\torch\utils\cpp_extension.py", line 961, in _build_extension_module
check=True)
File "D:\Anaconda\envs\Tensorflow115\lib\subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "G:/TBSI-RD/code/3D-Vision-and-Touch/touch_charts/recon.py", line 19, in
import utils
File "..\utils.py", line 17, in
from third_party_code import ChamferDistance
File "..\third_party_code_init_.py", line 6, in
from .chamfer_distance import ChamferDistance
File "..\third_party_code\chamfer_distance.py", line 10, in
"../third_party_code/chamfer_distance.cu"])
File "D:\Anaconda\envs\Tensorflow115\lib\site-packages\torch\utils\cpp_extension.py", line 659, in load
is_python_module)
File "D:\Anaconda\envs\Tensorflow115\lib\site-packages\torch\utils\cpp_extension.py", line 828, in _jit_compile
with_cuda=with_cuda)
File "D:\Anaconda\envs\Tensorflow115\lib\site-packages\torch\utils\cpp_extension.py", line 881, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "D:\Anaconda\envs\Tensorflow115\lib\site-packages\torch\utils\cpp_extension.py", line 973, in _build_extension_module
message += ": {}".format(error.output.decode())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 1183: invalid continuation byte

This is the error log. I just cloned the project and ran recon.py in an environment with Python 3.7 and PyTorch 1.6.0.
Could you please help me resolve this error?
