
[ICRA 2024 Oral] Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Home Page: https://uark-aicv.github.io/OpenFusion/



OpenFusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Kashu Yamazaki · Taisei Hanyu · Khoa Vo · Thang Pham · Minh Tran
Gianfranco Doretto · Anh Nguyen · Ngan Le

TL;DR: Open-Fusion builds an open-vocabulary 3D queryable scene from a sequence of posed RGB-D images in real-time.

Getting Started 🏁

System Requirements

  • Ubuntu 20.04
  • 10 GB+ VRAM (~5 GB for SEEM and ~2.5 GB for TSDF); large scenes may require more memory
  • Azure Kinect, Intel T265 (for real-world data)
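As a rough sanity check on the TSDF part of that budget, here is a dense-grid back-of-envelope estimate (the voxel size, room dimensions, and per-voxel layout are assumptions for illustration; Open3D's voxel block grid allocates sparsely and also stores color and semantic features, so actual usage differs):

```python
# Back-of-envelope VRAM estimate for a dense TSDF grid (illustrative only;
# a sparse voxel block grid allocates far less for mostly-empty space).
voxel_size = 0.01            # 1 cm voxels (assumed)
room = (5.0, 5.0, 3.0)       # room dimensions in meters (assumed)
bytes_per_voxel = 4 + 4      # tsdf (float32) + weight (float32), no color

voxels = 1
for dim in room:
    voxels *= round(dim / voxel_size)
gib = voxels * bytes_per_voxel / 2**30
print(f"{voxels:,} voxels ~ {gib:.2f} GiB")  # 75,000,000 voxels ~ 0.56 GiB
```

Even a fully dense room-sized grid is under 1 GiB at this resolution; the semantic feature storage and the SEEM model dominate the VRAM budget.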

Environment Setup

Please build a Docker image from the Dockerfile. Do not forget to export the following environment variables (REGISTRY_NAME and IMAGE_NAME) as we use them in the tools/*.sh scripts:

export REGISTRY_NAME=<your-registry-name>
export IMAGE_NAME=<your-image-name>
docker build -t $REGISTRY_NAME/$IMAGE_NAME -f docker/Dockerfile .

Data Preparation

ICL and Replica

You can run the following script to download the ICL and Replica datasets:

bash tools/download.sh --data icl replica

This script will create a folder ./sample and download the datasets into the folder.

ScanNet

For ScanNet, please follow the instructions in ScanNet. Once you have the dataset downloaded, you can run the following script to prepare the data (example for scene scene0001_00):

python tools/prepare_scene.py --filename scene0001_00.sens --output_path sample/scannet/scene0001_00

Model Preparation

Please download the pretrained weights for SEEM from here and place them at openfusion/zoo/xdecoder_seem/checkpoints/seem_focall_v1.pt.
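For example (the directory layout below is the one the instructions expect; the download itself still has to come from the link above, and the source path in the comment is a placeholder):

```shell
# Create the directory the code expects, then move the downloaded
# checkpoint into it:
mkdir -p openfusion/zoo/xdecoder_seem/checkpoints
# mv /path/to/seem_focall_v1.pt openfusion/zoo/xdecoder_seem/checkpoints/
```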

Run OpenFusion

You can run OpenFusion using tools/run.sh as follows:

bash tools/run.sh --data $DATASET --scene $SCENE

Options:

  • --data: dataset to use (e.g., icl)
  • --scene: scene to use (e.g., kt0)
  • --frames: number of frames to use (default: -1)
  • --live: run with live monitor (default: False)
  • --stream: run with data stream from camera server (default: False)
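For example, combining these options (the flag values are illustrative; run this from the repository root):

```shell
# Replay the first 500 frames of the ICL "kt0" sequence with the live
# monitor enabled; the guard makes this a no-op outside the repository:
if [ -f tools/run.sh ]; then
    bash tools/run.sh --data icl --scene kt0 --frames 500 --live
fi
```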

If you want to run OpenFusion with camera stream, please run the following command first on the machine with Azure Kinect and Intel T265 connected:

python deploy/server.py

Please refer to this for more details.

Acknowledgement 🙇

  • SEEM: the vision-language foundation model (VLFM) we use to extract region-based features
  • Open3D: GPU-accelerated 3D library providing the base TSDF implementation

Citation 🙏

If you find this work helpful, please consider citing our work as:

@inproceedings{yamazaki2024open,
  title={Open-fusion: Real-time open-vocabulary 3d mapping and queryable scene representation},
  author={Yamazaki, Kashu and Hanyu, Taisei and Vo, Khoa and Pham, Thang and Tran, Minh and Doretto, Gianfranco and Nguyen, Anh and Le, Ngan},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={9411--9417},
  year={2024},
  organization={IEEE}
}

Contact 📧

Please create an issue on this repository for questions, comments, and bug reports. For other inquiries, send an email to Kashu Yamazaki.


openfusion's Issues

Replica Evaluation Details

Great work! I would like to ask how you performed the evaluation on the Replica dataset. Could you provide some guidance on how you obtained the point clouds for the Replica dataset and how you obtained the ground-truth labels? Thank you so much!

Quantitative evaluation

Hi, thanks for your great work. I wonder how to quantitatively evaluate the segmentation results, as shown in Table 2 of the paper?

Question about random_sample_indices function

Hello @Kashu7100,
I am writing to express my gratitude for the excellent work and high-quality open-source code implementation.

Also, I am curious about random_sample_indices function. From what I understand, it seems to reflect this description from the paper:
"However, to optimize computation and memory usage in subsequent modules, we limit the storage of semantics to points near the surface. These points are strategically sampled based on the TSDF values, resulting in a more efficient representation"

Could you possibly elaborate on how this function's implementation correlates with the aforementioned description? I'm a bit confused because I only see VoxelGrid buffer indices and the randperm method in the function code. Could you please clarify what I am missing?
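To make my understanding concrete, here is a minimal sketch of what I expected based on that description (illustrative only, not the repository's actual random_sample_indices; the band threshold, grid shape, and function name are my own assumptions):

```python
# Sketch: keep semantics only for voxels whose TSDF value is near zero
# (i.e., near the surface), then randomly subsample the survivors.
import numpy as np

def sample_near_surface(tsdf: np.ndarray, max_points: int,
                        band: float = 0.2, seed: int = 0) -> np.ndarray:
    """Return flat indices of up to `max_points` near-surface voxels.

    `band` is a hypothetical truncation threshold: voxels with |TSDF|
    below it are treated as near-surface candidates.
    """
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(np.abs(tsdf.ravel()) < band)
    if len(candidates) <= max_points:
        return candidates
    # Analogous to torch.randperm(len(candidates))[:max_points]
    keep = rng.permutation(len(candidates))[:max_points]
    return candidates[keep]

# Toy grid: TSDF values sweep from -1 to 1, so the "surface" (zero
# crossing) sits in the middle of the flattened array.
tsdf = np.linspace(-1.0, 1.0, 1000).reshape(10, 10, 10)
idx = sample_near_surface(tsdf, max_points=50)
print(len(idx))  # 50
```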

Listed minimum requirements vs. paper?

Thanks for the nice work! I am just wondering what the minimum requirements actually are for a room-sized environment.

The paper says it runs in real-time on a 3060M (mobile), and as far as I know those usually come with only 6 GB of VRAM. The repo instructions claim it needs 10+ GB, but also seem to indicate that >5 GB might work if the environment TSDF is small. Is some tuning of the TSDF part needed to get this to work in a room-sized environment?
