MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors

Paper PDF

🔍 Overview

Model

  • We present MiniGPT-3D, an efficient and powerful 3D-LLM that aligns 3D point clouds with LLMs using 2D priors. It has 47.8M learnable parameters and trains in just 26.8 hours on a single RTX 3090 GPU.
  • We propose an efficient four-stage cascaded training strategy that gradually transfers knowledge from 2D-LLMs.
  • We design a mixture of query experts that aggregates features from multiple experts with only 0.4M parameters.
  • Extensive experiments show the superior performance of MiniGPT-3D on multiple tasks while reducing training time and parameter count by up to 6x and 260x, respectively.
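
To illustrate the mixture-of-query-experts idea, here is a minimal sketch. It is not the MiniGPT-3D implementation. It assumes each expert is a small set of learnable query vectors and that a softmax router, driven by a pooled input feature, weights the experts before summing them. The function names (`softmax`, `mix_query_experts`), the router, and all shapes are hypothetical; this README only states that features from different experts are aggregated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_query_experts(expert_queries, router_weights, pooled_feature):
    """Blend several query experts into one set of queries.

    expert_queries: list of E experts, each a list of Q vectors of length D
                    (hypothetical learnable queries).
    router_weights: E rows of length F (hypothetical linear router).
    pooled_feature: list of length F summarising the input.
    Returns (gates, mixed), where gates has length E and mixed is Q x D.
    """
    # One routing score per expert: dot product of its router row and the feature.
    scores = [
        sum(w * x for w, x in zip(row, pooled_feature))
        for row in router_weights
    ]
    gates = softmax(scores)

    num_queries = len(expert_queries[0])
    dim = len(expert_queries[0][0])
    mixed = [[0.0] * dim for _ in range(num_queries)]

    # Weighted sum of the experts' queries, position by position.
    for gate, queries in zip(gates, expert_queries):
        for q in range(num_queries):
            for d in range(dim):
                mixed[q][d] += gate * queries[q][d]
    return gates, mixed
```

With every router weight set to zero, the gates are uniform and the result is the plain average of the experts' queries; a trained router would instead favour different experts for different inputs.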

Note: MiniGPT-3D takes the first step toward efficient 3D-LLMs. We hope it brings new insights to the community.

Experiment Results

Quantitative Comparisons with baselines.

[Figure: quantitative comparisons with baselines]

Qualitative Comparisons with baselines.

[Figure: qualitative comparisons with baselines]

💬 Dialogue Examples

[Figures: dialogue examples]

📝 TODO List

  • Release inference code with checkpoints.
  • Release training code.
  • Release evaluation code.
  • Release Gradio demo code.
  • Add an online demo.

🔗 Citation

If you find our work helpful, please consider citing:

@article{tang2024minigpt_3d,
  title={MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors},
  author={Tang, Yuan and Han, Xu and Li, Xianzhi and Yu, Qiao and Hao, Yixue and Hu, Long and Chen, Min},
  journal={arXiv preprint arXiv:2405.01413},
  url={https://arxiv.org/abs/2405.01413},
  year={2024}
}

📄 License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

📚 Related Work

Together, let's make LLMs for 3D great!

  • Point-Bind & Point-LLM: aligns point clouds with Image-Bind to reason over multi-modal input without training on 3D instruction data.
  • 3D-LLM: employs 2D foundation models to encode multi-view images of 3D point clouds.
  • PointLLM: employs 3D point clouds with LLaVA.
  • ShapeLLM: combines a powerful point cloud encoder with an LLM for embodied scenes.

👏 Acknowledgements

We would like to thank the authors of PointLLM, Objaverse, and TinyGPT-V for their great work and repositories.
