Giter Site home page Giter Site logo

theycallmeloki / 3d-box-segment-anything Goto Github PK

View Code? Open in Web Editor NEW

This project forked from dvlab-research/3d-box-segment-anything

0.0 0.0 0.0 10.65 MB

We extend Segment Anything to 3D perception by combining it with VoxelNeXt.

Python 1.25% Jupyter Notebook 98.75%

3d-box-segment-anything's Introduction

3D-Box via Segment Anything

We extend Segment Anything to 3D perception by combining it with VoxelNeXt. Note that this project is still in progress. We are improving it and dveloping more examples. Any issue or pull request is welcome!

Why this project?

Segment Anything and its following projects focus on 2D images. In this project, we extend the scope to 3D world by combining Segment Anything and VoxelNeXt. When we provide a prompt (e.g., a point / box), the result is not only 2D segmentation mask, but also 3D boxes.

The core idea is that VoxelNeXt is a fully sparse 3D detector. It predicts 3D object upon each sparse voxel. We project 3D sparse voxels onto 2D images. And then 3D boxes can be generated for voxels in the SAM mask.

  • This project makes 3D object detection to be promptable.
  • VoxelNeXt is based on sparse voxels that are easy to be related to the mask generated from segment anything.
  • This project could facilitate 3D box labeling. 3D box can be obtained via a simple click on image. It might largely save human efforts, especially on autonuous driving scenes.

Installation

  1. Basic requirements pip install -r requirements.txt
  2. Segment anything pip install git+https://github.com/facebookresearch/segment-anything.git
  3. spconv pip install spconv or cuda version spconv pip install spconv-cu111 based on your cuda version. Please use spconv 2.2 / 2.3 version, for example spconv==2.3.5

Getting Started

Please try it via seg_anything_and_3D.ipynb. We provide this example on nuScenes dataset. You can use other image-points pairs.

TODO List

    • Zero-shot version VoxelNeXt.
    • Examples on more datasets.
    • Indoor scenes.

Citation

If you find this project useful in your research, please consider citing:

@article{kirillov2023segany,
  title={Segment Anything}, 
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@inproceedings{chen2023voxenext,
  title={VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking},
  author={Yukang Chen and Jianhui Liu and Xiangyu Zhang and Xiaojuan Qi and Jiaya Jia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

Acknowledgement

Our Works in 3D Perception

  • VoxelNeXt (CVPR 2023) [Paper] [Code] Fully Sparse VoxelNet for 3D Object Detection and Tracking.
  • Focal Sparse Conv (CVPR 2022 Oral) [Paper] [Code] Dynamic sparse convolution for high performance.
  • Spatial Pruned Conv (NeurIPS 2022) [Paper] [Code] 50% FLOPs saving for efficient 3D object detection.
  • LargeKernel3D (CVPR 2023) [Paper] [Code] Large-kernel 3D sparse CNN backbone.
  • SphereFormer (CVPR 2023) [Paper] [Code] Spherical window 3D transformer backbone.
  • spconv-plus A library where we combine our works into spconv.
  • SparseTransformer A library that includes high-efficiency transformer implementations for sparse point cloud or voxel data.

3d-box-segment-anything's People

Contributors

yukang2017 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.