
[NeurIPS'23 Spotlight] Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

Home Page: https://ldkong.com/Seal



English | 简体中文

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

Youquan Liu1,*    Lingdong Kong1,2,*    Jun Cen3    Runnan Chen4    Wenwei Zhang1,5
Liang Pan5    Kai Chen1    Ziwei Liu5
1Shanghai AI Laboratory    2National University of Singapore    3The Hong Kong University of Science and Technology    4The University of Hong Kong    5S-Lab, Nanyang Technological University

Seal 🦭

Seal is a versatile self-supervised learning framework capable of segmenting any automotive point cloud by leveraging off-the-shelf knowledge from vision foundation models (VFMs) and encouraging spatial and temporal consistency of such knowledge during the representation learning stage.

✨ Highlight

  • 🚀 Scalability: Seal directly distills the knowledge from VFMs into point clouds, eliminating the need for annotations in either 2D or 3D during pretraining.
  • ⚖️ Consistency: Seal enforces the spatial and temporal relationships at both the camera-to-LiDAR and point-to-segment stages, facilitating cross-modal representation learning.
  • 🌈 Generalizability: Seal enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets.

🚘 2D-3D Correspondence

🎥 Video Demo

  • Demo 1: Link ⤴️
  • Demo 2: Link ⤴️
  • Demo 3: Link ⤴️

Updates

  • [2023.12] - We are hosting The RoboDrive Challenge at ICRA 2024. 🚙
  • [2023.09] - Seal was selected as a ✨ spotlight ✨ at NeurIPS 2023.
  • [2023.09] - Seal was accepted to NeurIPS 2023! 🎉
  • [2023.07] - We release the code for generating semantic superpixels & superpoints with SLIC, SAM, and SEEM. More VFMs are on the way!
  • [2023.06] - Our paper is available on arXiv; click here to check it out. The code will be available later!

Outline

  • Installation
  • Data Preparation
  • Superpoint Generation
  • Getting Started
  • Main Results
  • TODO List
  • Citation
  • License
  • Acknowledgement

Installation

Please refer to INSTALL.md for the installation details.

Data Preparation

Supported datasets: nuScenes, SemanticKITTI, Waymo Open, ScribbleKITTI, RELLIS-3D, SemanticPOSS, SemanticSTF, DAPS-3D, SynLiDAR, Synth4D, and nuScenes-C.

Please refer to DATA_PREPARE.md for details on preparing these datasets.

Superpoint Generation

[Figure panels: Raw Point Cloud · Semantic Superpoint · Groundtruth]

Kindly refer to SUPERPOINT.md for details on generating the semantic superpixels & superpoints with vision foundation models.
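
For a sense of what such a step involves, here is a minimal, illustrative sketch (not the repository's actual pipeline) that generates SLIC superpixels with scikit-image and assigns each projected LiDAR point the ID of the superpixel it falls into. The image path, the precomputed point-to-pixel coordinates, and the output file are hypothetical placeholders.

```python
# Minimal sketch (not the repository's actual script): generate SLIC superpixels
# for one camera image and label each projected LiDAR point with the superpixel
# it falls into, yielding a "superpoint" id per visible point.
import numpy as np
from skimage.io import imread
from skimage.segmentation import slic

image = imread("camera_front.jpg")                   # H x W x 3, hypothetical path
superpixels = slic(image, n_segments=150, compactness=6, start_label=0)  # H x W ids

# points_uv: (N, 2) pixel coordinates of points visible in this camera (hypothetical input)
points_uv = np.load("points_uv.npy")
u = np.clip(points_uv[:, 0].round().astype(int), 0, image.shape[1] - 1)
v = np.clip(points_uv[:, 1].round().astype(int), 0, image.shape[0] - 1)
superpoint_ids = superpixels[v, u]                   # (N,) superpixel id per point
np.save("superpoint_ids.npy", superpoint_ids)
```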

Getting Started

Kindly refer to GET_STARTED.md to learn more about the usage of this codebase.

Main Results

🦄 Framework Overview

Overview of the Seal 🦭 framework. For each {LiDAR, camera} pair at timestamp t and another LiDAR frame at timestamp t + n, we generate semantic superpixels and superpoints with VFMs. Two pretraining objectives are then formed: spatial contrastive learning between paired LiDAR and camera features, and temporal consistency regularization between segments at different timestamps.
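
As a rough illustration of the spatial objective described above (a sketch of the general idea, not the released training code), one can pool point features over superpoints and pixel features over the matching superpixels, then apply an InfoNCE-style loss between the paired segment means. The tensor shapes, temperature, and the assumption that matched 2D/3D segments share the same index are ours.

```python
# Rough sketch of the camera-to-LiDAR contrastive idea (not the released training code):
# average point features per superpoint and pixel features per matching superpixel,
# then treat matched segment pairs as positives in an InfoNCE-style loss.
import torch
import torch.nn.functional as F

def segment_mean(features, segment_ids, num_segments):
    """features: (N, C); segment_ids: (N,) long tensor; returns (S, C) per-segment means."""
    sums = torch.zeros(num_segments, features.shape[1], device=features.device)
    sums.index_add_(0, segment_ids, features)
    counts = torch.bincount(segment_ids, minlength=num_segments).clamp(min=1)
    return sums / counts.unsqueeze(1)

def spatial_contrastive_loss(point_feats, pixel_feats, superpoint_ids,
                             superpixel_ids, num_segments, temperature=0.07):
    # Assumes superpoint i corresponds to superpixel i (an illustrative simplification).
    p3d = F.normalize(segment_mean(point_feats, superpoint_ids, num_segments), dim=1)
    p2d = F.normalize(segment_mean(pixel_feats, superpixel_ids, num_segments), dim=1)
    logits = p3d @ p2d.t() / temperature             # (S, S) similarity matrix
    targets = torch.arange(num_segments, device=logits.device)
    return F.cross_entropy(logits, targets)          # diagonal pairs are positives
```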

🚗 Cosine Similarity

The cosine similarity between a query point (red dot) and the features learned with SLIC and different VFMs in our Seal 🦭 framework. The queried semantic classes from top to bottom are: “car”, “manmade”, and “truck”. The color goes from violet to yellow, denoting low and high similarity scores, respectively.
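
A similarity map like the one described above can be produced by comparing a single query point's learned feature against the features of all other points. Below is a minimal sketch, assuming the per-point features are available as an (N, C) tensor; it is not tied to any particular backbone in this codebase.

```python
# Minimal sketch: cosine similarity between one query point's feature and all points.
import torch
import torch.nn.functional as F

def similarity_map(point_feats: torch.Tensor, query_idx: int) -> torch.Tensor:
    """point_feats: (N, C) learned point features; returns (N,) similarity scores."""
    feats = F.normalize(point_feats, dim=1)
    return feats @ feats[query_idx]   # values in [-1, 1]; high means similar to the query
```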

🚙 Benchmark

| Method | nuScenes (LP) | nuScenes (1%) | nuScenes (5%) | nuScenes (10%) | nuScenes (25%) | nuScenes (Full) | KITTI (1%) | Waymo (1%) | Synth4D (1%) |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Random | 8.10 | 30.30 | 47.84 | 56.15 | 65.48 | 74.66 | 39.50 | 39.41 | 20.22 |
| PointContrast | 21.90 | 32.50 | - | - | - | - | 41.10 | - | - |
| DepthContrast | 22.10 | 31.70 | - | - | - | - | 41.50 | - | - |
| PPKT | 35.90 | 37.80 | 53.74 | 60.25 | 67.14 | 74.52 | 44.00 | 47.60 | 61.10 |
| SLidR | 38.80 | 38.30 | 52.49 | 59.84 | 66.91 | 74.79 | 44.60 | 47.12 | 63.10 |
| ST-SLidR | 40.48 | 40.75 | 54.69 | 60.75 | 67.70 | 75.14 | 44.72 | 44.93 | - |
| Seal 🦭 | 44.95 | 45.84 | 55.64 | 62.97 | 68.41 | 75.60 | 46.63 | 49.34 | 64.50 |

🚌 Linear Probing

The qualitative results of our Seal 🦭 framework pretrained on nuScenes (without using groundtruth labels) and linearly probed with a frozen backbone and a linear classification head. To highlight the differences, the correct / incorrect predictions are painted in gray / red, respectively.
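
Linear probing, as used here, freezes the pretrained backbone and trains only a linear classification head on top of its per-point features. The sketch below illustrates that setup for a generic `backbone` module; the feature dimension, optimizer, and learning rate are illustrative assumptions rather than the settings used in the paper.

```python
# Illustrative linear-probing setup: freeze the pretrained backbone and train only
# a linear classifier on top of its per-point features.
import torch
import torch.nn as nn

def build_linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int):
    for param in backbone.parameters():
        param.requires_grad = False          # backbone stays frozen during probing
    backbone.eval()
    head = nn.Linear(feat_dim, num_classes)  # the only trainable module
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    return head, optimizer
```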

🚛 Downstream Generalization

| Method | ScribbleKITTI (1%) | ScribbleKITTI (10%) | RELLIS-3D (1%) | RELLIS-3D (10%) | SemanticPOSS (Half) | SemanticPOSS (Full) | SemanticSTF (Half) | SemanticSTF (Full) | SynLiDAR (1%) | SynLiDAR (10%) | DAPS-3D (Half) | DAPS-3D (Full) |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Random | 23.81 | 47.60 | 38.46 | 53.60 | 46.26 | 54.12 | 48.03 | 48.15 | 19.89 | 44.74 | 74.32 | 79.38 |
| PPKT | 36.50 | 51.67 | 49.71 | 54.33 | 50.18 | 56.00 | 50.92 | 54.69 | 37.57 | 46.48 | 78.90 | 84.00 |
| SLidR | 39.60 | 50.45 | 49.75 | 54.57 | 51.56 | 55.36 | 52.01 | 54.35 | 42.05 | 47.84 | 81.00 | 85.40 |
| Seal 🦭 | 40.64 | 52.77 | 51.09 | 55.03 | 53.26 | 56.89 | 53.46 | 55.36 | 43.58 | 49.26 | 81.88 | 85.90 |

🚚 Robustness Probing

| Init | Backbone | mCE | mRR | Fog | Wet | Snow | Motion | Beam | Cross | Echo | Sensor |
|:---|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| Random | PolarNet | 115.09 | 76.34 | 58.23 | 69.91 | 64.82 | 44.60 | 61.91 | 40.77 | 53.64 | 42.01 |
| Random | CENet | 112.79 | 76.04 | 67.01 | 69.87 | 61.64 | 58.31 | 49.97 | 60.89 | 53.31 | 24.78 |
| Random | WaffleIron | 106.73 | 72.78 | 56.07 | 73.93 | 49.59 | 59.46 | 65.19 | 33.12 | 61.51 | 44.01 |
| Random | Cylinder3D | 105.56 | 78.08 | 61.42 | 71.02 | 58.40 | 56.02 | 64.15 | 45.36 | 59.97 | 43.03 |
| Random | SPVCNN | 106.65 | 74.70 | 59.01 | 72.46 | 41.08 | 58.36 | 65.36 | 36.83 | 62.29 | 49.21 |
| Random | MinkUNet | 112.20 | 72.57 | 62.96 | 70.65 | 55.48 | 51.71 | 62.01 | 31.56 | 59.64 | 39.41 |
| PPKT | MinkUNet | 105.64 | 76.06 | 64.01 | 72.18 | 59.08 | 57.17 | 63.88 | 36.34 | 60.59 | 39.57 |
| SLidR | MinkUNet | 106.08 | 75.99 | 65.41 | 72.31 | 56.01 | 56.07 | 62.87 | 41.94 | 61.16 | 38.90 |
| Seal 🦭 | MinkUNet | 92.63 | 83.08 | 72.66 | 74.31 | 66.22 | 66.14 | 65.96 | 57.44 | 59.87 | 39.85 |
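
For reference, the mCE and mRR scores in the table above can be computed roughly as follows. This is our reading of the Robo3D-style protocol, so treat the exact normalization and the choice of baseline as assumptions: mCE averages each corruption's error relative to a baseline model, and mRR averages the accuracy retained relative to clean-set performance.

```python
# Sketch of the robustness metrics (our reading of the Robo3D-style protocol;
# mIoU values are expected as fractions in [0, 1], not percentages).
def mean_corruption_error(miou_per_corruption, baseline_miou_per_corruption):
    """mCE: average corruption error relative to a baseline model, in percent."""
    ces = [(1.0 - m) / (1.0 - b)
           for m, b in zip(miou_per_corruption, baseline_miou_per_corruption)]
    return 100.0 * sum(ces) / len(ces)

def mean_resilience_rate(miou_per_corruption, clean_miou):
    """mRR: average accuracy retained relative to clean-set performance, in percent."""
    rrs = [m / clean_miou for m in miou_per_corruption]
    return 100.0 * sum(rrs) / len(rrs)
```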

🚜 Qualitative Assessment

The qualitative results of Seal 🦭 and prior methods pretrained on nuScenes (without using groundtruth labels) and fine-tuned with 1% labeled data. To highlight the differences, the correct / incorrect predictions are painted in gray / red, respectively.

TODO List

  • Initial release. 🚀
  • Add license. See here for more details.
  • Add video demos 🎥
  • Add installation details.
  • Add data preparation details.
  • Support semantic superpixel generation.
  • Support semantic superpoint generation.
  • Add evaluation details.
  • Add training details.

Citation

If you find this work helpful, please kindly consider citing our paper:

@inproceedings{liu2023segment,
  title = {Segment Any Point Cloud Sequences by Distilling Vision Foundation Models},
  author = {Liu, Youquan and Kong, Lingdong and Cen, Jun and Chen, Runnan and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei},
  booktitle = {Advances in Neural Information Processing Systems}, 
  year = {2023},
}
@misc{liu2023segment_any_point_cloud,
  title = {The Segment Any Point Cloud Codebase},
  author = {Liu, Youquan and Kong, Lingdong and Cen, Jun and Chen, Runnan and Zhang, Wenwei and Pan, Liang and Chen, Kai and Liu, Ziwei},
  howpublished = {\url{https://github.com/youquanl/Segment-Any-Point-Cloud}},
  year = {2023},
}

License

This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Acknowledgement

This work is developed based on the MMDetection3D codebase.


MMDetection3D is an open-source object detection toolbox based on PyTorch, towards the next-generation platform for general 3D detection. It is a part of the OpenMMLab project developed by MMLab.

Part of this codebase has been adapted from SLidR, Segment Anything, X-Decoder, OpenSeeD, Segment Everything Everywhere All at Once, LaserMix, and Robo3D.

❤️ We thank the exceptional contributions from the above open-source repositories!


segment-any-point-cloud's Issues

Results on SLidR+SEEM

Thanks for sharing. Recently, using your code, I generated the superpixels with SEEM and then pre-trained the 3D backbone with SLidR using the new superpixels. However, the performance is not as good as that reported in the paper. Specifically, I fine-tuned the model on the 1% nuScenes split and obtained 41.0 mIoU, whereas the paper reports 44.02 mIoU. Can you give me some suggestions? Thanks very much.

Great work!

Thanks for your contribution to the image-to-point knowledge transfer community.

Improving mIoU with SAM Masks in SLidR Baseline

Hi, I really enjoy your paper, and thank you for your great work.

I tried to run your code on the SLidR baseline using SAM masks, as in the ablation study, only using C2L and VFM. However, I could only achieve 41.3 mIoU, instead of the 44 mIoU mentioned in the paper. I used the same hyperparameters as in the paper and only changed the superpixel type to 'SAM'.

Are there any possible solutions?

Thanks.

Details of Image Segmentation with SAM

Thanks for sharing this cool project! I was confused about how you segment 2D images using VFMs:

As a result, SAM is able to segment images, with either point, box, or mask prompts, across different domains and data distributions. (from 6.2 Vision Foundation Models)

What did you feed to SAM to get the final segmented 2D image?

Thanks for your explanation.

About the impact of the quality of C2L

Hi, thanks for this fantastic project.

C2L refers to camera-to-LiDAR distillation in Table 5 of the paper.

Is there any ablation study on the impact of the quality of C2L on performance?
As we know, the superpoints are generated from superpixels via C2L.
If there is a larger time offset between the LiDAR and camera data, there would be very few superpoints left due to mismatching,
e.g. (the pink points are the projected LiDAR points):
[Screenshot from 2023-07-24 13-57-14]

Since 2D-to-3D correspondence can often be quite poor in practice, it would be great if this project could overcome this issue.
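
For context, below is a generic sketch (not this repository's exact code) of how LiDAR points are typically projected into a camera image given an extrinsic transform and an intrinsic matrix; a large time offset between the two sensors shifts these projections and thus reduces the number of valid point-to-pixel matches.

```python
# Generic point-to-pixel projection sketch (not this repository's exact code).
import numpy as np

def project_points(points_xyz, T_cam_from_lidar, K, image_hw):
    """points_xyz: (N, 3) LiDAR points; T_cam_from_lidar: (4, 4); K: (3, 3) intrinsics."""
    pts_h = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    pts_cam = (T_cam_from_lidar @ pts_h.T).T[:, :3]       # points in camera coordinates
    in_front = pts_cam[:, 2] > 0.1                        # keep points ahead of the camera
    uvw = (K @ pts_cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                         # perspective division
    h, w = image_hw
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[valid], np.flatnonzero(in_front)[valid]     # pixel coords + point indices
```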

About the semantic classes of point cloud segmentation

Do the results of segmenting an arbitrary point cloud come with semantic classes? For example, can it recognize classes such as cars, pedestrians, and buildings?
