official implementation of "PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models" (CVPR2023) [PDF] [Project]
We explores a novel way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model (GLIP). We show that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to real-world (e.g., iPhone-scanned) point clouds without significant domain gaps.
results on real-world (iPhone-scanned) point clouds
conda env create -f environment.yml
conda activate partslip
We utilize PyTorch3D for rendering point clouds. Please install it by the following commands or its official guide:
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
We incorporate GLIP and made some small changes. Please clone our modified version and install it by the following commands or its official guide:
git submodule update --init
cd GLIP
python setup.py build develop --user
We utilize cut-pursuit for computing superpoints. Please install it by the following commands or its official guide:
CONDAENV=YOUR_CONDA_ENVIRONMENT_LOCATION
cd partition/cut-pursuit
mkdir build
cd build
cmake .. -DPYTHON_LIBRARY=$CONDAENV/lib/libpython3.9.so -DPYTHON_INCLUDE_DIR=$CONDAENV/include/python3.9 -DBOOST_INCLUDEDIR=$CONDAENV/include -DEIGEN3_INCLUDE_DIR=$CONDAENV/include/eigen3
make
You can find the pre-trained checkpoints from here.
For zero-shot inference, please use the pre-trained GLIP checkpoint (glip_large_model.pth
). For few-shot inference, please use our few-shot checkpoints for each object category. Please download checkpoints to models/
.
Here is the code to download the checkpoint files for running demo.py:
!pip3 install huggingface_hub
from huggingface_hub import hf_hub_download
for model in ["glip_large_model", "Chair", "Suitcase", "Refrigerator", "Lamp", "Kettle"]:
hf_hub_download(repo_id="minghua/PartSLIP", filename="models/%s.pth" % model, repo_type="dataset", local_dir="./", local_dir_use_symlinks=False)
We provide 5 example point cloud files in examples/
. You can use the following command to run both zero-shot and few-shot inferences for them after downloading 5+1 checkpoint files.
python3 demo.py
The script will generate following files:
rendered_img/: rendering of the input point cloud from 10 different views.
glip_pred/: GLIP predicted bouning boxes for each view.
superpoint.ply: Generated super points for the input point cloud used for converting bounding boxes to 3D segmentation. Different super points are in different colors.
semantic_seg/: visualization of semantic segmentation results for each part. Colored in white or black.
instance_seg/: visualization of instance segmentation results for each part. Different part instances in different color.
You can also find the example output from here.
sem_seg_eval.py
provides a script to calculate the mIoUs reported in the paper.
You can find the PartNet-Ensembled dataset used in the paper from here.
PartNetE_meta.json: part names trained and evaluated of all 45 categories.
split:
- test.txt: list of models used for testing (1,906 models)
- few-shot.txt: list of models used for few-shot training (45x8 models)
- train.txt: list of models used for training (extra 28k models for some baselines)
data:
- test
- Chair
- 179
- pc.ply: input colored point cloud file
- images: rendered images of the 3D mesh, used for generation of the input point cloud (multi-view fusion)
- label.npy: ground truth segmentation labels
- semantic_seg: (n,), semantic segmentation labels, 0-indexed, corresponding to the part names in PartNetE_meta.json, -1 indicates not belonging to any parts.
- instance_seg: (n,), instance segmentation labels, 0-indexed, each number indicates a instance, -1 indicates not belonging to any part instances.
...
...
- few_shot
-
We assume dense and colored input point cloud, which is typically available in real-world applications.
-
You don't need to load the same checkpoints multiple times when batch evaluation.
-
You can reuse the superpoint results across different evaluations (e.g., zero- and few-shot) for the same input point cloud.
-
If you find the results unsatisfactory (e.g., when you change the number of input points or change to other datasets), you may want to tune the following paramters:
a. point cloud rendering:
point_size
insrc/render_pc.py::render_single_view()
. You can change the point size to ensure realistic point cloud renderings.b. superpoint generation:
reg
insrc/gen_superpoint.py::gen_superpoint()
. This parameters adjust the granuarlity of the super point generation. You may want to ensure the generated superpoints are not too coarse-grained (e.g., multiple chair legs are not segmented) or fine-grained (e.g., too many small super points). -
For zero-shot text prompt, simply concatenating all part names (e.g., "arm, back, seat, leg, wheel") is sometimes better than including the object category as well (e.g., "arm, back, seat, leg, wheel of a chair", used in the paper). The mIoUs are 27.2 and 34.8 in our experiments.
-
For zero-shot inference, you can change the prompts without extra training. Whereas for few-shot inference, changing prompts requires retraining.
If you find our code helpful, please cite our paper:
@article{liu2022partslip,
title={PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models},
author={Liu, Minghua and Zhu, Yinhao and Cai, Hong and Han, Shizhong and Ling, Zhan and Porikli, Fatih and Su, Hao},
journal={arXiv preprint arXiv:2212.01558},
year={2022}
}