
PartSLIP

Official implementation of "PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models" (CVPR 2023) [PDF] [Project]

We explore a novel way to perform low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model (GLIP). We show that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to its fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to real-world (e.g., iPhone-scanned) point clouds without a significant domain gap.

Results on the PartNetE dataset

Results on real-world (iPhone-scanned) point clouds

Installation

Create a conda environment and install the dependencies.

conda env create -f environment.yml
conda activate partslip

Install PyTorch3D

We utilize PyTorch3D for rendering point clouds. Please install it with the following command or follow its official guide:

pip install "git+https://github.com/facebookresearch/pytorch3d.git"

Install GLIP

We incorporate GLIP with some small modifications. Please clone our modified version (included as a submodule) and install it with the following commands or follow its official guide:

git submodule update --init
cd GLIP
python setup.py build develop --user

Install cut-pursuit

We utilize cut-pursuit for computing superpoints. Please install it with the following commands or follow its official guide:

CONDAENV=YOUR_CONDA_ENVIRONMENT_LOCATION
cd partition/cut-pursuit
mkdir build
cd build
cmake .. -DPYTHON_LIBRARY=$CONDAENV/lib/libpython3.9.so -DPYTHON_INCLUDE_DIR=$CONDAENV/include/python3.9 -DBOOST_INCLUDEDIR=$CONDAENV/include -DEIGEN3_INCLUDE_DIR=$CONDAENV/include/eigen3
make

Quick-Demo

Download pretrained checkpoints

You can find the pre-trained checkpoints here. For zero-shot inference, please use the pre-trained GLIP checkpoint (glip_large_model.pth). For few-shot inference, please use our few-shot checkpoints for each object category. Please download the checkpoints to models/.

Here is the code to download the checkpoint files for running demo.py:

pip3 install huggingface_hub

from huggingface_hub import hf_hub_download

for model in ["glip_large_model", "Chair", "Suitcase", "Refrigerator", "Lamp", "Kettle"]:
    hf_hub_download(repo_id="minghua/PartSLIP", filename=f"models/{model}.pth",
                    repo_type="dataset", local_dir="./", local_dir_use_symlinks=False)

Inference

We provide 5 example point cloud files in examples/. After downloading the 5+1 checkpoint files (the GLIP checkpoint plus the five few-shot checkpoints), you can use the following command to run both zero-shot and few-shot inference on them:

python3 demo.py

The script will generate the following files:

rendered_img/: renderings of the input point cloud from 10 different views.
glip_pred/: GLIP-predicted bounding boxes for each view.
superpoint.ply: generated superpoints for the input point cloud, used to convert 2D bounding boxes into 3D segmentation; different superpoints are shown in different colors.
semantic_seg/: visualization of the semantic segmentation results for each part, colored in black and white.
instance_seg/: visualization of the instance segmentation results for each part; different part instances are shown in different colors.

You can also find the example outputs here.
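
To inspect these outputs locally, any point cloud viewer works; here is a minimal sketch using Open3D, which is an extra assumption and not one of the dependencies listed above (adjust the path to wherever demo.py writes its outputs):

import open3d as o3d

# View the colored superpoint visualization produced by demo.py.
pc = o3d.io.read_point_cloud("superpoint.ply")
o3d.visualization.draw_geometries([pc])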

Evaluation

sem_seg_eval.py provides a script to calculate the mIoUs reported in the paper.
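
For reference, semantic IoU over the label format described below can be computed along these lines; this is a minimal sketch, not the exact logic of sem_seg_eval.py, and the function name is illustrative:

import numpy as np

def semantic_miou(pred, gt, num_parts):
    # pred, gt: (n,) arrays of 0-indexed part labels, with -1 for points not in any part.
    ious = []
    for part in range(num_parts):
        inter = np.sum((pred == part) & (gt == part))
        union = np.sum((pred == part) | (gt == part))
        if union > 0:
            ious.append(inter / union)
    # Mean IoU over the parts that appear in either the prediction or the ground truth.
    return float(np.mean(ious)) if ious else 0.0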

PartNet-Ensembled

You can find the PartNet-Ensembled (PartNetE) dataset used in the paper here.

PartNetE_meta.json: part names used for training and evaluation across all 45 object categories.
split:
    - test.txt: list of models used for testing (1,906 models)
    - few-shot.txt: list of models used for few-shot training (45x8 models)
    - train.txt: list of models used for training (an extra 28k models for some baselines)
data:
    - test
        - Chair
            - 179
                - pc.ply: input colored point cloud file
                - images: rendered images of the 3D mesh, used to generate the input point cloud via multi-view fusion
                - label.npy: ground-truth segmentation labels
                    - semantic_seg: (n,), semantic segmentation labels, 0-indexed, corresponding to the part names in PartNetE_meta.json; -1 indicates points not belonging to any part.
                    - instance_seg: (n,), instance segmentation labels, 0-indexed, each number indicating one part instance; -1 indicates points not belonging to any part instance.
            ...
        ...
    - few_shot
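
Here is a minimal loading sketch for one model. It assumes label.npy stores a dictionary with the two keys above (hence allow_pickle=True) and that Open3D is available for reading the colored point cloud; the path is illustrative:

import numpy as np
import open3d as o3d

model_dir = "data/test/Chair/179"  # illustrative path, adjust to your local copy

pc = o3d.io.read_point_cloud(f"{model_dir}/pc.ply")
xyz = np.asarray(pc.points)   # (n, 3) point coordinates
rgb = np.asarray(pc.colors)   # (n, 3) point colors in [0, 1]

labels = np.load(f"{model_dir}/label.npy", allow_pickle=True).item()
sem = labels["semantic_seg"]   # (n,), 0-indexed part labels, -1 = no part
ins = labels["instance_seg"]   # (n,), 0-indexed instance labels, -1 = no instance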

Tips:

  1. We assume a dense, colored input point cloud, which is typically available in real-world applications.

  2. You don't need to load the same checkpoint multiple times during batch evaluation.

  3. You can reuse the superpoint results across different evaluations (e.g., zero- and few-shot) for the same input point cloud.

  4. If you find the results unsatisfactory (e.g., when you change the number of input points or switch to other datasets), you may want to tune the following parameters:

    a. Point cloud rendering: point_size in src/render_pc.py::render_single_view(). Adjust the point size to ensure realistic point cloud renderings.

    b. Superpoint generation: reg in src/gen_superpoint.py::gen_superpoint(). This parameter adjusts the granularity of superpoint generation. You may want to ensure the generated superpoints are neither too coarse-grained (e.g., multiple chair legs merged into one superpoint) nor too fine-grained (e.g., too many small superpoints).

  5. For the zero-shot text prompt, simply concatenating all part names (e.g., "arm, back, seat, leg, wheel") is sometimes better than also including the object category (e.g., "arm, back, seat, leg, wheel of a chair", as used in the paper). The mIoUs are 27.2 and 34.8 in our experiments. See the prompt-construction sketch after these tips.

  6. For zero-shot inference, you can change the prompts without extra training, whereas for few-shot inference, changing the prompts requires retraining.
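
As an illustration of tip 5, the two prompt styles can be assembled from PartNetE_meta.json; this sketch assumes the JSON maps each category name to its list of part names:

import json

with open("PartNetE_meta.json") as f:
    meta = json.load(f)  # assumed format: {category name: [part names]}

category = "Chair"
parts = meta[category]

prompt_parts_only = ", ".join(parts)                                   # e.g., "arm, back, seat, leg, wheel"
prompt_with_category = f"{prompt_parts_only} of a {category.lower()}"  # variant used in the paper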

Citation

If you find our code helpful, please cite our paper:

@article{liu2022partslip,
  title={PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models},
  author={Liu, Minghua and Zhu, Yinhao and Cai, Hong and Han, Shizhong and Ling, Zhan and Porikli, Fatih and Su, Hao},
  journal={arXiv preprint arXiv:2212.01558},
  year={2022}
}
