
[NeurIPS 2023] Weakly Supervised 3D Open-vocabulary Segmentation

This repository contains a PyTorch implementation of the paper Weakly Supervised 3D Open-vocabulary Segmentation. Our method segments 3D scenes with open-vocabulary text queries without requiring any segmentation annotations.


Installation

Tested on Ubuntu 20.04 + PyTorch 1.12.1

Install environment:

conda create -n 3dovs python=3.9
conda activate 3dovs
pip install torch torchvision
pip install ftfy regex tqdm scikit-image opencv-python configargparse lpips imageio-ffmpeg kornia tensorboard
pip install git+https://github.com/openai/CLIP.git
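
The commands above do not pin the tested versions. If you want to match the tested setup, you can pin PyTorch explicitly, for example (the CUDA 11.3 wheel is an assumption; adjust for your system):

pip install torch==1.12.1 torchvision==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu113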

Datasets

Please download the datasets from this link and put them in ./data. You can put the datasets elsewhere if you modify the corresponding paths in the configs. The datasets are organized as follows:

/data
|  /scene0
|  |--/images
|  |  |--00.png
|  |  |--01.png
|  |  ...
|  |--/segmentations
|  |  |--classes.txt
|  |  |--/test_view0
|  |  |  |--class0.png
|  |  |  ...
|  |  |--/test_view1
|  |  |  |--class0.png
|  |  |  ...
|  |  ...
|  |--poses_bounds.npy
|  /scene1
|  ...

where images contains the RGB images, segmentations contains the segmentation annotations for the test views, segmentations/classes.txt stores the text descriptions of the classes, and poses_bounds.npy contains the camera poses generated by COLMAP.
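
For reference, classes.txt is expected to list one class description per line. A hypothetical example (the class names below are illustrative, not from a specific scene):

red apple
black leather shoe
wooden table
white wall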

Quick Start

We provide checkpoints for the scenes at this link. You can then test the segmentation by:

bash scripts/test_segmentation.sh [CKPT_PATH] [CONFIG_FILE] [GPU_ID] 

The config files are stored in configs; each file is named configs/$scene_name.txt. The results will be saved in the checkpoint's directory. More details can be found in scripts/test_segmentation.sh.
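
For example, a hypothetical invocation for a scene named bed, assuming its checkpoint is stored at log_seg/bed/bed.th and GPU 0 is used:

bash scripts/test_segmentation.sh log_seg/bed/bed.th configs/bed.txt 0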

Data Preparation

Training requires a hierarchy of CLIP features extracted from image patches. You can extract the CLIP features with the following command (replace $scene_name with the scene you want to extract features for):

bash scripts/extract_clip_features.sh data/$scene_name/images clip_features/$scene_name [GPU_ID]

The extracted features will be saved in clip_features/$scene_name.
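
For example, for a scene named bed on GPU 0 (the scene name is illustrative):

bash scripts/extract_clip_features.sh data/bed/images clip_features/bed 0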

Training

1. Train original TensoRF

This step reconstructs the TensoRF for each scene. Please modify datadir and expname in configs/reconstruction.txt to specify the dataset path and the experiment name; by default we set datadir to data/$scene_name and expname to $scene_name. You can then train the original TensoRF by:

bash scripts/reconstruction.sh [GPU_ID]

The reconstructed TensoRF will be saved in log/$scene_name.
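
As a reference, the relevant entries in configs/reconstruction.txt might look like this for a scene named bed (the scene name is illustrative and the key = value syntax is assumed to match the other configs):

datadir = data/bed
expname = bed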

2. Train segmentation

We provide the training configs for our datasets under configs as $scene_name.txt. You can train the segmentation by:

bash scripts/segmentation.sh [CONFIG_FILE] [GPU_ID] 
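
For example, a hypothetical invocation for the bed scene on GPU 0:

bash scripts/segmentation.sh configs/bed.txt 0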

The trained model will be saved in log_seg/$scene_name. Training takes about 1.5 hours and consumes about 14 GB of GPU memory.

Troubleshooting

1. Loading CLIP features is very slow

This is because the CLIP features are very large (512 channels) and consume a lot of memory. You can load the CLIP features of fewer views by setting clip_input to 0.5 or a smaller value in the config file. Normally 0.5 is enough for good performance.
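
For example, the corresponding config entry might look like this (the key = value syntax is assumed to match the other options):

clip_input = 0.5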

2. Prompt engineering

To test whether your prompts are good, set test_prompt to a view number in the config file. The relevancy maps of that view for each class will then be written to clip_features/clip_relevancy_maps; each relevancy map is named scale_class.png. You can then check whether the relevancy maps are accurate for each class. If not, modify the prompts in segmentations/classes.txt and test again. In our experiments, we find that specific descriptions that include the object's texture and color work better.
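
For example, to inspect the relevancy maps of view 3 (the view number is illustrative):

test_prompt = 3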

3. Custom data

For custom scenes, you can generate the camera poses with COLMAP following the "recover camera poses" section of this link. If your custom data does not have annotated segmentation maps, set has_segmentation_maps to 0 in the config file.
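
For example (assuming the same key = value syntax as the other config options):

has_segmentation_maps = 0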

4. Bad segmentation results

Bad segmentation results may be caused by poor geometry reconstruction, erroneous camera poses, or inaccurate text prompts. If none of these is the main cause, try adjusting dino_neg_weight in the config file. Usually, if the segmentation does not align well with the object boundaries, set dino_neg_weight to a value larger than 0.2, such as 0.22; if the segmentation assigns wrong labels, set it to a value smaller than 0.2, such as 0.18. Since dino_neg_weight encourages the model to assign different labels where the DINO features are distant, higher values make the model less stable but encourage sharper boundaries.
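
For example (the values are the illustrative ones above; the key = value syntax is assumed to match the other config options):

# boundaries do not align well with the objects: increase slightly
dino_neg_weight = 0.22
# segmentation assigns wrong labels: decrease slightly instead
# dino_neg_weight = 0.18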

TODO

  • Currently we only support forward-facing scenes; the method can be extended to unbounded 360° scenes using a coordinate transformation.

Acknowledgments

This repo is heavily based on TensoRF. Thanks to the authors for sharing their amazing work!

Citation

@article{liu2023weakly,
  title={Weakly Supervised 3D Open-vocabulary Segmentation},
  author={Liu, Kunhao and Zhan, Fangneng and Zhang, Jiahui and Xu, Muyu and Yu, Yingchen and Saddik, Abdulmotaleb El and Theobalt, Christian and Xing, Eric and Lu, Shijian},
  journal={arXiv preprint arXiv:2305.14093},
  year={2023}
}


