
SPINO: Few-Shot Panoptic Segmentation With Foundation Models

arXiv | IEEE Xplore | Website | Video

This repository is the official implementation of the paper:

Few-Shot Panoptic Segmentation With Foundation Models

Markus Käppeler*, Kürsat Petek*, Niclas Vödisch*, Wolfram Burgard, and Abhinav Valada.
*Equal contribution.

IEEE International Conference on Robotics and Automation (ICRA), 2024

[Figure: Overview of the SPINO approach]

If you find our work useful, please consider citing our paper:

@inproceedings{kaeppeler2024spino,
    title={Few-Shot Panoptic Segmentation With Foundation Models},
    author={Käppeler, Markus and Petek, Kürsat and Vödisch, Niclas and Burgard, Wolfram and Valada, Abhinav},
    booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
    year={2024},
    pages={7718--7724}
}

📔 Abstract

Current state-of-the-art methods for panoptic segmentation require an immense amount of annotated training data that is both arduous and expensive to obtain, posing a significant challenge for their widespread adoption. Concurrently, recent breakthroughs in visual representation learning have sparked a paradigm shift, leading to the advent of large foundation models that can be trained with completely unlabeled images. In this work, we propose to leverage such task-agnostic image features to enable few-shot panoptic segmentation by presenting Segmenting Panoptic Information with Nearly 0 labels (SPINO). In detail, our method combines a DINOv2 backbone with lightweight network heads for semantic segmentation and boundary estimation. We show that our approach, albeit being trained with only ten annotated images, predicts high-quality pseudo-labels that can be used with any existing panoptic segmentation method. Notably, we demonstrate that SPINO achieves competitive results compared to fully supervised baselines while using less than 0.3% of the ground truth labels, paving the way for learning complex visual recognition tasks leveraging foundation models. To illustrate its general applicability, we further deploy SPINO on real-world robotic vision systems for both outdoor and indoor environments.

👩‍💻 Code

🏗 Setup

⚙️ Installation

  1. Create conda environment: conda create --name spino python=3.8
  2. Activate environment: conda activate spino
  3. Install dependencies: pip install -r requirements.txt
  4. Install PyTorch, torchvision, and torchaudio (CUDA 11.1 builds): pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
  5. Compile deformable attention: cd panoptic_segmentation_model/external/ms_deformable_attention && sh make.sh
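
The same steps as a single shell session (a sketch assuming a Linux machine with a CUDA 11.1-compatible driver; adjust versions and paths to your setup):

  conda create --name spino python=3.8
  conda activate spino
  pip install -r requirements.txt
  pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 \
      -f https://download.pytorch.org/whl/cu111/torch_stable.html
  # Compile the deformable attention ops (step 5 above)
  cd panoptic_segmentation_model/external/ms_deformable_attention && sh make.sh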

💻 Development

  1. Install pre-commit githook scripts: pre-commit install
  2. Upgrade isort to 5.12.0: pip install isort==5.12.0
  3. Update pre-commit: pre-commit autoupdate
  4. Linter (pylint) and formatter (yapf, isort) settings can be adjusted in pyproject.toml.
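
With the hooks installed, the configured checks can also be run manually over the whole repository (standard pre-commit usage, not specific to this repository):

  pre-commit install           # register the git hook (step 1 above)
  pre-commit run --all-files   # run all configured hooks on every tracked file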

🏃 Running the Code

🎨 Pseudo-label generation

To generate pseudo-labels for the Cityscapes dataset, first set the path to the dataset in the configuration files (see the list below). Then execute run_cityscapes.sh from the root of the panoptic_label_generator folder (a launch sketch follows the steps below). This script performs the following steps:

  1. Train the semantic segmentation module using the configuration file configs/semantic_cityscapes.yaml.
  2. Train the boundary estimation module using the configuration file configs/boundary_cityscapes.yaml.
  3. Generate the panoptic pseudo-labels using the configuration file configs/instance_cityscapes.yaml.
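
A minimal launch sketch (the exact name of the dataset-path key inside the YAML files may differ, so check the configs before running):

  # Point configs/semantic_cityscapes.yaml, configs/boundary_cityscapes.yaml, and
  # configs/instance_cityscapes.yaml to your local Cityscapes copy, then run:
  cd panoptic_label_generator
  sh run_cityscapes.sh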

We also support the KITTI-360 dataset. To generate pseudo-labels for KITTI-360, please adapt the corresponding configuration files.

Instead of training the modules from scratch, you can also use the pretrained weights provided at these links:

🧠 Panoptic segmentation model

To train a panoptic segmentation model on a given dataset, e.g., the generated pseudo-labels, execute train.sh.

Before running the code, specify all settings in train.sh (an example configuration is sketched after this list):

  1. python_env: Set the name of the conda environment (e.g. "spino")
  2. alias_python: Set the path of the python binary to be used
  3. WANDB_API_KEY: Set the wandb API key of your account
  4. CUDA_VISIBLE_DEVICES: Device IDs of the GPUs to use
  5. Set all remaining arguments:
    • nproc_per_node: Number of processes per node (usually one node corresponds to one GPU server); this must equal the number of devices listed in CUDA_VISIBLE_DEVICES
    • master_addr: IP address of the GPU server to run the code on
    • master_port: Port to be used for server access
    • run_name: Name of the current run; a folder with this name will be created to hold all generated files (pretrained weights, config file, etc.), and the name will also appear on wandb
    • project_root_dir: Path to where the folder with the run name will be created
    • mode: Training mode; either "train" or "eval"
    • resume: If specified, the training will be resumed from the specified checkpoint
    • pre_train: Only load the specified modules from the checkpoint
    • freeze_modules: Freeze the specified modules during training
    • filename_defaults_config: Filename of the default configuration file with all configuration parameters
    • filename_config: Filename of the configuration file that is applied on top of the default configuration file
    • comment: An arbitrary comment string
    • seed: Seed to initialize "torch", "random", and "numpy"
  6. Set available flags:
    • eval: Only evaluate the model specified by resume
    • debug: Start the training in debug mode
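
As an illustration, the header of train.sh might be filled in as follows (a sketch only; the variable names mirror the list above, while concrete values and the defaults-config filename are placeholders, so verify them against the script itself):

  python_env="spino"                        # conda environment created during setup
  alias_python="$HOME/miniconda3/envs/spino/bin/python"
  export WANDB_API_KEY="<your wandb key>"
  export CUDA_VISIBLE_DEVICES=0,1           # two visible GPUs ...
  nproc_per_node=2                          # ... so two processes per node
  master_addr="127.0.0.1"                   # single-machine training
  master_port=29500
  run_name="spino_cityscapes_pseudo_labels"
  project_root_dir="$HOME/experiments/spino"
  mode="train"                              # or "eval" (together with resume and the eval flag)
  filename_defaults_config="defaults.yaml"              # hypothetical file name
  filename_config="train_cityscapes_dino_adapter.yaml"  # config mentioned below
  seed=42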

Additionally,

  1. ensure that the dataset path is set correctly in the corresponding config file, e.g., train_cityscapes_dino_adapter.yaml.
  2. set the entity and project parameters for wandb.init(...) in misc/train_utils.py.

💾 Datasets

Cityscapes

Download the following files:

  • leftImg8bit_sequence_trainvaltest.zip (324GB)
  • gtFine_trainvaltest.zip (241MB)
  • camera_trainvaltest.zip (2MB)
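
Assuming the three archives above were downloaded from the Cityscapes website into a cityscapes/ folder, extraction is simply:

  cd cityscapes
  unzip leftImg8bit_sequence_trainvaltest.zip   # image sequences (~324 GB)
  unzip gtFine_trainvaltest.zip                 # fine ground-truth annotations
  unzip camera_trainvaltest.zip                 # camera calibration files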

After extraction, one should obtain the following file structure:

── cityscapes
   ├── camera
   │    └── ...
   ├── gtFine
   │    └── ...
   └── leftImg8bit_sequence
        └── ...

KITTI-360

Download the following files:

  • Perspective Images for Train & Val (128G): You can remove "01" from line 12 of download_2d_perspective.sh to download only the relevant images.
  • Test Semantic (1.5G)
  • Semantics (1.8G)
  • Calibrations (3K)
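
A sketch of assembling the KITTI-360 layout (the archive names in angle brackets are placeholders; use the file names provided by the KITTI-360 download page):

  mkdir -p kitti_360 && cd kitti_360
  sh download_2d_perspective.sh       # perspective images (edit line 12 as noted above)
  # Copy the downloaded perspective images into data_2d_raw/, then extract the rest:
  unzip <semantics_archive>.zip       # -> data_2d_semantics/train/...
  unzip <test_semantic_archive>.zip   # -> data_2d_test/...
  unzip <calibrations_archive>.zip    # -> calibration/...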

After extraction and copying of the perspective images, one should obtain the following file structure:

── kitti_360
   ├── calibration
   │    ├── calib_cam_to_pose.txt
   │    └── ...
   ├── data_2d_raw
   │   ├── 2013_05_28_drive_0000_sync
   │   └── ...
   ├── data_2d_semantics
   │    └── train
   │        ├── 2013_05_28_drive_0000_sync
   │        └── ...
   └── data_2d_test
        ├── 2013_05_28_drive_0008_sync
        └── 2013_05_28_drive_0018_sync

👩‍⚖️ License

For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.

🙏 Acknowledgment

This work was funded by the German Research Foundation (DFG) Emmy Noether Program grant No 468878300 and the European Union’s Horizon 2020 research and innovation program grant No 871449-OpenDR.

