
[ACM MM-2023] Efficient Parallel Multi-Scale Detail and Semantic Encoding Network for Lightweight Semantic Segmentation

Xiao Liu¹, Xiuya Shi¹, Lufei Chen¹, Linbo Qing¹ and Chao Ren¹ *

¹ Sichuan University, * Corresponding Author

🤗 paper 😀 Supplementary materials

(Figure: complexity comparison)


✍️ Changelog and ToDos

  • Test Set Results submitted to the Cityscapes website. [Val: 75.02 | Test: 73.99]
  • (2023/11/6) Release training and evaluation code along with pre-trained models.

💡 Abstract

(Figure: main architecture overview)

Abstract: In this work, we propose PMSDSEN, a parallel multi-scale encoder-decoder network architecture for semantic segmentation, inspired by the human visual perception system's ability to aggregate contextual information across contexts and scales. Our approach introduces the efficient Parallel Multi-Scale Detail and Semantic Encoding (PMSDSE) unit to extract detailed local information and coarse large-range relationships in parallel, enabling the recognition of both object boundaries and object-level areas. By stacking multiple PMSDSEs, our method learns fine-grained details and textures along with abstract category and semantic information, effectively utilizing a larger range of surrounding context for robust segmentation. To further enlarge the network's receptive field without increasing computational complexity, the Multi-Scale Semantic Extractor (MSSE) at the end of the encoder performs multi-scale semantic context extraction and detailed information encoding. Additionally, the Dynamic Weighted Feature Fusion (DWFF) strategy integrates shallow-layer detail information and deep-layer semantic information in the decoder stage. Our method obtains multi-scale context from local to global, efficiently progressing from low-level feature extraction to high-level semantic interpretation at different scales and in different contexts. Without bells and whistles, PMSDSEN obtains a better trade-off between accuracy and complexity on popular benchmarks, including Cityscapes and CamVid. Specifically, PMSDSEN attains 73.2% mIoU with only 0.9M parameters on the Cityscapes test set.
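To make the core idea concrete, here is a minimal PyTorch sketch of the parallel detail-and-semantics encoding pattern described above: one branch uses a standard small-kernel convolution for local detail, the other a dilated convolution for large-range context, and the two run in parallel before being concatenated and fused. All module names, the channel split, and the dilation rate are illustrative assumptions, not the repository's actual PMSDSE implementation.

```python
import torch
import torch.nn as nn

class ParallelDetailSemanticBlock(nn.Module):
    """Illustrative two-branch unit (NOT the repository's exact PMSDSE):
    a small-kernel branch captures local detail while a dilated branch
    captures large-range context; both run in parallel and are fused."""

    def __init__(self, channels: int, dilation: int = 4):
        super().__init__()
        half = channels // 2
        # Detail branch: plain 3x3 convolution, small receptive field.
        self.detail = nn.Sequential(
            nn.Conv2d(channels, half, 3, padding=1, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        # Semantic branch: dilated 3x3 convolution, enlarged receptive field.
        self.semantic = nn.Sequential(
            nn.Conv2d(channels, half, 3, padding=dilation,
                      dilation=dilation, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True))
        # 1x1 projection to mix the concatenated branch outputs.
        self.fuse = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        out = torch.cat([self.detail(x), self.semantic(x)], dim=1)
        return self.fuse(out) + x  # residual: stable when stacking many units

# Stacking several such units mixes fine detail with long-range semantics.
x = torch.randn(1, 64, 64, 128)
print(ParallelDetailSemanticBlock(64)(x).shape)  # torch.Size([1, 64, 64, 128])
```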


✨ Segmentation Results

Quantitative Comparison with SOTA

Quantitative comparison with SOTA on the Cityscapes dataset.

Qualitative Comparison with SOTA

Feature visualization analysis

Visualization of features for each branch in PMSDSEN: PMSDSEN extracts rich, detailed local information as well as coarse, complex large-range relationships in parallel, so the fused features combine finely detailed localization with powerful long-range relationships. Visualization of features for various fusion strategies: DWFF enables the network to focus on the most informative parts of the feature map (compare the darker regions across the feature maps).
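As a rough illustration of the dynamically weighted fusion idea (a sketch under assumptions, not the repository's DWFF code): a 1×1 convolution predicts a per-pixel gate from the concatenated features, and the gate blends shallow detail with upsampled deep semantics so the network can emphasize the most informative regions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicWeightedFusion(nn.Module):
    """Illustrative gated fusion (NOT the repository's exact DWFF):
    a 1x1 convolution predicts a per-pixel weight from the concatenated
    inputs, blending shallow detail with deep semantics adaptively.
    Assumes both inputs have the same channel count."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, shallow, deep):
        # Bring the low-resolution deep features up to the shallow size.
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode="bilinear", align_corners=False)
        w = torch.sigmoid(self.gate(torch.cat([shallow, deep], dim=1)))
        return w * shallow + (1.0 - w) * deep  # per-pixel weighted blend
```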

🚀 Installation

This repository is built with PyTorch 1.12.1 and trained on a CentOS environment (kernel 4.18.0) with Python 3.7, CUDA 11.6, and cuDNN 8.0.

  1. Clone our repository
git clone https://github.com/liux520/PMSDSEN.git
cd PMSDSEN

💻 Usage

0. Dataset Preparation

  • Download the Cityscapes and CamVid datasets.

  • It is recommended to symlink the dataset root to Datasets with the following command:

    For Linux: ln -s [Your Dataset Path] [PMSDSEN Project Path/Datasets]

    For Windows: mklink /d [PMSDSEN Project Path\Datasets] [Your Dataset Path] (run in administrator mode)

  • The file structure is as follows:

    Data
    Datasets
    ├─CamVid   
    │  ├─test
    │  ├─testannot
    │  ├─train
    │  ├─trainannot
    │  ├─val
    │  └─valannot
    ├─Cityscapes
    │  ├─gtCoarse
    │  ├─gtFine
    │  └─leftImg8bit  
    Demo
    ...
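As a quick sanity check that your symlink produced the layout above, the following snippet verifies the expected folders exist (directory names are taken from the tree; run it from the project root):

```python
from pathlib import Path

# Directory names taken from the tree above; run from the project root.
root = Path("Datasets")
expected = [root / "CamVid" / d for d in
            ("train", "trainannot", "val", "valannot", "test", "testannot")]
expected += [root / "Cityscapes" / d for d in ("gtFine", "leftImg8bit")]

missing = [str(p) for p in expected if not p.is_dir()]
print("Dataset layout OK" if not missing else f"Missing: {missing}")
```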
    

1. Evaluation

Download the pretrained weights and run the following command to evaluate on the widely used benchmark datasets.

python Demo/eval.py 

If you just want to run the model on a single image, use demo.py:

python Demo/demo.py 
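For reference, single-image inference in plain PyTorch looks roughly like the sketch below. The model import path, checkpoint filename, and normalization statistics are assumptions for illustration; Demo/demo.py is the authoritative entry point.

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Hypothetical import path and checkpoint name; check the repository
# for the real module layout and the released weight files.
from Models.PMSDSEN import PMSDSEN

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = PMSDSEN(num_classes=19).to(device)  # 19 classes for Cityscapes
model.load_state_dict(torch.load("pmsdsen_cityscapes.pth",
                                 map_location=device))
model.eval()

# ImageNet normalization statistics (an assumption; verify against the repo).
preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("input.png").convert("RGB")).unsqueeze(0).to(device)
with torch.no_grad():
    pred = model(img).argmax(dim=1).squeeze(0).cpu()  # per-pixel class ids
```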

2. Training

  • Stage 1: Train PMSDSEN from scratch at 512 × 256 resolution.
CUDA_VISIBLE_DEVICES=0 python Trainer_seg.py --wandb_project Default --use_cuda --gpu_ids 0 --exp_name PMSDSEN_stage1 --train_batchsize 12 --val_batchsize 4 --crop_size 512 256 --workers 4 --dataset cityscapes --use_balanced_weights --loss_type ce --lr 0.045 --lr_scheduler poly --warmup_epochs 0 --start_epoch 0 --epochs 500 --model PMSDSEN --optimizer SGD --momentum 0.9 --weight_decay 1e-4
  • Stage 2: Finetune PMSDSEN at 1024 × 512 resolution from the Stage 1 checkpoint.
CUDA_VISIBLE_DEVICES=0 python Trainer_seg.py --wandb_project Default --use_cuda --gpu_ids 0 --exp_name PMSDSEN_stage2 --train_batchsize 6 --val_batchsize 4 --crop_size 1024 512  --workers 4 --dataset cityscapes --use_balanced_weights --loss_type ce --lr 0.01 --lr_scheduler poly --warmup_epochs 0 --start_epoch 0 --epochs 500 --model PMSDSEN --optimizer SGD --momentum 0.9 --weight_decay 1e-4 --resume [pretrained stage1 model] --finetune --freeze_bn
  • Tips on training and finetuning.
For the Cityscapes dataset, we adopt a two-stage training strategy. In the first stage, a smaller input resolution (512 × 256) is used to allow a larger batch size and faster convergence. We train the model for 500 epochs using SGD with an initial learning rate of 4.5 × 10⁻². In the second stage, we freeze the batch normalization layers and finetune the model at a higher input resolution (1024 × 512), again for 500 epochs using SGD with an initial learning rate of 1 × 10⁻². For the CamVid dataset, we use a single-stage strategy and train the model for 1000 epochs. (See the sketch below for the conventional PyTorch patterns behind --freeze_bn and the poly schedule.)
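For readers implementing this elsewhere, the two training details above map to standard PyTorch patterns; the helpers below sketch what a --freeze_bn flag and a poly learning-rate schedule conventionally do, not the repository's exact code.

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    """What a --freeze_bn flag conventionally does in PyTorch: keep BN
    layers in eval mode (use running stats) and stop their gradients."""
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()
            for p in m.parameters():
                p.requires_grad = False

def poly_lr(base_lr: float, epoch: int, max_epochs: int,
            power: float = 0.9) -> float:
    """Common 'poly' schedule: lr decays as (1 - epoch/max_epochs)^power."""
    return base_lr * (1.0 - epoch / max_epochs) ** power

# Example: stage 2 starts at 1e-2 and decays toward 0 over 500 epochs.
print(round(poly_lr(0.01, 250, 500), 5))  # ~0.00536
```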

📧 Contact

Should you have any questions, please open an issue on this repository or contact us at [email protected], [email protected], or [email protected].


❤️ Acknowledgement

Thanks to the lucky aura of the second author!

We are thankful for these excellent works: [STDC] [PyTorch-Encoding] [EdgeNets]


🙏 Citation

If this work is helpful for you, please consider citing:

@inproceedings{PMSDSEN,
author = {Liu, Xiao and Shi, Xiuya and Chen, Lufei and Qing, Linbo and Ren, Chao},
title = {Efficient Parallel Multi-Scale Detail and Semantic Encoding Network for Lightweight Semantic Segmentation},
year = {2023},
isbn = {9798400701085},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3581783.3611848},
doi = {10.1145/3581783.3611848},
booktitle = {Proceedings of the 31st ACM International Conference on Multimedia},
pages = {2544–2552},
numpages = {9},
keywords = {lightweight semantic segmentation, parallel multi-scale information encoding},
location = {Ottawa ON, Canada},
series = {MM '23}
}
