PyTorch implementation of real-time semantic segmentation models, supporting multi-GPU training and validation, automatic mixed precision training, knowledge distillation, and more.
```
torch == 1.8.1
segmentation-models-pytorch
torchmetrics
albumentations
loguru
tqdm
```
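The automatic mixed precision training mentioned above typically follows the standard `torch.cuda.amp` pattern (available since torch 1.6). The sketch below uses a stand-in model and optimizer, not the repo's actual code:

```python
# Sketch of an AMP training step with torch.cuda.amp.
# The model, optimizer, and data here are placeholders.
import torch

model = torch.nn.Conv2d(3, 19, 1)       # stand-in for a segmentation model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
use_amp = torch.cuda.is_available()     # autocast to fp16 needs a CUDA device
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

images = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 19, (2, 64, 64))

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    logits = model(images)              # (N, classes, H, W)
    loss = criterion(logits, labels)
scaler.scale(loss).backward()           # scale the loss to avoid fp16 underflow
scaler.step(optimizer)                  # unscales gradients, then steps
scaler.update()
```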
- ADSCNet[^1]
- AGLNet[^2]
- BiSeNetv1[^3]
- BiSeNetv2[^4]
- CANet[^5]
- CFPNet[^6]
- CGNet[^7]
- ContextNet[^8]
- DABNet[^9]
- DDRNet[^10]
- DFANet[^11]
- EDANet[^12]
- ENet[^13]
- ERFNet[^14]
- ESNet[^15]
- ESPNet[^16]
- ESPNetv2[^17]
- FANet[^18]
- FarseeNet[^19]
- FastSCNN[^20]
- FDDWNet[^21]
- FPENet[^22]
- FSSNet[^23]
- ICNet[^24]
- LEDNet[^25]
- LinkNet[^26]
- Lite-HRNet[^27]
- LiteSeg[^28]
- MiniNet[^29]
- MiniNetv2[^30]
- PP-LiteSeg[^31]
- RegSeg[^32]
- SegNet[^33]
- ShelfNet[^34]
- SQNet[^35]
- STDC[^36]
- SwiftNet[^37]
If you want to use an encoder-decoder structure with a pretrained encoder, you may refer to segmentation-models-pytorch[^38]. This repo also provides easy access to SMP: just modify the config file (e.g., to train DeepLabv3Plus with a ResNet-101 backbone as the teacher model for knowledge distillation):

```python
self.model = 'smp'
self.encoder = 'resnet101'
self.decoder = 'deeplabv3p'
```
or use command-line arguments:

```shell
python main.py --model smp --encoder resnet101 --decoder deeplabv3p
```
Details of the configurations can also be found in this file.
Currently, only the original knowledge distillation method proposed by Hinton et al.[^39] is supported.
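In that method, the student is trained against both the hard labels and the teacher's temperature-softened logits. A minimal sketch (the hyper-parameter names `T` and `alpha` are illustrative, not necessarily the repo's):

```python
# Vanilla knowledge distillation (Hinton et al.): KL divergence between
# temperature-softened teacher and student distributions, mixed with the
# usual cross-entropy on hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Per-pixel logits: (N, C, H, W); softmax over the class dimension C.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    # Multiplying by T*T restores gradient magnitude after temperature scaling.
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(2, 19, 8, 8)
teacher = torch.randn(2, 19, 8, 8)
labels = torch.randint(0, 19, (2, 8, 8))
loss = distillation_loss(student, teacher, labels)
```

When the student matches the teacher exactly, the KL term vanishes and only the cross-entropy part remains.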
```shell
# distributed training with torch.distributed.launch (one process per GPU)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main.py

# or single-process multi-GPU training
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
```
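The `torch.distributed.launch` command starts one process per GPU and hands each a local rank; a typical `main.py` then distinguishes the two launch modes roughly like this (a sketch with placeholder names, not the repo's actual code):

```python
# Sketch: choose DistributedDataParallel when launched by
# torch.distributed.launch, otherwise fall back to a plain/DataParallel model.
import os
import torch

model = torch.nn.Conv2d(3, 19, 1)  # stand-in for a segmentation model

local_rank = int(os.environ.get("LOCAL_RANK", -1))
if local_rank >= 0:
    # launched via torch.distributed.launch: one process per GPU
    torch.cuda.set_device(local_rank)
    torch.distributed.init_process_group(backend="nccl")
    model = torch.nn.parallel.DistributedDataParallel(
        model.cuda(), device_ids=[local_rank])
elif torch.cuda.is_available():
    # plain `python main.py` with several visible GPUs
    model = torch.nn.DataParallel(model).cuda()
```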
| Model | Year | Encoder | Params (M) paper/my | FPS [1] | mIoU (paper) val/test | mIoU (my) val [2] |
|---|---|---|---|---|---|---|
| ADSCNet | 2019 | None | n.a./0.51 | 89 | n.a./67.5 | 69.06 |
| AGLNet | 2020 | None | 1.12/1.02 | 61 | 69.39/70.1 | 73.58 |
| BiSeNetv1 | 2018 | ResNet18 | 49.0/13.32 | 88 | 74.8/74.7 | 74.91 |
| BiSeNetv2 | 2020 | None | n.a./2.27 | 142 | 73.4/72.6 | 73.73 [3] |
| CANet | 2019 | MobileNetv2 | 4.8/4.77 | 76 | 73.4/73.5 | 76.59 |
| CFPNet | 2021 | None | 0.55/0.27 | 64 | n.a./70.1 | 70.08 |
| CGNet | 2018 | None | 0.41/0.24 | 157 | 59.7/64.8 [4] | 67.25 |
| ContextNet | 2018 | None | 0.85/1.01 | 80 | 65.9/66.1 | 66.61 |
| DABNet | 2019 | None | 0.76/0.75 | 140 | n.a./70.1 | 70.78 |
| DDRNet | 2021 | None | 5.7/5.54 | 233 | 77.8/77.4 | 74.34 |
| DFANet | 2019 | XceptionA | 7.8/3.05 | 60 | 71.9/71.3 | 65.28 |
| EDANet | 2018 | None | 0.68/0.69 | 125 | n.a./67.3 | 70.76 |
| ENet | 2016 | None | 0.37/0.37 | 140 | n.a./58.3 | 71.31 |
| ERFNet | 2017 | None | 2.06/2.07 | 60 | 70.0/68.0 | 76.00 |
| ESNet | 2019 | None | 1.66/1.66 | 66 | n.a./70.7 | 71.82 |
| ESPNet | 2018 | None | 0.36/0.38 | 111 | n.a./60.3 | 66.39 |
| ESPNetv2 | 2018 | None | 1.25/0.86 | 101 | 66.4/66.2 | 70.35 |
| FANet | 2020 | ResNet18 | n.a./12.26 | 100 | 75.0/74.4 | 74.92 |
| FarseeNet | 2020 | ResNet18 | n.a./16.75 | 130 | 73.5/70.2 | 77.35 |
| FastSCNN | 2019 | None | 1.11/1.02 | 358 | 68.6/68.0 | 69.37 |
| FDDWNet | 2019 | None | 0.80/0.77 | 51 | n.a./71.5 | 75.86 |
| FPENet | 2019 | None | 0.38/0.36 | 90 | n.a./70.1 | 72.05 |
| FSSNet | 2018 | None | 0.2/0.20 | 121 | n.a./58.8 | 65.44 |
| ICNet | 2017 | ResNet18 | 26.5 [5]/12.42 | 102 | 67.7 [5]/69.5 [5] | 69.65 |
| LEDNet | 2019 | None | 0.94/1.46 | 76 | n.a./70.6 | 72.63 |
| LinkNet | 2017 | ResNet18 | 11.5/11.54 | 106 | n.a./76.4 | 73.39 |
| Lite-HRNet | 2021 | None | 1.1/1.09 | 30 | 73.8/72.8 | 70.66 |
| LiteSeg | 2019 | MobileNetv2 | 4.38/4.29 | 117 | 70.0/67.8 | 76.10 |
| MiniNet | 2019 | None | 3.1/1.41 | 254 | n.a./40.7 | 61.47 |
| MiniNetv2 | 2020 | None | 0.5/0.51 | 86 | n.a./70.5 | 71.79 |
| PP-LiteSeg | 2022 | STDC1 | n.a./6.33 | 201 | 76.0/74.9 | 72.49 |
| PP-LiteSeg | 2022 | STDC2 | n.a./10.56 | 136 | 78.2/77.5 | 74.37 |
| RegSeg | 2021 | None | 3.34/3.37 | 104 | 78.5/78.3 | 74.28 |
| SegNet | 2015 | None | 29.46/29.48 | 14 | n.a./56.1 | 70.77 |
| ShelfNet | 2018 | ResNet18 | 23.5/16.04 | 110 | n.a./74.8 | 77.63 |
| SQNet | 2016 | SqueezeNet-1.1 | n.a./4.81 | 69 | n.a./59.8 | 69.55 |
| STDC | 2021 | STDC1 | n.a./7.79 | 163 | 74.5/75.3 | 75.25 [6] |
| STDC | 2021 | STDC2 | n.a./11.82 | 119 | 77.0/76.8 | 76.78 [6] |
| SwiftNet | 2019 | ResNet18 | 11.8/11.95 | 141 | 75.4/75.5 | 75.43 |
[1] FPS values are measured on an RTX 2080 at resolution 1024x512 using this script. Note that FPS varies between devices and hardware and also depends on other factors (e.g., whether cuDNN is used); to obtain accurate numbers, please test on your own device.
[2] These results are obtained by training for 800 epochs with crop size 1024x1024.
[3] These results are obtained by using auxiliary heads.
[4] This result is obtained by using a deeper model, i.e., CGNet_M3N21.
[5] The original encoder of ICNet is ResNet50.
[6] In my experiments, the detail loss does not improve performance; using auxiliary heads, however, does contribute to the improvement.
Decoder | Params (M) | mIoU (200 epoch) | mIoU (800 epoch) |
---|---|---|---|
DeepLabv3 | 15.90 | 75.22 | 77.16 |
DeepLabv3Plus | 12.33 | 73.97 | 75.90 |
FPN | 13.05 | 73.44 | 74.94 |
LinkNet | 11.66 | 71.17 | 73.19 |
MANet | 21.68 | 74.59 | 76.14 |
PAN | 11.37 | 70.25 | 72.46 |
PSPNet | 11.41 | 61.63 | 67.26 |
UNet | 14.33 | 72.99 | 74.45 |
UNetPlusPlus | 15.97 | 74.31 | 75.57 |
[For comparison, the results above all use ResNet-18 as the encoder.]
| Model | Encoder | Decoder | kd_training | mIoU (200 epoch) | mIoU (800 epoch) |
|---|---|---|---|---|---|
| SMP | ResNet-101 (teacher) | DeepLabv3Plus | - | 78.10 | 79.20 |
| SMP | ResNet-18 (student) | DeepLabv3Plus | False | 73.97 | 75.90 |
| SMP | ResNet-18 (student) | DeepLabv3Plus | True | 75.20 | 76.41 |
```
/Cityscapes
    /gtFine
    /leftImg8bit
```
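Given this layout, a quick sanity check of the dataset root could look like the sketch below (the repo's own dataloader may verify paths differently):

```python
# Verify that a Cityscapes root contains the two required sub-directories
# (gtFine for annotations, leftImg8bit for images).
from pathlib import Path

def check_cityscapes_root(root):
    root = Path(root)
    missing = [d for d in ("gtFine", "leftImg8bit") if not (root / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"Cityscapes root {root} is missing {missing}")
    return True
```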
Footnotes

[^1]: ADSCNet: asymmetric depthwise separable convolution for semantic segmentation in real-time
[^2]: AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network
[^3]: BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
[^4]: BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
[^6]: CFPNet: Channel-wise Feature Pyramid for Real-Time Semantic Segmentation
[^7]: CGNet: A Light-weight Context Guided Network for Semantic Segmentation
[^8]: ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time
[^9]: DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation
[^10]: Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes
[^11]: DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation
[^12]: Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation
[^13]: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
[^14]: ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation
[^15]: ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation
[^16]: ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
[^17]: ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network
[^19]: FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution
[^21]: FDDWNet: A Lightweight Convolutional Neural Network for Real-time Semantic Segmentation
[^22]: Feature Pyramid Encoding Network for Real-time Semantic Segmentation
[^24]: ICNet for Real-Time Semantic Segmentation on High-Resolution Images
[^25]: LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation
[^26]: LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation
[^28]: LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation
[^29]: Enhancing V-SLAM Keyframe Selection with an Efficient ConvNet for Semantic Analysis
[^30]: MiniNet: An Efficient Semantic Segmentation ConvNet for Real-Time Robotic Applications
[^31]: PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model
[^32]: Rethinking Dilated Convolution for Real-time Semantic Segmentation
[^33]: SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
[^37]: In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images