
zh320 / realtime-semantic-segmentation-pytorch


PyTorch implementation of over 30 real-time semantic segmentation models, e.g. BiSeNetv1, BiSeNetv2, CGNet, ContextNet, DABNet, DDRNet, EDANet, ENet, ERFNet, ESPNet, ESPNetv2, FastSCNN, ICNet, LEDNet, LinkNet, PP-LiteSeg, SegNet, ShelfNet, STDC, SwiftNet, with support for knowledge distillation, distributed training, etc.

License: Apache License 2.0


Introduction

PyTorch implementation of real-time semantic segmentation models, with support for multi-GPU training and validation, automatic mixed precision training, knowledge distillation, etc.

Requirements

torch == 1.8.1
segmentation-models-pytorch
torchmetrics
albumentations
loguru
tqdm

Supported models

If you want to use an encoder-decoder structure with pretrained encoders, you may refer to segmentation-models-pytorch (SMP) [38]. This repo also provides easy access to SMP: just modify the config file. For example, to train DeepLabv3Plus with a ResNet-101 backbone as the teacher model for knowledge distillation, set

self.model = 'smp'
self.encoder = 'resnet101'
self.decoder = 'deeplabv3p'

or use command-line arguments

python main.py --model smp --encoder resnet101 --decoder deeplabv3p

Details of the configurations can also be found in this file.
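The relationship between the config file and the command-line flags can be pictured with the minimal sketch below. It is not the repo's actual code: the `Config` class, its default values, and the `apply_cli_overrides` helper are hypothetical names used only for illustration. Only the three flags `--model`, `--encoder`, and `--decoder` are taken from the command shown above.

```python
import argparse


class Config:
    """Hypothetical stand-in for the repo's config object (names assumed)."""

    def __init__(self):
        self.model = 'bisenetv2'  # placeholder default, not taken from the repo
        self.encoder = None
        self.decoder = None


def apply_cli_overrides(config, argv=None):
    """Overwrite config attributes with any flags given on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', type=str, default=None)
    parser.add_argument('--encoder', type=str, default=None)
    parser.add_argument('--decoder', type=str, default=None)
    args = parser.parse_args(argv)
    for key, value in vars(args).items():
        if value is not None:  # flags that were not passed keep the config value
            setattr(config, key, value)
    return config


config = apply_cli_overrides(
    Config(), ['--model', 'smp', '--encoder', 'resnet101', '--decoder', 'deeplabv3p'])
print(config.model, config.encoder, config.decoder)
```

With this pattern, values set in the config file act as defaults, and any flag passed on the command line takes precedence.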

Knowledge Distillation

Currently, only the original knowledge distillation method proposed by Geoffrey Hinton [39] is supported.
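For readers unfamiliar with the method, the soft-target term from that paper can be sketched in plain Python for a single pixel's class logits. This is an illustrative sketch, not the repo's implementation: the function names and the temperature value of 4.0 are assumptions, and a real training loop would compute the same quantity over whole tensors (per pixel, then averaged) and usually combine it with the ordinary cross-entropy loss on the ground-truth labels.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor keeps gradient magnitudes comparable across temperatures,
    as suggested in the original distillation paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


# The loss is zero when the student matches the teacher and positive otherwise.
print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```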

How to use

DDP training (recommended)

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main.py

DP training

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
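The practical difference between the two commands is that `torch.distributed.launch` starts one process per GPU and, by default, passes a `--local_rank` argument to each of them, whereas the plain `python main.py` call runs a single process. A training script therefore needs some way to tell the two modes apart. The sketch below shows one common pattern; the helper name `get_local_rank` and the `-1` sentinel for "not launched in distributed mode" are assumptions, not something confirmed by this repo.

```python
import argparse
import os


def get_local_rank(argv=None):
    """Return the per-node process index, or -1 when not launched via DDP."""
    parser = argparse.ArgumentParser()
    # torch.distributed.launch appends --local_rank=<n> to the script arguments.
    parser.add_argument('--local_rank', '--local-rank', type=int, default=None)
    # parse_known_args ignores the script's other flags (e.g. --model).
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # Fall back to the LOCAL_RANK environment variable if it is set.
    return int(os.environ.get('LOCAL_RANK', -1))


print(get_local_rank(['--local_rank=2', '--model', 'smp']))
```

A script could then select its device from this value and wrap the model for distributed training only when the value is non-negative.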

Performances and checkpoints

full resolution on Cityscapes

| Model | Year | Encoder | Params (M), paper/my | FPS [1] | mIoU (paper), val/test | mIoU (my), val [2] |
|---|---|---|---|---|---|---|
| ADSCNet | 2019 | None | n.a./0.51 | 89 | n.a./67.5 | 69.06 |
| AGLNet | 2020 | None | 1.12/1.02 | 61 | 69.39/70.1 | 73.58 |
| BiSeNetv1 | 2018 | ResNet18 | 49.0/13.32 | 88 | 74.8/74.7 | 74.91 |
| BiSeNetv2 | 2020 | None | n.a./2.27 | 142 | 73.4/72.6 | 73.73 [3] |
| CANet | 2019 | MobileNetv2 | 4.8/4.77 | 76 | 73.4/73.5 | 76.59 |
| CFPNet | 2021 | None | 0.55/0.27 | 64 | n.a./70.1 | 70.08 |
| CGNet | 2018 | None | 0.41/0.24 | 157 | 59.7/64.8 [4] | 67.25 |
| ContextNet | 2018 | None | 0.85/1.01 | 80 | 65.9/66.1 | 66.61 |
| DABNet | 2019 | None | 0.76/0.75 | 140 | n.a./70.1 | 70.78 |
| DDRNet | 2021 | None | 5.7/5.54 | 233 | 77.8/77.4 | 74.34 |
| DFANet | 2019 | XceptionA | 7.8/3.05 | 60 | 71.9/71.3 | 65.28 |
| EDANet | 2018 | None | 0.68/0.69 | 125 | n.a./67.3 | 70.76 |
| ENet | 2016 | None | 0.37/0.37 | 140 | n.a./58.3 | 71.31 |
| ERFNet | 2017 | None | 2.06/2.07 | 60 | 70.0/68.0 | 76.00 |
| ESNet | 2019 | None | 1.66/1.66 | 66 | n.a./70.7 | 71.82 |
| ESPNet | 2018 | None | 0.36/0.38 | 111 | n.a./60.3 | 66.39 |
| ESPNetv2 | 2018 | None | 1.25/0.86 | 101 | 66.4/66.2 | 70.35 |
| FANet | 2020 | ResNet18 | n.a./12.26 | 100 | 75.0/74.4 | 74.92 |
| FarseeNet | 2020 | ResNet18 | n.a./16.75 | 130 | 73.5/70.2 | 77.35 |
| FastSCNN | 2019 | None | 1.11/1.02 | 358 | 68.6/68.0 | 69.37 |
| FDDWNet | 2019 | None | 0.80/0.77 | 51 | n.a./71.5 | 75.86 |
| FPENet | 2019 | None | 0.38/0.36 | 90 | n.a./70.1 | 72.05 |
| FSSNet | 2018 | None | 0.2/0.20 | 121 | n.a./58.8 | 65.44 |
| ICNet | 2017 | ResNet18 | 26.5 [5]/12.42 | 102 | 67.7 [5]/69.5 [5] | 69.65 |
| LEDNet | 2019 | None | 0.94/1.46 | 76 | n.a./70.6 | 72.63 |
| LinkNet | 2017 | ResNet18 | 11.5/11.54 | 106 | n.a./76.4 | 73.39 |
| Lite-HRNet | 2021 | None | 1.1/1.09 | 30 | 73.8/72.8 | 70.66 |
| LiteSeg | 2019 | MobileNetv2 | 4.38/4.29 | 117 | 70.0/67.8 | 76.10 |
| MiniNet | 2019 | None | 3.1/1.41 | 254 | n.a./40.7 | 61.47 |
| MiniNetv2 | 2020 | None | 0.5/0.51 | 86 | n.a./70.5 | 71.79 |
| PP-LiteSeg | 2022 | STDC1 | n.a./6.33 | 201 | 76.0/74.9 | 72.49 |
| PP-LiteSeg | 2022 | STDC2 | n.a./10.56 | 136 | 78.2/77.5 | 74.37 |
| RegSeg | 2021 | None | 3.34/3.37 | 104 | 78.5/78.3 | 74.28 |
| SegNet | 2015 | None | 29.46/29.48 | 14 | n.a./56.1 | 70.77 |
| ShelfNet | 2018 | ResNet18 | 23.5/16.04 | 110 | n.a./74.8 | 77.63 |
| SQNet | 2016 | SqueezeNet-1.1 | n.a./4.81 | 69 | n.a./59.8 | 69.55 |
| STDC | 2021 | STDC1 | n.a./7.79 | 163 | 74.5/75.3 | 75.25 [6] |
| STDC | 2021 | STDC2 | n.a./11.82 | 119 | 77.0/76.8 | 76.78 [6] |
| SwiftNet | 2019 | ResNet18 | 11.8/11.95 | 141 | 75.4/75.5 | 75.43 |

[1] FPS is measured on an RTX 2080 at resolution 1024x512 using this script. Note that FPS varies between devices and hardware and also depends on other factors (e.g. whether cuDNN is used). To obtain accurate FPS figures, please test the models on your own device.
[2] These results are obtained by training for 800 epochs with crop size 1024x1024.
[3] These results are obtained by using auxiliary heads.
[4] This result is obtained by using a deeper model, i.e. CGNet_M3N21.
[5] The original encoder of ICNet is ResNet50.
[6] In my experiments, detail loss does not improve performance. However, using auxiliary heads does contribute to the improvements.
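Since FPS figures depend so strongly on the device, it can help to see what a basic throughput measurement looks like. The sketch below is not the repo's benchmarking script: `measure_fps`, its parameters, and the dummy workload are hypothetical. It times a callable after a warm-up phase. The optional `sync` hook matters on a GPU, where kernels run asynchronously; passing `torch.cuda.synchronize` there makes the timer wait until all queued work has finished.

```python
import time


def measure_fps(run_once, warmup=10, iters=100, sync=None):
    """Return calls per second of run_once, excluding warm-up iterations."""
    for _ in range(warmup):  # warm-up lets caches and autotuners settle
        run_once()
    if sync is not None:
        sync()  # make sure warm-up work is finished before timing starts
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()  # wait for all timed work to complete
    elapsed = time.perf_counter() - start
    return iters / elapsed


# Dummy CPU workload standing in for one forward pass on a 1024x512 input.
fps = measure_fps(lambda: sum(range(10_000)), warmup=5, iters=50)
print(f'{fps:.1f} FPS')
```

For a PyTorch model, `run_once` would typically be a forward pass on a fixed input tensor with gradients disabled and the model in evaluation mode.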

SMP performance on Cityscapes

| Decoder | Params (M) | mIoU (200 epoch) | mIoU (800 epoch) |
|---|---|---|---|
| DeepLabv3 | 15.90 | 75.22 | 77.16 |
| DeepLabv3Plus | 12.33 | 73.97 | 75.90 |
| FPN | 13.05 | 73.44 | 74.94 |
| LinkNet | 11.66 | 71.17 | 73.19 |
| MANet | 21.68 | 74.59 | 76.14 |
| PAN | 11.37 | 70.25 | 72.46 |
| PSPNet | 11.41 | 61.63 | 67.26 |
| UNet | 14.33 | 72.99 | 74.45 |
| UNetPlusPlus | 15.97 | 74.31 | 75.57 |

[For comparison, all of the above results use ResNet-18 as the encoder.]

Knowledge distillation

| Model | Decoder | Encoder | kd_training | mIoU (200 epoch) | mIoU (800 epoch) |
|---|---|---|---|---|---|
| SMP | DeepLabv3Plus | ResNet-101 (teacher) | - | 78.10 | 79.20 |
| SMP | DeepLabv3Plus | ResNet-18 (student) | False | 73.97 | 75.90 |
| SMP | DeepLabv3Plus | ResNet-18 (student) | True | 75.20 | 76.41 |

Prepare the dataset

/Cityscapes
    /gtFine
    /leftImg8bit
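The two folders above follow the standard Cityscapes release layout, in which each holds `split/city/` subdirectories and an image named `<id>_leftImg8bit.png` has its annotation at `<id>_gtFine_labelIds.png`. The sketch below pairs images with labels under that convention. The function name is hypothetical, and whether this repo reads `labelIds` or a different annotation variant (e.g. `labelTrainIds`) is an assumption to check against its dataset code.

```python
from pathlib import Path

IMG_SUFFIX = '_leftImg8bit.png'
LABEL_SUFFIX = '_gtFine_labelIds.png'  # assumed label variant


def pair_cityscapes_files(root, split='train'):
    """Return sorted (image_path, label_path) pairs for one dataset split."""
    root = Path(root)
    pairs = []
    image_dir = root / 'leftImg8bit' / split
    for img_path in sorted(image_dir.glob('*/*' + IMG_SUFFIX)):
        city = img_path.parent.name
        sample_id = img_path.name[:-len(IMG_SUFFIX)]
        label_path = root / 'gtFine' / split / city / (sample_id + LABEL_SUFFIX)
        if label_path.exists():  # skip images without a matching annotation
            pairs.append((img_path, label_path))
    return pairs


# Example call (path is a placeholder):
# pairs = pair_cityscapes_files('/Cityscapes', split='val')
```

Running such a check once before training is a quick way to confirm that the dataset was extracted into the expected structure.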

References

Footnotes

  1. ADSCNet: asymmetric depthwise separable convolution for semantic segmentation in real-time

  2. AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network

  3. BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

  4. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

  5. Cross Attention Network for Semantic Segmentation

  6. CFPNet: Channel-wise Feature Pyramid for Real-Time Semantic Segmentation

  7. CGNet: A Light-weight Context Guided Network for Semantic Segmentation

  8. ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time

  9. DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation

  10. Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes

  11. DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

  12. Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

  13. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

  14. ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation

  15. ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation

  16. ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

  17. ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network

  18. Real-time Semantic Segmentation with Fast Attention

  19. FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution

  20. Fast-SCNN: Fast Semantic Segmentation Network

  21. FDDWNet: A Lightweight Convolutional Neural Network for Real-time Sementic Segmentation

  22. Feature Pyramid Encoding Network for Real-time Semantic Segmentation

  23. Fast Semantic Segmentation for Scene Perception

  24. ICNet for Real-Time Semantic Segmentation on High-Resolution Images

  25. LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation

  26. LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation

  27. Lite-HRNet: A Lightweight High-Resolution Network

  28. LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation

  29. Enhancing V-SLAM Keyframe Selection with an Efficient ConvNet for Semantic Analysis

  30. MiniNet: An Efficient Semantic Segmentation ConvNet for Real-Time Robotic Applications

  31. PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model

  32. Rethinking Dilated Convolution for Real-time Semantic Segmentation

  33. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

  34. ShelfNet for Fast Semantic Segmentation

  35. Speeding up Semantic Segmentation for Autonomous Driving

  36. Rethinking BiSeNet For Real-time Semantic Segmentation

  37. In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images

  38. segmentation-models-pytorch

  39. Distilling the Knowledge in a Neural Network

realtime-semantic-segmentation-pytorch's Issues

Suggestion

Could the weights be hosted on a cloud drive? The git project is too large and is hard to pull.

I can't run this code with dataset CamVid

I wrote my own code to load the CamVid dataset, but when I use the BiSeNetv2 model, it reports an error that the image size does not match. My image input size is crop_w = 480, crop_h = 360.

I can't run without smp

I can't run main.py because it tells me I'm missing the smp module, but I don't want to use smp
