PyTorch implementation of real-time semantic segmentation models, supporting multi-GPU training and validation, automatic mixed precision training, knowledge distillation, and more.
```
torch == 1.8.1
segmentation-models-pytorch
torchmetrics
albumentations
loguru
tqdm
```
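The automatic mixed precision training mentioned above typically follows the standard `torch.cuda.amp` pattern (available since torch 1.6). The sketch below uses a stand-in model and optimizer, not the repo's actual code:

```python
# Sketch of an AMP training step with torch.cuda.amp.
# The model, optimizer, and data here are placeholders.
import torch

model = torch.nn.Conv2d(3, 19, 1)       # stand-in for a segmentation model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
use_amp = torch.cuda.is_available()     # autocast to fp16 needs a CUDA device
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

images = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 19, (2, 64, 64))

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    logits = model(images)              # (N, classes, H, W)
    loss = criterion(logits, labels)
scaler.scale(loss).backward()           # scale the loss to avoid fp16 underflow
scaler.step(optimizer)                  # unscales gradients, then steps
scaler.update()
```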
- ADSCNet[^1]
- AGLNet[^2]
- BiSeNetv1[^3]
- BiSeNetv2[^4]
- CANet[^5]
- CFPNet[^6]
- CGNet[^7]
- ContextNet[^8]
- DABNet[^9]
- DDRNet[^10]
- DFANet[^11]
- EDANet[^12]
- ENet[^13]
- ERFNet[^14]
- ESNet[^15]
- ESPNet[^16]
- ESPNetv2[^17]
- FANet[^18]
- FarseeNet[^19]
- FastSCNN[^20]
- FDDWNet[^21]
- FPENet[^22]
- FSSNet[^23]
- ICNet[^24]
- LEDNet[^25]
- LinkNet[^26]
- Lite-HRNet[^27]
- LiteSeg[^28]
- MiniNet[^29]
- MiniNetv2[^30]
- PP-LiteSeg[^31]
- RegSeg[^32]
- SegNet[^33]
- ShelfNet[^34]
- SQNet[^35]
- STDC[^36]
- SwiftNet[^37]
If you want to use an encoder-decoder structure with a pretrained encoder, you may refer to segmentation-models-pytorch[^38]. This repo also provides easy access to SMP: just modify the config file (e.g., to train DeepLabv3Plus with a ResNet-101 backbone as the teacher model for knowledge distillation):

```python
self.model = 'smp'
self.encoder = 'resnet101'
self.decoder = 'deeplabv3p'
```
or use command-line arguments:

```shell
python main.py --model smp --encoder resnet101 --decoder deeplabv3p
```
Details of the configurations can also be found in this file.
Currently, only the original knowledge distillation method proposed by Hinton et al.[^39] is supported.
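In that method, the student is trained against both the hard labels and the teacher's temperature-softened logits. A minimal sketch (the hyper-parameter names `T` and `alpha` are illustrative, not necessarily the repo's):

```python
# Vanilla knowledge distillation (Hinton et al.): KL divergence between
# temperature-softened teacher and student distributions, mixed with the
# usual cross-entropy on hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Per-pixel logits: (N, C, H, W); softmax over the class dimension C.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    # Multiplying by T*T restores gradient magnitude after temperature scaling.
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(2, 19, 8, 8)
teacher = torch.randn(2, 19, 8, 8)
labels = torch.randint(0, 19, (2, 8, 8))
loss = distillation_loss(student, teacher, labels)
```

When the student matches the teacher exactly, the KL term vanishes and only the cross-entropy part remains.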
```shell
# distributed training with torch.distributed.launch (one process per GPU)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 main.py

# or single-process multi-GPU training
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
```
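The `torch.distributed.launch` command starts one process per GPU and hands each a local rank; a typical `main.py` then distinguishes the two launch modes roughly like this (a sketch with placeholder names, not the repo's actual code):

```python
# Sketch: choose DistributedDataParallel when launched by
# torch.distributed.launch, otherwise fall back to a plain/DataParallel model.
import os
import torch

model = torch.nn.Conv2d(3, 19, 1)  # stand-in for a segmentation model

local_rank = int(os.environ.get("LOCAL_RANK", -1))
if local_rank >= 0:
    # launched via torch.distributed.launch: one process per GPU
    torch.cuda.set_device(local_rank)
    torch.distributed.init_process_group(backend="nccl")
    model = torch.nn.parallel.DistributedDataParallel(
        model.cuda(), device_ids=[local_rank])
elif torch.cuda.is_available():
    # plain `python main.py` with several visible GPUs
    model = torch.nn.DataParallel(model).cuda()
```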
| Model | Year | Encoder | Params (M) paper/my | FPS [1] | mIoU (paper) val/test | mIoU (my) val [2] |
|---|---|---|---|---|---|---|
| ADSCNet | 2019 | None | n.a./0.51 | 89 | n.a./67.5 | 69.06 |
| AGLNet | 2020 | None | 1.12/1.02 | 61 | 69.39/70.1 | 73.58 |
| BiSeNetv1 | 2018 | ResNet18 | 49.0/13.32 | 88 | 74.8/74.7 | 74.91 |
| BiSeNetv2 | 2020 | None | n.a./2.27 | 142 | 73.4/72.6 | 73.73 [3] |
| CANet | 2019 | MobileNetv2 | 4.8/4.77 | 76 | 73.4/73.5 | 76.59 |
| CFPNet | 2021 | None | 0.55/0.27 | 64 | n.a./70.1 | 70.08 |
| CGNet | 2018 | None | 0.41/0.24 | 157 | 59.7/64.8 [4] | 67.25 |
| ContextNet | 2018 | None | 0.85/1.01 | 80 | 65.9/66.1 | 66.61 |
| DABNet | 2019 | None | 0.76/0.75 | 140 | n.a./70.1 | 70.78 |
| DDRNet | 2021 | None | 5.7/5.54 | 233 | 77.8/77.4 | 74.34 |
| DFANet | 2019 | XceptionA | 7.8/3.05 | 60 | 71.9/71.3 | 65.28 |
| EDANet | 2018 | None | 0.68/0.69 | 125 | n.a./67.3 | 70.76 |
| ENet | 2016 | None | 0.37/0.37 | 140 | n.a./58.3 | 71.31 |
| ERFNet | 2017 | None | 2.06/2.07 | 60 | 70.0/68.0 | 76.00 |
| ESNet | 2019 | None | 1.66/1.66 | 66 | n.a./70.7 | 71.82 |
| ESPNet | 2018 | None | 0.36/0.38 | 111 | n.a./60.3 | 66.39 |
| ESPNetv2 | 2018 | None | 1.25/0.86 | 101 | 66.4/66.2 | 70.35 |
| FANet | 2020 | ResNet18 | n.a./12.26 | 100 | 75.0/74.4 | 74.92 |
| FarseeNet | 2020 | ResNet18 | n.a./16.75 | 130 | 73.5/70.2 | 77.35 |
| FastSCNN | 2019 | None | 1.11/1.02 | 358 | 68.6/68.0 | 69.37 |
| FDDWNet | 2019 | None | 0.80/0.77 | 51 | n.a./71.5 | 75.86 |
| FPENet | 2019 | None | 0.38/0.36 | 90 | n.a./70.1 | 72.05 |
| FSSNet | 2018 | None | 0.2/0.20 | 121 | n.a./58.8 | 65.44 |
| ICNet | 2017 | ResNet18 | 26.5 [5]/12.42 | 102 | 67.7 [5]/69.5 [5] | 69.65 |
| LEDNet | 2019 | None | 0.94/1.46 | 76 | n.a./70.6 | 72.63 |
| LinkNet | 2017 | ResNet18 | 11.5/11.54 | 106 | n.a./76.4 | 73.39 |
| Lite-HRNet | 2021 | None | 1.1/1.09 | 30 | 73.8/72.8 | 70.66 |
| LiteSeg | 2019 | MobileNetv2 | 4.38/4.29 | 117 | 70.0/67.8 | 76.10 |
| MiniNet | 2019 | None | 3.1/1.41 | 254 | n.a./40.7 | 61.47 |
| MiniNetv2 | 2020 | None | 0.5/0.51 | 86 | n.a./70.5 | 71.79 |
| PP-LiteSeg | 2022 | STDC1 | n.a./6.33 | 201 | 76.0/74.9 | 72.49 |
| PP-LiteSeg | 2022 | STDC2 | n.a./10.56 | 136 | 78.2/77.5 | 74.37 |
| RegSeg | 2021 | None | 3.34/3.37 | 104 | 78.5/78.3 | 74.28 |
| SegNet | 2015 | None | 29.46/29.48 | 14 | n.a./56.1 | 70.77 |
| ShelfNet | 2018 | ResNet18 | 23.5/16.04 | 110 | n.a./74.8 | 77.63 |
| SQNet | 2016 | SqueezeNet-1.1 | n.a./4.81 | 69 | n.a./59.8 | 69.55 |
| STDC | 2021 | STDC1 | n.a./7.79 | 163 | 74.5/75.3 | 75.25 [6] |
| STDC | 2021 | STDC2 | n.a./11.82 | 119 | 77.0/76.8 | 76.78 [6] |
| SwiftNet | 2019 | ResNet18 | 11.8/11.95 | 141 | 75.4/75.5 | 75.43 |
[1] FPS values are measured on an RTX 2080 at resolution 1024x512 using this script. Note that FPS varies between devices and hardware and also depends on other factors (e.g., whether cuDNN is used); to obtain accurate numbers, please test on your own device.
[2] These results are obtained by training for 800 epochs with crop size 1024x1024.
[3] These results are obtained by using auxiliary heads.
[4] This result is obtained by using a deeper model, i.e., CGNet_M3N21.
[5] The original encoder of ICNet is ResNet50.
[6] In my experiments, the detail loss does not improve performance; using auxiliary heads, however, does contribute to the improvement.
Decoder | Params (M) | mIoU (200 epoch) | mIoU (800 epoch) |
---|---|---|---|
DeepLabv3 | 15.90 | 75.22 | 77.16 |
DeepLabv3Plus | 12.33 | 73.97 | 75.90 |
FPN | 13.05 | 73.44 | 74.94 |
LinkNet | 11.66 | 71.17 | 73.19 |
MANet | 21.68 | 74.59 | 76.14 |
PAN | 11.37 | 70.25 | 72.46 |
PSPNet | 11.41 | 61.63 | 67.26 |
UNet | 14.33 | 72.99 | 74.45 |
UNetPlusPlus | 15.97 | 74.31 | 75.57 |
[For comparison, the results above all use ResNet-18 as the encoder.]
| Model | Encoder | Decoder | kd_training | mIoU (200 epoch) | mIoU (800 epoch) |
|---|---|---|---|---|---|
| SMP | ResNet-101 (teacher) | DeepLabv3Plus | - | 78.10 | 79.20 |
| SMP | ResNet-18 (student) | DeepLabv3Plus | False | 73.97 | 75.90 |
| SMP | ResNet-18 (student) | DeepLabv3Plus | True | 75.20 | 76.41 |
```
/Cityscapes
    /gtFine
    /leftImg8bit
```
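Given this layout, a quick sanity check of the dataset root could look like the sketch below (the repo's own dataloader may verify paths differently):

```python
# Verify that a Cityscapes root contains the two required sub-directories
# (gtFine for annotations, leftImg8bit for images).
from pathlib import Path

def check_cityscapes_root(root):
    root = Path(root)
    missing = [d for d in ("gtFine", "leftImg8bit") if not (root / d).is_dir()]
    if missing:
        raise FileNotFoundError(f"Cityscapes root {root} is missing {missing}")
    return True
```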
Footnotes

[^1]: ADSCNet: asymmetric depthwise separable convolution for semantic segmentation in real-time
[^2]: AGLNet: Towards real-time semantic segmentation of self-driving images via attention-guided lightweight network
[^3]: BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
[^4]: BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
[^6]: CFPNet: Channel-wise Feature Pyramid for Real-Time Semantic Segmentation
[^7]: CGNet: A Light-weight Context Guided Network for Semantic Segmentation
[^8]: ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time
[^9]: DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation
[^10]: Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes
[^11]: DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation
[^12]: Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation
[^13]: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
[^14]: ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation
[^15]: ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation
[^16]: ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation
[^17]: ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network
[^19]: FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution
[^21]: FDDWNet: A Lightweight Convolutional Neural Network for Real-time Semantic Segmentation
[^22]: Feature Pyramid Encoding Network for Real-time Semantic Segmentation
[^24]: ICNet for Real-Time Semantic Segmentation on High-Resolution Images
[^25]: LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation
[^26]: LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation
[^28]: LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation
[^29]: Enhancing V-SLAM Keyframe Selection with an Efficient ConvNet for Semantic Analysis
[^30]: MiniNet: An Efficient Semantic Segmentation ConvNet for Real-Time Robotic Applications
[^31]: PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model
[^32]: Rethinking Dilated Convolution for Real-time Semantic Segmentation
[^33]: SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
[^37]: In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images