ydhonghit / ddrnet
The official implementation of "Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes"
License: MIT License
I've tried to use your code and pretrained models for image segmentation. However, it turns out that the files best_val.pth and best_val_smaller.pth from Google Drive match some unknown class with the fields model and seghead_extra, rather than DualResNet.
Currently I load the pretrained weights with the following code:
import torch
from DDRNet.segmentation.DDRNet_23_slim import DualResNet_imagenet  # or DDRNet_23

def fix_snapshot(state_dict, prefix='model.'):
    # Keep only the keys under `prefix` and strip the prefix from them
    return {key[len(prefix):]: weight
            for key, weight in state_dict.items()
            if key.startswith(prefix)}

net = DualResNet_imagenet()
state_dict = torch.load('weights/best_val_smaller.pth')  # or best_val.pth
state_dict = fix_snapshot(state_dict)
net.load_state_dict(state_dict, strict=False)  # no missing keys, a lot of unexpected ones
It works fine, but it doesn't seem right to me.
Could you please update the pretrained models or provide the code for that class?
(There is no such problem with the classification models.)
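A minimal sketch of the prefix-stripping step that also reports which keys are discarded, so nothing disappears silently behind `strict=False`. The `model.` prefix comes from the issue above; the helper name `split_checkpoint` is hypothetical, and the string values stand in for tensors.

```python
def split_checkpoint(state_dict, prefix="model."):
    """Split a checkpoint dict into (entries under `prefix` with the prefix stripped, other keys).

    Hypothetical helper: values are passed through untouched, so it works for tensors
    as well as the plain placeholders used in the example below.
    """
    kept, dropped = {}, []
    for key, weight in state_dict.items():
        if key.startswith(prefix):
            kept[key[len(prefix):]] = weight
        else:
            dropped.append(key)  # e.g. keys of an extra head such as `seghead_extra.*`
    return kept, dropped


checkpoint = {
    "model.conv1.0.weight": "w0",
    "model.layer1.0.bn1.bias": "b1",
    "seghead_extra.conv1.weight": "w2",
}
kept, dropped = split_checkpoint(checkpoint)
print(sorted(kept))  # ['conv1.0.weight', 'layer1.0.bn1.bias']
print(dropped)       # ['seghead_extra.conv1.weight']
```

In PyTorch, `net.load_state_dict(kept, strict=False)` returns a named tuple with `missing_keys` and `unexpected_keys`. Printing both shows whether the unexpected keys belong to a part of the network that this `DualResNet` configuration does not build (possibly an auxiliary head that only exists with `augment=True`).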
This model is extremely fast and accurate. I've tried many ways to make a model run fast enough at the same accuracy, such as a lighter backbone, shrinking the PPM or ASPP module, or cutting channels, but those changes actually reduce accuracy by a large margin.
Luckily I saw your work on Papers with Code and noticed the very high FPS along with the impressively high IoU you achieve. So I thought, why not try it in my project?
And here it is: DDRNet raises my task's FPS from my original model's 2.5 to an epic 16! What shocks me most is that the IoU even went up by 1 point, O.M.G!!!
Many thanks for your work. DDRNet made me realize there is still huge potential for improvement in the speed and accuracy of semantic segmentation.
Hope to see more!
Thank you very much for your great contribution!
I have a question about the crop augmentation you used during training. In the paper you say that you cropped the Cityscapes images to 1024x1024 during training. Given this, how did you run inference on the full-resolution images (2048x1024) to get the benchmark results? Did you feed in 2 crops (left and right 1024x1024) of the same size as during training and merge them into the full image afterwards, or did you feed in the full-resolution image at inference time, even though training was done on smaller crops?
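The thread does not say which option produced the reported numbers. For reference, here is a minimal NumPy sketch of the crop-and-stitch option: run the model on fixed-width windows and average the logits where windows overlap. `model_fn` is a hypothetical callable that returns full-resolution `(num_classes, H, crop)` logits, and the sketch only slides along the width.

```python
import numpy as np

def sliding_window_logits(image, model_fn, num_classes, crop=1024, stride=1024):
    """Average per-pixel logits over horizontal windows of width `crop`.

    image: (C, H, W) array. model_fn: hypothetical callable mapping a (C, H, crop) patch
    to (num_classes, H, crop) logits at full resolution.
    """
    _, h, w = image.shape
    logits = np.zeros((num_classes, h, w), dtype=np.float32)
    counts = np.zeros((1, h, w), dtype=np.float32)
    starts = list(range(0, max(w - crop, 0) + 1, stride))
    if starts[-1] + crop < w:  # make sure the right edge is covered
        starts.append(w - crop)
    for x0 in starts:
        patch = image[:, :, x0:x0 + crop]
        logits[:, :, x0:x0 + crop] += model_fn(patch)
        counts[:, :, x0:x0 + crop] += 1.0
    return logits / counts  # average where windows overlap
```

With the defaults, a 1024 x 2048 Cityscapes frame is split into exactly the left and right 1024 x 1024 halves. A smaller `stride` gives overlapping windows whose logits are averaged.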
Hi, thanks for sharing! I trained the model on Cityscapes with my own training code, but the results on the val set and the test set are very different: the val mIoU is close to the paper, but the test set mIoU is very low. Why?
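One frequent cause of a good val score but a very low test-server score is submitting predictions as the 19 training IDs instead of the original Cityscapes label IDs that the benchmark expects. This is an assumption here, since the thread does not confirm it. Below is a minimal NumPy sketch of the conversion, using the standard trainId-to-labelId correspondence from the Cityscapes label definitions. The helper name is hypothetical.

```python
import numpy as np

# Cityscapes label IDs for train IDs 0..18 (road, sidewalk, building, wall, fence, pole,
# traffic light, traffic sign, vegetation, terrain, sky, person, rider, car, truck, bus,
# train, motorcycle, bicycle).
TRAIN_TO_LABEL_ID = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 31, 32, 33]

def train_ids_to_label_ids(pred):
    """Map an (H, W) array of train IDs (values 0..255) to label IDs; unknown IDs become 0."""
    lut = np.zeros(256, dtype=np.uint8)  # 0 = "unlabeled" in the Cityscapes label set
    lut[: len(TRAIN_TO_LABEL_ID)] = TRAIN_TO_LABEL_ID
    return lut[pred.astype(np.uint8)]
```

Each converted map would then be saved as a single-channel PNG (for example with `cv2.imwrite`) before packaging the submission.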
Hi, thank you for sharing your great work.
Could you please also share the model pretrained on the COCO dataset?
Hello,
Thank you for your work. I was wondering how to use the included files to run tests.
Let's say I want to use DDRNet_23.py: how do I give it a photo as input to perform semantic segmentation?
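Below is a minimal NumPy sketch of turning a photo into the `(1, 3, H, W)` float layout that a PyTorch segmentation model takes. The RGB channel order and the ImageNet mean/std are assumptions (common for ImageNet-pretrained backbones), not confirmed settings of this repository. `preprocess` is a hypothetical helper.

```python
import numpy as np

# ImageNet statistics in RGB order (assumed, not taken from this repository).
MEAN_RGB = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD_RGB = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_bgr):
    """(H, W, 3) uint8 BGR image, as returned by cv2.imread -> (1, 3, H, W) float32 array."""
    rgb = image_bgr[:, :, ::-1].astype(np.float32) / 255.0  # BGR -> RGB, scale to [0, 1]
    rgb = (rgb - MEAN_RGB) / STD_RGB                        # per-channel normalization
    chw = np.transpose(rgb, (2, 0, 1))                      # HWC -> CHW (moves axes)
    return np.ascontiguousarray(chw[None])                  # add a batch dimension
```

With PyTorch, one would then call `x = torch.from_numpy(preprocess(cv2.imread(path)))`, put the model in `model.eval()`, and run `logits = model(x)` inside `with torch.no_grad():`.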
Hi, nice work first of all!
I stumbled across the ReLU activations in the code:
x = x + self.down3(self.relu(x_))
x_ = x_ + F.interpolate(
    self.compression3(self.relu(layers[2])),
    size=[height_output, width_output],
    mode='bilinear')
(DDRNet/segmentation/DDRNet_23_slim.py, line 312 in ba659f9)
Is there a Google Drive / Dropbox / OneDrive link available for the trained models?
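Hello, I saw in your replies to other people's questions that you merged the training and validation sets into one large training set (train+val = 2975+500). Have you tried training on the original training set (2975) alone? Using only the train set, my ddrnet_23_slim only reaches about 74 mIoU.
I ran the trained ddrnet23-slim checkpoint saved by the author on the Cityscapes test set and uploaded the results to the official site. I got 76.38 instead of 77.44. Did I miss some setting? Quite a few models fail to reach the accuracy reported in their papers in this way.
It works really well in practice, both fast and accurate; our company uses it in production.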
What kernel sizes and strides should the DAPPM module use for small image sizes? And what number of SPP planes?
Thanks :)
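One way to reason about this is to compute the map size that each DAPPM branch sees. The sketch below makes two assumptions. First, the low-resolution branch reaches DAPPM at 1/64 of the input, as in the 1024 x 1024 summary further down, where DAPPM runs at 16 x 16. Second, the pooling settings are kernel/stride/padding 5/2/2, 9/4/4 and 17/8/8 plus a global pool; these reproduce the 8 x 8, 4 x 4 and 2 x 2 maps in that summary. The helper names are hypothetical.

```python
def pooled_size(n, kernel, stride, padding):
    """Output size of AvgPool2d/Conv2d along one axis (ceil_mode=False, dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1

# Assumed (kernel, stride, padding) of the three fixed-size DAPPM pooling branches.
DAPPM_POOLS = [(5, 2, 2), (9, 4, 4), (17, 8, 8)]

def dappm_branch_sizes(input_size, total_stride=64, pools=DAPPM_POOLS):
    """Return (DAPPM input size, pooled size of every branch) for a square input."""
    # A 3x3 / stride-2 / pad-1 step maps n to ceil(n / 2); six of them give ceil(n / 64).
    feat = -(-input_size // total_stride)
    sizes = [pooled_size(feat, k, s, p) for k, s, p in pools]
    return feat, sizes + [1]  # the last branch is global average pooling

print(dappm_branch_sizes(1024))  # (16, [8, 4, 2, 1])
print(dappm_branch_sizes(256))   # (4, [2, 1, 1, 1])
```

For a 256 x 256 input, the DAPPM input is only 4 x 4, and the three fixed-size branches collapse to 2 x 2, 1 x 1 and 1 x 1. Smaller kernels and strides, or fewer branches, might therefore be worth trying for small inputs. That is a suggestion, not a tested setting.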
Hi, the third piece of code doesn't work on my PC. When will you upload the whole project? Thanks.
Hi,
Thanks for your work and for providing the code. I am just confused about the mIoU reported in the paper on the Cityscapes validation set, which is around 77%. Could you let me know whether this mIoU was achieved with an ImageNet-pretrained model? I am training DDRNet-23-slim from scratch and getting around 55% mIoU on the Cityscapes validation data.
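Hello, why is the speed basically the same when the input size is 512×512 and 768×1536?
Hello author, I ran the speed-test code you provided on a 2080 Ti and only got a little over 50 FPS. Is this normal? Is it a hardware issue or something else?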
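Measured FPS depends heavily on how the timing is done. Below is a minimal sketch of a fair timing loop: warm up first, and synchronize before reading the clock so that queued asynchronous GPU work is counted. `measure_fps` is a hypothetical helper; `run_once` and `sync` are supplied by the caller.

```python
import time

def measure_fps(run_once, warmup=10, iters=100, sync=None):
    """Return calls per second of `run_once`, excluding warm-up iterations.

    sync: optional callable that blocks until queued device work is done
    (for CUDA in PyTorch this would be torch.cuda.synchronize).
    """
    for _ in range(warmup):  # let lazy init, autotuning and caches settle
        run_once()
    if sync is not None:
        sync()  # do not start the clock with warm-up work still queued
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()  # include all queued work in the measurement
    elapsed = time.perf_counter() - start
    return iters / elapsed if elapsed > 0 else float("inf")
```

With PyTorch, one would call `measure_fps(lambda: model(x), sync=torch.cuda.synchronize)` inside a `with torch.no_grad():` block after `model.eval()`. Input size, batch size, precision and `torch.backends.cudnn.benchmark` all change the result, so numbers are only comparable under the same settings.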
I trained DDRNet_23_slim with the pretrained model, but I get a poor result: the mIoU is only 58%, far below the result in your paper. I also find that the loss declines slowly; the best loss is 0.9.
I use a single GPU, lr:3e-3, epochs:500, optim:SGD, loss:ohemceloss
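For reference, here is a minimal NumPy sketch of online hard example mining (OHEM) cross-entropy in the form commonly used in segmentation code. It keeps only pixels whose predicted probability for the true class falls below a threshold, while always keeping at least `min_kept` pixels. The defaults (`thresh=0.7`, `min_kept=100000`, `ignore_index=255`) are assumptions, not confirmed settings for DDRNet, and this is not the repository's loss code.

```python
import numpy as np

def ohem_cross_entropy(logits, labels, thresh=0.7, min_kept=100000, ignore_index=255):
    """OHEM cross-entropy over flattened pixels.

    logits: (K, N) class scores for N pixels; labels: (N,) integer labels.
    Keeps pixels whose true-class probability is <= max(thresh, k-th smallest probability),
    so at least min(min_kept, #valid) pixels always contribute.
    """
    valid = labels != ignore_index
    logits, labels = logits[:, valid], labels[valid]
    if labels.size == 0:
        return 0.0
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    true_log_prob = log_probs[labels, np.arange(labels.size)]
    probs = np.exp(true_log_prob)
    k = max(1, min(min_kept, labels.size))
    threshold = max(np.sort(probs)[k - 1], thresh)
    keep = probs <= threshold  # the "hard" pixels
    return float(-true_log_prob[keep].mean())
```

With a high `thresh`, most pixels count as hard and the loss approaches plain cross-entropy. A lower `thresh` concentrates training on poorly classified pixels, so the absolute loss values are not directly comparable across settings.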
@ydhongHIT I downloaded the pretrained model for DDRNet-23 (Cityscapes). I want to try this model on two images. As you can see in the code below, I resized the images to (1024, 2048, 3) as mentioned in the paper. The get_seg_model function is defined elsewhere, but I did not copy it here.
import matplotlib.pyplot as plt
import cv2
import torch
import numpy as np
img = cv2.imread('E:/deneme_alani_iha/uavid_data/npy_deneme/ddrNet_city/ss.png')
img2 = cv2.imread('E:/deneme_alani_iha/uavid_data/npy_deneme/ddrNet_city/ss2.png')
img = cv2.resize(img,(2048,1024))
img2 = cv2.resize(img2,(2048,1024))
img = img.reshape(3,2048,1024)
img2 = img2.reshape(3,2048,1024)
data = [img,img2]
data = np.array(data)
data = torch.Tensor(data)
#%%
model = get_seg_model(1)
with torch.no_grad():
    output = model(data)
#%%
plt.imshow(output[0].cpu().detach().numpy().reshape(128,256,19)[:,:,0])
#%%
def eight2twoconverter(labels, size):
    colors = [ (150,120,90), (153,153,153), (153,153,153), (250,170,30), (220,220,0), (107,142,35), (152,251,152), ( 70,130,180), (220,20,60), (255,0,0), ( 0,0,142), ( 0,0,70), ( 0,60,100), ( 0,0,90), ( 0,0,110), ( 0,80,100), ( 0,0,230), (119,11,32), ( 0,0,142)]
    b = np.zeros((size, 256, 3))
    for i in range(b.shape[0]):
        for j in range(b.shape[1]):
            indexx = np.argmax(labels[i, j, :])
            b[i, j] = colors[indexx]
    return b
#%%
out = eight2twoconverter(output[0].cpu().detach().numpy().reshape(128,256,19), 128)
plt.imshow(out)
I passed the correct path and num_classes parameter to this function:
from collections import OrderedDict
import torch

def DualResNet_imagenet(pretrained=True):
    model = DualResNet(BasicBlock, [2, 2, 2, 2], num_classes=19, planes=32, spp_planes=128, head_planes=64, augment=False)
    if pretrained:
        checkpoint = torch.load("E:/deneme_alani_iha/uavid_data/npy_deneme/best_val_smaller.pth", map_location='cpu')
        new_state_dict = OrderedDict()
        for k, v in checkpoint['state_dict'].items():
            name = k[7:]  # drops the first 7 characters of every key (e.g. a 'module.' prefix)
            new_state_dict[name] = v
        model.load_state_dict(new_state_dict, strict=False)
    return model
As you can see above, the network's output looks like a random picture. How can I solve this?
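Two details in the code above are likely to scramble the output regardless of the weights.

- `reshape(3, 2048, 1024)` and `reshape(128, 256, 19)` only reinterpret the memory layout; they do not move the channel axis.
- `k[7:]` assumes a 7-character prefix such as `module.`. If the checkpoint keys start with `model.` (as in the first issue above), one extra character is cut off, and with `strict=False` no weights are loaded at all.

The sketch below shows the axis reordering and a vectorized argmax/colorize in NumPy. The helper names are hypothetical, and the input normalization the model expects is not covered here.

```python
import numpy as np

def hwc_to_chw(image):
    """(H, W, C) -> (C, H, W) by moving axes; reshape would mix up pixels and channels."""
    return np.transpose(image, (2, 0, 1))

def logits_to_color(logits, palette):
    """(K, h, w) class scores -> (h, w, 3) uint8 color image, one palette row per class."""
    labels = np.argmax(logits, axis=0)                  # class index per pixel, shape (h, w)
    return np.asarray(palette, dtype=np.uint8)[labels]  # fancy indexing replaces the double loop
```

Used with the code above, this would be `hwc_to_chw(img)` instead of `img.reshape(3, 2048, 1024)`, and `logits_to_color(output[0].cpu().numpy(), colors)` with the 19-entry `colors` list. Calling `model.eval()` before inference also matters, so that BatchNorm uses its running statistics instead of batch statistics.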
I see the paper has been accepted. Could you release the complete project code?
Hi. A TensorFlow implementation would be good. Do you have any plans to implement this model on the TensorFlow platform?
Hi, I recently read the paper. It's exciting work that achieves state-of-the-art performance on Cityscapes.
First of all, thanks for sharing the work. I'd like to know whether you have made a TensorRT version. If so, could you share it?
I am using your model for segmentation, and its accuracy is really amazing. However, the inference speed is not ideal. I noticed that you merged conv2d and batchnorm during inference; not doing this is probably why my speed is low. However, after I started working on it, I realized it is not easy. Could you provide the source code for merging conv2d and batchnorm? Thanks in advance!
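The folding itself is a few lines of arithmetic. In inference mode, BatchNorm computes gamma * (y - mean) / sqrt(var + eps) + beta per channel, which is an affine map, so it can be merged into the preceding convolution: W' = W * s and b' = (b - mean) * s + beta, with s = gamma / sqrt(var + eps). Below is a NumPy sketch of that formula; it is not the authors' code, and `fuse_conv_bn` is a hypothetical name.

```python
import numpy as np

def fuse_conv_bn(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold an inference-mode BatchNorm into the preceding convolution.

    weight: (out_c, in_c, kh, kw); bias: (out_c,) or None for convolutions created with bias=False.
    Returns (fused_weight, fused_bias) such that conv(x, fused) == bn(conv(x, original)).
    """
    if bias is None:
        bias = np.zeros(weight.shape[0], dtype=weight.dtype)
    scale = gamma / np.sqrt(running_var + eps)          # per-output-channel factor
    fused_weight = weight * scale[:, None, None, None]  # scale each output filter
    fused_bias = (bias - running_mean) * scale + beta
    return fused_weight, fused_bias
```

With PyTorch modules, the inputs map to `conv.weight`, `conv.bias`, `bn.weight`, `bn.bias`, `bn.running_mean`, `bn.running_var` and `bn.eps`. Recent PyTorch versions also provide `torch.nn.utils.fusion.fuse_conv_bn_eval` for the same purpose. How much speed this gains depends on the inference backend.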
@ydhongHIT Hello, will you release the training code? I'd like to try training on my own dataset!
Thank you!!
Could you upload the ImageNet-pretrained model? Thanks!
Hi
Going through your code, I can't find these two classes, which are referenced in DDRNET_39.py.
They don't exist in the 23 versions, so might this be a mistake?
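Hello author! Thank you very much for your work! I noticed that you pretrained on ImageNet before fine-tuning, but the paper doesn't report how much ImageNet pretraining improves segmentation accuracy. Roughly how much improvement does ImageNet pretraining bring?
Hi bro, what training resolution do you use?
Does the auxiliary loss also use OHEM? The ablation study in the paper seems to show that the auxiliary loss doesn't improve accuracy?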
Dear Yuanduo,
Thank you very much for your excellent network.
I have a question about the structure of DDRNet.
In your code, the bottleneck block in the low-resolution branch uses stride=2, so I think the output size becomes 1/2 after this bottleneck block.
In your paper, as the image below shows, conv5_1 of the low-resolution branch uses one residual basic block (stride=2) and one bottleneck block (stride=2). Why does the output size change from 14x14 to 7x7 after two blocks that both have stride 2?
I would appreciate it if you would answer my question.
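Here is the size arithmetic behind the question as a small sketch. For a 3 x 3 convolution with padding 1, each stride-2 step maps n to floor((n - 1) / 2) + 1. So 14 x 14 becomes 7 x 7 after one such step and 4 x 4 after two. This only checks the arithmetic; which blocks in conv5_1 actually use stride 2 has to be read from the code. In the 1024 x 1024 summary further down, the low-resolution branch halves only once there, from 32 x 32 to 16 x 16, inside the Bottleneck. `conv_out` is a hypothetical helper.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Spatial size after a Conv2d/pooling layer (dilation 1): floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

size = 14
for step in range(2):
    size = conv_out(size)
    print(f"after stride-2 step {step + 1}: {size} x {size}")
# after stride-2 step 1: 7 x 7
# after stride-2 step 2: 4 x 4
```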
Hi, thanks for your work and for sharing it.
I downloaded the DDRNet_23_slim weights pretrained on the CamVid dataset. However, I can't get accurate semantic segmentation results from it when I normalize with mean_std = ([0.406, 0.456, 0.485], [0.225, 0.224, 0.229]). Could you tell me how to evaluate with this model?
Thanks a lot!
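The statistics quoted above are the usual ImageNet mean/std listed in BGR order. They normalize a BGR image consistently, but the resulting tensor is still in BGR channel order. If the model was trained on RGB input (an assumption; the thread does not confirm it), the channels also need flipping. Below is a NumPy sketch showing that the two conventions differ only by channel order. `normalize` is a hypothetical helper.

```python
import numpy as np

# ImageNet statistics in RGB order; the tuple quoted in the issue is the same numbers in BGR order.
MEAN_RGB = np.array([0.485, 0.456, 0.406])
STD_RGB = np.array([0.229, 0.224, 0.225])
MEAN_BGR, STD_BGR = MEAN_RGB[::-1], STD_RGB[::-1]

def normalize(image, mean, std):
    """(H, W, 3) uint8 image -> float array normalized per channel in the image's own order."""
    return (image.astype(np.float64) / 255.0 - mean) / std
```

For example, `normalize(bgr, MEAN_BGR, STD_BGR)[..., ::-1]` equals `normalize(bgr[..., ::-1], MEAN_RGB, STD_RGB)`. Feeding the unflipped BGR result to an RGB-trained model swaps the red and blue inputs.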
Congratulations, and thank you for sharing this work. Unfortunately, I was not able to reproduce the DDRNet-39 performance on Cityscapes in two attempts with the paper's settings.
Is it possible to provide the weights of the DDRNet-39 model on Cityscapes that achieves ~80% mIoU?
Thanks
Is the resolution of the segmentation output map 1/8 of the input image resolution?
How can I get a segmentation output map at the same resolution as the input image?
Thanks,
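The usual approach, offered as an assumption rather than the authors' confirmed procedure, is to upsample the low-resolution logits to the input size with bilinear interpolation and take the argmax afterwards. In PyTorch, that is `F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=False)`. Below is a NumPy sketch of the same sampling that makes the operation explicit. `bilinear_resize` is a hypothetical helper.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Resize (C, H, W) logits with half-pixel-centre bilinear sampling.

    Intended to mirror F.interpolate(..., mode="bilinear", align_corners=False).
    """
    _, h, w = x.shape
    ys = np.clip((np.arange(out_h) + 0.5) * h / out_h - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * w / out_w - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical interpolation weights
    wx = (xs - x0)[None, None, :]  # horizontal interpolation weights
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bottom = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bottom * wy
```

`np.argmax(bilinear_resize(logits, H, W), axis=0)` then gives a full-resolution label map. Upsampling the logits before the argmax tends to give smoother boundaries than upsampling the label map with nearest-neighbour sampling.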
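Is this normal? Is there any way to improve it?
If the input size is 256×256, should I rescale it to a larger size, such as 1024×1024?
Hello, I modified the DDRNet network, so I want to redo the ImageNet pretraining, but I'm worried that problems in my ImageNet pretraining will hurt the model's performance.
Could you upload the code you used to pretrain DDRNet on ImageNet?
Thanks.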
I built the "DDRNet23 Slim" model you provided. My images have a shape of 1024 H x 1024 W x 3 C, and there are 8 different classes in my dataset. When I check the model summary with summary(net.cuda(), (3, 1024, 1024)), I get the following:
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 32, 512, 512] 896
BatchNorm2d-2 [-1, 32, 512, 512] 64
ReLU-3 [-1, 32, 512, 512] 0
Conv2d-4 [-1, 32, 256, 256] 9,248
BatchNorm2d-5 [-1, 32, 256, 256] 64
ReLU-6 [-1, 32, 256, 256] 0
Conv2d-7 [-1, 32, 256, 256] 9,216
BatchNorm2d-8 [-1, 32, 256, 256] 64
ReLU-9 [-1, 32, 256, 256] 0
Conv2d-10 [-1, 32, 256, 256] 9,216
BatchNorm2d-11 [-1, 32, 256, 256] 64
ReLU-12 [-1, 32, 256, 256] 0
BasicBlock-13 [-1, 32, 256, 256] 0
Conv2d-14 [-1, 32, 256, 256] 9,216
BatchNorm2d-15 [-1, 32, 256, 256] 64
ReLU-16 [-1, 32, 256, 256] 0
Conv2d-17 [-1, 32, 256, 256] 9,216
BatchNorm2d-18 [-1, 32, 256, 256] 64
BasicBlock-19 [-1, 32, 256, 256] 0
ReLU-20 [-1, 32, 256, 256] 0
Conv2d-21 [-1, 64, 128, 128] 18,432
BatchNorm2d-22 [-1, 64, 128, 128] 128
ReLU-23 [-1, 64, 128, 128] 0
Conv2d-24 [-1, 64, 128, 128] 36,864
BatchNorm2d-25 [-1, 64, 128, 128] 128
Conv2d-26 [-1, 64, 128, 128] 2,048
BatchNorm2d-27 [-1, 64, 128, 128] 128
ReLU-28 [-1, 64, 128, 128] 0
BasicBlock-29 [-1, 64, 128, 128] 0
Conv2d-30 [-1, 64, 128, 128] 36,864
BatchNorm2d-31 [-1, 64, 128, 128] 128
ReLU-32 [-1, 64, 128, 128] 0
Conv2d-33 [-1, 64, 128, 128] 36,864
BatchNorm2d-34 [-1, 64, 128, 128] 128
BasicBlock-35 [-1, 64, 128, 128] 0
ReLU-36 [-1, 64, 128, 128] 0
Conv2d-37 [-1, 128, 64, 64] 73,728
BatchNorm2d-38 [-1, 128, 64, 64] 256
ReLU-39 [-1, 128, 64, 64] 0
Conv2d-40 [-1, 128, 64, 64] 147,456
BatchNorm2d-41 [-1, 128, 64, 64] 256
Conv2d-42 [-1, 128, 64, 64] 8,192
BatchNorm2d-43 [-1, 128, 64, 64] 256
ReLU-44 [-1, 128, 64, 64] 0
BasicBlock-45 [-1, 128, 64, 64] 0
Conv2d-46 [-1, 128, 64, 64] 147,456
BatchNorm2d-47 [-1, 128, 64, 64] 256
ReLU-48 [-1, 128, 64, 64] 0
Conv2d-49 [-1, 128, 64, 64] 147,456
BatchNorm2d-50 [-1, 128, 64, 64] 256
BasicBlock-51 [-1, 128, 64, 64] 0
ReLU-52 [-1, 64, 128, 128] 0
Conv2d-53 [-1, 64, 128, 128] 36,864
BatchNorm2d-54 [-1, 64, 128, 128] 128
ReLU-55 [-1, 64, 128, 128] 0
Conv2d-56 [-1, 64, 128, 128] 36,864
BatchNorm2d-57 [-1, 64, 128, 128] 128
ReLU-58 [-1, 64, 128, 128] 0
BasicBlock-59 [-1, 64, 128, 128] 0
Conv2d-60 [-1, 64, 128, 128] 36,864
BatchNorm2d-61 [-1, 64, 128, 128] 128
ReLU-62 [-1, 64, 128, 128] 0
Conv2d-63 [-1, 64, 128, 128] 36,864
BatchNorm2d-64 [-1, 64, 128, 128] 128
BasicBlock-65 [-1, 64, 128, 128] 0
ReLU-66 [-1, 64, 128, 128] 0
Conv2d-67 [-1, 128, 64, 64] 73,728
BatchNorm2d-68 [-1, 128, 64, 64] 256
ReLU-69 [-1, 128, 64, 64] 0
Conv2d-70 [-1, 64, 64, 64] 8,192
BatchNorm2d-71 [-1, 64, 64, 64] 128
ReLU-72 [-1, 128, 64, 64] 0
Conv2d-73 [-1, 256, 32, 32] 294,912
BatchNorm2d-74 [-1, 256, 32, 32] 512
ReLU-75 [-1, 256, 32, 32] 0
Conv2d-76 [-1, 256, 32, 32] 589,824
BatchNorm2d-77 [-1, 256, 32, 32] 512
Conv2d-78 [-1, 256, 32, 32] 32,768
BatchNorm2d-79 [-1, 256, 32, 32] 512
ReLU-80 [-1, 256, 32, 32] 0
BasicBlock-81 [-1, 256, 32, 32] 0
Conv2d-82 [-1, 256, 32, 32] 589,824
BatchNorm2d-83 [-1, 256, 32, 32] 512
ReLU-84 [-1, 256, 32, 32] 0
Conv2d-85 [-1, 256, 32, 32] 589,824
BatchNorm2d-86 [-1, 256, 32, 32] 512
BasicBlock-87 [-1, 256, 32, 32] 0
ReLU-88 [-1, 64, 128, 128] 0
Conv2d-89 [-1, 64, 128, 128] 36,864
BatchNorm2d-90 [-1, 64, 128, 128] 128
ReLU-91 [-1, 64, 128, 128] 0
Conv2d-92 [-1, 64, 128, 128] 36,864
BatchNorm2d-93 [-1, 64, 128, 128] 128
ReLU-94 [-1, 64, 128, 128] 0
BasicBlock-95 [-1, 64, 128, 128] 0
Conv2d-96 [-1, 64, 128, 128] 36,864
BatchNorm2d-97 [-1, 64, 128, 128] 128
ReLU-98 [-1, 64, 128, 128] 0
Conv2d-99 [-1, 64, 128, 128] 36,864
BatchNorm2d-100 [-1, 64, 128, 128] 128
BasicBlock-101 [-1, 64, 128, 128] 0
ReLU-102 [-1, 64, 128, 128] 0
Conv2d-103 [-1, 128, 64, 64] 73,728
BatchNorm2d-104 [-1, 128, 64, 64] 256
ReLU-105 [-1, 128, 64, 64] 0
Conv2d-106 [-1, 256, 32, 32] 294,912
BatchNorm2d-107 [-1, 256, 32, 32] 512
ReLU-108 [-1, 256, 32, 32] 0
Conv2d-109 [-1, 64, 32, 32] 16,384
BatchNorm2d-110 [-1, 64, 32, 32] 128
ReLU-111 [-1, 64, 128, 128] 0
Conv2d-112 [-1, 64, 128, 128] 4,096
BatchNorm2d-113 [-1, 64, 128, 128] 128
ReLU-114 [-1, 64, 128, 128] 0
Conv2d-115 [-1, 64, 128, 128] 36,864
BatchNorm2d-116 [-1, 64, 128, 128] 128
ReLU-117 [-1, 64, 128, 128] 0
Conv2d-118 [-1, 128, 128, 128] 8,192
BatchNorm2d-119 [-1, 128, 128, 128] 256
Conv2d-120 [-1, 128, 128, 128] 8,192
BatchNorm2d-121 [-1, 128, 128, 128] 256
Bottleneck-122 [-1, 128, 128, 128] 0
ReLU-123 [-1, 256, 32, 32] 0
Conv2d-124 [-1, 256, 32, 32] 65,536
BatchNorm2d-125 [-1, 256, 32, 32] 512
ReLU-126 [-1, 256, 32, 32] 0
Conv2d-127 [-1, 256, 16, 16] 589,824
BatchNorm2d-128 [-1, 256, 16, 16] 512
ReLU-129 [-1, 256, 16, 16] 0
Conv2d-130 [-1, 512, 16, 16] 131,072
BatchNorm2d-131 [-1, 512, 16, 16] 1,024
Conv2d-132 [-1, 512, 16, 16] 131,072
BatchNorm2d-133 [-1, 512, 16, 16] 1,024
Bottleneck-134 [-1, 512, 16, 16] 0
BatchNorm2d-135 [-1, 512, 16, 16] 1,024
ReLU-136 [-1, 512, 16, 16] 0
Conv2d-137 [-1, 128, 16, 16] 65,536
AvgPool2d-138 [-1, 512, 8, 8] 0
BatchNorm2d-139 [-1, 512, 8, 8] 1,024
ReLU-140 [-1, 512, 8, 8] 0
Conv2d-141 [-1, 128, 8, 8] 65,536
BatchNorm2d-142 [-1, 128, 16, 16] 256
ReLU-143 [-1, 128, 16, 16] 0
Conv2d-144 [-1, 128, 16, 16] 147,456
AvgPool2d-145 [-1, 512, 4, 4] 0
BatchNorm2d-146 [-1, 512, 4, 4] 1,024
ReLU-147 [-1, 512, 4, 4] 0
Conv2d-148 [-1, 128, 4, 4] 65,536
BatchNorm2d-149 [-1, 128, 16, 16] 256
ReLU-150 [-1, 128, 16, 16] 0
Conv2d-151 [-1, 128, 16, 16] 147,456
AvgPool2d-152 [-1, 512, 2, 2] 0
BatchNorm2d-153 [-1, 512, 2, 2] 1,024
ReLU-154 [-1, 512, 2, 2] 0
Conv2d-155 [-1, 128, 2, 2] 65,536
BatchNorm2d-156 [-1, 128, 16, 16] 256
ReLU-157 [-1, 128, 16, 16] 0
Conv2d-158 [-1, 128, 16, 16] 147,456
AdaptiveAvgPool2d-159 [-1, 512, 1, 1] 0
BatchNorm2d-160 [-1, 512, 1, 1] 1,024
ReLU-161 [-1, 512, 1, 1] 0
Conv2d-162 [-1, 128, 1, 1] 65,536
BatchNorm2d-163 [-1, 128, 16, 16] 256
ReLU-164 [-1, 128, 16, 16] 0
Conv2d-165 [-1, 128, 16, 16] 147,456
BatchNorm2d-166 [-1, 640, 16, 16] 1,280
ReLU-167 [-1, 640, 16, 16] 0
Conv2d-168 [-1, 128, 16, 16] 81,920
BatchNorm2d-169 [-1, 512, 16, 16] 1,024
ReLU-170 [-1, 512, 16, 16] 0
Conv2d-171 [-1, 128, 16, 16] 65,536
DAPPM-172 [-1, 128, 16, 16] 0
BatchNorm2d-173 [-1, 128, 128, 128] 256
ReLU-174 [-1, 128, 128, 128] 0
Conv2d-175 [-1, 64, 128, 128] 73,728
BatchNorm2d-176 [-1, 64, 128, 128] 128
ReLU-177 [-1, 64, 128, 128] 0
Conv2d-178 [-1, 8, 128, 128] 520
segmenthead-179 [-1, 8, 128, 128] 0
================================================================
Total params: 5,695,272
Trainable params: 5,695,272
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 12.00
Forward/backward pass size (MB): 1181.08
Params size (MB): 21.73
Estimated Total Size (MB): 1214.80
----------------------------------------------------------------
As you can see, the output layer has a resolution of 128 H x 128 W, but my labels are 1024 H x 1024 W. I could resize my labels from 1024 to 128 pixels, but that would lose a lot of pixel information. Is this configuration correct for a 1024-pixel input? Does the output have to be 128 pixels, i.e. 1/8 of the input? @ydhongHIT
Hello Yuanduo,
I think your achievement is really awesome. I am studying your code and model in order to build on them.
I was wondering whether the "best.pth" model was trained without ImageNet pretraining.
I got the file from Google Drive, and it scores about 75.xx on the Cityscapes validation set.
Could you confirm this?
Hello, I noticed that in your paper's prediction results on the CamVid dataset, the ignore label is shown in black. May I ask how you did this? My predictions only contain the 11 categories, so I don't get that effect.
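A likely explanation is that the ignored pixels come from the ground truth, not from the prediction. This is an assumption; the thread does not say how the figures were made. Under that reading, the model still predicts one of the 11 classes everywhere, and pixels whose label is the ignore index are painted black afterwards for display. Below is a minimal NumPy sketch. `ignore_index=255` and the palette are assumptions, and `colorize_with_ignore` is a hypothetical helper.

```python
import numpy as np

def colorize_with_ignore(pred, gt, palette, ignore_index=255):
    """Color an (H, W) prediction, painting pixels whose ground truth is `ignore_index` black."""
    color = np.asarray(palette, dtype=np.uint8)[pred]  # (H, W, 3); fancy indexing makes a copy
    color[gt == ignore_index] = 0                      # void/ignored pixels -> black
    return color
```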
The Baidu Yun links for the pretrained models are dead!
Hi,
Great project.
Do you have any plans for adding DDRNet support to mmsegmentation?