
zqpei / dssd

56 stars · 3 watchers · 20 forks · 1.22 MB

Pytorch implementation of DSSD (Deconvolutional Single Shot Detector)

License: MIT License

Python 93.15% C++ 2.40% C 0.17% Cuda 3.27% Shell 1.02%
dssd object-detection one-stage one-shot-object-detection

dssd's Introduction

Hi there 👋, this is ZQPei's GitHub.


dssd's People

Contributors

alexey-gruzdev, beibinli, huaizhengzhang, lufficc, tkhe, zqpei


dssd's Issues

Questions about `decoder`

In the forward pass in DSSD/dssd/modeling/decoder/decoder.py, `features[-2-level]` is overwritten on every loop iteration. Does this work better than using the ORIGINAL feature that comes from ResNet?
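As I read it, the question contrasts two wirings of the deconvolution path. A minimal sketch of both (names are illustrative, not the actual decoder.py; `deconv` stands for an elementwise-fusion module as in the DSSD paper):

```python
def decode_in_place(features, deconv_modules):
    # What the repo appears to do: features[-2 - level] is overwritten each
    # iteration, so every fusion consumes the previously refined map.
    features = list(features)
    for level, deconv in enumerate(deconv_modules):
        features[-2 - level] = deconv(features[-1 - level], features[-2 - level])
    return features

def decode_from_original(features, deconv_modules):
    # The alternative the question asks about: every level deconvolves the
    # ORIGINAL backbone feature instead of the refined one.
    out = list(features)
    for level, deconv in enumerate(deconv_modules):
        out[-2 - level] = deconv(features[-1 - level], features[-2 - level])
    return out
```

For what it's worth, the chained (in-place) form matches the DSSD paper, where each deconvolution module consumes the previous module's output.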

Another question: is this repo adapted from mask-rcnn? The code architecture looks very familiar.

I really appreciate your work. Thank you very much.

prior_box

Thank you for sharing the code. I would like to ask a question: do you use the prior box during training?
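For context: in SSD-family code the priors are indeed used during training. Each prior is matched to a ground-truth box, and the classification and regression targets are built per prior. A minimal sketch of the usual matching step (illustrative, not the repo's exact implementation):

```python
import torch
from torchvision.ops import box_iou

def assign_priors(gt_boxes, gt_labels, priors, iou_threshold=0.5):
    """Match each prior to a ground-truth box (corner-form tensors).

    gt_boxes: (N, 4), gt_labels: (N,), priors: (P, 4).
    Returns per-prior regression targets and class labels (0 = background).
    """
    ious = box_iou(gt_boxes, priors)            # (N, P) pairwise IoU
    best_gt_iou, best_gt_idx = ious.max(dim=0)  # best ground truth per prior
    labels = gt_labels[best_gt_idx].clone()
    labels[best_gt_iou < iou_threshold] = 0     # low-overlap priors -> background
    return gt_boxes[best_gt_idx], labels
```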

How can I add a new backbone network to DSSD

I tried to add another backbone network and modified the code accordingly, but it does not work.
Please help me check how to add a new backbone to this DSSD network. My attempts and the errors they produce follow.

```python
from torch import nn

from dssd.modeling import registry
from dssd.utils.model_zoo import load_state_dict_from_url
from torchsummary import summary

model_urls = {
    'mobilenet_v2': 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth',
}


class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, width_mult=1.0, inverted_residual_setting=None):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280

        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]

        # only check the first element, assuming user knows t,c,n,s are required
        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))

        # building first layer
        input_channel = int(input_channel * width_mult)
        self.last_channel = int(last_channel * max(1.0, width_mult))
        features = [ConvBNReLU(3, input_channel, stride=2)]
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = int(c * width_mult)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)
        self.extras = nn.ModuleList([
            InvertedResidual(1280, 512, 2, 0.2),
            InvertedResidual(512, 256, 2, 0.25),
            InvertedResidual(256, 256, 2, 0.5),
            InvertedResidual(256, 64, 2, 0.25)
        ])

        self.reset_parameters()

    def reset_parameters(self):
        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        features = []
        for i in range(14):
            x = self.features[i](x)
        features.append(x)

        for i in range(14, len(self.features)):
            x = self.features[i](x)
        features.append(x)

        for i in range(len(self.extras)):
            x = self.extras[i](x)
            features.append(x)

        return tuple(features)


@registry.BACKBONES.register('mobilenet_v2')
def mobilenet_v2(cfg, pretrained=True):
    model = MobileNetV2()
    if pretrained:
        model.load_state_dict(load_state_dict_from_url(model_urls['mobilenet_v2']), strict=False)
    return model


if __name__ == '__main__':
    darknet = MobileNetV2().cuda()
    summary(darknet, (3, 320, 320))
```

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 3
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from torchsummary import summary

from dssd.layers import L2Norm
from dssd.modeling import registry
from dssd.utils.model_zoo import load_state_dict_from_url

model_urls = {
    'vgg': 'https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth',
}


# borrowed from https://github.com/amdegroot/ssd.pytorch/blob/master/ssd.py
def add_vgg(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        elif v == 'C':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    pool5 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
    conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)
    conv7 = nn.Conv2d(1024, 1024, kernel_size=1)
    layers += [pool5, conv6,
               nn.ReLU(inplace=True), conv7, nn.ReLU(inplace=True)]
    return layers


def add_extras(cfg, i, size=300):
    # Extra layers added to VGG for feature scaling
    layers = []
    in_channels = i
    flag = False
    for k, v in enumerate(cfg):
        if in_channels != 'S':
            if v == 'S':
                layers += [nn.Conv2d(in_channels, cfg[k + 1], kernel_size=(1, 3)[flag], stride=2, padding=1)]
            else:
                layers += [nn.Conv2d(in_channels, v, kernel_size=(1, 3)[flag])]
            flag = not flag
        in_channels = v
    if size == 512:
        layers.append(nn.Conv2d(in_channels, 128, kernel_size=1, stride=1))
        layers.append(nn.Conv2d(128, 256, kernel_size=4, stride=1, padding=1))
    return layers


vgg_base = {
    '300': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C', 512, 512, 512, 'M',
            512, 512, 512],
    '512': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C', 512, 512, 512, 'M',
            512, 512, 512],
}
extras_base = {
    '300': [256, 'S', 512, 128, 'S', 256, 128, 256, 128, 256],
    '512': [256, 'S', 512, 128, 'S', 256, 128, 'S', 256, 128, 'S', 256],
}


class VGG(nn.Module):
    def __init__(self, cfg):
        super(VGG, self).__init__()
        size = cfg.INPUT.IMAGE_SIZE
        vgg_config = vgg_base[str(size)]
        extras_config = extras_base[str(size)]

        self.vgg = nn.ModuleList(add_vgg(vgg_config))
        self.extras = nn.ModuleList(add_extras(extras_config, i=1024, size=size))
        self.l2_norm = L2Norm(512, scale=20)
        self.reset_parameters()

    def reset_parameters(self):
        for m in self.extras.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_uniform_(m.weight)
                nn.init.zeros_(m.bias)

    def init_from_pretrain(self, state_dict):
        self.vgg.load_state_dict(state_dict)

    def forward(self, x):
        features = []
        for i in range(23):
            x = self.vgg[i](x)
        s = self.l2_norm(x)  # Conv4_3 L2 normalization
        features.append(s)

        # apply vgg up to fc7
        for i in range(23, len(self.vgg)):
            x = self.vgg[i](x)
        features.append(x)

        for k, v in enumerate(self.extras):
            x = F.relu(v(x), inplace=True)
            if k % 2 == 1:
                features.append(x)

        return tuple(features)


@registry.BACKBONES.register('vgg')
def vgg(cfg, pretrained=True):
    model = VGG(cfg)
    if pretrained:
        model.init_from_pretrain(load_state_dict_from_url(model_urls['vgg']))
    return model


if __name__ == '__main__':
    darknet = VGG(300).cuda()
    summary(darknet, (3, 320, 320))
```
RuntimeError: The size of tensor a (20) must match the size of tensor b (19) at non-singleton dimension 3
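Both RuntimeErrors above look like spatial-size mismatches inside the deconvolution/fusion modules rather than problems with the backbones themselves: an exact ×2 deconvolution can only be fused elementwise with a lateral map whose side is exactly half its output, and both custom backbones produce odd intermediate sizes. A quick sanity check with plain convolution arithmetic (a sketch, assuming the kernel-3, stride-2, padding-1 downsampling used in the extra blocks above):

```python
# Conv output size: out = floor((in + 2*pad - kernel) / stride) + 1
def down(size, kernel=3, stride=2, pad=1):
    return (size + 2 * pad - kernel) // stride + 1

sizes = [10]                    # 320 / 32: last MobileNetV2 trunk map
for _ in range(4):              # the four stride-2 extra blocks above
    sizes.append(down(sizes[-1]))
print(sizes)                    # [10, 5, 3, 2, 1]: 5 -> 3 -> 2 breaks exact x2 fusion
```

Which pair collides first depends on the deconvolution's kernel/stride/padding, but the remedy is the same: make the new backbone emit the same pyramid of spatial sizes that the decoder's deconvolution modules were configured for (for the stock 320 setup, the sizes the ResNet backbone produces), or adjust the deconvolution parameters to match your pyramid.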

The deconvolution module's forward seems to have a bug

In deconv_module.py, when `self.elementwise_type == "prod"`, shouldn't it return `self.relu(y_deconv * y_conv)` instead?

```python
def forward(self, x_deconv, x_conv):
    y_deconv = self.deconv_layer(x_deconv)
    y_conv = self.conv_layer(x_conv)
    if self.elementwise_type == "sum":
        return self.relu(y_deconv + y_conv)
    elif self.elementwise_type == "prod":
        return self.relu(y_deconv + y_conv)  # bug: identical to the "sum" branch
```
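For reference, the reported fix is a one-character change in the `prod` branch:

```python
def forward(self, x_deconv, x_conv):
    y_deconv = self.deconv_layer(x_deconv)
    y_conv = self.conv_layer(x_conv)
    if self.elementwise_type == "sum":
        return self.relu(y_deconv + y_conv)
    elif self.elementwise_type == "prod":
        return self.relu(y_deconv * y_conv)  # elementwise product, as intended
```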

How to run DSSD520

Your GitHub repo only has resnet101_dssd320_voc0712.yaml; there is no 520 config.
How can I run DSSD520?
Thanks

Model Center Variance and Size Variance

I'm trying to run this DSSD implementation on the BDD dataset, which has 720x1280 images.

I started with input size 320 because many of the model parameters are defined for it. I'm trying to understand the center variance and size variance used in defaults.py. I can see them being used in box_utils.py, but can you please help me understand them?

  1. Are they model parameters, or do they depend on the choice of input? Given my dataset and input size 320, do I need to change them?

  2. Also, if I provide my ground-truth box coordinates (x1, y1, x2, y2) relative to the actual input image in the dataset (720x1280), they (gt_boxes) will be normalised to the model input size (320), right?
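On question 1: in SSD-family implementations these two numbers are hyperparameters of the box encoding (they rescale the regression targets), not functions of the input resolution, so the usual 0.1/0.2 defaults are normally kept when only the input size or dataset changes. A sketch of the standard encode/decode they enter (illustrative, not necessarily the repo's exact box_utils.py):

```python
import torch

def encode(gt, priors, center_variance=0.1, size_variance=0.2):
    # gt, priors: (..., 4) in center form (cx, cy, w, h)
    d_cxcy = (gt[..., :2] - priors[..., :2]) / priors[..., 2:] / center_variance
    d_wh = torch.log(gt[..., 2:] / priors[..., 2:]) / size_variance
    return torch.cat([d_cxcy, d_wh], dim=-1)   # regression target per prior

def decode(loc, priors, center_variance=0.1, size_variance=0.2):
    # inverse transform applied to the network's raw box regressions
    cxcy = loc[..., :2] * center_variance * priors[..., 2:] + priors[..., :2]
    wh = torch.exp(loc[..., 2:] * size_variance) * priors[..., 2:]
    return torch.cat([cxcy, wh], dim=-1)
```

On question 2: in SSD-style pipelines the ground-truth corners are typically transformed together with the image resize, so 720x1280 annotations end up expressed in the resized/normalized coordinate frame the 320x320 model sees.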
