
lzx1413 / pytorchssd


PyTorch version of SSD and its enhanced methods, such as RFBSSD, FSSD and RefineDet

License: MIT License

pytorch ssd fssd rfb refinedet

pytorchssd's Introduction

Pytorch SSD Series

PyTorch 0.4.1 is supported on branch 0.4 now.

Supported architectures:

VOC2007 Test

| System | mAP | FPS (Titan X Maxwell) |
|---|---|---|
| Faster R-CNN (VGG16) | 73.2 | 7 |
| YOLOv2 (Darknet-19) | 78.6 | 40 |
| R-FCN (ResNet-101) | 80.5 | 9 |
| SSD300* (VGG16) | 77.2 | 46 |
| SSD512* (VGG16) | 79.8 | 19 |
| RFBNet300 (VGG16) | 80.5 | 83 |
| RFBNet512 (VGG16) | 82.2 | 38 |
| SSD300 (VGG) | 77.8 | 150 (1080Ti) |
| FSSD300 (VGG) | 78.8 | 120 (1080Ti) |

COCO

| System | test-dev mAP | Time (Titan X Maxwell) |
|---|---|---|
| Faster R-CNN++ (ResNet-101) | 34.9 | 3.36 s |
| YOLOv2 (Darknet-19) | 21.6 | 25 ms |
| SSD300* (VGG16) | 25.1 | 22 ms |
| SSD512* (VGG16) | 28.8 | 53 ms |
| RetinaNet500 (ResNet-101-FPN) | 34.4 | 90 ms |
| RFBNet300 (VGG16) | 29.9 | 15 ms* |
| RFBNet512 (VGG16) | 33.8 | 30 ms* |
| RFBNet512-E (VGG16) | 34.4 | 33 ms* |
| SSD512 (HarDNet68) | 31.7 | TBD (12.9 ms**) |
| SSD512 (HarDNet85) | 35.1 | TBD (15.9 ms**) |
| RFBNet512 (HarDNet68) | 33.9 | TBD (16.7 ms**) |
| RFBNet512 (HarDNet85) | 36.8 | TBD (19.3 ms**) |

Note *: Times marked * were measured with the newest PyTorch and cuDNN versions available at the time (PyTorch 0.2.0 and cuDNN v6), which are noticeably faster than the speeds reported in the paper (measured with PyTorch 0.1.12 and cuDNN v5).

Note **: HarDNet results were measured on a Titan V with PyTorch 1.0.1, for detection only (NMS is NOT included; it generally takes 13 to 18 ms). For reference, SSD-VGG measured in the same environment takes 15.7 ms (also detection only).

MobileNet

| System | COCO minival mAP | #parameters |
|---|---|---|
| SSD MobileNet | 19.3 | 6.8M |
| RFB MobileNet | 20.7* | 7.4M |

*: slightly better than the result reported in the paper (20.5).

Contents

  1. Installation
  2. Datasets
  3. Training
  4. Evaluation
  5. Models

Installation

  • Install PyTorch 0.2.0 to 0.3.1 by selecting your environment on the website and running the appropriate command.
  • Clone this repository. This repository is mainly based on RFBNet, ssd.pytorch and Chainer-ssd; a huge thanks to them.
    • Note: We currently only support Python 3+.
  • Compile the nms and coco tools:
./make.sh

Note: Check your GPU architecture support in utils/build.py, line 131. The default is:

'nvcc': ['-arch=sm_52',
  • Install pyinn for MobileNet backbone:
pip install git+https://github.com/szagoruyko/pyinn.git@master
  • Then download the dataset by following the instructions below, and install OpenCV.
conda install opencv

Note: For training, we currently support VOC and COCO.

Datasets

To make things easy, we provide simple VOC and COCO dataset loaders that inherit from torch.utils.data.Dataset, making them fully compatible with the torchvision.datasets API.
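The torch.utils.data.Dataset contract behind these loaders is just two methods, __len__ and __getitem__. The sketch below shows that protocol in plain Python so it runs without PyTorch; ToyDetectionDataset and detection_collate_lists are hypothetical names, not this repository's classes. Because each image has a different number of boxes, the targets are gathered into a list instead of being stacked, which is presumably why train_test.py passes a custom collate_fn=detection_collate to the DataLoader.

```python
class ToyDetectionDataset:
    """Map-style dataset with the same __len__/__getitem__ protocol as
    torch.utils.data.Dataset. Hypothetical example, not the repo's loader."""

    def __init__(self, samples):
        # samples: list of (image, boxes); each box is [xmin, ymin, xmax, ymax, label]
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, boxes = self.samples[index]
        return image, boxes


def detection_collate_lists(batch):
    """Group (image, boxes) pairs into a batch.

    Real code would stack the images into one tensor; here they stay a list.
    Targets stay a list because images have different numbers of boxes.
    """
    images = [image for image, _ in batch]
    targets = [boxes for _, boxes in batch]
    return images, targets


if __name__ == "__main__":
    ds = ToyDetectionDataset([
        ("img0", [[0.1, 0.1, 0.5, 0.5, 3]]),
        ("img1", [[0.2, 0.2, 0.4, 0.6, 7], [0.5, 0.5, 0.9, 0.9, 1]]),
    ])
    images, targets = detection_collate_lists([ds[i] for i in range(len(ds))])
    print(images, [len(t) for t in targets])  # ['img0', 'img1'] [1, 2]
```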

VOC Dataset

Download VOC2007 trainval & test
# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2007.sh # <directory>
Download VOC2012 trainval
# specify a directory for dataset to be downloaded into, else default is ~/data/
sh data/scripts/VOC2012.sh # <directory>

COCO Dataset

Install the MS COCO dataset at /path/to/coco from the official website; the default location is ~/data/COCO. Follow the instructions to prepare the minival2014 and valminusminival2014 annotations. All label files (.json) should be under the COCO/annotations/ folder. The directory should have this basic structure:

$COCO/
$COCO/cache/
$COCO/annotations/
$COCO/images/
$COCO/images/test2015/
$COCO/images/train2014/
$COCO/images/val2014/

UPDATE: COCO has since released train2017 and val2017 sets, which are just new splits of the same images.
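Before training, it can help to confirm that the directory layout above actually exists. A minimal sketch using only the standard library; missing_coco_dirs is a hypothetical helper that checks directories only (not the .json files), and the list would need adjusting if you use the 2017 splits.

```python
import os

# Expected sub-directories, taken from the layout listed above.
EXPECTED_COCO_DIRS = [
    "cache",
    "annotations",
    "images",
    os.path.join("images", "test2015"),
    os.path.join("images", "train2014"),
    os.path.join("images", "val2014"),
]


def missing_coco_dirs(coco_root):
    """Return the expected sub-directories that do not exist under coco_root."""
    return [d for d in EXPECTED_COCO_DIRS
            if not os.path.isdir(os.path.join(coco_root, d))]


if __name__ == "__main__":
    root = os.path.expanduser("~/data/COCO")  # default location from the text
    missing = missing_coco_dirs(root)
    print("OK" if not missing else "missing: " + ", ".join(missing))
```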

Training

mkdir weights
cd weights
wget https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
  • To train RFBNet, run train_test.py and either pass the parameters it defines as flags or change their defaults manually.
python train_test.py -d VOC -v RFB_vgg -s 300 
  • Note:
    • -d: choose datasets, VOC or COCO.
    • -v: choose the backbone version: RFB_vgg, RFB_E_vgg or RFB_mobile.
    • -s: image size, 300 or 512.
    • You can resume training from a checkpoint by specifying its path as one of the training parameters (again, see train_test.py for options).
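The three documented flags could be declared with argparse roughly as follows. This is a sketch, not train_test.py's actual parser: the long option names (--dataset, --version, --size) are assumptions borrowed from the train_test_fssd_mobile_pre.py command in the issues below, and parsing --size as an int is a choice made here.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented -d/-v/-s flags only;
    # the real script defines many more options.
    parser = argparse.ArgumentParser(description="SSD-family training (sketch)")
    parser.add_argument("-d", "--dataset", default="VOC", choices=["VOC", "COCO"],
                        help="training dataset")
    parser.add_argument("-v", "--version", default="RFB_vgg",
                        help="model version, e.g. RFB_vgg, RFB_E_vgg, RFB_mobile")
    parser.add_argument("-s", "--size", default=300, type=int, choices=[300, 512],
                        help="input image size")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-d", "VOC", "-v", "RFB_vgg", "-s", "300"])
    print(args.dataset, args.version, args.size)  # VOC RFB_vgg 300
```

Restricting --size with choices makes an unsupported size fail at startup instead of deep inside the prior-box code.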

Evaluation

The test frequency is set in train_test.py. By default, the script directly outputs the mAP on VOC2007 test or COCO minival2014. For VOC2012 test and COCO test-dev results, manually change the datasets in the test_RFB.py file, then save the detection results and submit them to the evaluation server.
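For reference, VOC2007 test mAP is conventionally the mean over classes of the 11-point interpolated average precision. The sketch below computes one class's AP from detections that are already sorted by score and marked as true or false positives (the IoU ≥ 0.5 matching step is assumed to happen beforehand). It illustrates the metric and is not this repository's evaluation code.

```python
def precision_recall(is_true_positive, num_gt):
    """Cumulative recall/precision for detections sorted by descending score."""
    tp = fp = 0
    recall, precision = [], []
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_gt)
        precision.append(tp / (tp + fp))
    return recall, precision


def voc07_ap(recall, precision):
    """11-point interpolated AP: average of the best precision at recall >= t."""
    ap = 0.0
    for i in range(11):
        t = i / 10.0
        candidates = [p for r, p in zip(recall, precision) if r >= t]
        ap += (max(candidates) if candidates else 0.0) / 11.0
    return ap


rec, prec = precision_recall([True, False, True], num_gt=2)
print(round(voc07_ap(rec, prec), 4))  # 0.8485
```

mAP is then the plain mean of voc07_ap over the 20 VOC classes.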

Models

Update (Sep 29, 2019)

pytorchssd's People

Contributors

goatmessi7, ividal, lzx1413, pingolh, xiaojieli0903


pytorchssd's Issues

Pre-trained model

Hi,

Do you provide the pre-trained model of SSD_VGG on COCO? Thank you!

Error when training on the COCO dataset

Traceback (most recent call last):
File "train_test.py", line 501, in <module>
train()
File "train_test.py", line 308, in train
collate_fn=detection_collate))
File "/home/phd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 417, in __iter__
return DataLoaderIter(self)
File "/home/phd/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 234, in __init__
w.start()
File "/home/phd/anaconda3/lib/python3.6/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/home/phd/anaconda3/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/home/phd/anaconda3/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/home/phd/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 26, in __init__
self._launch(process_obj)
File "/home/phd/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
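Training on VOC works fine, but when training on COCO val2017 the error occurs every time near the end of epoch 11 (e.g., at epochiter: 7390/7392). What is happening, and is there a good way to fix it?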

KeyError: 'unexpected key "0.weight" in state_dict'

Error while training the RFB_mobile version

Loading base network...
Traceback (most recent call last):
File "train_test.py", line 133, in <module>
net.base.load_state_dict(base_weights)
File "/home/xxxxx/anaconda2/envs/pyssd/lib/python3.6/site-packages/torch/nn/modules/module.py", line 490, in load_state_dict
.format(name))
KeyError: 'unexpected key "0.weight" in state_dict'
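The error means the checkpoint contains a parameter name ("0.weight") that net.base does not define, i.e. the saved weights and the backbone are laid out differently. A quick way to see the mismatch is to diff the two key sets, similar to the check a strict load_state_dict performs. The helper below is a hypothetical pure-Python sketch; in practice its inputs would be net.base.state_dict().keys() and the keys of the loaded checkpoint.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys, strip_prefix=""):
    """Return (missing, unexpected) parameter names, as a strict load would see them.

    strip_prefix lets you test a hypothesis such as checkpoint keys carrying
    an extra "module." prefix left over from DataParallel.
    """
    cleaned = set()
    for key in checkpoint_keys:
        if strip_prefix and key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        cleaned.add(key)
    model_keys = set(model_keys)
    missing = sorted(model_keys - cleaned)      # in the model, not in the file
    unexpected = sorted(cleaned - model_keys)   # in the file, not in the model
    return missing, unexpected


# Toy key sets; real ones would come from state_dict().keys().
print(diff_state_dict_keys(["conv1.weight", "conv1.bias"], ["0.weight", "0.bias"]))
# (['conv1.bias', 'conv1.weight'], ['0.bias', '0.weight'])
```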

Visdom Problem

When I tried to train the SSD model on the VOC dataset with visdom as the visualization tool, I found that the variables loc_loss and conf_loss are always 0 and are never updated during training. So maybe you should plot mean_loss_c and mean_loss_l instead?

07+12+coco

Can you share the pre-trained SSD300 COCO model? Thanks a lot. I want to fine-tune from the COCO model, but I don't have enough GPUs to train on COCO.

About loss_c in Multibox Loss

Hi~
loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1,1))

This operation looks to me the same as computing the softmax cross-entropy loss, so why not use torch.nn.functional.cross_entropy directly?
P.S. The +x_max and -x_max in log_sum_exp
return torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True)) + x_max
also seem to do nothing.
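The two formulations are indeed mathematically identical: log_sum_exp(x) - x[t] equals -log(softmax(x)[t]). The plain-Python sketch below checks this on one row of logits. One plausible reason for computing it by hand (an inference, not confirmed in this thread) is that hard negative mining needs the unreduced per-prior loss to rank negatives before the final loss is computed. The +x_max/-x_max pair does not change the value either; it only guards against overflow in exp().

```python
import math


def log_sum_exp(xs):
    # Stable log(sum(exp(x))): subtract the max before exponentiating
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ce_via_lse(logits, target):
    # The expression used in the loss: log_sum_exp(x) - x[target]
    return log_sum_exp(logits) - logits[target]


def ce_via_softmax(logits, target):
    # Textbook cross-entropy: -log(softmax(x)[target])
    exps = [math.exp(x) for x in logits]
    return -math.log(exps[target] / sum(exps))


logits = [2.0, 1.0, 0.1]
print(abs(ce_via_lse(logits, 0) - ce_via_softmax(logits, 0)) < 1e-12)  # True
```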

Problem training on my own dataset

Loading base network...
Initializing weights...
Loading Dataset...
Training RFB_vgg on VOC2007
1
2
During training, execution always gets stuck at this point:

    # load train data

    print('2')
    images, targets = next(batch_iterator)
    
    #print(np.sum([torch.sum(anno[:,-1] == 2) for anno in targets]))

    if args.cuda:
        images = Variable(images.cuda())
        targets = [Variable(anno.cuda()) for anno in targets]
    else:
        images = Variable(images)
        targets = [Variable(anno) for anno in targets]

What could be the cause?

Loss is nan on training RefineDet using LISA traffic dataset

I used LISA traffic dataset as the training dataset and translated all images and annotations to VOC format.
But when training, the AL, AC and OL are nan:
Epoch:1 || epochiter: 220/694|| Total iter 220 || AL: 3.7427 AC: 4.2145 OL: 2.7985 OC: 3.4148||Batch time: 0.8865 sec. ||LR: 0.00100000
Epoch:1 || epochiter: 230/694|| Total iter 230 || AL: nan AC: nan OL: nan OC: 518.2673||Batch time: 0.7096 sec. ||LR: 0.00100000
Epoch:1 || epochiter: 240/694|| Total iter 240 || AL: nan AC: nan OL: nan OC: 0.8749||Batch time: 0.6676 sec. ||LR: 0.00100000

My platform is Pytorch v0.4, cuda9.2.
I modified the code in refine_multibox_loss.py because the dimensions of the tensors (loss_c and pos) do not match. Could that be the reason?

The source code:
#Hard Negative Mining
loss_c[pos] = 0
My code:
loss_c[pos.view(-1,1)] = 0

Could you help me?
@lzx1413

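On training with 07++12+COCO

When training on 07++12+COCO, do you directly use the model trained on COCO as the pretrained model and then train on 07++12, or is some other processing needed?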

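About FPS

First, I'd like to confirm: are the FPS numbers in the table computed as 1/detect_time? Second, running SSD300 on a 1080 Ti I can't reach 150 FPS, only about 83.3 FPS. I'd like to know why; any guidance would be appreciated.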
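A sketch of how per-image FPS is commonly measured: warm up first, then divide the number of iterations by the elapsed wall-clock time. run_once is a hypothetical callable that processes one image; on a GPU it must call torch.cuda.synchronize() before returning, or only the kernel launches get timed. Whether NMS and pre/post-processing are inside run_once changes the result considerably (the README notes NMS alone takes 13 to 18 ms in the HarDNet measurements), so numbers are only comparable when measured the same way.

```python
import time


def measure_fps(run_once, n_warmup=5, n_iters=50):
    """Return images per second for run_once(), averaged after warm-up.

    run_once is a hypothetical callable that processes one image. For GPU
    code it should end with torch.cuda.synchronize() so the timer waits
    for the kernels to finish.
    """
    for _ in range(n_warmup):
        run_once()  # warm-up: lazy initialization, allocator, autotuning if enabled
    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


if __name__ == "__main__":
    # Stand-in workload of at least 10 ms per "image", so at most about 100 FPS
    print(round(measure_fps(lambda: time.sleep(0.01), n_warmup=1, n_iters=20)))
```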

Train tricks

Hi! Thanks for your code! Can you give some advice on how to reproduce the results in the README?
For example, must RFBNet300 be trained for 300 epochs on VOC to reach the reported 80.5? Also, I found that the learning rate doesn't change during training; I got 66% mAP on VOC07 test after 90 epochs. I am using the 0.4 branch.
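When the learning rate appears frozen, it is worth checking the schedule against the total number of iterations. The sketch below is a hypothetical linear warm-up followed by step decay; its parameters mirror flags that appear elsewhere in these issues (--lr, --warm_epoch, --gamma), but the formula and the milestones are assumptions, not the repository's code. If every milestone lies beyond the last training iteration, the rate never decays after warm-up.

```python
def learning_rate(iteration, base_lr, epoch_size, warm_epoch, gamma,
                  step_iters, init_lr=1e-6):
    """Hypothetical schedule: linear warm-up, then step decay.

    For the first warm_epoch epochs the rate rises linearly from init_lr to
    base_lr; afterwards it is multiplied by gamma once per milestone in
    step_iters that has been reached.
    """
    warm_iters = epoch_size * warm_epoch
    if iteration < warm_iters:
        return init_lr + (base_lr - init_lr) * iteration / warm_iters
    passed = sum(1 for step in step_iters if iteration >= step)
    return base_lr * gamma ** passed


# Placeholder values: 517 iterations per epoch, 5 warm-up epochs, one milestone.
for it in (0, 10, 2585, 10000):
    print(it, learning_rate(it, 4e-3, 517, 5, 0.1, step_iters=[10000]))
```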

AttributeError: 'SSD' object has no attribute 'module'

                APs, mAP = test_net(test_save_dir, net, detector, args.cuda, testset,
                                    BaseTransform(net.module.size, rgb_means, rgb_std, (2, 0, 1)),
                                    top_k, thresh=0.01)

Traceback (most recent call last):
File "/home/suizhehao/pytorch_detect/PytorchSSD/train_test.py", line 484, in <module>
train()
File "/home/suizhehao/pytorch_detect/PytorchSSD/train_test.py", line 291, in train
BaseTransform(net.module.size, rgb_means, rgb_std, (2, 0, 1)),
File "/home/suizhehao/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 398, in __getattr__
type(self).__name__, name))
AttributeError: 'SSD' object has no attribute 'module'
What is the problem?
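net.module only exists when the model is wrapped in torch.nn.DataParallel (train_test.py wraps it with device_ids=list(range(args.ngpu)), presumably only on some code paths). A common pattern is to fall back to the bare model. The sketch uses toy stand-in classes so it runs without PyTorch, and unwrap is a hypothetical helper name. With it, the call would become BaseTransform(unwrap(net).size, rgb_means, rgb_std, (2, 0, 1)).

```python
def unwrap(net):
    """Return the model itself, whether or not it is wrapped.

    torch.nn.DataParallel keeps the wrapped model in .module; a bare model
    has no such attribute, so getattr falls back to the model itself.
    """
    return getattr(net, "module", net)


class ToySSD:
    """Stand-in for a bare SSD model (hypothetical)."""
    size = 300


class ToyDataParallel:
    """Stand-in for torch.nn.DataParallel: stores the model in .module."""
    def __init__(self, module):
        self.module = module


print(unwrap(ToySSD()).size)                   # 300
print(unwrap(ToyDataParallel(ToySSD())).size)  # 300
```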

ImportError: cannot import name '_mask'

Traceback (most recent call last):
File "train_test.py", line 16, in <module>
from data import VOCroot, COCOroot, VOC_300, VOC_512, COCO_300, COCO_512, COCO_mobile_300, AnnotationTransform,
File "/home/phd/PycharmProjects/PytorchSSD-master/data/__init__.py", line 3, in <module>
from .coco import COCODetection
File "/home/phd/PycharmProjects/PytorchSSD-master/data/coco.py", line 20, in <module>
from utils.pycocotools.coco import COCO
File "/home/phd/PycharmProjects/PytorchSSD-master/utils/pycocotools/coco.py", line 55, in <module>
from . import mask as maskUtils
File "/home/phd/PycharmProjects/PytorchSSD-master/utils/pycocotools/mask.py", line 4, in <module>
from . import _mask
ImportError: cannot import name '_mask'

I don't know how to solve it. Can you help me?

With -s set to 512, SSD_vgg and FSSD_vgg don't run correctly

With -s set to 300, no error is reported, but with -s set to 512 and -v set to SSD_vgg,
the error is:

Traceback (most recent call last):
  File "/home/aurora/workspaces12/PytorchSSD/train_ssds.py", line 434, in <module>
    train()
  File "/home/aurora/workspaces12/PytorchSSD/train_ssds.py", line 291, in train
    loss_l, loss_c = criterion(out, priors, targets)
  File "/usr/software/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aurora/workspaces12/PytorchSSD/layers/modules/multibox_loss.py", line 86, in forward
    pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
  File "/usr/software/anaconda3/lib/python3.5/site-packages/torch/autograd/variable.py", line 433, in expand_as
    return self.expand(tensor.size())
RuntimeError: The expanded size of the tensor (32256) must match the existing size (32756) at non-singleton dimension 1. at /home/aurora/workspaces12/backup/pytorch/torch/lib/THC/generic/THCTensor.c:340
Exception ignored in: <bound method DataLoaderIter.__del__ of <torch.utils.data.dataloader.DataLoaderIter object at 0x7fbca40ab2e8>>

With -v set to FSSD_vgg, the error is:

Traceback (most recent call last):
  File "/home/aurora/workspaces12/PytorchSSD/train_ssds.py", line 107, in <module>
    net = build_net(img_dim, num_classes)
  File "/home/aurora/workspaces12/PytorchSSD/models/FSSD_vgg.py", line 205, in build_net
    head = multibox(fea_channels, mbox[str(size)], num_classes), num_classes=num_classes)
  File "/home/aurora/workspaces12/PytorchSSD/models/FSSD_vgg.py", line 180, in multibox
    assert len(fea_channels) == len(cfg)
AssertionError

After changing fea_channels to fea_channels = [512, 512, 256, 256, 256, 256, 256], the error becomes:

Loading base network...
Initializing weights...
Loading Dataset...
Training FSSD_vgg on VOC0712
Traceback (most recent call last):
  File "/home/aurora/workspaces12/PytorchSSD/train_ssds.py", line 434, in <module>
    train()
  File "/home/aurora/workspaces12/PytorchSSD/train_ssds.py", line 288, in train
    out = net(images)
  File "/usr/software/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aurora/workspaces12/PytorchSSD/models/FSSD_vgg.py", line 106, in forward
    concat_fea = torch.cat(transformed_features,1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 38 and 64 in dimension 2 at /home/aurora/workspaces12/backup/pytorch/torch/lib/THC/generic/THCTensorMath.cu:111
Exception ignored in: <bound method DataLoaderIter.__del__ of <torch.utils.data.dataloader.DataLoaderIter object at 0x7f24b402c198>>
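Both errors come down to the same invariant: the number of locations the network's heads produce must equal the number of default boxes in the prior configuration for that input size. That count is the sum over feature maps of H × W × boxes per location. The layouts below are assumptions that happen to reproduce the numbers in these tracebacks: 8732 for the standard SSD300, and 32756 for a 512 input with seven maps (64, 32, 16, 8, 4, 2, 1) carrying 6, 6, 6, 6, 6, 4, 4 boxes. The network here produced 32256 locations instead, which suggests its feature maps or per-map box counts do not match the 512 prior config.

```python
def count_priors(feature_maps, boxes_per_location):
    """Number of default boxes: sum over square maps of size * size * boxes."""
    if len(feature_maps) != len(boxes_per_location):
        raise ValueError("one box count is needed per feature map")
    return sum(f * f * b for f, b in zip(feature_maps, boxes_per_location))


print(count_priors([38, 19, 10, 5, 3, 1], [4, 6, 6, 6, 4, 4]))        # 8732 (SSD300)
print(count_priors([64, 32, 16, 8, 4, 2, 1], [6, 6, 6, 6, 6, 4, 4]))  # 32756
```

The same formula gives 6375 for four maps of 40, 20, 10 and 5 with 3 boxes each, and 11620 for SSD300-sized maps with 6 boxes on the first one, matching the mask sizes reported in later issues.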

How is the speed of FSSD counted?

Hi, Zuo-xin:

I am really interested in your work FSSD and have read its details. The paper reports an inference speed of 65.8 FPS for FSSD. I also experimented with PyTorch SSD, which runs at only about 40 FPS, yet the more complex FSSD network runs faster.
Could you help me find out why?
Thanks for your help.

trying to understand multibox

Can you please explain the multibox function? I don't understand why we have
vgg_source = [24, -2],

and what happens when v = -1?

import torch.nn as nn

def multibox(vgg, extra_layers, cfg, num_classes):
    loc_layers = []
    conf_layers = []
    # Heads on two VGG layers: index 24 and the second-to-last layer
    vgg_source = [24, -2]
    for k, v in enumerate(vgg_source):
        loc_layers += [nn.Conv2d(vgg[v].out_channels,
                                 cfg[k] * 4, kernel_size=3, padding=1)]
        conf_layers += [nn.Conv2d(vgg[v].out_channels,
                                  cfg[k] * num_classes, kernel_size=3, padding=1)]
    # Heads on every second extra layer; k continues from 2 for the cfg lookup
    for k, v in enumerate(extra_layers[1::2], 2):
        loc_layers += [nn.Conv2d(v.out_channels, cfg[k]
                                 * 4, kernel_size=3, padding=1)]
        conf_layers += [nn.Conv2d(v.out_channels, cfg[k]
                                  * num_classes, kernel_size=3, padding=1)]
    return vgg, extra_layers, (loc_layers, conf_layers)
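Python counts list indices from the end when they are negative, so vgg[-2] is the second-to-last layer and vgg[-1] would be the last one. The toy lists below only mimic the tail of the real layer lists. Assuming, as in ssd.pytorch, that the VGG list ends with conv7 followed by its ReLU, -2 selects conv7, whereas -1 would select the ReLU, which has no out_channels attribute. The second loop shows how extra_layers[1::2] takes every other layer and enumerate(..., 2) starts k at 2, so those heads read cfg[2], cfg[3], and so on.

```python
# Toy stand-ins: only the tail of the VGG list matters here.
# Assumption (as in ssd.pytorch): the list ends with conv7 and its ReLU.
vgg = ["conv1_1", "relu", "...", "pool5", "conv6", "relu6", "conv7", "relu7"]
print(vgg[-2])  # conv7: the layer that vgg_source's -2 attaches a head to
print(vgg[-1])  # relu7: a ReLU, which has no out_channels attribute

# Toy extra layers alternating 1x1 "reduce" convs and 3x3 convs.
extra_layers = ["conv8_1", "conv8_2", "conv9_1", "conv9_2", "conv10_1", "conv10_2"]
for k, v in enumerate(extra_layers[1::2], 2):
    print(k, v)  # 2 conv8_2, then 3 conv9_2, then 4 conv10_2
```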

what does s1 and s2 means?

In the paper it says:
'subsample parameters from fc6 and fc7, change pool5 from 2×2−s2 to 3×3−s1'
Can you please tell me what s1 and s2 mean?

We can also see 3x3x512-s2 and 3x3x256-s1 in the architecture figure (image not included here).
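In this notation, s1 and s2 denote stride 1 and stride 2: '3×3−s1' is a 3×3 window moved one pixel at a time, and '3x3x512-s2' is a 3×3 convolution with 512 output channels and stride 2. The standard output-size formula shows the effect on SSD300's 19×19 conv5 feature map. The padding of 1 for the 3×3 cases is an assumption (it is what ssd.pytorch uses for pool5).

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Spatial size after a conv or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


# SSD300's conv5 feature map is 19x19.
print(conv_output_size(19, kernel=2, stride=2))             # 9: original 2x2-s2 pool5 halves it
print(conv_output_size(19, kernel=3, stride=1, padding=1))  # 19: 3x3-s1 pool5 keeps the resolution
print(conv_output_size(19, kernel=3, stride=2, padding=1))  # 10: a 3x3 stride-2 layer halves it
```

Switching pool5 to stride 1 keeps the 19×19 resolution, so the subsequent layers (fc6/fc7 converted to convolutions) operate on the same grid.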

Error when running demo/live.py

Hello, when I run the demo/live.py file I get an error: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'
My current PyTorch version is 0.3.0.

Error while reshaping tensor, in the model definition.

RuntimeError: invalid argument 2: size '[2 x -1 x 81]' is invalid for input with 267750 elements at /pytorch/aten/src/TH/THStorage.c:37

I get this error when I try to run the refinedet_train_test.py script! My input image size is 320. I don't understand where in the pipeline the source of the shape mismatch is. I haven't changed any of the code. It'd be great to know if someone faced this, or if I'm doing something wrong.
Thanks!
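One consistent reading of the numbers (an inference, not confirmed in the thread): 267750 = 2 × 6375 × 21. If RefineDet320 uses four feature maps of 40, 20, 10 and 5 with 3 anchors each (an assumption), it has 6375 priors, so the confidence output carries 21 values per prior, i.e. VOC's class count, while the reshape asks for 81 (COCO's). That would point to num_classes differing between where the model is built and where its output is reshaped. The arithmetic can be checked like this; per_prior_channels is a hypothetical helper.

```python
def per_prior_channels(num_elements, batch, num_priors):
    """Values per prior in a flat tensor, or None if it does not divide evenly."""
    channels, remainder = divmod(num_elements, batch * num_priors)
    return channels if remainder == 0 else None


# Assumed RefineDet320 layout: 40x40, 20x20, 10x10, 5x5 maps, 3 anchors each
num_priors = sum(f * f * 3 for f in (40, 20, 10, 5))
print(num_priors)                                 # 6375
print(per_prior_channels(267750, 2, num_priors))  # 21: 20 VOC classes + background
print(267750 % (2 * 81) == 0)                     # False: view(2, -1, 81) cannot work
```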

About configurations

Would it be possible to provide recommended configs? They can sometimes be confusing for newcomers to the project. Thanks a lot.

refine_multibox_loss

Hi, thank you for sharing the code.
There is one issue in RefineDet320 training:
"conf_p" and "targets_weighted" become "FloatTensor with no dimension" before they enter the F.cross_entropy function.
Any suggestions?
Thanks

why use the x_max in function log_sum_exp()?

import torch

def log_sum_exp(x):
    x_max = x.data.max()
    return torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True)) + x_max

In this function, if we remove x_max, the output is just the same, so why should we use x_max?
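Removing x_max leaves the value unchanged mathematically, but not numerically: exp() overflows for inputs above roughly 88.7 in float32 (about 709.8 for the float64 Python floats used here), so large logits would turn the naive version into inf. Subtracting the maximum makes every exponent at most 0, and adding it back restores the exact result. The plain-Python sketch below demonstrates the difference. Note that x.data.max() in the original is a single maximum over the whole tensor rather than per row; a per-row maximum is the more robust variant when rows differ greatly in scale.

```python
import math


def log_sum_exp_naive(xs):
    # Direct formula: math.exp raises OverflowError once x exceeds about 709.8
    return math.log(sum(math.exp(x) for x in xs))


def log_sum_exp_stable(xs):
    # Subtracting the max keeps every exponent <= 0, so exp() cannot overflow;
    # adding it back afterwards restores the exact value.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


print(log_sum_exp_stable([1000.0, 1000.0]))  # about 1000.693 (1000 + log 2)
try:
    log_sum_exp_naive([1000.0, 1000.0])
except OverflowError as err:
    print("naive version failed:", err)
```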

python train_test_fssd_mobile_pre.py

When I run FSSD, the following error occurs:
loading pretrained model from weights/mobilenet_1.pth
Loading weights into state dict...
Traceback (most recent call last):
File "train_test_fssd_mobile_pre.py", line 132, in <module>
net.load_weights(args.basenet)
File "/home/cv2018/PytorchSSD-master/models/FSSD_mobile.py", line 135, in load_weights
state_dict = torch.load(base_file, map_location=lambda storage, loc: storage)
File "/home/cv2018/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 265, in load
f = open(f, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: 'weights/mobilenet_1.pth'

[Errno 111] Connection refused

Hi,
I am getting this error
RuntimeError: The shape of the mask [8, 32756] at index 0 does not match the shape of the indexed tensor [262048, 1] at index 0
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f801b6ccf28>>

I am using pytorch 0.4.0. and python 3.5 anaconda 4.5.9.
Any help? Thanks

Pytorch V0.4 bug on expand() API

I ran refinedet_train_test.py on PyTorch v0.4, but I got the following errors:

File "E:\work\project\PytorchSSD-v1\PytorchSSD-master\utils\box_utils.py", line 71, in jaccard
inter = intersect(box_a, box_b)
File "E:\work\project\PytorchSSD-v1\PytorchSSD-master\utils\box_utils.py", line 51, in intersect
max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2),
TypeError: expand(): argument 'size' (position 1) must be tuple of ints, not Tensor

Anyone knows how to fix it? Thanks a lot!

Sometimes an "invalid argument 1" error occurs at max_xy

Traceback (most recent call last):
File "train_test.py", line 458, in <module>
train()
File "train_test.py", line 324, in train
loss_l, loss_c = criterion(out, priors, targets)
File "/home/phd/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/phd/PycharmProjects/PytorchSSD-master/layers/modules/multibox_loss.py", line 74, in forward
match(self.threshold,truths,defaults,self.variance,labels,loc_t,conf_t,idx)
File "/home/phd/PycharmProjects/PytorchSSD-master/utils/box_utils.py", line 108, in match
point_form(priors)
File "/home/phd/PycharmProjects/PytorchSSD-master/utils/box_utils.py", line 67, in jaccard
inter = intersect(box_a, box_b)
File "/home/phd/PycharmProjects/PytorchSSD-master/utils/box_utils.py", line 47, in intersect
max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2),
RuntimeError: invalid argument 1: the number of sizes provided must be greater or equal to the number of dimensions in the tensor at /opt/conda/conda-bld/pytorch_1518244507981/work/torch/lib/THC/generic/THCTensor.c:326

I'm training on a single 1080 Ti with PyTorch 0.3.

AssertionError: Invalid device id when running train_test.py

Traceback (most recent call last):
File "train_test.py", line 183, in <module>
net = torch.nn.DataParallel(net, device_ids=list(range(args.ngpu)))
File "/home/cv2018/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 102, in __init__
_check_balance(self.device_ids)
File "/home/cv2018/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 17, in _check_balance
dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
File "/home/cv2018/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 17, in <listcomp>
dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
File "/home/cv2018/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 292, in get_device_properties
raise AssertionError("Invalid device id")
AssertionError: Invalid device id
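The traceback shows DataParallel being built with device_ids=list(range(args.ngpu)), so requesting more GPUs than are visible (for example through --ngpu, or with CUDA_VISIBLE_DEVICES restricting the set) produces this assertion. The simplest fix is to pass an --ngpu that matches the available GPUs. The hypothetical helper below clips the list instead; in real code available_gpus would come from torch.cuda.device_count().

```python
def safe_device_ids(requested_ngpu, available_gpus):
    """Device ids for DataParallel, clipped to the GPUs that actually exist.

    available_gpus would be torch.cuda.device_count() in real code.
    """
    if available_gpus < 1:
        raise RuntimeError("no CUDA device is visible")
    return list(range(min(max(requested_ngpu, 1), available_gpus)))


print(safe_device_ids(4, 1))  # [0]
print(safe_device_ids(2, 4))  # [0, 1]
```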

refine_multibox_loss error: shape not matched

In refine_multibox_loss.py, line 112:
loss_c[pos] = 0 # filter out pos boxes for now

I got the following error:
RuntimeError: The shape of the mask [16, 6375] at index 0 does not match the shape of the indexed tensor [102000, 1] at index 0
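The shapes in this and the similar reports are consistent: 16 × 6375 = 102000, so loss_c has been flattened to [batch × num_priors, 1] while pos is still [batch, num_priors], and PyTorch 0.4 no longer accepts a boolean mask of a different shape. A widely reported workaround, assuming num is the batch size as in ssd.pytorch, is to reshape before masking: loss_c = loss_c.view(num, -1) followed by loss_c[pos] = 0. What the masking is for is easier to see per image. The pure-Python sketch below (hard_negative_mask is a hypothetical name; the 3:1 ratio is the usual SSD default) excludes the positives, ranks the remaining priors by loss, and keeps only the hardest negatives.

```python
def hard_negative_mask(loss_c, pos, neg_pos_ratio=3):
    """Priors that enter the confidence loss for one image.

    loss_c: per-prior confidence loss (floats)
    pos:    per-prior booleans, True where a prior is matched to an object
    Returns positives plus the neg_pos_ratio * num_pos highest-loss negatives.
    """
    num_pos = sum(pos)
    num_neg = min(neg_pos_ratio * num_pos, len(pos) - num_pos)
    # Exclude positives from the ranking (the "loss_c[pos] = 0" step)
    ranking = [0.0 if p else loss for loss, p in zip(loss_c, pos)]
    order = sorted(range(len(pos)), key=lambda i: ranking[i], reverse=True)
    hardest = set(order[:num_neg])
    return [p or i in hardest for i, p in enumerate(pos)]


loss_c = [0.9, 0.1, 0.5, 0.8, 0.2, 0.05]
pos = [True, False, False, False, False, False]
print(hard_negative_mask(loss_c, pos))  # [True, False, True, True, True, False]
```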

RefineDet performance issue

On the VOC dataset, I trained RefineDet320 for 100 epochs and the test mAP is below 0.2, yet the loss keeps decreasing. What could be going on?

How to set hyper-parameters for FSSD training?

To achieve the mAP of 77.8, which is claimed in README, how to set hyper-parameters for FSSD? Or just use the default values in train_test.py?

The README also mentions a train_RFB.py that lists the parameters for RFB training, but I did not find such a script in this repo.

Problem training FSSD

Epoch:10 || epochiter: 2048/2068|| Totel iter 20660 || L: 1.4363 C: 3.5825||Batch time: 0.2319 sec. ||LR: 0.00400000
Epoch:10 || epochiter: 2058/2068|| Totel iter 20670 || L: 1.5284 C: 3.6207||Batch time: 0.2024 sec. ||LR: 0.00400000
Traceback (most recent call last):
File "train_test.py", line 456, in <module>
train()
File "train_test.py", line 280, in train
top_k, thresh=0.01)
File "train_test.py", line 395, in test_net
out = net(x=x, test=True) # forward pass
File "/home/user/anaconda2/envs/tensorflow-gpu/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/anaconda2/envs/tensorflow-gpu/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/user/anaconda2/envs/tensorflow-gpu/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 64, in replicate
return replicate(module, device_ids)
File "/home/user/anaconda2/envs/tensorflow-gpu/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast(devices)(*params)
File "/home/user/anaconda2/envs/tensorflow-gpu/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
outputs = comm.broadcast_coalesced(inputs, self.target_gpus)
File "/home/user/anaconda2/envs/tensorflow-gpu/lib/python3.6/site-packages/torch/cuda/comm.py", line 48, in broadcast_coalesced
if tensor.get_device() != devices[0]:
IndexError: tuple index out of range
Has the author run into this problem? Thanks. I'm using Python 3.6 and PyTorch 0.2.

L2Norm layer

Hello, from the code I don't understand the purpose of scale and gamma:

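import torch
import torch.nn as nn
import torch.nn.init as init

class L2Norm(nn.Module):
    def __init__(self, n_channels, scale):
        super(L2Norm, self).__init__()
        self.n_channels = n_channels
        self.gamma = scale or None
        self.eps = 1e-10
        # One learnable scale per channel
        self.weight = nn.Parameter(torch.Tensor(self.n_channels))
        self.reset_parameters()

    def reset_parameters(self):
        # Fill every channel's scale with gamma (init.constant_ in PyTorch >= 0.4)
        init.constant(self.weight, self.gamma)

    def forward(self, x):
        # L2 norm across the channel dimension at each spatial position
        norm = x.pow(2).sum(dim=1, keepdim=True).sqrt() + self.eps
        #x /= norm
        x = torch.div(x, norm)
        # Broadcast the per-channel weight to (N, C, H, W) and rescale
        out = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x
        return out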
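In this layer, scale (stored as gamma) is only the initial value of a learnable per-channel weight: forward() divides each spatial position's feature vector by its L2 norm across channels, then multiplies channel c by weight[c]. SSD applies this to conv4_3, whose features have a different scale from the deeper layers, and initializes the scale to 20. The plain-Python sketch below reproduces the computation for a single spatial position; l2norm_channels is a hypothetical name.

```python
import math


def l2norm_channels(x, weight, eps=1e-10):
    """L2Norm.forward for one spatial position.

    x:      feature values across channels at one (h, w) location
    weight: learnable per-channel scales (all initialized to `scale`, e.g. 20)
    """
    norm = math.sqrt(sum(v * v for v in x)) + eps
    return [w * v / norm for w, v in zip(weight, x)]


scale = 20.0
print(l2norm_channels([3.0, 4.0], [scale, scale]))  # approximately [12.0, 16.0]
```

After normalization every position has unit norm, so the learned weights alone decide how strongly conv4_3 features are scaled relative to the other heads.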

doubt about model test

During testing, consider these lines of code in the forward() function of the detection class:
for i in range(num):
    decoded_boxes = decode(loc_data[i], prior_data, self.variance)
    # For each class, perform nms
    conf_scores = conf_preds[i].clone()
    for cl in range(1, self.num_classes):
        c_mask = conf_scores[cl].gt(self.conf_thresh)
        scores = conf_scores[cl][c_mask]
        if scores.dim() == 0:
            continue
        l_mask = c_mask.unsqueeze(1).expand_as(decoded_boxes)
        boxes = decoded_boxes[l_mask].view(-1, 4)
        # idx of highest scoring and non-overlapping boxes per class
        ids, count = nms(boxes, scores, self.nms_thresh, self.top_k)
        output[i, cl, :count] = \
            torch.cat((scores[ids[:count]].unsqueeze(1),
                       boxes[ids[:count]]), 1)
flt = output.contiguous().view(num, -1, 5)
_, idx = flt[:, :, 0].sort(1, descending=True)
_, rank = idx.sort(1)
flt[(rank < self.top_k).unsqueeze(-1).expand_as(flt)].fill_(0)
return output
We know that each box outputs 21 classification scores for the VOC dataset. If one box has softmax scores like [0, 0.02, 0.02, 0.8, ...], then according to this code, classes 1, 2 and 3 will all include this box. Isn't this a bug, or am I missing something?
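Yes: with a low confidence threshold, one box can be emitted under several classes, because each class is thresholded and suppressed independently. This is generally intentional in SSD-style evaluation rather than a bug. mAP ranks detections per class, so a 0.02-score duplicate sits at the bottom of its class's list and barely affects AP, while a low threshold (train_test.py passes thresh=0.01 to test_net) maximizes recall. For visualization, a higher threshold removes such duplicates. The sketch below isolates the per-class test; classes_kept is a hypothetical helper.

```python
def classes_kept(scores, conf_thresh):
    """Foreground classes (index >= 1) whose score exceeds conf_thresh.

    Mirrors the per-class loop above: every class is tested on its own,
    so a single box can pass for several classes.
    """
    return [c for c in range(1, len(scores)) if scores[c] > conf_thresh]


box_scores = [0.0, 0.02, 0.02, 0.8]    # scores from the question, truncated
print(classes_kept(box_scores, 0.01))  # [1, 2, 3]
print(classes_kept(box_scores, 0.5))   # [3]
```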

RFB_Net_mobile training: TypeError: __init__() missing 1 required positional argument: 'head'

First off, thanks for this work! It's super helpful for seeing how the heads work.

Current master branch (commit e47974b). Pytorch 0.4.0 (downgraded to 0.3.0) and Python 3.5.2.

Training with the Mobilenet as backbone with:
python train_test.py -d VOC -v RFB_mobile -s 300 --ngpu 1

fails with:

Traceback (most recent call last):
  File "train_test.py", line 126, in <module>
    net = build_net(img_dim, num_classes)
  File "/home/ividal/dev/pytorch/PytorchSSD/models/RFB_Net_mobile.py", line 346, in build_net
    mbox[str(size)], num_classes), num_classes=num_classes)
TypeError: __init__() missing 1 required positional argument: 'head'

I think the issue is actually that argument phase has been removed from the RFB_Net_* constructors (and so not passed in train_test.py), but was left behind in RFB_Net_mobile.

class RFBNet(nn.Module):
    def __init__(self, phase, size, base, extras, head, num_classes):

Removing phase from the constructor allows the arguments to be correctly identified and the network tries to load.
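The reported error follows from Python's positional argument binding. Assuming build_net passes size and the three objects returned by multibox positionally, plus num_classes as a keyword (as the traceback line suggests), a constructor that still starts with phase shifts every argument one slot to the left and leaves head unfilled. The toy classes below reproduce this without PyTorch; their names are hypothetical.

```python
class RFBNetWithPhase:
    # Old signature that still expects `phase`, as reported for RFB_Net_mobile
    def __init__(self, phase, size, base, extras, head, num_classes):
        self.size = size


class RFBNetWithoutPhase:
    # Signature without `phase`, matching how build_net now calls it
    def __init__(self, size, base, extras, head, num_classes):
        self.size = size


def build(cls):
    # Same call shape as assumed above: four positional args plus num_classes=
    return cls(300, "base", "extras", "head", num_classes=21)


try:
    build(RFBNetWithPhase)
except TypeError as err:
    print(err)  # ...missing 1 required positional argument: 'head'
print(build(RFBNetWithoutPhase).size)  # 300
```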

RefineDet320

Hello thank you for sharing your codes

There is one issue in RefineDet320 training: the training stops after epoch 10. I tried several times, and the same thing always happens. Any suggestions?

Thanks

mAP of yolov2

Hi @lzx1413,

I am just curious about the mAP of YOLOv2; it is extremely high. Can you tell me how you got this mAP value?

YOLOv2 (Darknet-19) | 78.6

Best,

Loss does not decrease for coco

Have you ever trained these models on COCO? I tried to train the original SSD on COCO, but the loss keeps increasing, especially the localization loss, which grows from 3 to 30! The accuracy also decreases. I use the default config, and everything behaves well on Pascal VOC.

training loss error: inf

I followed the steps in the README, but the training localization loss is inf.
The logs:

Epoch:1 || epochiter: 0/517|| Totel iter 0 || L: inf C: 23.2587||Batch time: 16.6584 sec. ||LR: 0.00000100
Epoch:1 || epochiter: 10/517|| Totel iter 10 || L: inf C: 19.0657||Batch time: 0.8695 sec. ||LR: 0.00001647
Epoch:1 || epochiter: 20/517|| Totel iter 20 || L: inf C: 16.3741||Batch time: 0.9224 sec. ||LR: 0.00003194
Epoch:1 || epochiter: 30/517|| Totel iter 30 || L: inf C: 14.5374||Batch time: 0.8792 sec. ||LR: 0.00004741
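One possible cause (a hypothesis, not a diagnosis of this log) is a ground-truth box with zero or negative width or height, for example after converting annotations. SSD-style encoding takes the log of the ratio between ground-truth and prior sizes, and log(0) is -inf, which makes the localization loss inf from the first iteration. A quick scan of the annotations with a helper like the hypothetical one below can rule this out.

```python
def degenerate_boxes(boxes):
    """Indices of boxes whose width or height is zero or negative.

    boxes: rows of [xmin, ymin, xmax, ymax, label]. SSD-style encoding uses
    log(ground-truth size / prior size), so such a box yields log(0) = -inf
    (or the log of a negative number) and an inf/nan localization loss.
    """
    bad = []
    for i, box in enumerate(boxes):
        xmin, ymin, xmax, ymax = box[:4]
        if xmax - xmin <= 0 or ymax - ymin <= 0:
            bad.append(i)
    return bad


annotations = [
    [0, 0, 10, 10, 1],  # valid
    [5, 5, 5, 9, 2],    # zero width
    [3, 4, 2, 8, 0],    # xmax < xmin
]
print(degenerate_boxes(annotations))  # [1, 2]
```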

box label bug

conf = labels[best_truth_idx] # Shape: [num_priors]

This line should be updated to the following form; otherwise the background class and aeroplane will be confused with each other.

    conf = labels[best_truth_idx] + 1         # Shape: [num_priors]

The above code is copied from the project ssd.pytorch.
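Whether the +1 is right depends on how the annotation transform numbers the classes. If the class list excludes background, as in ssd.pytorch where aeroplane gets index 0, the shift is needed so that 0 can mean background; if a transform already reserves 0 for background, adding 1 would shift every class instead. The sketch below illustrates the first case with a truncated, assumed class list; conf_targets is a hypothetical name.

```python
# Assumption: class names exclude background (as in ssd.pytorch's VOC_CLASSES),
# so the annotation transform maps aeroplane to index 0. Truncated list.
VOC_CLASSES = ("aeroplane", "bicycle", "bird")
class_to_ind = {name: i for i, name in enumerate(VOC_CLASSES)}


def conf_targets(matched_labels, overlaps, threshold=0.5, offset=1):
    """Per-prior class targets where 0 is reserved for background.

    matched_labels: label of the best-matching ground truth for each prior
    overlaps:       IoU of each prior with that ground truth
    """
    conf = [label + offset for label in matched_labels]
    # Priors that do not overlap enough with any object become background (0)
    return [0 if ov < threshold else c for c, ov in zip(conf, overlaps)]


labels = [class_to_ind["aeroplane"], class_to_ind["bird"]]  # [0, 2]
print(conf_targets(labels, [0.7, 0.3]))            # [1, 0]
print(conf_targets(labels, [0.7, 0.3], offset=0))  # [0, 0]: aeroplane looks like background
```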

About the provided models

Hello, are the following models already trained?

ImageNet mobilenet
07+12 RFB_Net300, BaiduYun Driver, FSSD300, SSD300
COCO RFB_Net512_E, BaiduYun Driver
COCO RFB_Mobile Net300, BaiduYun Driver

I downloaded the first one, but the results are poor. I'm using the file RFB300_80_5.pth.

How to train SSD from scratch?


I want to know how to train SSD from scratch to achieve 73 mAP. Can you tell me the batch size, learning rate, and other tricks you used?

Error when train VGG FSSD

I follow the instructions to train a VGG_FSSD model. An error occurred here:

RuntimeError: The shape of the mask [8, 11620] at index 0 does not match the shape of the indexed tensor [92960, 1] at index 0

It seems that a tensor with shape [92960, 1] cannot be indexed by a tensor with shape [8, 11620], even if the total number of elements for this two tensors are the same. The version of pytorch I used is 0.4.0.

mAP result of FSSD-MobileNet on VOC2007

Hi, Zuo-xin:

Thanks for your nice job.
I trained a model with the following script and got 73.84 mAP, which is much lower than 78.4% mAP.
Could you give me some suggestions?

python train_test_fssd_mobile_pre.py \
    --version 'FSSD_mobile' \
    --size 300 \
    --dataset VOC \
    --basenet 'weights/mobilenet_1.pth' \
    --jaccard_threshold 0.5 \
    --batch_size 32 \
    --num_workers 4 \
    --cuda True \
    --lr 4e-3 \
    --momentum 0.9 \
    --warm_epoch 4 \
    --weight_decay 5e-4 \
    --gamma 0.1 \
    --log_iters True \
    --save_folder 'output/VOC_FSSD_MobileNet' \
    --date '05-12-2018' \
    --save_frequency 40 \
    --test_frequency 10 \
    --send_images_to_visdom False
