qfgaohao / pytorch-ssd

MobileNetV1, MobileNetV2, VGG based SSD/SSD-lite implementation in Pytorch 1.0 / Pytorch 0.4. Out-of-box support for retraining on Open Images dataset. ONNX and Caffe2 support. Experiment Ideas like CoordConv.

Home Page: https://medium.com/@smallfishbigsea/understand-ssd-and-implement-your-own-caa3232cd6ad

License: MIT License

Python 100.00%
ssd pytorch open-images object-detection

pytorch-ssd's Introduction

Single Shot MultiBox Detector Implementation in Pytorch

This repo implements SSD (Single Shot MultiBox Detector). The implementation is heavily influenced by the projects ssd.pytorch and Detectron. The design goal is modularity and extensibility.

Currently, it has MobileNetV1, MobileNetV2, and VGG based SSD/SSD-Lite implementations.

It also has out-of-box support for retraining on Google Open Images dataset.

Example of Mobile SSD

Dependencies

  1. Python 3.6+
  2. OpenCV
  3. Pytorch 1.0 or Pytorch 0.4+
  4. Caffe2
  5. Pandas
  6. Boto3 if you want to train models on the Google OpenImages Dataset.

Download models

Please download the models and put them into the folder "./models". The following sections will need them. URL: https://drive.google.com/drive/folders/1pKn-RifvJGWiOx0ZCRLtCXM5GT5lAluu?usp=sharing

Run the demo

Run the live MobilenetV1 SSD demo

# If you haven't downloaded the models, please download from https://drive.google.com/drive/folders/1pKn-RifvJGWiOx0ZCRLtCXM5GT5lAluu?usp=sharing.
python run_ssd_live_demo.py mb1-ssd models/mobilenet-v1-ssd-mp-0_675.pth models/voc-model-labels.txt 

Run the live demo in Caffe2

# If you haven't downloaded the models, please download from https://drive.google.com/drive/folders/1pKn-RifvJGWiOx0ZCRLtCXM5GT5lAluu?usp=sharing.
python run_ssd_live_caffe2.py models/mobilenet-v1-ssd_init_net.pb models/mobilenet-v1-ssd_predict_net.pb models/voc-model-labels.txt 

You can see a decent speed boost by using Caffe2.

Run the live MobileNetV2 SSD Lite demo

# If you haven't downloaded the models, please download from https://drive.google.com/drive/folders/1pKn-RifvJGWiOx0ZCRLtCXM5GT5lAluu?usp=sharing.
python run_ssd_live_demo.py mb2-ssd-lite models/mb2-ssd-lite-mp-0_686.pth models/voc-model-labels.txt 

The above MobileNetV2 SSD-Lite model is not ONNX-Compatible, as it uses Relu6 which is not supported by ONNX. The code supports the ONNX-Compatible version. Once I have trained a good enough MobileNetV2 model with Relu, I will upload the corresponding Pytorch and Caffe2 models.

You may notice MobileNetV2 SSD/SSD-Lite is slower than MobileNetV1 SSD/Lite on PC. However, MobileNetV2 is faster on mobile devices.

Pretrained Models

Mobilenet V1 SSD

If you haven't downloaded the models, please download from https://drive.google.com/drive/folders/1pKn-RifvJGWiOx0ZCRLtCXM5GT5lAluu?usp=sharing.

Model: mobilenet-v1-ssd-mp-0_675.pth

Average Precision Per-class:
aeroplane: 0.6742489426027927
bicycle: 0.7913672875238116
bird: 0.612096015101108
boat: 0.5616407126931772
bottle: 0.3471259064860268
bus: 0.7742298893362103
car: 0.7284171192326804
cat: 0.8360675520354323
chair: 0.5142295855384792
cow: 0.6244090341627014
diningtable: 0.7060035669312754
dog: 0.7849252606216821
horse: 0.8202146617282785
motorbike: 0.793578272243471
person: 0.7042670984734087
pottedplant: 0.40257147509774405
sheep: 0.6071252282334352
sofa: 0.7549120254763918
train: 0.8270992920206008
tvmonitor: 0.6459903029666852

Average Precision Across All Classes: 0.6755

MobileNetV2 SSD-Lite

If you haven't downloaded the models, please download from https://drive.google.com/drive/folders/1pKn-RifvJGWiOx0ZCRLtCXM5GT5lAluu?usp=sharing.

Model: mb2-ssd-lite-mp-0_686.pth

Average Precision Per-class:
aeroplane: 0.6973327307871002
bicycle: 0.7823755921687233
bird: 0.6342429230125619
boat: 0.5478160937380846
bottle: 0.3564069147093762
bus: 0.7882037885117419
car: 0.7444122242934775
cat: 0.8198865557991936
chair: 0.5378973422880109
cow: 0.6186076149254742
diningtable: 0.7369559500950861
dog: 0.7848265495754562
horse: 0.8222948787839229
motorbike: 0.8057808854619948
person: 0.7176976451996411
pottedplant: 0.42802932547480066
sheep: 0.6259124005994047
sofa: 0.7840368059271103
train: 0.8331588002612781
tvmonitor: 0.6555051795079904
Average Precision Across All Classes: 0.6860690100560214
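
As a quick sanity check, the overall figure is simply the unweighted mean of the 20 per-class values listed above. A minimal sketch (the variable names are illustrative and not taken from the repo):

```python
# Per-class average precision for mb2-ssd-lite-mp-0_686.pth, copied from the list above.
per_class_ap = {
    "aeroplane": 0.6973327307871002,
    "bicycle": 0.7823755921687233,
    "bird": 0.6342429230125619,
    "boat": 0.5478160937380846,
    "bottle": 0.3564069147093762,
    "bus": 0.7882037885117419,
    "car": 0.7444122242934775,
    "cat": 0.8198865557991936,
    "chair": 0.5378973422880109,
    "cow": 0.6186076149254742,
    "diningtable": 0.7369559500950861,
    "dog": 0.7848265495754562,
    "horse": 0.8222948787839229,
    "motorbike": 0.8057808854619948,
    "person": 0.7176976451996411,
    "pottedplant": 0.42802932547480066,
    "sheep": 0.6259124005994047,
    "sofa": 0.7840368059271103,
    "train": 0.8331588002612781,
    "tvmonitor": 0.6555051795079904,
}

# Mean average precision is the plain average over classes (no weighting).
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mean AP over {len(per_class_ap)} classes: {mean_ap:.4f}")
```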

The code to re-produce the model:

# If you haven't downloaded the models, please download from https://drive.google.com/drive/folders/1pKn-RifvJGWiOx0ZCRLtCXM5GT5lAluu?usp=sharing.
python train_ssd.py --dataset_type voc  --datasets ~/data/VOC0712/VOC2007 ~/data/VOC0712/VOC2012 --validation_dataset ~/data/VOC0712/test/VOC2007/ --net mb2-ssd-lite --base_net models/mb2-imagenet-71_8.pth  --scheduler cosine --lr 0.01 --t_max 200 --validation_epochs 5 --num_epochs 200
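
To get a feel for what `--scheduler cosine --lr 0.01 --t_max 200` implies, here is a small sketch of the standard cosine annealing closed form. It assumes the script follows that formula and that the minimum learning rate is 0; `cosine_lr` and `eta_min` are names introduced here for illustration only.

```python
import math


def cosine_lr(epoch, base_lr=0.01, t_max=200, eta_min=0.0):
    """Standard cosine annealing: decay from base_lr to eta_min over t_max epochs.

    eta_min=0.0 is an assumption; the command above does not state a minimum.
    """
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


# Print the learning rate at a few points of the 200-epoch run.
for epoch in (0, 50, 100, 150, 200):
    print(epoch, round(cosine_lr(epoch), 6))
```

Under these assumptions the rate starts at 0.01, passes through half of that at epoch 100, and approaches 0 at epoch 200.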

VGG SSD

Model: vgg16-ssd-mp-0_7726.pth

Average Precision Per-class:
aeroplane: 0.7957406334737802
bicycle: 0.8305351156180996
bird: 0.7570969203281721
boat: 0.7043869846367731
bottle: 0.5151666571756393
bus: 0.8375121237865507
car: 0.8581508869699901
cat: 0.8696185705648963
chair: 0.6165431194526735
cow: 0.8066422244852381
diningtable: 0.7629391213959706
dog: 0.8444541531856452
horse: 0.8691922094815812
motorbike: 0.8496564646906418
person: 0.793785185549561
pottedplant: 0.5233462463152305
sheep: 0.7786762429478917
sofa: 0.8024887701948746
train: 0.8713861172265407
tvmonitor: 0.7650514925384194
Average Precision Across All Classes:0.7726184620009084

The code to re-produce the model:

wget -P models https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth
python train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/test/VOC2007/ --net vgg16-ssd --base_net models/vgg16_reducedfc.pth  --batch_size 24 --num_epochs 200 --scheduler "multi-step" --milestones "120,160"
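
For intuition about the multi-step schedule with milestones "120,160", the sketch below assumes the usual rule: the learning rate is multiplied by a factor `gamma` each time a milestone epoch is reached. The value `gamma=0.1` and the base rate used in the loop are assumptions for illustration; the command above does not state them.

```python
def multi_step_lr(epoch, base_lr, milestones=(120, 160), gamma=0.1):
    """Multiply base_lr by gamma once for every milestone already reached.

    gamma=0.1 is an assumed value, not read from the command line above.
    """
    passed = sum(1 for milestone in milestones if epoch >= milestone)
    return base_lr * gamma ** passed


# Hypothetical base learning rate of 1e-3, just to show the three plateaus.
for epoch in (0, 119, 120, 159, 160, 199):
    print(epoch, multi_step_lr(epoch, base_lr=1e-3))
```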

Training

python train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/test/VOC2007/ --net mb1-ssd --base_net models/mobilenet_v1_with_relu_69_5.pth  --batch_size 24 --num_epochs 200 --scheduler cosine --lr 0.01 --t_max 200

The dataset path is the parent directory of the folders: Annotations, ImageSets, JPEGImages, SegmentationClass and SegmentationObject. You can use multiple datasets to train.
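
Before starting a long training run, it can help to confirm that each dataset path really is the parent of those folders. A minimal sketch using only the standard library; `missing_voc_subdirs` is a hypothetical helper, not part of the repo, and it treats all five folders as expected even though training may not need every one of them.

```python
from pathlib import Path

# Folder names taken from the sentence above.
VOC_SUBDIRS = (
    "Annotations",
    "ImageSets",
    "JPEGImages",
    "SegmentationClass",
    "SegmentationObject",
)


def missing_voc_subdirs(dataset_root):
    """Return the expected VOC sub-folders that are absent under dataset_root."""
    root = Path(dataset_root).expanduser()
    return [name for name in VOC_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Paths as used in the training command above.
    for path in ("~/data/VOC0712/VOC2007", "~/data/VOC0712/VOC2012"):
        missing = missing_voc_subdirs(path)
        print(path, "OK" if not missing else f"missing: {missing}")
```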

Evaluation

python eval_ssd.py --net mb1-ssd  --dataset ~/data/VOC0712/test/VOC2007/ --trained_model models/mobilenet-v1-ssd-mp-0_675.pth --label_file models/voc-model-labels.txt 

Convert models to ONNX and Caffe2 models

python convert_to_caffe2_models.py mb1-ssd models/mobilenet-v1-ssd-mp-0_675.pth models/voc-model-labels.txt 

The converted models are models/mobilenet-v1-ssd.onnx, models/mobilenet-v1-ssd_init_net.pb and models/mobilenet-v1-ssd_predict_net.pb. The models in the format of pbtxt are also saved for reference.

Retrain on Open Images Dataset

Let's say we are building a model to detect guns for security purposes.

Before you start, you can try the demo.

python run_ssd_example.py mb1-ssd models/gun_model_2.21.pth models/open-images-model-labels.txt ~/Downloads/big.JPG

Example of Gun Detection

If you manage to get more annotated data, the accuracy could become much higher.

Download data

python open_images_downloader.py --root ~/data/open_images --class_names "Handgun,Shotgun" --num_workers 20

It will download data into the folder ~/data/open_images.

The content of the data directory looks as follows.

class-descriptions-boxable.csv       test                        validation
sub-test-annotations-bbox.csv        test-annotations-bbox.csv   validation-annotations-bbox.csv
sub-train-annotations-bbox.csv       train
sub-validation-annotations-bbox.csv  train-annotations-bbox.csv

The folders train, test and validation contain the images. Files such as sub-train-annotations-bbox.csv are the annotation files.

Retrain

python train_ssd.py --dataset_type open_images --datasets ~/data/open_images --net mb1-ssd --pretrained_ssd models/mobilenet-v1-ssd-mp-0_675.pth --scheduler cosine --lr 0.01 --t_max 100 --validation_epochs 5 --num_epochs 100 --base_net_lr 0.001  --batch_size 5

You can freeze the base net, or all the layers except the prediction heads.

  --freeze_base_net     Freeze base net layers.
  --freeze_net          Freeze all the layers except the prediction head.

You can also use different learning rates for the base net, the extra layers and the prediction heads.

  --lr LR, --learning-rate LR
  --base_net_lr BASE_NET_LR
                        initial learning rate for base net.
  --extra_layers_lr EXTRA_LAYERS_LR

As subsets of open images data can be very unbalanced, it also provides a handy option to roughly balance the data.

  --balance_data        Balance training data by down-sampling more frequent
                        labels.
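
The idea behind down-sampling can be sketched in a few lines. This is an illustration of the general technique under simplifying assumptions (one label per sample, every label cut down to the size of the rarest one), not the repo's actual implementation; `balance_by_downsampling` is a hypothetical name.

```python
import random
from collections import Counter, defaultdict


def balance_by_downsampling(samples, seed=0):
    """Keep an equal number of samples per label by randomly dropping the surplus.

    samples: list of (image_id, label) pairs. Assumes one label per sample.
    """
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    # Every label is reduced to the size of the rarest label.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Toy data: five Handgun samples, two Shotgun samples.
samples = [(f"img{i}", "Handgun") for i in range(5)]
samples += [(f"img{i}", "Shotgun") for i in range(5, 7)]
balanced = balance_by_downsampling(samples)
print(Counter(label for _, label in balanced))
```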

Test on image

python run_ssd_example.py mb1-ssd models/mobilenet-v1-ssd-Epoch-99-Loss-2.2184619531035423.pth models/open-images-model-labels.txt ~/Downloads/gun.JPG

ONNX Friendly VGG16 SSD

! The model is not really ONNX-Friendly due to the issue mentioned here "#33 (comment)"

The Scaled L2 Norm Layer has been replaced with BatchNorm to make the net ONNX compatible.

Train

The pretrained base net is borrowed from https://s3.amazonaws.com/amdegroot-models/vgg16_reducedfc.pth .

python train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/test/VOC2007/ --net "vgg16-ssd" --base_net models/vgg16_reducedfc.pth  --batch_size 24 --num_epochs 150 --scheduler cosine --lr 0.0012 --t_max 150 --validation_epochs 5

Eval

python eval_ssd.py --net vgg16-ssd  --dataset ~/data/VOC0712/test/VOC2007/ --trained_model models/vgg16-ssd-Epoch-115-Loss-2.819455094383535.pth --label_file models/voc-model-labels.txt

TODO

  1. Resnet34 Based Model.
  2. BatchNorm Fusion.

pytorch-ssd's People

Contributors

aclex, cortwave, hyl-gm, kaustubh-pandey, kinoshita-hidetoshi, nicolas1203, qfgaohao, squeakus, wakandan, zhaoyi-yan

pytorch-ssd's Issues

Can't pickle local object 'TrainAugmentation.__init__.<locals>.

w.start()

File "D:\SoftWare\Anaconda3\envs\deeplearning\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "D:\SoftWare\Anaconda3\envs\deeplearning\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\SoftWare\Anaconda3\envs\deeplearning\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\SoftWare\Anaconda3\envs\deeplearning\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "D:\SoftWare\Anaconda3\envs\deeplearning\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TrainAugmentation.__init__.<locals>.<lambda>'
2018-11-20 16:16:08,607 - root - INFO - working_dir is ./events/ Please using command : tensorboard --logdir=.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "D:\SoftWare\Anaconda3\envs\deeplearning\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\SoftWare\Anaconda3\envs\deeplearning\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Have you run into this pickle-related problem? And thank you for the code, first of all.
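
The traceback shows a process being started through the Windows spawn path, which has to pickle whatever is handed to the child process. A lambda created inside a function or method cannot be pickled, which matches the error message. The sketch below reproduces that behaviour with the standard library only and shows one common alternative, `functools.partial` over an importable function. The helper names are hypothetical, and whether this is the right fix for `TrainAugmentation` is an assumption.

```python
import functools
import operator
import pickle


def make_local_scaler(factor):
    # A lambda created inside another function is a "local object" for pickle.
    return lambda value: value * factor


try:
    pickle.dumps(make_local_scaler(2))
    local_lambda_pickles = True
except (AttributeError, pickle.PicklingError):
    # The exact exception type depends on the Python version.
    local_lambda_pickles = False

# functools.partial over a function that can be found by name pickles fine.
picklable_scaler = functools.partial(operator.mul, 2)
restored = pickle.loads(pickle.dumps(picklable_scaler))

print("local lambda pickles:", local_lambda_pickles)
print("restored partial result:", restored(21))
```

If the failure only appears when data-loading workers are spawned, loading data in the main process (zero workers) is another way to sidestep the pickling step, at the cost of slower loading.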

How to modify the image_size in the ssd/config directory

Hello @qfgaohao, I want to reduce the image_size in the mobilenetv1_ssd_config.py file, for example from 300 to 100. If I adjust this, do I also need to modify specs?

specs = [
     SSDSpec(19, 16, SSDBoxSizes(60, 105), [2, 3]),
     SSDSpec(10, 32, SSDBoxSizes(105, 150), [2, 3]),
     SSDSpec(5, 64, SSDBoxSizes(150, 195), [2, 3]),
     SSDSpec(3, 100, SSDBoxSizes(195, 240), [2, 3]),
     SSDSpec(2, 150, SSDBoxSizes(240, 285), [2, 3]),
     SSDSpec(1, 300, SSDBoxSizes(285, 330), [2, 3])
]

Thank you.
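
A small sketch can make the relationship between these numbers and the 300-pixel input visible. It uses stand-in namedtuples and assumes the fields mean (feature map size, shrinkage, box sizes in pixels, aspect ratios), and that each feature-map cell gets two square boxes plus two boxes per aspect ratio. Both the field names and that rule are assumptions made for illustration.

```python
from collections import namedtuple

# Stand-ins for the config types; the field names are assumptions.
SSDBoxSizes = namedtuple("SSDBoxSizes", ["min", "max"])
SSDSpec = namedtuple("SSDSpec", ["feature_map_size", "shrinkage", "box_sizes", "aspect_ratios"])

image_size = 300
specs = [
    SSDSpec(19, 16, SSDBoxSizes(60, 105), [2, 3]),
    SSDSpec(10, 32, SSDBoxSizes(105, 150), [2, 3]),
    SSDSpec(5, 64, SSDBoxSizes(150, 195), [2, 3]),
    SSDSpec(3, 100, SSDBoxSizes(195, 240), [2, 3]),
    SSDSpec(2, 150, SSDBoxSizes(240, 285), [2, 3]),
    SSDSpec(1, 300, SSDBoxSizes(285, 330), [2, 3]),
]


def boxes_per_location(spec):
    # Assumed rule: two square boxes plus two rectangles per aspect ratio.
    return 2 + 2 * len(spec.aspect_ratios)


total_priors = sum(s.feature_map_size ** 2 * boxes_per_location(s) for s in specs)

for s in specs:
    # image_size / feature_map_size stays close to the listed shrinkage.
    print(s.feature_map_size, s.shrinkage, round(image_size / s.feature_map_size, 1))
print("total priors:", total_priors)
```

Under these assumptions the feature map sizes, shrinkage values and box sizes all appear to be tied to a 300-pixel input, which suggests they would need to be recomputed for a smaller image_size.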

how to apply my backbone?

When I applied my own backbone network to this project, I ran into an error:

RuntimeError: Given groups=1, weight of size [256, 512, 1, 1], expected input[1, 704, 16, 16] to have 512 channels, but got 704 channels instead

It occurs at line 60 of FpnSSD.

So I changed the input channels for source_layer_indexes to 704, but the same error still occurs.

Can you tell me where I should look?

I also wonder how to set the first component of the first tuple in source_layer_indexes, like the 69 used for MobileNet version 1 with FpnSSD.

couple of clarifications

Thank you for releasing this repo. Could you clear up a couple of questions?

1) What is the URL to download the VGG based SSD? (For example, the MobileNet based SSD is at https://storage.googleapis.com/models-hao/mobilenet-v1-ssd-mp-0_675.pth)

2) Could you provide some details of how you trained the model 'gun_model_2.21.pth'? Can I use 'gun_model_2.21.pth' with the VGG based SSD?

3) Have you worked with YOLO based models? I was wondering if you could provide any insight into the practical difference in accuracy between YOLO and SSD models.

Thanks again for releasing this repo. Look forward to your reply.

pytorch ssd to onnx error

Hi,
I followed your documentation to convert my PyTorch SSD model to ONNX,
but I encountered an error:
Traceback (most recent call last):
File "test.py", line 28, in <module>
test_voc()
File "test.py", line 21, in test_voc
torch.onnx.export(net,dummy_input,"./fire_ssd.proto",verbose=True)
File "/home/yt/.local/lib/python3.5/site-packages/torch/onnx/__init__.py", line 27, in export
return utils.export(*args, **kwargs)
File "/home/yt/.local/lib/python3.5/site-packages/torch/onnx/utils.py", line 104, in export
operator_export_type=operator_export_type)
File "/home/yt/.local/lib/python3.5/site-packages/torch/onnx/utils.py", line 281, in _export
example_outputs, propagate)
File "/home/yt/.local/lib/python3.5/site-packages/torch/onnx/utils.py", line 224, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
File "/home/yt/.local/lib/python3.5/site-packages/torch/onnx/utils.py", line 192, in _trace_and_get_graph_from_model
trace, torch_out = torch.jit.get_trace_graph(model, args, _force_outplace=True)
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/yt/.local/lib/python3.5/site-packages/torch/onnx/utils.py", line 39, in set_training
yield
File "/home/yt/.local/lib/python3.5/site-packages/torch/onnx/utils.py", line 192, in _trace_and_get_graph_from_model
trace, torch_out = torch.jit.get_trace_graph(model, args, _force_outplace=True)
File "/home/yt/.local/lib/python3.5/site-packages/torch/jit/__init__.py", line 197, in get_trace_graph
return LegacyTracedModule(f, _force_outplace)(*args, **kwargs)
File "/home/yt/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/yt/.local/lib/python3.5/site-packages/torch/jit/__init__.py", line 252, in forward
out = self.inner(*trace_inputs)
File "/home/yt/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 487, in __call__
result = self._slow_forward(*input, **kwargs)
File "/home/yt/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in _slow_forward
result = self.forward(*input, **kwargs)
File "/home/yt/pytorch2onnx/ssd.py", line 103, in forward
self.priors.type(type(x.data)) # default boxes
RuntimeError: Attempted to trace Detect, but tracing of legacy functions is not supported
Do you have any advice? Thanks!

training w/ multi classes from open images dataset

When training on multiple classes (using images downloaded from Open Images V4), the training loss becomes very large and sometimes I get NaN. I believe this is because the classes I am training on (Human body and Human hand or Human face) share the same image distribution. And since there is more data for Human body, my model ends up recognizing only that class.

Is there a way to download images that contain bounding boxes for all the wanted classes?

Convert models to ONNX

RuntimeError: ONNX export failed: Couldn't export operator aten::max_pool2d_with_indices?

Ground Truth bounding boxes have negative values and values greater than 1

The ground truth bounding boxes given by the train loader have negative values and even values > 1 (meaning the bounding box is wider or taller than the actual image), whereas in the VOC annotations all the values are within the image size.

Is this a normal transformation? If so, how can I obtain the actual bounding box location in the image?
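
One possible explanation, offered as an assumption rather than a confirmed answer: SSD training pipelines commonly match each ground truth box to a prior (anchor) box and hand the network offsets relative to that prior instead of raw coordinates. Such offsets can be negative or larger than 1. The sketch below shows the usual center-form encoding and its inverse. The variance constants 0.1 and 0.2 are typical SSD values assumed here, and `encode`/`decode` are hypothetical names.

```python
import math

CENTER_VARIANCE = 0.1  # assumed, typical SSD value
SIZE_VARIANCE = 0.2    # assumed, typical SSD value


def encode(box, prior):
    """Turn a (cx, cy, w, h) box into regression offsets relative to a prior."""
    cx, cy, w, h = box
    pcx, pcy, pw, ph = prior
    return (
        (cx - pcx) / (pw * CENTER_VARIANCE),
        (cy - pcy) / (ph * CENTER_VARIANCE),
        math.log(w / pw) / SIZE_VARIANCE,
        math.log(h / ph) / SIZE_VARIANCE,
    )


def decode(loc, prior):
    """Invert encode(): recover the (cx, cy, w, h) box from offsets and prior."""
    lx, ly, lw, lh = loc
    pcx, pcy, pw, ph = prior
    return (
        lx * CENTER_VARIANCE * pw + pcx,
        ly * CENTER_VARIANCE * ph + pcy,
        math.exp(lw * SIZE_VARIANCE) * pw,
        math.exp(lh * SIZE_VARIANCE) * ph,
    )


# Boxes in relative coordinates (fractions of the image size).
example_prior = (0.5, 0.5, 0.2, 0.2)
example_box = (0.45, 0.55, 0.4, 0.1)
encoded = encode(example_box, example_prior)
decoded = decode(encoded, example_prior)
print("encoded offsets:", encoded)
print("decoded box:", decoded)
```

If the loader does work this way, the original box is recovered by decoding against the matching prior and then scaling by the image width and height.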

Multi GPU training

When training a model using train_ssd.py, what changes would we need to make to use multiple GPUs for training? Thanks.

RuntimeError: Error(s) in loading state_dict for Sequential

Hello, I have tried to train with the MobileNet networks, but the following error emerged:

2018-12-08 19:07:35,187 - root - INFO - Use Cuda.
2018-12-08 19:07:35,187 - root - INFO - Namespace(balance_data=False, base_net='/content/pytorch-ssd/models/mb2-ssd-lite-mp-0_686.pth', base_net_lr=None, batch_size=4, checkpoint_folder='models/', dataset_type='voc', datasets=['/content/MangaVOC'], debug_steps=100, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.001, mb2_width_mult=1.0, milestones='80,100', momentum=0.9, net='mb2-ssd-lite', num_epochs=1, num_workers=4, pretrained_ssd=None, resume=None, scheduler='cosine', t_max=200.0, use_cuda=True, validation_dataset='/content/MangaVOC', validation_epochs=5, weight_decay=0.0005)

2018-12-08 19:07:35,189 - root - INFO - Prepare training datasets.
2018-12-08 19:07:35,194 - root - INFO - Stored labels into file models/voc-model-labels.txt.
2018-12-08 19:07:35,194 - root - INFO - Train dataset size: 15761
2018-12-08 19:07:35,195 - root - INFO - Prepare Validation datasets.
2018-12-08 19:07:35,196 - root - INFO - validation dataset size: 3153
2018-12-08 19:07:35,196 - root - INFO - Build network.
2018-12-08 19:07:35,333 - root - INFO - Init from base net /content/pytorch-ssd/models/mb2-ssd-lite-mp-0_686.pth
Traceback (most recent call last):
File "train_ssd.py", line 287, in <module>
net.init_from_base_net(args.base_net)
File "/content/pytorch-ssd/vision/ssd/ssd.py", line 112, in init_from_base_net
self.base_net.load_state_dict(torch.load(model, map_location=lambda storage, loc: storage), strict=True)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Sequential:
Missing key(s) in state_dict: "0.0.weight", "0.1.weight", "0.1.bias", "0.1.running_mean",
....

The VGG network works fine for me. I believe it might have something to do with the PyTorch version, but I am using the latest one from the PyTorch website.

Error for loading vgg16_reducedfc for exporting to onnx model

I'm trying to export the onnx model for vgg16_reducedfc using this command
python3 convert_to_caffe2_models.py vgg16-ssd models/vgg16_reducedfc.pth models/voc-model-labels.txt
but I got the following error

Traceback (most recent call last):
File "convert_to_caffe2_models.py", line 51, in <module>
net.load(model_path)
File "/media/pc/sdb1/pytorch-ssd/vision/ssd/ssd.py", line 135, in load
self.load_state_dict(torch.load(model, map_location=lambda storage, loc: storage))
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD:
Missing key(s) in state_dict: "base_net.0.bias", "base_net.0.weight", "base_net.2.bias", "base_net.2.weight", "base_net.5.bias", "base_net.5.weight", "base_net.7.bias", "base_net.7.weight", "base_net.10.bias", "base_net.10.weight", "base_net.12.bias", "base_net.12.weight", "base_net.14.bias", "base_net.14.weight", "base_net.17.bias", "base_net.17.weight", "base_net.19.bias", "base_net.19.weight", "base_net.21.bias", "base_net.21.weight", "base_net.24.bias", "base_net.24.weight", "base_net.26.bias", "base_net.26.weight", "base_net.28.bias", "base_net.28.weight", "base_net.31.bias", "base_net.31.weight", "base_net.33.bias", "base_net.33.weight", "extras.0.0.bias", "extras.0.0.weight", "extras.0.2.bias", "extras.0.2.weight", "extras.1.0.bias", "extras.1.0.weight", "extras.1.2.bias", "extras.1.2.weight", "extras.2.0.bias", "extras.2.0.weight", "extras.2.2.bias", "extras.2.2.weight", "extras.3.0.bias", "extras.3.0.weight", "extras.3.2.bias", "extras.3.2.weight", "classification_headers.0.bias", "classification_headers.0.weight", "classification_headers.1.bias", "classification_headers.1.weight", "classification_headers.2.bias", "classification_headers.2.weight", "classification_headers.3.bias", "classification_headers.3.weight", "classification_headers.4.bias", "classification_headers.4.weight", "classification_headers.5.bias", "classification_headers.5.weight", "regression_headers.0.bias", "regression_headers.0.weight", "regression_headers.1.bias", "regression_headers.1.weight", "regression_headers.2.bias", "regression_headers.2.weight", "regression_headers.3.bias", "regression_headers.3.weight", "regression_headers.4.bias", "regression_headers.4.weight", "regression_headers.5.bias", "regression_headers.5.weight", "source_layer_add_ons.0.bias", "source_layer_add_ons.0.running_var", "source_layer_add_ons.0.running_mean", "source_layer_add_ons.0.weight".
Unexpected key(s) in state_dict: "0.weight", "0.bias", "2.weight", "2.bias", "5.weight", "5.bias", "7.weight", "7.bias", "10.weight", "10.bias", "12.weight", "12.bias", "14.weight", "14.bias", "17.weight", "17.bias", "19.weight", "19.bias", "21.weight", "21.bias", "24.weight", "24.bias", "26.weight", "26.bias", "28.weight", "28.bias", "31.weight", "31.bias", "33.weight", "33.bias".

Is it because the vgg16-ssd network definition is different from the reduced one?
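
The two key lists in the error hint at what is going on: the checkpoint holds unprefixed keys such as "0.weight", while the full SSD expects "base_net.0.weight" plus extras and header layers. That pattern would fit a file containing only base-net weights (the README passes vgg16_reducedfc.pth via --base_net for training), though this reading is an assumption. The sketch below compares key sets in plain Python using a shortened, illustrative key list; `diff_state_dict_keys` is a hypothetical helper.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the error message layout."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Shortened example keys in the style of the error above.
model_keys = ["base_net.0.weight", "base_net.0.bias", "extras.0.0.weight"]
checkpoint_keys = ["0.weight", "0.bias"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)

# Prefixing the checkpoint keys shows they line up with the base net only;
# the extras (and headers) are still absent from such a file.
prefixed = ["base_net." + key for key in checkpoint_keys]
missing_after, unexpected_after = diff_state_dict_keys(model_keys, prefixed)
print("missing after prefixing:", missing_after)
print("unexpected after prefixing:", unexpected_after)
```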

[save the entire model failed]

Loading and inference work fine.
I am trying to save the entire model (both architecture and weights), but it fails.
In detail: torch.save(net,'mb_v1_ssd_all.pth')
When I run this code, it fails with the error: TypeError: can't pickle module objects
(Saving only the weights works: torch.save(net.state_dict(),'mb_v1_ssd.pth'))

For my purposes, I must save the entire model.
Has anyone encountered the same problem?
Could someone give me some ideas?
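
The error text is what Python's pickle raises when an object being saved holds a reference to an imported module. The sketch below reproduces that with plain dictionaries and the standard library. Whether some attribute of the network actually stores a module is an assumption, and the dictionary contents are made up for illustration.

```python
import math
import pickle

# Plain data, similar in spirit to a state dict: this pickles without trouble.
weights = {"layer.weight": [0.1, 0.2], "layer.bias": [0.0]}

# The same data next to a reference to an imported module.
with_module = {"weights": weights, "helper_module": math}

try:
    pickle.dumps(with_module)
    module_pickles = True
except TypeError as err:
    module_pickles = False
    print("pickling failed:", err)

# Saving only the plain data round-trips cleanly.
restored = pickle.loads(pickle.dumps(weights))
print("restored equals original:", restored == weights)
```

This is consistent with the observation above that saving only the weights succeeds while saving the whole object does not.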

Abort trap 6

When running:
run_ssd_live_demo.py mb2-ssd-lite models/mb2-ssd-lite-mp-0_686.pth models/voc-model-labels.txt
I get the error message 'Abort trap 6'.

I'm running the file on macOS. Are there any known errors there? Thanks in advance!

Recreating the results of available models

Hello, I tried to retrain the model for the class names "Handgun,Shotgun" following your instructions:

python train_ssd.py --dataset_type open_images --datasets ~/data/open_images --net mb1-ssd --pretrained_ssd models/mobilenet-v1-ssd-mp-0_675.pth --scheduler cosine --lr 0.01 --t_max 100 --validation_epochs 5 --num_epochs 100 --base_net_lr 0.001  --batch_size 5

After retraining, I'm not able to get the same object detection results as with your model "gun_model_2.21.pth".

Could you give the details of how you trained your model "gun_model_2.21.pth"? Were any of the above parameters different?

thanks,

Subsetting person class

Hi! First of all thank you for this working implementation of SSD in pytorch.
I am trying to build a person detector with this implementation for my bachelor thesis. I removed the other classes in voc_dataset.py, but I run into trouble because your code is loading all the annotations classes by default. What do I need to change to make it train only on person and background for pascal VOC?
When I change the _get_annotation function to only include class_name == 'person' I get the following error:
IndexError: Traceback (most recent call last):
File "/home/steff/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/steff/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/steff/.local/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 81, in __getitem__
return self.datasets[dataset_idx][sample_idx]
File "/home/steff/bachelor-project/pytorch-ssd-master/vision/datasets/voc_dataset.py", line 37, in __getitem__
image, boxes, labels = self.transform(image, boxes, labels)
File "/home/steff/bachelor-project/pytorch-ssd-master/vision/ssd/data_preprocessing.py", line 34, in __call__
return self.augment(img, boxes, labels)
File "/home/steff/bachelor-project/pytorch-ssd-master/vision/transforms/transforms.py", line 55, in __call__
img, boxes, labels = t(img, boxes, labels)
File "/home/steff/bachelor-project/pytorch-ssd-master/vision/transforms/transforms.py", line 275, in __call__
overlap = jaccard_numpy(boxes, rect)
File "/home/steff/bachelor-project/pytorch-ssd-master/vision/transforms/transforms.py", line 30, in jaccard_numpy
inter = intersect(box_a, box_b)
File "/home/steff/bachelor-project/pytorch-ssd-master/vision/transforms/transforms.py", line 13, in intersect
max_xy = np.minimum(box_a[:, 2:], box_b[2:])
IndexError: too many indices for array

How can I subset the class person properly?
Thanks in advance!

Training on two GPUs

Hello @qfgaohao ,

I am trying to set
DEVICE = torch.device('cuda:0' if torch.cuda.is_available() and args.use_cuda else 'cpu')
DEVICE = torch.device('cuda:1' if torch.cuda.is_available() and args.use_cuda else 'cpu')
And run two experiments simultaneously.
The first one works fine and occupies a reasonable amount of GPU memory. But the second does not work, no matter how small the batch size is, even though GPU 1 has enough free memory for another run.
Do you have any idea on this kind of issue?

Runtime error while running demo

Hi,
when I'm trying to run Open Images Dataset demo (with handguns), I get the following error:
Traceback (most recent call last): File "run_ssd_example.py", line 22, in <module> net.load(model_path) File "/home/bustardeuhedral/pytorch-ssd/vision/ssd/ssd.py", line 119, in load self.load_state_dict(torch.load(model, map_location=lambda storage, loc: storage)) File "/home/bustardeuhedral/anaconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719 , in load_state_dict self.__class__.__name__, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for SSD: Missing key(s) in state_dict: "base_net.0.weight", "base_net.0.bias", "base_net.2.weight", "base_net.2.bias ", "base_net.5.weight", "base_net.5.bias", "base_net.7.weight", "base_net.7.bias", "base_net.10.weight", "base_net. 10.bias", "base_net.12.weight", "base_net.12.bias", "base_net.14.weight", "base_net.14.bias", "base_net.17.weight", "base_net.17.bias", "base_net.19.weight", "base_net.19.bias", "base_net.21.weight", "base_net.21.bias", "base_net. 24.weight", "base_net.24.bias", "base_net.26.weight", "base_net.26.bias", "base_net.28.weight", "base_net.28.bias", "base_net.31.weight", "base_net.31.bias", "base_net.33.weight", "base_net.33.bias", "source_layer_add_ons.0.weight ", "source_layer_add_ons.0.bias", "source_layer_add_ons.0.running_mean", "source_layer_add_ons.0.running_var". Unexpected key(s) in state_dict: "base_net.0.0.weight", "base_net.0.1.weight", "base_net.0.1.bias", "base_n et.0.1.running_mean", "base_net.0.1.running_var", "base_net.1.0.weight", "base_net.1.1.weight", "base_net.1.1.bias" , "base_net.1.1.running_mean", "base_net.1.1.running_var", "base_net.1.3.weight", "base_net.1.4.weight", "base_net. 
1.4.bias", "base_net.1.4.running_mean", "base_net.1.4.running_var", "base_net.2.0.weight", "base_net.2.1.weight", " base_net.2.1.bias", "base_net.2.1.running_mean", "base_net.2.1.running_var", "base_net.2.3.weight", "base_net.2.4.w eight", "base_net.2.4.bias", "base_net.2.4.running_mean", "base_net.2.4.running_var", "base_net.3.0.weight", "base_ net.3.1.weight", "base_net.3.1.bias", "base_net.3.1.running_mean", "base_net.3.1.running_var", "base_net.3.3.weight ", "base_net.3.4.weight", "base_net.3.4.bias", "base_net.3.4.running_mean", "base_net.3.4.running_var", "base_net.4 .0.weight", "base_net.4.1.weight", "base_net.4.1.bias", "base_net.4.1.running_mean", "base_net.4.1.running_var", "b ase_net.4.3.weight", "base_net.4.4.weight", "base_net.4.4.bias", "base_net.4.4.running_mean", "base_net.4.4.running _var", "base_net.5.0.weight", "base_net.5.1.weight", "base_net.5.1.bias", "base_net.5.1.running_mean", "base_net.5. 1.running_var", "base_net.5.3.weight", "base_net.5.4.weight", "base_net.5.4.bias", "base_net.5.4.running_mean", "ba se_net.5.4.running_var", "base_net.6.0.weight", "base_net.6.1.weight", "base_net.6.1.bias", "base_net.6.1.running_m ean", "base_net.6.1.running_var", "base_net.6.3.weight", "base_net.6.4.weight", "base_net.6.4.bias", "base_net.6.4. 
running_mean", "base_net.6.4.running_var", "base_net.7.0.weight", "base_net.7.1.weight", "base_net.7.1.bias", "base _net.7.1.running_mean", "base_net.7.1.running_var", "base_net.7.3.weight", "base_net.7.4.weight", "base_net.7.4.bia s", "base_net.7.4.running_mean", "base_net.7.4.running_var", "base_net.8.0.weight", "base_net.8.1.weight", "base_ne t.8.1.bias", "base_net.8.1.running_mean", "base_net.8.1.running_var", "base_net.8.3.weight", "base_net.8.4.weight", "base_net.8.4.bias", "base_net.8.4.running_mean", "base_net.8.4.running_var", "base_net.9.0.weight", "base_net.9.1 .weight", "base_net.9.1.bias", "base_net.9.1.running_mean", "base_net.9.1.running_var", "base_net.9.3.weight", "bas e_net.9.4.weight", "base_net.9.4.bias", "base_net.9.4.running_mean", "base_net.9.4.running_var", "base_net.10.0.wei ght", "base_net.10.1.weight", "base_net.10.1.bias", "base_net.10.1.running_mean", "base_net.10.1.running_var", "bas e_net.10.3.weight", "base_net.10.4.weight", "base_net.10.4.bias", "base_net.10.4.running_mean", "base_net.10.4.runn ing_var", "base_net.11.0.weight", "base_net.11.1.weight", "base_net.11.1.bias", "base_net.11.1.running_mean", "base _net.11.1.running_var", "base_net.11.3.weight", "base_net.11.4.weight", "base_net.11.4.bias", "base_net.11.4.running_mean", "base_net.11.4.running_var", "base_net.12.0.weight", "base_net.12.1.weight", "base_net.12.1.bias", "base_net.12.1.running_mean", "base_net.12.1.running_var", "base_net.12.3.weight", "base_net.12.4.weight", "base_net.12.4.bias", "base_net.12.4.running_mean", "base_net.12.4.running_var", "base_net.13.0.weight", "base_net.13.1.weight", "base_net.13.1.bias", "base_net.13.1.running_mean", "base_net.13.1.running_var", "base_net.13.3.weight", "base_net.13.4.weight", "base_net.13.4.bias", "base_net.13.4.running_mean", "base_net.13.4.running_var". 
size mismatch for classification_headers.0.weight: copying a param of torch.Size([12, 512, 3, 3]) from checkpoint, where the shape is torch.Size([18, 512, 3, 3]) in current model.
size mismatch for classification_headers.0.bias: copying a param of torch.Size([12]) from checkpoint, where the shape is torch.Size([18]) in current model.
size mismatch for classification_headers.4.weight: copying a param of torch.Size([12, 256, 3, 3]) from checkpoint, where the shape is torch.Size([18, 256, 3, 3]) in current model.
size mismatch for classification_headers.4.bias: copying a param of torch.Size([12]) from checkpoint, where the shape is torch.Size([18]) in current model.
size mismatch for classification_headers.5.weight: copying a param of torch.Size([12, 256, 3, 3]) from checkpoint, where the shape is torch.Size([18, 256, 3, 3]) in current model.
size mismatch for classification_headers.5.bias: copying a param of torch.Size([12]) from checkpoint, where the shape is torch.Size([18]) in current model.
size mismatch for regression_headers.0.weight: copying a param of torch.Size([16, 512, 3, 3]) from checkpoint, where the shape is torch.Size([24, 512, 3, 3]) in current model.
size mismatch for regression_headers.0.bias: copying a param of torch.Size([16]) from checkpoint, where the shape is torch.Size([24]) in current model.
size mismatch for regression_headers.4.weight: copying a param of torch.Size([16, 256, 3, 3]) from checkpoint, where the shape is torch.Size([24, 256, 3, 3]) in current model.
size mismatch for regression_headers.4.bias: copying a param of torch.Size([16]) from checkpoint, where the shape is torch.Size([24]) in current model.
size mismatch for regression_headers.5.weight: copying a param of torch.Size([16, 256, 3, 3]) from checkpoint, where the shape is torch.Size([24, 256, 3, 3]) in current model.
size mismatch for regression_headers.5.bias: copying a param of torch.Size([16]) from checkpoint, where the shape is torch.Size([24]) in current model.
The same error occurs after training on the Open Images Dataset (both from scratch and starting from the pretrained mobilenet-v1-ssd-mp-0_675.pth).
I am using Python 3.6.7 with PyTorch 0.4.1.
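
The channel counts in the size-mismatch messages can be decoded with a little arithmetic. The sketch below assumes the usual SSD head layout, where a classification header outputs `boxes_per_location * num_classes` channels and a regression header outputs `boxes_per_location * 4`; the helper name `decode_header` is hypothetical. Taking the messages at face value, both sides agree on 3 classes but differ in boxes per location, which may mean the checkpoint was produced with a different `--net` than the model it is being loaded into.

```python
def decode_header(cls_out, reg_out, coords_per_box=4):
    """Recover (boxes_per_location, num_classes) from header output channels.

    Assumes cls_out = boxes * num_classes and reg_out = boxes * coords_per_box.
    """
    boxes, remainder = divmod(reg_out, coords_per_box)
    if remainder or boxes == 0 or cls_out % boxes:
        raise ValueError("channel counts do not fit the assumed head layout")
    return boxes, cls_out // boxes


# Shapes copied from the error messages above (headers 0, 4 and 5).
checkpoint_side = decode_header(cls_out=12, reg_out=16)
model_side = decode_header(cls_out=18, reg_out=24)

print("checkpoint (boxes, classes):", checkpoint_side)
print("current model (boxes, classes):", model_side)
```

If the two tuples differ only in the first element, as here, the label count is consistent and the architecture choice is the more likely place to look.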

training base network (ssd mobilenet v1)

How can I obtain the MobileNet base network's weights myself?

Are they simply the result of training on a classification task with the PASCAL VOC 0712 data set?

I trained SSD MobileNet v1 end-to-end without your base network weight file, and the result was mAP 0.57.
(options: --batch_size 32 --num_epochs 400 --scheduler cosine --lr 0.001 --t_max 200)
With your pretrained base network weights (mobilenet-v1-ssd-mp-0_675.pth), the mAP was 0.67, the same value you report on the main page.
(options: --batch_size 32 --num_epochs 200 --scheduler cosine --lr 0.01 --t_max 200)

Also, can I download a trained weight file for the VGG16-based SSD network?

Best regards,
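
One detail in the two option sets above is that the first run uses `--num_epochs 400` with `--t_max 200`. A minimal sketch of the closed-form cosine annealing schedule follows, assuming the scheduler tracks the textbook formula with a minimum learning rate of 0; the function name `cosine_lr` is hypothetical and the real scheduler's step-by-step values may differ slightly. Under that assumption the learning rate reaches zero at epoch 200 and climbs back to its starting value by epoch 400, which may or may not contribute to the mAP gap.

```python
import math


def cosine_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing learning rate at a given epoch."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


# Settings from the first run: --lr 0.001 --t_max 200 --num_epochs 400
for epoch in (0, 100, 200, 300, 400):
    print(epoch, round(cosine_lr(epoch, base_lr=0.001, t_max=200), 6))
```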

PRETRAINED MODEL LINK NOT VALID

Hi,
I am not able to open the link, either directly or with wget.
When using wget, this is the error I get:

"2018-08-25 19:34:19 ERROR 403: Forbidden."

Labels dimension mismatch

Hi,
When I run the training script, the labels tensor produced by the DataLoader has dimensions (BatchSize, 1), whereas the labels in the MultiBox loss function need dimensions (BatchSize, num_priors) (num_priors is 3000 in this run).
Actual error:

box_utils.py", line 211, in hard_negative_mining
loss[pos_mask] = -math.inf
RuntimeError: The shape of the mask [24, 1] at index 1 does not match the shape of the indexed tensor [24, 3000] at index 1

Run ssd_sample.py with mb2-ssd-lite ERROR

There is a small bug in the code. When I run the file to test an image with mobilenetv2-ssdlite, I get a TensorType error. I worked around it by changing the default parameters of "create_mobilenetv2_ssd_lite_predictor" in mobilenet_v2_ssd_lite.py, setting device to None:

def create_mobilenetv2_ssd_lite_predictor(net, candidate_size=200, nms_method=None, sigma=0.5, device=None):

Is there a better suggestion? I learned a lot from your code. Thank you so much!

Infinite validation regression loss

Hi,

I am trying to reproduce your results, but the validation regression loss is infinite. I have tried different learning rate schedules without any luck. Could you provide some insight into this issue? Thanks.

Here is the output:

2018-12-01 12:38:16,778 - root - INFO - Epoch: 0, Step: 100, Average Loss: 12.1986, Average Regression Loss 2.7535, Average Classification Loss: 9.4451
2018-12-01 12:38:34,135 - root - INFO - Epoch: 0, Step: 200, Average Loss: 7.7354, Average Regression Loss 2.4653, Average Classification Loss: 5.2701
2018-12-01 12:38:51,741 - root - INFO - Epoch: 0, Step: 300, Average Loss: 7.1205, Average Regression Loss 2.2209, Average Classification Loss: 4.8996
2018-12-01 12:39:10,253 - root - INFO - Epoch: 0, Step: 400, Average Loss: 6.8956, Average Regression Loss 2.1017, Average Classification Loss: 4.7939
2018-12-01 12:39:27,837 - root - INFO - Epoch: 0, Step: 500, Average Loss: 6.6482, Average Regression Loss 1.9754, Average Classification Loss: 4.6728
2018-12-01 12:39:45,364 - root - INFO - Epoch: 0, Step: 600, Average Loss: 6.5128, Average Regression Loss 1.8923, Average Classification Loss: 4.6204
2018-12-01 12:40:18,564 - root - INFO - Epoch: 0, Validation Loss: inf, Validation Regression Loss inf, Validation Classification Loss: 10.0192
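
A single infinite term is enough to make an averaged loss infinite. One possible source, offered here only as an assumption and not as a confirmed diagnosis, is a ground-truth box with zero or negative width or height: SSD-style encodings take the logarithm of the box size relative to the prior size. The sketch below scans corner-form boxes for such cases; the helper name `find_degenerate_boxes` and the sample data are hypothetical.

```python
import math


def find_degenerate_boxes(boxes, min_size=1e-6):
    """Return indices of (x1, y1, x2, y2) boxes whose width or height is
    non-finite or smaller than min_size."""
    bad = []
    for index, (x1, y1, x2, y2) in enumerate(boxes):
        width, height = x2 - x1, y2 - y1
        finite = math.isfinite(width) and math.isfinite(height)
        if not finite or width < min_size or height < min_size:
            bad.append(index)
    return bad


# Hypothetical validation boxes: the second has zero width, the third is inverted.
sample = [(10, 20, 50, 80), (30, 30, 30, 60), (5, 5, 4, 9)]
print(find_degenerate_boxes(sample))
```

Running a check like this over the validation annotations is a cheap way to rule this cause in or out before changing the learning rate.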

RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered

I tried to train vgg16-ssd model without base_net.
python3 train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/VOC2007/ --net vgg16-ssd --batch_size 12 --num_epochs 200 --scheduler cosine --lr 0.01 --t_max 200
The environment is
Python 3.6(virtualenv)
Pytorch 0.4.1
CUDA 9.0
cuDNN v7.1

The whole error message is below:

2018-10-25 15:42:22,203 - root - INFO - Namespace(balance_data=False, base_net=None, base_net_lr=None, batch_size=12, checkpoint_folder='models/', dataset_type='voc', datasets=['/home/han/data/VOC0712/VOC2007/', '/home/han/data/VOC0712/VOC2012/'], debug_steps=100, extra_layers_lr=None, freeze_base_net=False, freeze_net=False, gamma=0.1, lr=0.01, milestones='80,100', momentum=0.9, net='vgg16-ssd', num_epochs=200, num_workers=4, pretrained_ssd=None, resume=None, scheduler='cosine', t_max=200.0, use_cuda=True, validation_dataset='/home/han/data/VOC0712/VOC2007/', validation_epochs=5, weight_decay=0.0005)
2018-10-25 15:42:22,203 - root - INFO - Prepare training datasets.
2018-10-25 15:42:22,206 - root - INFO - Stored labels into file models/voc-model-labels.txt.
2018-10-25 15:42:22,206 - root - INFO - Train dataset size: 16551
2018-10-25 15:42:22,206 - root - INFO - Prepare Validation datasets.
2018-10-25 15:42:22,207 - root - INFO - validation dataset size: 4952
2018-10-25 15:42:22,207 - root - INFO - Build network.
2018-10-25 15:42:22,354 - root - INFO - Took 0.00 seconds to load the model.
2018-10-25 15:42:23,800 - root - INFO - Learning rate: 0.01, Base net learning rate: 0.01, Extra Layers learning rate: 0.01.
2018-10-25 15:42:23,800 - root - INFO - Uses CosineAnnealingLR scheduler.
2018-10-25 15:42:23,800 - root - INFO - Start training from epoch 0.
/home/han/virtualenv/py36/lib/python3.6/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train_ssd.py", line 309, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 114, in train
    regression_loss, classification_loss = criterion(confidence, locations, labels, boxes) # TODO CHANGE BOXES
  File "/home/han/virtualenv/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/han/study/vb_ssd_2/vision/nn/multibox_loss.py", line 38, in forward
    mask = box_utils.hard_negative_mining(loss, labels, self.neg_pos_ratio)
  File "/home/han/study/vb_ssd_2/vision/utils/box_utils.py", line 201, in hard_negative_mining
    _, indexes = loss.sort(dim=1, descending=True)
RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered

Class Imbalance in Open Images Dataset

The Open Images Dataset has severe class imbalance: the label Person has 800,000 images, while some other labels have only 500. If I select 10 labels with different numbers of images each, what would be the best way to address this imbalance?

thanks,
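
One common approach is to down-sample every class to the size of the smallest one. The sketch below shows that idea on plain Python lists; it is not the repository's implementation (a `balance_data` option does appear in a training log elsewhere on this page, but what it does is not confirmed here). The function name `balance_by_downsampling` and the label "Guitar" are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_downsampling(samples, seed=0):
    """samples: list of (image_id, label). Returns a list with equal counts per label."""
    by_label = defaultdict(list)
    for image_id, label in samples:
        by_label[label].append((image_id, label))
    # Every label is cut down to the size of the rarest label.
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced


# Hypothetical toy data: 800 "Person" images versus 5 "Guitar" images.
samples = [(f"p{i}", "Person") for i in range(800)]
samples += [(f"g{i}", "Guitar") for i in range(5)]
balanced = balance_by_downsampling(samples)
print(Counter(label for _, label in balanced))
```

Down-sampling discards data from the large classes, so over-sampling the rare classes or weighting the loss are alternatives worth comparing.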

num_priors mismatch when calculating loss

I've tried a simple train process on my Ubuntu 16.04 PC, but when calculating the loss, it seems variables (confidence, locations, labels, boxes) have different 'num_priors' and cannot calculate the loss. I've set 'target_transform' parameter on and used the VOC07 dataset.

The pytorch version is 0.4.1 and the python version is 3.5.2. (I have modified the 'pathlib' format to 'os.path' format since pathlib is supported only after 3.6)

The Pycharm debugging process's screenshot is below (basenet is vgg16 mentioned in the project):
pytorch-ssd-question
Thanks!

fpn-ssd-mbv1

Hi:
I find that the fpn-ssd-mbv1 implementation is not the same as the TensorFlow model. Does your implementation improve the AP?
TensorFlow model: C3, C4, C5 from the backbone; P5, P6 from the extra layers.

evaluation of ssd_mobilenetv1_coco

I'm trying to import ssd_mobilenetv1_coco_2018 by converting it from TensorFlow (.pb) to PyTorch (.pth). After the conversion, I wanted to evaluate it with webcam input, but I noticed a mismatch between some layer settings in the SSD class and the pretrained model, specifically in the last extra conv layers, the classification_headers, and the regression_headers.

I had to edit your code in create_mobilenetv1_ssd like this:


    extras = ModuleList([
        Sequential(
            Conv2d(in_channels=1024, out_channels=256, kernel_size=1),
            ReLU(),
            Conv2d(in_channels=256, out_channels=512, kernel_size=3, stride=2, padding=1),
            ReLU()
        ),
        Sequential(
            Conv2d(in_channels=512, out_channels=128, kernel_size=1),
            ReLU(),
            Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1),
            ReLU()
        ),
        Sequential(
            Conv2d(in_channels=256, out_channels=128, kernel_size=1),
            ReLU(),
            Conv2d(in_channels=128, out_channels=256, kernel_size=3, stride=2, padding=1),
            ReLU()
        ),
        Sequential(
            Conv2d(in_channels=256, out_channels=64, kernel_size=1),
            ReLU(),
            Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=2, padding=1),
            ReLU()
        )
    ])

    regression_headers = ModuleList([
        Conv2d(in_channels=512, out_channels=3 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=1024, out_channels=6 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=512, out_channels=6 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=256, out_channels=6 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=256, out_channels=6 * 4, kernel_size=1, padding=1),
        Conv2d(in_channels=128, out_channels=6 * 4, kernel_size=1, padding=1), 
    ])

    classification_headers = ModuleList([
        Conv2d(in_channels=512, out_channels=3 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=1024, out_channels=6 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=512, out_channels=6 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=256, out_channels=6 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=256, out_channels=6 * num_classes, kernel_size=1, padding=1),
        Conv2d(in_channels=128, out_channels=6 * num_classes, kernel_size=1, padding=1), 
    ])

This caused the execution of run_converted_pytorch_ssd_live_demo.py to crash with this error:
RuntimeError: The size of tensor a (2781) must match the size of tensor b (3000) at non-singleton dimension 1

Is it possible that the SSD MobileNet architecture has changed over time and the code needs some adjustments to stay correct? Or am I just missing something?

Thanks
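
The two numbers in the error can be reproduced by counting priors per feature map. The sketch below assumes feature maps of 19, 10, 5, 3, 2 and 1 cells per side for a 300x300 input (an assumption, not stated in the issue) and uses the standard convolution output-size formula; the helper names are hypothetical. With 3x3 heads, padding 1 and 6 boxes everywhere the count is 3000. With the edited heads (`kernel_size=1, padding=1`) every map grows by 2 cells per side, giving 2781. Dropping the padding on the 1x1 heads would give 1917.

```python
def conv_out(size, kernel, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def num_priors(feature_maps, boxes_per_location, kernel, padding):
    """Total boxes predicted by heads applied to square feature maps."""
    return sum(
        conv_out(size, kernel, padding) ** 2 * boxes
        for size, boxes in zip(feature_maps, boxes_per_location)
    )


FEATURE_MAPS = [19, 10, 5, 3, 2, 1]  # assumed sizes for a 300x300 input

original = num_priors(FEATURE_MAPS, [6] * 6, kernel=3, padding=1)
edited = num_priors(FEATURE_MAPS, [3, 6, 6, 6, 6, 6], kernel=1, padding=1)
no_padding = num_priors(FEATURE_MAPS, [3, 6, 6, 6, 6, 6], kernel=1, padding=0)

print(original, edited, no_padding)
```

Under these assumptions, the prior-generation settings would also need to produce the same count as the heads (3 boxes on the first map, 6 on the rest) for the shapes to line up.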

"AttributeError: 'collections.OrderedDict' object has no attribute 'state_dict"

hi
I am using MMdnn to convert your network to another platform.

I can successfully convert the resnet18 and resnet34 models downloaded with "mmdownload",
but an error occurs when converting my own model:
File "/usr/local/lib/python3.5/dist-packages/torch/jit/__init__.py", line 259, in _unique_state_dict
    state_dict = module.state_dict(keep_vars=keep_vars)
AttributeError: 'collections.OrderedDict' object has no attribute 'state_dict'

So the environment is probably fine, and there must be something wrong with my .pth file. In my Python code:
net.load(model_path)
torch.save(net.state_dict(), 'mb_v1_ssd.pth')

Here, model_path is the path to a .pth file pretrained by others.

Both that .pth file and the saved mb_v1_ssd.pth produce the same error:
"AttributeError: 'collections.OrderedDict' object has no attribute 'state_dict'"

This is because MMdnn requires the full model to be saved.
So I then did this:
torch.save(net, 'mb_v1_ssd_all.pth')
This operation raises an error in your code:
File "/home/hong/anaconda3/envs/PyTorch1.0_GPU_ONNX/lib/python3.7/site-packages/torch/serialization.py", line 291, in _save
pickler.dump(obj)
TypeError: can't pickle module objects

Could someone help?
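
The second error ("can't pickle module objects") is what Python's `pickle` raises when the object being saved holds a reference to an imported module. Whether the network object here stores a config module as an attribute is an assumption; the sketch below only demonstrates the mechanism with standard-library objects and one possible workaround, copying the needed constants into a plain namespace. The field names `image_size` and `num_classes` are hypothetical.

```python
import math
import pickle
from types import SimpleNamespace


def try_pickle(obj):
    """Return True if obj can be pickled, False if pickle raises TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError as err:
        print("pickle failed:", err)
        return False


# Stand-in for an object's attributes where `config` is an imported module.
state_with_module = {"weights": [0.1, 0.2], "config": math}

# Same state, but the module is replaced by a plain namespace of constants.
plain_config = SimpleNamespace(image_size=300, num_classes=21)
state_plain = {"weights": [0.1, 0.2], "config": plain_config}

print(try_pickle(state_with_module))
print(try_pickle(state_plain))
```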

Exploding gradient problem

I have tried training on the Open Images dataset, but I keep getting exploding gradients on the bounding-box loss after 1 epoch.
photo_2019-04-22_01-54-36

Any ideas on how to fix them?

Error converting ONNX to caffe2

Hi,

The conversion of the model to Caffe2 via ONNX does not work with PyTorch 1.0.1. The initial script converts the model to ONNX, but the conversion to Caffe2 fails, because
init_net, predict_net = c2.onnx_graph_to_caffe2_net(model)
results in a "Segmentation fault (core dumped)" error.

I also tried to adapt a script from https://pytorch.org/tutorials/advanced/super_resolution_with_caffe2.html :

import onnx
import caffe2.python.onnx.backend as onnx_caffe2_backend
from caffe2.python.predictor import mobile_exporter

model = onnx.load('models/mb1-ssd.onnx')
prepared_backend = onnx_caffe2_backend.prepare(model)
c2_workspace = prepared_backend.workspace
c2_model = prepared_backend.predict_net
init_net, predict_net = mobile_exporter.Export(c2_workspace, c2_model, c2_model.external_input)

but it also fails:

...anaconda3/lib/python3.6/site-packages/caffe2/python/predictor/mobile_exporter.py", line 41, in add_tensor
    kTypeNameMapper[blob.dtype],
KeyError: dtype('float64')

Error syntax when I try to train model

Hello,

I am trying to use the repo to run the training part. When I run the command, I get this error:

python train_ssd.py --datasets ~/data/VOC0712/VOC2007/ ~/data/VOC0712/VOC2012/ --validation_dataset ~/data/VOC0712/test/VOC2007/ --net mobilenet-v1-ssd --base_net models/mobilenet_v1_with_relu_69_5.pth  --batch_size 24 --num_epochs 200 --scheduler cosine --lr 0.01 --t_max 200
  File "train_ssd.py", line 108
    f"Epoch: {epoch}, Step: {i}, " +
                                 ^
SyntaxError: invalid syntax
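
A `SyntaxError` pointing at an f-string usually means the interpreter is older than Python 3.6, which is the version that introduced f-strings (the README lists Python 3.6+ as a dependency). If upgrading is not an option, each f-string can be rewritten with `str.format`. The sketch below shows the equivalent form; the variable values are hypothetical and the full original log line is not reproduced here.

```python
epoch, i, avg_loss = 0, 100, 12.1986

# Python 3.6+ only:
#   f"Epoch: {epoch}, Step: {i}, Average Loss: {avg_loss:.4f}"
# Equivalent form that also parses on Python 3.5:
message = "Epoch: {}, Step: {}, Average Loss: {:.4f}".format(epoch, i, avg_loss)
print(message)
```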

I also tried to use the project on Windows 10 and got this error:

`
$ python train_ssd.py --datasets ./vision/datasets/  --validation_dataset ./vision/datasets/ --net mobilenet-v1-ssd --base_net ./models/mobilenet_v1_with_relu_69_5.pth  --batch_size 24 --num_epochs 200 --scheduler cosine --lr 0.01 --t_max 200
2018-08-03 19:37:51,983 - root - INFO - Build network.
2018-08-03 19:37:51,983 - root - INFO - Namespace(base_net='./models/mobilenet_v1_with_relu_69_5.pth', batch_size=24, checkpoint_folder='models/', datasets=['./vision/datasets/'], debug_steps=100, gamma=0.1, lr=0.01, milestones='80,100', momentum=0.9, net='mobilenet-v1-ssd', num_epochs=200, num_workers=4, resume=None, scheduler='cosine', t_max=200.0, use_cuda=True, validation_dataset='./vision/datasets/', validation_epochs=5, weight_decay=0.0005)
2018-08-03 19:37:52,078 - root - INFO - Init from base net ./models/mobilenet_v1_with_relu_69_5.pth
2018-08-03 19:37:52,129 - root - INFO - Took 0.05 seconds to load the model.
2018-08-03 19:37:52,129 - root - INFO - Uses CosineAnnealingLR scheduler.
dataset_path ./vision/datasets/
dataset_path
[<vision.datasets.voc_dataset.VOCDataset object at 0x00000158B5502208>]
2018-08-03 19:37:52,132 - root - INFO - Train dataset size: 5011
2018-08-03 19:37:52,135 - root - INFO - validation dataset size: 4952
2018-08-03 19:37:52,135 - root - INFO - Start training from epoch 0.
2018-08-03 19:37:52,135 - root - INFO - Start training from epoch 201.
loader
<torch.utils.data.dataloader.DataLoader object at 0x00000158B5502278>
Traceback (most recent call last):
  File "train_ssd.py", line 231, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 90, in train
    for i, data in enumerate(loader):
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
    return _DataLoaderIter(self)
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
    w.start()
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'TrainAugmentation.__init__.<locals>.<lambda>'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Xavier\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

`
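
On Windows, DataLoader worker processes are started with the spawn method, so the dataset and its transforms have to be pickled, and a lambda defined inside `__init__` cannot be. The sketch below shows the failure and a picklable replacement using a small callable class. The exact lambda inside `TrainAugmentation.__init__` is not shown on this page, so the one here and the class name `DivideBy` are hypothetical.

```python
import pickle


class DivideBy:
    """Picklable replacement for a lambda that divides the image by a constant."""

    def __init__(self, std):
        self.std = std

    def __call__(self, img, boxes=None, labels=None):
        return img / self.std, boxes, labels


def make_local_lambda(std):
    # Defined inside a function, like a lambda created in __init__.
    return lambda img, boxes=None, labels=None: (img / std, boxes, labels)


def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, TypeError, pickle.PicklingError):
        return False


print(is_picklable(make_local_lambda(128.0)))  # the local lambda is rejected

if __name__ == "__main__":
    # A class defined at module level can be pickled by reference.
    print(is_picklable(DivideBy(128.0)))
```

Running with `--num_workers 0` should also avoid the error, since no worker processes are spawned and nothing needs to be pickled.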

MobileNetv2 enhancement

Thanks for sharing this wonderful project. Is there any plan to add support for a MobileNetV2 backend for SSD? If so, I can provide a simple MobileNetV2 network script in PyTorch.

RuntimeError when running live demo on pytorch

I've installed Pytorch 0.4.1 CUDA version and ran this code
(CUDA 9.0 with cuDNN v7.1, GTX Titan X)

python3 run_ssd_live_demo.py mb1-ssd models/mobilenet-v1-ssd-mp-0_675.pth models/voc-model-labels.txt

and it prints out

Traceback (most recent call last):
  File "run_ssd_live_demo.py", line 62, in <module>
    boxes, labels, probs = predictor.predict(image, 10, 0.4)
  File "/home/han/study/pytorch-ssd/vision/ssd/predictor.py", line 37, in predict
    scores, boxes = self.net.forward(images)
  File "/home/han/study/pytorch-ssd/vision/ssd/ssd.py", line 77, in forward
    locations, self.priors, self.config.center_variance, self.config.size_variance
  File "/home/han/study/pytorch-ssd/vision/utils/box_utils.py", line 104, in convert_locations_to_boxes
    locations[..., :2] * center_variance * priors[..., 2:] + priors[..., :2],
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'other'

I think the problem is that the network is on the CPU, not on the GPU. So I tried to move the network to the GPU with
net = net.cuda(), but the same error occurred.

The only difference in my setup is that I use Python 3.5 with the pip binary, and I worked around the f-string syntax error on Python 3.5.

Please help.

Accuracy reported is different from accuracy through reproduced models

I notice that when I run the vgg and mobilenet v2 models that you have already trained I get very similar accuracy to what you've reported, but when I try to reproduce those models through the method you suggest, I get significantly lower (20%) accuracy. Do you have any suggestions about how to improve that?

download openimages

Hi! First of all thank you for this working implementation of SSD in pytorch.

I want to retrain SSD with Open Images, so I tried to use open_images_downloader.py to download the dataset, like this:
python3 open_images_downloader.py --root ./data/open_images --class_names "Handgun,Shotgun" --num_workers 10
I get all the CSV files, but the images for "Handgun,Shotgun" are not downloaded.
So I debugged the code,
and it stops here:

s3.download_file(bucket, src, dest)

The src argument looks like 'train/0000271195f2c007.jpg',
and the dest argument looks like './data/open_images/train/0000271195f2c007.jpg',
but the image still cannot be downloaded.
Thanks in advance!

training on coco dataset

Has anyone run into issues training on CocoDataset? I modified the code at line 220:

elif args.dataset_type == 'coco':
    # Image preprocessing, normalization for the pretrained resnet
    transform = transforms.Compose([
        transforms.RandomCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.485, 0.456, 0.406),
                             (0.229, 0.224, 0.225))])
    # Load vocabulary wrapper
    with open('/home/ubuntu/data/coco/vocab.pkl', 'rb') as f:
        vocab = pickle.load(f)
    dataset = CocoDataset(dataset_path, json='/home/ubuntu/data/coco/annotations/captions_train2014.json', vocab=vocab, transform=transform)
    label_file = '/home/ubuntu/data/coco-labels-file.txt'

When I tried to run

train(train_loader, net, criterion, optimizer,
      device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)

I got a mismatch in the number of elements:

File "train_ssd.py", line 121, in train
    images, boxes, labels = data
ValueError: not enough values to unpack (expected 3, got 2)

Tensor data[0] has shape (1,3,224,224)
Tensor data[1] has shape (1,15)

I think data[2] is missing.

Any insight is helpful, thank you in advance.
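
The training loop unpacks three items per batch (images, boxes, labels), while a captioning dataset built from `captions_train2014.json` yields two (image, caption). A detection dataset would instead read the COCO instances annotations, where each object has a `bbox` in `[x, y, width, height]` form and a `category_id`. The sketch below converts such records to corner-form boxes and contiguous label indices; the function name, the sample records and the id mapping are hypothetical.

```python
def coco_to_corner_boxes(annotations, category_to_index):
    """Convert COCO-style [x, y, w, h] boxes to [x1, y1, x2, y2] plus label indices.

    Boxes with non-positive width or height are skipped.
    """
    boxes, labels = [], []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            continue
        boxes.append([x, y, x + w, y + h])
        labels.append(category_to_index[ann["category_id"]])
    return boxes, labels


# Hypothetical records for one image; index 0 is left free for background.
annotations = [
    {"bbox": [10.0, 20.0, 30.0, 40.0], "category_id": 18},
    {"bbox": [0.0, 0.0, 0.0, 5.0], "category_id": 1},
]
category_to_index = {1: 1, 18: 2}

boxes, labels = coco_to_corner_boxes(annotations, category_to_index)
print(boxes, labels)
```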

Converting pytorch model

Hi,
Can you share the versions of PyTorch and ONNX you are using?
I am facing a segfault issue when importing PyTorch and the Caffe2 ONNX backend at the same time.

inference speed

What is the approximate inference speed of MobileNet v2 SSD on CPU? The paper says 200 ms, but I measured around 1 second per image. Is there any way to speed up inference?

Error in retrain mb2-ssd-lite on open_images

I ran into a problem when trying to retrain the mb2-ssd-lite model with the Open Images data set.

The command I used:
python train_ssd.py --dataset_type open_images --datasets ~/data/open_images --net mb2-ssd-lite --pretrained_ssd models/mb2-ssd-lite-mp-0_686.pth --scheduler cosine --lr 0.01 --t_max 100 --validation_epochs 5 --num_epochs 5 --base_net_lr 0.001 --batch_size

Error appeared after finishing the Epoch 0:

`2019-01-07 15:16:11,054 - root - INFO - Epoch: 0, Step: 2100, Average Loss: 5.7228, Average Regression Loss 3.4281, Average Classification Loss: 2.2946
2019-01-07 15:16:23,727 - root - INFO - Epoch: 0, Step: 2200, Average Loss: 5.5212, Average Regression Loss 3.2310, Average Classification Loss: 2.2902
2019-01-07 15:16:36,075 - root - INFO - Epoch: 0, Step: 2300, Average Loss: 5.7986, Average Regression Loss 3.5029, Average Classification Loss: 2.2956
Traceback (most recent call last):
  File "train_ssd.py", line 320, in <module>
    device=DEVICE, debug_steps=args.debug_steps, epoch=epoch)
  File "train_ssd.py", line 123, in train
    confidence, locations = net(images)
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hoatruong/pytorch-ssd/vision/ssd/ssd.py", line 81, in forward
    x = layer(x)
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hoatruong/pytorch-ssd/vision/nn/mobilenet_v2.py", line 101, in forward
    return self.conv(x)
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 67, in forward
    exponential_average_factor, self.eps)
  File "/home/hoatruong/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1346, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size [1, 64, 1, 1]`

I have no problem at all with the mb1 model. Can someone help me with this problem? Thank you in advance.
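
The input size in the error, [1, 64, 1, 1], is a batch of one sample with a 1x1 feature map, which gives batch normalization a single value per channel in training mode. A plausible trigger, stated here as an assumption, is a final batch of size 1 when the dataset size leaves a remainder of 1 after division by the batch size. The sketch below checks for that case; the helper name and the numbers are hypothetical, since the batch size in the command above is cut off.

```python
def last_batch_size(num_samples, batch_size):
    """Size of the final batch when iterating without dropping the remainder."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


# Hypothetical numbers: 2401 training samples with a batch size of 24.
print(last_batch_size(2401, 24))  # a final batch of 1 would hit the BatchNorm check
print(last_batch_size(2400, 24))
```

If the final batch does turn out to be 1, passing `drop_last=True` to the training `DataLoader` or choosing a batch size that avoids the remainder would sidestep the check.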

val_dataset in train_ssd.py

Hey in train_ssd.py,

val_dataset = OpenImagesDataset(dataset_path,
transform=test_transform, target_transform=target_transform,
dataset_type="test")

shouldn't it be this instead?

val_dataset = OpenImagesDataset(dataset_path,
transform=test_transform, target_transform=target_transform,
dataset_type="validation")
