
ssd_tensorflow_voc's Introduction

Self-Driving Car Engineer Nanodegree

Computer Vision/Deep Learning

Side Project: PASCAL VOC Object Recognition and Detection

Levin Jian, June 2017

Overview

PASCAL VOC is a publicly available benchmark dataset used for object recognition and detection. The dataset (VOC 2007 and VOC 2012) has about 17k images and contains 20 labelled classes such as person, car, cat, bottle, bicycle, sheep, and sofa. The detector we developed can be used to determine what kinds of objects an image contains and where those objects are.

We used the excellent work from here as our baseline. The baseline successfully converted the original SSD detector from its Caffe implementation to a TensorFlow implementation. The goal of our project is to focus on the training part of the problem. Specifically, we load the VGG16 weights trained on ImageNet into the VGG16 part of our SSD model, train the SSD model on the PASCAL VOC training dataset (VOC 2007 train_eval and VOC 2012 train_eval), and evaluate the SSD model on the PASCAL VOC test dataset (VOC 2007 test). The evaluation metric is mAP.

Technically, TensorFlow and slim are used as the neural network framework, and all the development is done in Python.

Final Result

Our SSD detector achieves 0.65 mAP on the VOC 2007 test dataset, at a speed of 8 frames/second. Below are a few examples of detection outputs.

two person and one bottom two_cars

Here is the training/evaluation chart:

train_eval

And here is the loss chart:

total_loss

Model Architecture

The core of the Single Shot MultiBox Detector is predicting category scores and bounding box offsets for a fixed set of default boxes using small convolutional filters applied to feature maps. For details, please refer to the original paper.

Here is the model architecture for SSD. Excluding pooling, batch normalization, and dropout layers, there are 23 layers in all: 13 VGG CNN feature layers and 10 SSD-specific detection layers.

model_architecture

For some of the top layers in the SSD architecture (specifically conv4, conv7, conv8, conv9, conv10, and conv11), each spatial location (3x3 region) is used to predict a fixed set of default boxes, including which classes these default boxes belong to and how far these default boxes are offset from the true position of the objects. There are 8732 default boxes in all.
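The total of 8732 can be checked with a little arithmetic. The sketch below assumes the SSD300 configuration from the original paper (feature map sizes and default boxes per location); these numbers are not read from this repository's code, so treat them as an assumption.

```python
# Assumed SSD300 configuration from the original paper (not taken from
# this repository): (layer name, feature map side, default boxes per location).
FEATURE_LAYERS = [
    ("conv4", 38, 4),
    ("conv7", 19, 6),
    ("conv8", 10, 6),
    ("conv9", 5, 6),
    ("conv10", 3, 4),
    ("conv11", 1, 4),
]


def count_default_boxes(layers):
    """Sum side * side * boxes_per_location over all detection layers."""
    return sum(side * side * boxes for _, side, boxes in layers)


if __name__ == "__main__":
    for name, side, boxes in FEATURE_LAYERS:
        print(name, side * side * boxes)
    print("total", count_default_boxes(FEATURE_LAYERS))  # total 8732
```

Most of the boxes (5776 of them) come from the 38x38 conv4 feature map, which is the layer responsible for the smallest objects.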

SSD only needs an input image and ground truth boxes for each object during training. Through our matching strategy, each of the default boxes is assigned a class label. Default boxes assigned the background class are called negative samples; those assigned a non-background class are called positive samples. For positive samples, we also assign bounding box offsets (the offsets between the default box and the ground truth box). Our loss function is the sum of the classification loss and the location loss over these samples.

Here is an example of how some of the default boxes are assigned as positive samples.
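As an illustration of what "offsets between default box and ground truth box" can mean, here is a minimal sketch of the center/size parameterization described in the original paper. The box layout (ymin, xmin, ymax, xmax) and the function names are assumptions for this example; the actual code may also scale the offsets by extra constants, which is left out here.

```python
import math


def to_center_form(box):
    """Convert (ymin, xmin, ymax, xmax) to (cy, cx, height, width)."""
    ymin, xmin, ymax, xmax = box
    return ((ymin + ymax) / 2.0, (xmin + xmax) / 2.0, ymax - ymin, xmax - xmin)


def encode_offsets(gt_box, default_box):
    """Regression targets for a positive default box (hypothetical helper).

    Center offsets are normalized by the default box size; size offsets
    are the log ratio of ground truth size to default box size.
    """
    gcy, gcx, gh, gw = to_center_form(gt_box)
    dcy, dcx, dh, dw = to_center_form(default_box)
    return (
        (gcy - dcy) / dh,
        (gcx - dcx) / dw,
        math.log(gh / dh),
        math.log(gw / dw),
    )


if __name__ == "__main__":
    default_box = (0.0, 0.0, 0.5, 0.5)
    gt_box = (0.0, 0.0, 1.0, 1.0)
    print(encode_offsets(gt_box, default_box))
```

A default box that coincides exactly with its ground truth box gets all-zero targets, so the network only has to learn corrections.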

default_boxes

Database

As mentioned earlier, the VOC 2007 and VOC 2012 datasets are used in this project, and they can be downloaded from the web.

wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar

After downloading the above datasets, we use the pascalvoc_to_tfrecords.py script under the datasets folder to convert the data into TFRecords format, which is used by our training script.

For example, to generate the voc_train_2007 TFRecords files, uncomment the code lines below in pascalvoc_to_tfrecords.py:

#dataset_dir = "../../data/voc/2007_train/VOCdevkit/VOC2007/"
#output_dir = "../../data/voc/tfrecords/"
#name='voc_train_2007'

and comment out the code lines below:

dataset_dir = "../../data/voc/2007_test/VOCdevkit/VOC2007/"
output_dir = "../../data/voc/tfrecords/"
name='voc_test_2007'

Then execute the python pascalvoc_to_tfrecords.py command in a terminal. Note that you might have to modify dataset_dir and output_dir based on where you put your dataset files and where you want the TFRecords to be saved.

Training

Training Strategy

I conducted the training in two main phases:

  1. Overfitting the training dataset
    During this phase, we mainly focus on getting high accuracy on the training dataset. The purpose is to make sure that the data preparation, the model architecture, the loss function, the optimizer, and the evaluation are all properly set up. The highest training mAP obtained is 0.98. This shows that, after some improvements over the baseline model implementation, our model is trainable and can converge well.
  2. Improve result over test dataset
    During this phase, we mainly focus on improving test accuracy by experimenting with the optimizer, batch normalization, data preparation, dropout, etc.

Experimentations

With the goal of improving test accuracy, we experimented with various aspects of the training that affect it.

Loss Function

A few improvements are made over the baseline model so that our model implementation is consistent with the original paper.

a) strict smooth L1 loss implementation

When the regression targets are unbounded, as is the case in this project, training with L2 loss would require careful tuning of learning rates in order to prevent exploding gradients. A strict implementation of smooth L1 loss can reduce this risk.
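For reference, here is a minimal scalar sketch of the standard smooth L1 function. It is written in plain Python for illustration and is not the TensorFlow code used in this project.

```python
def smooth_l1(x):
    """Standard smooth L1: quadratic near zero, linear for |x| >= 1."""
    abs_x = abs(x)
    if abs_x < 1.0:
        return 0.5 * x * x
    return abs_x - 0.5


if __name__ == "__main__":
    for value in (-3.0, -0.5, 0.0, 0.5, 2.0):
        print(value, smooth_l1(value))
```

Because the slope is capped at 1 for large errors, a single badly predicted box cannot produce an arbitrarily large gradient, unlike L2 loss.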

b) Matching strategy adjustment

As mentioned earlier, we assign each default box a class label and location offset based on the ground truth bounding boxes. This is done by the matching strategy.

In the baseline model, a default box is matched with a ground truth bounding box if their jaccard overlap is bigger than 0.5. This has a potential problem. For some ground truth boxes, the jaccard overlap with every default box may be less than 0.5. As a result, these ground truth boxes are not assigned to any default box, which is not good from the perspective of training.

In our model, we correct this by strictly following the matching strategy presented in the original paper. That is, we first match each ground truth box with the default box that has the biggest jaccard overlap with it, and then we match default boxes to any ground truth box with which their jaccard overlap is bigger than 0.5.
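The two-step matching can be sketched in plain Python as follows. The function names and the (ymin, xmin, ymax, xmax) box layout are assumptions for this example, and the project itself implements this with TensorFlow ops rather than loops.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two (ymin, xmin, ymax, xmax) boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(default_boxes, gt_boxes, threshold=0.5):
    """Return, per default box, the matched ground truth index or -1."""
    matches = [-1] * len(default_boxes)
    # Threshold rule: a default box takes its best ground truth box
    # if the overlap is bigger than the threshold.
    for d, dbox in enumerate(default_boxes):
        best_g, best_iou = -1, threshold
        for g, gbox in enumerate(gt_boxes):
            iou = jaccard(dbox, gbox)
            if iou > best_iou:
                best_g, best_iou = g, iou
        matches[d] = best_g
    # Best-match rule, applied last so it takes priority: every ground
    # truth box claims the default box it overlaps most.
    for g, gbox in enumerate(gt_boxes):
        best_d = max(range(len(default_boxes)),
                     key=lambda d: jaccard(default_boxes[d], gbox))
        matches[best_d] = g
    return matches


if __name__ == "__main__":
    defaults = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.5, 1.0, 1.0), (0.0, 0.0, 1.0, 1.0)]
    gts = [(0.0, 0.0, 0.2, 0.2), (0.0, 0.0, 1.0, 1.0)]
    print(match(defaults, gts))  # [0, -1, 1]
```

In the example, the small ground truth box overlaps no default box by more than 0.5, yet it still receives a positive sample through the best-match rule.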

c) Out of bounds bounding box handling

In this project, we used random sampling data augmentation. We randomly crop a small region of the original image to serve as our training image. As a result, the ground truth boxes need to be adjusted.

In the baseline, the adjustment of the ground truth bboxes is a bit inappropriate in that some of the ground truth boxes end up out of bounds (less than 0, or bigger than 1). Intuitively, this might make sense, but it turns out that this makes the training harder to converge. With hindsight, I think this is because it is fundamentally a harder problem: we are required to predict the accurate position of the whole object from a partial object.

In our model, we clip all ground truth boxes so that they are within the [0,1] range.
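The clipping itself is simple. Here is a plain-Python sketch, assuming boxes are stored as normalized (ymin, xmin, ymax, xmax) tuples; the helper name is hypothetical.

```python
def clip_box(box):
    """Clamp every coordinate of a normalized box into [0, 1]."""
    return tuple(min(1.0, max(0.0, value)) for value in box)


if __name__ == "__main__":
    # A box that sticks out of the cropped image at the top and bottom.
    print(clip_box((-0.1, 0.2, 1.3, 0.9)))  # (0.0, 0.2, 1.0, 0.9)
```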

Optimizer

Throughout the experiments, the Adam optimizer is used, as it implements both momentum update and per-parameter adaptive learning rates.

We did experiment a lot with the learning rate though. In the original paper, a learning rate of 0.001 is used to train the SSD weights.

To my surprise, I found that this does not work very well. The training took a very long time to converge, and did not converge well at all. Later on I implemented batch normalization and increased the learning rate to 0.1, and this made a huge difference. With a learning rate of 0.1, we are able to reach in about half an hour a loss that would have taken 8 hours with a learning rate of 0.001.

Data augmentation

Three kinds of data augmentation are used, the same as in the baseline model except for a few relevant hyperparameters.

a) Flip the image horizontally
b) Color distortion
Randomly change the brightness, contrast, hue and saturation of the image.
c) Patch sampling
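One detail of the horizontal flip is that the ground truth boxes must be mirrored together with the pixels. The sketch below uses nested lists for the image and assumes normalized (ymin, xmin, ymax, xmax) boxes; both helpers are hypothetical illustrations, not the project's TensorFlow code.

```python
def flip_image_horizontally(image):
    """Mirror an image given as a list of pixel rows."""
    return [row[::-1] for row in image]


def flip_box_horizontally(box):
    """Mirror a normalized box: y stays, x is reflected around 0.5."""
    ymin, xmin, ymax, xmax = box
    return (ymin, 1.0 - xmax, ymax, 1.0 - xmin)


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6]]
    print(flip_image_horizontally(image))  # [[3, 2, 1], [6, 5, 4]]
    print(flip_box_horizontally((0.0, 0.25, 0.5, 0.5)))  # (0.0, 0.5, 0.5, 0.75)
```

Note that xmin and xmax swap roles after the reflection, so the flipped box still satisfies xmin <= xmax.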

Batch normalization

Batch normalization layers are added to the baseline model. They allowed us to use a bigger learning rate and drastically reduced training time.

Drop out

Dropout was also tried in this project, since we saw a large gap between training accuracy and testing accuracy. It turned out that dropout does narrow the gap between train and test accuracy, but it also lowers the training accuracy a lot. In the end, we get roughly the same test accuracy with or without dropout.

Training summary

Training experiments and progress are logged in the history/notes.txt file. Below is a quick summary:

  1. Fixing bugs in the baseline made the model converge
  2. Batch normalization and a bigger learning rate made a huge difference: training accuracy went from 0.8 to 0.98
  3. Data augmentation is very effective in improving testing accuracy, from 0.5 to 0.65

Known limitations

The current implementation does a decent detection job, but its performance can be further improved on some images, like the one below:

many_people

In the original paper, the test accuracy is 0.69. If we could push our current test accuracy from 0.65 to 0.69 or higher, we should get better detection results. I think the key lies in how we perform data augmentation:

  1. Replicate the reference data augmentation implementation as much as possible
    Our current implementation already tries to closely follow the instructions of the original paper regarding data augmentation. But we implemented it with Python and TensorFlow, while the original paper implemented it with C++, Caffe and OpenCV. There may be some difference between the two implementations that is causing the test accuracy gap.
  2. Add zoom out operation
    SSD is known to have relatively poor performance on detecting small objects (like bottle, pottedplant), as also confirmed by our SSD implementation. So one idea for improvement is to add more small objects to the training data by performing a zoom out operation during data augmentation. bottle_accuracy

Required libraries

  • Python 3.5
  • Tensorflow 1.0.0

Instructions for running the scripts

The training took about 58 hours on an Nvidia GTX 1080 GPU.

Train SSD specific weights

Run python ./train_model.py with the settings below

self.max_number_of_steps = 30000
self.learning_rate = 0.1
self.fine_tune_vgg16 = False

Train VGG16 and SSD specific weights

1). Run python ./train_model.py with the settings below

Before you run the ./train_model.py script, you will have to download the VGG16 pretrained weights from here to a local folder, and change the setting of self.checkpoint_path if necessary.

self.checkpoint_path = '../data/trained_models/vgg16/vgg_16.ckpt'
self.fine_tune_vgg16 = True
self.max_number_of_steps = 900000
self.learning_rate=0.01

2). Run python ./train_model.py with the settings below

self.fine_tune_vgg16 = True
self.max_number_of_steps = 1100000
self.learning_rate=0.001

3). Run python ./train_model.py with the settings below

self.fine_tune_vgg16 = True
self.max_number_of_steps = 1200000
self.learning_rate=0.0005
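One way to read the three fine-tuning runs above is as a piecewise-constant learning rate schedule. The sketch below assumes that max_number_of_steps is a cumulative global step count that each run resumes from; that reading, and the helper name, are assumptions rather than something the scripts state.

```python
# (max_number_of_steps, learning_rate) pairs copied from the three runs above.
# Assumption: the step limits are cumulative global steps.
FINE_TUNE_SCHEDULE = [
    (900000, 0.01),
    (1100000, 0.001),
    (1200000, 0.0005),
]


def learning_rate_at(step, schedule=FINE_TUNE_SCHEDULE):
    """Return the learning rate in effect at a global step, or None
    once the last run has finished (hypothetical helper)."""
    for max_step, learning_rate in schedule:
        if step < max_step:
            return learning_rate
    return None


if __name__ == "__main__":
    for step in (0, 899999, 900000, 1150000, 1200000):
        print(step, learning_rate_at(step))
```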

Get both train and evaluation accuracy

1). Run python ./run_all_checkpoints.py with the settings below

min_step = 100
step = 10000

2). Run python ./run_all_checkpoints.py -f with the settings below

min_step = 30000
step = 10000


ssd_tensorflow_voc's Issues

Deactivate patch sampling

When I comment out the distorted bounding box crop, the net gives me an error due to NaNs and Infs. I assume that I get invalid bounding box sizes (< 0 or > 1). But with the distorted bounding box crop everything works fine.
I double checked my custom dataset many times. When creating tfrecords I resize every box to be in the range [0,1], and with your provided scripts for checking the tfrecords I also don't see any bbox values out of range.

Does anyone have the same problem? How can I deactivate the distorted bounding box crop? Cross entropy for positives is not converging, and localization converges very slowly. Deactivating color distortion helped already, but I need to deactivate the cropping!

Another problem I have with the cropping is, when I obtain the distorted bbox from the RGB image, I slice the RGB image and I also want to slice a grayscale image with the same distorted bbox, too. But sometimes I cannot slice the grayscale image with the distorted bbox I got from the RGB due to non matching tensor shapes (both images have the same sizes). Is this because RGB has 3 channels and the other just 1 channel?

Any help would be awesome.

how to train SSD_512x512?

Hello,
I want to train on my own data at 512x512, but I got an error ''ValueError: Dimension 0 in both shapes must be equal, but are 788992 and 279424 for 'ssd_losses/Select' (op: 'Select') with input shapes: [279424], [788992], [279424]. '', caused by ssd.py line 904 nvalues = tf.where(nmask, predictions[:, 0], 1. - fnmask). I tried to modify the source code, but failed. Could you give me some suggestions? Thank you.

Visualize image with bb in evaluation script

Hi Levin,

do you know how to visualize the image with bounding boxes during evaluation? I would like to record them in Tensorboard, too. But I couldn't figure out how to do it.
Do you have any idea?

Cheers

why mAP is too high?

I only use VOC2007 data to train the SSD, but after the first 30000 training steps, the mAP on the test dataset is 70%, and after fine-tuning all parameters for 70000 steps, the mAP on the test dataset is 81%. It's too high, and I don't know why.

Sorry, I didn't notice that the evaluation process is done on both the train and test datasets; the 81% is the train result.

error running eval_model and run_all_checkpoints

Hi! Levin, I tried running eval_model.py with no modification, it detects the latest checkpoint file, but then throws:


Traceback (most recent call last):
  File "/home/twk/prj/ssd_levin_modify/evaluate_model.py", line 159, in <module>
    obj.run()
  File "/home/twk/prj/ssd_levin_modify/evaluate_model.py", line 148, in run
    self.__setup_eval()
  File "/home/twk/prj/ssd_levin_modify/evaluate_model.py", line 88, in __setup_eval
    variables_to_restore=variables_to_restore)
  File "/home/twk/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/evaluation.py", line 207, in evaluate_once
    config=session_config)
  File "/home/twk/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/evaluation.py", line 182, in _evaluate_once
    eval_step_value = _get_latest_eval_step_value(eval_ops)
  File "/home/twk/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/evaluation.py", line 75, in _get_latest_eval_step_value
    with ops.control_dependencies(update_ops):
  File "/home/twk/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 4304, in control_dependencies
    return get_default_graph().control_dependencies(control_inputs)
  File "/home/twk/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 4017, in control_dependencies
    c = self.as_graph_element(c)
  File "/home/twk/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3035, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/twk/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3124, in _as_graph_element_locked
    types_str))
TypeError: Can not convert a tuple into a Tensor or Operation.

training does not converge

With VGG16 weights trained from ImageNET, train SSD on VOC 07 trainval dataset, the training does not converge on location loss and cross entropy loss of positive bounding box, though training on cross entropy loss of negative bounding box does seem to converge, as seen in below loss chart.
screenshot from 2017-04-25 06-42-56

Hard negative mining – why only look at first class?

In ssd.py, lines 904-906 there is:

nvalues = tf.where(nmask,
                    predictions[:, 0],
                    1. - fnmask)

nvalues is then used to pick, in the hard mining routine, the k hardest negative examples:

val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)

I don't understand why we only take into account the first class in predictions (predictions[:, 0]). Shouldn't we take the hardest mistakes across all classes?

mAP is too low but detect objects well with trained ckpt

Hi @LevinJ, I applied SSD_tensorflow_VOC to my own dataset. I first trained the SSD specific weights with self.max_number_of_steps = 10000, then trained the VGG16 and SSD specific weights with self.max_number_of_steps = 900000. The first step has finished and the second step has reached 60000. My loss is around 1.8, training mAP is 0.18 and testing mAP is 0.17. However, when I use the trained ckpt to detect objects in testing pictures, it does well! So I went through your code and the website https://sanchom.wordpress.com/tag/average-precision/ to learn how mAP is computed. I didn't find anything wrong. I'm quite confused. The detection results with the trained ckpt don't match the mAP of 0.17.

Training Instructions

I'm trying to train the model on Pascal VOC.
I downloaded the dataset and converted them into TFRecords following the instructions on the README.

However I'm a little confused about how to train. My tfrecords files are in the ./datasets folder.
My ./logs folder has a fine-tune subfolder which is empty with the exception of some events.out.tfevent files that get created every time I attempt to train.

I'm not sure what I'm missing. Am I supposed to have some retrained ImageNets weights somewhere?

Any help is appreciated!

Here is my error message:

$ python ./train_model.py
INFO:tensorflow:Fine-tuning from None
INFO:tensorflow:Restoring parameters from None
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InternalError'>, Unable to get element from the feed as bytes.
Traceback (most recent call last):
  File "./train_model.py", line 435, in <module>
    obj.run()
  File "./train_model.py", line 427, in run
    self.__start_training()
  File "./train_model.py", line 222, in __start_training
    save_interval_secs=self.save_interval_secs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 732, in train
    master, start_standard_services=False, config=session_config) as sess:
  File "/usr/local/Cellar/python/2.7.13_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
    start_standard_services=start_standard_services)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 281, in prepare_session
    init_fn(sess)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 654, in callback
    saver.restore(session, model_path)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Unable to get element from the feed as bytes.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/Gather:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
['File "./train_model.py", line 435, in <module>\n    obj.run()', 'File "./train_model.py", line 427, in run\n    self.__start_training()', 'File "./train_model.py", line 222, in __start_training\n    save_interval_secs=self.save_interval_secs)', 'File "/usr/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 655, in train\n    ready_op = tf_variables.report_uninitialized_variables()', 'File "/usr/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n    return _add_should_use_warning(fn(*args, **kwargs))', 'File "/usr/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning\n    wrapped = TFShouldUseWarningWrapper(x)', 'File "/usr/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__\n    stack = [s.strip() for s in traceback.format_stack()]']
==================================

ohem

Your project is very good. Would you consider adding OHEM to your project?

Which folder should my file "vgg_16.ckpt" be placed in?

My error report :

INFO:tensorflow:Error reported to Coordinator: <class 'ValueError'>, Can't load save_path when it is None.
Traceback (most recent call last):
File "D:/ssd_tensorflow_LevinJ/train_model.py", line 435, in
obj.run()
File "D:/ssd_tensorflow_LevinJ/train_model.py", line 429, in run
self.__start_training()
File "D:/ssd_tensorflow_LevinJ/train_model.py", line 223, in __start_training
save_interval_secs=self.save_interval_secs)
File "D:\FaceDlib\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 748, in train
master, start_standard_services=False, config=session_config) as sess:
File "D:\FaceDlib\lib\contextlib.py", line 81, in enter
return next(self.gen)
File "D:\FaceDlib\lib\site-packages\tensorflow\python\training\supervisor.py", line 1004, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "D:\FaceDlib\lib\site-packages\tensorflow\python\training\supervisor.py", line 832, in stop
ignore_live_threads=ignore_live_threads)
File "D:\FaceDlib\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "D:\FaceDlib\lib\site-packages\six.py", line 693, in reraise
raise value
File "D:\FaceDlib\lib\site-packages\tensorflow\python\training\supervisor.py", line 993, in managed_session
start_standard_services=start_standard_services)
File "D:\FaceDlib\lib\site-packages\tensorflow\python\training\supervisor.py", line 730, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "D:\FaceDlib\lib\site-packages\tensorflow\python\training\session_manager.py", line 296, in prepare_session
init_fn(sess)
File "D:\FaceDlib\lib\site-packages\tensorflow\contrib\framework\python\ops\variables.py", line 750, in callback
saver.restore(session, model_path)
File "D:\FaceDlib\lib\site-packages\tensorflow\python\training\saver.py", line 1534, in restore
raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'init_ops/report_uninitialized_variables/boolean_mask/GatherV2:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
File "D:/ssd_tensorflow_LevinJ/train_model.py", line 435, in
obj.run() File "D:/ssd_tensorflow_LevinJ/train_model.py", line 429, in run
self.__start_training() File "D:/ssd_tensorflow_LevinJ/train_model.py", line 223, in __start_training
save_interval_secs=self.save_interval_secs) File "D:\FaceDlib\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 791, in train
should_retry = True File "D:\FaceDlib\lib\site-packages\tensorflow\python\util\tf_should_use.py", line 189, in wrapped
return add_should_use_warning(fn(*args, kwargs))
==================================

which folder should my file "vgg_16.ckpt" be placed in ?
train_dir or checkpoint_path ???

can not reach 66%

Thanks for your great work. But I still have some issues that need your help:
I trained on the voc07+12_train datasets for 160000 steps (train 30000 + finetune 130000), and finally found that the mAP is below 57%. I don't know why. Is my dataset (about 2.4GB) wrong?
default This is evaluated on voc07_test (trained on voc07+12_train)
Another question is that I trained on the voc07_train dataset, and got a higher mAP than when trained on voc07+12_train.
default

How to get single image detections?

I am still training the SSD weights and have yet to fine-tune the VGG net. How did you obtain the single image detections (as shown in your README file)?

Error

I get an error when running training_model.py. The message is:

Traceback (most recent call last):
File "D:\Workspace\DL\SSD\train_net.py", line 281, in
obj.run()
File "D:\Workspace\DL\SSD\train_net.py", line 241, in run
image, glabels, gbboxes, gdifficults, gclasses, localizations, gscores = self.get_voc_2007_train_data()
ValueError: too many values to unpack (expected 7)

So, what is the solution? I am a newbie in Python/TensorFlow. Thank you.

how to use the model to test some images

Hi, how do I use a trained model (ckpt file) to predict objects in an image, so that the output is a window containing the box(es) and the class name? I am new to deep learning. Thank you.

why mAP is higher than paper?

I used the "ssd_300_vgg.cpkt" fine-tuned on the voc2007_train dataset, and at step 57771 got mAP:
AP_VOC07/mAP[0.83292667058394565]

But in the paper, the mAP is 81.6 (07+12+coco_train).
What is wrong with my data...

how to understand the model_size?

I downloaded ssd_300_vgg.ckpt; it's about 100M.
But when I initialize the ssdnet randomly I get a 300M model_cpkt.
And when I restore from ssd_300_vgg.ckpt and finetune the network, I also get a 300M model.
How should I understand this? Can you help me?

How to use multiple GPUs for training?

Great job here!

Is there a way to adapt train_model.py for training using multiple GPUs?

I am currently training the SSD512 model. On my computer with a GTX1070 with 8GB of RAM, I can only set batch_size = 20. This is fine for the first training stage. At the second stage, when the VGG net is fine-tuned, I have to lower the batch size to 10, which makes the training very unstable. Therefore I wonder if there is a way to use multiple GPUs to stabilize the training process.

no update ops in tf.GraphKeys.UPDATE_OPS collection

Hi Levin! By setting a breakpoint, I found that after constructing the network, tf.get_collections(tf.GraphKeys.UPDATE_OPS) returns an empty list. In my experience, it should already contain the update ops of the batchnorm layers. But when I inspect the trained checkpoint file, the moving average values of the batchnorm params have actually been updated. This is really strange. Could you give some idea on this issue? Thanks!

not converge

Sorry, I am using this code to train on the voc2007 and voc2012 datasets, and the total loss is stuck at 5 and won't decrease. Can you suggest some ways to fix this?
