
centernet-tensorflow's People

Contributors

stick-to


centernet-tensorflow's Issues

SparseTensor Error! Indices are not valid (out of bounds)

Hi, I tried your code with TensorFlow 1.4, training on the COCO dataset. I ran into some bugs and have no idea how to solve them. Could you give me a hand?

In TF 1.4, `tf.boolean_mask` doesn't support the `axis` argument, so I removed it.
For the same TF version reason, I use `tf.sparse_tensor_to_dense` and `tf.SparseTensor`.

```
InvalidArgumentError (see above for traceback): Indices are not valid (out of bounds). Shape: [128,128]
	 [[Node: get_loss/cond_649/SparseToDense = SparseToDense[T=DT_FLOAT, Tindices=DT_INT64, validate_indices=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_loss/cond_649/strided_slice/Switch, get_loss/cond_649/SparseToDense/Switch/_15033, get_loss/cond_649/ones_like/_15035, get_loss/cond_649/SparseToDense/default_value)]]
```

The code is:

```python
for i in range(self.num_classes):
    i = tf.Print(i, [i], 'i')
    exist_i = tf.equal(classid, i)
    reduce_i = tf.boolean_mask(keyp_penalty_reduce, exist_i)
    reduce_i = tf.Print(reduce_i, [tf.shape(reduce_i)], 'tf.shape(reduce_i)')
    reduce_i = tf.cond(
        tf.equal(tf.shape(reduce_i)[0], 0),
        lambda: zero_like_keyp,
        lambda: tf.expand_dims(tf.reduce_max(reduce_i, axis=0), axis=-1)
    )
    reduction.append(reduce_i)

    gbbox_yx_i = tf.boolean_mask(gbbox_yx, exist_i)
    gbbox_yx_i = tf.Print(gbbox_yx_i, [tf.shape(gbbox_yx_i)], 'tf.shape(gbbox_yx_i)')
    pshape = tf.to_int64(pshape)

    # sparse_to_dense version
    gt_keypoints_i = tf.cond(
        tf.equal(tf.shape(gbbox_yx_i)[0], 0),
        lambda: zero_like_keyp,
        lambda: tf.expand_dims(
            tf.sparse_tensor_to_dense(
                tf.SparseTensor(
                    gbbox_yx_i,
                    tf.ones_like(gbbox_yx_i[..., 0], tf.float32),
                    dense_shape=pshape),
                validate_indices=False),
            axis=-1)
    )
    gt_keypoints.append(gt_keypoints_i)
```
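One hedged guess at the cause (not a confirmed diagnosis): after dividing the box centers by the stride and flooring, a center can land exactly on the feature-map border, or a padded row with negative coordinates can leak in, producing indices outside [0, 127]. Clipping the indices before building the `SparseTensor` is a cheap guard; the helper below is mine, not the repo's:

```python
import tensorflow as tf

def clip_centers(gbbox_yx_i, pshape):
    # gbbox_yx_i: [num_gt, 2] int64 (y, x) center cells
    # pshape: [2] int64 feature-map size, e.g. [128, 128]
    upper = tf.expand_dims(pshape - 1, 0)
    return tf.clip_by_value(gbbox_yx_i, tf.zeros_like(upper), upper)
```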

Runs extremely SLOW

I just cloned the repo and modified the params in test.py and voc_classname_encoder.py.
But it always gets stuck here:

```
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /content/gdrive/My Drive/CenterNet-tensorflow/CenterNet.py:343: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
WARNING:tensorflow:From /content/gdrive/My Drive/CenterNet-tensorflow/CenterNet.py:332: batch_normalization (from tensorflow.python.layers.normalization) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.batch_normalization instead.
WARNING:tensorflow:From /content/gdrive/My Drive/CenterNet-tensorflow/CenterNet.py:412: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
WARNING:tensorflow:From /content/gdrive/My Drive/CenterNet-tensorflow/CenterNet.py:422: average_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.average_pooling2d instead.
WARNING:tensorflow:From /content/gdrive/My Drive/CenterNet-tensorflow/CenterNet.py:358: conv2d_transpose (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d_transpose instead.
```

When I press Ctrl-C, the traceback always shows it is still in _build_graph.
I installed tensorflow-gpu 1.14.0.
Help please :((
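A plausible explanation (my reading of the code, not confirmed by the author): the loss builds two `tf.cond` branches per class per image in a Python loop (the `get_loss/cond_649` node in the first issue above comes from that loop), so for many classes times the batch size, graph construction alone can take many minutes. A vectorized sketch of the ground-truth keypoint map using `tf.scatter_nd`, with argument names of my own choosing, avoids the per-class loop:

```python
import tensorflow as tf

def gt_keypoint_map(center_yx, class_id, height, width, num_classes):
    # center_yx: [num_gt, 2] integer (y, x) cells; class_id: [num_gt] int32
    indices = tf.concat(
        [tf.cast(center_yx, tf.int32), tf.expand_dims(class_id, -1)], axis=-1)
    updates = tf.ones([tf.shape(indices)[0]], tf.float32)
    # scatter a 1.0 at every (y, x, class) center in a single op
    dense = tf.scatter_nd(indices, updates, [height, width, num_classes])
    return tf.minimum(dense, 1.)  # duplicate centers add up, clamp back to 1.0
```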

What is the meaning of "pad_truth_to", please?

Hello author, what does the "pad_truth_to" parameter do? It cannot be smaller than the number of objects labeled in one image, right? What would be affected if it were not used?

Thanks
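For what it's worth, the name suggests padding: each image's ground truth is a [num_objects, 5] array of (y, x, h, w, class), and batching requires a fixed first dimension, so every array is plausibly padded up to pad_truth_to rows with dummy entries. A sketch of that interpretation (the -1 fill value is an assumption):

```python
import numpy as np

def pad_truth(gt, pad_truth_to):
    # gt: [num_objects, 5] float array of (y, x, h, w, class) for one image
    assert gt.shape[0] <= pad_truth_to, 'pad_truth_to must cover every object'
    pad = np.full((pad_truth_to - gt.shape[0], 5), -1., dtype=np.float32)
    return np.concatenate([gt, pad], axis=0)
```

That would explain the constraint: with pad_truth_to smaller than the object count, real boxes would have to be truncated.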

Focal loss

Hi,

I see that in line 249 there is:

```python
keypoints_neg_loss = -tf.pow(1.-reduction, 4) * tf.pow(tf.sigmoid(keypoints), 2.) * (-keypoints+tf.log_sigmoid(keypoints)) * (1.-gt_keypoints)
```

My question is about the following part:

```python
(-keypoints + tf.log_sigmoid(keypoints))
```

Shouldn't it be `log(1 - sigmoid(keypoints))`?

Thanks,
Ilya
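For reference, the two expressions are mathematically identical: since 1 - sigmoid(x) = sigmoid(-x) and log sigmoid(-x) = -x + log sigmoid(x), we have log(1 - sigmoid(keypoints)) = -keypoints + log_sigmoid(keypoints); the latter form just avoids taking the log of a value that underflows to 0. A quick numeric check (assumes TF 1.x):

```python
import tensorflow as tf

x = tf.constant([-30., -1., 0., 1., 30.])
naive = tf.log(1. - tf.sigmoid(x))   # underflows to -inf at x = 30
stable = -x + tf.log_sigmoid(x)      # the form used in the repo
with tf.Session() as sess:
    print(sess.run([naive, stable]))
```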

Error occurs when training on my own data

Nice work.
Something like the image below went wrong while I was training; is that right?
[screenshot: snipaste_20190528_160320]
I also want to know how to filter out such broken images when making the tfrecord datasets.
Thank you

Loss NaN

Training a VOC2007 model, the result is good.
But I changed to a dataset with one class and 950 images.
When training, the loss becomes NaN after 4~5 iterations. I checked the dataset, there is no problem, and I changed the learning rate and batch_size, but nothing helps; the loss is always NaN. Has anyone met this before? How can I fix it? Thx
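One frequent NaN source in this kind of loss (an educated guess, not a confirmed diagnosis): a box whose height or width rounds to zero after the stride-4 downsampling makes the gaussian radius 0, and the penalty exp(-d^2 / (2 * sigma^2)) then divides by zero. Flooring sigma is a cheap guard; the helper and epsilon below are mine:

```python
import tensorflow as tf

def safe_keypoint_penalty(sq_dist, sigma, eps=1e-4):
    # sq_dist: squared distance to the center; sigma: gaussian radius
    sigma = tf.maximum(sigma, eps)  # degenerate boxes can give sigma == 0
    return tf.exp(-sq_dist / (2. * sigma ** 2))
```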

Ground truth calculation

Hello @Stick-To ,
Did you calculate the ground-truth quantities (heat map, offset, size) and the focal loss? I think you only did a simulation of training? Thanks

NaN loss

Without modifying any config, after a few epochs (4, 5) I get a NaN loss. If I reduce the batch size it reaches NaN even earlier. I use TF version 1.13.1.

Loss NaN

Hello, I trained on my dataset and also loaded the pretrained weights that you uploaded, but the loss is still NaN.
Can you give me some advice? Thanks.

442368 = 384*384*3: Invalid argument: Input to reshape is a tensor with 442368 values

Hello, thank you for replicating so many networks; it's really great work. This problem occurred when I ran the program, and I don't know what went wrong.

```
Traceback (most recent call last):
  File "test.py", line 74, in <module>
    mean_loss = centernet.train_one_epoch(lr)
  File "/home/training/zcc/CenterNet-tensorflow-master/CenterNet.py", line 298, in train_one_epoch
    _, loss = self.sess.run([self.train_op, self.loss], feed_dict={self.lr: lr})
  File "/home/training/anaconda2/envs/zcc/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  ...
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 442368 values, but the requested shape has 0
	 [[{{node Reshape_2}}]]
	 [[IteratorGetNext]]
	 [[center_detector/cond_5/SparseToDense/_115]]
  (1) Invalid argument: Input to reshape is a tensor with 442368 values, but the requested shape has 0
	 [[{{node Reshape_2}}]]
	 [[IteratorGetNext]]
```
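Since 442368 = 384*384*3 is exactly the flattened image, the failing Reshape most likely received an empty target shape from one record. A hedged way to locate it is to scan the tfrecord file for records whose shape field is missing or contains a zero (the file path and feature name below are assumptions; adjust to how the dataset was written):

```python
import tensorflow as tf

for i, rec in enumerate(tf.python_io.tf_record_iterator('train.tfrecord')):
    example = tf.train.Example.FromString(rec)
    shape = example.features.feature['shape'].int64_list.value
    if len(shape) == 0 or 0 in shape:
        print('suspicious record', i, list(shape))
```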

Trained more than 100 epochs on VOC 07+12 but not converging

I trained on VOC 07+12 with 15000+ images using your code.
After training for more than 100 epochs, the mean loss on the train set is 4.27.
Then I tested this model on the VOC 07 test set and found that the model has not converged:
it cannot detect any objects.

All the hyper-parameters are the same as your settings.

Find output tensor name.

I want to convert this TensorFlow model to ONNX, but for that I need the input and output tensor names. I tried to visualize the graph with TensorBoard, but got no clue. Then I tried Netron, but it crashed. Is there any way to find out the input and output names?
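One way that usually works for TF 1.x graphs (a sketch; the checkpoint path is a placeholder): import the meta-graph and print every operation name, then pick the placeholders as inputs and the last ops of the detection heads as outputs:

```python
import tensorflow as tf

saver = tf.train.import_meta_graph('centernet.ckpt.meta')
graph = tf.get_default_graph()
for op in graph.get_operations():
    print(op.name, op.type)
# ops of type 'Placeholder' are candidate inputs; the final ops under the
# detection scope are candidate outputs for the ONNX converter
```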

Not good results

Hello author. I trained the CenterNet model on VOC2007 with batchsize=3 and epoch=20; the loss has stabilized around 5 and does not decrease with further training. However, when I test on other images, the results are very poor. Is there a problem with my test method, or is there a good solution? Thanks! The code and results follow:

```python
centernet = net.CenterNet(config, trainset_provider)
centernet.load_weight('./centernet/test-50100')
centernet.load_pretrained_weight('./centernet/test-50100')

img = io.imread('./img/person1.jpg')
img = transform.resize(img, [384, 384])
img = np.expand_dims(img, 0)
result = centernet.test_one_image(img)
id_to_clasname = {k: v for (v, k) in classname_to_ids.items()}
scores = result[0]
bbox = result[1]
class_id = result[2]
print(scores, bbox, class_id)
plt.figure(1)
plt.imshow(np.squeeze(img))
axis = plt.gca()
for i in range(len(scores)):
    rect = patches.Rectangle((bbox[i][1], bbox[i][0]),
                             bbox[i][3] - bbox[i][1], bbox[i][2] - bbox[i][0],
                             linewidth=2, edgecolor='b', facecolor='none')
    axis.add_patch(rect)
    plt.text(bbox[i][1], bbox[i][0],
             id_to_clasname[class_id[i]] + ' ' + str(scores[i]),
             color='red', fontsize=12)
plt.show()
```

[screenshot: 选区_026]
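One thing worth checking (my reading of the code, not a confirmed diagnosis): skimage.transform.resize returns floats scaled to [0, 1], while the model's input pipeline divides by 255 again (see `self.images / 255.` in `_define_inputs` in the next issue's code), so the test image ends up normalized twice. A sketch of the fix:

```python
import numpy as np
from skimage import io, transform

img = io.imread('./img/person1.jpg')
img = transform.resize(img, [384, 384]) * 255.  # undo skimage's [0, 1] scaling
img = np.expand_dims(img.astype(np.float32), 0)
```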

When I use Estimator with your model, the global step does not increase; it is always 0.


```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
import numpy as np
import sys
import os

def estimator_model_fn(features, labels, mode, params):
    img = features['img']
    model = CenterNet(params['config'], params['is_train'])
    if mode == tf.estimator.ModeKeys.EVAL:
        model.build_whole_network(img, labels)
        loss = model.loss  # assign before printing, or print(loss) raises NameError
        print("======== loss =========")
        print(loss)
        return tf.estimator.EstimatorSpec(mode, loss=loss)

    if mode == tf.estimator.ModeKeys.TRAIN:
        model.build_whole_network(img, labels)
        loss = model.loss
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        if update_ops:
            with tf.control_dependencies(update_ops):
                train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
                return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

    if mode == tf.estimator.ModeKeys.PREDICT:
        final_bbox, final_scores, final_category = model.build_whole_network(img, None)
        predictions = {
            'predict_boxes': model.detection_pred[1],
            'predict_scores': model.detection_pred[0],
            'predict_category': model.detection_pred[2]
        }
        return tf.estimator.EstimatorSpec(mode, predictions=predictions)

class CenterNet:
    def __init__(self, config = {}, is_training=True):
        self.is_training = is_training  # whether in training mode
        self.config = config
        assert config['mode'] in ['train', 'test']
        assert config['data_format'] in ['channels_first', 'channels_last']
        self.config = config
        self.input_size = config['input_size']
        if config['data_format'] == 'channels_last':
            self.data_shape = [self.input_size, self.input_size, 3]
        else:
            self.data_shape = [3, self.input_size, self.input_size]
        self.num_classes = config['num_classes']
        self.weight_decay = config['weight_decay']
        self.prob = 1. - config['keep_prob']
        self.data_format = config['data_format']
        self.mode = config['mode']
        self.batch_size = config['batch_size'] if config['mode'] == 'train' else 1

        if self.mode != 'train':
            self.score_threshold = config['score_threshold']
            self.top_k_results_output = config['top_k_results_output']

        #self.global_step = tf.get_variable(name='global_step', initializer=tf.constant(0), trainable=False)
        #self.is_training = True

    def build_whole_network(self, input_image_batch, gtboxes_batch=None):
        #print("input_image_batch", input_image_batch)
        if self.is_training:
            print("=====is_training======")
            print(gtboxes_batch)
            gtboxes_batch = tf.reshape(gtboxes_batch, [1, -1, 5])  # last dimension holds the class id
            gtboxes_batch = tf.cast(gtboxes_batch, tf.float32)

        img_shape = tf.shape(input_image_batch)

        with tf.variable_scope('backone'):
            conv = self._conv_bn_activation(
                bottom=input_image_batch,
                filters=16,
                kernel_size=7,
                strides=1,
            )
            conv = self._conv_bn_activation(
                bottom=conv,
                filters=16,
                kernel_size=3,
                strides=1,
            )
            conv = self._conv_bn_activation(
                bottom=conv,
                filters=32,
                kernel_size=3,
                strides=2,
            )
            dla_stage3 = self._dla_generator(conv, 64, 1, self._basic_block)
            dla_stage3 = self._max_pooling(dla_stage3, 2, 2)

            dla_stage4 = self._dla_generator(dla_stage3, 128, 2, self._basic_block)
            residual = self._conv_bn_activation(dla_stage3, 128, 1, 1)
            residual = self._avg_pooling(residual, 2, 2)
            dla_stage4 = self._max_pooling(dla_stage4, 2, 2)
            dla_stage4 = dla_stage4 + residual

            dla_stage5 = self._dla_generator(dla_stage4, 256, 2, self._basic_block)
            residual = self._conv_bn_activation(dla_stage4, 256, 1, 1)
            residual = self._avg_pooling(residual, 2, 2)
            dla_stage5 = self._max_pooling(dla_stage5, 2, 2)
            dla_stage5 = dla_stage5 + residual

            dla_stage6 = self._dla_generator(dla_stage5, 512, 1, self._basic_block)
            residual = self._conv_bn_activation(dla_stage5, 512, 1, 1)
            residual = self._avg_pooling(residual, 2, 2)
            dla_stage6 = self._max_pooling(dla_stage6, 2, 2)
            dla_stage6 = dla_stage6 + residual
        with tf.variable_scope('upsampling'):
            dla_stage6 = self._conv_bn_activation(dla_stage6, 256, 1, 1)
            dla_stage6_5 = self._dconv_bn_activation(dla_stage6, 256, 4, 2)
            dla_stage6_4 = self._dconv_bn_activation(dla_stage6_5, 256, 4, 2)
            dla_stage6_3 = self._dconv_bn_activation(dla_stage6_4, 256, 4, 2)

            dla_stage5 = self._conv_bn_activation(dla_stage5, 256, 1, 1)
            #print("--- dla_stage5, dla_stage6_5 ---")
            #print(dla_stage5, dla_stage6_5)
            dla_stage5_4 = self._conv_bn_activation(dla_stage5 + dla_stage6_5, 256, 3, 1)
            dla_stage5_4 = self._dconv_bn_activation(dla_stage5_4, 256, 4, 2)
            dla_stage5_3 = self._dconv_bn_activation(dla_stage5_4, 256, 4, 2)

            dla_stage4 = self._conv_bn_activation(dla_stage4, 256, 1, 1)
            dla_stage4_3 = self._conv_bn_activation(dla_stage4 + dla_stage5_4 + dla_stage6_4, 256, 3, 1)
            dla_stage4_3 = self._dconv_bn_activation(dla_stage4_3, 256, 4, 2)

            features = self._conv_bn_activation(dla_stage6_3 + dla_stage5_3 + dla_stage4_3, 256, 3, 1)
            features = self._conv_bn_activation(features, 256, 1, 1)
            stride = 4.0

        with tf.variable_scope('center_detector'):
            keypoints = self._conv_bn_activation(features, self.num_classes, 3, 1, None)
            offset = self._conv_bn_activation(features, 2, 3, 1, None)
            size = self._conv_bn_activation(features, 2, 3, 1, None)
            if self.data_format == 'channels_first':
                keypoints = tf.transpose(keypoints, [0, 2, 3, 1])
                offset = tf.transpose(offset, [0, 2, 3, 1])
                size = tf.transpose(size, [0, 2, 3, 1])
            pshape = [tf.shape(offset)[1], tf.shape(offset)[2]]

            h = tf.range(0., tf.cast(pshape[0], tf.float32), dtype=tf.float32)
            w = tf.range(0., tf.cast(pshape[1], tf.float32), dtype=tf.float32)
            [meshgrid_x, meshgrid_y] = tf.meshgrid(w, h)
            if self.mode == 'train':
                total_loss = []
                print("---------------------")
                print("keypoints", keypoints)
                print("gtboxes_batch", gtboxes_batch)
                for i in range(self.batch_size):
                    loss = self._compute_one_image_loss(keypoints[i, ...], offset[i, ...], size[i, ...],
                                                        gtboxes_batch[i], meshgrid_y, meshgrid_x,
                                                        stride, pshape)
                    total_loss.append(loss)

                #self.loss = total_loss[0]# +
                self.loss = self.weight_decay * tf.add_n([tf.nn.l2_loss(var) for var in tf.trainable_variables()])
                #self.loss = tf.ones([2], dtype=tf.float32)[0]
            else:
                keypoints = tf.sigmoid(keypoints)
                meshgrid_y = tf.expand_dims(meshgrid_y, axis=-1)
                meshgrid_x = tf.expand_dims(meshgrid_x, axis=-1)
                center = tf.concat([meshgrid_y, meshgrid_x], axis=-1)
                category = tf.expand_dims(tf.squeeze(tf.argmax(keypoints, axis=-1, output_type=tf.int32)), axis=-1)
                meshgrid_xyz = tf.concat([tf.zeros_like(category), tf.cast(center, tf.int32), category], axis=-1)
                keypoints = tf.gather_nd(keypoints, meshgrid_xyz)
                keypoints = tf.expand_dims(keypoints, axis=0)
                keypoints = tf.expand_dims(keypoints, axis=-1)
                keypoints_peak = self._max_pooling(keypoints, 3, 1)
                keypoints_mask = tf.cast(tf.equal(keypoints, keypoints_peak), tf.float32)
                keypoints = keypoints * keypoints_mask
                scores = tf.reshape(keypoints, [-1])
                class_id = tf.reshape(category, [-1])
                bbox_yx = tf.reshape(center + offset, [-1, 2])
                bbox_hw = tf.reshape(size, [-1, 2])
                score_mask = scores > self.score_threshold
                scores = tf.boolean_mask(scores, score_mask)
                class_id = tf.boolean_mask(class_id, score_mask)
                bbox_yx = tf.boolean_mask(bbox_yx, score_mask)
                bbox_hw = tf.boolean_mask(bbox_hw, score_mask)
                bbox = tf.concat([bbox_yx - bbox_hw / 2., bbox_yx + bbox_hw / 2.], axis=-1) * stride
                num_select = tf.cond(tf.shape(scores)[0] > self.top_k_results_output, lambda: self.top_k_results_output,
                                     lambda: tf.shape(scores)[0])
                select_scores, select_indices = tf.nn.top_k(scores, num_select)
                select_class_id = tf.gather(class_id, select_indices)
                select_bbox = tf.gather(bbox, select_indices)
                self.detection_pred = [select_scores, select_bbox, select_class_id]


    def _define_inputs(self):
        shape = [self.batch_size]
        shape.extend(self.data_shape)
        mean = tf.convert_to_tensor([0.485, 0.456, 0.406], dtype=tf.float32)
        std = tf.convert_to_tensor([0.229, 0.224, 0.225], dtype=tf.float32)
        if self.data_format == 'channels_last':
            mean = tf.reshape(mean, [1, 1, 1, 3])
            std = tf.reshape(std, [1, 1, 1, 3])
        else:
            mean = tf.reshape(mean, [1, 3, 1, 1])
            std = tf.reshape(std, [1, 3, 1, 1])
        if self.mode == 'train':
            self.images, self.ground_truth = self.train_iterator.get_next()
            print("load ground_truth.shape", self.ground_truth)
            self.images.set_shape(shape)
            self.images = (self.images / 255. - mean) / std
        else:
            self.images = tf.placeholder(tf.float32, shape, name='images')
            self.images = (self.images / 255. - mean) / std
            self.ground_truth = tf.placeholder(tf.float32, [self.batch_size, None, 5], name='labels')
        self.lr = tf.placeholder(dtype=tf.float32, shape=[], name='lr')

    def _compute_one_image_loss(self, keypoints, offset, size, ground_truth, meshgrid_y, meshgrid_x,
                                stride, pshape):
        #ground_truth = tf.reshape(ground_truth, [-1, 5])
        print("reshape to [1, 5] ground_truth", ground_truth)
        #slice_index = tf.argmin(ground_truth, axis=0)[0]
        #ground_truth = tf.gather(ground_truth, tf.range(0, slice_index, dtype=tf.int64))
        ngbbox_y = ground_truth[..., 0] / stride
        ngbbox_x = ground_truth[..., 1] / stride
        ngbbox_h = ground_truth[..., 2] / stride
        ngbbox_w = ground_truth[..., 3] / stride
        class_id = tf.cast(ground_truth[..., 4], dtype=tf.int32)
        ngbbox_yx = ground_truth[..., 0:2] / stride
        ngbbox_yx_round = tf.floor(ngbbox_yx)
        offset_gt = ngbbox_yx - ngbbox_yx_round
        size_gt = ground_truth[..., 2:4] / stride
        ngbbox_yx_round_int = tf.cast(ngbbox_yx_round, tf.int64)
        keypoints_loss = self._keypoints_loss(keypoints, ngbbox_yx_round_int, ngbbox_y, ngbbox_x, ngbbox_h,
                                              ngbbox_w, class_id, meshgrid_y, meshgrid_x, pshape)

        offset = tf.gather_nd(offset, ngbbox_yx_round_int)
        size = tf.gather_nd(size, ngbbox_yx_round_int)
        offset_loss = tf.reduce_mean(tf.abs(offset_gt - offset))
        size_loss = tf.reduce_mean(tf.abs(size_gt - size))
        total_loss = keypoints_loss + 0.1*size_loss + offset_loss
        print("=================================")
        print("total_loss", total_loss)
        return total_loss
        #return 0.1

    def _keypoints_loss(self, keypoints, gbbox_yx, gbbox_y, gbbox_x, gbbox_h, gbbox_w,
                        classid, meshgrid_y, meshgrid_x, pshape):
        sigma = self._gaussian_radius(gbbox_h, gbbox_w, 0.7)
        gbbox_y = tf.reshape(gbbox_y, [-1, 1, 1])
        gbbox_x = tf.reshape(gbbox_x, [-1, 1, 1])
        sigma = tf.reshape(sigma, [-1, 1, 1])

        num_g = tf.shape(gbbox_y)[0]
        meshgrid_y = tf.expand_dims(meshgrid_y, 0)
        meshgrid_y = tf.tile(meshgrid_y, [num_g, 1, 1])
        meshgrid_x = tf.expand_dims(meshgrid_x, 0)
        meshgrid_x = tf.tile(meshgrid_x, [num_g, 1, 1])

        keyp_penalty_reduce = tf.exp(-((gbbox_y-meshgrid_y)**2 + (gbbox_x-meshgrid_x)**2)/(2*sigma**2))
        zero_like_keyp = tf.expand_dims(tf.zeros(pshape, dtype=tf.float32), axis=-1)
        reduction = []
        gt_keypoints = []
        for i in range(self.num_classes):
            exist_i = tf.equal(classid, i)
            reduce_i = tf.boolean_mask(keyp_penalty_reduce, exist_i, axis=0)
            reduce_i = tf.cond(
                tf.equal(tf.shape(reduce_i)[0], 0),
                lambda: zero_like_keyp,
                lambda: tf.expand_dims(tf.reduce_max(reduce_i, axis=0), axis=-1)
            )
            reduction.append(reduce_i)

            gbbox_yx_i = tf.boolean_mask(gbbox_yx, exist_i)
            gt_keypoints_i = tf.cond(
                tf.equal(tf.shape(gbbox_yx_i)[0], 0),
                lambda: zero_like_keyp,
                lambda: tf.expand_dims(tf.sparse.to_dense(tf.sparse.SparseTensor(gbbox_yx_i, tf.ones_like(gbbox_yx_i[..., 0], tf.float32), dense_shape=pshape), validate_indices=False),
                                       axis=-1)
            )
            gt_keypoints.append(gt_keypoints_i)
        reduction = tf.concat(reduction, axis=-1)
        gt_keypoints = tf.concat(gt_keypoints, axis=-1)
        keypoints_pos_loss = -tf.pow(1.-tf.sigmoid(keypoints), 2.) * tf.log_sigmoid(keypoints) * gt_keypoints
        keypoints_neg_loss = -tf.pow(1.-reduction, 4) * tf.pow(tf.sigmoid(keypoints), 2.) * (-keypoints+tf.log_sigmoid(keypoints)) * (1.-gt_keypoints)
        keypoints_loss = tf.reduce_sum(keypoints_pos_loss) / tf.cast(num_g, tf.float32) + tf.reduce_sum(keypoints_neg_loss) / tf.cast(num_g, tf.float32)
        return keypoints_loss

    # from cornernet
    def _gaussian_radius(self, height, width, min_overlap=0.7):
        a1 = 1.
        b1 = (height + width)
        c1 = width * height * (1. - min_overlap) / (1. + min_overlap)
        sq1 = tf.sqrt(b1 ** 2. - 4. * a1 * c1)
        r1 = (b1 + sq1) / 2.
        a2 = 4.
        b2 = 2. * (height + width)
        c2 = (1. - min_overlap) * width * height
        sq2 = tf.sqrt(b2 ** 2. - 4. * a2 * c2)
        r2 = (b2 + sq2) / 2.
        a3 = 4. * min_overlap
        b3 = -2. * min_overlap * (height + width)
        c3 = (min_overlap - 1.) * width * height
        sq3 = tf.sqrt(b3 ** 2. - 4. * a3 * c3)
        r3 = (b3 + sq3) / 2.
        return tf.reduce_min([r1, r2, r3])

    def _create_summary(self):
        with tf.variable_scope('summaries'):
            tf.summary.scalar('loss', self.loss)
            self.summary_op = tf.summary.merge_all()

    '''def load_weight(self, path):
        self.saver.restore(self.sess, path)
        print('load weight', path, 'successfully')

    def load_pretrained_weight(self, path):
        self.pretrained_saver.restore(self.sess, path)
        print('load pretrained weight', path, 'successfully')
    '''

    def _bn(self, bottom):
        bn = tf.layers.batch_normalization(
            inputs=bottom,
            axis=3 if self.data_format == 'channels_last' else 1,
            training=self.is_training
        )
        return bn

    def _conv_bn_activation(self, bottom, filters, kernel_size, strides, activation=tf.nn.relu):
        conv = tf.layers.conv2d(
            inputs=bottom,
            filters=filters,
            kernel_size=kernel_size,
            strides=strides,
            padding='same',
            data_format=self.data_format
        )
        bn = self._bn(conv)
        if activation is not None:
            return activation(bn)
        else:
            return bn

    def _dconv_bn_activation(self, bottom, filters, kernel_size, strides, activation=tf.nn.relu):
        conv = tf.layers.conv2d_transpose(
            inputs=bottom,
            filters=filters,
            kernel_size=kernel_size,
            strides=strides,
            padding='same',
            data_format=self.data_format,
        )
        bn = self._bn(conv)
        if activation is not None:
            bn = activation(bn)
        return bn

    def _separable_conv_layer(self, bottom, filters, kernel_size, strides, activation=tf.nn.relu):
        conv = tf.layers.separable_conv2d(
            inputs=bottom,
            filters=filters,
            kernel_size=kernel_size,
            strides=strides,
            padding='same',
            data_format=self.data_format,
            use_bias=False,
        )
        bn = self._bn(conv)
        if activation is not None:
            bn = activation(bn)
        return bn

    def _basic_block(self, bottom, filters):
        conv = self._conv_bn_activation(bottom, filters, 3, 1)
        conv = self._conv_bn_activation(conv, filters, 3, 1)
        axis = 3 if self.data_format == 'channels_last' else 1
        input_channels = tf.shape(bottom)[axis]
        shortcut = tf.cond(
            tf.equal(input_channels, filters),
            lambda: bottom,
            lambda: self._conv_bn_activation(bottom, filters, 1, 1)
        )
        return conv + shortcut

    def _dla_generator(self, bottom, filters, levels, stack_block_fn):
        if levels == 1:
            block1 = stack_block_fn(bottom, filters)
            block2 = stack_block_fn(block1, filters)
            aggregation = block1 + block2
            aggregation = self._conv_bn_activation(aggregation, filters, 3, 1)
        else:
            block1 = self._dla_generator(bottom, filters, levels-1, stack_block_fn)
            block2 = self._dla_generator(block1, filters, levels-1, stack_block_fn)
            aggregation = block1 + block2
            aggregation = self._conv_bn_activation(aggregation, filters, 3, 1)
        return aggregation

    def _max_pooling(self, bottom, pool_size, strides, name=None):
        return tf.layers.max_pooling2d(
            inputs=bottom,
            pool_size=pool_size,
            strides=strides,
            padding='same',
            data_format=self.data_format,
            name=name
        )

    def _avg_pooling(self, bottom, pool_size, strides, name=None):
        return tf.layers.average_pooling2d(
            inputs=bottom,
            pool_size=pool_size,
            strides=strides,
            padding='same',
            data_format=self.data_format,
            name=name
        )

    def _dropout(self, bottom, name):
        return tf.layers.dropout(
            inputs=bottom,
            rate=self.prob,
            training=self.is_training,
            name=name
        )
```
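Two things worth checking (both are my guesses from the posted code, not confirmed): first, `tf.train.get_global_step()` returns None when no global-step variable exists in the graph being built, and `minimize(loss, global_step=None)` then increments nothing, so fetching or creating the step explicitly is the usual fix; second, `self.loss` above is only the weight-decay term (`total_loss[0]` is commented out), so even a moving step would not train the detector. A minimal sketch of the first fix:

```python
import tensorflow as tf

loss = tf.reduce_sum(tf.get_variable('w', [2]) ** 2)  # stand-in loss
global_step = tf.train.get_or_create_global_step()    # fetch or create the counter
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss, global_step=global_step)
```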

Pre-trained weights

How do you use the pretrained weights?
I always get an error stating that `backone` can't be found in the VGG-16 checkpoint.

Could anyone provide trained weights?
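The error is plausibly a scope mismatch: this repo builds its variables under the scope `backone`, so a VGG-16 checkpoint has no tensors with matching names. A hedged workaround is to restore only the intersection of graph and checkpoint variables (the checkpoint path is a placeholder):

```python
import tensorflow as tf

ckpt = 'vgg_16.ckpt'
reader = tf.train.NewCheckpointReader(ckpt)
ckpt_vars = reader.get_variable_to_shape_map()
restorable = [v for v in tf.global_variables() if v.op.name in ckpt_vars]
saver = tf.train.Saver(var_list=restorable)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # init everything first
    saver.restore(sess, ckpt)                    # then overwrite the matches
```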

Training on COCO dataset

Hi, thanks for sharing your code!

1. Have you compared your results with the author's results on Pascal VOC?

2. Have you trained on the COCO dataset? I trained a ResNet-50 backbone on COCO using your code, and I found that the size loss is extremely hard to converge.

3. Could you share your training loss curve and training hyper-parameters such as the learning rate?

Thank you very much!

what backbone is used?

Hey, I'm curious which backbone network you used and how easy it would be to replace it with something else?
