
keras-resnet's Introduction

keras-resnet


A residual networks implementation using the Keras 1.0 functional API. It works with both the Theano and TensorFlow backends and with both 'th' and 'tf' image dim orderings.

The original articles

Residual blocks

The residual blocks are based on the improved scheme proposed in Identity Mappings in Deep Residual Networks, as shown in figure (b):

Residual Block Scheme

Both bottleneck and basic residual blocks are supported. To switch between them, simply provide the appropriate block function here.
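For example, a minimal sketch (assuming resnet.py exposes basic_block and bottleneck, and using the build signature that appears in the issue tracebacks below):

    from resnet import ResnetBuilder, basic_block, bottleneck

    # ResNet-18 stacks basic blocks with repetitions [2, 2, 2, 2]...
    model = ResnetBuilder.build((3, 224, 224), 1000, basic_block, [2, 2, 2, 2])

    # ...while ResNet-50 stacks bottleneck blocks with repetitions [3, 4, 6, 3].
    model = ResnetBuilder.build((3, 224, 224), 1000, bottleneck, [3, 4, 6, 3])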

Code Walkthrough

The architecture is based on the 50-layer sample (snippet from the paper):

Architecture Reference

There are two key aspects to note here:

  1. conv2_1 has a stride of (1, 1), while the remaining conv layers have a stride of (2, 2) at the beginning of their block. This is expressed in the following lines.
  2. At the end of the first skip connection of a block, there is a mismatch in num_filters, width and height at the merge layer. This is addressed in _shortcut by using a 1x1 conv with an appropriate stride (see the sketch below). In the remaining cases, the input is merged directly with the residual block as an identity.
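A condensed sketch of that logic (paraphrasing the _shortcut snippets quoted in the issues below; ROW_AXIS, COL_AXIS and CHANNEL_AXIS are assumed module-level constants that follow the image dim ordering):

    from keras import backend as K
    from keras.layers import Conv2D, add
    from keras.regularizers import l2

    def _shortcut(input, residual):
        # Project the input with a strided 1x1 conv only when num_filters,
        # width or height disagree; otherwise merge the input as identity.
        input_shape = K.int_shape(input)
        residual_shape = K.int_shape(residual)
        stride_width = int(round(input_shape[ROW_AXIS] / residual_shape[ROW_AXIS]))
        stride_height = int(round(input_shape[COL_AXIS] / residual_shape[COL_AXIS]))
        equal_channels = input_shape[CHANNEL_AXIS] == residual_shape[CHANNEL_AXIS]

        shortcut = input
        if stride_width > 1 or stride_height > 1 or not equal_channels:
            shortcut = Conv2D(filters=residual_shape[CHANNEL_AXIS],
                              kernel_size=(1, 1),
                              strides=(stride_width, stride_height),
                              padding="valid",
                              kernel_initializer="he_normal",
                              kernel_regularizer=l2(0.0001))(input)
        return add([shortcut, residual])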

ResNetBuilder factory

  • Use the ResNetBuilder build methods to build standard ResNet architectures with your own input shape. It will auto-calculate paddings and final pooling layer filters for you (see the usage sketch after this list).
  • Use the generic build method to set up your own architecture.
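A typical call might look like this (a sketch; note that recent revisions of resnet.py spell the class ResnetBuilder, which trips up some users, as one of the issues below shows):

    from resnet import ResnetBuilder

    # Standard architecture with a custom (channels, rows, cols) input shape.
    model = ResnetBuilder.build_resnet_50((3, 224, 224), 1000)
    model.compile(loss="categorical_crossentropy", optimizer="sgd")
    model.summary()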

Cifar10 Example

Includes a cifar10 training example. Achieves ~86% accuracy using the ResNet18 model.

cifar10_convergence

Note that ResNet18 as implemented doesn't really seem appropriate for CIFAR-10, as the last two residual stages end up operating on 1x1 feature maps because of the downsampling (stride). This is worse for deeper versions. A smaller, modified ResNet-like architecture achieves ~92% accuracy (see gist).
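A condensed sketch of what cifar10.py does (the real script adds data augmentation, learning-rate scheduling and early stopping; the preprocessing shown matches the mean-image scheme referenced in the issues below):

    import numpy as np
    from keras.datasets import cifar10
    from keras.utils import np_utils
    import resnet

    (X_train, y_train), (X_test, y_test) = cifar10.load_data()
    Y_train = np_utils.to_categorical(y_train, 10)
    Y_test = np_utils.to_categorical(y_test, 10)

    # Normalize with the mean image of the training set.
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    mean_image = np.mean(X_train, axis=0)
    X_train = (X_train - mean_image) / 128.
    X_test = (X_test - mean_image) / 128.

    model = resnet.ResnetBuilder.build_resnet_18((3, 32, 32), 10)
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(X_train, Y_train, batch_size=32,
              validation_data=(X_test, Y_test))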

keras-resnet's People

Contributors

afourast, jefferyrprice, jihongju, paulfitz, raghakot, shelpuk, stas-sl


keras-resnet's Issues

Adapt Keras 2 API

Keras has just released Keras 2, which introduces significant API changes, and the old API will no longer be supported from August 2017. The main difference that matters here is the renaming of arguments.

I am kind of annoyed by the warnings that pop up every time before training, so I changed those names myself, and I am willing to send a PR for those changes.

See PR #39

Stuck in first epoch

The epoch doesn't start off; the fitting of the model doesn't happen. Tried it on a local machine as well as on Google Colab.

fit_generator gets type None

Exception: output of generator should be a tuple (x, y, sample_weight) or (x, y). Found: None
Why am I getting this error? When I use other models I don't get it. The input is (-1, 1, 224, 224), the output is (1, 8), and I adjusted them to the model too.

the pool mode issue in the last pool layer

I noticed the last pooling layer in your network:
pool2 = AveragePooling2D(pool_size=(7, 7), strides=(1, 1), border_mode="same")(block4)
I checked, and the output size will still be 7x7 because of mode="same". However, the paper says the output size should be 1. Should the mode be changed to "valid" here?
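For reference, a sketch of the suggested fix: with a 7x7 input, border_mode="same" pads so the output stays 7x7, while border_mode="valid" gives the 1x1 output described in the paper.

    from keras.layers import AveragePooling2D

    pool2 = AveragePooling2D(pool_size=(7, 7), strides=(1, 1),
                             border_mode="valid")(block4)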

Stride cannot be greater than or equal to filter size

I think there is a bug: the filter size should be greater than the subsample length. After block1, init_subsample is 2x2.
line no. 48:
conv_1_1 = _bn_relu_conv(nb_filters, 1, 1, subsample=init_subsample)(input)
filter size = (1,1)
stride = (2,2)
for block2, 3 and 4
block2 = _residual_block(block_fn, nb_filters=128, repetations=4)(block1)
block3 = _residual_block(block_fn, nb_filters=256, repetations=6)(block2)
block4 = _residual_block(block_fn, nb_filters=512, repetations=3)(block3)

def f(input):
    for i in range(repetations):
        init_subsample = (1, 1)
        if i == 0 and not is_first_layer:
            init_subsample = (2, 2)  # <---
        input = block_function(nb_filters=nb_filters, init_subsample=init_subsample)(input)
    return input

Full pre-activation implementation deviates from paper

(This is a very nice implementation - thanks for sharing. I stumbled across here searching different ResNet implementations, especially those that have adopted the improved pre-activation stages.)

From the paper by He et al (Identity Mappings in Deep Residual Networks), there is a perhaps subtle implementation detail that doesn't appear to be implemented here. In particular, the paper states:

When using the pre-activation Residual Units (Fig. 4(d)(e) and Fig. 5), we pay special attention to the first and the last Residual Units of the entire network. For the first Residual Unit (that follows a stand-alone convolutional layer, conv1), we adopt the first activation right after conv1 and before splitting into two paths; for the last Residual Unit (followed by average pooling and a fully-connected classifier), we adopt an extra activation right after its element-wise addition.

It seems that using the new pre-activation bottleneck stages as implemented here results in conv-bn-relu-bn-relu-... entering the bottleneck stages (a back-to-back bn-relu), and that there is no activation, either relu or (probably better) bn-relu, heading into the average pooling.

FWIW, I have a much less polished adaptation that achieves 85.7% on CIFAR-10 with about 180K parameters (10 bottleneck stages, 31 total convolutional layers).

I might also suggest setting bias=False for any convolutional or dense layer followed by batchnorm, since the batchnorm bias parameter makes that bias redundant (see the sketch below).
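A sketch of that suggestion in the Keras 1 API used here:

    from keras.layers import Convolution2D, BatchNormalization, Activation

    # The conv bias is redundant when batchnorm follows, because BN's beta
    # parameter plays the same role; disable it with bias=False.
    conv = Convolution2D(nb_filters, 3, 3, border_mode="same",
                         init="he_normal", bias=False)(input)
    norm = BatchNormalization(axis=1)(conv)  # axis=1 for 'th' dim ordering
    activation = Activation("relu")(norm)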

Why Conv2_1 has stride of (1,1)?

Hello,
Thanks for your repo.
It's written in the readme that:

conv2_1 has stride of (1, 1) while remaining conv layers has stride (2, 2) at the beginning of the block. This fact is expressed in the following lines.

However, I don't quite understand how the linked lines explain why conv2_1 has a stride of (1, 1). Actually, I have been curious about that for a long time. Could you please explain it to me?
Thank you! 😄

Last layer activation

Maybe you should use a sigmoid activation instead of softmax for the last layer if num_outputs == 1?
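i.e., something along these lines (a sketch against the final Dense layer; flatten1 is the assumed name of the preceding flatten output):

    from keras.layers import Dense

    # Binary output: sigmoid; multi-class output: softmax.
    activation = "sigmoid" if num_outputs == 1 else "softmax"
    dense = Dense(output_dim=num_outputs, init="he_normal",
                  activation=activation)(flatten1)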

Only layers of same output shape can be merged using sum mode.

Using input shape (3, 300, 300), with 5 labels:
model = ResNetBuilder.build_resnet_50((3, 300, 300), 5)

I get the error:
Exception: Only layers of same output shape can be merged using sum mode. Layer shapes: [(None, 1, 75, 64), (None, 1, 75, 256)]

What input sizes can be used?

K.set_dim_ordering runtime errors

This works fine:

from resnet import ResNetBuilder

model = ResNetBuilder.build_resnet_18((3, 224, 224), 1000)
model.compile(loss="categorical_crossentropy", optimizer="sgd")
model.summary()

But this does not:

from resnet import ResNetBuilder

from keras import backend as K
K.set_image_dim_ordering('th')

model = ResNetBuilder.build_resnet_18((3, 224, 224), 1000)
model.compile(loss="categorical_crossentropy", optimizer="sgd")
model.summary()

Errors:

Using Theano backend. Using gpu device 0: GeForce GTX 750 Ti (CNMeM is disabled, cuDNN 5105)
/......./site-packages/keras/backend/theano_backend.py:1500: UserWarning: DEPRECATION: the 'ds' parameter is not going to exist anymore as it is going to be replaced by the parameter 'ws'. mode='max')
/......./site-packages/keras/backend/theano_backend.py:1500: UserWarning: DEPRECATION: the 'st' parameter is not going to exist anymore as it is going to be replaced by the parameter 'stride'. mode='max')
/......./site-packages/keras/backend/theano_backend.py:1500: UserWarning: DEPRECATION: the 'padding' parameter is not going to exist anymore as it is going to be replaced by the parameter 'pad'. mode='max')
Traceback (most recent call last):
  File "testResnet.py", line 6, in <module>
    model = ResNetBuilder.build_resnet_18((3, 224, 224), 1000)
  File "........resnet.py", line 154, in build_resnet_18
    return ResNetBuilder.build(input_shape, num_outputs, basic_block, [2, 2, 2, 2])
  File "........resnet.py", line 139, in build
    block = _residual_block(block_fn, nb_filters=nb_filters, repetitions=r, is_first_layer=i == 0)(block)
  File "........resnet.py", line 77, in f
    input = block_function(nb_filters=nb_filters, init_subsample=init_subsample)(input)
  File "........resnet.py", line 90, in f
    return _shortcut(input, residual)
  File "........resnet.py", line 65, in _shortcut
    init="he_normal", border_mode="valid")(input)
  File "/......./site-packages/keras/engine/topology.py", line 514, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/......./site-packages/keras/engine/topology.py", line 572, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/......./site-packages/keras/engine/topology.py", line 152, in create_node
    output_shapes = to_list(outbound_layer.get_output_shape_for(input_shapes[0]))
  File "/......./site-packages/keras/layers/convolutional.py", line 451, in get_output_shape_for
    self.border_mode, self.subsample[0])
  File "/......./site-packages/keras/utils/np_utils.py", line 131, in conv_output_length
    return (output_length + stride - 1) // stride
ZeroDivisionError: integer division or modulo by zero

Error CNTK

I get an error when I use CNTK as the backend.

Using CNTK backend
Traceback (most recent call last):
  File "c:\Users\malan\Documents\Python Scripts\keras-resnet-master\cifar10.py", line 49, in <module>
    model = resnet.ResnetBuilder.build_resnet_18((img_channels, img_rows, img_cols), nb_classes)
  File "c:\Users\malan\Documents\Python Scripts\keras-resnet-master\resnet.py", line 233, in build_resnet_18
    return ResnetBuilder.build(input_shape, num_outputs, basic_block, [2, 2, 2, 2])
  File "c:\Users\malan\Documents\Python Scripts\keras-resnet-master\resnet.py", line 214, in build
    block = _residual_block(block_fn, filters=filters, repetitions=r, is_first_layer=(i == 0))(block)
  File "c:\Users\malan\Documents\Python Scripts\keras-resnet-master\resnet.py", line 104, in f
    is_first_block_of_first_layer=(is_first_layer and i == 0))(input)
  File "c:\Users\malan\Documents\Python Scripts\keras-resnet-master\resnet.py", line 127, in f
    residual = bn_relu_conv(filters=filters, kernel_size=(3, 3))(conv1)
  File "c:\Users\malan\Documents\Python Scripts\keras-resnet-master\resnet.py", line 65, in f
    kernel_regularizer=kernel_regularizer)(activation)
  File "C:\Users\malan\Anaconda3\envs\cntkkeraspy35\Lib\site-packages\keras\engine\topology.py", line 596, in __call__
    output = self.call(inputs, **kwargs)
  File "C:\Users\malan\Anaconda3\envs\cntkkeraspy35\Lib\site-packages\keras\layers\convolutional.py", line 164, in call
    dilation_rate=self.dilation_rate)
  File "C:\Users\malan\Anaconda3\envs\cntkkeraspy35\Lib\site-packages\keras\backend\cntk_backend.py", line 1340, in conv2d
    padding])
  File "C:\Users\malan\Anaconda3\envs\cntkkeraspy35\Lib\site-packages\cntk\internal\swig_helper.py", line 69, in wrapper
    result = f(*args, **kwds)
  File "C:\Users\malan\Anaconda3\envs\cntkkeraspy35\Lib\site-packages\cntk\ops\__init__.py", line 253, in convolution
    max_temp_mem_size_in_samples, name)
ValueError: Convolution operation requires that kernel dim 3 <= input dim 2.
[CALL STACK]
> Microsoft::MSR::CNTK::Matrix:: __autoclassinit2
- CNTK::NDMask:: MaskedCount (x2)
- CNTK::Function:: ~Function
- CNTK::Evaluator:: TestMinibatch
- RtlRunOnceExecuteOnce
- InitOnceExecuteOnce
- _crtInitOnceExecuteOnce
- CNTK::Evaluator:: TestMinibatch
- CNTK::Function:: RawOutputs
- CNTK::DeviceDescriptor:: UseDefaultDevice
- CNTK::Function:: ~Function
- CNTK::Evaluator:: TestMinibatch
- RtlRunOnceExecuteOnce
- InitOnceExecuteOnce
- _crtInitOnceExecuteOnce

Where is the model saved?

I ran cifar10.py, but the model was not saved. I tried model.save, but it doesn't have that method. I want to know how to save the model.

Error with building resnet

On Keras 1.0.5 on my notebook (Mac) everything is fine. But on my Linux machine today I downloaded Keras 1.0.6 and see this error:
...
Average pooling, from (3,3) to (1,1)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 2489, in prod
    prod = a.prod
AttributeError: 'list' object has no attribute 'prod'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "example.py", line 167, in <module>
    model = get_residual_model(is_mnist=is_mnist, img_channels=img_channels, img_rows=img_rows, img_cols=img_cols)
  File "example.py", line 132, in get_residual_model
    model.add(Flatten())
  File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/models.py", line 146, in add
    output_tensor = layer(self.outputs[0])
  File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/engine/topology.py", line 148, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/layers/core.py", line 311, in call
    return K.batch_flatten(x)
  File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/backend/tensorflow_backend.py", line 646, in batch_flatten
    x = tf.reshape(x, [-1, np.prod(x.get_shape()[1:].as_list())])
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py", line 2492, in prod
    out=out, keepdims=keepdims)
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/_methods.py", line 35, in _prod
    return umr_prod(a, axis, dtype, out, keepdims)
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'

Later I found that Keras had been updated several minutes ago; I updated everything and got another error:

File "example.py", line 167, in
model = get_residual_model(is_mnist=is_mnist, img_channels=img_channels, img_rows=img_rows, img_cols=img_cols)
File "example.py", line 123, in get_residual_model
model.add(Convolution2D(first_layer_channel, 3, 3, border_mode='same', input_shape=(img_channels, img_rows, img_cols)))
File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/models.py", line 114, in add
layer.create_input_layer(batch_input_shape, input_dtype)
File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/engine/topology.py", line 341, in create_input_layer
self(x)
File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/engine/topology.py", line 458, in call
self.build(input_shapes[0])
File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/layers/convolutional.py", line 296, in build
self.W = self.init(self.W_shape, name='{}_W'.format(self.name))
File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/initializations.py", line 59, in glorot_uniform
return uniform(shape, s, name=name)
File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/initializations.py", line 32, in uniform
return K.random_uniform_variable(shape, -scale, scale, name=name)
File "/usr/local/lib/python3.5/dist-packages/Keras-1.0.6-py3.5.egg/keras/backend/tensorflow_backend.py", line 240, in random_uniform_variable
value = tf.random_uniform_initializer(low, high, dtype=tf_dtype)(shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/init_ops.py", line 98, in _initializer
return random_ops.random_uniform(shape, minval, maxval, dtype, seed=seed)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/random_ops.py", line 172, in random_uniform
shape = _ShapeTensor(shape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/random_ops.py", line 45, in _ShapeTensor
return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 620, in convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/constant_op.py", line 179, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/constant_op.py", line 162, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 421, in make_tensor_proto
tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 421, in
tensor_proto.string_val.extend([compat.as_bytes(x) for x in proto_values])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/compat.py", line 44, in as_bytes
raise TypeError('Expected binary or unicode string, got %r' % bytes_or_text)
TypeError: Expected binary or unicode string, got < map object at 0x7f22129d3e10 >

Which one makes sense?

Hi,

I ported the resnet18 model from caffe to keras and tested it on CIFAR10. The convergence curve is below; the accuracy has reached ~84%, but the curve is strange. Does it make sense?
image

And I have already reproduced the keras-resnet results here on CIFAR10:
image

Thanks a lot.

Should we watch val_loss or val_acc in callbacks?

In cifar10.py 2 callbacks are used:

ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.1), cooldown=0, patience=5, min_lr=0.5e-6)
EarlyStopping(monitor='val_acc', min_delta=0.001, patience=10)

The first changes the learning rate when the validation loss stops decreasing; the second stops training when the validation accuracy stops increasing. Shouldn't both monitor the same quantity? When I use these callbacks with this configuration, training often stops when the validation accuracy is still quite low. I observed that the best final results are achieved when both callbacks use val_loss; with val_acc they are slightly worse (but not by much).

What do you think about it?
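For reference, the variant the reporter found to give the best final results, with both callbacks watching val_loss:

    import numpy as np
    from keras.callbacks import ReduceLROnPlateau, EarlyStopping

    lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.1),
                                   cooldown=0, patience=5, min_lr=0.5e-6)
    early_stopper = EarlyStopping(monitor='val_loss', min_delta=0.001,
                                  patience=10)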

A very beginner question

I have two questions to ask.

  1. I am trying to find code to run experiments on deep residual networks. I am wondering whether this code can be used to train a deep residual network, or is it only for creating a model visualisation? If I can use it to train a model, how should I input my training data?

  2. I used both Theano and TensorFlow as the Keras backend to run the original code, but both give me the same error (using Theano as an example):
    Using Theano backend.
    Traceback (most recent call last):
    File "resnet.py", line 147, in
    main()
    File "resnet.py", line 127, in main
    model = resnet()
    File "resnet.py", line 110, in resnet
    block1 = _residual_block(block_fn, nb_filters=64, repetations=3, is_first_layer=True)(pool1)
    File "resnet.py", line 94, in f
    input = block_function(nb_filters=nb_filters, init_subsample=init_subsample)(input)
    File "resnet.py", line 51, in f
    return _shortcut(input, residual)
    File "resnet.py", line 84, in _shortcut
    return merge([shortcut, residual], mode="sum")
    File "/Library/Python/2.7/site-packages/keras/engine/topology.py", line 1528, in merge
    name=name)
    File "/Library/Python/2.7/site-packages/keras/engine/topology.py", line 1186, in init
    node_indices, tensor_indices)
    File "/Library/Python/2.7/site-packages/keras/engine/topology.py", line 1221, in _arguments_validation
    'Layer shapes: %s' % input_shapes)
    Exception: Only layers of same output shape can be merged using sum mode. Layer shapes: [(None, 1, 56, 64), (None, 1, 56, 256)]

It seems that the code already tries to match the input sizes, but the error is still there. I am wondering whether anyone else has met this problem. I used Python 2.7, Keras 1.1.1, tensorflow-0.11.0, and Theano 0.8.2.

Thank you!

Dense predictions for semantic segmentation (fully convolutional)

FYI I'm planning on converting this to work with dense predictions, based on the resnet v2 in tensorflow/models, via the use of AtrousConvolution2D in Keras.

It will be in this branch https://github.com/ahundt/keras-resnet/tree/dense_predictions

And I'm hoping to train/test on camvid and coco, some scripts to load camvid are in this keras segnet implementation.

I might also try the same with this densenet implementation if all goes well https://github.com/titu1994/DenseNet

possible bug between residual unit and shortcut

There may be an inconsistency in the definition of the last unit before the shortcut leading to extra activations. If you look at the keras implementation of resnet50 it contains the following:

    shortcut = Conv2D(filters3, (1, 1), strides=strides,
                      name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(axis=bn_axis, name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)

On the other hand keras resnet does the following:

        residual = _bn_relu_conv(filters=filters, kernel_size=(3, 3))(conv1)
        return _shortcut(input, residual)

perhaps this is a difference between resnet v1 and resnet v2?

Spec requirement for running resnet152

Thank you for the resnet model.

On my machine, resnet152 is not available: if I run it, there is no response and the machine hangs. Resnet101 is also not available. Only resnet50 and below work.
My machine specification is:
Memory: 24G
CPU: Intel Core i7-7700 (4 cores, 3.6GHz)
GPU: GTX1070

This is not a high spec, but not a low one either.
I wonder what spec is needed to run resnet152.

Training loss decreases very slowly

Using ResnetBuilder.build((1, 32, 32), 1, 'basic_block', [1, 1, 1, 1]) to build my own network for 32x32 image data with depth 1, I cannot make the training loss decrease as it normally does. In fact the loss decreases very, very slowly (in the second decimal place) no matter how big the learning rate is set. I tried both SGD and Adam, but neither seems to work. Meanwhile, the same data runs very well with a simple feed-forward convnet. Any idea what the problem may be with the resnet architecture?

Edit: my bad. I use the resnet for regression, so I needed to change the activation in the last layer to 'linear' (see the sketch below).
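A sketch of that fix (Keras 2 names; flatten1 stands in for the output feeding the final classifier layer in resnet.py):

    from keras.layers import Dense

    # For regression: a single linear output unit instead of softmax...
    dense = Dense(units=1, kernel_initializer="he_normal",
                  activation="linear")(flatten1)
    # ...and a regression loss when compiling.
    model.compile(loss="mse", optimizer="adam")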

Model cannot add original to filtered data.

My input data size is (112, 112, 1), but I receive this error when trying to run the ResNet18 model on my data.

Traceback (most recent call last):
  File "/src/sar_optical_matching.py", line 183, in <module>
    if __name__=="__main__": main()
  File "/src/sar_optical_matching.py", line 146, in main
    model = model_def.build(input_shape, verbose=args.verbose, one_hot=args.one_hot)
  File "/src/models/resnet_siamese.py", line 21, in build
    y_opt = keras_resnet.models.ResNet18(x_opt)
  File "/opt/conda/envs/custom/lib/python3.6/site-packages/keras_resnet/models.py", line 103, in __init__
    super(ResNet18, self).__init__(inputs, [2, 2, 2, 2], block)
  File "/opt/conda/envs/custom/lib/python3.6/site-packages/keras_resnet/models.py", line 61, in __init__
    x = block(features, strides, j == 0 and k == 0)(x)
  File "/opt/conda/envs/custom/lib/python3.6/site-packages/keras_resnet/block/__init__.py", line 48, in f
    y = _shortcut(x, y)
  File "/opt/conda/envs/custom/lib/python3.6/site-packages/keras_resnet/block/__init__.py", line 121, in _shortcut
    return keras.layers.add([a, b])
  File "/opt/conda/envs/custom/lib/python3.6/site-packages/keras/layers/merge.py", line 455, in add
    return Add(**kwargs)(inputs)
  File "/opt/conda/envs/custom/lib/python3.6/site-packages/keras/engine/topology.py", line 560, in __call__
    self.build(input_shapes)
  File "/opt/conda/envs/custom/lib/python3.6/site-packages/keras/layers/merge.py", line 84, in build
    output_shape = self._compute_elemwise_op_output_shape(output_shape, shape)
  File "/opt/conda/envs/custom/lib/python3.6/site-packages/keras/layers/merge.py", line 55, in _compute_elemwise_op_output_shape
    str(shape1) + ' ' + str(shape2))
ValueError: Operands could not be broadcast together with shapes (7, 7, 512) (4, 4, 512)

Adapt the resnet model into a pixel classification model

Hi,
I've been attempting to adapt your implementation of the residual model to do pixel-wise classification. It may be a long shot, but I was hoping I could ask you a few questions and get some hints.

So I'm currently thinking of using upsampling in the latter part of the model, so that the representation regains the input dimensions. My images are 155x240 (BRATS challenge data set), but I'm afraid the residual model downsamples my input so much that after upsampling back to the original size I cannot classify each pixel properly.
Another way of preserving the representation's dimensions would simply be to set stride=1 for each layer that downsamples, but if possible I would like to keep the resNet architecture the same, changing it only by adding layers.

Other hints are welcome as well!

Great work and thanks for sharing it,

Best regards, S2ica

inference image preprocessing

I get wonderful results while training (val_acc > 98%), but when I try to evaluate images it seems like rolling a die.

Is it necessary to preprocess the images the same way as is done before training?
I tried the following:

image = cv2.resize(image, (32, 32))
image = img_to_array(image)
image = image.astype('float32')
mean_image = np.mean(image, axis=0)
image -= mean_image
image /= 128.
image = np.expand_dims(image, axis=0)

But the result is not usable at all.
Thanks in advance
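A likely mismatch: cifar10.py normalizes with the mean image of the whole training set, while the snippet above computes a per-image mean. A sketch of inference preprocessing consistent with that training scheme (X_train, or its saved mean, is assumed available):

    import cv2
    import numpy as np
    from keras.preprocessing.image import img_to_array

    # Use the mean image computed from the training data, not the test image.
    mean_image = np.mean(X_train, axis=0)

    image = cv2.resize(image, (32, 32))
    image = img_to_array(image).astype('float32')
    image -= mean_image
    image /= 128.
    image = np.expand_dims(image, axis=0)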

Hi, I am having trouble getting the resnet to work. Do you mind helping?

This is the error I get when I try to use the resnet implemented by you.

Traceback (most recent call last):
  File "resnet.py", line 297, in <module>
    main()
  File "resnet.py", line 229, in main
    history = model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, validation_data=(X_test, y_test), callbacks=[lrate, early_stop, hist, checkpoint])
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 1082, in fit
    callback_metrics=callback_metrics)
  File "/usr/local/lib/python2.7/site-packages/keras/engine/training.py", line 801, in _fit_loop
    outs = f(ins_batch)
  File "/usr/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 531, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 875, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 862, in __call__
    self.fn() if output_subset is None else
  File "/usr/local/lib/python2.7/site-packages/theano/gof/op.py", line 908, in rval
    r = p(n, [x[0] for x in i], o)
  File "/usr/local/lib/python2.7/site-packages/theano/tensor/signal/pool.py", line 849, in perform
    raise NotImplementedError()

NotImplementedError:
Apply node that caused the error: AveragePoolGrad{ds=(7, 7), ignore_border=True, st=(1, 1), padding=(5, 5), mode='average_exc_pad'}(Elemwise{add,no_inplace}.0, IncSubtensor{InplaceInc;::, ::, :int64:, :int64:}.0)
Toposort index: 1209
Inputs types: [TensorType(float32, 4D), TensorType(float32, 4D)]
Inputs shapes: [(192, 8, 4, 4), (192, 8, 8, 8)]
Inputs strides: [(512, 64, 16, 4), (2048, 256, 32, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Sum{axis=[0, 2, 3], acc_dtype=float64}(AveragePoolGrad{ds=(7, 7), ignore_border=True, st=(1, 1), padding=(5, 5), mode='average_exc_pad'}.0), CorrMM_gradWeights{half, (1, 1)}(Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)].0, AveragePoolGrad{ds=(7, 7), ignore_border=True, st=(1, 1), padding=(5, 5), mode='average_exc_pad'}.0, Subtensor{int64}.0, Subtensor{int64}.0), CorrMM_gradInputs{half, (1, 1)}(Subtensor{::, ::, ::int64, ::int64}.0, AveragePoolGrad{ds=(7, 7), ignore_border=True, st=(1, 1), padding=(5, 5), mode='average_exc_pad'}.0), Elemwise{add,no_inplace}(Elemwise{Composite{Switch(i0, i1, (i2 / i3))}}.0, Elemwise{Composite{((i0 * i1 * i2) / i3)}}.0, Elemwise{Composite{(((i0 / i1) / i2) / i3)}}.0, AveragePoolGrad{ds=(7, 7), ignore_border=True, st=(1, 1), padding=(5, 5), mode='average_exc_pad'}.0, Elemwise{Switch}[(0, 1)].0), Elemwise{add,no_inplace}(Elemwise{Composite{Switch(i0, i1, (i2 / i3))}}.0, Elemwise{Composite{((i0 * i1 * i2) / i3)}}.0, Elemwise{Composite{(((i0 / i1) / i2) / i3)}}.0, Elemwise{switch,no_inplace}.0, Elemwise{Composite{Switch(i0, i1, (i2 / i3))}}.0, Elemwise{Composite{((i0 * i1 * i2) / i3)}}.0, Elemwise{Composite{(((i0 / i1) / i2) / i3)}}.0, AveragePoolGrad{ds=(7, 7), ignore_border=True, st=(1, 1), padding=(5, 5), mode='average_exc_pad'}.0, Elemwise{Switch}[(0, 1)].0), Elemwise{Composite{(Switch(i0, i1, (i2 / i3)) + ((i4 * i5 * i6) / i7) + i8 + Switch(i0, i9, i1) + i10 + i11 + i12 + i13 + i14 + i15 + i16 + i17 + i18)}}[(0, 2)](InplaceDimShuffle{x,x,x,x}.0, TensorConstant{(1, 1, 1, 1) of 0}, Elemwise{mul}.0, Elemwise{add,no_inplace}.0, Elemwise{Composite{AND(GE(i0, i1), LE(i0, i2))}}.0, InplaceDimShuffle{x,0,x,x}.0, Elemwise{sub,no_inplace}.0, Elemwise{mul,no_inplace}.0, Elemwise{Composite{(((i0 / i1) / i2) / i3)}}.0, Elemwise{true_div}.0, Elemwise{Composite{Switch(i0, i1, (i2 / i3))}}.0, Elemwise{Composite{((i0 * i1 * i2) / i3)}}.0, Elemwise{Composite{(((i0 / i1) / i2) / i3)}}.0, Elemwise{switch,no_inplace}.0, Elemwise{Composite{Switch(i0, i1, (i2 / i3))}}.0, Elemwise{Composite{((i0 * i1 * i2) / i3)}}.0, Elemwise{Composite{(((i0 / i1) / i2) / i3)}}.0, AveragePoolGrad{ds=(7, 7), ignore_border=True, st=(1, 1), padding=(5, 5), mode='average_exc_pad'}.0, Elemwise{Switch}[(0, 1)].0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "/usr/local/lib/python2.7/site-packages/theano/gradient.py", line 1267, in access_grad_cache
term = access_term_cache(node)[idx]
File "/usr/local/lib/python2.7/site-packages/theano/gradient.py", line 961, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File "/usr/local/lib/python2.7/site-packages/theano/gradient.py", line 1267, in access_grad_cache
term = access_term_cache(node)[idx]
File "/usr/local/lib/python2.7/site-packages/theano/gradient.py", line 961, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File "/usr/local/lib/python2.7/site-packages/theano/gradient.py", line 1267, in access_grad_cache
term = access_term_cache(node)[idx]
File "/usr/local/lib/python2.7/site-packages/theano/gradient.py", line 961, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File "/usr/local/lib/python2.7/site-packages/theano/gradient.py", line 1267, in access_grad_cache
term = access_term_cache(node)[idx]
File "/usr/local/lib/python2.7/site-packages/theano/gradient.py", line 1101, in access_term_cache
input_grads = node.op.grad(inputs, new_output_grads)

Problem with function _shortcut

def _shortcut(input, residual):

In function _shortcut, there are these two lines:

stride_width = int(round(input_shape[ROW_AXIS] / residual_shape[ROW_AXIS]))
stride_height = int(round(input_shape[COL_AXIS] / residual_shape[COL_AXIS]))

and

shortcut = Conv2D(filters=residual_shape[CHANNEL_AXIS],
                  kernel_size=(1, 1),
                  strides=(stride_width, stride_height),
                  padding="valid",
                  kernel_initializer="he_normal",
                  kernel_regularizer=l2(0.0001))(input)

The problem: after the above operation, the shape of the shortcut is not always equal to the residual's. For example, with input_shape 34x34 and residual_shape 10x10, the operation above gives a shortcut shape of 12x12, which is not equal to 10x10.
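The arithmetic can be checked directly (a sketch of the output-length rule for a 1x1 kernel with "valid" padding):

    stride = int(round(34 / 10.))   # -> 3
    out = (34 - 1) // stride + 1    # -> 12, which cannot match the 10x10 residual
    print(out)                      # 12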

Got zeroDivisionError on running default main

After running the default main() function with nothing changed, I got the following error. I am wondering whether anyone else got this error as well and how to fix it; self.subsample[1] = 0. Thank you!
Using Theano backend.

Traceback (most recent call last):
  File "/Users/PaulYu/Documents/git/Deep Residual Networks/keras-resnet/resnet.py", line 163, in <module>
    main()
  File "/Users/PaulYu/Documents/git/Deep Residual Networks/keras-resnet/resnet.py", line 157, in main
    model = ResNetBuilder.build_resnet_18((3, 224, 224), 1000)
  File "/Users/PaulYu/Documents/git/Deep Residual Networks/keras-resnet/resnet.py", line 137, in build_resnet_18
    return ResNetBuilder.build(input_shape, num_outputs, basic_block, [2, 2, 2, 2])
  File "/Users/PaulYu/Documents/git/Deep Residual Networks/keras-resnet/resnet.py", line 124, in build
    block = _residual_block(block_fn, nb_filters=nb_filters, repetitions=r, is_first_layer=i == 0)(block)
  File "/Users/PaulYu/Documents/git/Deep Residual Networks/keras-resnet/resnet.py", line 66, in f
    input = block_function(nb_filters=nb_filters, init_subsample=init_subsample)(input)
  File "/Users/PaulYu/Documents/git/Deep Residual Networks/keras-resnet/resnet.py", line 79, in f
    return _shortcut(input, residual)
  File "/Users/PaulYu/Documents/git/Deep Residual Networks/keras-resnet/resnet.py", line 54, in _shortcut
    init="he_normal", border_mode="valid")(input)
  File "/Library/Python/2.7/site-packages/keras/engine/topology.py", line 514, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/Library/Python/2.7/site-packages/keras/engine/topology.py", line 572, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/Library/Python/2.7/site-packages/keras/engine/topology.py", line 152, in create_node
    output_shapes = to_list(outbound_layer.get_output_shape_for(input_shapes[0]))
  File "/Library/Python/2.7/site-packages/keras/layers/convolutional.py", line 453, in get_output_shape_for
    self.border_mode, self.subsample[1])
  File "/Library/Python/2.7/site-packages/keras/utils/np_utils.py", line 131, in conv_output_length
    return (output_length + stride - 1) // stride
ZeroDivisionError: integer division or modulo by zero

Process finished with exit code 1

Some questions about feature map size matching.

Hello, I want to ask you some questions about matching feature map sizes.
For example:
Input: a 256x256 image.
The first Conv2D outputs a 16x128x128 feature map.
The second Conv2D outputs a 32x64x64 feature map.
Then we want to use the residual block's shortcut to match the channels and the feature map size.
Can we use a Conv2D with a 3x3 kernel, 32 channels and strides=2 to match the feature map size?
I am not sure about it.
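A sketch of such a projection (Keras 2 API, channels-first shapes as in the example above; illustrative only):

    from keras.layers import Conv2D

    # x: (16, 128, 128). With padding="same" and strides=(2, 2) the output is
    # (32, 64, 64), matching the residual path in channels and spatial size.
    shortcut = Conv2D(32, (3, 3), strides=(2, 2), padding="same")(x)
    # A 1x1 kernel works the same way and is the conventional ResNet choice
    # for projection shortcuts.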

about data provider

Hello, I want to use your resnet model to pretrain on my data. When I write the load_data function, it always has an error; I have no idea how to define it. Any idea or example for this? I would appreciate it.

AttributeError: module 'resnet' has no attribute 'ResnetBuilder'

Hi, surprisingly I get this error, but I don't understand why it happens in the following part:

 model = resnet.ResnetBuilder.build_resnet_18((img_channels, img_rows, img_cols), nb_classes)
 model.compile(loss='categorical_crossentropy',
                 optimizer='adam',
AttributeError: module 'resnet' has no attribute 'ResnetBuilder'

Any ideas how it could be fixed?

Problem with input shape

When I run this with:

tensorflow (0.10.0)
Keras (1.0.8)

I get an error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "resnet.py", line 147, in <module>
    main()
  File "resnet.py", line 127, in main
    model = resnet()
  File "resnet.py", line 105, in resnet
    conv1 = _conv_bn_relu(nb_filter=64, nb_row=7, nb_col=7, subsample=(2, 2))(input)
  File "resnet.py", line 24, in f
    init="he_normal", border_mode="same")(input)
  File "/Users/owang/Library/Python/2.7/lib/python/site-packages/keras/engine/topology.py", line 515, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/Users/owang/Library/Python/2.7/lib/python/site-packages/keras/engine/topology.py", line 573, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/Users/owang/Library/Python/2.7/lib/python/site-packages/keras/engine/topology.py", line 150, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/Users/owang/Library/Python/2.7/lib/python/site-packages/keras/layers/convolutional.py", line 353, in call
    filter_shape=self.W_shape)
  File "/Users/owang/Library/Python/2.7/lib/python/site-packages/keras/backend/tensorflow_backend.py", line 1518, in conv2d
    x = tf.nn.conv2d(x, kernel, strides, padding=padding)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 394, in conv2d
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2319, in create_op
    set_shapes_for_outputs(ret)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1711, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 246, in conv2d_shape
    padding)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 184, in get2d_conv_output_size
    (row_stride, col_stride), padding_type)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 149, in get_conv_output_size
    "Filter: %r Input: %r" % (filter_size, input_size))
ValueError: Filter must not be larger than the input: Filter: (7, 7) Input: (3, 224)

AssertionError running resnet() on Theano backend

I've been trying to use resnet for an image classification task, and am encountering the following assertion error:

Traceback (most recent call last):
  File "resnet.py", line 270, in <module>
    callbacks = [earlyStopping])
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/engine/training.py", line 1026, in fit
    self._make_test_function()
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/engine/training.py", line 695, in _make_test_function
    **self._function_kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/backend/theano_backend.py", line 541, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/backend/theano_backend.py", line 527, in __init__
    **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function.py", line 320, in function
    output_keys=output_keys)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/pfunc.py", line 479, in pfunc
    output_keys=output_keys)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 1776, in orig_function
    output_keys=output_keys).create(
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 1456, in __init__
    optimizer_profile = optimizer(fgraph)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/gof/opt.py", line 101, in __call__
    return self.optimize(fgraph)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/gof/opt.py", line 89, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/gof/opt.py", line 230, in apply
    sub_prof = optimizer.optimize(fgraph)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/gof/opt.py", line 85, in optimize
    self.add_requirements(fgraph)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/tensor/opt.py", line 1438, in add_requirements
    fgraph.attach_feature(ShapeFeature())
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/gof/fg.py", line 621, in attach_feature
    attach(self)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/tensor/opt.py", line 1262, in on_attach
    self.on_import(fgraph, node, reason='on_attach')
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/tensor/opt.py", line 1304, in on_import
    assert d.dtype in theano.tensor.discrete_dtypes, (node, d.dtype)
AssertionError: (AbstractConv2d{border_mode='valid', subsample=(1.0, 1.0), filter_flip=True, imshp=(None, None, None, None), kshp=(256, 64, 1, 1)}(Subtensor{::, ::, :int64:, :int64:}.0, HostFromGpu.0), 'float64')

I've changed the input shape and output dimension of the resnet() function and left the other helper functions untouched; the way I am using resnet() is as follows:

model = resnet()
model.compile(loss='categorical_crossentropy',
              optimizer=sgd, # custom optimizer
              metrics=['accuracy'])
datagen = ImageDataGenerator(
        featurewise_center=False, 
        samplewise_center=False, 
        featurewise_std_normalization=False, 
        samplewise_std_normalization=False, 
        zca_whitening=False,  
        rotation_range=5,  
        width_shift_range=0, 
        height_shift_range=0,
        horizontal_flip=False, 
        vertical_flip=False)  

datagen.fit(X_train)
model.fit_generator(datagen.flow(X_train, Y_train,
                batch_size=batch_size),
                samples_per_epoch=X_train.shape[0],
                nb_epoch=nb_epoch,
                validation_data=(X_val, Y_val),
                callbacks = [earlyStopping])

The error is being thrown at the model.fit_generator line, and it might be useful to note that I've tested the same code on much simpler conv nets.

Any thoughts on how I may proceed dealing with this error?

TimeDistributed wrapping causing issues

Hi,
I want to wrap TimeDistributed over a built resnet. So I tried doing it in my model like this:

TimeDistributed(ResnetBuilder.build_resnet_18(mouth_input_shape, mouth_features_dim),
                                              input_shape=(TIME_STEPS, *mouth_input_shape))

However, while fitting my model, it is giving me the following error:

2017-11-17 18:35:58.790402: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
         [[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
2017-11-17 18:35:58.790456: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
         [[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
2017-11-17 18:35:58.790835: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
         [[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
2017-11-17 18:35:58.791022: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
         [[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
2017-11-17 18:35:58.791086: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
         [[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Traceback (most recent call last):
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
         [[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
         [[Node: _arg_input_4_0_5/_850 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", 
send_device_incarnation=1, tensor_name="edge_10815__arg_input_4_0_5", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_input_4_0_5)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 11, in <module>                                                                                                                                                              [0/1869]
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/keras/engine/training.py", line 2077, in fit_generator
    class_weight=class_weight)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1797, in train_on_batch
    outputs = self.train_function(ins)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2332, in __call__
    **self.session_kwargs)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
         [[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
         [[Node: _arg_input_4_0_5/_850 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", 
send_device_incarnation=1, tensor_name="edge_10815__arg_input_4_0_5", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_input_4_0_5)]]

Caused by op 'batch_normalization_1/keras_learning_phase', defined at:
  File "<stdin>", line 1, in <module>
  File "/shared/fusor/home/voleti.vikram/lipreading-in-the-wild-experiments/assessor/assessor_model.py", line 40, in my_assessor_model
    mouth_feature_model = TimeDistributed(ResnetBuilder.build_resnet_18(mouth_input_shape, mouth_features_dim),
  File "/shared/fusor/home/voleti.vikram/lipreading-in-the-wild-experiments/assessor/resnet.py", line 273, in build_resnet_18
    return ResnetBuilder.build(input_shape, num_outputs, basic_block, [2, 2, 2, 2])
  File "/shared/fusor/home/voleti.vikram/lipreading-in-the-wild-experiments/assessor/resnet.py", line 248, in build
    conv1 = _conv_bn_relu(filters=64, kernel_size=(7, 7), strides=(2, 2))(input)
  File "/shared/fusor/home/voleti.vikram/lipreading-in-the-wild-experiments/assessor/resnet.py", line 81, in f
    return _bn_relu(conv)
  File "/shared/fusor/home/voleti.vikram/lipreading-in-the-wild-experiments/assessor/resnet.py", line 62, in _bn_relu
    norm = BatchNormalization(axis=CHANNEL_AXIS)(input)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/keras/engine/topology.py", line 603, in __call__
    output = self.call(inputs, **kwargs)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/keras/layers/normalization.py", line 190, in call
    training=training)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2712, in in_train_phase
    training = learning_phase()
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 120, in learning_phase
    name='keras_learning_phase')
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1599, in placeholder
    return gen_array_ops._placeholder(dtype=dtype, shape=shape, name=name)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3091, in _placeholder
    "Placeholder", dtype=dtype, shape=shape, name=name)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/users/voleti.vikram/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'batch_normalization_1/keras_learning_phase' with dtype bool
         [[Node: batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
         [[Node: _arg_input_4_0_5/_850 = _Send[T=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", 
send_device_incarnation=1, tensor_name="edge_10815__arg_input_4_0_5", _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_input_4_0_5)]]

It is possibly because learning_phase needs to be set to 1. But I tried doing that and it still doesn't work. Please help.

Training accuracy converges very fast

Hello everyone, I have run cifar10.py (with batch_size=128 as in the paper, data_augmentation=False, using SGD). My result shows that the training accuracy converges to 1 after 30 epochs, but the testing accuracy converges to 0.6. I do not understand why the training accuracy converges so fast (much faster than in the paper's results). Does anyone have an idea of the reason?

Thank you!

'TensorVariable' object has no attribute '_shape'

Hey Ragha, thank you for sharing this, nice job!

I'm having difficulties running your example. I'm using the latest versions of Keras and Theano.

File "train_resnet.py", line 139, in <module>
    model = resnet()
  File "resnet.py", line 110, in resnet
    block1 = _residual_block(block_fn, nb_filters=64, repetations=3, is_first_layer=True)(pool1)
  File "resnet.py", line 93, in f
    input = block_function(nb_filters=nb_filters, init_subsample=init_subsample)(input)
  File "resnet.py", line 50, in f
    return _shortcut(input, residual)
  File "resnet.py", line 72, in _shortcut
    stride_width = input._shape[2].value / residual._shape[2].value
AttributeError: 'TensorVariable' object has no attribute '_shape'

Would you mind letting me know if I have to do anything more in order to run your implementation of ResNet?

Thank you in advance,
Marko

Cannot allocate memory

I am seeing the following error when trying to train the model on 3000 (224x224) images. I am running it against Theano on a g2.2x instance. Does Keras copy all training examples to the GPU (which has 4 GB) before starting to train, or does it just copy one batch at a time? I was able to run the smaller network for cifar10 without any problems. Anything I am missing?

Problem occurred during compilation with the command line below:
/usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -I/usr/local/lib/python2.7/dist-packages/theano/gof -fvisibility=hidden -o /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/tmpHQHx4h/af8288e23d4f9216c7a523f6c00e6cf5.so /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/tmpHQHx4h/mod.cpp -L/usr/lib -lpython2.7
ERROR (theano.gof.cmodule): [Errno 12] Cannot allocate memory
Traceback (most recent call last):
...

 File "/usr/local/lib/python2.7/dist-packages/theano/misc/windows.py", line 75, in output_subprocess_Popen
    p = subprocess_Popen(command, **params)
File "/usr/local/lib/python2.7/dist-packages/theano/misc/windows.py", line 36, in subprocess_Popen
   proc = subprocess.Popen(command, startupinfo=startupinfo, **params)
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1223, in _execute_child
   self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

ValueError on Python 3.4 environment using TensorFlow backend

It worked well on a Python 2.7 environment using the Theano backend, but I got a ValueError on a Python 3.4 environment using the TensorFlow backend. How should I resolve this?

Keiku@ubuntu:~/keras-resnet$ python resnet.py
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

(Omitted)

Traceback (most recent call last):
  File "resnet.py", line 147, in <module>
    main()
  File "resnet.py", line 127, in main
    model = resnet()
  File "resnet.py", line 111, in resnet
    block2 = _residual_block(block_fn, nb_filters=128, repetations=4)(block1)
  File "resnet.py", line 94, in f
    input = block_function(nb_filters=nb_filters, init_subsample=init_subsample)(input)
  File "resnet.py", line 48, in f
    conv_1_1 = _bn_relu_conv(nb_filters, 1, 1, subsample=init_subsample)(input)
  File "resnet.py", line 38, in f
    init="he_normal", border_mode="same")(activation)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/keras/engine/topology.py", line 148, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/keras/layers/convolutional.py", line 341, in call
    filter_shape=self.W_shape)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/keras/backend/tensorflow_backend.py", line 952, in conv2d
    x = tf.nn.conv2d(x, kernel, strides, padding=padding)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 295, in conv2d
    data_format=data_format, name=name)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
    op_def=op_def)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2156, in create_op
    set_shapes_for_outputs(ret)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1612, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/tensorflow/python/ops/common_shapes.py", line 238, in conv2d_shape
    in_rows, in_cols, filter_rows, filter_cols, stride_r, stride_c, padding)
  File "/home/Keiku/.pyenv/versions/anaconda3-2.3.0/lib/python3.4/site-packages/tensorflow/python/ops/common_shapes.py", line 155, in get2d_conv_output_size
    % (row_stride, col_stride, filter_height, filter_width))
ValueError: ('stride must be less than or equal to filter size', 'stride: [2x2] filter: [1x1]')
Keiku@ubuntu:~/keras-resnet$

Cannot get val_acc above 90%

Hi there,

In the paper the val_acc reaches more than 93%, but my model trained with the default configuration gets at most 87% val_acc. I've tried different learning rate schedules, but they do not improve things much.

Any idea how can we further boost the val_acc? Thank you.

I had this error

Hi Mr raghakot,
I'm very thankful for your work on resnet.

The issue is that I had this error:

input = block_function(nb_filter,init_subsample,(is_first_layer and (i == 0)))(input)

TypeError: 'str' object is not callable

while trying to run this:

model = ResnetBuilder.build((3, 100, 100), 1, 'bottleneck', [1])
model.summary()
So then I went to resnet.py and changed this:

def _residual_block(block_function, nb_filter, repetitions, is_first_layer=False):
    """Builds a residual block with repeating bottleneck blocks."""
    def f(input):
        for i in range(repetitions):
            init_subsample = (1, 1)
            if i == 0 and not is_first_layer:
                init_subsample = (2, 2)
            input = block_function(
                nb_filter=nb_filter,
                init_subsample=init_subsample,
                is_first_block_of_first_layer=(is_first_layer and i == 0)
            )(input)
        return input
    return f

to this:

def _residual_block(block_function, nb_filter, repetitions, is_first_layer=False):
    """Builds a residual block with repeating bottleneck blocks."""
    def f(input):
        for i in range(repetitions):
            init_subsample = (1, 1)
            if i == 0 and not is_first_layer:
                init_subsample = (2, 2)
            if block_function == 'basic_block':
                input = basic_block(nb_filter, init_subsample, (is_first_layer and (i == 0)))(input)
            elif block_function == 'bottleneck':
                input = bottleneck(nb_filter, init_subsample, (is_first_layer and (i == 0)))(input)
            else:
                print('unknown block function')
        return input
    return f

I would like to know if it will still work correctly after doing that.

Thank you very much for your help

add BatchNormalization layers after 1 * 1 convolution in the shortcut path

In the official keras.applications resnet50, the 1x1 convolution layer on the shortcut path is followed by a BatchNormalization layer. I tried the network without this layer on the CIFAR-10 dataset, and it seems the response of the convolution path is overwhelmed by the shortcut path. That way, training has to be very slow and then reaches a low accuracy plateau.
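A sketch of the suggested addition, mirroring keras.applications (Keras 2 API; filters, strides and bn_axis stand in for the surrounding code's values):

    from keras.layers import Conv2D, BatchNormalization, add

    # Batch-normalize the 1x1 projection so both branches are on a
    # comparable scale before the element-wise sum.
    shortcut = Conv2D(filters, (1, 1), strides=strides)(input_tensor)
    shortcut = BatchNormalization(axis=bn_axis)(shortcut)
    x = add([x, shortcut])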

No weight decay

I notice this implementation does not apply weight decay as in the original paper. Is this an oversight, or did you find that weight decay was not required for good results?
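For reference, weight decay is usually emulated in Keras with an L2 kernel regularizer on every convolution (the paper uses 1e-4; later revisions of resnet.py do exactly this, as the _shortcut snippet quoted above shows):

    from keras.layers import Conv2D
    from keras.regularizers import l2

    conv = Conv2D(filters, kernel_size, padding="same",
                  kernel_initializer="he_normal",
                  kernel_regularizer=l2(1.e-4))(x)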
