
csat's People

Contributors

rthadur


csat's Issues

Keras' Theano backend runs over 10X slower with batch normalization

This turned out to be a longer write-up than I anticipated. The main points are:

  • Keras' Theano backend runs over 10X slower with batch normalization.
  • This issue does not exist with a TensorFlow backend.
  • Issue #1309 seems to say the problem is fixed, though in my experience it persists.
  • I don't know whether my problem stems from:
    a. Theano's implementation of batch normalization,
    b. Keras' use of Theano's batch normalization procedures, or
    c. my use of Keras' use of Theano's batch normalization procedures.
I'm currently using fairly recent versions of Keras, Theano and cuDNN:

Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)

keras.__version__
'2.2.4'
theano.__version__
u'1.0.3'

When I run the following modified/simplified version of keras/examples/cifar10_resnet.py, I get a significant slowdown when batch normalization is used. The code is:

"""Adapted from cifar10_resnet.py"""

from future import print_function
import argparse
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.datasets import cifar10
import numpy as np
import os
import pdb

def get_data():
""" Loads CIFAR10 Data and converts to numpy arrays
for net"""

Load the CIFAR10 data.

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

Normalize data.

x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

Convert class vectors to binary class matrices.

num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

return (x_train, x_test, y_train, y_test)
def resnet_layer(inputs, batch_norm):
"""2D Convolution-Batch Normalization-Activation stack builder

Arguments

inputs (tensor): input tensor from input image or previous layer
batch_norm (bool): whether to include batch normalization

Returns

x (tensor): tensor as input to the next layer

"""
conv = Conv2D(16,
kernel_size=3,
strides=1,
padding='same',
kernel_initializer='he_normal',
kernel_regularizer=l2(1e-4))

x = inputs
x = conv(x)
if batch_norm:
x = BatchNormalization()(x)
x = Activation('relu')(x)

return x
def resnet_v1(input_shape, batch_norm):
"""ResNet Version 1 Model builder [a]

Arguments

input_shape (tensor): shape of input image tensor
batch_norm (bool): whether to include batch normalization

Returns

model (Model): Keras model instance

"""

Start model definition.

inputs = Input(shape=input_shape)
x = resnet_layer(inputs, batch_norm)

Instantiate the stack of residual units

for res_block in range(3):
y = resnet_layer(x, batch_norm)
x = keras.layers.add([x, y])
x = Activation('relu')(x)

Add classifier on top.

v1 does not use BN after last shortcut connection-ReLU

x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
outputs = Dense(10,
activation='softmax',
kernel_initializer='he_normal')(y)

Instantiate model.

model = Model(inputs=inputs, outputs=outputs)
return model
if name == 'main':
ap = argparse.ArgumentParser()
ap.add_argument("--bn", action='store_true',
help="batch_normalization flag")

args = ap.parse_args()
batch_norm_flag = args.bn

Get CIFAR10 data

x_train, x_test, y_train, y_test = get_data()

Build net

input_shape = x_train.shape[1:]
model = resnet_v1(input_shape, batch_norm_flag)
model.compile(loss='categorical_crossentropy',
optimizer=Adam(lr=1e-3),
metrics=['accuracy'])
#model.summary()

Run training, without data augmentation.

model.fit(x_train, y_train,
batch_size=32,
epochs=1,
validation_data=(x_test, y_test),
shuffle=True)

Score trained model.

scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
The results, which differ mainly in speed, are as follows:

WITH Batchnorm
$ python cifar10_resnet_batchnorm_test.py --bn
Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 82s 2ms/step - loss: 1.7749 - acc: 0.3629 - val_loss: 1.5035 - val_acc: 0.4697
10000/10000 [==============================] - 4s 383us/step
Test loss: 1.5035479030609131
Test accuracy: 0.4697

WITHOUT Batchnorm
$ python cifar10_resnet_batchnorm_test.py
Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 7s 132us/step - loss: 1.7122 - acc: 0.3863 - val_loss: 1.4815 - val_acc: 0.4750
10000/10000 [==============================] - 0s 27us/step
Test loss: 1.481479389190674
Test accuracy: 0.475

This is a slowdown of more than a factor of 10.
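For what it's worth, the per-batch time can also be measured directly with a small callback, which rules out data loading and validation as the source of the difference. This is just a profiling sketch; the BatchTimer class is mine and not part of the script above:

import time
import numpy as np
import keras

class BatchTimer(keras.callbacks.Callback):
    """Records the wall-clock time of each training batch."""

    def on_train_begin(self, logs=None):
        self.batch_times = []

    def on_batch_begin(self, batch, logs=None):
        self._t0 = time.time()

    def on_batch_end(self, batch, logs=None):
        self.batch_times.append(time.time() - self._t0)

# Usage: timer = BatchTimer(); model.fit(..., callbacks=[timer]);
# then compare np.mean(timer.batch_times) for the --bn and no-flag runs.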

If I use TensorFlow, the timings are:

WITH Batch Norm
50000/50000 [==============================] - 10s 204us/step - loss: 1.5723 - acc: 0.4416 - val_loss: 1.3853 - val_acc: 0.5129
10000/10000 [==============================] - 1s 57us/step
Test loss: 1.3852726093292236
Test accuracy: 0.5129

WITHOUT Batch Norm:
50000/50000 [==============================] - 10s 194us/step - loss: 1.7507 - acc: 0.3728 - val_loss: 1.5280 - val_acc: 0.4503
10000/10000 [==============================] - 1s 53us/step
Test loss: 1.527961152267456
Test accuracy: 0.4503

So the issue seems to be the use of batch norm with Theano. It's also puzzling that Theano's accuracy is slightly worse (-0.5%) with batch norm, while TensorFlow's is noticeably better with batch norm (+6%), though the two backends converge with more training epochs.
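For anyone reproducing the comparison: the only thing I change between the two sets of runs is the backend, which can be selected via the KERAS_BACKEND environment variable (or ~/.keras/keras.json) before Keras is first imported. A minimal sketch:

import os
# Select the backend before the first `import keras`: "theano" or "tensorflow".
os.environ["KERAS_BACKEND"] = "theano"

import keras
print("Backend in use:", keras.backend.backend())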

I note that issue #1309 seems to be about this same problem and regards it as solved as of Feb. 14, 2017. Yet I'm still seeing it. Is it something I'm doing, an issue with Keras' interfacing with Theano, or an issue with Theano's implementation of batch normalization?

(Note: this is a distillation of an issue I raised in #12173, where I noted that TensorFlow does not experience this slowdown. I closed that issue and opened this one, since I can now state the problem more precisely. I hope this is the correct protocol for redefining an issue after further study.)

Keras SQRT implementation

Hi all,

I am getting warnings from Theano's NanGuardMode complaining about np.inf being a very large number ;)

It seems the source of this warning is the way backend.sqrt is implemented in Keras across the different backends.
This is the TensorFlow implementation (comments omitted):

keras/keras/backend/tensorflow_backend.py, line 1568 in a0e90bd:

def sqrt(x):
    zero = _to_tensor(0., x.dtype.base_dtype)
    inf = _to_tensor(np.inf, x.dtype.base_dtype)
    x = tf.clip_by_value(x, zero, inf)
    return tf.sqrt(x)
For the sqrt case, wouldn't it be simpler to use tf.maximum(x, zero) instead of tf.clip_by_value, which evaluates both bounds and therefore builds a more complex graph (clip results in two ops)?

https://github.com/tensorflow/tensorflow/blob/a6d8ffae097d0132989ae4688d224121ec6d8f35/tensorflow/python/ops/clip_ops.py#L39
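Concretely, the change I have in mind would look something like this (just a sketch, not the actual Keras code):

import tensorflow as tf

def sqrt(x):
    # Only the lower bound matters for sqrt, so clamp negatives to zero
    # without introducing an np.inf constant into the graph.
    zero = tf.constant(0., dtype=x.dtype.base_dtype)
    x = tf.maximum(x, zero)
    return tf.sqrt(x)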

The same clip-based protection also appears in the Theano backend, and probably in some activation functions such as relu.
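For reference, the complaint is easy to trigger with a plain Theano graph that uses the same kind of clip. This is only a rough reproduction under my assumptions about how NanGuardMode is enabled; exact flags and messages may differ from your setup:

import numpy as np
import theano
import theano.tensor as T
from theano.compile.nanguardmode import NanGuardMode

x = T.matrix('x')
# Mimics the clip-to-[0, inf) protection discussed above before taking sqrt.
y = T.sqrt(T.clip(x, 0., np.inf))

# big_is_error=True is what flags np.inf as "a very large number", even though
# the computation itself is well defined.
f = theano.function([x], y,
                    mode=NanGuardMode(nan_is_error=True,
                                      inf_is_error=True,
                                      big_is_error=True))
f(np.ones((2, 2), dtype=theano.config.floatX))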

I don't really understand the rationale behind this protection (using clip to ensure the argument is non-negative). As a developer, if I design a component or an optimizer that misbehaves at this point (taking the square root of negative values), I would rather get an explicit error message from the lower-level library (TensorFlow, Theano). The CNTK backend does not add this protection, and that seems fine.
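To illustrate the trade-off: without the protection a negative input just propagates as NaN, while the clip silently turns it into 0, so neither path surfaces an explicit error. A small sketch using the TF 1.x session API that Keras 2.2.4 targets:

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # Unprotected sqrt of a negative value: no error, just NaN.
    print(sess.run(tf.sqrt(tf.constant(-1.0))))                                   # nan
    # With the clip-based protection the same input silently becomes 0 instead.
    print(sess.run(tf.sqrt(tf.clip_by_value(tf.constant(-1.0), 0.0, np.inf))))    # 0.0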
Thanks in advance for your feedback.
