csat's Issues
This turned out to be a longer write-up than I anticipated. The main points are:
1. Keras' Theano backend runs over 10X slower with batch normalization.
2. This issue does not exist with a TensorFlow backend.
3. Issue #1309 seems to say the problem is fixed, though in my experience it persists.
4. I don't know whether my problem stems from:
   a. Theano's implementation of batch normalization
   b. Keras' use of Theano's batch normalization procedures
   c. My use of Keras' use of Theano's batch normalization procedures
I'm currently using fairly recent versions of Keras, Theano and cuDNN:
Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)
>>> keras.__version__
'2.2.4'
>>> theano.__version__
u'1.0.3'
When I run the following modified/simplified version of keras/examples/cifar10_resnet.py, I get a significant slowdown when batch normalization is used. The code is:
"""Adapted from cifar10_resnet.py"""
from __future__ import print_function
import argparse
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.datasets import cifar10
import numpy as np
import os
import pdb
def get_data():
    """Loads CIFAR10 data and converts it to numpy arrays for the net."""
    # Load the CIFAR10 data.
    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
    # Normalize data.
    x_train = x_train.astype('float32') / 255
    x_test = x_test.astype('float32') / 255
    # Convert class vectors to binary class matrices.
    num_classes = 10
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_test = keras.utils.to_categorical(y_test, num_classes)
    return (x_train, x_test, y_train, y_test)
def resnet_layer(inputs, batch_norm):
    """2D Convolution-Batch Normalization-Activation stack builder

    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        batch_norm (bool): whether to include batch normalization

    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(16,
                  kernel_size=3,
                  strides=1,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))
    x = inputs
    x = conv(x)
    if batch_norm:
        x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x
def resnet_v1(input_shape, batch_norm):
    """ResNet Version 1 Model builder [a]

    # Arguments
        input_shape (tensor): shape of input image tensor
        batch_norm (bool): whether to include batch normalization

    # Returns
        model (Model): Keras model instance
    """
    # Start model definition.
    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs, batch_norm)
    # Instantiate the stack of residual units
    for res_block in range(3):
        y = resnet_layer(x, batch_norm)
        x = keras.layers.add([x, y])
        x = Activation('relu')(x)
    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(10,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)
    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model
if __name__ == '__main__':
    ap = argparse.ArgumentParser()
    ap.add_argument("--bn", action='store_true',
                    help="batch_normalization flag")
    args = ap.parse_args()
    batch_norm_flag = args.bn
    # Get CIFAR10 data
    x_train, x_test, y_train, y_test = get_data()
    # Build net
    input_shape = x_train.shape[1:]
    model = resnet_v1(input_shape, batch_norm_flag)
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=1e-3),
                  metrics=['accuracy'])
    # model.summary()
    # Run training, without data augmentation.
    model.fit(x_train, y_train,
              batch_size=32,
              epochs=1,
              validation_data=(x_test, y_test),
              shuffle=True)
    # Score trained model.
    scores = model.evaluate(x_test, y_test, verbose=1)
    print('Test loss:', scores[0])
    print('Test accuracy:', scores[1])
The timings I get with and without batch normalization are as follows:
WITH Batchnorm
$ python cifar10_resnet_batchnorm_test.py --bn
Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 82s 2ms/step - loss: 1.7749 - acc: 0.3629 - val_loss: 1.5035 - val_acc: 0.4697
10000/10000 [==============================] - 4s 383us/step
Test loss: 1.5035479030609131
Test accuracy: 0.4697
WITHOUT Batchnorm
$ python cifar10_resnet_batchnorm_test.py
Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 7s 132us/step - loss: 1.7122 - acc: 0.3863 - val_loss: 1.4815 - val_acc: 0.4750
10000/10000 [==============================] - 0s 27us/step
Test loss: 1.481479389190674
Test accuracy: 0.475
This slowdown is by more than a factor of 10.
If I use Tensorflow, the timings are:
WITH Batch Norm
50000/50000 [==============================] - 10s 204us/step - loss: 1.5723 - acc: 0.4416 - val_loss: 1.3853 - val_acc: 0.5129
10000/10000 [==============================] - 1s 57us/step
Test loss: 1.3852726093292236
Test accuracy: 0.5129
WITHOUT Batch Norm:
50000/50000 [==============================] - 10s 194us/step - loss: 1.7507 - acc: 0.3728 - val_loss: 1.5280 - val_acc: 0.4503
10000/10000 [==============================] - 1s 53us/step
Test loss: 1.527961152267456
Test accuracy: 0.4503
So, the issue seems to be the use of batch norm with Theano. It's also puzzling that after one epoch Theano does slightly worse with batch norm (about 0.5 percentage points lower test accuracy), while TensorFlow does noticeably better with batch norm (about 6 percentage points higher), but the two backends converge with more training epochs.
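To make the comparison explicit, here is a minimal sketch that recomputes the slowdown and the accuracy change from the numbers in the logs above. The dictionary layout and the helper names `bn_slowdown` and `bn_accuracy_gain` are made up for illustration; only the figures come from the runs.

```python
# Per-epoch wall-clock seconds and test accuracy, copied from the logs above.
# Keys are (backend, batch_norm_enabled).
epoch_seconds = {
    ("theano", True): 82.0,
    ("theano", False): 7.0,
    ("tensorflow", True): 10.0,
    ("tensorflow", False): 10.0,
}
test_accuracy = {
    ("theano", True): 0.4697,
    ("theano", False): 0.4750,
    ("tensorflow", True): 0.5129,
    ("tensorflow", False): 0.4503,
}


def bn_slowdown(backend):
    """Ratio of epoch time with batch norm to epoch time without it."""
    return epoch_seconds[(backend, True)] / epoch_seconds[(backend, False)]


def bn_accuracy_gain(backend):
    """Change in test accuracy from enabling batch norm, in percentage points."""
    return 100.0 * (test_accuracy[(backend, True)] - test_accuracy[(backend, False)])


for backend in ("theano", "tensorflow"):
    print("%s: %.1fx epoch time, %+.2f points accuracy"
          % (backend, bn_slowdown(backend), bn_accuracy_gain(backend)))
```

The epoch times are the rounded values Keras prints, so the ratios are approximate.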
I note that issue #1309 seems to be about this same problem and seems to regard it as solved, as of Feb. 14, 2017. Yet, I'm still having this problem. Is it something I'm doing, an issue with Keras' interfacing with Theano or an issue with Theano's implementation of Batch Normalization?
(Note: This is a distillation of an issue I raised in #12173, where I noted that TensorFlow does not experience this slowdown. I closed that issue and opened this one because I can now state the problem more precisely. I hope this is the correct protocol for redefining an issue after further study.)
Keras SQRT implementation
Hi all,
I am getting a warning from Theano's NanGuardMode complaining about np.inf being a very large number ;)
The source of this warning seems to be the way backend.sqrt is implemented in the different Keras backends.
This is the TensorFlow implementation (comments removed):
keras/keras/backend/tensorflow_backend.py, line 1568 in a0e90bd:
def sqrt(x):
    zero = _to_tensor(0., x.dtype.base_dtype)
    inf = _to_tensor(np.inf, x.dtype.base_dtype)
    x = tf.clip_by_value(x, zero, inf)
    return tf.sqrt(x)
For the sqrt case, wouldn't it be simpler to use tf.maximum(x, zero) instead of tf.clip_by_value? The clip evaluates both limits, which gives a more complex graph (two ops instead of one).
This also applies to Theano, and probably to some activation functions such as relu.
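As a rough illustration of why the upper bound is redundant, the following sketch uses NumPy as a stand-in for the backend ops (np.clip for tf.clip_by_value, np.maximum for tf.maximum). The function names are hypothetical, and it only checks numerical equivalence on finite inputs; it says nothing about graph size or speed in TensorFlow or Theano.

```python
import numpy as np


def sqrt_with_clip(x):
    # Mirrors the quoted backend code: clamp to [0, inf], then take the root.
    return np.sqrt(np.clip(x, 0.0, np.inf))


def sqrt_with_maximum(x):
    # Proposed simplification: only the lower bound can ever change a value.
    return np.sqrt(np.maximum(x, 0.0))


x = np.array([-4.0, -0.5, 0.0, 2.25, 9.0], dtype=np.float32)

# Clipping at +inf never changes a value, so both variants agree.
same = np.array_equal(sqrt_with_clip(x), sqrt_with_maximum(x))
print(same)
```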
I don't really understand the rationale behind this protection (using clip to ensure the argument is non-negative). As a developer, if I design a component or an optimizer that misbehaves at this point (for example, by taking the log of negative values), I would prefer to get an explicit error message from the lower-level layer (TensorFlow, Theano). The CNTK backend does not add this protection, and that seems OK.
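The trade-off described above can be sketched in NumPy, again only as an analogy for the backend behaviour. `guarded_sqrt` and `strict_sqrt` are hypothetical names; np.errstate(invalid="raise") is NumPy's way of turning an invalid floating-point operation into an exception, and is an assumption about what an "explicit error" from a lower layer might look like.

```python
import numpy as np


def guarded_sqrt(x):
    # Keras-style protection: negative inputs are silently mapped to 0.
    return np.sqrt(np.maximum(x, 0.0))


def strict_sqrt(x):
    # No protection: ask NumPy to raise instead of returning NaN.
    with np.errstate(invalid="raise"):
        return np.sqrt(x)


bad = np.array([4.0, -1.0])

# The guarded version hides the bad input behind a plausible-looking 0.
print(guarded_sqrt(bad))

# The strict version fails at the point where the bad value is used.
try:
    strict_sqrt(bad)
    raised = False
except FloatingPointError as exc:
    raised = True
    print("explicit failure:", exc)
```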
Thanks in advance for your feedback.