Comments (15)
Hi, @yhenon!
I appreciate the comment and the explanation (I was puzzled by your custom BatchNormalization implementation)! I’ll take a look. The BatchNormalization(axis=bn_axis+1) sounds like the better way to go!
I’m curious, would you be interested in helping out? Your package was certainly an inspiration!
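To make the `axis=bn_axis+1` suggestion concrete: when an extra ROI/time dimension is inserted after the batch axis, the channel axis index shifts up by one. A tiny NumPy sketch of that index shift (illustrative only, shapes chosen for the example):

```python
import numpy as np

bn_axis = 3                           # channel axis of a 4D (batch, h, w, c) tensor
x4 = np.random.rand(64, 8, 8, 3)      # ordinary 4D input
x5 = np.random.rand(64, 4, 8, 8, 3)   # same data with an ROI/time axis at position 1

# The channel axis moves from index 3 to index 4, hence bn_axis + 1.
assert x4.shape[bn_axis] == x5.shape[bn_axis + 1] == 3
```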
from keras-rcnn.
I was not aware of this problem. Thank you for pointing it out. As far as I understand, the TimeDistributed layer should apply to a tensor whose shape excludes the time dimension. If this is not the case for BatchNormalization, it might be an issue for Keras itself, because that would be inconsistent with the other layers. I'm not sure whether the issue is caused by the extra dimension or by something else. That seems interesting and I will look into it.
Hi @yhenon ,
I've tried the TimeDistributed BatchNormalization with the following sample:
import numpy as np
from keras.layers import *
from keras.models import *
import keras.backend as K

img_size = 8
batch_size = 64
num_time_steps = 4
num_channels = 3

K.set_learning_phase(1)

X = np.random.rand(batch_size, num_time_steps,
                   img_size, img_size, num_channels)
x = K.variable(X)
y = TimeDistributed(BatchNormalization(axis=-1))(x)
print(K.int_shape(y))

norm = K.eval(y)
for i in range(num_time_steps):
    for j in range(num_channels):
        print(norm[:, i, ..., j].mean(), norm[:, i, ..., j].std())
And the results were:
(64, 4, 8, 8, 3)
-5.02914e-08 0.994199
2.79397e-09 0.99404
-2.79397e-08 0.994136
-3.21306e-08 0.994103
3.53903e-08 0.99412
2.79397e-08 0.994144
3.35276e-08 0.993953
-8.3819e-09 0.994113
1.11759e-08 0.994066
-8.73115e-08 0.993973
-7.45058e-09 0.993843
-6.61239e-08 0.99407
This seems to match what we want from BatchNormalization: it returns activations normalized per batch, with the normalization applied independently to each data stream.
@JihongJu After looking over your code, I still think my issue stands (though I may be missing something). To clarify my point a bit:
TimeDistributed(BatchNorm()) seems to work fine at training time (as you point out), as it normalizes using the statistics of the mini-batch.
TimeDistributed(BatchNorm()) does not work fine at test time, as it normalizes using statistics computed on the training set. However, these statistics never get updated when the BN layer is in a TimeDistributed wrapper.
The problem stems from your line K.set_learning_phase(1), which runs BN in train mode. However, you can't keep K.set_learning_phase(1) at test time, since it makes a number of layers (like dropout) behave undesirably.
Here's a more complete example, where we compute the stats on a batch at both train and test time, using both approaches to BN:
import numpy as np
from keras.layers import *
from keras.models import *
import keras.backend as K

def test_bn(batch_norm_type, learning_phase):
    K.set_learning_phase(learning_phase)
    img_size = 8
    batch_size = 64
    num_time_steps = 4
    num_channels = 3
    inputs = Input(shape=(num_time_steps, img_size, img_size, num_channels))
    if batch_norm_type == 'time_dist':
        # momentum increased for faster update of dataset statistics
        x = TimeDistributed(BatchNormalization(axis=-1, momentum=0.5))(inputs)
    elif batch_norm_type == 'flat':
        x = BatchNormalization(axis=4, momentum=0.5)(inputs)
    model = Model(inputs=inputs, outputs=x)
    model.compile(loss='mae', optimizer='sgd')
    X = np.random.rand(batch_size, num_time_steps, img_size, img_size, num_channels)
    Y = np.random.rand(batch_size, num_time_steps, img_size, img_size, num_channels)
    history = model.fit(X, Y, epochs=4, verbose=0)
    P = model.predict(X)
    print('bn_type: {:10} | learning_phase: {} | mean: {:14} | std: {:14}'.format(
        batch_norm_type, learning_phase,
        P.mean(), P.std()))
    return

for batch_norm_type in ['time_dist', 'flat']:
    for learning_phase in [0, 1]:
        test_bn(batch_norm_type, learning_phase)
And the corresponding output:
bn_type: time_dist | learning_phase: 0 | mean: 0.498111873865 | std: 0.287745058537
bn_type: time_dist | learning_phase: 1 | mean: 0.00772533146665 | std: 0.973870813847
bn_type: flat | learning_phase: 0 | mean: 0.0119582833722 | std: 0.961674869061
bn_type: flat | learning_phase: 1 | mean: 0.0076981917955 | std: 0.973761022091
@0x00b1 Hi!
To be clear, in my implementation, I was just implementing what the paper said:
For the usage of BN layers, after pretraining, we compute the BN statistics (means and variances) for each layer on the ImageNet training set. Then the BN layers are fixed during fine-tuning for object detection. As such, the BN layers become linear activations with constant offsets and scales, and BN statistics are not updated by fine-tuning. We fix the BN layers mainly for reducing memory consumption in Faster R-CNN training.
This also provided a way of dealing with the above issue, so I left it.
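The "linear activations with constant offsets and scales" point is easy to verify: with frozen statistics, BN collapses to an affine map y = a*x + b. A small NumPy sketch of the algebra (illustrative only, not the keras-rcnn code; parameter values are made up):

```python
import numpy as np

def frozen_bn(x, mu, var, gamma, beta, eps=1e-5):
    """BatchNormalization with fixed statistics:
    y = gamma * (x - mu) / sqrt(var + eps) + beta."""
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def as_affine(mu, var, gamma, beta, eps=1e-5):
    """The same layer collapsed to a constant (scale, offset) pair."""
    a = gamma / np.sqrt(var + eps)
    return a, beta - a * mu

x = np.random.rand(8)
a, b = as_affine(mu=0.3, var=0.25, gamma=2.0, beta=0.1)
# The frozen BN layer and the affine map agree elementwise,
# which is why fixing BN saves memory: no statistics to track.
assert np.allclose(frozen_bn(x, 0.3, 0.25, 2.0, 0.1), a * x + b)
```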
I would certainly be interested in helping - my original implementation is rather limited in scope and full of hacks, and a better-quality Keras Faster R-CNN would be desirable.
@yhenon Hmm, now I get the point. In that case, I agree with you, adding a flat BN, instead of a time distributed BN, to the 5D tensor seems fine.
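As a sanity check on what the two variants compute at train time, here is a NumPy sketch (illustrative, not Keras internals) of the per-channel statistics a flat BN pools versus a TimeDistributed one:

```python
import numpy as np

def flat_bn_stats(x):
    """Per-channel statistics a 'flat' BN (axis=-1) uses on a 5D tensor:
    pooled over batch, time/ROI, and both spatial axes at once."""
    axes = tuple(range(x.ndim - 1))          # (0, 1, 2, 3) for 5D input
    return x.mean(axis=axes), x.std(axis=axes)

def time_dist_bn_stats(x):
    """Per-channel statistics a TimeDistributed BN uses: pooled over
    batch and spatial axes, separately for each time step."""
    return x.mean(axis=(0, 2, 3)), x.std(axis=(0, 2, 3))

x = np.random.rand(64, 4, 8, 8, 3)           # (batch, time, h, w, channels)
mu_flat, sd_flat = flat_bn_stats(x)          # shape (3,): one value per channel
mu_td, sd_td = time_dist_bn_stats(x)         # shape (4, 3): one row per time step

normed = (x - mu_flat) / sd_flat
# After flat normalization every channel has zero mean and unit std.
assert np.allclose(normed.mean(axis=(0, 1, 2, 3)), 0.0)
```

Both normalize sensibly in train mode; the difference @yhenon identified is that only the un-wrapped layer keeps its moving statistics updated for test time.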
Can I abuse this issue to ask why TimeDistributed layers are necessary? Is it to perform computation per ROI (meaning the term 'time distributed' is a bit poorly chosen here)? I noticed in py-faster-rcnn that they are limited to single-batch training only, presumably because Caffe blobs are limited to 4D. If you have batch_size > 1 and ROIs, your blob would need 5 dimensions (batch_id, roi_id, height, width, channels). Is the use of TimeDistributed intended to get this fifth dimension?
In addition, I noticed that for Keras the moving average / variation is not updated when in test mode (see here). Wouldn't that be an issue? Shouldn't it be updated during test mode? Should this be fixed in Keras? So many questions :)
@JihongJu I played with this too. I think @yhenon is correct. And I believe the suggestion by @yhenon will work (i.e. BatchNormalization(axis=bn_axis + 1)).
@yhenon Want to send a PR? 😄
Can I abuse this issue to ask why TimeDistributed layers are necessary? Is it to perform computation per ROI (meaning the term 'time distributed' is a bit poorly chosen here)? I noticed in py-faster-rcnn that they are limited to single batch training only, presumably because Caffe blobs are limited to 4d. If you have batch_size > 1 and ROIs, your blob would need 5 dimensions (batch_id, roi_id, height, width, channels). Is the use of TimeDistributed intended to get this fifth dimension?
Yep. Your instincts are right. It’s a super clever hack by @yhenon to exploit the TimeDistributed wrapper’s batching to iterate across a variable number of regions. And, I agree, TimeDistributed is a bad name. I think Distributed (or Batched) would make more sense. (cc: @fchollet)
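For intuition, TimeDistributed essentially folds the second dimension (here, the ROI dimension) into the batch dimension, applies the wrapped layer once, and unfolds the result. A rough NumPy sketch of that idea (not the actual Keras implementation; the pooling "layer" is a stand-in):

```python
import numpy as np

def time_distributed(inner, x):
    """Sketch of TimeDistributed: merge the batch and second (ROI) axes,
    apply the inner layer once, then restore the original leading axes."""
    batch, rois = x.shape[0], x.shape[1]
    folded = x.reshape((batch * rois,) + x.shape[2:])   # (batch*rois, h, w, c)
    out = inner(folded)
    return out.reshape((batch, rois) + out.shape[1:])

# A toy stand-in "layer": global average pooling over the spatial axes.
gap = lambda t: t.mean(axis=(1, 2))

x = np.random.rand(2, 5, 7, 7, 3)   # (batch, num_rois, h, w, channels)
y = time_distributed(gap, x)
assert y.shape == (2, 5, 3)          # the inner layer ran once per ROI, per image
```

This is why the wrapper handles a variable number of regions for free: the inner layer only ever sees a 4D tensor.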
In addition, I noticed that for Keras the moving average / variation is not updated when in test mode (see here). Wouldn't that be an issue? Shouldn't it be updated during test mode? Should this be fixed in Keras? So many questions :)
Hrm. Why do you think it should be updated during test (i.e. inference or prediction)?
Hrm. Why do you think it should be updated during test (i.e. inference or prediction)?
I'm not sure, but it sounds like the moving average / variation should depend on your current data, not on the data you trained on. I will read more on BatchNormalization today to see how it should behave.
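For what it's worth, the moving statistics in Keras are an exponential moving average that is updated only during training; at test time the stored values are used as-is, so by design they reflect the training data. A NumPy sketch of the update rule (illustrative; the momentum value and initialization here are assumptions, not Keras defaults):

```python
import numpy as np

def update_moving_stats(moving_mean, moving_var, batch, momentum=0.99):
    """Exponential moving average update applied during *training* only.
    At test time the stored values are used unchanged."""
    mu, var = batch.mean(), batch.var()
    moving_mean = momentum * moving_mean + (1.0 - momentum) * mu
    moving_var = momentum * moving_var + (1.0 - momentum) * var
    return moving_mean, moving_var

rng = np.random.default_rng(0)
mm, mv = 0.0, 1.0   # typical initial values: zero mean, unit variance
for _ in range(500):
    batch = rng.normal(3.0, 2.0, size=256)   # training data: mean 3, var 4
    mm, mv = update_moving_stats(mm, mv, batch, momentum=0.9)

# The moving statistics converge toward the training distribution,
# which is exactly what inference-time normalization uses.
assert abs(mm - 3.0) < 0.2 and abs(mv - 4.0) < 0.5
```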
I pushed a PR to fix this issue: keras-team/keras#7467. I believe it's a more generic solution than the bn_axis+1 one, and it fixes the root problem in the TimeDistributed layer.
Thanks to @waleedka for making that PR which has now been merged!
Re-running the above snippet with a freshly checked out keras install gives:
bn_type: time_dist | learning_phase: 0 | mean: 0.0126442806795 | std: 0.960675358772
bn_type: time_dist | learning_phase: 1 | mean: 0.00776057131588 | std: 0.973823308945
bn_type: flat | learning_phase: 0 | mean: 0.0131619861349 | std: 0.961250126362
bn_type: flat | learning_phase: 1 | mean: 0.00772643135861 | std: 0.97383749485
Which is the desired output. This should keep the API a bit simpler, since TimeDistributed() can now be applied to all layers in the final-stage classifier. It means people will need to update their Keras version to the latest, but that's OK.
Awesome! Thanks for the update, @yhenon and thanks for the work @waleedka!
@waleedka please feel free to add yourself to the CONTRIBUTORS file!
What is the best way to extend this script to batch inference / training? @yhenon