Comments (15)

0x00b1 commented on September 14, 2024

Hi, @yhenon!

I appreciate the comment and the explanation (I was puzzled by your custom BatchNormalization implementation)! I'll take a look. The BatchNormalization(axis=bn_axis + 1) approach sounds like the better way to go!
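
For reference, here's a minimal sketch of that approach (the channels-last shapes are illustrative, not keras-rcnn's actual ones): drop the TimeDistributed wrapper around BN and normalize the 5D tensor directly, with the channel axis shifted by one because of the extra ROI dimension.

from keras.layers import Input, BatchNormalization

# On a 4D (batch, height, width, channels) tensor the channel axis is bn_axis = 3,
# so on the 5D (batch, roi, height, width, channels) tensor it becomes bn_axis + 1 = 4.
bn_axis = 3
rois = Input(shape=(32, 7, 7, 512))                   # (roi, height, width, channels)
normed = BatchNormalization(axis=bn_axis + 1)(rois)   # flat BN over the whole 5D tensor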

I’m curious, would you be interested in helping out? Your package was certainly an inspiration!

JihongJu commented on September 14, 2024

I was not aware of this problem. Thank you for pointing out this issue. As far as I understand, the TimeDistributed layer should apply the wrapped layer to a tensor whose shape excludes the time dimension. If this is not the case for BatchNormalization, it might be an issue for Keras itself, because that would be inconsistent with the other layers. I'm not sure whether the issue is caused by the extra dimension or something else. It seems interesting and I will look into it.

JihongJu commented on September 14, 2024

Hi @yhenon ,

I've tried the TimeDistributed BatchNormalization with the following sample:

import numpy as np
from keras.layers import *
from keras.models import *
import keras.backend as K

img_size = 8
batch_size = 64
num_time_steps = 4
num_channels = 3

# Run in training phase so BatchNormalization normalizes with the batch statistics.
K.set_learning_phase(1)

X = np.random.rand(batch_size, num_time_steps,
                   img_size, img_size, num_channels)
x = K.variable(X)
y = TimeDistributed(BatchNormalization(axis=-1))(x)
print(K.int_shape(y))

# Mean/std per (time step, channel) slice, computed over the batch and spatial dims.
norm = K.eval(y)
for i in range(num_time_steps):
    for j in range(num_channels):
        print(norm[:, i, ..., j].mean(), norm[:, i, ..., j].std())

And the results were:

(64, 4, 8, 8, 3)
-5.02914e-08 0.994199
2.79397e-09 0.99404
-2.79397e-08 0.994136
-3.21306e-08 0.994103
3.53903e-08 0.99412
2.79397e-08 0.994144
3.35276e-08 0.993953
-8.3819e-09 0.994113
1.11759e-08 0.994066
-8.73115e-08 0.993973
-7.45058e-09 0.993843
-6.61239e-08 0.99407

This seems to match what we expect from BatchNormalization: it returns activations normalized over the batch, and the normalization is applied independently to each time step and channel.

yhenon commented on September 14, 2024

@JihongJu After looking over your code, I still think my issue stands (though I may be missing something). To clarify my point a bit:

  • TimeDistributed(BatchNorm()) seems to work fine at training time (as you point out), as it normalizes using the statistics of the mini-batch
  • TimeDistributed(BatchNorm()) does not work fine at test time, as it normalizes using statistics computed on the training set. However, these statistics never get updated when the BN layer is in a TimeDistributed wrapper.

The problem stems from your line K.set_learning_phase(1), which runs BN in training mode. However, we can't keep K.set_learning_phase(1) at test time, since it makes a number of layers (like dropout) behave undesirably.

Here's a more complete example, where we compute the stats on a batch at both train and test time, using both approaches to BN:

import numpy as np
from keras.layers import *
from keras.models import *
import keras.backend as K

def test_bn(batch_norm_type, learning_phase):
	K.set_learning_phase(learning_phase)

	img_size = 8
	batch_size = 64
	num_time_steps = 4
	num_channels = 3

	inputs = Input(shape=(num_time_steps, img_size, img_size, num_channels))
	
	if batch_norm_type == 'time_dist':
		# momentum increased for faster update of dataset statistics
		x = TimeDistributed(BatchNormalization(axis=-1, momentum=0.5))(inputs)
	elif batch_norm_type == 'flat':
		x = BatchNormalization(axis=4, momentum=0.5)(inputs)

	model = Model(inputs=inputs, outputs=x)
	model.compile(loss='mae', optimizer='sgd')

	X = np.random.rand(batch_size, num_time_steps, img_size, img_size, num_channels)
	Y = np.random.rand(batch_size, num_time_steps, img_size, img_size, num_channels)
	history = model.fit(X, Y, epochs=4, verbose=0)

	P = model.predict(X)

	print('bn_type: {:10} | learning_phase: {} | mean: {:14} | std: {:14}'.format(
		batch_norm_type, learning_phase, 
		P.mean(), P.std()))
	return

for batch_norm_type in ['time_dist', 'flat']:
	for learning_phase in [0, 1]:
		test_bn(batch_norm_type, learning_phase)

And the corresponding output:

bn_type: time_dist  | learning_phase: 0 | mean: 0.498111873865 | std: 0.287745058537
bn_type: time_dist  | learning_phase: 1 | mean: 0.00772533146665 | std: 0.973870813847
bn_type: flat       | learning_phase: 0 | mean: 0.0119582833722 | std: 0.961674869061
bn_type: flat       | learning_phase: 1 | mean: 0.0076981917955 | std: 0.973761022091

yhenon commented on September 14, 2024

@0x00b1 Hi!
To be clear, in my implementation I was just following what the paper said:

For the usage of BN layers, after pretraining, we compute the BN statistics (means and variances) for each layer on the ImageNet training set. Then the BN layers are fixed during fine-tuning for object detection. As such, the BN layers become linear activations with constant offsets and scales, and BN statistics are not updated by fine-tuning. We fix the BN layers mainly for reducing memory consumption in Faster R-CNN training.

This also provided a way of dealing with the above issue, so I left it.
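
As a rough sketch of that idea (not the exact custom layer I used, just an illustration), one way to express "frozen BN" in Keras is to force BatchNormalization to always take its inference branch, so it normalizes with the stored moving statistics:

from keras.layers import BatchNormalization

class FixedBatchNormalization(BatchNormalization):
    # Illustrative subclass: always normalize with the stored moving
    # mean/variance, regardless of the learning phase, so the BN
    # statistics are never updated during fine-tuning.
    def call(self, inputs, training=None):
        return super(FixedBatchNormalization, self).call(inputs, training=False)

Combined with trainable=False on the layer, gamma and beta stay fixed too, matching the "linear activations with constant offsets and scales" described in the paper.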

I would certainly be interested in helping - my original implementation is rather limited in scope and full of hacks, and a better-quality Keras Faster R-CNN would be desirable.

JihongJu commented on September 14, 2024

@yhenon Hmm, now I get the point. In that case, I agree with you: applying a flat BN to the 5D tensor, instead of a time-distributed BN, seems fine.

hgaiser commented on September 14, 2024

Can I abuse this issue to ask why TimeDistributed layers are necessary? Is it to perform computation per ROI (meaning the term 'time distributed' is a bit poorly chosen here)? I noticed in py-faster-rcnn that they are limited to single batch training only, presumably because Caffe blobs are limited to 4d. If you have batch_size > 1 and ROIs, your blob would need 5 dimensions (batch_id, roi_id, height, width, channels). Is the use of TimeDistributed intended to get this fifth dimension?

In addition, I noticed that for Keras the moving average / variance is not updated when in test mode (see here). Wouldn't that be an issue? Shouldn't it be updated during test mode? Should this be fixed in Keras? So many questions :)

0x00b1 commented on September 14, 2024

@JihongJu I played with this too. I think @yhenon is correct. And I believe the suggestion by @yhenon will work (i.e. BatchNormalization(axis=bn_axis + 1)).

@yhenon Want to send a PR? 😄

0x00b1 commented on September 14, 2024

Can I abuse this issue to ask why TimeDistributed layers are necessary? Is it to perform computation per ROI (meaning the term 'time distributed' is a bit poorly chosen here)? I noticed in py-faster-rcnn that they are limited to single batch training only, presumably because Caffe blobs are limited to 4d. If you have batch_size > 1 and ROIs, your blob would need 5 dimensions (batch_id, roi_id, height, width, channels). Is the use of TimeDistributed intended to get this fifth dimension?

Yep. Your instincts are right. It’s a super clever hack by @yhenon to exploit the TimeDistributed wrapper’s batching to iterate across a variable number of regions. And, I agree, TimeDistributed is a bad name. I think Distributed (or Batched) would make more sense. (cc: @fchollet)
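
As a toy sketch of how that works (the layers and shapes are illustrative, not the actual keras-rcnn classifier head), the ROI index simply rides along in the "time" axis and the wrapped layer is applied once per ROI:

from keras.layers import Input, TimeDistributed, Flatten, Dense
from keras.models import Model

num_rois, pool_size, channels, num_classes = 32, 7, 512, 21  # illustrative sizes

# 5D input: (batch, roi, height, width, channels). TimeDistributed applies the
# wrapped layer independently to each of the num_rois slices along axis 1.
roi_features = Input(shape=(num_rois, pool_size, pool_size, channels))
x = TimeDistributed(Flatten())(roi_features)
scores = TimeDistributed(Dense(num_classes, activation='softmax'))(x)

model = Model(inputs=roi_features, outputs=scores)
print(model.output_shape)  # (None, 32, 21): one class distribution per ROI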

0x00b1 commented on September 14, 2024

In addition, I noticed that for Keras the moving average / variance is not updated when in test mode (see here). Wouldn't that be an issue? Shouldn't it be updated during test mode? Should this be fixed in Keras? So many questions :)

Hrm. Why do you think it should be updated during test (i.e. inference or prediction)?

hgaiser commented on September 14, 2024

Hrm. Why do you think it should be updated during test (i.e. inference or prediction)?

I'm not sure, but it sounds like the moving average / variance would then depend on your current data, not on the data you trained on. I will read more on BatchNormalization today to see how it should behave.

waleedka commented on September 14, 2024

I pushed a PR to fix this issue: keras-team/keras#7467. I believe it's a more generic solution than the bn_axis + 1 workaround, and it fixes the root problem in the TimeDistributed layer.

yhenon commented on September 14, 2024

Thanks to @waleedka for making that PR, which has now been merged!
Re-running the above snippet with a freshly checked-out Keras install gives:

bn_type: time_dist  | learning_phase: 0 | mean: 0.0126442806795 | std: 0.960675358772
bn_type: time_dist  | learning_phase: 1 | mean: 0.00776057131588 | std: 0.973823308945
bn_type: flat       | learning_phase: 0 | mean: 0.0131619861349 | std: 0.961250126362
bn_type: flat       | learning_phase: 1 | mean: 0.00772643135861 | std:  0.97383749485

This is the desired output. It should keep the API a bit simpler, since TimeDistributed() can now be applied to all layers in the final-stage classifier. It means people will need to update their Keras version to the latest, but that's OK.

0x00b1 commented on September 14, 2024

Awesome! Thanks for the update, @yhenon and thanks for the work @waleedka!

@waleedka please feel free to add yourself to the CONTRIBUTORS file!

subhashree-r commented on September 14, 2024

What is the best way to extend this script to batch inference / training? @yhenon
