If you use a generator to provide data, rather than just arrays of data, this error reappears and cannot be solved by implementing a test step. Predictably, there seems to be little documentation on the `call` method and how to use it. For example, should you set up a gradient tape inside `call`? Or does something outside of `call` handle the gradients?
Why does this happen? Shouldn't train/test steps be data-generator agnostic?
You are using the default `test_step`, which attempts to call the model, but this model is not callable.
Solutions:
- Implement a correct `test_step` on the VAE model. Example:
```python
# Method of the VAE class; assumes `import tensorflow as tf` and `from tensorflow import keras`.
def test_step(self, data):
    if isinstance(data, tuple):
        data = data[0]
    # Use the encoder/decoder stored on the model rather than module-level globals.
    z_mean, z_log_var, z = self.encoder(data)
    reconstruction = self.decoder(z)
    reconstruction_loss = tf.reduce_mean(
        keras.losses.binary_crossentropy(data, reconstruction)
    )
    reconstruction_loss *= 28 * 28
    kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
    kl_loss = tf.reduce_mean(kl_loss)
    kl_loss *= -0.5
    total_loss = reconstruction_loss + kl_loss
    return {
        "loss": total_loss,
        "reconstruction_loss": reconstruction_loss,
        "kl_loss": kl_loss,
    }
```
- Implement a `call` method (this is exactly what the error message is telling you to do). Example:
```python
# Method of the VAE class; assumes `import tensorflow as tf` and `from tensorflow import keras`.
def call(self, inputs):
    z_mean, z_log_var, z = self.encoder(inputs)
    reconstruction = self.decoder(z)
    reconstruction_loss = tf.reduce_mean(
        keras.losses.binary_crossentropy(inputs, reconstruction)
    )
    reconstruction_loss *= 28 * 28
    kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
    kl_loss = tf.reduce_mean(kl_loss)
    kl_loss *= -0.5
    total_loss = reconstruction_loss + kl_loss
    # Report the losses as metrics (`add_metric` is part of the tf.keras 2.x Layer API).
    self.add_metric(kl_loss, name='kl_loss', aggregation='mean')
    self.add_metric(total_loss, name='total_loss', aggregation='mean')
    self.add_metric(reconstruction_loss, name='reconstruction_loss', aggregation='mean')
    return reconstruction
```
Go with option 1.
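To see option 1 working end to end, here is a minimal sketch: a model with a custom `train_step` and `test_step` but no `call` method, trained with validation data from a `tf.data.Dataset`. The gradient tape lives in `train_step`, so Keras never needs to call the model itself. To keep it short this is a plain autoencoder (no sampling layer or KL term), and the names `make_autoencoder_parts` and `TinyAE`, the layer sizes and the random 4-feature data are all hypothetical stand-ins for the tutorial's MNIST VAE.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


def make_autoencoder_parts(n_features=4, latent_dim=2):
    """Hypothetical tiny functional encoder/decoder pair (built at construction)."""
    enc_in = keras.Input(shape=(n_features,))
    encoder = keras.Model(enc_in, keras.layers.Dense(latent_dim, activation="relu")(enc_in))
    dec_in = keras.Input(shape=(latent_dim,))
    decoder = keras.Model(dec_in, keras.layers.Dense(n_features, activation="sigmoid")(dec_in))
    return encoder, decoder


class TinyAE(keras.Model):
    """No call() method: fit() and evaluate() only go through the custom steps."""

    def __init__(self, encoder, decoder, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.loss_tracker = keras.metrics.Mean(name="loss")

    @property
    def metrics(self):
        # Listed here so Keras resets the tracker at the start of each epoch/evaluation.
        return [self.loss_tracker]

    def _loss(self, data):
        reconstruction = self.decoder(self.encoder(data))
        return tf.reduce_mean(keras.losses.binary_crossentropy(data, reconstruction))

    def train_step(self, data):
        # The gradient tape is set up here, not in call().
        with tf.GradientTape() as tape:
            loss = self._loss(data)
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}

    def test_step(self, data):
        # Same loss without a weight update; replaces the default test_step,
        # which would try to call the model.
        self.loss_tracker.update_state(self._loss(data))
        return {"loss": self.loss_tracker.result()}


rng = np.random.default_rng(0)
x = rng.random((64, 4)).astype("float32")  # random stand-in data in [0, 1)
train_ds = tf.data.Dataset.from_tensor_slices(x[:48]).batch(16)
val_ds = tf.data.Dataset.from_tensor_slices(x[48:]).batch(16)

model = TinyAE(*make_autoencoder_parts())
model.compile(optimizer=keras.optimizers.Adam())
history = model.fit(train_ds, validation_data=val_ds, epochs=2, verbose=0)
```

Because neither step calls `self(...)`, the missing `call` method is never hit; validation results appear in `history.history["val_loss"]`.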
I'm having this issue with generators as well. What is the solution?
My understanding of the `reconstruction_loss *= 28 * 28` statement is that the reconstruction loss is a mean over all pixels. As such, regardless of whether your image is 28x28 or 500x500, the scale of the reconstruction loss stays the same. Multiplying it by the number of pixels undoes that averaging and gives back the summed error per image. I don't think the exact scaling factor is important here; it's more a means of balancing the reconstruction loss against the KL loss. Some papers and Stack Exchange posts discuss fancier ways of setting this value.
tl;dr: how you weight your KL divergence and reconstruction loss depends on your problem, the size of your inputs, and your latent space setup.
https://stats.stackexchange.com/questions/332179/how-to-weight-kld-loss-vs-reconstruction-loss-in-variational-auto-encoder
https://stats.stackexchange.com/questions/341954/balancing-reconstruction-vs-kl-loss-variational-autoencoder
https://arxiv.org/abs/2002.07514
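A quick numeric check of the scaling argument. Here `per_pixel` is a placeholder for the per-pixel binary cross-entropy of a batch of 28x28 images (random values, not real losses). Scaling the mean over all pixels by 28 * 28 gives the same number as summing each image's pixel losses and then averaging over the batch, which is the `reduce_sum(..., axis=(1, 2))` form used in later comments.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder per-pixel losses for a batch of 4 images of 28x28 pixels.
per_pixel = rng.random((4, 28, 28))

# Tutorial style: mean over batch and pixels, then scale by the pixel count.
scaled_mean = per_pixel.mean() * 28 * 28
# Alternative style: sum over each image's pixels, then average over the batch.
sum_then_mean = per_pixel.sum(axis=(1, 2)).mean()

print(scaled_mean, sum_then_mean)  # equal up to floating-point rounding
```

For 500x500 images the per-image sum grows by a factor of 500 * 500 / (28 * 28), roughly 319, while the KL term does not depend on image size, which is why the balance between the two terms depends on the input size.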
In reply to the generator question: you could convert your generator to a `tf.data.Dataset`, or use `image_dataset_from_directory` instead of `flow_from_directory`.
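A minimal sketch of the first suggestion, wrapping a Python generator with `tf.data.Dataset.from_generator`. The generator `batch_generator` and its random 28x28x1 batches are hypothetical stand-ins for your own data source; `output_signature` has to describe what the generator actually yields.

```python
import numpy as np
import tensorflow as tf


def batch_generator(num_batches=3, batch_size=8):
    """Hypothetical generator yielding batches of 28x28 grayscale images in [0, 1)."""
    rng = np.random.default_rng(0)
    for _ in range(num_batches):
        yield rng.random((batch_size, 28, 28, 1), dtype=np.float32)


# from_generator takes the generator *function*, so tf.data can restart it each epoch.
dataset = tf.data.Dataset.from_generator(
    batch_generator,
    output_signature=tf.TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32),
)

for batch in dataset.take(1):
    print(tuple(batch.shape))  # (8, 28, 28, 1)
```

The resulting dataset can be passed to `fit` directly and as `validation_data`. For images on disk, `tf.keras.utils.image_dataset_from_directory(..., label_mode=None)` yields image batches without labels, which is what an autoencoder needs.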
Why is it that Option 1 is recommended in #38 (comment)? Is it because `train_step` needs access to `z_mean` and `z_log_var`, while the `call` function returns only the reconstruction?
I came up with the implementation below, which seems fine to me, but I'm new to this style of building models, so I am not familiar with best practices. I call `self(data)` from inside the gradient tape to be sure that I get the gradients.
If this approach is acceptable, I would be glad to submit a PR to the tutorial, as it makes it easier to extend the tutorial with functionality like validation data and model saving.
```python
import tensorflow as tf


class VAE(tf.keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = tf.keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = tf.keras.metrics.Mean(
            name="reconstruction_loss")
        self.kl_loss_tracker = tf.keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def call(self, inputs):
        """Call the model on a particular input."""
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstruction = self.decoder(z)
        return z_mean, z_log_var, reconstruction

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, reconstruction = self(data)
            # Sum the per-pixel loss over each image (axes 1, 2), then average over the batch.
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(tf.keras.losses.binary_crossentropy(
                    data, reconstruction),
                    axis=(1, 2)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) -
                              tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

    def test_step(self, data):
        """Step run during validation."""
        if isinstance(data, tuple):
            data = data[0]
        z_mean, z_log_var, reconstruction = self(data)
        # Same formulas as train_step, so validation and training losses are comparable.
        reconstruction_loss = tf.reduce_mean(
            tf.reduce_sum(tf.keras.losses.binary_crossentropy(
                data, reconstruction),
                axis=(1, 2)))
        kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) -
                          tf.exp(z_log_var))
        kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
        total_loss = reconstruction_loss + kl_loss
        # Update the trackers so the reported values are averages over all validation batches.
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }
```
Alternatively, if `call` can only return one item, you could do the following, though I could see it potentially leading to extra computation:
```python
import tensorflow as tf


class VAE(tf.keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = tf.keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = tf.keras.metrics.Mean(
            name="reconstruction_loss")
        self.kl_loss_tracker = tf.keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def call(self, inputs):
        """Call the model on a particular input; returns only the reconstruction."""
        _, _, z = self.encoder(inputs)
        return self.decoder(z)

    def train_step(self, data):
        with tf.GradientTape() as tape:
            # The encoder runs twice per step: once here for the KL terms
            # and once more inside self(data).
            z_mean, z_log_var, _ = self.encoder(data)
            reconstruction = self(data)
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(tf.keras.losses.binary_crossentropy(
                    data, reconstruction),
                    axis=(1, 2)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) -
                              tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

    def test_step(self, data):
        """Step run during validation."""
        if isinstance(data, tuple):
            data = data[0]
        z_mean, z_log_var, _ = self.encoder(data)
        reconstruction = self(data)
        # Same formulas as train_step, so validation and training losses are comparable.
        reconstruction_loss = tf.reduce_mean(
            tf.reduce_sum(tf.keras.losses.binary_crossentropy(
                data, reconstruction),
                axis=(1, 2)))
        kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) -
                          tf.exp(z_log_var))
        kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
        total_loss = reconstruction_loss + kl_loss
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }
```
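Writing the loss formulas out separately in `train_step` and `test_step` makes it easy for the two to drift apart, so that validation losses stop being comparable with training losses. One way to avoid that is a single loss function that both steps call. A minimal sketch, assuming 28x28x1 inputs and an encoder that returns `(z_mean, z_log_var, z)`; the name `vae_losses` is hypothetical.

```python
import numpy as np
import tensorflow as tf


def vae_losses(data, z_mean, z_log_var, reconstruction):
    """Return (total, reconstruction, kl) losses, each averaged over the batch."""
    # binary_crossentropy averages over the channel axis -> (batch, 28, 28);
    # summing over the pixel axes gives one reconstruction loss per image.
    reconstruction_loss = tf.reduce_mean(
        tf.reduce_sum(
            tf.keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)
        )
    )
    # Closed-form KL divergence to a standard normal, summed over latent dimensions.
    kl = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
    kl_loss = tf.reduce_mean(tf.reduce_sum(kl, axis=1))
    return reconstruction_loss + kl_loss, reconstruction_loss, kl_loss


# Sanity check: a standard-normal posterior (mean 0, log-variance 0) has zero KL,
# and predicting 0.5 for targets of 0.5 costs log(2) per pixel.
data = tf.fill((2, 28, 28, 1), 0.5)
zeros = tf.zeros((2, 2))
total, recon, kl = vae_losses(data, zeros, zeros, data)
print(float(recon), float(kl))
```

Inside the model, `train_step` would compute `z_mean, z_log_var, z = self.encoder(data)` and `reconstruction = self.decoder(z)` under the tape and pass them to `vae_losses`; `test_step` would do the same without the tape.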
I have this working with generators for both training and validation data, using the additional `test_step` method as above but without a `call` method.
```python
import tensorflow as tf
from tensorflow import keras

# `datagen` is an ImageDataGenerator created with `validation_split`;
# `file_df`, `img_directory`, the batch sizes, `encoder` and `decoder` are defined elsewhere.
train_gen = datagen.flow_from_dataframe(
    dataframe=file_df,
    directory=img_directory,
    color_mode="rgba",
    target_size=(128, 128),
    class_mode=None,  # yield images only, no labels
    batch_size=train_batch_size,
    subset="training")
val_gen = datagen.flow_from_dataframe(
    dataframe=file_df,
    directory=img_directory,
    color_mode="rgba",
    target_size=(128, 128),
    class_mode=None,
    batch_size=val_batch_size,
    subset="validation")


class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = keras.metrics.Mean(name="reconstruction_loss")
        self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [self.total_loss_tracker,
                self.reconstruction_loss_tracker,
                self.kl_loss_tracker]

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            reconstruction_loss = tf.reduce_mean(tf.reduce_sum(
                keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {"loss": self.total_loss_tracker.result(),
                "reconstruction_loss": self.reconstruction_loss_tracker.result(),
                "kl_loss": self.kl_loss_tracker.result()}

    def test_step(self, data):
        if isinstance(data, tuple):
            data = data[0]
        z_mean, z_log_var, z = self.encoder(data)
        reconstruction = self.decoder(z)
        reconstruction_loss = tf.reduce_mean(tf.reduce_sum(
            keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)))
        kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
        kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
        total_loss = reconstruction_loss + kl_loss
        # Update the trackers so validation metrics are averaged over all batches.
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {"loss": self.total_loss_tracker.result(),
                "reconstruction_loss": self.reconstruction_loss_tracker.result(),
                "kl_loss": self.kl_loss_tracker.result()}


vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam())
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)
history = vae.fit(train_gen, validation_data=val_gen,
                  steps_per_epoch=len(train_gen),  # integer number of training batches
                  epochs=100, callbacks=[early_stopping])
```
I didn't fully understand the `test_step` method above, so I based mine on the originally suggested train step. Could anyone explain the purpose of the line `reconstruction_loss *= 28 * 28`?
The problem exists with DNN classifiers on tabular data too. It is triggered by passing NumPy arrays for X, y, and sample weights; a generator fixes this.
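For reference, a sketch of a generator that yields `(x, y, sample_weight)` batches from in-memory arrays, a tuple layout Keras `fit` accepts. The arrays `X`, `y`, `w`, the batch size and the name `tabular_batches` are hypothetical. The generator loops forever, so `fit` needs `steps_per_epoch`.

```python
import numpy as np


def tabular_batches(X, y, w, batch_size=32, seed=0):
    """Hypothetical infinite generator of shuffled (x, y, sample_weight) batches."""
    rng = np.random.default_rng(seed)
    n = len(X)
    while True:  # loop forever; pass steps_per_epoch to fit()
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield X[idx], y[idx], w[idx]


# Hypothetical data: 100 rows, 5 features, binary labels, unit sample weights.
X = np.random.default_rng(1).random((100, 5)).astype("float32")
y = (X[:, 0] > 0.5).astype("float32")
w = np.ones(100, dtype="float32")

gen = tabular_batches(X, y, w, batch_size=32)
xb, yb, wb = next(gen)
print(xb.shape, yb.shape, wb.shape)  # (32, 5) (32,) (32,)
```

With 100 rows and a batch size of 32, one pass takes `int(np.ceil(100 / 32))` = 4 batches, so that would be the matching `steps_per_epoch`.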