Comments (8)

ckyleda commented on May 12, 2024

If you use a generator to provide data, rather than plain arrays, this error reappears and cannot be solved by implementing a test step. Predictably, there seems to be little documentation on the call method and how to use it. E.g., should you set up a gradient tape inside call? Or does something outside of call handle the gradients?

Why does this happen? Shouldn't train/test steps be data-generator agnostic?

fchollet commented on May 12, 2024

You are using the default test_step, which attempts to call the model, but this model is not callable.
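
(For context, the built-in test_step boils down to roughly the following simplified sketch; the real implementation also unpacks optional sample weights through Keras' internal data adapters. The self(x, training=False) line is what triggers the "not callable" error when no call method is defined.)

  # Simplified view of the default keras.Model.test_step (assumes TF 2.x;
  # the real version also handles x-only batches and sample weights).
  def test_step(self, data):
      x, y = data
      y_pred = self(x, training=False)  # fails if the model defines no call()
      self.compiled_loss(y, y_pred, regularization_losses=self.losses)
      self.compiled_metrics.update_state(y, y_pred)
      return {m.name: m.result() for m in self.metrics}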

Solutions:

  1. Implement a correct test_step on the VAE model. Example:
  def test_step(self, data):
      if isinstance(data, tuple):
          data = data[0]

      z_mean, z_log_var, z = encoder(data)
      reconstruction = decoder(z)
      reconstruction_loss = tf.reduce_mean(
          keras.losses.binary_crossentropy(data, reconstruction)
      )
      reconstruction_loss *= 28 * 28
      kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
      kl_loss = tf.reduce_mean(kl_loss)
      kl_loss *= -0.5
      total_loss = reconstruction_loss + kl_loss
      return {
          "loss": total_loss,
          "reconstruction_loss": reconstruction_loss,
          "kl_loss": kl_loss,
      }
  2. Implement a call method (this is exactly what the error message is telling you to do). Example:
  def call(self, inputs):
      z_mean, z_log_var, z = encoder(inputs)
      reconstruction = decoder(z)
      reconstruction_loss = tf.reduce_mean(
          keras.losses.binary_crossentropy(inputs, reconstruction)
      )
      reconstruction_loss *= 28 * 28
      kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
      kl_loss = tf.reduce_mean(kl_loss)
      kl_loss *= -0.5
      total_loss = reconstruction_loss + kl_loss
      self.add_metric(kl_loss, name='kl_loss', aggregation='mean')
      self.add_metric(total_loss, name='total_loss', aggregation='mean')
      self.add_metric(reconstruction_loss, name='reconstruction_loss', aggregation='mean')
      return reconstruction

Go with option 1.

romanovzky commented on May 12, 2024

I'm having this issue with generators as well. What is the solution?

TylerADavis commented on May 12, 2024

My understanding of the reconstruction_loss *= 28 * 28 statement is that binary_crossentropy returns a per-pixel mean, so regardless of whether your image is 28x28 or 500x500, the scale of the reconstruction loss stays the same. Multiplying by the number of pixels converts that mean back into a per-image total. I don't think the exact scaling factor is important here; it's more a means of balancing the reconstruction loss against the KL loss. There are some papers and Stack Overflow posts that discuss fancier ways of setting this value.

tl;dr: how you weight your KL divergence and reconstruction loss depends on your problem, the size of your inputs, and your latent-space setup; see the references below and the sketch that follows them.
https://stats.stackexchange.com/questions/332179/how-to-weight-kld-loss-vs-reconstruction-loss-in-variational-auto-encoder
https://stats.stackexchange.com/questions/341954/balancing-reconstruction-vs-kl-loss-variational-autoencoder
https://arxiv.org/abs/2002.07514
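
As a concrete illustration (not something the tutorial itself does), a β-VAE-style weighting exposes the trade-off as a single hyperparameter; weighted_vae_loss and beta below are hypothetical names:

import tensorflow as tf
from tensorflow import keras

def weighted_vae_loss(data, reconstruction, z_mean, z_log_var, beta=1.0):
    # Per-image reconstruction error: sum the per-pixel binary cross-entropy
    # over the spatial dimensions, then average over the batch.
    reconstruction_loss = tf.reduce_mean(
        tf.reduce_sum(
            keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)
        )
    )
    # KL divergence between the approximate posterior and a unit Gaussian prior.
    kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
    kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
    # beta < 1 favours reconstruction quality, beta > 1 favours a smoother latent space.
    return reconstruction_loss + beta * kl_loss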

riteshahlawat commented on May 12, 2024

> If you use a generator to provide data, rather than plain arrays, this error reappears and cannot be solved by implementing a test step. Predictably, there seems to be little documentation on the call method and how to use it. E.g., should you set up a gradient tape inside call? Or does something outside of call handle the gradients?
>
> Why does this happen? Shouldn't train/test steps be data-generator agnostic?

You could convert your generator to a tf.data.Dataset, or use image_dataset_from_directory instead of flow_from_directory.
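
A minimal sketch of the second option (the directory path, image size, and batch size are placeholders; image_dataset_from_directory is available from TF 2.3 under tf.keras.preprocessing, and under tf.keras.utils in newer releases):

import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "path/to/images",      # placeholder directory
    labels=None,           # the VAE only needs the images themselves
    label_mode=None,
    image_size=(28, 28),
    batch_size=128,
)
# Scale pixels to [0, 1] so they match the binary cross-entropy loss.
train_ds = train_ds.map(lambda x: x / 255.0)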

TylerADavis commented on May 12, 2024

Why is it that Option 1 is recommended in #38 (comment)? Is it because the train_step needs access to z_mean and z_log_var, while the call function returns only the reconstruction?

I came up with the implementation below, which seems fine to me, but I'm newer to this style of building models so I am not familiar with best practices. I am calling self(data) from inside of the gradient tape to be sure that I can get the gradients.

If this approach is acceptable, I would be glad to submit a PR to the tutorial, as it makes it easier to extend the tutorial with functionality like validation data and model saving.

class VAE(tf.keras.Model):

    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = tf.keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = tf.keras.metrics.Mean(
            name="reconstruction_loss")
        self.kl_loss_tracker = tf.keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def call(self, inputs):
        """Call the model on a particular input."""
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstruction = self.decoder(z)
        return z_mean, z_log_var, reconstruction

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, reconstruction = self(data)
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(tf.keras.losses.binary_crossentropy(
                    data, reconstruction),
                              axis=(0)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) -
                              tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss

            mae_loss = tf.reduce_mean(
                tf.keras.metrics.mean_absolute_error(data, reconstruction))

        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

    def test_step(self, data):
        """Step run during validation."""
        if isinstance(data, tuple):
            data = data[0]

        z_mean, z_log_var, reconstruction = self(data)
        reconstruction_loss = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(data, reconstruction))
        reconstruction_loss *= 28 * 28
        kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
        kl_loss = tf.reduce_mean(kl_loss)
        kl_loss *= -0.5
        total_loss = reconstruction_loss + kl_loss
      
        return {
            "loss": total_loss,
            "reconstruction_loss": reconstruction_loss,
            "kl_loss": kl_loss,
        }

Alternatively, you could do the below in the event that call can only return one item, but I could see it potentially leading to extra computation (the encoder ends up being run twice per batch).

Snippet

class VAE(tf.keras.Model):

    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = tf.keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = tf.keras.metrics.Mean(
            name="reconstruction_loss")
        self.kl_loss_tracker = tf.keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def call(self, inputs):
        """Call the model on a particular input."""
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstruction = self.decoder(z)
        return z_mean, z_log_var, reconstruction

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, _ = self.encoder(data)
            reconstruction = self(data)
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(tf.keras.losses.binary_crossentropy(
                    data, reconstruction),
                              axis=(0)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) -
                              tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss

            mae_loss = tf.reduce_mean(
                tf.keras.metrics.mean_absolute_error(data, reconstruction))

        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

    def test_step(self, data):
        """Step run during validation."""
        if isinstance(data, tuple):
            data = data[0]
        z_mean, z_log_var, _ = self.encoder(data)
        reconstruction = self(data)
        reconstruction_loss = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(data, reconstruction))
        reconstruction_loss *= 28 * 28
        kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
        kl_loss = tf.reduce_mean(kl_loss)
        kl_loss *= -0.5
        total_loss = reconstruction_loss + kl_loss
      
        return {
            "loss": total_loss,
            "reconstruction_loss": reconstruction_loss,
            "kl_loss": kl_loss,
        }

dunst4n91 commented on May 12, 2024

I have this working with generators for both training and validation data, using the additional test_step method as above but without a call method.

train_gen = datagen.flow_from_dataframe(
    dataframe=file_df,
    directory=img_directory,
    color_mode="rgba",
    target_size=(128, 128),
    class_mode=None,
    batch_size=train_batch_size,
    subset="training")

val_gen = datagen.flow_from_dataframe(
    dataframe=file_df,
    directory=img_directory,
    color_mode="rgba",
    target_size=(128, 128),
    class_mode=None,
    batch_size=val_batch_size,
    subset="validation")

class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = keras.metrics.Mean(name="reconstruction_loss")
        self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [self.total_loss_tracker,
                self.reconstruction_loss_tracker,
                self.kl_loss_tracker]

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            
            reconstruction_loss = tf.reduce_mean(tf.reduce_sum(keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
            
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        
        return {"loss": self.total_loss_tracker.result(),
                "reconstruction_loss": self.reconstruction_loss_tracker.result(),
                "kl_loss": self.kl_loss_tracker.result()}
    
    def test_step(self, data):
        if isinstance(data, tuple):
            data = data[0]

        z_mean, z_log_var, z = self.encoder(data)
        reconstruction = self.decoder(z)
        
        reconstruction_loss = tf.reduce_mean(tf.reduce_sum(keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)))
        kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
        kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
        total_loss = reconstruction_loss + kl_loss
        return {"loss": total_loss,
                "reconstruction_loss": reconstruction_loss,
                "kl_loss": kl_loss}

vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam())
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)
history = vae.fit(train_gen, validation_data=val_gen, steps_per_epoch=int(len(files) * 0.8 / train_batch_size), epochs=100, callbacks=[early_stopping])

I didn't fully understand the test_step method above, so I based mine on the originally suggested train step. Could anyone explain the purpose of this line: reconstruction_loss *= 28 * 28?

romanovzky commented on May 12, 2024

The problem also exists with DNN classifiers on tabular data. In that case the cause is the usage of numpy arrays for X, y, and sample weights; a generator fixes this.
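
For reference, a minimal sketch of an equivalent tf.data.Dataset wrapper (X, y, and w are placeholder numpy arrays; Keras treats a three-element dataset as inputs, targets, and sample weights):

import numpy as np
import tensorflow as tf

# Placeholder tabular data: features, labels, and per-sample weights.
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=(1000,)).astype("float32")
w = np.ones(1000, dtype="float32")

dataset = (
    tf.data.Dataset.from_tensor_slices((X, y, w))
    .shuffle(buffer_size=1000)
    .batch(32)
)
# model.fit(dataset, epochs=10)  # fit consumes the (x, y, sample_weight) batches directly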
