Comments (8)

ckyleda commented on May 12, 2024

If you use a generator to provide data, rather than plain arrays, this error reappears and cannot be solved by implementing a test step. Predictably, there seems to be little documentation on the call method and how to use it. E.g., should you set up a gradient tape inside call? Or does something outside of call handle the gradients?

Why does this happen? Shouldn't train/test steps be data-generator agnostic?

fchollet commented on May 12, 2024

You are using the default test_step, which attempts to call the model, but this model is not callable.
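
(For context, the built-in test_step boils down to roughly the following simplified sketch; the real implementation also unpacks optional sample weights through Keras' internal data adapters. The self(x, training=False) line is what triggers the "not callable" error when no call method is defined.)

  # Simplified view of the default keras.Model.test_step (assumes TF 2.x;
  # the real version also handles x-only batches and sample weights).
  def test_step(self, data):
      x, y = data
      y_pred = self(x, training=False)  # fails if the model defines no call()
      self.compiled_loss(y, y_pred, regularization_losses=self.losses)
      self.compiled_metrics.update_state(y, y_pred)
      return {m.name: m.result() for m in self.metrics}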

Solutions:

  1. Implement a correct test_step on the VAE model. Example:
  def test_step(self, data):
      if isinstance(data, tuple):
          data = data[0]

      z_mean, z_log_var, z = encoder(data)
      reconstruction = decoder(z)
      reconstruction_loss = tf.reduce_mean(
          keras.losses.binary_crossentropy(data, reconstruction)
      )
      reconstruction_loss *= 28 * 28
      kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
      kl_loss = tf.reduce_mean(kl_loss)
      kl_loss *= -0.5
      total_loss = reconstruction_loss + kl_loss
      return {
          "loss": total_loss,
          "reconstruction_loss": reconstruction_loss,
          "kl_loss": kl_loss,
      }
  2. Implement a call method (this is exactly what the error message is telling you to do). Example:
  def call(self, inputs):
      z_mean, z_log_var, z = encoder(inputs)
      reconstruction = decoder(z)
      reconstruction_loss = tf.reduce_mean(
          keras.losses.binary_crossentropy(inputs, reconstruction)
      )
      reconstruction_loss *= 28 * 28
      kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
      kl_loss = tf.reduce_mean(kl_loss)
      kl_loss *= -0.5
      total_loss = reconstruction_loss + kl_loss
      self.add_metric(kl_loss, name='kl_loss', aggregation='mean')
      self.add_metric(total_loss, name='total_loss', aggregation='mean')
      self.add_metric(reconstruction_loss, name='reconstruction_loss', aggregation='mean')
      return reconstruction

Go with option 1.

romanovzky commented on May 12, 2024

I'm having this issue with generators as well. What is the solution?

TylerADavis commented on May 12, 2024

My understanding of the reconstruction_loss *= 28 * 28 statement is that binary_crossentropy returns a per-pixel mean, so regardless of whether your image is 28x28 or 500x500, the scale of the reconstruction loss stays the same. Multiplying by the number of pixels converts that mean back into a per-image total. I don't think the exact scaling factor is important here; it's more a means of balancing the reconstruction loss against the KL loss. There are some papers and Stack Overflow posts that discuss fancier ways of setting this value.

tl;dr: how you weight your KL divergence and reconstruction loss depends on your problem, the size of your inputs, and your latent-space setup; see the references below and the sketch that follows them.
https://stats.stackexchange.com/questions/332179/how-to-weight-kld-loss-vs-reconstruction-loss-in-variational-auto-encoder
https://stats.stackexchange.com/questions/341954/balancing-reconstruction-vs-kl-loss-variational-autoencoder
https://arxiv.org/abs/2002.07514
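
As a concrete illustration (not something the tutorial itself does), a β-VAE-style weighting exposes the trade-off as a single hyperparameter; weighted_vae_loss and beta below are hypothetical names:

import tensorflow as tf
from tensorflow import keras

def weighted_vae_loss(data, reconstruction, z_mean, z_log_var, beta=1.0):
    # Per-image reconstruction error: sum the per-pixel binary cross-entropy
    # over the spatial dimensions, then average over the batch.
    reconstruction_loss = tf.reduce_mean(
        tf.reduce_sum(
            keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)
        )
    )
    # KL divergence between the approximate posterior and a unit Gaussian prior.
    kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
    kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
    # beta < 1 favours reconstruction quality, beta > 1 favours a smoother latent space.
    return reconstruction_loss + beta * kl_loss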

riteshahlawat commented on May 12, 2024

> If you use a generator to provide data, rather than plain arrays, this error reappears and cannot be solved by implementing a test step. Predictably, there seems to be little documentation on the call method and how to use it. E.g., should you set up a gradient tape inside call? Or does something outside of call handle the gradients?
>
> Why does this happen? Shouldn't train/test steps be data-generator agnostic?

You could convert your generator to a tf.data.Dataset, or use image_dataset_from_directory instead of flow_from_directory.
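
A minimal sketch of the second option (the directory path, image size, and batch size are placeholders; image_dataset_from_directory is available from TF 2.3 under tf.keras.preprocessing, and under tf.keras.utils in newer releases):

import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "path/to/images",      # placeholder directory
    labels=None,           # the VAE only needs the images themselves
    label_mode=None,
    image_size=(28, 28),
    batch_size=128,
)
# Scale pixels to [0, 1] so they match the binary cross-entropy loss.
train_ds = train_ds.map(lambda x: x / 255.0)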

TylerADavis commented on May 12, 2024

Why is it that Option 1 is recommended in #38 (comment)? Is it because the train_step needs access to z_mean and z_log_var, while the call function returns only the reconstruction?

I came up with the implementation below, which seems fine to me, but I'm newer to this style of building models so I am not familiar with best practices. I am calling self(data) from inside of the gradient tape to be sure that I can get the gradients.

If this approach is acceptable, I would be glad to submit a PR to the tutorial, as it makes it easier to extend the tutorial with functionality like validation data and model saving.

class VAE(tf.keras.Model):

    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = tf.keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = tf.keras.metrics.Mean(
            name="reconstruction_loss")
        self.kl_loss_tracker = tf.keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def call(self, inputs):
        """Call the model on a particular input."""
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstruction = self.decoder(z)
        return z_mean, z_log_var, reconstruction

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, reconstruction = self(data)
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(tf.keras.losses.binary_crossentropy(
                    data, reconstruction),
                              axis=(0)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) -
                              tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss

            mae_loss = tf.reduce_mean(
                tf.keras.metrics.mean_absolute_error(data, reconstruction))

        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

    def test_step(self, data):
        """Step run during validation."""
        if isinstance(data, tuple):
            data = data[0]

        z_mean, z_log_var, reconstruction = self(data)
        reconstruction_loss = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(data, reconstruction))
        reconstruction_loss *= 28 * 28
        kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
        kl_loss = tf.reduce_mean(kl_loss)
        kl_loss *= -0.5
        total_loss = reconstruction_loss + kl_loss
      
        return {
            "loss": total_loss,
            "reconstruction_loss": reconstruction_loss,
            "kl_loss": kl_loss,
        }

Alternatively, you could do the below in the event that call can only return one item, but I could see it potentially leading to extra computation (the encoder ends up being run twice per batch).

Snippet

class VAE(tf.keras.Model):

    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = tf.keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = tf.keras.metrics.Mean(
            name="reconstruction_loss")
        self.kl_loss_tracker = tf.keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def call(self, inputs):
        """Call the model on a particular input."""
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstruction = self.decoder(z)
        return z_mean, z_log_var, reconstruction

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, _ = self.encoder(data)
            reconstruction = self(data)
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(tf.keras.losses.binary_crossentropy(
                    data, reconstruction),
                              axis=(0)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) -
                              tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss

            mae_loss = tf.reduce_mean(
                tf.keras.metrics.mean_absolute_error(data, reconstruction))

        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

    def test_step(self, data):
        """Step run during validation."""
        if isinstance(data, tuple):
            data = data[0]
        z_mean, z_log_var, _ = self.encoder(data)
        reconstruction = self(data)
        reconstruction_loss = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(data, reconstruction))
        reconstruction_loss *= 28 * 28
        kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
        kl_loss = tf.reduce_mean(kl_loss)
        kl_loss *= -0.5
        total_loss = reconstruction_loss + kl_loss
      
        return {
            "loss": total_loss,
            "reconstruction_loss": reconstruction_loss,
            "kl_loss": kl_loss,
        }

dunst4n91 commented on May 12, 2024

I have this working with generators for both training and validation data, using the additional test_step method as above but without a call method.

train_gen = datagen.flow_from_dataframe(
    dataframe=file_df,
    directory=img_directory,
    color_mode="rgba",
    target_size=(128, 128),
    class_mode=None,
    batch_size=train_batch_size,
    subset="training")

val_gen = datagen.flow_from_dataframe(
    dataframe=file_df,
    directory=img_directory,
    color_mode="rgba",
    target_size=(128, 128),
    class_mode=None,
    batch_size=val_batch_size,
    subset="validation")

class VAE(keras.Model):
    def __init__(self, encoder, decoder, **kwargs):
        super(VAE, self).__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = keras.metrics.Mean(name="reconstruction_loss")
        self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

    @property
    def metrics(self):
        return [self.total_loss_tracker,
                self.reconstruction_loss_tracker,
                self.kl_loss_tracker]

    def train_step(self, data):
        with tf.GradientTape() as tape:
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)
            
            reconstruction_loss = tf.reduce_mean(tf.reduce_sum(keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)))
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss
            
        grads = tape.gradient(total_loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)
        
        return {"loss": self.total_loss_tracker.result(),
                "reconstruction_loss": self.reconstruction_loss_tracker.result(),
                "kl_loss": self.kl_loss_tracker.result()}
    
    def test_step(self, data):
        if isinstance(data, tuple):
            data = data[0]

        z_mean, z_log_var, z = self.encoder(data)
        reconstruction = self.decoder(z)
        
        reconstruction_loss = tf.reduce_mean(tf.reduce_sum(keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)))
        kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
        kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
        total_loss = reconstruction_loss + kl_loss
        return {"loss": total_loss,
                "reconstruction_loss": reconstruction_loss,
                "kl_loss": kl_loss}

vae = VAE(encoder, decoder)
vae.compile(optimizer=keras.optimizers.Adam())
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=10)
history = vae.fit(train_gen, validation_data=val_gen, steps_per_epoch=int(len(files) * 0.8 / train_batch_size), epochs=100, callbacks=[early_stopping])

I didn't fully understand the test_step method above, so I based mine on the originally suggested train step. Could anyone explain the purpose of this line: reconstruction_loss *= 28 * 28?

romanovzky commented on May 12, 2024

The problem also exists with DNN classifiers on tabular data. In that case the cause is the usage of numpy arrays for X, y, and sample weights; a generator fixes this.
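
For reference, a minimal sketch of an equivalent tf.data.Dataset wrapper (X, y, and w are placeholder numpy arrays; Keras treats a three-element dataset as inputs, targets, and sample weights):

import numpy as np
import tensorflow as tf

# Placeholder tabular data: features, labels, and per-sample weights.
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 2, size=(1000,)).astype("float32")
w = np.ones(1000, dtype="float32")

dataset = (
    tf.data.Dataset.from_tensor_slices((X, y, w))
    .shuffle(buffer_size=1000)
    .batch(32)
)
# model.fit(dataset, epochs=10)  # fit consumes the (x, y, sample_weight) batches directly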
