Is it right to use a beta lower than one as the multiplier on the KLD term?

Beta lower than one (variational-autoencoder issue, closed)

jaanli commented on May 28, 2024


Comments (11)

jaanli commented on May 28, 2024

Good question: this is more a matter of what helps empirically. For example, some work finds that annealing beta helps: https://arxiv.org/abs/1511.06349

It probably depends on the problem; it's easy to try.

However, leaving beta below one is technically incorrect if the goal is a well-defined loss function that is a lower bound on the log evidence, log p(x).
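
In standard notation, the argument runs roughly as follows: the ELBO lower-bounds the log evidence, and the beta-weighted objective differs from it by (1 − β) times a non-negative KL term.

```latex
\log p(x) \;\ge\; \mathrm{ELBO}(x)
  = \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
    - \mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)

\mathcal{L}_\beta(x)
  = \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
    - \beta\,\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)
  = \mathrm{ELBO}(x) + (1 - \beta)\,\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)
```

Since the KL term is non-negative, β ≥ 1 gives L_β(x) ≤ ELBO(x) ≤ log p(x), while β < 1 gives L_β(x) ≥ ELBO(x), so L_β(x) is no longer guaranteed to stay below log p(x).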


nitba commented on May 28, 2024

Thank you @altosaar. When I use beta = 1 my ELBO does not decrease, but when beta is lower than one it does. Where do you think I made a mistake? Would writing the loss function the way you did give different results? I wrote it simply as KLD loss + reconstruction loss, but you wrote it in a different form.
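
On the question of form: assuming the "different form" is a sampled estimate of the ELBO, log p(x | z) + log p(z) − log q(z | x) with z drawn from q(z | x) (the repository's exact code is not shown here), it has the same expectation as "reconstruction loss + KLD", because the KL term is the expectation of log q(z | x) − log p(z) under q. The NumPy sketch below checks the KL part numerically for a diagonal Gaussian q and a standard-normal prior; `mu` and `logvar` are made-up values.

```python
import numpy as np


def analytic_kl(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))


def log_normal(z, mu, logvar):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar), axis=-1
    )


def monte_carlo_kl(mu, logvar, num_samples, rng):
    """Average of log q(z) - log p(z) over reparameterized samples z ~ q."""
    eps = rng.standard_normal((num_samples, mu.shape[0]))
    z = mu + np.exp(0.5 * logvar) * eps
    log_q = log_normal(z, mu, logvar)
    log_p = log_normal(z, np.zeros_like(mu), np.zeros_like(logvar))
    return np.mean(log_q - log_p)


rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 0.2])        # hypothetical encoder means
logvar = np.array([-0.3, 0.1, -1.2])   # hypothetical encoder log-variances

exact = analytic_kl(mu, logvar)
estimate = monte_carlo_kl(mu, logvar, num_samples=200_000, rng=rng)
print(f"analytic KL: {exact:.4f}  Monte Carlo KL: {estimate:.4f}")
```

The two numbers should agree up to Monte Carlo noise. If so, both ways of writing the loss optimize the same objective in expectation and mainly differ in gradient variance, so a large gap between beta = 1 and beta < 1 is more likely explained by how the terms are weighted than by how they are written.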


jaanli commented on May 28, 2024

Can you write out your loss in LaTeX, please? And do you mean your ELBO increases? Can you plot both terms of the ELBO separately and try annealing beta?


nitba commented on May 28, 2024

@altosaar, my plot shows that my results with beta = 1 are worse than with beta lower than 1.


nitba commented on May 28, 2024

The loss function I used is:

```python
from .loss import Loss
from .util import *  # presumably provides reconstruction_loss, kl_divergence, anneal_reg


class BetaHLoss(Loss):

    @staticmethod
    def match(head_name):
        return head_name.lower() in ('beta-h', 'beta-vae-h')

    @classmethod
    def apply_args(cls, cfg):
        cls.BETA = cfg.KL.BETA            # constant KL weight (e.g. 0.01, 0.1, 1)
        cls.ANNEAL = cfg.KL.ANNEAL_REG    # annealing schedule config (K, X0, ID)
        cls.REC_ID = cfg.REC.ID           # 'bernoulli' or 'gaussian'

    def __init__(self, cfg):
        super(BetaHLoss, self).__init__()
        BetaHLoss.apply_args(cfg)

    def __call__(self, **kwargs):
        x = kwargs['x']
        x_recon = kwargs['x_recon']
        latent_dist = kwargs['latent_dist']  # (mu, logvar) of q(z|x)
        step = kwargs['step']

        recon_loss = reconstruction_loss(x, x_recon, self.REC_ID)
        total_kld, dim_wise_kld, mean_kld = kl_divergence(*latent_dist)

        # Effective KL weight is BETA * anneal_w, which ramps from ~0 up to BETA.
        anneal_w = anneal_reg(step, self.ANNEAL)
        beta_vae_loss = recon_loss + self.BETA * anneal_w * total_kld

        meta = {
            # Despite the key name, this is the weighted loss being minimized;
            # it equals the negative ELBO only when BETA * anneal_w == 1.
            'elbo': beta_vae_loss,
            'recon': recon_loss,
            'kld': total_kld,
            'dim_wise_kld': dim_wise_kld,
            'mean_kld': mean_kld,
            'anneal_reg': anneal_w}

        return beta_vae_loss, meta
```

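```python
import numpy as np
import torch.nn.functional as F

""" KL DIVERGENCE """

def kl_divergence(mu, logvar):
    batch_size = mu.size(0)
    assert batch_size != 0

    # Closed-form KL(N(mu, exp(logvar)) || N(0, I)), per example and latent dim.
    klds = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())

    total_kld = klds.sum(1).mean(0, True)   # sum over latent dims, mean over batch
    dimension_wise_kld = klds.mean(0)       # per-dimension KL, averaged over batch
    mean_kld = klds.mean(1).mean(0, True)   # mean over latent dims and batch

    return total_kld, dimension_wise_kld, mean_kld

""" RECONSTRUCTION LOSS """

def reconstruction_loss(x, x_recon, ID):
    batch_size = x.size(0)
    assert batch_size != 0

    # Sum over all elements, then average over the batch (same scaling as total_kld).
    # reduction='sum' replaces the deprecated size_average=False.
    if ID == 'bernoulli':
        # x_recon are logits; this is -log p(x|z) under a Bernoulli decoder.
        recon_loss = F.binary_cross_entropy_with_logits(x_recon, x, reduction='sum').div(batch_size)
    elif ID == 'gaussian':
        recon_loss = F.mse_loss(x_recon, x, reduction='sum').div(batch_size)
    else:
        raise ValueError(f"Unknown reconstruction loss ID: {ID!r}")

    return recon_loss

""" ANNEAL REGULARIZATION """

def anneal_reg(step, anneal):
    k = anneal.K
    x0 = anneal.X0
    ID = anneal.ID

    if ID == 'logistic':
        # Sigmoid ramp from ~0 to 1, centred at step x0 with steepness k.
        return float(1 / (1 + np.exp(-k * (step - x0))))
    elif ID == 'linear':
        # Linear ramp from 0 to 1 over the first x0 steps.
        return min(1., max(0., step / x0))
    else:
        return 1.0
```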

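
For reference, the line `klds = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())` in `kl_divergence` is the standard closed-form KL between a diagonal Gaussian posterior and a standard-normal prior, per latent dimension j (with `logvar` = log σ_j²). `total_kld` then sums over the J latent dimensions and averages over a batch of size B:

```latex
\mathrm{KL}\big(\mathcal{N}(\mu_j, \sigma_j^2)\,\|\,\mathcal{N}(0, 1)\big)
  = -\tfrac{1}{2}\big(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\big)

\texttt{total\_kld}
  = \frac{1}{B}\sum_{b=1}^{B}\sum_{j=1}^{J}
    -\tfrac{1}{2}\big(1 + \log\sigma_{bj}^2 - \mu_{bj}^2 - \sigma_{bj}^2\big)
```

`reconstruction_loss` likewise sums over all elements and divides by the batch size, so both terms in `beta_vae_loss` are per-example sums averaged over the batch and sit on a consistent scale.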

nitba commented on May 28, 2024

Do you think I would get different results if I wrote my loss function the way you did?


jaanli commented on May 28, 2024

Thanks, that's interesting. Were you able to write the LaTeX for the ELBO? That will help us figure out if we're talking about the same thing.


nitba commented on May 28, 2024


This is my ELBO: `elbo = recon_loss + self.BETA * anneal_w * total_kld`
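
Written out in LaTeX, the quantity the code above minimizes is roughly the following. This is a transcription for the Bernoulli case, assuming `x_recon` is decoded from one sample z_b ~ q(z | x_b) per example, with batch size B, k = `anneal.K`, and the midpoint `anneal.X0` written as t_0:

```latex
\mathcal{J}(t) = \frac{1}{B}\sum_{b=1}^{B}\Big[
    -\log p(x_b \mid z_b)
    + \beta\, w(t)\,\mathrm{KL}\big(q(z \mid x_b)\,\|\,p(z)\big)
  \Big],
\qquad z_b \sim q(z \mid x_b),
\qquad w(t) = \frac{1}{1 + \exp\!\big(-k\,(t - t_0)\big)}
```

With β w(t) = 1 this is a single-sample estimate of the batch-averaged negative ELBO; for any other weight it is a different objective.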


jaanli commented on May 28, 2024

Shouldn't beta be the thing being annealed here? Or are you annealing anneal_w as well?


nitba commented on May 28, 2024

> Shouldn't beta be the thing being annealed here? Or are you annealing anneal_w as well?

Beta took different constant values (0.01, 0.1, and 1) and is multiplied by `anneal_w`, so in effect beta sets the final value of the KL weight: with beta = 1 the KLD factor ramps from zero to 1, and with beta = 0.1 it ramps from zero to 0.1. I attached a figure of `anneal_w`, which is a logistic annealing function, in a previous post.
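
A small sketch of that schedule, reproducing the logistic branch of `anneal_reg` in plain Python. The steepness `k` and midpoint `x0` are hypothetical values, since the thread does not give the actual config.

```python
import math


def logistic_anneal(step, k, x0):
    """Logistic weight rising from ~0 toward 1, as in the 'logistic' branch of anneal_reg."""
    return 1.0 / (1.0 + math.exp(-k * (step - x0)))


def effective_kl_weight(step, beta, k=0.0025, x0=2500):
    """KL coefficient actually applied in the loss: beta * anneal_w (k, x0 are hypothetical)."""
    return beta * logistic_anneal(step, k, x0)


for beta in (0.01, 0.1, 1.0):
    # Weight at the start, at the midpoint x0, and long after the midpoint.
    weights = [effective_kl_weight(s, beta) for s in (0, 2500, 10000)]
    print(beta, [round(w, 4) for w in weights])
```

The weight levels off at `beta` rather than 1. One consequence worth checking: the value logged as `'elbo'` is `recon + beta * anneal_w * KL`, so runs with smaller beta report a smaller number even when the true negative ELBO (`recon + KL`) is no better. Comparing `meta['recon'] + meta['kld']` across runs would likely be a fairer comparison.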


jaanli commented on May 28, 2024

It's probably best to start by annealing beta itself, with `anneal_w` removed; that makes it easier to tell whether annealing beta helps your task. Feel free to follow up with me about this via email.
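
A minimal sketch of that suggestion, assuming a linear warm-up (one common choice; the paper linked earlier in the thread may use a different schedule). A single hypothetical `kl_weight(step)` replaces both `self.BETA` and `anneal_w` and ramps from 0 to 1, so once warm-up ends the loss is the ordinary negative ELBO. `warmup_steps` and both function names are illustrative, not from the thread.

```python
def kl_weight(step, warmup_steps, final_beta=1.0):
    """Linearly anneal the KL coefficient from 0 to final_beta over warmup_steps."""
    if warmup_steps <= 0:
        return final_beta
    return final_beta * min(1.0, max(0.0, step / warmup_steps))


def annealed_vae_loss(recon_loss, total_kld, step, warmup_steps):
    """Reconstruction term plus annealed KL term (negative ELBO once step >= warmup_steps)."""
    return recon_loss + kl_weight(step, warmup_steps) * total_kld


# Works with plain floats or with PyTorch tensors from reconstruction_loss / kl_divergence.
print(annealed_vae_loss(100.0, 20.0, step=0, warmup_steps=1000))     # KL term ignored at the start
print(annealed_vae_loss(100.0, 20.0, step=500, warmup_steps=1000))   # KL term half-weighted
print(annealed_vae_loss(100.0, 20.0, step=5000, warmup_steps=1000))  # full negative ELBO
```

Because the weight ends at exactly 1, the final loss values are directly comparable with a plain beta = 1 run, and plotting `recon_loss` and `total_kld` separately alongside the weight should show which term is driving the difference.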

