
Comments (7)

daib13 commented on May 31, 2024

Hi @mmderakhshani

  1. KL loss. What do you mean by a regular implementation? The KL loss is exactly the same as equation 6 in "Tutorial on Variational Autoencoders". Note that the approximate posterior for the j-th dimension is N(\mu_j, \sigma_j^2); some implementations may use N(\mu_j, \sigma_j) instead. This could be a source of discrepancy, but it makes no difference to the generative performance.

  2. self.gen_loss1 = -log p_\theta(x|z), where p_\theta(x|z) is a Gaussian distribution, i.e. N(x | \hat{x}, \gamma I). Then we have

self.gen_loss1 = \sum ( (x - \hat{x})^2 / \gamma / 2 - log \gamma ) + constant,

which is our implementation. Yes, self.gen_loss1 can be negative. As we argued in our paper "Diagnosing and Enhancing VAE Models", \gamma converges to 0 and self.gen_loss1 goes to negative infinity when the objective function is globally optimized. Of course you can fix self.loggamma_x to 0, but this will make the reconstruction blurry. Intuitively speaking, as \hat{x} becomes exactly the same as x, meaning the model produces a perfect reconstruction, the only term involving \gamma left in the objective is the log \gamma term, which pushes \gamma to 0 and the objective to negative infinity. (Both loss terms are sketched in code after this list.)
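A minimal NumPy sketch of the two loss terms above (illustrative only, not the repository's exact TensorFlow code; the function and variable names are assumptions). It uses the variance parameterization N(x | \hat{x}, \gamma I) from this discussion, with the plus sign on the log term that is confirmed later in the thread:

```python
import numpy as np

def vae_losses(x, x_hat, mu, log_sigma_sq, log_gamma):
    """Sketch of the two VAE loss terms for p(x|z) = N(x | x_hat, gamma * I).

    x, x_hat:         (batch, dim) inputs and reconstructions
    mu, log_sigma_sq: (batch, latent) parameters of q(z|x) = N(mu, sigma^2)
    log_gamma:        scalar log-variance of the decoder (learnable in the paper)
    """
    gamma = np.exp(log_gamma)
    # gen_loss1 = -log p(x|z) up to an additive constant. With gamma as the
    # variance, the per-dimension log term is 0.5 * log(gamma); the repository's
    # code parameterizes by the standard deviation instead (see the exchange
    # further down), which turns it into log(gamma).
    gen_loss1 = np.sum((x - x_hat) ** 2 / (2.0 * gamma) + 0.5 * log_gamma, axis=-1)
    # KL(q(z|x) || N(0, I)), eq. 6 of "Tutorial on Variational Autoencoders".
    kl_loss = 0.5 * np.sum(np.square(mu) + np.exp(log_sigma_sq)
                           - log_sigma_sq - 1.0, axis=-1)
    return np.mean(gen_loss1), np.mean(kl_loss)
```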


mmderakhshani commented on May 31, 2024

OK, thanks for the paper you referred to, and thanks again for your great explanation.

Could you please tell me how you handled the negative loss case? Did you follow some kind of policy?

As another question: I saw in your code that you use the Adam optimizer to optimize the network parameters, and that at the beginning of each epoch you change the learning rate (some kind of decay strategy). As far as I know, Adam already adapts the effective step size of each parameter through its own update rule. Could you please tell me why you change the learning rate manually on top of that?


daib13 commented on May 31, 2024

@mmderakhshani

  1. About the negative loss. We just leave it negative. There is no need to force the loss to be positive.

  2. About the Adam optimizer and the learning rate. I just picked an optimization strategy somewhat arbitrarily; I don't know much about optimization. Maybe there is no need to manually change the learning rate, as you said. I am not sure which way is better. (A sketch of how an external schedule composes with Adam is below.)
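For reference, a minimal sketch of combining an external decay schedule with Adam (assuming TensorFlow 2.x / Keras rather than this repository's TF1 code; all values are illustrative). Adam adapts per-parameter step sizes, but those steps are still scaled by the global learning rate, so an outer decay schedule remains meaningful:

```python
import tensorflow as tf

# Halve the global learning rate every 10k steps (illustrative values).
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=10_000,
    decay_rate=0.5)

# Adam still adapts each parameter's step size; the schedule scales all of them.
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```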


chanshing commented on May 31, 2024

self.gen_loss1 = \sum ( (x - \hat{x})^2 / \gamma / 2 - log \gamma ) + constant,

I think the terms in the summation should be added, not subtracted:

\sum ( (x - \hat{x})^2 / \gamma / 2 + log \gamma )


daib13 commented on May 31, 2024

@chanshing Yes, you are correct. I made a typo in the response. Thanks for pointing this out.


mago876 commented on May 31, 2024

I'm having trouble with \gamma. In your code it seems that the likelihood is N(x | \hat{x}, \gamma^2 I), so

self.gen_loss1 = \sum ( (x - \hat{x})^2 / \gamma^2 / 2 + log \gamma ) + constant.

Is that right?


daib13 commented on May 31, 2024

@mago876 Yes, in the code we use N(x | \hat{x}, \gamma^2 I), while in the paper and this discussion we use N(x | \hat{x}, \gamma I). Sorry for the confusion. Our original paper draft used the same formulation as the code, but we changed it to the current version for convenience and didn't update the code accordingly.
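To make the two parameterizations concrete, here is a small numerical check (a sketch with made-up values, not code from the repository): when the code's \gamma is the standard deviation and the paper's \gamma is the variance (i.e. \gamma_paper = \gamma_code^2), the two negative log-likelihood expressions coincide.

```python
import numpy as np

# Paper/discussion: N(x | x_hat, gamma_var * I)    -> 0.5 * log(gamma_var) per dim
# Code:             N(x | x_hat, gamma_std**2 * I) -> log(gamma_std) per dim
x, x_hat = np.random.rand(4), np.random.rand(4)
gamma_std = 0.1
gamma_var = gamma_std ** 2

loss_paper = np.sum((x - x_hat) ** 2 / (2 * gamma_var) + 0.5 * np.log(gamma_var))
loss_code = np.sum((x - x_hat) ** 2 / (2 * gamma_std ** 2) + np.log(gamma_std))

assert np.isclose(loss_paper, loss_code)  # identical up to the dropped 2*pi constant
```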

