
Comments (4)

larjung commented on July 17, 2024

Hi Jean,
Ok, you are right, it was my mistake. I was probably using an older version of the _likelihood() function without the scale bound (as in the original tf_compression code, where the scale lower bound is not applied inside the likelihood function).
Thanks!

from compressai.

jbegaint commented on July 17, 2024

Hi, good point. Thanks for the report!

The _biases, _factors, and _matrices are indeed updated by the auxiliary optimizer, but their gradients are computed via the "main" loss, not the auxiliary one.

You are right to highlight that it would be more correct to only update the quantiles with the aux_optimizer, and to update all the other parameters via the main optimizer. I'm actually working on a cleaner implementation, which would also not require overriding .parameters() (a major obstacle to supporting multi-GPU training right now), and I'll address this then.

In the meantime, the current version should still allow you to train and reproduce results. That said, please let me know if you have trouble reproducing results or training. Thanks!
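The parameter split described above (quantiles to the auxiliary optimizer, everything else to the main optimizer) can be sketched roughly as follows. This is a simplified illustration, not compressai's actual implementation; the name matching on `"quantiles"` is an assumption about how such a split could be keyed:

```python
def split_parameters(named_params):
    """Split named parameters into a main group and an auxiliary group.

    Parameters whose name ends with 'quantiles' (the entropy bottleneck's
    quantile variables, trained by the auxiliary loss) go to the aux group;
    everything else is trained by the main rate-distortion loss.
    """
    main, aux = {}, {}
    for name, param in named_params.items():
        if name.endswith("quantiles"):
            aux[name] = param
        else:
            main[name] = param
    return main, aux
```

Each group would then be handed to its own optimizer, so the auxiliary loss only ever moves the quantiles.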


larjung commented on July 17, 2024

Hi Jean,
Thank you for acknowledging this point.
Regarding being able to reproduce the training results, one interesting issue that I've encountered and had to solve, concerns MeanScaleHyperprior and JointAutoregressiveHierarchicalPriors models. In both of them, the estimated bpp on the latent entropy model (i.e. the gaussian conditional) started very high when training (around 20-30 bpp) and hasn't gone down to the the proper values (e.g. 1-2bpp). So although D loss term gone down, R remained high resulting in a poor model.
After investigation I found out that the root cause of this, is the fact that some "scale_hat" values (as predicted by the hyper-synthesis subnetwork) were negative (e.g -0.3) at the beginning of training, as there is no ReLU at the end of h_s (or context_prediction in case of JAHP). Those negative scales disrupted the entropy model in a way that the optimizer couldn't get out of.
My solution is to lower bound the predicted "scale_hat" to zero using LowerBound class. After this, training behaves normally and bpp values are as expected.
It's probably not the most elegant or correct solution, but it worked for me. Another option is to add a final ReLU to h_s (or context_prediction), applied only to the chunk that corresponds to the scales.
(Note that the ScaleHyperprior model doesn't suffer from this issue, since it has a ReLU at the end of h_s.)
If you managed to train successfully without encountering this issue at all, I'd be curious to understand what makes the difference. Maybe some other training hyper-params, since we are probably using different training scripts...

Anyway, it is a pleasure to work with your great framework. Thanks again.

Danny


jbegaint commented on July 17, 2024

Hi Danny, mmh this is weird.

So I've seen some papers and implementations where authors used a ReLU or an exponential function to prevent the scale values from going below zero. I've experimented a bit with torch.exp for the scales, but then I had convergence issues.

Anyway, regarding the GaussianConditional class, there's already a scale_bound parameter that uses the LowerBound op to clip the scales to a minimum value (https://github.com/InterDigitalInc/CompressAI/blob/master/compressai/entropy_models/entropy_models.py#L496 and https://github.com/InterDigitalInc/CompressAI/blob/master/compressai/entropy_models/entropy_models.py#L567).
How do you initialize the GaussianConditional class? Can you share your lambda and hyper-parameters values?
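To see why the bound matters for the rate term, here is a self-contained sketch of the likelihood of a quantized symbol under a conditional Gaussian, with the scale clipped to a minimum value before use. This is an illustrative re-derivation, not compressai's code; the default bound of 0.11 here is an assumption matching the scale_bound discussed above:

```python
import math

def gaussian_likelihood(y, mean, scale, scale_bound=0.11):
    """Likelihood of an integer symbol y under N(mean, scale^2),
    integrated over the quantization bin [y - 0.5, y + 0.5].

    The scale is lower-bounded first, so a small or negative predicted
    scale cannot produce a degenerate likelihood (and hence an exploding
    rate estimate) early in training.
    """
    scale = max(scale, scale_bound)

    def cdf(x):
        # Standard normal CDF via the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return cdf((y + 0.5 - mean) / scale) - cdf((y - 0.5 - mean) / scale)
```

Without the `max`, a negative predicted scale flips the sign of the CDF arguments and the "likelihood" becomes negative, which is exactly the kind of state the optimizer struggles to escape.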

