Comments (12)
Hi @PkuRainBow, thanks a lot for your interest! I am happy to hear about your results from combining the two losses.
I believe what you observe is mainly due to optimization. Our loss is particularly well adapted to fine-tuning, as detailed in the FAQ in the main readme. Note that our VOC-DeepLab experiments are also a form of fine-tuning, since they are initialized from the MS-COCO weights of the authors.
Fine-tuning also has a computational advantage, since our loss is slower to compute (O(p log p) complexity, dominated by sorting the p pixel errors), although a dedicated CUDA kernel would likely speed up our current implementation significantly.
Combining the two losses can also steer the learning process: besides simply adding them, you could take a weighted sum and decrease the weight of the cross-entropy term throughout the optimization.
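As a minimal sketch of such a weighted sum (assuming `lovasz_losses.py` from this repository's PyTorch code is importable; the `combined_loss` helper and its `weight_ce` knob are illustrative, not part of the repo):

```python
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax  # this repository's PyTorch implementation

def combined_loss(logits, labels, weight_ce=0.5):
    # cross-entropy operates on raw logits ...
    ce = F.cross_entropy(logits, labels)
    # ... while lovasz_softmax expects class probabilities
    ls = lovasz_softmax(F.softmax(logits, dim=1), labels)
    # weight_ce=0.5 reproduces the 1:1 combination discussed in this thread
    return weight_ce * ce + (1.0 - weight_ce) * ls
```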
Another optimization-related aspect: while we found that keeping the same learning rate was generally good enough, I assume that doing more hyperparameter tuning would be beneficial.
Besides optimization, one other possible negative effect is that for smaller batches our loss optimizes something closer to image-IoU than dataset-IoU, as we discuss in Section 3.1, and this can end up decreasing the dataset-IoU. Combining with cross-entropy can also help here, preventing the model from specializing too closely to image-IoU.
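To make the batch-size point concrete, here is a hedged illustration; if I recall the signature correctly, the PyTorch implementation in this repository exposes this trade-off through a `per_image` flag (check `lovasz_losses.py` for the exact interface):

```python
import torch
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax

logits = torch.randn(4, 19, 64, 64)          # toy batch: 4 images, 19 classes
labels = torch.randint(0, 19, (4, 64, 64))   # toy ground-truth label maps
probas = F.softmax(logits, dim=1)

# per_image=False sorts the errors over the whole batch, so larger batches
# bring the surrogate closer to dataset-IoU; per_image=True averages
# per-image losses and therefore optimizes image-IoU explicitly.
loss_batch = lovasz_softmax(probas, labels, per_image=False)
loss_image = lovasz_softmax(probas, labels, per_image=True)
```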
Other approaches to tackling these limitations and improving the optimization are left to future work.
Hi @PkuRainBow, @alexander-rakhlin,
Thanks again for your interest; I'm looking forward to seeing this kind of contribution on finding good ways to train with and combine our loss. At this point these questions are mostly experimental and based on intuition rather than theoretically founded.
@PkuRainBow your interpretation of that sentence is correct: I mean we mostly kept the training parameters, but there could be more gains to be made with more hyperparameter exploration.
When I mentioned combining weights, I was talking about the possibility of a dynamic weighting, for instance lambda * cross-entropy + (1 - lambda) * lovasz-softmax; by changing lambda from 1 to 0 across the epochs of the optimization you could likely benefit from both the combination and the "soft fine-tuning" aspects of the loss.
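A sketch of that schedule (again assuming this repository's `lovasz_softmax`; the `annealed_loss` helper and the linear decay are just one illustrative choice):

```python
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax  # this repository's PyTorch implementation

def annealed_loss(logits, labels, epoch, num_epochs):
    # lambda decays linearly from 1 (pure cross-entropy) to 0 (pure
    # Lovasz-Softmax), giving the "soft fine-tuning" behaviour described above
    lam = max(0.0, 1.0 - epoch / max(1, num_epochs - 1))
    ce = F.cross_entropy(logits, labels)
    ls = lovasz_softmax(F.softmax(logits, dim=1), labels)
    return lam * ce + (1.0 - lam) * ls
```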
@PkuRainBow it can be due to a combination of various factors, for instance optimization questions, or smaller batch sizes, as underlined in the last paragraph of my first comment:
> Besides optimization, one other possible negative effect is that for smaller batches our loss optimizes something closer to image-IoU than dataset-IoU, as we discuss in Section 3.1, and this can end up decreasing the dataset-IoU. Combining with cross-entropy can also help here, preventing the model from specializing too closely to image-IoU.
I will close this thread as it is no longer an issue but more of an extended scientific discussion. Happy to see the loss being used and improved upon, at least in combination with cross-entropy for easier optimization.
@bermanmaxim
Thanks for your help.
I will first try to fine-tune the models with the LovaszSoftmax loss. As for the other approaches, it is a bit expensive to search for better hyperparameters.
As for the combination of the two loss functions, I simply chose 1:1. Maybe other weightings can further improve the performance.
Besides, many thanks for your other advice.
You're welcome, my pleasure! For now I'll keep this issue open for visibility.
Hi Maxim,
I too noticed that Lovasz Softmax works better in combination with cross-entropy. I used 0.5 and 0.9 weights for Lovasz Softmax in the weighted sum, and as long as the Lovasz Softmax weight was <1.0 the difference between 0.5 and 0.9 was not evident.
You say it's due to more local minima in Lovasz Softmax. Is this intuition, or is there a theory behind it?
Thank you.
@alexander-rakhlin Could you share your results with different weight combinations? I only tried 1:1.
Here I share the single-crop results on the validation set of Cityscapes.
baseline with softmax loss:
```
{'IU_array': array([0.98016613, 0.84412102, 0.9267689 , 0.62228906, 0.61643986,
                    0.63782245, 0.69795884, 0.78631591, 0.92559306, 0.66367732,
                    0.94683498, 0.82639853, 0.65797452, 0.95216736, 0.8073407 ,
                    0.85389642, 0.63814378, 0.68186497, 0.77889023])}
```
baseline with both softmax loss and lovasz-softmax loss (1:1):
```
{'IU_array': array([0.97955327, 0.84097007, 0.92471465, 0.52840852, 0.62658249,
                    0.65854956, 0.71770889, 0.8115076 , 0.92476787, 0.65601253,
                    0.94814198, 0.83470563, 0.6706206 , 0.95355899, 0.8312481 ,
                    0.88661456, 0.71990126, 0.70689474, 0.78928316])}
```
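Averaging the arrays above gives the overall mIoU (a quick numpy check on the numbers exactly as posted):

```python
import numpy as np

baseline = np.array([0.98016613, 0.84412102, 0.9267689, 0.62228906, 0.61643986,
                     0.63782245, 0.69795884, 0.78631591, 0.92559306, 0.66367732,
                     0.94683498, 0.82639853, 0.65797452, 0.95216736, 0.8073407,
                     0.85389642, 0.63814378, 0.68186497, 0.77889023])
combined = np.array([0.97955327, 0.84097007, 0.92471465, 0.52840852, 0.62658249,
                     0.65854956, 0.71770889, 0.8115076, 0.92476787, 0.65601253,
                     0.94814198, 0.83470563, 0.6706206, 0.95355899, 0.8312481,
                     0.88661456, 0.71990126, 0.70689474, 0.78928316])

print(baseline.mean())      # ~0.7813
print(combined.mean())      # ~0.7900
print(combined - baseline)  # per-class deltas
```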
We find that the improvements on some classes are very noticeable, but some classes' performance also drops sharply (e.g. the fourth entry, 0.622 → 0.528). Very interesting.
So, I will try to fine-tune the baseline model with only the lovasz loss and report the related results later.
@bermanmaxim Could you explain the sentence "Another optimization-related aspect: while we found that keeping the same learning rate was generally good enough, I assume that doing more hyperparameter tuning would be beneficial."?
I checked your paper, and I guess it means that we just replace the softmax loss with the lovasz-softmax loss and train the models with the same settings as before.
@PkuRainBow I tried 0.5 and 0.9 (1:1 and 9:1), and in my task the results seemed insensitive to the combination weights.
@bermanmaxim Sorry to inform you that fine-tuning the cross-entropy-based model harms the performance according to my current experiments.
Hi, is the key to using lovasz_softmax to first fully train the model with the normal cross-entropy loss and then fine-tune it with lovasz_softmax? Or should we roughly train the model with normal cross-entropy and mainly depend on lovasz_softmax to make the model converge better?