Comments (12)

bermanmaxim commented on May 27, 2024

Hi @PkuRainBow, thanks a lot for your interest! I am happy to hear about your result when combining the two losses.

I believe what you observe is mainly due to optimization. Our loss is particularly well suited to fine-tuning, as detailed in the FAQ in the main readme. Note that our VOC-DeepLab experiments are also a form of fine-tuning, since they are initialized from the authors' MS-COCO weights.

Fine-tuning also has a computational advantage, since our loss is slower to compute (O(p log p) complexity), although a dedicated CUDA kernel would likely speed up our current implementation significantly.
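
To make the O(p log p) term concrete, here is a condensed PyTorch sketch of the Lovász-Softmax computation described in the paper. It is illustrative only: it assumes already-flattened inputs and omits the ignore-label handling, the per-image option, and the class filtering of the repository's full implementation.

    import torch

    def lovasz_grad(gt_sorted):
        # Gradient of the Lovasz extension w.r.t. the sorted errors.
        # Everything after the sort is a linear cumulative-sum pass.
        gts = gt_sorted.sum()
        intersection = gts - gt_sorted.cumsum(0)
        union = gts + (1.0 - gt_sorted).cumsum(0)
        jaccard = 1.0 - intersection / union
        if gt_sorted.numel() > 1:
            jaccard[1:] = jaccard[1:] - jaccard[:-1]
        return jaccard

    def lovasz_softmax_sketch(probas, labels):
        # probas: [P, C] flattened class probabilities, labels: [P] class indices.
        C = probas.size(1)
        losses = []
        for c in range(C):
            fg = (labels == c).float()                  # binary mask for class c
            errors = (fg - probas[:, c]).abs()
            errors_sorted, perm = torch.sort(errors, descending=True)  # the O(P log P) step
            losses.append(torch.dot(errors_sorted, lovasz_grad(fg[perm])))
        return sum(losses) / C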

Combining the two losses can also steer the learning process. Besides simply adding them, you could take a weighted sum of the two losses and decrease the weight of the cross-entropy term over the course of the optimization.
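
For example, a minimal sketch of such a weighted combination, assuming the PyTorch implementation from this repository (imported here as lovasz_losses); the function name combined_loss and the default weights are illustrative, not part of the repository:

    import torch.nn.functional as F
    from lovasz_losses import lovasz_softmax  # PyTorch implementation from this repository

    def combined_loss(logits, labels, ce_weight=1.0, lovasz_weight=1.0):
        # logits: [B, C, H, W] raw network outputs, labels: [B, H, W] class indices.
        ce = F.cross_entropy(logits, labels)
        lov = lovasz_softmax(F.softmax(logits, dim=1), labels)
        return ce_weight * ce + lovasz_weight * lov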

Another optimization-related aspect: while we found that keeping the same learning rate was generally good enough, I assume that more hyperparameter tuning would be beneficial.

Besides optimization, one other possible negative effect is that with smaller batches our loss optimizes something closer to image-IoU than dataset-IoU, as we discuss in section 3.1, and this can lead to a decrease of the dataset-IoU in the end. Combining with cross-entropy can also help here, by preventing the model from specializing too closely to image-IoU.
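
As a small illustration of that trade-off, a sketch assuming the PyTorch implementation from this repository, whose per_image flag (if I recall the interface correctly) switches between computing the loss over the whole batch and averaging per-image losses; logits and labels are assumed to come from the training loop:

    import torch.nn.functional as F
    from lovasz_losses import lovasz_softmax

    probas = F.softmax(logits, dim=1)   # [B, C, H, W]
    # Whole batch treated as one segmentation: with larger batches this is
    # the closer surrogate of dataset-IoU.
    loss_batch = lovasz_softmax(probas, labels, per_image=False)
    # Mean of per-image losses: closer to optimizing image-IoU.
    loss_image = lovasz_softmax(probas, labels, per_image=True)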

Other approaches to tackling these limitations and improving the optimization are left to future work.

bermanmaxim commented on May 27, 2024

Hi @PkuRainBow, @alexander-rakhlin,

Thanks again for your interest; I'm looking forward to seeing this kind of contribution on finding good ways to train with and combine our loss. At this point these questions are mostly experimental and based on intuition rather than theoretically grounded.

@PkuRainBow your interpretation of that sentence is correct: I mean that we mostly kept the training parameters, but there could be more gains to be made with more hyperparameter exploration.

When I mentioned combining the weights, I was referring to the possibility of a dynamic weighting, for instance lambda * cross-entropy + (1 - lambda) * lovasz-softmax; by annealing lambda from 1 to 0 across the epochs of the optimization you could likely benefit from both the combination and the "soft fine-tuning" aspects of the loss.
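
A possible (purely illustrative) way to implement that schedule; model, optimizer, loader and num_epochs are assumed to already exist, and lovasz_softmax is the PyTorch implementation from this repository:

    import torch.nn.functional as F
    from lovasz_losses import lovasz_softmax

    for epoch in range(num_epochs):
        # Linearly anneal lambda from 1 (pure cross-entropy) to 0 (pure Lovasz-Softmax).
        lam = max(0.0, 1.0 - epoch / max(1, num_epochs - 1))
        for images, labels in loader:
            logits = model(images)
            loss = lam * F.cross_entropy(logits, labels) \
                 + (1.0 - lam) * lovasz_softmax(F.softmax(logits, dim=1), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()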

bermanmaxim commented on May 27, 2024

@PkuRainBow it can be due to a combination of various factors, for instance optimization questions, or smaller batch sizes, as underlined in the last paragraph of my first comment:

Besides optimization, one other possible negative effect is that with smaller batches our loss optimizes something closer to image-IoU than dataset-IoU, as we discuss in section 3.1, and this can lead to a decrease of the dataset-IoU in the end. Combining with cross-entropy can also help here, by preventing the model from specializing too closely to image-IoU.

I will close this thread since it is no longer an issue but rather an extended scientific discussion. I'm happy to see the loss being used and bringing improvements, at least in combination with cross-entropy for easier optimization.

PkuRainBow commented on May 27, 2024

@bermanmaxim
Thanks for your help.

I will first try to fine-tune the models with the Lovász-Softmax loss. As for the other approaches, it is a little expensive to search for better hyperparameters.

As for the combination of the two loss functions, I simply chose 1:1. Maybe other combinations can further improve the performance.

Besides, many thanks for your other advice.

bermanmaxim commented on May 27, 2024

You're welcome, my pleasure! For now I'll keep this issue open for visibility.

alexander-rakhlin commented on May 27, 2024

Hi Maxim,

I too noticed that Lovasz Softmax works better in combination with cross-entropy. I used 0.5 and 0.9 weights for Lovasz Softmax in the weighted sum, and as long as the Lovasz Softmax weight was <1.0 the difference between 0.5 and 0.9 was not evident.

You say it's due to more local minima in Lovász-Softmax. Is this intuition, or is there a theory behind it?

Thank you.

PkuRainBow commented on May 27, 2024

@alexander-rakhlin Could you share your results with the different weight combinations? I only tried 1:1.

PkuRainBow commented on May 27, 2024

Here I share the single-crop results on the validation set of Cityscapes.

baseline with softmax loss:

 {'IU_array': array([0.98016613, 0.84412102, 0.9267689 , 0.62228906, 0.61643986,
        0.63782245, 0.69795884, 0.78631591, 0.92559306, 0.66367732,
        0.94683498, 0.82639853, 0.65797452, 0.95216736, 0.8073407 ,
        0.85389642, 0.63814378, 0.68186497, 0.77889023])}

baseline with both softmax loss and lovasz softmax loss (1:1):

{ 'IU_array': array([0.97955327, 0.84097007, 0.92471465, 0.52840852, 0.62658249,
       0.65854956, 0.71770889, 0.8115076 , 0.92476787, 0.65601253,
       0.94814198, 0.83470563, 0.6706206 , 0.95355899, 0.8312481 ,
       0.88661456, 0.71990126, 0.70689474, 0.78928316])}

We find that the improvements on some classes are very obvious, but the performance on some other classes drops by a large margin. Very interesting.

So I will try to fine-tune the baseline model with only the Lovász loss and report the related results later.
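
For what it's worth, the two IU_array dumps above can be compared per class with a few lines of numpy (the class indices below simply refer to positions in the arrays, in whatever order the evaluation script uses):

    import numpy as np

    baseline = np.array([0.98016613, 0.84412102, 0.9267689,  0.62228906, 0.61643986,
                         0.63782245, 0.69795884, 0.78631591, 0.92559306, 0.66367732,
                         0.94683498, 0.82639853, 0.65797452, 0.95216736, 0.8073407,
                         0.85389642, 0.63814378, 0.68186497, 0.77889023])
    combined = np.array([0.97955327, 0.84097007, 0.92471465, 0.52840852, 0.62658249,
                         0.65854956, 0.71770889, 0.8115076,  0.92476787, 0.65601253,
                         0.94814198, 0.83470563, 0.6706206,  0.95355899, 0.8312481,
                         0.88661456, 0.71990126, 0.70689474, 0.78928316])
    delta = combined - baseline
    print("mean IoU: %.4f -> %.4f" % (baseline.mean(), combined.mean()))
    print("largest drop: class %d (%+.4f)" % (delta.argmin(), delta.min()))
    print("largest gain: class %d (%+.4f)" % (delta.argmax(), delta.max()))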

PkuRainBow commented on May 27, 2024

@bermanmaxim Could you explain to me the sentence "Another optimization-related aspect: while we found that keeping the same learning rate was generally good enough, I assume that more hyperparameter tuning would be beneficial."?

I checked your paper and I guess it means that we just replace the softmax loss with the Lovász-Softmax loss and train the models with the same settings as before.

alexander-rakhlin commented on May 27, 2024

@PkuRainBow I tried 0.5 and 0.9 (1:1 and 9:1), and the results in my task seemed insensitive to the combination weights.

PkuRainBow commented on May 27, 2024

@bermanmaxim Sorry to inform you that fine-tuning the cross-entropy-based model harms the performance, according to my current experiments.

CoinCheung commented on May 27, 2024

@bermanmaxim

Hi, is the key to using lovasz_softmax to first fully train the model with the normal cross-entropy loss and then fine-tune it with lovasz_softmax? Or should we roughly train the model with normal cross-entropy and mainly rely on lovasz_softmax to make the model converge better?
