Comments (12)
Hi @PkuRainBow, thanks a lot for your interest! I am happy to hear about your results from combining the two losses.
I believe what you observe is mainly due to optimization. Our loss is particularly well adapted to fine-tuning, as detailed in the FAQ in the main readme. Note that our VOC-DeepLab experiments are also a form of fine-tuning, since they are initialized from the MS-COCO weights of the authors.
Fine-tuning also has a computational advantage, since our loss is slower to compute (O(p log p) complexity, dominated by sorting the p pixel errors), although a dedicated CUDA kernel would likely speed up our current implementation significantly.
Combining the two losses can also steer the learning process: besides simply adding them, you could take a weighted sum and decrease the weight of the cross-entropy term throughout the optimization.
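As a minimal sketch of such a weighted sum (assuming `lovasz_losses.py` from this repository's PyTorch code is importable; the `combined_loss` helper and its `weight_ce` knob are illustrative, not part of the repo):

```python
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax  # this repository's PyTorch implementation

def combined_loss(logits, labels, weight_ce=0.5):
    # cross-entropy operates on raw logits ...
    ce = F.cross_entropy(logits, labels)
    # ... while lovasz_softmax expects class probabilities
    ls = lovasz_softmax(F.softmax(logits, dim=1), labels)
    # weight_ce=0.5 reproduces the 1:1 combination discussed in this thread
    return weight_ce * ce + (1.0 - weight_ce) * ls
```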
Another optimization-related aspect: while we found that keeping the same learning rate was generally good enough, I assume that doing more hyperparameter tuning would be beneficial.
Besides optimization, one other possible negative effect is that for smaller batches our loss optimizes something closer to image-IoU than dataset-IoU, as we discuss in Section 3.1, and this can end up decreasing the dataset-IoU. Combining with cross-entropy can also help here, preventing the model from specializing too closely to image-IoU.
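To make the batch-size point concrete, here is a hedged illustration; if I recall the signature correctly, the PyTorch implementation in this repository exposes this trade-off through a `per_image` flag (check `lovasz_losses.py` for the exact interface):

```python
import torch
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax

logits = torch.randn(4, 19, 64, 64)          # toy batch: 4 images, 19 classes
labels = torch.randint(0, 19, (4, 64, 64))   # toy ground-truth label maps
probas = F.softmax(logits, dim=1)

# per_image=False sorts the errors over the whole batch, so larger batches
# bring the surrogate closer to dataset-IoU; per_image=True averages
# per-image losses and therefore optimizes image-IoU explicitly.
loss_batch = lovasz_softmax(probas, labels, per_image=False)
loss_image = lovasz_softmax(probas, labels, per_image=True)
```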
Other approaches to tackling these limitations and improving the optimization are left to future work.
Hi @PkuRainBow, @alexander-rakhlin,
Thanks again for your interest; I'm looking forward to seeing this kind of contribution on finding good ways to train with and combine our loss. At this point these questions are mostly experimental and based on intuition rather than theoretically founded.
@PkuRainBow your interpretation of that sentence is correct: I mean we mostly kept the training parameters, but there could be more gains to be made with more hyperparameter exploration.
When I mentioned combining weights, I was talking about the possibility of a dynamic weighting, for instance lambda * cross-entropy + (1 - lambda) * lovasz-softmax; by changing lambda from 1 to 0 across the epochs of the optimization you could likely benefit from both the combination and the "soft fine-tuning" aspects of the loss.
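A sketch of that schedule (again assuming this repository's `lovasz_softmax`; the `annealed_loss` helper and the linear decay are just one illustrative choice):

```python
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax  # this repository's PyTorch implementation

def annealed_loss(logits, labels, epoch, num_epochs):
    # lambda decays linearly from 1 (pure cross-entropy) to 0 (pure
    # Lovasz-Softmax), giving the "soft fine-tuning" behaviour described above
    lam = max(0.0, 1.0 - epoch / max(1, num_epochs - 1))
    ce = F.cross_entropy(logits, labels)
    ls = lovasz_softmax(F.softmax(logits, dim=1), labels)
    return lam * ce + (1.0 - lam) * ls
```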
@PkuRainBow it can be due to a combination of various factors, for instance optimization questions, or smaller batch sizes, as underlined in the last paragraph of my first comment:
> Besides optimization, one other possible negative effect is that for smaller batches our loss optimizes something closer to image-IoU than dataset-IoU, as we discuss in Section 3.1, and this can end up decreasing the dataset-IoU. Combining with cross-entropy can also help here, preventing the model from specializing too closely to image-IoU.
I will close this thread as it is no longer an issue but more of an extended scientific discussion. Happy to see the loss being used and improved upon, at least in combination with cross-entropy for easier optimization.
@bermanmaxim
Thanks for your help.
I will first try to fine-tune the models with the LovaszSoftmax loss. As for the other approaches, it is a bit expensive to search for better hyperparameters.
As for the combination of the two loss functions, I simply chose 1:1. Maybe other weightings can further improve the performance.
Besides, many thanks for your other advice.
You're welcome, my pleasure! For now I'll keep this issue open for visibility.
Hi Maxim,
I too noticed that Lovasz Softmax works better in combination with cross-entropy. I used 0.5 and 0.9 weights for Lovasz Softmax in the weighted sum, and as long as the Lovasz Softmax weight was <1.0 the difference between 0.5 and 0.9 was not evident.
You say it's due to more local minima in Lovasz Softmax. Is this intuition, or is there a theory behind it?
Thank you.
@alexander-rakhlin Could you share your results with different weight combinations? I only tried 1:1.
Here I share the single-crop results on the validation set of Cityscapes.
baseline with softmax loss:
```
{'IU_array': array([0.98016613, 0.84412102, 0.9267689 , 0.62228906, 0.61643986,
                    0.63782245, 0.69795884, 0.78631591, 0.92559306, 0.66367732,
                    0.94683498, 0.82639853, 0.65797452, 0.95216736, 0.8073407 ,
                    0.85389642, 0.63814378, 0.68186497, 0.77889023])}
```
baseline with both softmax loss and lovasz-softmax loss (1:1):
```
{'IU_array': array([0.97955327, 0.84097007, 0.92471465, 0.52840852, 0.62658249,
                    0.65854956, 0.71770889, 0.8115076 , 0.92476787, 0.65601253,
                    0.94814198, 0.83470563, 0.6706206 , 0.95355899, 0.8312481 ,
                    0.88661456, 0.71990126, 0.70689474, 0.78928316])}
```
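Averaging the arrays above gives the overall mIoU (a quick numpy check on the numbers exactly as posted):

```python
import numpy as np

baseline = np.array([0.98016613, 0.84412102, 0.9267689, 0.62228906, 0.61643986,
                     0.63782245, 0.69795884, 0.78631591, 0.92559306, 0.66367732,
                     0.94683498, 0.82639853, 0.65797452, 0.95216736, 0.8073407,
                     0.85389642, 0.63814378, 0.68186497, 0.77889023])
combined = np.array([0.97955327, 0.84097007, 0.92471465, 0.52840852, 0.62658249,
                     0.65854956, 0.71770889, 0.8115076, 0.92476787, 0.65601253,
                     0.94814198, 0.83470563, 0.6706206, 0.95355899, 0.8312481,
                     0.88661456, 0.71990126, 0.70689474, 0.78928316])

print(baseline.mean())      # ~0.7813
print(combined.mean())      # ~0.7900
print(combined - baseline)  # per-class deltas
```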
We find that the improvements on some classes are very noticeable, but some classes' performance also drops sharply (e.g. the fourth entry, 0.622 → 0.528). Very interesting.
So, I will try to fine-tune the baseline model with only the lovasz loss and report the related results later.
@bermanmaxim Could you explain the sentence "Another optimization-related aspect: while we found that keeping the same learning rate was generally good enough, I assume that doing more hyperparameter tuning would be beneficial."?
I checked your paper, and I guess it means that we just replace the softmax loss with the lovasz-softmax loss and train the models with the same settings as before.
@PkuRainBow I tried 0.5 and 0.9 (1:1 and 9:1), and in my task the results seemed insensitive to the combination weights.
@bermanmaxim Sorry to inform you that fine-tuning the cross-entropy-based model harms the performance according to my current experiments.
Hi, is the key to using lovasz_softmax to first fully train the model with the normal cross-entropy loss and then fine-tune it with lovasz_softmax? Or should we roughly train the model with normal cross-entropy and mainly depend on lovasz_softmax to make the model converge better?