bermanmaxim / lovaszsoftmax

Code for the Lovász-Softmax loss (CVPR 2018)

Home Page: http://bmax.im/LovaszSoftmax

License: MIT License

Languages: Jupyter Notebook 97.64%, Python 2.36%
Topics: image-segmentation, pytorch, neural-networks, loss-functions

lovaszsoftmax's Introduction

The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks

Maxim Berman, Amal Rannen Triki, Matthew B. Blaschko

ESAT-PSI, KU Leuven, Belgium.

Published in CVPR 2018. See project page, arxiv paper, paper on CVF open access.

PyTorch implementation of the loss layer (pytorch folder)

Files included:

  • lovasz_losses.py: Standalone PyTorch implementation of the Lovász hinge and Lovász-Softmax for the Jaccard index
  • demo_binary.ipynb: Jupyter notebook showcasing binary training of a linear model, with the Lovász Hinge and with the Lovász-Sigmoid.
  • demo_multiclass.ipynb: Jupyter notebook showcasing multiclass training of a linear model with the Lovász-Softmax

The binary lovasz_hinge expects real-valued scores (positive scores correspond to foreground pixels).

The multiclass lovasz_softmax expects class probabilities (the maximum-scoring category is the prediction), so apply a Softmax layer to the unnormalized scores first.
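
A minimal usage sketch in PyTorch (an illustration, not part of the repository: it assumes pytorch/lovasz_losses.py is importable and uses random tensors in place of a real network; see the demo notebooks for complete examples):

import torch
import torch.nn.functional as F
from lovasz_losses import lovasz_hinge, lovasz_softmax

# Binary case: lovasz_hinge takes real-valued per-pixel scores (positive = foreground).
logits = torch.randn(4, 64, 64, requires_grad=True)      # [B, H, W] scores
binary_labels = torch.randint(0, 2, (4, 64, 64))          # [B, H, W] in {0, 1}
loss_bin = lovasz_hinge(logits, binary_labels)

# Multiclass case: lovasz_softmax takes class probabilities, so apply a softmax first.
scores = torch.randn(4, 21, 64, 64, requires_grad=True)   # [B, C, H, W] unnormalized
labels = torch.randint(0, 21, (4, 64, 64))                 # [B, H, W] in {0, ..., C-1}
loss_mc = lovasz_softmax(F.softmax(scores, dim=1), labels)

(loss_bin + loss_mc).backward()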

TensorFlow implementation of the loss layer (tensorflow folder)

Files included:

  • lovasz_losses_tf.py: Standalone TensorFlow implementation of the Lovász hinge and Lovász-Softmax for the Jaccard index
  • demo_binary_tf.ipynb: Jupyter notebook showcasing binary training of a linear model, with the Lovász Hinge and with the Lovász-Sigmoid.
  • demo_multiclass_tf.ipynb: Jupyter notebook showcasing the application of the multiclass loss with the Lovász-Softmax

Warning: the loss values and gradients have been checked to match those of the PyTorch implementation (see the notebooks); however, we have not used the TF implementation in a training setting.
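
A corresponding TensorFlow sketch (again an illustration, assuming tensorflow/lovasz_losses_tf.py is importable; written against TF 1.x graph mode, as used in the notebooks, with channels-last probability maps):

import tensorflow as tf
from lovasz_losses_tf import lovasz_hinge, lovasz_softmax

# Binary case: real-valued scores, integer labels in {0, 1}.
logits = tf.placeholder(tf.float32, [None, 64, 64])        # [B, H, W]
bin_labels = tf.placeholder(tf.int32, [None, 64, 64])       # [B, H, W]
loss_bin = lovasz_hinge(logits, bin_labels)

# Multiclass case: softmax probabilities in BHWC layout, integer labels.
scores = tf.placeholder(tf.float32, [None, 64, 64, 21])     # [B, H, W, C] unnormalized
labels = tf.placeholder(tf.int32, [None, 64, 64])            # [B, H, W]
loss_mc = lovasz_softmax(tf.nn.softmax(scores), labels)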

Usage

See the demos for simple proofs of principle.

FAQ

  • How should I use the Lovász-Softmax loss?

The loss can be optimized on its own, but the optimal optimization hyperparameters (learning rate, momentum) might differ from the best ones for cross-entropy. As discussed in the paper, optimizing the dataset-mIoU (the Pascal VOC measure) depends on the batch size and the number of classes; you might therefore obtain the best results by optimizing with cross-entropy first and fine-tuning with our loss, or by combining the two losses.

See for example how the work Land Cover Classification From Satellite Imagery With U-Net and Lovasz-Softmax Loss by Alexander Rakhlin et al. used our loss in the CVPR 18 DeepGlobe challenge.
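
For readers who want a concrete starting point, here is a hedged sketch of one way to combine the two losses in PyTorch; the wrapper class, the 1:1 default weighting, and the ignore index are illustrative assumptions, not prescriptions from the paper:

import torch.nn as nn
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax

class CrossEntropyPlusLovasz(nn.Module):
    # Illustrative combined loss: cross-entropy plus weighted Lovasz-Softmax.
    def __init__(self, lovasz_weight=1.0, ignore_index=255):
        super().__init__()
        self.lovasz_weight = lovasz_weight
        self.ignore_index = ignore_index

    def forward(self, scores, labels):
        # scores: [B, C, H, W] unnormalized logits; labels: [B, H, W] integer classes
        ce = F.cross_entropy(scores, labels, ignore_index=self.ignore_index)
        lv = lovasz_softmax(F.softmax(scores, dim=1), labels, ignore=self.ignore_index)
        return ce + self.lovasz_weight * lv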

  • Inference in Tensorflow is very slow...

Compiling from Tensorflow master (or using a future distribution that includes commit tensorflow/tensorflow@73e3215) should solve this problem; see issue #6.

Citation

Please cite

@inproceedings{berman2018lovasz,
  title={The Lov{\'a}sz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks},
  author={Berman, Maxim and Rannen Triki, Amal and Blaschko, Matthew B},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={4413--4421},
  year={2018}
}

lovaszsoftmax's People

Contributors

amalrt, bermanmaxim, lyakaap


lovaszsoftmax's Issues

Gradients problem

Hi, I've tried to use your loss implementation with PyTorch 1.4 and the network does not learn anything. This is because you've used Variable to define some tensors in the loss, and in this PyTorch version Variable has been deprecated, as you can see here:
https://pytorch.org/docs/stable/autograd.html#variable-deprecated.

I've fixed this problem by adding requires_grad=True in all Variable definitions in the file:
https://github.com/bermanmaxim/LovaszSoftmax/blob/master/pytorch/lovasz_losses.py

Lovasz softmax with 1 class and small batch does not learn

I have an image segmentation task with a small batch size (4-8) and some samples that contain only the background (negative) class.

I have implemented lovasz softmax as below:

loss2 = lovasz_softmax(probs, labels, classes=[1], per_image=False)

where probs has shape B, H, W, C and labels has shape B, H, W with values in {0, 1}.

However, the network does not learn at all -- the output feature maps look random, and tuning the learning rate does not improve the issue.

The same network works fine with dice, Tversky, focal, or BCE loss.

I think it is due to the presence of background-only samples -- I know that classes='present' solves this for multi-class problems. Is there a way to do the same for a binary Lovász-Softmax?

mIOU decreasing as the Lovasz Hinge Loss decreases

Hi Maxim Berman! Great work.
I am using the lovasz_hinge loss and iou_binary as my metric.
My labels are binary masks with the foreground represented as 1 and the background as 0.
I am currently overfitting a single example just to see how my model (a form of hyper-network) behaves.
But as the Lovász hinge loss decreases, the output of iou_binary also decreases.
Thanks a lot in advance for your help!

Tensorflow version?

Hi!
Do you have a TensorFlow version? I have limited experience with PyTorch, and I am a little stuck reimplementing it in TensorFlow.

ModuleNotFoundError: No module named 'lovasz'

pip install lovasz

ERROR: Could not find a version that satisfies the requirement lovasz (from versions: none)
ERROR: No matching distribution found for lovasz
Note: you may need to restart the kernel to use updated packages.

Please let me know the solution to this.

Some TensorFlow implementation problems.

hi, Maxim

You present very interesting and solid work! But I ran into an implementation error while using your Lovász loss in my DeepLab v3+. My initial loss is tf.losses.softmax_cross_entropy, and I prepared:

onehot_labels: [batch_size, num_classes] target one-hot-encoded labels.
logits: [batch_size, num_classes] logit outputs of the network.

as its inputs, but they do not fit your loss directly. Could you please give some advice on how to map these original parameters into your parameters (probas, labels)? Thank you!

plug'n'play implementation for TensorFlow/Keras

I'm trying to use your TensorFlow implementation for a U-Net in Keras.
The problem I face is that I cannot simply plug the lovasz_softmax loss function into the U-Net model, since the loss function takes the labels as image batches, dim(labels) = (batchsize, width, height, 1), and the probas in one-hot notation, dim(probas) = (batchsize, width, height, n_classes), as it should be for such a problem.

Simply taking the argmax of the probas does not work, because Keras raises an exception when trying to compute the gradient of argmax during training. If I understood your paper correctly, this is exactly the problem that the Lovász-Softmax is meant to avoid.

model.compile(optimizer=SGD(), loss=lovasz_softmax, metrics=["accuracy"])

Multiple classes with lovasz_hinge

Hi, thanks for your great work, but I have a question.
When I have a multi-class semantic segmentation task, I can convert the label to one-hot format, apply sigmoid to the output of the network, and then apply nn.BCELoss() to the labels and outputs (one-hot + raw outputs + nn.BCEWithLogitsLoss also works). At inference time I just apply torch.sigmoid to the network outputs and threshold at 0.5, and I get correct semantic segmentation results. May I do the same thing with lovasz_hinge, i.e. one-hot labels + raw (no sigmoid) outputs + lovasz_hinge? Does that work? And is the inference process the same as above?

Variable data has to be a tensor, but got Variable

I have solved this problem!

I tested the PyTorch implementation in my project and call the Lovász loss as below:

class CriterionLovaszSoftmax(nn.Module):
    '''
    LovaszSoftmax loss:
        loss functions used to optimize the mIOU directly.
    '''
    def __init__(self, ignore_index=255):
        super(CriterionLovaszSoftmax, self).__init__()
        self.ignore_index = ignore_index

    def forward(self, preds, target):
        n, h, w = target.size(0), target.size(1), target.size(2)
        scale_pred = F.upsample(input=preds, size=(h, w), mode='bilinear')
        prob = F.softmax(scale_pred)
        loss = lovasz_softmax(prob, target, ignore=self.ignore_index)
        return loss

But I got the following error:

File "/home/sdb/semantic-segmentation/utils/criterion.py", line 40, in forward
loss = lovasz_softmax(prob, target, ignore=self.ignore_index)
File "/home/sdb/semantic-segmentation/utils/lovasz_loss.py", line 166, in lovasz_softmax
loss = lovasz_softmax_flat(*flatten_probas(probas, labels, ignore), only_present=only_present)
File "/home/sdb/semantic-segmentation/utils/lovasz_loss.py", line 183, in lovasz_softmax_flat
errors = (Variable(fg) - probas[:, c]).abs()
RuntimeError: Variable data has to be a tensor, but got Variable

Then I checked the implementation of the lovasz_softmax_flat function:

def lovasz_softmax_flat(probas, labels, only_present=False):
    """
    Multi-class Lovasz-Softmax loss
      probas: [P, C] Variable, class probabilities at each prediction (between 0 and 1)
      labels: [P] Tensor, ground truth labels (between 0 and C - 1)
      only_present: average only on classes present in ground truth
    """
    C = probas.size(1)
    losses = []
    for c in range(C):
        fg = (labels == c).float() # foreground for class c
        if only_present and fg.sum() == 0:
            continue
        errors = (Variable(fg) - probas[:, c]).abs()
        errors_sorted, perm = torch.sort(errors, 0, descending=True)
        perm = perm.data
        fg_sorted = fg[perm]
        losses.append(torch.dot(errors_sorted, Variable(lovasz_grad(fg_sorted))))
    return mean(losses)

So it seems that we do not need to wrap fg in Variable(fg).
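
For reference, a sketch of the corresponding fix in recent PyTorch versions (where tensors carry autograd information directly, so the explicit Variable wrappers can simply be dropped):

errors = (fg - probas[:, c]).abs()                                # no Variable(fg) wrapper
losses.append(torch.dot(errors_sorted, lovasz_grad(fg_sorted)))   # no Variable(...) wrapper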

ENet version?

Dear Maxim,
Nice to read your paper. Congratulations on getting selected for CVPR 2018. Great job!
I am interested in the fine-tuned ENet version of your work. Providing the relevant prototxt files, pre-trained weights, and/or the modified Caffe build would help me a lot.
Thanks

How to understand lovasz_grad when gt_sorted has more than one element?

Hi @bermanmaxim,
jaccard_loss = 1 - IoU,
so why compute jaccard[1:] - jaccard[0:-1]?

def lovasz_grad(gt_sorted):
    """
    Computes gradient of the Lovasz extension w.r.t sorted errors
    See Alg. 1 in paper
    """
    p = len(gt_sorted)
    gts = gt_sorted.sum()
    intersection = gts - gt_sorted.float().cumsum(0)
    union = gts + (1 - gt_sorted).float().cumsum(0)
    jaccard = 1. - intersection / union
    if p > 1: # cover 1-pixel case
        jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
    return jaccard
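
One way to read this (a reasoning sketch, not an authoritative answer from the maintainers): jaccard[i] is the Jaccard loss obtained when the i+1 pixels with the largest errors are all counted as mistakes, so the difference jaccard[i] - jaccard[i-1] is the marginal increase in Jaccard loss contributed by the (i+1)-th sorted pixel. These marginal increases are the components of the gradient of the Lovász extension with respect to the sorted errors (Alg. 1 in the paper), and the loss is their dot product with the sorted errors. A toy check, assuming lovasz_grad from the snippet above is in scope:

import torch

gt_sorted = torch.tensor([1, 1, 0, 1])   # ground truth reordered by decreasing error
g = lovasz_grad(gt_sorted)               # per-position marginal Jaccard increases
print(g)                                 # non-negative weights, one per sorted pixel
print(g.sum())                           # telescopes to the Jaccard loss obtained when
                                         # every prediction is counted as a mistake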

Implementation of equibatch?

Hi, thanks for the great work! This might be a silly question, but could you share the implementation of equibatch (ideally the PyTorch version) that is mentioned in the paper? Thanks!

weird results

Hi, thanks for your work. I have added the TensorFlow version of Lovász-Softmax to my task. I trained the model with
cross-entropy loss first, then fine-tuned it with cross-entropy loss + Lovász-Softmax loss (weighted 1:1), and the mIoU improved by about 2%.
But when tested on videos, the model without the Lovász-Softmax loss seems to perform better, especially on recall.
Do you have any idea about this? Thank you.

My task is 2-class lane segmentation.

lovasz_hinge as loss function Error

Hi!
First, thank you very much for sharing your code. It's great work!

I'm trying to train my model with lovasz_hinge as the loss function:

model.compile(optimizer=opt, loss=[lovasz_hinge], metrics=[matthews_correlation])

But I get the following error:

File "C:\Users\Usuario\Anaconda3\envs\env_gpu\lib\site-packages\keras\optimizers.py", line 91, in get_gradients
raise ValueError('An operation has None for gradient. '

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval

Do you know what the problem is?

Thank you very much in advance!

Labels should be only {-1,1} in case of binary segmentation?

Hi there,

Thanks for sharing the code of this fantastic work. Congratulations on your CVPR paper! I have a question about the labels in the ground truth. The GT labels should be {-1, 1} (-1: background, 1: foreground), and, for instance, {0, 1} (0: background, 1: foreground) doesn't work properly, right?

Error raised when training Xception + DeepLabv3+ with SGD

tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
         [[Node: training/SGD/gradients/loss/activation_82_loss/descending_sort_0_grad/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _class=["loc:@train...rseToDense"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](loss/activation_82_loss/descending_sort_0:1, training/SGD/gradients/loss/activation_82_loss/descending_sort_0_grad/stack)]]

The error is raised with LovaszSoftmax but not with the Jaccard loss or others, keeping the same datasets, networks, optimizers, and parameters.

Failure to improve performance when training the model from scratch

Really interesting work.
I have a baseline with the softmax loss on DeepLabv3 and achieve mIoU = 76.7 on Cityscapes.

I simply replaced the cross-entropy loss with your proposed loss and trained the model with the same learning rate and weight decay, but I only achieve mIoU = 64.7.

Could you give me some hints?

I also notice that you do not train ENet from scratch either; you just fine-tune the models.

Besides, I also conducted a small experiment training the model with both the cross-entropy loss and your proposed loss, which achieves good performance: mIoU = 78.4.

It would be great if you could share your advice!

About the slow speed on tensorflow

Hello,
I really liked your work on Lovász-Softmax and implemented it in a modified version of DeepLabv3+ in TensorFlow. However, I experienced a significant speed drop: the time per step increased from 0.4 s (using cross-entropy) to almost 3.8 s. Is this normal, or did I do something wrong?
Thank you!

Memory requirement in TF

I was using a batch size of 32, and once I switched to the Lovász loss I had to decrease the batch size to 8 to be able to train the same model; otherwise an OOM error is thrown.
Is this expected?

Got a very low mIoU after simply swapping out the cross-entropy loss for lovasz_softmax

Hello, nice to read this paper. I have encountered a problem: I get a very low mIoU (0.003) from DeepLabv3+ with lovasz_softmax, whereas it normally achieves mIoU = 76% using the cross-entropy loss.
Environment:
pytorch 1.0
Ubuntu 16.04
batch size: 10
dataset: Pascal VOC 2012 (aug)
loaded ImageNet pretrained ResNet-101 weight

And here is the code of my Lovász-Softmax wrapper:

import torch.nn as nn
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax

class LovaszSoftmax(nn.Module):
    def __init__(self, per_image=False):
        super(LovaszSoftmax, self).__init__()
        self.lovasz_softmax = lovasz_softmax
        self.per_image = per_image

    def forward(self, pred, label):
        pred = F.softmax(pred, dim=1)
        return self.lovasz_softmax(pred, label, per_image=self.per_image, ignore=255)

Thanks!

train my data

My dataset is a CSV file and the label file is also a CSV file. After training and prediction I would like to obtain a CSV file as well; how can I change this in the program? Thanks.

"name 'ifilterfalse' is not defined" in Python3

Hi,
I've tried to run the PyTorch implementation in Python 3.6 and got the following error:

    226     l = iter(l)
    227     if ignore_nan:
--> 228         l = ifilterfalse(isnan, l)
    229     try:
    230         n = 1

NameError: name 'ifilterfalse' is not defined

I think the function needs to be aliased on import in Python 3, like this:

from itertools import filterfalse as ifilterfalse
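
A version-agnostic variant of this fix (a sketch; the idea is simply to keep the ifilterfalse name that the rest of the file uses, regardless of the Python version):

try:
    from itertools import ifilterfalse                   # Python 2
except ImportError:
    from itertools import filterfalse as ifilterfalse    # Python 3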

Loss function selection problem

Hello, I would like to ask a question. My samples have two classes: the target pixels are 255 and the background pixels are 0. Which loss function should I choose, and what changes do I need to make in order to use it?

How to combine the Lovász hinge and BCE appropriately in a binary segmentation task?

Dear Maxim,
Thanks for your great work, it has helped me a lot!
I am confused about how to combine the Lovász hinge and BCE appropriately in a binary segmentation task.
As far as I know, the Lovász hinge expects logits (without sigmoid), but BCE needs the result after a sigmoid. What confuses me is whether these two losses with different input types (with/without sigmoid) can work well together.
Other combo losses, e.g. BCE + Lovász-Softmax or BCE + Dice, all use a sigmoid, so in my mind there is no problem there.
Could you give me some advice about this? Also, does per_image=False bring faster convergence when the batch size is big?
Thanks.

train my data

My dataset consists of single images, each corresponding to one label, without any stitching. How do I modify this in the program?

TensorFlow implementation is different from the PyTorch version

Hi, thanks for your work. When reading the code, I found a possible problem here.
This is the TensorFlow version:
tf.tensordot(errors_sorted, tf.stop_gradient(grad), 1, name="loss_class_{}".format(c))
This is the PyTorch version:
loss = torch.dot(F.relu(errors_sorted), Variable(grad))
The TensorFlow version has no nonlinearity, but the PyTorch version does. I have no idea which one is right or which one is better.
One more question: if I want to understand submodularity completely, what should I do? Do you have a link or a book to recommend?
Thanks!
