jeromerony / dml_cross_entropy

Code for the paper "A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses" (ECCV 2020 - Spotlight)

Home Page: https://arxiv.org/abs/2003.08983

License: BSD 3-Clause "New" or "Revised" License

Topics: metric-learning, deep-learning, cross-entropy

dml_cross_entropy's Introduction

Requirements for the experiments

Data management

For In-Shop, you need to manually download the data from https://drive.google.com/drive/folders/0B7EVK8r0v71pVDZFQXRsMDZCX1E (at least img.zip and list_eval_partition.txt), put them in data/InShop and extract img.zip.
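As a point of reference, a minimal sketch of the extraction step (assuming img.zip and list_eval_partition.txt have already been placed in data/InShop, as described above):

# Minimal sketch: extract the manually downloaded In-Shop archive into data/InShop.
import zipfile
from pathlib import Path

inshop_dir = Path("data/InShop")
with zipfile.ZipFile(inshop_dir / "img.zip") as archive:
    archive.extractall(inshop_dir)  # the images end up next to list_eval_partition.txt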

You can download the data and generate the train.txt and test.txt files for every dataset using the prepare_data.py script:

python prepare_data.py

This will download and prepare all the necessary data for CUB200, Cars-196 and Stanford Online Products.
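As a quick sanity check (a hypothetical snippet: the exact per-dataset sub-folders and file layout are decided by prepare_data.py), you can list the split files it generated under data/:

# Hypothetical sanity check: list the train.txt / test.txt split files under data/.
from pathlib import Path

for split_file in sorted(Path("data").rglob("*.txt")):
    print(split_file)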

Usage

This repo uses sacred to manage the experiments. To run an experiment (e.g. on CUB200):

python experiment.py with dataset.cub

You can add an observer to save the metrics and files related to the experiment by adding -F result_dir:

python experiment.py -F result_dir with dataset.cub
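With -F, sacred attaches a FileStorageObserver that writes one numbered sub-directory per run, containing files such as config.json and metrics.json. A minimal sketch for inspecting a finished run (result_dir and the run id 1 are placeholders):

# Minimal sketch: inspect what sacred's FileStorageObserver saved for a run.
# "result_dir" and the run id "1" are placeholders for your actual observer directory.
import json
from pathlib import Path

run_dir = Path("result_dir") / "1"
config = json.loads((run_dir / "config.json").read_text())    # resolved experiment config
metrics = json.loads((run_dir / "metrics.json").read_text())  # logged metrics, keyed by metric name
print(config)
print(list(metrics.keys()))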

Reproducing the results of the paper

CUB200

python experiment.py with dataset.cub model.resnet50 epochs=30 lr=0.02

CARS-196

python experiment.py with dataset.cars model.resnet50 epochs=100 lr=0.05 model.norm_layer=batch

Stanford Online Products

python experiment.py with dataset.sop model.resnet50 epochs=100 lr=0.003 momentum=0.99 nesterov=True model.norm_layer=batch

In-Shop

python experiment.py with dataset.inshop model.resnet50 epochs=100 lr=0.003 momentum=0.99 nesterov=True model.norm_layer=batch

Citation

@inproceedings{boudiaf2020unifying,
  title={A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses},
  author={Boudiaf, Malik and Rony, J{\'e}r{\^o}me and Ziko, Imtiaz Masud and Granger, Eric and Pedersoli, Marco and Piantanida, Pablo and {Ben Ayed}, Ismail},
  booktitle={European Conference on Computer Vision},
  pages={548--564},
  year={2020},
  organization={Springer}
}

dml_cross_entropy's People

Contributors

jeromerony, trellixvulnteam


dml_cross_entropy's Issues

Trained models?

Hi,
Thanks for sharing the code!!!

Is it possible to have the trained models for CUB, CARS and SOP?

Thanks a lot.

Some overlooked advantages of pairwise losses

Hi,

I really love your paper, but I would like to point out some problems with the "unary" softmax.

  1. DML losses can work with an effectively infinite number of classes, and pairwise CE can as well, unlike the traditional softmax. By "infinite" one means either "not fitting into memory" or "changing on the fly as new data arrives".
  2. The "softmax bottleneck". While it is true that pairwise losses need to pay attention to sampling, once you can afford a large enough batch size this is no longer a problem.

To conclude, if you are interested in counter-examples, consider local descriptor learning, which is a specific instance of metric learning. Take "Liberty" from the PhotoTour dataset (available in torchvision: https://pytorch.org/docs/stable/torchvision/datasets.html#phototour). It consists of 450k samples and 161k classes.
All the SOTA or near-SOTA methods (e.g. https://github.com/DagnyT/hardnet) use a variant of a triplet loss with hard-negative mining. I am not aware of any successful application of cross-entropy there.
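For reference, the dataset mentioned above can be loaded directly from torchvision; a minimal sketch (the root path data/phototour is a placeholder):

# Minimal sketch: load the "liberty" subset of the PhotoTour dataset via torchvision.
# "data/phototour" is a placeholder root; download=True fetches the archive on first use.
from torchvision.datasets import PhotoTour

liberty = PhotoTour(root="data/phototour", name="liberty", train=True, download=True)
print(len(liberty))      # number of patches
print(liberty[0].shape)  # a single 64x64 grayscale patch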

Some comparisons

Hello,

I have just a few questions about the paper.

When using cross-entropy, do you use the last-layer embedding for the top-k retrieval evaluation?

Do you see any benefit of using a multi-label cross-entropy head loss vs a series of single-label losses?

Thanks, and happy new year.

Query about SPCE implementation

Hi Jeromerony,
Thanks for your great work introducing SPCE. Recently our group has been investigating the limitation of the cross-entropy loss in reducing intra-class variation, and I found your work may help enhance our understanding of this problem. I wonder whether it would be possible to also upload the code for SPCE to the repo.

Thanks,
Shaw

Getting OOM after first evaluation

Hi, we are very interested in your work. We were trying to run your code with the default settings on a server with 4 Tesla V100 GPUs, but we got an OOM error after the first evaluation pass. I pasted the screenshot below; could you please give some hints on how to solve this problem? Thank you.
[screenshot]

Understanding the results

Hi,
Thanks for sharing the code!!!

I've run your code

python experiment.py with dataset.cars model.resnet50 epochs=100 lr=0.05 model.norm_layer=batch -F results/cars/

and I'm getting the following result:

Validation [099]
{'cosine': {1: 1.03, 2: 1.03, 4: 1.03, 8: 1.03, 16: 1.03, 32: 2.0},
'l2': {1: 1.17, 2: 2.12, 4: 4.19, 8: 7.4, 16: 12.73, 32: 18.71}}
INFO - Metric Learning - Result: 1.03
INFO - Metric Learning - Completed after 0:54:25

I do not know if I am doing something wrong, but how is the result 1.03 related to the 89.3 reported in your paper?

Am I doing something wrong?

Best regards.

The implementation of the losses

Hi jeromerony, I like your paper very much, thanks for your wonderful work. I noticed that you only implemented cross-entropy in this repo; neither the PCE loss nor the SPCE loss can be found here, and you also mentioned in another issue that this is an early version. To be clear, I am curious: have you actually implemented the PCE loss, or do you just use it to demonstrate your theory? Is the result in section 6.4 merely from the CE loss, or is it a result of the SPCE loss?

My English is poor, sorry to bother you.

Clarification on SCE loss

Hi,
Can I have a quick clarification on the SCE loss? Is the epsilon in your loss aimed at avoiding the -inf problem with the log? Thanks!
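For context on the numerical issue behind this question, a generic illustration (not taken from the repo's SCE code) of how a small epsilon inside the log avoids -inf when a probability underflows to zero:

# Generic illustration of the -inf issue with the log of a zero probability and the usual epsilon fix;
# this is not the repo's SCE implementation.
import torch

probs = torch.tensor([0.7, 0.3, 0.0])
print(torch.log(probs))        # last entry is -inf
eps = 1e-12
print(torch.log(probs + eps))  # finite everywhere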
