
research-ms-loss's Introduction

License: CC BY-NC 4.0

Multi-Similarity Loss for Deep Metric Learning (MS-Loss)

Code for the CVPR 2019 paper Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning

Performance compared with SOTA methods on CUB-200-2011

Rank@K                 1      2      4      8      16     32
Clustering (64)        48.2   61.4   71.8   81.9   -      -
ProxyNCA (64)          49.2   61.9   67.9   72.4   -      -
Smart Mining (64)      49.8   62.3   74.1   83.3   -      -
Our MS-Loss (64)       57.4   69.8   80.0   87.8   93.2   96.4
HTL (512)              57.1   68.8   78.7   86.5   92.5   95.5
ABIER (512)            57.5   68.7   78.3   86.2   91.9   95.5
Our MS-Loss (512)      65.7   77.0   86.3   91.2   95.0   97.3

Prepare the data and the pretrained model

The following script downloads the CUB dataset to the ./resource/datasets/ folder and then builds the data lists (train.txt and test.txt):


Download the ImageNet-pretrained BN-Inception model and put it in the folder ~/.torch/models/.


pip install -r requirements.txt
python setup.py develop build

Train and Test on CUB200-2011 with MS-Loss


Trained models will be saved in the ./output/ folder if using the default config.

The best recall@1 is higher than 66 (the paper reports 65.7).


For any questions, please feel free to reach


If you use this method or this code in your research, please cite as:

title={Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning},
author={Wang, Xun and Han, Xintong and Huang, Weilin and Dong, Dengke and Scott, Matthew R},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},


MS-Loss is CC-BY-NC 4.0 licensed, as found in the LICENSE file. It is released for academic research / non-commercial use only. If you wish to use it for commercial purposes, please contact [email protected].


research-ms-loss's Issues

How are PIXEL_MEAN and PIXEL_STD calculated?

First of all, thank you for the great work.

In ret_benchmark/data/transforms/, you have:
normalize_transform = T.Normalize(mean=cfg.INPUT.PIXEL_MEAN,
if is_train:
transform = T.Compose([

How are PIXEL_MEAN and PIXEL_STD calculated? Are they computed after Resize(), RandomResizedCrop(), RandomHorizontalFlip() and ToTensor()? Or are they computed by applying only ToTensor() (which converts a PIL image from [0, 255] to [0, 1]) to all the images in the dataset?

_prepare_batch function in RandomIdentitySampler selects only positive pairs if available!!

I read your paper, "Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning", and based on its content I cannot understand the logic of the "_prepare_batch" function in "RandomIdentitySampler", because it selects only positive pairs (if available).
This makes training with your method worse!
For convenience, I copy the relevant part below:
for label in self.labels:
    idxs = copy.deepcopy(self.label_index_dict[label])
    # load all data indexes whose label equals `label` into idxs
    if len(idxs) < self.K:
        idxs.extend(np.random.choice(idxs, size=self.K - len(idxs), replace=True))
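For readers following this thread, here is a minimal sketch of one way such per-label groups can produce negatives as well as positives. It assumes that the groups of K indices built by the loop above are later combined across several labels into a single batch. That is an assumption about the rest of the sampler, which the excerpt does not show. The helper `sample_pk_batch` and its parameters are hypothetical and are not taken from the repository.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, num_classes_per_batch, k, rng):
    """Hypothetical helper: pick P classes, then K sample indices per class."""
    index_by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        index_by_label[label].append(idx)

    chosen = rng.sample(sorted(index_by_label), num_classes_per_batch)
    batch = []
    for label in chosen:
        idxs = list(index_by_label[label])
        if len(idxs) < k:
            # Pad small classes by resampling with replacement, as in the excerpt.
            idxs.extend(rng.choices(idxs, k=k - len(idxs)))
        batch.extend(rng.sample(idxs, k))
    return batch


labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
batch = sample_pk_batch(labels, num_classes_per_batch=2, k=3, rng=random.Random(0))
batch_labels = [labels[i] for i in batch]

# Count same-label (positive) and different-label (negative) pairs in the batch.
pairs = [(a, b) for i, a in enumerate(batch_labels) for b in batch_labels[i + 1:]]
num_pos = sum(a == b for a, b in pairs)
num_neg = sum(a != b for a, b in pairs)
```

Under that assumption, a batch of 2 classes with 3 samples each contains both kinds of pairs: positives come from within a class and negatives from across classes.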

Detailed setting of hyper parameters

The only hyperparameter settings we can refer to are those in example.yaml, but as you said in your paper, you experimented on more than one dataset. How should the hyperparameters be set for those datasets? Please provide more config files for reference, thanks.

About the multi-label task: I would appreciate your reply.

Hello, I found that this code only runs on single-label tasks. I want to use MS-Loss for an image retrieval task, and I am not sure how to handle a multi-label dataset.

pos_pair_ = sim_mat[i][labels == labels[i]]
pos_pair_ = pos_pair_[pos_pair_ < 1 - ep]
neg_pair_ = sim_mat[i][labels != labels[i]]

This is the original code. My first idea is to change the part "label==label[i]" into label[i]: I pass in one-hot labels and first compute (label = label @ label.t() > 0).
But the loss easily becomes INF or NaN.
I suspect the problem is on my side, so I am asking you for advice.
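As a minimal sketch of the idea described in this post, the snippet below builds a pairwise "shares at least one label" mask from multi-hot labels and uses row i of that mask to split the similarities of anchor i. It treats two samples as positive if they share any label, which is an assumption and not something the repository defines. The helpers `shared_label_mask` and `split_pairs` are hypothetical, and the sketch does not diagnose the INF/NaN reported above. It only shows that an anchor can end up with no positives, a case that needs to be skipped before calling min() or max().

```python
def shared_label_mask(multi_hot):
    """mask[i][j] is True when samples i and j share at least one label."""
    n = len(multi_hot)
    return [
        [any(a and b for a, b in zip(multi_hot[i], multi_hot[j])) for j in range(n)]
        for i in range(n)
    ]


def split_pairs(sim_row, mask_row, i):
    """Split one row of the similarity matrix into positives and negatives."""
    pos = [s for j, s in enumerate(sim_row) if mask_row[j] and j != i]
    neg = [s for j, s in enumerate(sim_row) if not mask_row[j]]
    return pos, neg


multi_hot = [[1, 0, 1], [0, 0, 1], [0, 1, 0]]
mask = shared_label_mask(multi_hot)

# Anchor 0 shares label 2 with sample 1 and nothing with sample 2.
pos0, neg0 = split_pairs([1.0, 0.7, 0.2], mask[0], 0)
# Anchor 2 shares no label with any other sample, so it has no positives.
pos2, neg2 = split_pairs([0.2, 0.1, 1.0], mask[2], 2)
```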

How to understand the recall@K in your code?

First of all thank you for sharing the code, this is really great work.

I ran the experiment and got good results, but I can't understand the implementation of the recall@K computation in your code. Can you explain it to me? The two lines I cannot understand are marked below.

def recall_k(self, k=1):
    m = len(self.sim_mat)
    match_counter = 0
    for i in range(m):
        pos_sim = self.sim_mat[i][self.gallery_labels == self.query_labels[i]]
        neg_sim = self.sim_mat[i][self.gallery_labels != self.query_labels[i]]
        thresh = np.sort(pos_sim)[-2] if self.is_equal_query else np.max(pos_sim)

        if np.sum(neg_sim > thresh) < k:  # the lines that I cannot understand
            match_counter += 1

    return float(match_counter) / m

Thank you!
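One way to read the marked lines: `thresh` is the similarity of the best-ranked gallery item with the query's label, and the query counts as a hit at K when fewer than K negatives score above it, because that positive then sits inside the top K. When the query set equals the gallery, the top positive is the query itself, which is presumably why the code takes the second-largest value in that case. The pure-Python sketch below illustrates this reading with made-up similarities. The function names are hypothetical and this is not the repository's implementation.

```python
def is_hit_at_k(pos_sims, neg_sims, k):
    """True if some correct match appears among the top-k ranked gallery items."""
    thresh = max(pos_sims)  # best-scoring gallery item of the query's class
    # Every negative that scores higher pushes the best positive down one rank.
    num_better_negs = sum(1 for s in neg_sims if s > thresh)
    # The best positive has rank num_better_negs + 1, so it is in the top k
    # exactly when fewer than k negatives beat it.
    return num_better_negs < k


def recall_at_k(queries, k):
    """Fraction of queries with a hit at k; queries are (pos_sims, neg_sims)."""
    hits = sum(is_hit_at_k(pos, neg, k) for pos, neg in queries)
    return hits / len(queries)


queries = [
    ([0.9, 0.4], [0.8, 0.3]),   # best positive is ranked 1st
    ([0.6], [0.7, 0.65, 0.1]),  # best positive is ranked 3rd
]
r1 = recall_at_k(queries, 1)
r2 = recall_at_k(queries, 2)
r3 = recall_at_k(queries, 3)
```

Here the second query only becomes a hit once K reaches 3, since two negatives outrank its single positive.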

train.txt and test.txt

First of all, thank you for the great work!

Can you upload the train.txt and test.txt files for training on the CUB-200-2011 dataset?

pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon] ?

Hi there, thanks for sharing the code and the beautiful work!
In line 35:
pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]
Why do we need this line?
And what is the logic of using the output of the average pooling layer as the network's embeddings?
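On the first question, one plausible reading is sketched below. It assumes the features are L2-normalized, so that the similarity matrix holds cosine similarities. Under that assumption the similarity of an anchor with itself is 1, and the filter `< 1 - epsilon` drops that self-pair from the positives. The vectors and helper functions are made up for illustration and are not taken from the repository.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


feats = [l2_normalize(v) for v in ([3.0, 4.0], [4.0, 3.0], [-1.0, 2.0])]
labels = [0, 0, 1]
epsilon = 1e-5

i = 0  # anchor index
# Similarities of the anchor to every sample with the same label,
# which includes the anchor itself.
same_class = [dot(feats[i], feats[j]) for j in range(len(feats)) if labels[j] == labels[i]]
# The self-similarity is (numerically) 1, so this keeps only the true positive pair.
pos_pairs = [s for s in same_class if s < 1 - epsilon]
```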

NMI results (or trained models)


Do you have NMI results for your method on the datasets used in the paper? If not, do you have the trained models?

I am interested in comparing our method with yours on the NMI metric.

Thanks in advance!

Hyperparams for CARS?

First of all thank you for sharing the code, this is really great work.

I am trying to reproduce the results on other datasets (CARS, SOP, In-Shop), with both resnet50 and inception-bn. Could you please share the hyperparameters that you used? I tried the default ones and a few tweaks, but I could not get the numbers in the paper.

Thank you!

Overfitting on "iteration" parameters?

Hi, the unified framework for all kinds of pair-based losses proposed in the paper is great. However, I found that the best "test recall" appears to be selected on the val_dataset, as in the raw code below:
According to the fig above, the "val dataset" actually also plays the role of the "test dataset", which means the "test dataset" is visible during training.
So isn't this like choosing a "best train iteration" parameter, which carries a risk of overfitting on training hyperparameters?
(I have found similar practice in several other papers, and I know that some datasets lack a separate test split, as in the general protocol "construct query+gallery based on the raw val+test split in DeepFashion".)

Cars-196 experiment settings

I cannot see any experiment settings for Cars-196. Can you share the yaml config as well?

When I use the CUB-200 yaml config to train on Cars-196 (following the paper's training split rule, 98 classes), I only get a best R@1 of 78.4% (with embedding size 512), much lower than the 84.1% in your paper.


Models underfit on highly imbalanced dataset

I trained models with the same loss settings as in the paper: alpha=2 and beta=50. The models do not seem to produce good enough embedding features for the minority class (judging from a t-SNE visualization), but they obviously do a better job on the majority class, which leads to poor classification results. I'd like some advice on how to adjust the hyperparameters or the mining settings of this MS loss to better handle a highly imbalanced dataset (say, class 0 having 10x more samples than class 1). For additional details, I used an embedding size of 512 and a batch size of 8 (the maximum capacity of my GPU, because the images are quite large).

Thanks in advance.

Pytorch implementation of ms-loss

import torch
import torch.nn as nn


class MultiSimilarityLoss(nn.Module):
    def __init__(self, configer=None):
        super(MultiSimilarityLoss, self).__init__()
        self.is_norm = True
        self.eps = 0.1
        self.lamb = 1
        self.alpha = 2
        self.beta = 50

    def forward(self, inputs, targets):
        n = inputs.size(0)
        if self.is_norm:
            # L2-normalize so the matrix below holds cosine similarities
            inputs = inputs / torch.norm(inputs, dim=1, keepdim=True)
        similari_matrix = inputs.matmul(inputs.t())
        # mask[i][j] is True when samples i and j have the same label
        mask = targets.expand(n, n).eq(targets.expand(n, n).t())
        loss = None
        for i in range(n):
            temp_sim, temp_mask = similari_matrix[i], mask[i]
            min_ap, max_an = temp_sim[temp_mask].min(), temp_sim[temp_mask==0].max()
            # pair mining: keep only the informative positives and negatives
            temp_AP = temp_sim[(temp_mask==1) & (temp_sim < max_an + self.eps)]       # may be tensor([])
            temp_AN = temp_sim[(temp_mask==0) & (temp_sim > min_ap - self.eps)]  # torch.sum(tensor([])) = tensor(0.)
            L1 = torch.log(1 + torch.sum(torch.exp(-self.alpha * (temp_AP - self.lamb)))) / self.alpha
            L2 = torch.log(1 + torch.sum(torch.exp(self.beta * (temp_AN - self.lamb)))) / self.beta
            L = L1 + L2
            if loss is None:
                loss = L
            else:
                loss = loss + L
        loss = loss / n

        return loss
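For reference, the computation in the class above can be written as the following formula. This is a transcription of the code as posted, with n the batch size, S_ik the similarity between samples i and k, and y the labels. The symbols alpha, beta, lambda and epsilon stand for `self.alpha`, `self.beta`, `self.lamb` and `self.eps`, and the set names P_i and N_i are introduced here only for illustration.

```latex
\mathcal{L} = \frac{1}{n} \sum_{i=1}^{n} \left\{
    \frac{1}{\alpha} \log\!\Big[ 1 + \sum_{k \in \mathcal{P}_i} e^{-\alpha (S_{ik} - \lambda)} \Big]
  + \frac{1}{\beta} \log\!\Big[ 1 + \sum_{k \in \mathcal{N}_i} e^{\beta (S_{ik} - \lambda)} \Big]
\right\}

% Mined pair sets for anchor i, matching temp_AP and temp_AN in the code:
\mathcal{P}_i = \Big\{ k : y_k = y_i,\; S_{ik} < \max_{y_j \neq y_i} S_{ij} + \epsilon \Big\},
\qquad
\mathcal{N}_i = \Big\{ k : y_k \neq y_i,\; S_{ik} > \min_{y_j = y_i} S_{ij} - \epsilon \Big\}
```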



class MultiSimilarityLoss(nn.Module):
    def __init__(self, cfg):
        super(MultiSimilarityLoss, self).__init__()
        self.thresh = 0.5
        self.margin = 0.1
        self.scale_pos = cfg.LOSSES.MULTI_SIMILARITY_LOSS.SCALE_POS
        self.scale_neg = cfg.LOSSES.MULTI_SIMILARITY_LOSS.SCALE_NEG

    def forward(self, feats, labels):
        assert feats.size(0) == labels.size(0), \
            f"feats.size(0): {feats.size(0)} is not equal to labels.size(0): {labels.size(0)}"
        batch_size = feats.size(0)
        sim_mat = torch.matmul(feats, torch.t(feats))

        epsilon = 1e-5
        loss = list()

        for i in range(batch_size):
            pos_pair_ = sim_mat[i][labels == 1]  # modified here
            pos_pair_ = pos_pair_[pos_pair_ < 1 - epsilon]
            neg_pair_ = sim_mat[i][labels == 0]  # modified here

            neg_pair = neg_pair_[neg_pair_ + self.margin > min(pos_pair_)]
            pos_pair = pos_pair_[pos_pair_ - self.margin < max(neg_pair_)]

            if len(neg_pair) < 1 or len(pos_pair) < 1:
                continue

            # weighting step
            pos_loss = 1.0 / self.scale_pos * torch.log(
                1 + torch.sum(torch.exp(-self.scale_pos * (pos_pair - self.thresh))))
            neg_loss = 1.0 / self.scale_neg * torch.log(
                1 + torch.sum(torch.exp(self.scale_neg * (neg_pair - self.thresh))))
            loss.append(pos_loss + neg_loss)

        if len(loss) == 0:
            return torch.zeros([], requires_grad=True)

        loss = sum(loss) / batch_size
        return loss

Using a for loop in the loss calculation is a little bit slow.

The for loop in the loss calculation is a little slow, and it can be removed.
In my case, only pairs on the diagonal are positive, so I removed the for loop as follows.

simi_mat = torch.matmul(y1, torch.t(y2))
simi_sub = simi_mat - ms_gama  # similarities shifted by the threshold
pos_pair_sub = torch.unsqueeze(torch.diag(simi_sub), 1)  # diagonal entries are the positive pairs
neg_pair_sub_plus1 = simi_sub
# zero the diagonal: exp(0) = 1 supplies the "1 +" term inside the log for neg_loss
neg_pair_sub_plus1[range(batch_size), range(batch_size)] = 0
pos_loss = torch.log(1 + torch.sum(torch.exp(-ms_alpha * pos_pair_sub), dim = 1)) / ms_alpha
neg_loss = torch.log(torch.sum(torch.exp(ms_beta * neg_pair_sub_plus1), dim = 1)) / ms_beta
loss = torch.mean(pos_loss + neg_loss)

What is the release date of ms-loss?

Dear @mscottml, I have just read the paper "Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning". It is some of the best work I have seen in metric learning.
Would you please release the code and show more details? Thank you very much.
