annusha / temperature_schedules
Temperature Schedules for self-supervised contrastive methods on long-tail data (ICLR'23)

Home Page: https://openreview.net/forum?id=ejHUr4nfHhD

Temperature Schedules for self-supervised contrastive methods on long-tail data

ICLR paper

(Figure: method pipeline)

Introduction

Most approaches for self-supervised learning (SSL) are optimised on curated, balanced datasets, e.g. ImageNet, despite the fact that natural data usually exhibit long-tail distributions. In this paper, we analyse the behaviour of one of the most popular variants of SSL, i.e. contrastive methods, on long-tail data. In particular, we investigate the role of the temperature parameter τ in the contrastive loss by analysing the loss through the lens of average distance maximisation, and find that a large τ emphasises group-wise discrimination, whereas a small τ leads to a higher degree of instance discrimination. While τ has thus far been treated exclusively as a constant hyperparameter, in this work we propose to employ a dynamic τ and show that a simple cosine schedule can yield significant improvements in the learnt representations. Such a schedule results in a constant "task switching" between an emphasis on instance discrimination and group-wise discrimination, and thereby ensures that the model learns both group-wise features and instance-specific details. Since frequent classes benefit from the former, while infrequent classes require the latter, we find this method to consistently improve separation between the classes in long-tail data without any additional computational cost.
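As a concrete illustration, the cosine schedule can be sketched as below. This is a minimal sketch, assuming τ oscillates between temperature_min and temperature_max with period t_max epochs (the flag names used in the training commands in this README); the exact phase and period convention in the released code may differ:

```python
import math

def cosine_tau(epoch, tau_min=0.1, tau_max=1.0, t_max=200):
    """Oscillate the contrastive temperature between tau_max and tau_min
    with period t_max epochs (a sketch; the repo's exact convention may differ)."""
    # (1 + cos)/2 maps the cosine from [-1, 1] to [0, 1]
    w = (1 + math.cos(2 * math.pi * epoch / t_max)) / 2
    return tau_min + (tau_max - tau_min) * w
```

Under this parameterisation, epoch 0 gives τ = τ_max (group-wise discrimination), epoch t_max/2 gives τ = τ_min (instance discrimination), and with t_max = 200 a 2000-epoch run switches between the two regimes ten times.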

Environment

Requirements:

pytorch 1.7.1 
opencv-python
scikit-learn 
matplotlib

Recommended installation commands (Linux)

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.2 -c pytorch # change cuda version according to hardware
pip install opencv-python
conda install -c conda-forge scikit-learn matplotlib

Downloading pretrained models

CIFAR10: pretrained models

CIFAR100: pretrained models

Imagenet100: pretrained models

Evaluation of pretrained models

CIFAR10: evaluation

CIFAR100: evaluation

Imagenet100: evaluation

Pretraining

SimCLR with cosine temperature schedule

CIFAR10

torchrun --standalone --nnodes=1 --nproc_per_node=1 train_simCLR.py simclr_TS \
--batch_size 512 \
--optimizer sgd \
--lr 0.5 \
--model res18 \
--scheduler cosine \
--epochs 2000 \
--seed 42 \
--output_ch 128 \
--num_workers 8 \
--adj_tau cos \
--temperature_min 0.1 \
--temperature_max 1.0 \
--t_max 200 \
--trainSplit cifar10_imbSub_with_subsets/split1_D_i.npy \
--split_idx 1 \
--save-dir ./checkpoints/cifar10-LT/ 

CIFAR100

torchrun --standalone --nnodes=1 --nproc_per_node=1 train_simCLR.py simclr_TS \
--dataset cifar100 \
--batch_size 512 \
--optimizer sgd \
--lr 0.5 \
--model res18 \
--scheduler cosine \
--epochs 2000 \
--seed 42 \
--output_ch 128 \
--num_workers 8 \
--adj_tau cos \
--temperature_min 0.1 \
--temperature_max 1.0 \
--t_max 200 \
--trainSplit cifar100_imbSub_with_subsets/cifar100_split1_D_i.npy \
--split_idx 1 \
--save-dir ./checkpoints/cifar100-LT/ 

ImageNet100

torchrun --standalone --nnodes=1 --nproc_per_node=1 train_simCLR.py simclr_TS \
--dataset imagenet-100 \
--batch_size 256 \
--optimizer sgd \
--lr 0.5 \
--model res50 \
--scheduler cosine \
--epochs 800 \
--seed 42 \
--output_ch 128 \
--num_workers 8 \
--adj_tau cos \
--temperature_min 0.1 \
--temperature_max 1.0 \
--t_max 200 \
--imagenetCustomSplit imageNet_100_LT_train \
--split_idx 0 \
--save-dir ./checkpoints/imagenet100-LT/ \
--data ./datasets/ILSVRC2012/

Citation

@inproceedings{
kukleva2023temperature,
title={Temperature Schedules for self-supervised contrastive methods on long-tail data},
author={Anna Kukleva and Moritz Böhle and Bernt Schiele and Hilde Kuehne and Christian Rupprecht},
booktitle={ICLR},
year={2023},
url={https://openreview.net/forum?id=ejHUr4nfHhD}
}

Thanks

The SimCLR part is based on the SDCLR repo.
The MoCo part will be released soon.


Issues

Repository link in the paper is incorrect

Hello Anna, just FYI: the code implementation URL in the paper (both the ICLR and arXiv versions) does not point to this repository and results in a 404 Page Not Found.

Experiments of Fig. 3

Hi, I am very interested in the visualization experiment in Fig. 3. Could you please release the code for generating this figure?

Description of splits + some kNN questions

The repository contains several dataset splits. Can you provide a bit more information about them?

For example, I guess i stands for imbalanced and b for balanced, and that split1–split5 correspond to five randomized splits of the same dataset. I don't know what D and S stand for. I assume the imbalance ratio is 100, as stated in the paper.

Some further reproducibility questions:

  1. Regarding the kNN experiments, did you collect statistics on the balanced or the imbalanced training set?
  2. The oracle experiments presented in Figure 1 are super interesting! What number of neighbours was used in that case? And, as above: balanced or imbalanced set?
  3. For kNN, do you use embeddings from the hypersphere (after the projector) or the outputs of the backbone (before the projector)?

Reproducing experiments of Fig. 3

Hello there again,

I am trying to reproduce the experiments of Figure 3 in my own codebase, but I have not been successful in the case of assigning temperatures according to whether a sample belongs to the head (τ = 1.0) or the tail (τ = 0.1). Is there anything that needs special consideration? Can you advise how I can reproduce these experiments with the current codebase?

Let me know if I have the following right:

  • The temperature for the InfoNCE loss is selected according to whether the anchor belongs to the head or the tail.
  • Head anchor samples get a higher temperature than tail anchor samples.
  • The selected temperature influences the denominator of the loss. Does it also influence the numerator? I read the last section of the appendix, but it was not clear whether the selected/oracle temperature is applied to the positive pairs as well.
  • MoCo is used to accumulate negative representations from a momentum network.
  • Batch normalization is used in the projection head.
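For concreteness, the per-anchor-temperature variant described in these bullets can be sketched as follows. This is a hypothetical illustration of an InfoNCE-style loss with an anchor-specific τ (here applied to the positive and negative logits alike), not the authors' implementation:

```python
import math

def cosine_sim(u, v):
    """Cosine similarity of two vectors (plain-Python sketch)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_per_anchor(anchor, positive, negatives, tau):
    """InfoNCE loss for one anchor with an anchor-specific temperature.
    Here tau scales the positive logit and all negative logits alike;
    whether the oracle temperature should also apply to the positive
    pair is exactly the open question raised above."""
    pos = math.exp(cosine_sim(anchor, positive) / tau)
    negs = sum(math.exp(cosine_sim(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + negs))
```

A head anchor would call this with tau = 1.0 and a tail anchor with tau = 0.1.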
