
sthalles / simclr


PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

Home Page: https://sthalles.github.io/simple-self-supervised-learning/

License: MIT License

Languages: Python 28.98%, Jupyter Notebook 71.02%
Topics: machine-learning, deep-learning, representation-learning, pytorch-implementation, pytorch, torchvision, unsupervised-learning, contrastive-loss, simclr

simclr's Introduction

PyTorch SimCLR: A Simple Framework for Contrastive Learning of Visual Representations


Image of SimCLR Arch

Installation

$ conda env create --name simclr --file env.yml
$ conda activate simclr
$ python run.py

Config file

Before running SimCLR, make sure you choose the correct running configuration. You can change it by passing command-line arguments to run.py.

$ python run.py -data ./datasets --dataset-name stl10 --log-every-n-steps 100 --epochs 100 

If you want to run it on CPU (for debugging purposes) use the --disable-cuda option.

For 16-bit precision GPU training, there is no need to install NVIDIA Apex. Just pass the --fp16_precision flag and this implementation will use PyTorch's built-in AMP training.
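For reference, a minimal sketch of the AMP pattern this flag turns on (a dummy model and optimizer stand in for the repo's ResNet and InfoNCE setup; requires a CUDA device):

    import torch
    import torch.nn as nn

    # Dummy stand-ins so the sketch runs; the repo wires up a ResNet + InfoNCE loss instead.
    model = nn.Linear(128, 10).cuda()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    scaler = torch.cuda.amp.GradScaler()

    for _ in range(3):                                   # a few fake training steps
        images = torch.randn(32, 128, device="cuda")
        labels = torch.randint(0, 10, (32,), device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():                  # forward pass in mixed precision
            loss = criterion(model(images), labels)
        scaler.scale(loss).backward()                    # scale loss to avoid fp16 underflow
        scaler.step(optimizer)                           # unscale grads, then optimizer step
        scaler.update()                                  # adapt the loss scale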

Feature Evaluation

Feature evaluation is done using a linear model protocol.

First, we learn features using SimCLR on the STL10 unsupervised set. Then, we train a linear classifier on top of the frozen features from SimCLR. The linear model is trained on features extracted from the STL10 train set and evaluated on the STL10 test set.
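For reference, a minimal sketch of the linear protocol (an illustration, not the notebook's exact code: a fresh torchvision ResNet-18 stands in for the loaded SimCLR checkpoint, and fake tensors stand in for STL10 batches):

    import torch
    import torch.nn as nn
    from torchvision import models

    backbone = models.resnet18(num_classes=512)     # stand-in for the SimCLR-pretrained encoder
    for p in backbone.parameters():
        p.requires_grad = False                     # freeze all features

    head = nn.Linear(512, 10)                       # the only trainable part
    optimizer = torch.optim.Adam(head.parameters(), lr=3e-4)
    criterion = nn.CrossEntropyLoss()

    images = torch.randn(8, 3, 96, 96)              # fake STL10-sized batch
    targets = torch.randint(0, 10, (8,))
    with torch.no_grad():
        feats = backbone(images)                    # frozen 512-d features
    optimizer.zero_grad()
    loss = criterion(head(feats), targets)
    loss.backward()
    optimizer.step()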

Check the Open In Colab notebook for reproducibility.

Note that SimCLR benefits from longer training.

| Linear Classification | Dataset | Feature Extractor | Architecture | Feature dimensionality | Projection Head dimensionality | Epochs | Top1 % |
|---|---|---|---|---|---|---|---|
| Logistic Regression (Adam) | STL10 | SimCLR | ResNet-18 | 512 | 128 | 100 | 74.45 |
| Logistic Regression (Adam) | CIFAR10 | SimCLR | ResNet-18 | 512 | 128 | 100 | 69.82 |
| Logistic Regression (Adam) | STL10 | SimCLR | ResNet-50 | 2048 | 128 | 50 | 70.075 |

simclr's People

Contributors

alessiamarcolini, butyuhao, sthalles


simclr's Issues

Is there evaluation code that runs locally?

Dear researcher,
Thank you for the open-source code you provided; it has been a great help to me in understanding SimCLR.
Your code is perfect, but I want to ask whether there is evaluation code that runs locally, without Google Colab, or how I can amend the code so that the evaluation runs locally, because I can't access Google Colab in China. I hope you can give me some tips if you are free.
Thanks!
Chen He.

Issue with batch-size

In the function info_nce_loss, line 28 creates labels based on batch_size. On the other side, the STL10 dataset has 100,000 images, which is divisible by a batch_size of 32, but a batch_size of 128 or 64 leaves a remainder of 32.

With batch_size != 32, line 42 raises an error, because the similarity matrix is based on the features while the labels are based on batch_size.

For instance, if batch_size = 128, the last iteration of the data_loader holds the remaining 32 images. Since we create two views of each image, we have 64 feature vectors. Line 28 still creates 128 x 2 = 256 labels, while the similarity matrix is (64 x 128) @ (128 x 64) => (64 x 64), which together with the (256 x 256) mask causes a "dimension mismatch".

Solution:
Change line 28 as follows:

labels = torch.cat([torch.arange(features.shape[0]//2) for i in range(self.args.n_views)], dim=0)
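An alternative workaround (an assumption, not the repository's confirmed fix) is to drop the ragged final batch in the loader, so that every batch really contains batch_size images:

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size,
        shuffle=True, drop_last=True)   # drop_last avoids the 32-image remainder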


I can't get past this error in simclr.py; how do I solve it? I'm going crazy.

This is the error:
Files already downloaded and verified
0%| | 0/390 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 90, in <module>
    main()
  File "run.py", line 86, in main
    simclr.train(train_loader)
  File "D:\SimCLR-master\simclr.py", line 71, in train
    for images, _ in tqdm(train_loader):
  File "D:\miniconda3\envs\SimCLR-Han\lib\site-packages\tqdm\std.py", line 1195, in __iter__
    for obj in iterable:
  File "D:\miniconda3\envs\SimCLR-Han\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "D:\miniconda3\envs\SimCLR-Han\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\miniconda3\envs\SimCLR-Han\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
    w.start()
  File "D:\miniconda3\envs\SimCLR-Han\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\miniconda3\envs\SimCLR-Han\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\miniconda3\envs\SimCLR-Han\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\miniconda3\envs\SimCLR-Han\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\miniconda3\envs\SimCLR-Han\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
OSError: [Errno 22] Invalid argument

About loss function

Hi,

Thank you for the great work. I am trying to use your code on 3D patches. I separately input two paired datasets, which contain a domain difference, and didn't use the data augmentation. I have extracted the representations using an encoder. However, the contrastive loss comes out as zero. Are there any steps I haven't done to run the code successfully?

Thanks a lot!

Loss function

Hi.
Thanks for your great work, but I have a little confusion. You implement the contrastive loss with Cross-Entropy Loss without a softmax function. So it seems the negatives don't actually contribute, only the positives.

Conda requirements broken

Running the command

$ conda create --name simclr python=3.7 --file requirements.txt

gives the following error:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - pyasn1==0.4.8=pypi_0
  - grpcio==1.27.2=pypi_0
  - google-auth==1.11.3=pypi_0
  - idna==2.9=pypi_0
  - google-auth-oauthlib==0.4.1=pypi_0
  - tensorboard==2.1.1=pypi_0
  - requests-oauthlib==1.3.0=pypi_0
  - requests==2.23.0=pypi_0
  - markdown==3.2.1=pypi_0
  - pyyaml==5.3=pypi_0
  - cachetools==4.0.0=pypi_0
  - werkzeug==1.0.0=pypi_0
  - absl-py==0.9.0=pypi_0
  - pytorch==1.4.0=py3.7_cuda10.1.243_cudnn7.6.3_0
  - oauthlib==3.1.0=pypi_0
  - urllib3==1.25.8=pypi_0
  - pyasn1-modules==0.2.8=pypi_0
  - protobuf==3.11.3=pypi_0
  - chardet==3.0.4=pypi_0
  - rsa==4.0=pypi_0
  - torchvision==0.5.0=py37_cu101

Current channels:

  - https://conda.anaconda.org/conda-forge/linux-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

GPU utilization rate is low

Hi, thanks for the code!

When I tried to run it on a single GPU (V100), the utilization rate is very low (~0-10%) even if I increase num_workers. Would you know why this happens and how to solve it? Thanks!

Info NCE loss

Hi, may I ask how you calculate the InfoNCE loss in this work? I am confused about the methodology, as it is quite different from the authors' code.

You are returning labels of all 0, as if you only wanted to score the negative labels. However, in the code here you use the logits of both the negative samples and the positive sample (I'm assuming this is the augmented counterpart of the image). May I ask the reasoning behind this kind of implementation?

SimCLR/simclr.py

Lines 51 to 55 in 1848fc9

logits = torch.cat([positives, negatives], dim=1)
labels = torch.zeros(logits.shape[0], dtype=torch.long).to(self.args.device)
logits = logits / self.args.temperature
return logits, labels

P.S.: I am currently still at a loss about how you were able to simplify the code to calculating only the negative samples. Hopefully this can be clarified in your reply. Thank you!
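For context: with the logits ordered as [positive, negatives] and a target of 0, cross-entropy computes -log(exp(pos/t) / sum_k exp(logit_k/t)), which is exactly the InfoNCE objective, so the negatives do contribute, through the softmax denominator. A minimal numeric check of that equivalence (illustrative values only):

    import torch
    import torch.nn.functional as F

    # One row of logits: first entry is the positive pair, the rest are negatives.
    logits = torch.tensor([[0.9, 0.1, -0.3, 0.2]]) / 0.07    # temperature 0.07
    target = torch.zeros(1, dtype=torch.long)                # index of the positive

    ce = F.cross_entropy(logits, target)
    manual = -torch.log(torch.softmax(logits, dim=1)[0, 0])  # -log p(positive)
    print(torch.allclose(ce, manual))                        # True: negatives act via the denominator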

NT_Xent Loss function: all negatives are not being used?

Hi @sthalles , Thank you for sharing your code!

Please correct me if I am wrong:
I see that on line 57 of loss/nt_xent.py (below) you are not computing the contrastive loss for all negative pairs, since you reshape the total negatives into a 2D array, i.e. only a part of the negative pairs is used for a single positive pair, right?

    negatives = similarity_matrix[self.mask_samples_from_same_repr].view(2 * self.batch_size, -1)
    logits = torch.cat((positives, negatives), dim=1)
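For context, a shape walkthrough (an interpretation, assuming batch_size = N, two views per image, and a stand-in for mask_samples_from_same_repr): the similarity matrix is (2N, 2N); the mask removes the diagonal and the one positive entry per row, leaving 2N * (2N - 2) values, so the view yields (2N, 2N - 2) and every row still holds all 2N - 2 negatives for that sample:

    import torch

    N = 4                                      # batch size
    sim = torch.randn(2 * N, 2 * N)            # similarity matrix over 2N views

    # Stand-in for mask_samples_from_same_repr: drop self-similarity (diagonal)
    # and the positive pair (i, i+N) for each row.
    diag = torch.eye(2 * N, dtype=torch.bool)
    pos = torch.roll(diag, shifts=N, dims=1)
    mask = ~(diag | pos)

    negatives = sim[mask].view(2 * N, -1)
    print(negatives.shape)                     # torch.Size([8, 6]): all 2N-2 negatives per row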

Hope to hear from you soon.

-Ishan

Confusion matrix

Does anyone know how to add a confusion matrix to this code? I added one following an online example, but something went wrong and I can't tell what's wrong in my code. I can't solve it, please help me! Thanks.

    def confusion_matrix(output, labels, conf_matrix):
        # Accumulate predicted-vs-true counts into a running matrix.
        preds = torch.argmax(output, dim=-1)
        for p, t in zip(preds, labels):
            conf_matrix[p, t] += 1
        return conf_matrix

Run with error

Hi,

Thank you for the great work. Following the README, I ran into a problem while running the code.
[Image: error screenshot]

Should assert n_views == 2?

Thanks for your excellent implementation! I'd like to confirm that N_VIEW == 2, as in the paper and the default args in the code. If N_VIEW > 2, then with logits.shape = (N_VIEW x N, N_VIEW x N - 1) after

logits = torch.cat([positives, negatives], dim=1)

the N_VIEW x N - 1 columns contain at least one more positive pair (besides the one at index 0), which will be treated as negative pairs. @sthalles @alessiamarcolini @butyuhao
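A minimal guard along these lines (a sketch, assuming the args.n_views argument from run.py):

    assert args.n_views == 2, (
        "info_nce_loss labels only index 0 as positive; with n_views > 2 the "
        "extra positive views would be scored as negatives.")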

training log

Hi,
Do you still have the pre-training log? I want to know how the loss changes at every epoch, and the accuracy on the positive examples in each batch.

Calculating acc in training

Hi, I have a question about how accuracy is calculated in the training mode of the SimCLR model. How does it work? How is it possible to compute accuracy when you are training on data without labels? Thanks a lot!

ModuleNotFoundError: No module named 'torch.cuda'

I am using Python 3.7 on Win10, Anaconda Jupyter. I have successfully installed torch-1.10.0+cu113, torchaudio-0.10.0+cu113, and torchvision-0.11.1+cu113.
When trying to import torch, I get ModuleNotFoundError: No module named 'torch.cuda'.
Detailed error:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-bfd2c657fa76> in <module>
      1 import numpy as np
      2 import pandas as pd
----> 3 import torch
      4 import torch.nn as nn
      5 from sklearn.model_selection import train_test_split

~\AppData\Roaming\Python\Python38\site-packages\torch\__init__.py in <module>
    603 
    604 # Shared memory manager needs to know the exact location of manager executable
--> 605 _C._initExtension(manager_path())
    606 del manager_path
    607 

ModuleNotFoundError: No module named 'torch.cuda'

I found posts about the similar error No module named 'torch.cuda.amp'. However, none of the suggested solutions worked. Please advise.

evaluation code batch_size & validation process

I really appreciate your good work :)
I'm leaving a question because I got confused while studying your code.

First, I wonder why you used "batch_size=batch_size*2" in the test_loader part of the file "mini_batch_logistic_regression_valuator.ipynb", differently from the train_loader. Is it related to creating 2 views when doing data augmentation?

Also, in the last cell of this file, I'm confused about whether the second "for" loop (of the two) inside the big epoch loop corresponds to the test process or the validation process. I thought it was a test process, because the loss update, backpropagation, optimization, etc. are done only in the first loop, and the second only yields accuracy; is that right? Or is the second loop a validation process, since the first and second loops run together over the whole epoch?

Why cos_sim after L2 norm?

Hi, This code is really useful for me. Thanks!
But I have a question about the NT-Xent loss. I noticed that you apply an L2 norm to z and then use cosine similarity after that. But cosine similarity already contains an L2 normalization. Why apply the L2 norm first?

Question about CE Loss

Hello,

Thanks for sharing the code, nice implementation.

The way you calculate the loss by using a mask is quite brilliant. But I have a question.

logits = torch.cat((positives, negatives), dim=1)
So if I'm not wrong, the first column of logits is positive and the rest are negatives.

labels = torch.zeros(2 * self.batch_size).to(self.device).long()
But your labels are all zeros, which would seem to mean that, positive or negative, the similarity should be low.

So I wonder whether the first column of labels is supposed to be 1 instead of 0.

Thanks for your help.

'CosineAnnealingLR' never works with the wrong position of 'scheduler.step()'

Considering the setting in scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=len(train_loader), eta_min=0, last_epoch=-1), I think scheduler.step() should be called at every step inside for (xis, xjs), _ in train_loader. Otherwise the lr will never change until len(train_loader) epochs, not steps, have passed.
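For reference, the two self-consistent pairings of T_max and step() placement (a sketch with dummy stand-ins; pick one option, the two are alternatives):

    import torch

    model = torch.nn.Linear(4, 4)                         # dummy model
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
    epochs, steps_per_epoch = 5, 10                       # stand-ins for args.epochs, len(train_loader)

    # Option A: per-step annealing; T_max counts optimizer steps.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs * steps_per_epoch, eta_min=0)
    for epoch in range(epochs):
        for step in range(steps_per_epoch):
            optimizer.step()                              # forward/backward elided
            scheduler.step()                              # step every batch

    # Option B: per-epoch annealing; T_max counts epochs.
    # scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    #     optimizer, T_max=epochs, eta_min=0)
    # for epoch in range(epochs):
    #     ...train one epoch...
    #     scheduler.step()                                # step once per epoch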

Validation Loss calculation

First of all, thank you for your great work!

Method _validate in simclr.py will raise ZeroDivisionError at line 148 if the validation data loader performs only one iteration (since counter starts from 0).
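A minimal defensive fix, assuming the counter/valid_loss bookkeeping described above (names hypothetical, not the exact repo code):

    valid_loss /= max(counter, 1)  # hypothetical guard: safe even on a one-iteration loader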

Run experiments on ImageNet

Hi,
Thanks for your nice work.
I am planning to run SimCLR on the ImageNet dataset. I wonder if I need to adjust the network structure or add some tricks, for example increasing the dimensionality of the output used to calculate the loss, which in your code is 64. Or can I directly change the dataset to ImageNet and keep the rest of the configuration the same?
I'd appreciate any advice.

Permission Denied with the model download link

Hi sthalles,
Thanks for your great implementation. When I run your linear_feature_eval.ipynb, there is an error with the model download link:
Permission denied: https://drive.google.com/uc?id=1LjuZ1RmhotrnugprRQc2Exk0EbQHMJhL
Maybe you need to change the permission to 'Anyone with the link'?
unzip: cannot find or open Mar14_05-52-52_thallessilva, Mar14_05-52-52_thallessilva.zip or Mar14_05-52-52_thallessilva.ZIP.
Could you change the download permission for the link? Thanks a lot.

keyword arguments to the run.py file

For smooth execution of run.py, use the following command:

BEFORE : $ python run.py -data ./datasets --dataset-name stl10 --log-every-n-steps 100 --epochs 100
AFTER : $ python run.py -data ./datasets -dataset-name stl10 --log-every-n-steps 100 --epochs 100

Or else you can make a change in run.py at line 16, from parser.add_argument('-dataset-name', default='stl10', help='dataset name', choices=['stl10', 'cifar10']) to parser.add_argument('--dataset-name', default='stl10', help='dataset name', choices=['stl10', 'cifar10']).

No upscale in image augmentation?

The SimCLR paper says:

In this work, we sequentially apply three simple augmentations: random
cropping followed by resize back to the original size, random color distortions, and random Gaussian blur

but it seems like the augmentations used in this repository take a random crop without afterwards resizing the crop back to the original size. Why the difference? Am I misunderstanding the SimCLR paper?
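If the repository uses torchvision's RandomResizedCrop (an assumption worth checking against the augmentation code), the resize back is built in: the transform takes a crop of random area and aspect ratio and then resizes it to the requested output size, matching the paper's crop-then-resize step:

    from torchvision import transforms

    # Crops a random patch, then resizes it back to 96x96 in one transform.
    crop = transforms.RandomResizedCrop(size=96)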

A question about the "labels"

Hi! I have a question about the definition of "labels" in the script "simclr.py".

On line 54 of "simclr.py", the authors defined:

labels = torch.zeros(logits.shape[0], dtype=torch.long).to(self.args.device)

So all the entries of "labels" are zeros. But according to the paper, shouldn't there be an entry of 1 for the positive pair?

Thanks in advance for your reply!

Loading pretrained model weights

The code uses state_dict = torch.load for the pretrained model, but I was not able to get it to use pretrained weights for the ResNets. Any suggestions?

Loss function and optimizer

Hi Thalles,
I went through the code and found two things I can't understand:
(1) In the code labels = torch.zeros(2 * self.batch_size).to(self.device).long() in nt_xent.py, the label seems constant. So are the labels unused, since they are all 0?
(2) Is Adam + scheduler = torch.optim.lr_scheduler.CosineAnnealingLR the same as the LARS optimizer?

Thanks in advance.

Why is there no validation loss?

Hi, thanks very much for providing the code and framework!

Can I check why the code does not monitor the loss on a separate validation set? I see from the closed issues that there used to be one, but it seems to have been removed in the latest version. Surely the validation loss should still be monitored, to ensure the model learns proper features?

Thanks!
Michael

Calculation of the similarity

Hi! Thank you for your great work!
I'm a bit curious how you calculated the cosine similarity here.
The code just computes the similarity with similarity_matrix = torch.matmul(features, features.T).
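A likely explanation (an assumption: that the features are L2-normalized, e.g. with F.normalize, before this line): for unit-norm rows, the dot product is the cosine similarity, so the single matmul yields the full pairwise cosine matrix. A quick numeric check:

    import torch
    import torch.nn.functional as F

    x = torch.randn(6, 128)
    feats = F.normalize(x, dim=1)                   # unit-norm rows
    sim = torch.matmul(feats, feats.T)              # pairwise dot products
    cos = F.cosine_similarity(x.unsqueeze(1), x.unsqueeze(0), dim=2)
    print(torch.allclose(sim, cos, atol=1e-6))      # True: the same matrix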

Size of tensors in the _cosine_simililarity function

Hi, I'm trying to understand the code in loss/nt_xent.py.

We pass "representations" as both arguments:

    def forward(self, zis, zjs):
        representations = torch.cat([zjs, zis], dim=0)
        similarity_matrix = self.similarity_function(representations, representations)

But when they are received in the cosine similarity function, somehow the shapes are (N, 1, C) and (1, 2N, C). How can one be double the size if you passed the same argument?

    def _cosine_simililarity(self, x, y):
        # x shape: (N, 1, C)
        # y shape: (1, 2N, C)
        # v shape: (N, 2N)
        v = self._cosine_similarity(x.unsqueeze(1), y.unsqueeze(0))
        return v
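What is likely going on (an interpretation, since both arguments are the same (2N, C) tensor): the two unsqueeze calls reshape the two copies differently, to (2N, 1, C) and (1, 2N, C), and the underlying cosine similarity broadcasts them to a common (2N, 2N, C) shape before reducing over C; the N vs 2N in the comments is just inconsistent labeling. A standalone check:

    import torch

    reps = torch.randn(8, 128)                     # 2N = 8 stacked representations
    cos = torch.nn.CosineSimilarity(dim=-1)
    v = cos(reps.unsqueeze(1), reps.unsqueeze(0))  # (8,1,128) vs (1,8,128), broadcast
    print(v.shape)                                 # torch.Size([8, 8]): the 2N x 2N matrix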

Thanks for your help.

How do i train the SimCLR model with my local dataset?

Dear researcher,
Thank you for the open-source code you provided, it is of great help to me for understanding contrastive learning.
But I still have some confusion about training the SimCLR model with my local dataset. Could you give me some guidance or tips? I would appreciate a reply to this issue.

Similarity matrix shape does not match the shape of the mask

Hello,

I was testing the implementation when an error occurred: The shape of the mask [512, 512] at index 0 does not match the shape of the indexed tensor [2, 2] at index 0.
My batch size is 256.

The error occurs in this part of the code:
similarity_matrix = torch.matmul(features, features.T)
mask = torch.eye(labels.shape[0], dtype=torch.bool).to(device)
labels = labels[~mask].view(labels.shape[0], -1)
similarity_matrix = similarity_matrix[~mask].view(similarity_matrix.shape[0], -1)

I'm wondering whether this is something I'm doing wrong, and how I can match the shapes of the tensors.

Thanks in advance!

Review Training | Fine-Tune | Test details

Hi, I just want to check all the experiment details and make sure I didn't miss any part:

  1. Training phase: use SimCLR (two encoder branches) to train on ImageNet for 1000 epochs and obtain initial pretrained weights.
  2. Fine-tuning: load the pretrained weights into a ResNet-18 (50/101/...) with frozen parameters, attach a linear classifier, and train the classifier on the CIFAR10/STL10 training set for 100 epochs.
  3. Test phase: freeze all encoder and classifier parameters, and test on the CIFAR10/STL10 test set.

Is this how you got the top-1 accuracy in the README?

batch size affect

Hi, I'm experimenting with CIFAR-10 using the default hyper-params, and it seems to yield a better score with a smaller batch size (e.g. 72% with batch size 256 but 78% with batch size 128). Is anyone else seeing the same?

Reproduced Results

Hi @sthalles ! Thank you very much for the great effort!

Does the table in your README contain results you reproduced yourself? If so, have you considered using ResNet-50 as the backbone? The SimCLR paper mainly uses ResNet-50.

Thanks!

About learning rate schedule

Hi, Thalles.

SimCLR/run.py

Line 79 in 1848fc9

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=len(train_loader), eta_min=0,

Shouldn't it be T_max=args.epochs instead of T_max=len(train_loader), since the learning rate schedule steps once every epoch?
Thanks.
