simsiam's People

Contributors

btwardow, flocf, leodesigner, patrickhua, rjean, youqingxiaozhua

simsiam's Issues

SyncBatchNorm

To the best of my knowledge, SyncBatchNorm is only supported with DistributedDataParallel (DDP), not DataParallel.

Please add license

First of all, I want to thank you for this great project! I am a PhD student in the field of deep learning and would really like to include your implementation in my experiments. Unfortunately, what stops me from doing so is that you have not provided a license yet. Would it be possible for you to add a license to this project, such as the MIT license? I would greatly appreciate that and would, of course, properly cite your work.

Two solved problems during evaluation:

1. RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same.
Lines 50 and 59 of linear_eval.py need .to(device) added:
model = get_backbone(args.backbone).to(args.device)

2. RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
In plot_logger.py, add:
import matplotlib
matplotlib.use('Agg')
before:
import matplotlib.pyplot as plt

Some small mistakes

Thanks for your work. I think the "nn.BatchNorm1d(hidden_dim)" on line 42 of models/simsiam.py should be changed to "nn.BatchNorm1d(out_dim)".

A strange error in the transform class: 'tuple' object is not callable

Traceback (most recent call last):
  File "main.py", line 94, in <module>
    main(args=get_args())
  File "main.py", line 68, in main
    for idx, ((images1, images2), _) in enumerate(p_bar):
  File "/mnt/users/miniconda3/envs/simsiam/lib/python3.8/site-packages/tqdm/std.py", line 1129, in __iter__
    for obj in iterable:
  File "/mnt/users/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/mnt/users/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/mnt/users/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/users/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/users/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/utils/data/dataset.py", line 272, in __getitem__
    return self.dataset[self.indices[idx]]
  File "/mnt/users/miniconda3/envs/simsiam/lib/python3.8/site-packages/torchvision/datasets/cifar.py", line 120, in __getitem__
    img = self.transform(img)
  File "/mnt/users/03_simsiam/augmentations/simsiam_aug.py", line 25, in __call__
    x1 = self.transform(x)
  File "/mnt/users/miniconda3/envs/simsiam/lib/python3.8/site-packages/torchvision/transforms/transforms.py", line 68, in __call__
    img = t(img)
TypeError: 'tuple' object is not callable


I hit this TypeError when I ran the command:
python main.py --debug --dataset cifar10 --data_dir my/data/folder/ --output_dir ./outputs
I also printed the type of x at the line 'x1 = self.transform(x)', which may be where the error comes from:
<class 'PIL.Image.Image'>

SWAV

Please implement SwAV.

Looking for PhD positions ...

Previously I was working on my graduate school application.

I know there are several bugs in the code, and I feel guilty about not having been able to fix them. Now that I've finished all my applications, I will make the repo better!

Does anyone know of labs or professors that are looking for PhD students? My research interest is self-supervised learning (apparently). If you happen to know of such a position, please contact me!

AttributeError and NotImplementedError

Hello,

While running SimCLR, I get the error "AttributeError: Can not find momentum in namespace. Please write momentum in your config file(xxx.yaml)!"

Also, running BYOL produces the error
" File "/content/gdrive/My Drive/Competitor/SimSiam/models/byol.py", line 75, in __init__
raise NotImplementedError('Please put update_moving_average to training')
NotImplementedError: Please put update_moving_average to training"

Are there any workarounds for these issues?

Thank you.

For SimSiam, should evaluation be done after the backbone or the encoder?

I checked linear_eval.py and noticed that only the backbone is loaded during evaluation; this should be correct for SimCLR. However, I think the learned representation in SimSiam should be the p after the encoder, which includes the backbone and projector.
This might be the reason others report that the performance doesn't match the original paper.

Implementation of Stopgrad

May I ask you a question: how did you implement stop-grad? I looked through the code but couldn't find it. Thank you!
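For reference, the SimSiam paper writes the loss with an explicit stop-gradient on the target branch. The block below restates it in the paper's notation, where f is the encoder (backbone plus projection MLP), h is the prediction MLP, z_i = f(x_i) and p_i = h(z_i). It does not confirm how this repository implements the operation; in PyTorch the usual way to express stopgrad(z) is `z.detach()`.

```latex
\mathcal{D}(p, z) = -\,\frac{p}{\lVert p \rVert_2} \cdot \frac{z}{\lVert z \rVert_2},
\qquad
\mathcal{L} = \tfrac{1}{2}\,\mathcal{D}\bigl(p_1, \operatorname{stopgrad}(z_2)\bigr)
            + \tfrac{1}{2}\,\mathcal{D}\bigl(p_2, \operatorname{stopgrad}(z_1)\bigr)
```

Because z is treated as a constant inside each term, gradients reach the encoder only through the p branch of that term.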

Normalization for different datasets not implemented?

First of all, thanks for providing this PyTorch implementation.

If we look at the augmentations for each of the models (SimSiam, BYOL, etc.), they all seem to use the ImageNet mean and standard deviation, regardless of whether you are training on CIFAR10, CIFAR100, or another dataset:

imagenet_mean_std = [[0.485, 0.456, 0.406],[0.229, 0.224, 0.225]]

Is my understanding correct and should this implementation be corrected?
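If dataset-specific statistics are wanted, they can be computed directly from the training images. The sketch below uses plain Python lists as a stand-in for image tensors; `channel_mean_std` and `toy_images` are hypothetical names for illustration and are not part of this repository.

```python
import math

def channel_mean_std(images):
    """Per-channel mean and std over a list of images.

    Each image is a list of channels, and each channel is a flat list of
    pixel values already scaled to [0, 1].
    """
    num_channels = len(images[0])
    means, stds = [], []
    for c in range(num_channels):
        # Pool the pixels of channel c across every image.
        values = [v for img in images for v in img[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds

# Toy stand-in for a dataset: two 3-channel "images" with 4 pixels each.
toy_images = [
    [[0.0, 0.2, 0.4, 0.6], [0.5, 0.5, 0.5, 0.5], [1.0, 0.0, 1.0, 0.0]],
    [[0.8, 1.0, 0.2, 0.0], [0.5, 0.5, 0.5, 0.5], [0.0, 1.0, 0.0, 1.0]],
]
mean, std = channel_mean_std(toy_images)
print(mean, std)
```

The resulting pair would take the place of the ImageNet values in the normalization step for that dataset.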

A mistake caused by batch norm layers

Initially I used z1, z2 = encoder(torch.cat([x1, x2])).chunk(2) to replace the two separate forward passes in SimSiam. However, I realized the output does not match the original implementation in the paper. Here's a simplified example:

import torch, torchvision
x1 = torch.randn((2,3,224,224))
x2 = torch.randn_like(x1)
encoder = torchvision.models.resnet50()

z1, z2 = encoder(x1), encoder(x2)
print(z1,z2)
z1, z2 = encoder(torch.cat([x1, x2])).chunk(2)
print(z1,z2)

The two approaches give different values for z1 and z2.

Then I switched the batch norm layers to inference mode with eval():

encoder.eval()
z1, z2 = encoder(x1), encoder(x2)
print(z1,z2)
z1, z2 = encoder(torch.cat([x1, x2])).chunk(2)
print(z1,z2)

The outputs are the same now!
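The effect can be reproduced without a network. In training mode, batch normalization uses the mean and variance of the current batch, so concatenating the two views changes the statistics each sample is normalized with. The sketch below works on scalar features with the affine parameters omitted; the function names and the running statistics are illustrative assumptions.

```python
import math

def batch_norm_train(batch, eps=1e-5):
    """Normalize scalars with the batch's own mean and variance (training mode)."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [(v - mean) / math.sqrt(var + eps) for v in batch]

def batch_norm_eval(batch, running_mean, running_var, eps=1e-5):
    """Normalize scalars with fixed running statistics (eval mode)."""
    return [(v - running_mean) / math.sqrt(running_var + eps) for v in batch]

x1 = [1.0, 3.0]
x2 = [10.0, 14.0]

# Training mode: two separate passes versus one concatenated pass.
separate = batch_norm_train(x1) + batch_norm_train(x2)
joint = batch_norm_train(x1 + x2)

# Eval mode: the statistics are fixed, so the batching does not matter.
# The running statistics (0.0, 1.0) are arbitrary illustrative values.
separate_eval = batch_norm_eval(x1, 0.0, 1.0) + batch_norm_eval(x2, 0.0, 1.0)
joint_eval = batch_norm_eval(x1 + x2, 0.0, 1.0)

print(separate)
print(joint)
print(separate_eval == joint_eval)
```

With separate passes each view is normalized against its own batch, while the concatenated pass normalizes both views against shared statistics, which is why the two variants are not interchangeable during training.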

Can't achieve the accuracy in the paper with cifar10

I use kNN classification as a monitor during training. As shown in Figure D.1 of the paper, the accuracy is about 60% at the beginning and eventually reaches about 90%. I can't reach this accuracy; with the parameters mentioned in the paper I only get a very low accuracy.

If anyone can achieve the results in the paper, thank you very much for sharing some experimental details.
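For readers unfamiliar with a kNN monitor, the idea is to classify each held-out feature by a vote among its nearest neighbours in a bank of training features. The sketch below uses cosine similarity and a plain majority vote on toy 2-D features; it is a simplified illustration, not necessarily the monitor used in the paper or in this repository, and all names in it are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_predict(query, bank_features, bank_labels, k=3):
    """Majority vote among the k most cosine-similar feature-bank entries."""
    ranked = sorted(
        range(len(bank_features)),
        key=lambda i: cosine(query, bank_features[i]),
        reverse=True,
    )
    votes = Counter(bank_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy feature bank: three training features per class.
bank_features = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.0],
                 [0.1, 1.0], [0.0, 0.9], [0.2, 0.8]]
bank_labels = [0, 0, 0, 1, 1, 1]
test_features = [[1.0, 0.0], [0.1, 0.9]]
test_labels = [0, 1]

predictions = [knn_predict(q, bank_features, bank_labels) for q in test_features]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

In practice the features would come from the frozen backbone, and details such as the value of k or a similarity-weighted vote can change the reported accuracy.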

--resume not implemented

I need to resume training.
I followed the example in linear_eval.py,

but I get:

Resuming model from outputs/custom_small_resnet18/simsiam-custom_small-epoch300.pth
Epoch 0/500:   0%|                                                                                                                                   | 0/1263 [00:02<?, ?it/s]
Training:   0%|                                                                                                                                       | 0/500 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 139, in <module>
    main(args=get_args())
  File "main.py", line 101, in main
    loss = model.forward(images1.to(args.device), images2.to(args.device))
TypeError: forward() takes 2 positional arguments but 3 were given

How do I properly load a checkpoint and resume training?
Thanks!

Loss collapse

I am trying to pretrain the SimSiam model on the MS COCO dataset, but the loss collapses to -1 very quickly. What are the possible reasons, and are there any suggestions for fixing this?
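A loss of -1 is the minimum of the negative cosine similarity, and an encoder that maps every input to the same vector reaches it trivially. One check described in the SimSiam paper is the per-channel standard deviation of the L2-normalized outputs: it is zero when the outputs collapse to a constant and close to 1/sqrt(d) when they behave like zero-mean isotropic Gaussian vectors of dimension d. The sketch below computes it on synthetic outputs; the helper names and the synthetic data are assumptions for illustration.

```python
import math
import random

def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def mean_channel_std(outputs):
    """Average, over channels, of the per-channel std of L2-normalized outputs."""
    normalized = [l2_normalize(z) for z in outputs]
    dim = len(normalized[0])
    stds = []
    for c in range(dim):
        column = [z[c] for z in normalized]
        mean = sum(column) / len(column)
        var = sum((v - mean) ** 2 for v in column) / len(column)
        stds.append(math.sqrt(var))
    return sum(stds) / dim

rng = random.Random(0)
dim, batch = 64, 512

# Collapsed encoder: every sample maps to the same vector.
constant = [rng.gauss(0.0, 1.0) for _ in range(dim)]
collapsed = [list(constant) for _ in range(batch)]

# Healthy stand-in: independent zero-mean Gaussian outputs.
healthy = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]

collapsed_std = mean_channel_std(collapsed)  # essentially zero
healthy_std = mean_channel_std(healthy)      # expected to be close to 1/sqrt(dim)
print(collapsed_std, healthy_std, 1 / math.sqrt(dim))
```

Logging this value during pretraining, computed on the encoder outputs z, helps separate a genuine collapse from a loss that is simply low.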

Attribute not found for ResNet

Hi,

I installed the required version of torch and torchvision but still got "torch.nn.modules.module.ModuleAttributeError: 'ResNet' object has no attribute 'output_dim'".


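Model collapse

I am using SimSiam with ResNet18 as the backbone on CIFAR-10, with batch size 128, base lr 0.06, and 50 warmup epochs. By epoch 30 the loss has dropped to -1 and the accuracy is only 10%, whereas during the first five epochs the accuracy was around 30%. After that the loss keeps decreasing but the accuracy gets worse. What could be the cause?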

How can I use SimSiam for classification tasks?

  1. Can I add a classification loss to each branch, or to one of the branches? Will the result be good?
  2. If p and z have different dimensions, how can I bring them to the same dimension and then compute the D(p, z) loss?

The default Gaussian blur settings described in the paper are unclear

Color augmentation is ColorJitter with {brightness, contrast, saturation, hue} strength of {0.4, 0.4, 0.4, 0.1} with an applying probability of 0.8, and RandomGrayscale with an applying probability of 0.2. Blurring augmentation [8] has a Gaussian kernel with std in [0.1, 2.0].

They didn't state the probability of applying Gaussian blur. It doesn't make sense to always apply Gaussian blur to both augmentations: during training the model would only see blurred images, but at test time the blur is removed. This would definitely hurt the generalization ability of the model. I will use the default Gaussian blur probability from SimCLR instead!
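The SimCLR paper blurs the image 50% of the time and samples the kernel std uniformly from [0.1, 2.0]. A minimal sketch of that sampling rule follows, assuming the same probability is carried over to SimSiam; `sample_blur_sigma` is a hypothetical helper, and the actual blur would be applied by the image library in use.

```python
import random

def sample_blur_sigma(rng, p=0.5, sigma_range=(0.1, 2.0)):
    """Return a blur std with probability p, or None to skip blurring."""
    if rng.random() >= p:
        return None
    return rng.uniform(*sigma_range)

rng = random.Random(0)
draws = [sample_blur_sigma(rng) for _ in range(10_000)]
applied = [s for s in draws if s is not None]

# Fraction of augmented views that would be blurred.
blur_fraction = len(applied) / len(draws)
print(blur_fraction)
```

With p below 1, a share of the training views stays unblurred, which addresses the train/test mismatch raised above.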

A solved problem during parallel training.

When I used multiple GPUs for training, an error occurred.

Traceback (most recent call last):
  File "main.py", line 73, in <module>
    main(args=get_args())
  File "main.py", line 51, in main
    loss = model.forward(images1.to(args.device), images2.to(args.device))
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data1/fengry/vcm/comfea/MySimSiam-0.1.0/model.py", line 94, in forward
    z1, z2 = f(x1), f(x2)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torchvision/models/resnet.py", line 220, in forward
    return self._forward_impl(x)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torchvision/models/resnet.py", line 204, in _forward_impl
    x = self.bn1(x)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 519, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 625, in get_world_size
    return _get_group_size(group)
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/data/fengry/anaconda3/envs/pytorch17/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
    assert _default_pg is not None, \
AssertionError: Default process group is not initialized

Then, in main.py, I added
torch.distributed.init_process_group('gloo', init_method='file:///tmp/somefile', rank=0, world_size=1)
before
if torch.cuda.device_count() > 1: model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
and added
loss = loss.mean()
before
loss.backward()

Everything goes well now.

performance on simclr

Hi @PatrickHua, thank you for your implementation.
I ran the SimCLR code with the default parameters, except that I set the momentum to 0.9, since it is missing from simclr_cifar.yaml.
After 100 epochs, I got 55.57% accuracy. That seems much lower than the result in the paper. What is yours? Is there something wrong with my settings?

Strange errors when running cifar_experiment.sh

The OS is Ubuntu 18.04. I use a conda environment with all the dependencies from requirements.txt installed.

The script runs fine in debug mode. However, when I ran:

sh configs/cifar_experiment.sh

a strange error occurred during evaluation:

Training: 100%|██████████| 800/800 [6:17:14<00:00, 28.29s/it, epoch=799, loss_avg=-.878]

Evaluating:   0%|          | 0/30 [00:00<?, ?it/s]Model saved to outputs/cifar10_experiment/simsiam-cifar10-epoch800.pth
Files already downloaded and verified

Evaluating:   0%|          | 0/30 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 116, in <module>
    main(args=get_args())
  File "main.py", line 113, in main
    linear_eval(args, backbone)
  File "/home/yl764/SimSiam/SimSiam/linear_eval.py", line 109, in main
    feature = model(images.to(args.device))
  File "/home/yl764/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yl764/miniconda3/envs/simsiam/lib/python3.8/site-packages/torchvision/models/resnet.py", line 220, in forward
    return self._forward_impl(x)
  File "/home/yl764/miniconda3/envs/simsiam/lib/python3.8/site-packages/torchvision/models/resnet.py", line 203, in _forward_impl
    x = self.conv1(x)
  File "/home/yl764/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yl764/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/yl764/miniconda3/envs/simsiam/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

Thanks!

Is "DistributedSampler" necessary?

Hello,
I noticed that there is no DistributedSampler in the code.
Is this intended?
I think that without it, the model would see the same data twice (or as many times as there are GPUs) in a single epoch, because shuffle is set to True.
I'm not sure whether this is expected.
Thank you very much.
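The concern applies to multi-process training, where each process has its own data loader; with single-process DataParallel, one loader produces each batch and the batch is split across the GPUs. To illustrate what a distributed sampler does, the sketch below shards a shuffled index list so that every process sees a disjoint part of the dataset in each epoch. `shard_indices` is a hypothetical helper that mirrors the general idea, not the exact PyTorch implementation.

```python
import random

def shard_indices(dataset_size, world_size, rank, epoch, seed=0):
    """Indices one process would see in one epoch (hypothetical helper)."""
    indices = list(range(dataset_size))
    # Every rank shuffles with the same seed, so all ranks agree on the order.
    random.Random(seed + epoch).shuffle(indices)
    # Pad by repeating leading indices so the split is even across ranks.
    remainder = len(indices) % world_size
    if remainder:
        indices += indices[: world_size - remainder]
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank::world_size]

shards = [shard_indices(10, world_size=2, rank=r, epoch=0) for r in range(2)]
print(shards)
```

Without such sharding, every process in a multi-process run would iterate over the full dataset, which is the duplication described above.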

Backbone setting

SimSiam uses a backbone such as ResNet50.
Does the ResNet50 backbone, used as an encoder, include the final fc layer, or does it stop at the global average pooling layer?
I found that the whole network is used as the encoder in models/simsiam.py.

Loss close to -1 at the beginning of training

Has anyone run into the problem that the loss is close to -1 at the beginning of training? BTW, the training data does not come from a traditional classification dataset like ImageNet or CIFAR.

There should be a stop gradient in the simsiam model

Hello,

I was using your implementation of SimSiam for contrastive learning. I noticed that the model you have created has a few problems:

  1. The "stop_gradient" part of the network is absent from your implementation. The model is effectively training both paths.

Could you please clarify how and where you are taking care of it?

Consumes a lot more GPU memory than expected?

Hi, I used a ResNet34 backbone to train on (1, 128, 128) images with a batch size of 128. The total allocated memory is >35GB. According to the post, a ResNet50 on (3,256,256) images with a batch size of 96 only consumes 10GB. I am wondering if anyone else experiences the same issue and if there is any clue as to why this network takes such a lot of memory.

Are cifar_resnet_1.py and cifar_resnet_2.py the same?

Dear author:

 Are the following cifar_resnet_1.py and cifar_resnet_2.py the same? If not, what's the difference between them?

   + https://github.com/PatrickHua/SimSiam/blob/main/models/backbones/cifar_resnet_1.py
   + https://github.com/PatrickHua/SimSiam/blob/main/models/backbones/cifar_resnet_2.py

And are they the same as the network used in MoCo_cifar_10_demo?
    + https://colab.research.google.com/github/facebookresearch/moco/blob/colab-notebook/colab/moco_cifar10_demo.ipynb

The code error in logger

Hi all, I am doing research on SimSiam with this implementation.
The structure of this codebase is really good and easy to extend, so I tried to run SimSiam on the ImageNet and CUB200 datasets.
However, to reproduce the experimental results in the paper, I had to run ResNet50 on CUB200 with a batch size of 256, and I ran out of GPU memory (OOM).
Then I ran the model on two GPUs, but the code reported some errors.
Can anyone help me out with this issue?
The code works fine when I use only one GPU.
Thanks.
The command I used was:
CUDA_VISICE_DEVICES=0,1 python /root/SimSiam/main.py --data_dir /root/CUB_200_2011/ --log_dir ../logs/ -c configs/simsiam_cub200.yaml --ckpt_dir ~/.cache --hide_progress
