alinlab / CSI

CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances (NeurIPS 2020)

Home Page: https://arxiv.org/abs/2007.08176

Topics: novelty-detection, anomaly-detection, out-of-distribution-detection, contrastive-learning

csi's Introduction

CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances

Official PyTorch implementation of "CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances" (NeurIPS 2020) by Jihoon Tack*, Sangwoo Mo*, Jongheon Jeong, and Jinwoo Shin.

1. Requirements

Environments

Currently, the code requires the following packages: torch (1.4), torchvision (0.5), CUDA (10.1), torchlars, and apex.

Datasets

For CIFAR, please download the following datasets to ~/data.

For ImageNet-30, please download the following datasets to ~/data.

For Food-101, remove the hotdog class to avoid overlap.

2. Training

All code examples currently assume distributed training with 4 GPUs. To run the code on a single GPU, remove -m torch.distributed.launch --nproc_per_node=4.

Unlabeled one-class & multi-class

To train unlabeled one-class & multi-class models in the paper, run this command:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset <DATASET> --model <NETWORK> --mode simclr_CSI --shift_trans_type rotation --batch_size 32 --one_class_idx <One-Class-Index>

The --one_class_idx option specifies the in-distribution class for one-class training. For multi-class training, set --one_class_idx to None. To run SimCLR, simply change --mode to simclr. The total batch size should be 512 = 4 (GPUs) × 32 (--batch_size option) × 4 (cardinality of the shifted transformation set).
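
As a quick illustration of that arithmetic, here is a minimal sketch (toy tensors, not the repository's code) of how one per-GPU batch expands under the four rotation shifts:

```python
# Hedged sketch: per-GPU batch expansion under the rotation shift set {0°, 90°, 180°, 270°}.
# Shapes are assumptions for illustration; the actual logic lives in the training code.
import torch

images = torch.randn(32, 3, 32, 32)  # --batch_size 32 on one GPU
shifted = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
print(shifted.shape)  # torch.Size([128, 3, 32, 32]); across 4 GPUs: 4 * 32 * 4 = 512
```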

Labeled multi-class

To train the labeled multi-class model (confidence-calibrated classifier) from the paper, run these commands:

# Representation training
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset <DATASET> --model <NETWORK> --mode sup_simclr_CSI --shift_trans_type rotation --batch_size 32 --epoch 700
# Linear layer training
python train.py --mode sup_CSI_linear --dataset <DATASET> --model <NETWORK> --batch_size 32 --epoch 100 --shift_trans_type rotation --load_path <MODEL_PATH>

To run SupCLR, simply change --mode to sup_simclr for representation training and sup_linear for linear-layer training, respectively. The total batch size should be the same as above. Currently, only rotation is supported as the shifted transformation.

3. Evaluation

We provide a checkpoint of the CSI pre-trained model. Download the checkpoint from the following link:

Unlabeled one-class & multi-class

To evaluate our model in the unlabeled one-class & multi-class out-of-distribution (OOD) detection setting, run this command:

python eval.py --mode ood_pre --dataset <DATASET> --model <NETWORK> --ood_score CSI --shift_trans_type rotation --print_score --ood_samples 10 --resize_factor 0.54 --resize_fix --one_class_idx <One-Class-Index> --load_path <MODEL_PATH>

The --one_class_idx option specifies the in-distribution class for one-class evaluation. For multi-class evaluation, set --one_class_idx to None. The --resize_factor and --resize_fix options fix the cropping scale of RandomResizedCrop(). For SimCLR evaluation, change --ood_score to simclr.
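
For intuition, fixing the crop scale amounts to something like the following sketch (torchvision shown for illustration only; the repository implements its own cropping layer, and the exact parameter wiring is an assumption):

```python
# Hedged sketch: pinning RandomResizedCrop's sampled crop area to a single value,
# which is roughly what --resize_factor with --resize_fix does during evaluation.
from torchvision import transforms

resize_factor = 0.54  # --resize_factor
crop = transforms.RandomResizedCrop(32, scale=(resize_factor, resize_factor))
```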

Labeled multi-class

To evaluate our model in the labeled multi-class setting (accuracy, ECE, and OOD detection), run these commands:

# OOD AUROC
python eval.py --mode ood --ood_score baseline_marginalized --print_score --dataset <DATASET> --model <NETWORK> --shift_trans_type rotation --load_path <MODEL_PATH>
# Accuracy & ECE
python eval.py --mode test_marginalized_acc --dataset <DATASET> --model <NETWORK> --shift_trans_type rotation --load_path <MODEL_PATH>

These options are for marginalized inference. For single inference (also used for SupCLR), change --ood_score to baseline in the first command and --mode to test_acc in the second.

4. Results

Our model achieves the following performance on:

One-Class Out-of-Distribution Detection

| Method | Dataset | AUROC (Mean) |
| --- | --- | --- |
| SimCLR | CIFAR-10-OC | 87.9% |
| Rot+Trans | CIFAR-10-OC | 90.0% |
| CSI (ours) | CIFAR-10-OC | 94.3% |

We only show the CIFAR-10 one-class result in this repo. For other settings, please see our paper.

Unlabeled Multi-Class Out-of-Distribution Detection

| Method | Dataset | OOD Dataset | AUROC (Mean) |
| --- | --- | --- | --- |
| Rot+Trans | CIFAR-10 | CIFAR-100 | 82.5% |
| CSI (ours) | CIFAR-10 | CIFAR-100 | 89.3% |

We only show the CIFAR-10 vs. CIFAR-100 OOD detection result in this repo. For other OOD dataset results, see our paper.

Labeled Multi-Class Result

| Method | Dataset | OOD Dataset | Acc | ECE | AUROC (Mean) |
| --- | --- | --- | --- | --- | --- |
| SupCLR | CIFAR-10 | CIFAR-100 | 93.9% | 5.54% | 88.3% |
| CSI (ours) | CIFAR-10 | CIFAR-100 | 94.8% | 4.24% | 90.6% |
| CSI-ensem (ours) | CIFAR-10 | CIFAR-100 | 96.0% | 3.64% | 92.3% |

We only show CIFAR-10 with CIFAR-100 as the OOD dataset in this repo. For other dataset results, please see our paper.

5. New OOD dataset

We find that current benchmark datasets for OOD detection are visually far from the in-distribution datasets (e.g., CIFAR).

To address this issue, we provide new datasets for OOD detection evaluation: LSUN_fix and ImageNet_fix. See the figure above for a visualization of the current benchmarks and our datasets.

To generate the OOD datasets, run the following commands inside the ./datasets folder:

# ImageNet FIX generation code
python imagenet_fix_preprocess.py 
# LSUN FIX generation code
python lsun_fix_preprocess.py

Citation

@inproceedings{tack2020csi,
  title={CSI: Novelty Detection via Contrastive Learning on Distributionally Shifted Instances},
  author={Jihoon Tack and Sangwoo Mo and Jongheon Jeong and Jinwoo Shin},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}

csi's People

Contributors: jh-jeong, jihoontack, sangwoomo


csi's Issues

test score

I am sorry to bother you again! I have some confusion about the test score:

We have 10,000 test images (for CIFAR-10). In the code (evals/ood_pre.py), we get 9,000 OOD features from the 9 OOD classes at line 92, and we put 19,000 features into the roc_auc_score() function at line 96.

  1. Why does the code compute the score using 19,000 features instead of 10,000? Aren't 9,000 of those features duplicates?
    (The 10,000 test features already include the 9,000 features from the OOD classes.)
  2. Is "one_class_real_mean" the final result? (It is computed from the 19,000 features.)

the eval's result

I used the pre-trained model cifar_last.model you provided and ran the following command:
python eval.py --mode ood_pre --dataset cifar10 --model resnet18 --ood_score CSI --shift_trans_type rotation --print_score --ood_samples 10 --resize_factor 0.54 --resize_fix --one_class_idx 0 --load_path "./checkpoint/cifar_last.model"
and the final result was very poor:
[one_class_mean CSI 0.4087] [one_class_mean best 0.4087]
What causes this result?

ood_samples parameter and # of samples in Table 11

I am trying to reproduce Table 11 (Appendix D).
Do I understand correctly that the ood_samples parameter of eval.py is what you use to produce the "# of samples" parameter in the table?
I ask because I find that the ood_samples parameter of eval.py has surprisingly little effect:
CIFAR10 OC

| ood_samples | mean |
| --- | --- |
| 1 | 0.9327 |
| 4 | 0.93647 |
| 10 | 0.93709 |
| 40 | 0.93737 |

Or do I need to adjust the ood_samples parameter in train.py?

In Table 11 you also report controlled results; how do I achieve those? I can't find a parameter in the script to adjust this. Or is this the difference between setting --resize_fix and not setting it?

Thank you very much for your help.

training problem

I ran the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset cifar10 --model resnet18 --mode simclr_CSI --shift_trans_type rotation --batch_size 32 --one_class_idx 0
and I get this error:

RuntimeError: Broken pipe
work = _default_pg.broadcast([tensor], opts)
work = _default_pg.broadcast([tensor], opts)
RuntimeError: Broken pipe
RuntimeError: Broken pipe

I don't know why.

some confusion

Hi, thanks for your excellent idea!
Now I have some points of confusion:

  1. The paper says the contrastive loss makes the norm of in-distribution samples relatively larger than that of OOD samples (p. 23). But according to the cosine similarity formula, when the cosine similarity of two similar features increases, the norm should become smaller.
  2. You said the code uses the inner product in the implementation, which is the product of the cosine similarity and the norm. But the inner product is not equal to Eq. (6) from the paper.
  3. I don't understand how the shifting transformations are used in SimCLR (how is Eq. (3) plugged into Eq. (2)? Are there still positive pairs in the numerator of Eq. (1)?), and why is Eq. (4) needed?

I hope you can help me out.
I am looking forward to your reply!

Questions About Multi-Labeled OOD

Were the results for labeled multi-class OOD all taken at the same epoch?
(e.g., the results for CIFAR-10 vs. SVHN, ImageNet-resize, LSUN-resize, LSUN-fix, ...)
Or is each of them from the best-AUROC epoch during training?

There might be a difference in the best epoch for each out-dataset.

Thanks.

How to create, train and evaluate DTD dataset

Hi, would you be so kind as to explain how to do the DTD training and evaluation?
In the paper you mention that DTD images are the inliers and ImageNet-30 images the outliers. What is the folder structure of "~/data/dtd/" supposed to look like?

For training, do I assume correctly that this is the unlabeled multi-class setting? I.e., CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset dtd --model resnet18 --mode simclr_CSI --shift_trans_type rotation --batch_size 32 --one_class_idx None

And for evaluation, how do I specify the out-distribution? I only see the "dataset" flag, but I would need to specify both the in-distribution and out-of-distribution datasets, right?

Thank you again for your help!

Possible further improvements (SimCLR hyperparameters)

We found that our implementation differs slightly from the official SimCLR. One may further improve the performance (of both SimCLR and CSI) with the following fixes (a sketch follows the list):

  1. Use BN after linear layers and set bias=False for the last linear layer for the projection head.
  2. Use temperature=0.1 for ImageNet-30.
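
A minimal sketch of a projection head with the first fix applied (layer widths are assumptions, not the repository's values):

```python
# Sketch of a SimCLR-style projection head: BN after linear layers and
# bias=False on the last linear layer, per the suggestion above.
import torch.nn as nn

def projection_head(dim_in=512, dim_hidden=512, dim_out=128):
    return nn.Sequential(
        nn.Linear(dim_in, dim_hidden),
        nn.BatchNorm1d(dim_hidden),
        nn.ReLU(inplace=True),
        nn.Linear(dim_hidden, dim_out, bias=False),
        nn.BatchNorm1d(dim_out),
    )
```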

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
Traceback (most recent call last):
  File "train.py", line 38, in <module>
    train(P, epoch, model, criterion, optimizer, scheduler_warmup, train_loader, logger=logger, **kwargs)
  File "/home/westlake/zhangjunlei/code/auto-ood/training/unsup/simclr_CSI.py", line 57, in train
    images_pair = simclr_aug(images_pair)  # transform
  File "/home/westlake/miniconda3/envs/zjl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/westlake/miniconda3/envs/zjl/lib/python3.6/site-packages/apex/parallel/distributed.py", line 560, in forward
    result = self.module(*inputs, **kwargs)
  File "/home/westlake/miniconda3/envs/zjl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/westlake/miniconda3/envs/zjl/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/westlake/miniconda3/envs/zjl/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/westlake/zhangjunlei/code/auto-ood/models/transform_layers.py", line 388, in forward
    return inputs * (1 - _mask) + self.transform(inputs) * _mask
  File "/home/westlake/zhangjunlei/code/auto-ood/models/transform_layers.py", line 381, in transform
    inputs = t(inputs)
  File "/home/westlake/zhangjunlei/code/auto-ood/models/transform_layers.py", line 371, in adjust_hsv
    return RandomHSVFunction.apply(x, f_h, f_s, f_v)
  File "/home/westlake/zhangjunlei/code/auto-ood/models/transform_layers.py", line 396, in forward
    x = rgb2hsv(x)
  File "/home/westlake/zhangjunlei/code/auto-ood/models/transform_layers.py", line 40, in rgb2hsv
    hsv = torch.stack([hue, saturate, value], dim=1)
RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:278
NCCL error in: /pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:65, unhandled cuda error, NCCL version 2.4.8
NCCL error in: /pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:65, unhandled cuda error, NCCL version 2.4.8

PyTorch: 1.4
CUDA: 10.1
cuDNN: 7.6.3
Python: 3.6.2

Hello, I tried to run your code, but I got this error. Could you help me with it?

ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found

Getting the following error when running with 1 GPU:

> CUDA_VISIBLE_DEVICES=0 python  train.py --dataset imagenet  --model resnet18  --mode simclr_CSI --shift_trans_type rotation --batch_size 32 --one_class_idx 0
/home/rashindrie/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))
Traceback (most recent call last):
  File "train.py", line 7, in <module>
    from common.train import *
  File "/data/gpfs/projects/punim1193/tils_score_prediction/CSI/common/train.py", line 108, in <module>
    from torchlars import LARS
  File "/home/rashindrie/.local/lib/python3.7/site-packages/torchlars/__init__.py", line 2, in <module>
    from torchlars.lars import LARS
  File "/home/rashindrie/.local/lib/python3.7/site-packages/torchlars/lars.py", line 6, in <module>
    from torchlars._adaptive_lr import compute_adaptive_lr
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/rashindrie/.local/lib/python3.7/site-packages/torchlars/_adaptive_lr.cpython-37m-x86_64-linux-gnu.so)

Running with 2 GPUs gives the following error:

> CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset imagenet  --model resnet18  --mode simclr_CSI --shift_trans_type rotation --batch_size 32 --one_class_idx 0

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_jmq0lqp2/none_lvv336sz/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_jmq0lqp2/none_lvv336sz/attempt_1/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_jmq0lqp2/none_lvv336sz/attempt_1/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_jmq0lqp2/none_lvv336sz/attempt_1/3/error.json

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 2 (pid: 235841) of binary: /home/rashindrie/.conda/envs/simclr/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 2/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=2
  master_addr=127.0.0.1
  master_port=29500
  group_rank=0
  group_world_size=1
  local_ranks=[0, 1, 2, 3]
  role_ranks=[0, 1, 2, 3]
  global_ranks=[0, 1, 2, 3]
  role_world_sizes=[4, 4, 4, 4]
  global_world_sizes=[4, 4, 4, 4]

I have also attached a list of the packages in my conda environment. I created the conda env following the package list at SimCLR.

Any help is appreciated.

Thank you!

Questions about New Dataset

Hello
I am trying to run your code on test dataset 'LSUN_fix'.

However, when I extract the zip file, there are '._correct_resize.png' files along with the 'correct_resize.png' files.

Is it okay if I just delete all files that start with '._'?

Why did you apply the rotation transformation first and then simclr_aug?

Hello, why did you apply the rotation transformation first and then simclr_aug? You classify which transformation was applied after simclr_aug, so the pipeline is image → T_transform → simclr_aug → T_transform classification. This means that if you changed T_transform to something like a color change, the transformation classification would be confused, since simclr_aug is applied on top of the transformed images.

By the way, does the order of T_transform and simclr_aug matter? Thank you!

CIFAR-100 SuperClasses division used in this repo is different from the standard one

Hello,

The CIFAR-100 super-classes used here are quite different from the standard division described here.
This can easily be seen by comparing to other implementations that follow the standard division, e.g., here or here.
As you can see, the fine-grained classes are shifted within each super-class.

I think this makes the results presented here not comparable to other results in the literature, and they would need to be re-done.

GPU requirement for training One-class ImageNet-30

Thanks for the nice work! I have a question about the image size you used in the one-class ImageNet-30 setting. I used four 3080 Ti GPUs with a batch size of 32 per GPU and got an out-of-memory error. I assume the actual batch size per GPU is 32 * 4 (four rotations) * 2 (two views) = 256, so the input tensor has a size of (256, 3, 224, 224)?

run SimCLR

Hi,

I'm running the evaluation of SimCLR following the command in README.md:
python eval.py --mode ood_pre --dataset <DATASET> --model <NETWORK> --ood_score simclr --shift_trans_type rotation --print_score --ood_samples 10 --resize_factor 0.54 --resize_fix --one_class_idx <One-Class-Index> --load_path <MODEL_PATH>

Python reports an error at evals/ood_pre.py, line 120, in get_scores:
score += (f_sim[shi] * P.axis[shi]).sum(dim=1).max().item() * P.weight_sim[shi]
because P.K_shift is 4 with --shift_trans_type rotation,
while P.weight_sim is [1] with --ood_score simclr.

It seems SimCLR should be trained and evaluated with the --shift_trans_type option disabled.

The hyperparameters of Rot (ResNet-18) and Rot+Trans (ResNet-18)

Could you please tell me the hyperparameters of your retrained Rot (ResNet-18) and Rot+Trans (ResNet-18), such as the batch size, learning rate, and the weights of the rotation loss and translation loss in Rot+Trans? Or you could just show me your training code. Thanks!

Training CSI using SGD optimizer

I am trying to reproduce the CSI results from your paper. In my server environment, LARS does not work. Could you let me know the settings for using SGD (e.g., learning rate, command line, etc.)?

about detection score

Hi. The test score function confuses me.

  1. Eq. (7) applies the shifting transformations S on top of the detection score Eq. (6), which is applicable to any contrastive representation. Given training data B = {x_1, x_2} and the set S = {s_0 = I, s_1}, we then identify whether x belongs to the in-distribution or not.
  In Eq. (7), we not only find the maximum cosine similarity among sim(x_1, x) and sim(x_2, x), but also the maximum cosine similarity between the transformations of x and the transformations of B (e.g., sim(s_1(x_1), s_1(x)) and sim(s_1(x_2), s_1(x))). This confuses me: why do we need the maximum similarity between the transformations of x and the transformations of B? What is the meaning of this?

  2. Eq. (8) applies the auxiliary classifier p(y^S | x) on top of f_θ. What is the output of Eq. (8)? I don't understand what this formula represents.
  3. Eq. (9) = Eq. (7) + Eq. (8). What does this formula represent? The similarity between the test data x and the nearest training data (x_1 or x_2)? Do we need to set a threshold for an actual application?

  4. You said that "we hypothesize that the SimCLR objective implicitly increases the norm, as it is hard to decrease the Euclidean distance of the features u and v". So it is just a hypothesis?
  The paper says: "We suspect that increasing the norm may be an easier way to maximize cosine similarity between two vectors." But shouldn't the cosine similarity in fact decrease when we increase the norm?
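
For readers following this thread, a simplified restatement of the score structure under discussion (a paraphrase of Eqs. (6), (7), and (9) as described above, with the paper's per-transformation balancing terms omitted; not the paper's exact notation):

$$ s_{\text{con}}(x;\{x_m\}) = \max_m \, \mathrm{sim}\big(z(x_m),\, z(x)\big) \cdot \lVert z(x)\rVert $$

$$ s_{\text{CSI}}(x) = \sum_{s\in\mathcal{S}} s_{\text{con}}\big(s(x);\{s(x_m)\}\big) \;+\; s_{\text{cls}}(x) $$

where the first term ensembles the contrastive score over the shifting transformations (Eq. (7)) and the second comes from the auxiliary shift classifier (Eq. (8)).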

Is custom one-class learning possible?

Hello,

Thank you for sharing hard work.

I am currently trying to perform one-class learning with my own dataset (with no ground truth).
Is the 'unlabeled one-class' training method you describe available for training on a custom one-class dataset?
I am facing difficulties and cannot figure out whether it is my mistake or whether the method was not intended for custom one-class datasets in the first place.

Thank you.

about cosine similarity

Hi,

In your paper, both the training and test models use cosine similarity. But in your code, you only use the inner product.

  1. In training, the code at https://github.com/alinlab/CSI/blob/60742b60a16501350eca823fcc910ddd10f7a379/training/contrastive_loss.py#L21 only uses torch.mm() to calculate the inner product.
    Why use the inner product to train the model instead of the cosine similarity?

  2. You said in other issues that the code uses the inner product to calculate the score, which is the product of the cosine similarity and the norm.
    Since the cosine similarity is sim(z, z′) := z · z′ / (||z|| ||z′||), the product of the cosine similarity and the norms should be
    sim(x, x_m) · ||x|| · ||x_m||.

    But CSI computes the maximum cosine similarity over the training samples times the norm of the representation (e.g., max_m sim(x, x_m) · ||x||).
    If we followed the inner-product interpretation (the score as the product of cosine similarity and norm), the test score would be max_m sim(x, x_m) · ||x|| · ||x_m||, which confuses me.

I hope you can help me out.
I am looking forward to your reply!
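
One general observation that may be relevant here (a fact about cosine similarity, not a claim about what this repository does at this exact line): if features are L2-normalized before the product, the inner product is exactly the cosine similarity:

```python
# If features are L2-normalized first, their inner product equals cosine similarity.
import torch
import torch.nn.functional as F

a, b = torch.randn(8, 128), torch.randn(8, 128)
cos = F.cosine_similarity(a, b, dim=1)                              # true cosine similarity
inner = (F.normalize(a, dim=1) * F.normalize(b, dim=1)).sum(dim=1)  # inner product of unit vectors
print(torch.allclose(cos, inner, atol=1e-6))                        # True
```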

Reproducing results for Cifar100 ens multi-class

Hi

I tried to rerun your code for CIFAR-100 multi-class with the following command:

CUDA_VISIBLE_DEVICES=4,5,6,7 python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset 'cifar100' --model 'resnet18' --mode sup_simclr_CSI --shift_trans_type rotation --batch_size 32 --epoch 700

But I got very poor results. Could you possibly provide me with a checkpoint for the same config if you have already trained one?

thanks

The result of using the checkpoint of unlabeled ImageNet-30

Hello!

I used the checkpoint for unlabeled ImageNet-30 with the following command:
python eval.py --mode ood_pre --dataset imagenet --model resnet18_imagenet --ood_score CSI --shift_trans_type rotation --print_score --ood_samples 10 --resize_factor 0.54 --resize_fix --load_path imagenet30_unlabeled.model
The results of this experiment are as follows:
[cub best 0.8613] , [stanford_dogs best 0.8144] , [flowers102 best 0.9472] , [places365 best 0.7900] , [food_101 best 0.8802], [caltech_256 best 0.8720],
[dtd best 0.9696] , [pets best 0.8474]
For all datasets except CUB-200 and Dogs, the results match what is reported in the paper. For CUB-200 and Dogs, the AUROC differs from the paper.
In the paper, the AUROC for CUB-200 is 90.5±0.1, and the AUROC for Dogs is 97.1±0.1.
What am I doing wrong?
I'm sorry to trouble you, but could you please answer this question?

how to identify

I don't understand how your scripts identify whether the test samples belong to the in-distribution or not. It seems the features of the training set are taken into consideration at test time, which confuses me.
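
Based on the discussion in the other issues on this page (e.g., "about cosine similarity"), a hedged sketch of the idea: the score does consult the training features, and a test sample scores high when it is close to some training feature and has a large norm. Names and shapes here are assumptions, not the repository's API:

```python
# Hedged sketch of a CSI-style detection score: max_m sim(z_m, z) * ||z||.
import torch
import torch.nn.functional as F

def csi_score(z_test, z_train):
    """z_test: (d,) feature of one test image; z_train: (N, d) training features."""
    sims = F.cosine_similarity(z_train, z_test.unsqueeze(0).expand_as(z_train), dim=1)
    return sims.max().item() * z_test.norm().item()
```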

cannot download imagenet-30

I cannot download the ImageNet-30 dataset. It says

Too many users have viewed or downloaded this file recently. Please try accessing the file again later. If the file you are trying to access is particularly large or is shared with many people, it may take up to 24 hours to be able to view or download the file. If you still can't access a file after 24 hours, contact your domain administrator.

*** RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /pytorch/aten/src/THC/THCReduceAll.cuh:327

class RandomColorGrayLayer(nn.Module):
    def __init__(self, p):
        super(RandomColorGrayLayer, self).__init__()
        self.prob = p

        _weight = torch.tensor([[0.299, 0.587, 0.114]])
        self.register_buffer('_weight', _weight.view(1, 3, 1, 1))

    def forward(self, inputs, aug_index=None):
        pdb.set_trace()
        if aug_index == 0:
            return inputs
        pdb.set_trace()

        l = F.conv2d(inputs, self._weight)  # ----> ERROR LINE

        gray = torch.cat([l, l, l], dim=1)

        if aug_index is None:
            _prob = inputs.new_full((inputs.size(0),), self.prob)
            _mask = torch.bernoulli(_prob).view(-1, 1, 1, 1)

            gray = inputs * (1 - _mask) + gray * _mask

        return gray

PyTorch 1.4.0
CUDA 10.1
cuDNN 7.6.3
apex 1.0

When I run eval.py with a single V100 GPU, I encounter this error. It occurs when F.conv2d(inputs, self._weight) is executed, and I have made sure that inputs and self._weight are on the same device. Do you know how to solve it?

batch size

Hello,
You said that "Total batch size should be 512 = 4 (GPU) * 32 (--batch_size option) * 4 (cardinality of shifted transformation set)". But I set batch_size=32 with a different number of GPUs (2 or 3), and the total batch size always equals batch_size * 2 (data augmentation) * 4 (shifted transformations), no matter how many GPUs I use.

How to define the joint_labels

joint_labels = torch.cat([labels + P.n_classes * i for i in range(4)], dim=0)
I do not understand the meaning of this code.
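
For what it is worth, a toy expansion of that line (n_classes = 3 is an assumption for illustration) shows that it gives every (class, rotation) pair its own label, turning an n-way problem into a 4n-way one:

```python
# Toy illustration: class c under rotation i becomes label c + n_classes * i.
import torch

labels, n_classes = torch.tensor([0, 2, 1]), 3
joint_labels = torch.cat([labels + n_classes * i for i in range(4)], dim=0)
print(joint_labels)  # tensor([ 0,  2,  1,  3,  5,  4,  6,  8,  7,  9, 11, 10])
```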

CSI/training/unsup/simclr_CSI.py

"unsup/simclr_CSI.py" is the code for unlabeled multi-class. But in this code, you use the labeles.

outputs_aux['penultimate'] = torch.cat([penul_1, penul_2]) # only use original rotation
### Linear evaluation ###
outputs_linear_eval = linear(outputs_aux['penultimate'].detach())
loss_linear = criterion(outputs_linear_eval, labels.repeat(2))
linear_optim.zero_grad()
loss_linear.backward()
linear_optim.step()
losses['cls'].update(0, batch_size)
losses['sim'].update(loss_sim.item(), batch_size)
losses['shift'].update(loss_shift.item(), batch_size)

Hence, I think the results produced by this code may be wrong.
We apologize for the inconvenience, but we would appreciate your answer.

Why do you add a normalize layer at the head of the ResNet instead of using the normalize transform?

Hi, thank you for your excellent work. I noticed that you add a normalize layer at the head of the ResNet:
class NormalizeLayer(nn.Module):
    """
    In order to certify radii in original coordinates rather than standardized coordinates, we
    add the Gaussian noise before standardizing, which is why we have standardization be the first
    layer of the classifier rather than as a part of preprocessing as is typical.
    """

    def __init__(self):
        super(NormalizeLayer, self).__init__()

    def forward(self, inputs):
        return (inputs - 0.5) / 0.5

def penultimate(self, x, all_features=False):
    out_list = []

    out = self.normalize(x)
    out = self.conv1(out)
    out = self.bn1(out)
    out = F.relu(out)
    out_list.append(out)

    out = self.layer1(out)
    out_list.append(out)
    out = self.layer2(out)
    out_list.append(out)
    out = self.layer3(out)
    out_list.append(out)
    out = self.layer4(out)
    out_list.append(out)

    out = F.avg_pool2d(out, 4)
    out = out.view(out.size(0), -1)

    if all_features:
        return out, out_list
    else:
        return out

And normally one would add a normalize transform in the dataloader, like:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

train_dataset = datasets.ImageFolder(
    traindir,
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ]))
But I noticed that you do not use normalization in the data augmentation. Furthermore, the two are quite different (the standard normalization vs. the normalize layer in your code).

Reopen: Training vanilla SimCLR

#16
Hi, I ran the following commands.
For training:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py --dataset cifar10 --model resnet18 --mode simclr --shift_trans_type none --batch_size 128 --one_class_idx 0 --optimizer adam --suffix new
For evaluation:
python3 eval.py --mode ood_pre --dataset cifar10 --model resnet18 --ood_score simclr --shift_trans_type none --print_score --ood_samples 10 --resize_factor 0.54 --resize_fix --one_class_idx 0 --load_path <load_path>

And I got a different result.

I think it should be 87.9% (Table 7a in the paper). It may be lower due to the optimizer, but this is too low.

Is there any problem with my commands?
Could you provide sample code to reproduce the Table 7a result?
Thanks

ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory

CUDA version: 10.1
pytorch version: 1.4
torchvision version: 0.5
platform: ubuntu

Traceback (most recent call last):
  File "train.py", line 6, in <module>
    from common.train import *
  File "/home/westlake/zhangjunlei/code/CSI-master/common/train.py", line 110, in <module>
    from torchlars import LARS
  File "/home/westlake/miniconda3/envs/zjl/lib/python3.6/site-packages/torchlars/__init__.py", line 2, in <module>
    from torchlars.lars import LARS
  File "/home/westlake/miniconda3/envs/zjl/lib/python3.6/site-packages/torchlars/lars.py", line 6, in <module>
    from torchlars._adaptive_lr import compute_adaptive_lr
ImportError: libtorch_cpu.so: cannot open shared object file: No such file or directory

Hello, thank you for your excellent work. I ran into the above issue when I tried to import this package. Do you know how to solve it?

the problem of the score and the feature

1. In your paper, the score is related to the cosine similarity and the norm of the feature. But in your code, you don't use the cosine similarity to calculate the score.
2. In your code, features are extracted after the projection head, but SimCLR's features are extracted after the ResNet.
Looking forward to your reply.

Reproducing results

I used the following command, python3 -m torch.distributed.launch --nproc_per_node 4 train.py --dataset cifar10 --model resnet18 --mode simclr_CSI --shift_trans_type rotation --batch_size 32 --one_class_idx 0, on a 4-GPU machine. Do I understand correctly that this should yield results comparable to Table 1a, "plane", CSI (ours), i.e., 90%?

I have let it run a couple of times now, getting an average of 86%. What am I doing wrong?

Also, how do I interpret the result output [one_class_mean clean_norm 0.8478] [one_class_mean similar 0.6925] [one_class_mean best 0.8478]? Is clean_norm the norm ||z|| by itself? Is similar the L2 distance to the closest training point? And best looks to me like the better of these two results; shouldn't it be the product?

Error while running the training script.

Hi folks,
In transform_layers.py, in the RandomResizedCropLayer class, in the forward method:
output = F.adaptive_avg_pool2d(output, self.size)

Here self.size is (32, 32, 2), and this throws an error.
For adaptive_avg_pool2d we should use a 2D size, right?
I am wondering whether self.size should be (32, 32), or am I missing something?
It would be great if you could shed some light on this.
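
For reference, F.adaptive_avg_pool2d expects its output size to be an int or a pair (H, W), so a 3-tuple such as (32, 32, 2) would indeed fail; a quick self-contained check:

```python
# adaptive_avg_pool2d pools over the last two (spatial) dimensions only.
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)
y = F.adaptive_avg_pool2d(x, (32, 32))  # OK: output_size = (H, W)
print(y.shape)                          # torch.Size([1, 3, 32, 32])
```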

GPU requirement for training ImageNet model

Hi
What GPU spec do you suggest for training the ImageNet model?
I am using a V100 GPU, but still run out of memory with the batch size set to 16 on a single GPU.

Below is the error message
File "train.py", line 37, in
train(P, epoch, model, criterion, optimizer, scheduler_warmup, train_loader, logger=logger, **kwargs)
File "CSI/training/unsup/simclr_CSI.py", line 59, in train
_, outputs_aux = model(images_pair, simclr=True, penultimate=True, shift=True)

(omitted)

RuntimeError: CUDA out of memory. Tried to allocate 784.00 MiB (GPU 0; 31.75 GiB total capacity; 30.12 GiB already allocated; 405.75 MiB free; 30.14 GiB reserved in total by PyTorch)

Train and evaluate vanilla SimCLR

I want to reproduce the vanilla SimCLR experiment in the paper (Table 7).
I followed the instructions on the main page.

I ran the command below to train vanilla SimCLR. (Since I had trouble installing the torchlars package, I changed the optimizer to Adam.)
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py --dataset cifar10 --model resnet18 --mode simclr --shift_trans_type rotation --batch_size 32 --one_class_idx 0 --optimizer adam

After that, I tried the evaluation command.

python3 eval.py --mode ood_pre --dataset cifar10 --model resnet18 --ood_score simclr --shift_trans_type rotation --print_score --ood_samples 10 --resize_factor 0.54 --resize_fix --one_class_idx 0 --load_path {LOAD_PATH}

But I got errors.
Here are the detailed logs:

Pre-compute global statistics...
axis size: 5000 5000 5000 5000
weight_sim: 1.0000
weight_shi: 0.0000
Pre-compute features...
Compute OOD scores... (score: simclr)
Traceback (most recent call last):
  File "eval.py", line 23, in <module>
    train_loader=train_loader, simclr_aug=simclr_aug)
  File "/home/hyun78/aya/CSI/evals/ood_pre.py", line 84, in eval_ood_detection
    scores_id = get_scores(P, feats_id, ood_score).numpy()
  File "/home/hyun78/aya/CSI/evals/ood_pre.py", line 121, in get_scores
    score += (f_sim[shi] * P.axis[shi]).sum(dim=1).max().item() * P.weight_sim[shi]
IndexError: list index out of range

+) I think the vanilla version of SimCLR should be the same as in the original SimCLR paper, but your training code includes shift layers. I am not sure it is okay to include the additional backward steps (lines 74-80 in training/unsup/simclr.py) and the shift layers (line 100 in common/train.py).

baseline code?

Hi, congrats on the interesting work!
Do you have the code for the baseline methods in the paper, e.g., Rot [25]?

Thanks in advance.

Some questions about Supervised_NT_xent

Excuse me, I have recently been studying this paper.

When I use the Supervised_NT_xent loss, I think there may be an issue.

In the SupCLR paper, when calculating the loss, a positive pair is (i, j) where label_i is the same as label_j; the pair (i, i) is not regarded as a positive pair, even though label_i is trivially equal to itself.

However, when I use the Supervised_NT_xent loss from your code and compute the Mask, I notice that Mask[i, i] is not zero. Therefore, the pair (i, i) will also be treated as a positive pair when calculating the loss.

Mask = torch.eq(labels, labels.t()).float().to(device)
#Mask = eye * torch.stack([labels == labels[i] for i in range(labels.size(0))]).float().to(device)
Mask = Mask / (Mask.sum(dim=1, keepdim=True) + eps)

Maybe line 72 should be (see the sketch below):
Mask = torch.eq(labels, labels.t()).float().to(device) * (1 - eye)

I have some questions about this. May I trouble you to answer them?
Looking forward to your reply!
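
A small self-contained sketch of the proposed fix (toy labels; the eps value and device handling are simplified relative to the surrounding code):

```python
# Zeroing the diagonal removes (i, i) from the positive-pair mask,
# matching the SupCLR convention described above.
import torch

labels = torch.tensor([[0], [1], [0], [1]]).float()
eye = torch.eye(labels.size(0))
Mask = torch.eq(labels, labels.t()).float() * (1 - eye)  # exclude self-pairs
Mask = Mask / (Mask.sum(dim=1, keepdim=True) + 1e-8)     # row-normalize over positives
print(Mask)
```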
