
deit's Introduction

Data-Efficient architectures and training for Image classification

This repository contains PyTorch evaluation code, training code and pretrained models for the following papers:

DeiT Data-Efficient Image Transformers, ICML 2021 [bib]
@InProceedings{pmlr-v139-touvron21a,
  title =     {Training data-efficient image transformers & distillation through attention},
  author =    {Touvron, Hugo and Cord, Matthieu and Douze, Matthijs and Massa, Francisco and Sablayrolles, Alexandre and Jegou, Herve},
  booktitle = {International Conference on Machine Learning},
  pages =     {10347--10357},
  year =      {2021},
  volume =    {139},
  month =     {July}
}
CaiT (Going deeper with Image Transformers), ICCV 2021 [bib]
@InProceedings{Touvron_2021_ICCV,
    author    = {Touvron, Hugo and Cord, Matthieu and Sablayrolles, Alexandre and Synnaeve, Gabriel and J\'egou, Herv\'e},
    title     = {Going Deeper With Image Transformers},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {32-42}
}
ResMLP (ResMLP: Feedforward networks for image classification with data-efficient training), TPAMI 2022 [bib]
@article{touvron2021resmlp,
  title={ResMLP: Feedforward networks for image classification with data-efficient training},
  author={Hugo Touvron and Piotr Bojanowski and Mathilde Caron and Matthieu Cord and Alaaeldin El-Nouby and Edouard Grave and Gautier Izacard and Armand Joulin and Gabriel Synnaeve and Jakob Verbeek and Herv\'e J\'egou},
  journal={arXiv preprint arXiv:2105.03404},
  year={2021},
}
PatchConvnet (Augmenting Convolutional networks with attention-based aggregation) [bib]
@article{touvron2021patchconvnet,
  title={Augmenting Convolutional networks with attention-based aggregation},
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Piotr Bojanowski and Armand Joulin and Gabriel Synnaeve and Jakob Verbeek and Herve Jegou},
  journal={arXiv preprint arXiv:2112.13692},
  year={2021},
}
3Things (Three things everyone should know about Vision Transformers), ECCV 2022 [bib]
@article{Touvron2022ThreeTE,
  title={Three things everyone should know about Vision Transformers},
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Jakob Verbeek and Herve Jegou},
  journal={arXiv preprint arXiv:2203.09795},
  year={2022},
}
DeiT III (DeiT III: Revenge of the ViT), ECCV 2022 [bib]
@article{Touvron2022DeiTIR,
  title={DeiT III: Revenge of the ViT},
  author={Hugo Touvron and Matthieu Cord and Herve Jegou},
  journal={arXiv preprint arXiv:2204.07118},
  year={2022},
}
Cosub (Co-training 2L Submodels for Visual Recognition), CVPR 2023 [bib]
@article{Touvron2022Cotraining2S,
  title={Co-training 2L Submodels for Visual Recognition},
  author={Hugo Touvron and Matthieu Cord and Maxime Oquab and Piotr Bojanowski and Jakob Verbeek and Herv\'e J\'egou},
  journal={arXiv preprint arXiv:2212.04884},
  year={2022},
}
If you find this repository useful, please consider giving it a star ⭐ and citing the relevant papers.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

deit's People

Contributors

bhheo, bigfootjon, changlin31, dependabot[bot], developer0hye, fabfish, fmassa, jegou, kozistr, lmk123568, lsch0lz, maxwell-aladago, mdouze, michaelmonashev, sanjaydatasciencedojo, touvronhugo, wzk1015, yazdanimehdi, zhiyuanchen, zsef123


deit's Issues

Fine-Tuning

Hi,
Could you add some instructions on how to fine-tune the pretrained model?

Thanks in advance
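
For reference, the fine-tuning entry point already exists in main.py via the --finetune flag; a typical invocation (a sketch based on the commands quoted in other issues on this page, so flags and URLs should be double-checked against the README) looks like:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --model deit_base_patch16_384 --batch-size 32 \
    --finetune https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth \
    --input-size 384 --data-path /path/to/imagenet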

Image throughput numbers

What do the images/sec throughput numbers represent (training or inference, batch size, mixed precision or float32, etc.)? They are lower than any inference numbers I'm familiar with for any of the listed models. They also don't seem to match expected training throughputs, and the spread from smallest to largest models is odd, being quite low for the smaller models (CPU bound?).

I don't spend much time with V100, but relative to Titan RTX and RTX 3090 I have a fairly good idea where the numbers should fall...

Thanks
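
For context, throughput here is usually a simple forward-pass benchmark. A rough sketch of such a measurement (not the script used for the reported numbers; it assumes a timm version that registers the deit_* models, otherwise import them from this repo's models.py):

import time
import torch
import timm

# Hypothetical benchmark: forward-pass images/sec at batch size 64, float32, single GPU.
model = timm.create_model('deit_small_patch16_224', pretrained=False).cuda().eval()
x = torch.randn(64, 3, 224, 224, device='cuda')

with torch.no_grad():
    for _ in range(10):            # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(50):
        model(x)
    torch.cuda.synchronize()
    elapsed = time.time() - start

print(f'{50 * x.shape[0] / elapsed:.1f} images/sec')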

Training Tiny DeiT, only get 59% top-1 acc.

Hi, I tried to train Tiny DeiT on ImageNet with
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model deit_tiny_patch16_224 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save.

All hyperparameters are at their defaults, but I only get 59% top-1 accuracy. Am I missing something important? Could you help me?
I have uploaded the training log 'log.txt':
log.txt

Best wishes.

Dataset Size

I'm trying out the code with a custom dataset that has about 8k training images and 550 validation images. Is that enough for this method?

Reproducibility issue

Thanks for the great work.
I tried to reproduce the training of the tiny, small and base models using your code. The only parameter I had to change was the batch size, due to resource constraints. I successfully reproduced the results for the small and tiny networks, but there is a big accuracy drop for base.

I got:
Tiny - Acc@1 72.172 Acc@5 91.188 loss 1.222 Max accuracy: 72.31% (batch size 256 per gpu, 8 gpus)
Small - Acc@1 79.786 Acc@5 95.008 loss 0.880 Max accuracy: 79.84% (batch size 144 per gpu, 8 gpus)
Base - Acc@1 78.568 Acc@5 93.966 loss 1.048 Max accuracy: 78.78% (batch size 60 per gpu, 8 gpus)

[learning-curve plot: Deit-BS]

On the learning curve I see that the base model starts overfitting: it has higher test loss and lower train loss than the small model. Could the smaller batch size be the reason for such a big drop in accuracy (-3% Acc@1)? Did you reproduce the base model results with this code and these training parameters?

No weight decay for distillation token

Hi

I have read the distillation code and I think there is an error.

As far as I know, the no_weight_decay function of timm's vision transformer doesn't cover the distillation token:

https://github.com/rwightman/pytorch-image-models/blob/f8463b8fa9c0490db093b36acfce71fa2363b8c3/timm/models/vision_transformer.py#L254-L256

So no_weight_decay has to be overridden in DistilledVisionTransformer:

@torch.jit.ignore
def no_weight_decay(self):
    return {'pos_embed', 'cls_token', 'dist_token'}
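
For completeness, a minimal sketch of where such an override would live, assuming DistilledVisionTransformer subclasses timm's VisionTransformer as in this repo's models.py:

import torch
from timm.models.vision_transformer import VisionTransformer

class DistilledVisionTransformer(VisionTransformer):
    # ... dist_token, head_dist, pos_embed handling etc. omitted ...

    @torch.jit.ignore
    def no_weight_decay(self):
        # extend the parent's skip set (pos_embed, cls_token) with the distillation token
        return super().no_weight_decay() | {'dist_token'}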

RuntimeError: No shared folder available

Traceback (most recent call last):
File "run_with_submitit.py", line 130, in
main()
File "run_with_submitit.py", line 89, in main
args.job_dir = get_shared_folder() / "%j"
File "run_with_submitit.py", line 40, in get_shared_folder
raise RuntimeError("No shared folder available")
RuntimeError: No shared folder available

Can't reproduce the accuracy

Hello, when evaluating on the ImageNet ILSVRC2012 test set with your provided model weights, I run the command

CUDA_VISIBLE_DEVICES=4, python main.py --eval --resume https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --data-path ~/Dataset/ILSVRC2012/ --batch-size 256

and get the following output:

Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=256, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/home/PengZhiliang/Dataset/ILSVRC2012/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=True, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_base_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', patience_epochs=10, pin_mem=True, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating model: deit_base_patch16_224
number of params: 86567656
Test:  [  0/131]  eta: 0:55:44  loss: 0.5442 (0.5442)  acc1: 89.0625 (89.0625)  acc5: 97.9167 (97.9167)  time: 25.5341  data: 5.6634  max mem: 3764
Test:  [ 10/131]  eta: 0:06:50  loss: 0.7311 (0.7458)  acc1: 82.5521 (83.2623)  acc5: 96.6146 (96.5672)  time: 3.3911  data: 0.6908  max mem: 3765
Test:  [ 20/131]  eta: 0:04:16  loss: 0.6279 (0.6319)  acc1: 86.9792 (86.8180)  acc5: 97.1354 (97.0982)  time: 1.1467  data: 0.0969  max mem: 3765
Test:  [ 30/131]  eta: 0:03:01  loss: 0.6306 (0.6680)  acc1: 86.7188 (85.7275)  acc5: 97.1354 (96.8834)  time: 0.9225  data: 0.0003  max mem: 3765
Test:  [ 40/131]  eta: 0:02:19  loss: 0.7467 (0.6843)  acc1: 82.8125 (85.1880)  acc5: 96.8750 (96.9957)  time: 0.7303  data: 0.0003  max mem: 3765
Test:  [ 50/131]  eta: 0:01:51  loss: 0.6383 (0.6821)  acc1: 84.1146 (85.1563)  acc5: 97.6562 (97.0537)  time: 0.7351  data: 0.0003  max mem: 3765
Test:  [ 60/131]  eta: 0:01:30  loss: 0.8259 (0.7335)  acc1: 80.7292 (83.9737)  acc5: 95.0521 (96.4566)  time: 0.7368  data: 0.0003  max mem: 3765
Test:  [ 70/131]  eta: 0:01:13  loss: 1.0689 (0.7899)  acc1: 75.0000 (82.4604)  acc5: 93.2292 (95.9067)  time: 0.7361  data: 0.0003  max mem: 3765
Test:  [ 80/131]  eta: 0:00:58  loss: 1.0258 (0.8079)  acc1: 77.0833 (82.2499)  acc5: 92.9688 (95.6340)  time: 0.7379  data: 0.0002  max mem: 3765
Test:  [ 90/131]  eta: 0:00:45  loss: 0.9900 (0.8380)  acc1: 79.6875 (81.4618)  acc5: 92.9688 (95.3383)  time: 0.7396  data: 0.0002  max mem: 3765
Test:  [100/131]  eta: 0:00:32  loss: 1.0648 (0.8557)  acc1: 75.2604 (81.1237)  acc5: 92.4479 (95.1140)  time: 0.7379  data: 0.0002  max mem: 3765
Test:  [110/131]  eta: 0:00:21  loss: 1.0434 (0.8747)  acc1: 77.8646 (80.7057)  acc5: 92.4479 (94.9324)  time: 0.7389  data: 0.0002  max mem: 3765
Test:  [120/131]  eta: 0:00:11  loss: 0.9864 (0.8857)  acc1: 78.1250 (80.3891)  acc5: 92.9688 (94.8390)  time: 0.7830  data: 0.0001  max mem: 3765
Test:  [130/131]  eta: 0:00:01  loss: 0.9252 (0.8872)  acc1: 78.6458 (80.4440)  acc5: 95.3125 (94.8820)  time: 0.9149  data: 0.0001  max mem: 3765
Test: Total time: 0:02:13 (1.0171 s / it)
* Acc@1 80.444 Acc@5 94.882 loss 0.887
Accuracy of the network on the 50000 test images: 80.4%

The accuracy rate is only 80.4, which is 1.4 lower than the 81.8 you reported. And I can guarantee that the code has not been modified.

And the conda environment is:

Package           Version
----------------- -------------------
certifi           2020.12.5
mkl-fft           1.2.0
mkl-random        1.1.1
mkl-service       2.3.0
numpy             1.19.2
olefile           0.46
Pillow            8.0.1
pip               20.3.3
setuptools        51.0.0.post20201207
six               1.15.0
timm              0.3.2
torch             1.7.1
torch-summary     1.4.5
torchvision       0.8.2
typing-extensions 3.7.4.3
wheel             0.36.2

Whether it is on a TITAN RTX, 2080 Ti or 3090, the accuracy is only 80.4.
Similarly, the accuracies of deit_small_patch16_224 and deit_tiny_patch16_224 are lower than reported.

Unable to download pretrained weights

Hi!

Thanks for the great resource!
I'm trying to work with your pretrained models, but getting the following error:
[screenshot of the error]

Any chance you're familiar with this and know how to solve it?
I made sure my PyTorch version is indeed 1.7.1.

results on cifar100?

Very interesting paper.
Can I replicate CIFAR-10 or CIFAR-100 results using this code base?

Best Wishes

Can't replicate the validation results

Hi, I'm trying to validate DeiT on the ImageNet validation set and I can't get the same accuracy values as you reported. Running python main.py --eval --resume https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --data-path /path/to/imagenet gives 80.985% top-1 accuracy, while it should be 81.846 according to the tutorial in the README. The timm version is 0.3.2, as it should be. If the DeiT code is correct, then there's only one place for mistakes: the ImageNet dataset. The class names in the validation folder look like 000 001 ... 999, i.e. they are sorted in numerical order; probably something is wrong with the names. Here's the val_log.txt file. Have you encountered a similar issue? Thanks
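
As a quick sanity check (hypothetical snippet, not part of the repo): torchvision's ImageFolder assigns labels by the sorted order of the class sub-folder names, so folders named 000 ... 999 only match the released weights if that order corresponds to the synset folders (n01440764, ...) used during training.

from torchvision.datasets import ImageFolder

# Inspect how the validation folder names map to label indices.
val = ImageFolder('/path/to/imagenet/val')
print(list(val.class_to_idx.items())[:5])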

fail to reproduce accuracy of deit-s

Hi, I followed the training command:

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model deit_small_patch16_224 --batch-size 256 --data-path /path/to/imagenet

and get the final results:

Acc@1 74.852 Acc@5 91.862 loss 1.143
Max accuracy: 75.19%
Training time: 2 days, 15:49:59

This fails to reproduce the 79.8% reported in the paper. Are there any further adjustments I need to make to reproduce the 79.8% result?

Thanks!

Question about experiment results in paper

Hi,

Thanks for releasing the code of this great work. I found in Table 6 of the paper that DeiT-B trained on 224x224 images achieves a top-1 accuracy of 81.8, while ViT-B/16 achieves a top-1 accuracy of 77.9 (trained on 384x384 images). Do DeiT-B and ViT-B/16 have the same model structure? If yes, why does ViT-B/16 achieve lower accuracy even though it is trained on larger images?

Fine-tuning details

Hi,

I am trying to replicate the results of the paper that were obtained by fine-tuning on datasets such as CIFAR-10 and Stanford Cars. Could you give details about the hyper-parameters used (batch size, learning rate, etc.)?

Thanks.

Code for distillation part

Could you share the code for the distillation part? Your paper "Training data-efficient image transformers & distillation through attention" is great, and I can find the data-efficient part in your code, but not the distillation part.
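
For reference, a hedged sketch of the distillation objective described in the paper (not the repo's exact losses.py): the student's distillation token is trained against the teacher, either with the teacher's hard argmax label or with a temperature-scaled KL term.

import torch.nn.functional as F

def hard_distillation_loss(student_cls_logits, student_dist_logits, teacher_logits, targets):
    # hard distillation: the teacher's argmax acts as a second label
    teacher_labels = teacher_logits.argmax(dim=1)
    loss_cls = F.cross_entropy(student_cls_logits, targets)
    loss_dist = F.cross_entropy(student_dist_logits, teacher_labels)
    return 0.5 * loss_cls + 0.5 * loss_dist

def soft_distillation_term(student_dist_logits, teacher_logits, tau=3.0):
    # soft distillation: KL divergence between temperature-scaled distributions
    return F.kl_div(
        F.log_softmax(student_dist_logits / tau, dim=1),
        F.log_softmax(teacher_logits / tau, dim=1),
        reduction='batchmean',
        log_target=True,
    ) * (tau * tau)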

colab

Can you please add a Google Colab for inference? Thanks!

no_weight_decay is not called

Hi,

I think VisionTransformer.no_weight_decay() is not used as intended.
https://github.com/rwightman/pytorch-image-models/blob/f8463b8fa9c0490db093b36acfce71fa2363b8c3/timm/models/vision_transformer.py#L255

When using timm, the optimizer should be created before the model is wrapped by DDP, because model.no_weight_decay() is called when creating the optimizer, and DDP doesn't have the attribute no_weight_decay.
https://github.com/rwightman/pytorch-image-models/blob/f8463b8fa9c0490db093b36acfce71fa2363b8c3/timm/optim/optim_factory.py#L45

if hasattr(model, 'no_weight_decay'):
    skip = model.no_weight_decay()

Since DDP doesn't have the attribute no_weight_decay, model.no_weight_decay() will not be called in create_optimizer, and thus weight_decay is applied to all the weights, including {'pos_embed', 'cls_token'}.

A quick fix could be changing line 257 of deit/main.py (commit 30eb318) from

optimizer = create_optimizer(args, model)

to

optimizer = create_optimizer(args, model_without_ddp).
But I'm not sure how fixing this will affect performance, since I have already reproduced the reported performance with your current code.
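
To illustrate why this matters, here is a simplified sketch of the parameter-group split that timm's optimizer factory performs when the skip list is available (names simplified, not the exact timm code):

def split_param_groups(model, weight_decay, skip=()):
    # `skip` would come from model.no_weight_decay(), e.g. {'pos_embed', 'cls_token'}
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if param.ndim <= 1 or name in skip:   # biases, norm weights, skipped tokens
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {'params': no_decay, 'weight_decay': 0.0},
        {'params': decay, 'weight_decay': weight_decay},
    ]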

Model_ema

Hi,
I can't find any real usage of model_ema in your code: you update it during training but only use it for logging.
So can I ask what it is for, since it doesn't seem to affect the original model at all?
By the way, congratulations on the great work.
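
For what it's worth, the EMA weights never feed back into the optimized model; they form a separate, smoothed copy that can be checkpointed and evaluated on its own. A rough sketch of the update rule used by timm's ModelEma (illustration only, not the repo's code):

import copy
import torch

class SimpleEma:
    def __init__(self, model, decay=0.99996):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # ema = decay * ema + (1 - decay) * current weights
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1 - self.decay)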

failed to load from torch.hub

AFAIK torch.hub needs at least one release to be able to load; since there is no GitHub release, it has failed to load.

[screenshot of the error: Screen Shot 2021-01-22 at 21 10 53]
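
For reference, loading a pretrained DeiT through torch.hub follows the recipe below (requires network access; the branch tag may be main or master depending on the repo state):

import torch

model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_224', pretrained=True)
model.eval()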

About the learning rate in finetuning stage

Hi, I want to fine-tune the deit_base_patch16_384 model on ImageNet with batch size 64 or 128.
Basically, I want to follow

python run_with_submitit.py --model deit_base_patch16_384 --batch-size 32 --finetune https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --input-size 384 --use_volta32 --nodes 2 --lr 5e-6 --weight-decay 1e-8 --epochs 30 --min-lr 5e-6

But I only have one GPU, on which only 64 or 128 can be set as the batch size. So I use

python main.py --model deit_base_patch16_224 --batch-size 64 --finetune deit_base_patch16_224-b5f2ef4d.pth --input-size 224 --lr 5e-8 --weight-decay 1e-8 --epochs 30 --min-lr 5e-8 --data-path data/imagenet/

--batch-size 64 --min-lr 5e-8
--batch-size 128 --min-lr 5e-7
Am I right? How should I set the lr in each case?
Thanks.
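
Note that main.py linearly rescales whatever --lr you pass by the global batch size relative to a reference of 512, so with a single GPU the effective rate is roughly as sketched below (assumes the scaling rule in main.py; double-check against your checkout):

# linear_scaled_lr = args.lr * args.batch_size * world_size / 512.0
base_lr = 5e-4      # the default --lr
batch_size = 64     # per-GPU --batch-size
world_size = 1      # single GPU
scaled_lr = base_lr * batch_size * world_size / 512.0
print(scaled_lr)    # 6.25e-05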

Loss NaN for DeiT Base

I have reproduced the small and tiny models but ran into problems reproducing the base model at 224 and 384 image sizes. With high probability, the loss becomes NaN after training for a few epochs.
My setting is 16 GPUs with a batch size of 64 per GPU, and I do not change any hyper-parameters in run_with_submitit.py. Do you have any idea how to solve this problem?
Thanks for your help.

Question: Why use label smoothed trainer output

More of a methodological question than a repo-related question.
Wouldn't it make more sense to use the teacher's softmax output to train the student, as opposed to using the label-smoothed (hard) teacher output? Why did you choose the label-smoothing step if you have access to the teacher model (i.e. its logits or its softmax output)?

Thanks and great work!

ValueError: LocalExecutor can use only one node. Use nodes=1

Traceback (most recent call last):
File "run_with_submitit.py", line 131, in
main()
File "run_with_submitit.py", line 116, in main
**kwargs
File "/opt/tiger/conda/lib/python3.7/site-packages/submitit/core/core.py", line 638, in update_parameters
self._internal_update_parameters(**kwargs)
File "/opt/tiger/conda/lib/python3.7/site-packages/submitit/auto/auto.py", line 197, in _internal_update_parameters
self._executor._internal_update_parameters(**parameters)
File "/opt/tiger/conda/lib/python3.7/site-packages/submitit/local/local.py", line 158, in _internal_update_parameters
raise ValueError("LocalExecutor can use only one node. Use nodes=1")
ValueError: LocalExecutor can use only one node. Use nodes=1
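
A hedged workaround, assuming run_with_submitit.py exposes the usual --nodes/--ngpus flags: when the submitit local executor is used, request a single node, e.g.

python run_with_submitit.py --nodes 1 --ngpus 8 --model deit_tiny_patch16_224 --batch-size 256 --data-path /path/to/imagenet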

tiny model accuracy

Accuracy of the network on the 50000 test images: 71.9%
Max accuracy: 71.95%
Training time 1 day, 15:01:41

Hi, the accuracy of the tiny model I trained is 71.95%, which does not reach 72.2%.

Another NCCL error

Hello, thanks for your wonderful work!

I also ran into an NCCL error on a single node with 4 GPUs.
I ran the following script, as suggested in issue #5:
NCCL_DEBUG=INFO python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model deit_tiny_patch16_224 --batch-size 256 --data-path /path/to/imagenet
The terminal complains:

(deit92) [yuxin.fang@gpu-dev006 deit]$ NCCL_DEBUG=INFO bash train.sh
training...
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 2): env://
gpu-dev006:25486:25486 [0] NCCL INFO Bootstrap : Using [0]enp7s0:10.10.112.56<0> [1]virbr0:192.168.122.1<0> [2]vethee19468:fe80::4463:98ff:fe1a:66c9%vethee19468<0> [3]veth717ea13:fe80::3c8e:dcff:fed2:2236%veth717ea13<0> [4]veth9e7cb5a:fe80::94c9:90ff:fe6f:7fcb%veth9e7cb5a<0> [5]veth74a5bff:fe80::d01d:81ff:fee9:4dfa%veth74a5bff<0> [6]veth8231c1a:fe80::9068:abff:fe35:e6ad%veth8231c1a<0> [7]veth57f4fc5:fe80::446e:a2ff:fe34:fd05%veth57f4fc5<0> [8]veth35d67ed:fe80::9037:67ff:feb8:17b6%veth35d67ed<0> [9]veth22216db:fe80::70b3:b9ff:feef:be53%veth22216db<0> [10]veth207d721:fe80::1837:b5ff:feb6:b5b0%veth207d721<0> [11]veth19a2645:fe80::e4b3:40ff:fe8e:9756%veth19a2645<0> [12]veth52b5332:fe80::8052:d6ff:fe39:7c28%veth52b5332<0> [13]vethef511ca:fe80::64d0:3aff:fe3b:61d7%vethef511ca<0> [14]veth93f8d8c:fe80::d870:9bff:fec8:6c6f%veth93f8d8c<0> [15]vethcbdf2e2:fe80::786d:4fff:fef5:6daf%vethcbdf2e2<0>
gpu-dev006:25486:25486 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
gpu-dev006:25486:25486 [0] NCCL INFO NET/IB : Using [0]mlx4_0:1/RoCE ; OOB enp7s0:10.10.112.56<0>
gpu-dev006:25486:25486 [0] NCCL INFO Using network IB
NCCL version 2.7.8+cuda9.2
gpu-dev006:25488:25488 [2] NCCL INFO Bootstrap : Using [0]enp7s0:10.10.112.56<0> [1]virbr0:192.168.122.1<0> [2]vethee19468:fe80::4463:98ff:fe1a:66c9%vethee19468<0> [3]veth717ea13:fe80::3c8e:dcff:fed2:2236%veth717ea13<0> [4]veth9e7cb5a:fe80::94c9:90ff:fe6f:7fcb%veth9e7cb5a<0> [5]veth74a5bff:fe80::d01d:81ff:fee9:4dfa%veth74a5bff<0> [6]veth8231c1a:fe80::9068:abff:fe35:e6ad%veth8231c1a<0> [7]veth57f4fc5:fe80::446e:a2ff:fe34:fd05%veth57f4fc5<0> [8]veth35d67ed:fe80::9037:67ff:feb8:17b6%veth35d67ed<0> [9]veth22216db:fe80::70b3:b9ff:feef:be53%veth22216db<0> [10]veth207d721:fe80::1837:b5ff:feb6:b5b0%veth207d721<0> [11]veth19a2645:fe80::e4b3:40ff:fe8e:9756%veth19a2645<0> [12]veth52b5332:fe80::8052:d6ff:fe39:7c28%veth52b5332<0> [13]vethef511ca:fe80::64d0:3aff:fe3b:61d7%vethef511ca<0> [14]veth93f8d8c:fe80::d870:9bff:fec8:6c6f%veth93f8d8c<0> [15]vethcbdf2e2:fe80::786d:4fff:fef5:6daf%vethcbdf2e2<0>
gpu-dev006:25488:25488 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
gpu-dev006:25488:25488 [2] NCCL INFO NET/IB : Using [0]mlx4_0:1/RoCE ; OOB enp7s0:10.10.112.56<0>
gpu-dev006:25488:25488 [2] NCCL INFO Using network IB
gpu-dev006:25489:25489 [3] NCCL INFO Bootstrap : Using [0]enp7s0:10.10.112.56<0> [1]virbr0:192.168.122.1<0> [2]vethee19468:fe80::4463:98ff:fe1a:66c9%vethee19468<0> [3]veth717ea13:fe80::3c8e:dcff:fed2:2236%veth717ea13<0> [4]veth9e7cb5a:fe80::94c9:90ff:fe6f:7fcb%veth9e7cb5a<0> [5]veth74a5bff:fe80::d01d:81ff:fee9:4dfa%veth74a5bff<0> [6]veth8231c1a:fe80::9068:abff:fe35:e6ad%veth8231c1a<0> [7]veth57f4fc5:fe80::446e:a2ff:fe34:fd05%veth57f4fc5<0> [8]veth35d67ed:fe80::9037:67ff:feb8:17b6%veth35d67ed<0> [9]veth22216db:fe80::70b3:b9ff:feef:be53%veth22216db<0> [10]veth207d721:fe80::1837:b5ff:feb6:b5b0%veth207d721<0> [11]veth19a2645:fe80::e4b3:40ff:fe8e:9756%veth19a2645<0> [12]veth52b5332:fe80::8052:d6ff:fe39:7c28%veth52b5332<0> [13]vethef511ca:fe80::64d0:3aff:fe3b:61d7%vethef511ca<0> [14]veth93f8d8c:fe80::d870:9bff:fec8:6c6f%veth93f8d8c<0> [15]vethcbdf2e2:fe80::786d:4fff:fef5:6daf%vethcbdf2e2<0>
gpu-dev006:25489:25489 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
gpu-dev006:25489:25489 [3] NCCL INFO NET/IB : Using [0]mlx4_0:1/RoCE ; OOB enp7s0:10.10.112.56<0>
gpu-dev006:25489:25489 [3] NCCL INFO Using network IB
gpu-dev006:25487:25487 [1] NCCL INFO Bootstrap : Using [0]enp7s0:10.10.112.56<0> [1]virbr0:192.168.122.1<0> [2]vethee19468:fe80::4463:98ff:fe1a:66c9%vethee19468<0> [3]veth717ea13:fe80::3c8e:dcff:fed2:2236%veth717ea13<0> [4]veth9e7cb5a:fe80::94c9:90ff:fe6f:7fcb%veth9e7cb5a<0> [5]veth74a5bff:fe80::d01d:81ff:fee9:4dfa%veth74a5bff<0> [6]veth8231c1a:fe80::9068:abff:fe35:e6ad%veth8231c1a<0> [7]veth57f4fc5:fe80::446e:a2ff:fe34:fd05%veth57f4fc5<0> [8]veth35d67ed:fe80::9037:67ff:feb8:17b6%veth35d67ed<0> [9]veth22216db:fe80::70b3:b9ff:feef:be53%veth22216db<0> [10]veth207d721:fe80::1837:b5ff:feb6:b5b0%veth207d721<0> [11]veth19a2645:fe80::e4b3:40ff:fe8e:9756%veth19a2645<0> [12]veth52b5332:fe80::8052:d6ff:fe39:7c28%veth52b5332<0> [13]vethef511ca:fe80::64d0:3aff:fe3b:61d7%vethef511ca<0> [14]veth93f8d8c:fe80::d870:9bff:fec8:6c6f%veth93f8d8c<0> [15]vethcbdf2e2:fe80::786d:4fff:fef5:6daf%vethcbdf2e2<0>
gpu-dev006:25487:25487 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
gpu-dev006:25487:25487 [1] NCCL INFO NET/IB : Using [0]mlx4_0:1/RoCE ; OOB enp7s0:10.10.112.56<0>
gpu-dev006:25487:25487 [1] NCCL INFO Using network IB
gpu-dev006:25486:25652 [0] NCCL INFO Channel 00/02 : 0 1 2 3
gpu-dev006:25489:25656 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
gpu-dev006:25487:25659 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
gpu-dev006:25488:25654 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
gpu-dev006:25487:25659 [1] NCCL INFO Trees [0] 2/-1/-1->1->0|0->1->2/-1/-1 [1] 2/-1/-1->1->0|0->1->2/-1/-1
gpu-dev006:25486:25652 [0] NCCL INFO Channel 01/02 : 0 1 2 3
gpu-dev006:25489:25656 [3] NCCL INFO Trees [0] -1/-1/-1->3->2|2->3->-1/-1/-1 [1] -1/-1/-1->3->2|2->3->-1/-1/-1
gpu-dev006:25488:25654 [2] NCCL INFO Trees [0] 3/-1/-1->2->1|1->2->3/-1/-1 [1] 3/-1/-1->2->1|1->2->3/-1/-1
gpu-dev006:25487:25659 [1] NCCL INFO Setting affinity for GPU 1 to ff
gpu-dev006:25488:25654 [2] NCCL INFO Setting affinity for GPU 2 to ff00
gpu-dev006:25489:25656 [3] NCCL INFO Setting affinity for GPU 3 to ff00
gpu-dev006:25486:25652 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
gpu-dev006:25486:25652 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1|-1->0->1/-1/-1 [1] 1/-1/-1->0->-1|-1->0->1/-1/-1
gpu-dev006:25486:25652 [0] NCCL INFO Setting affinity for GPU 0 to ff
gpu-dev006:25488:25654 [2] NCCL INFO Channel 00 : 2[82000] -> 3[83000] via direct shared memory
gpu-dev006:25486:25652 [0] NCCL INFO Channel 00 : 0[2000] -> 1[3000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO Channel 00 : 3[83000] -> 0[2000] via direct shared memory
gpu-dev006:25487:25659 [1] NCCL INFO Channel 00 : 1[3000] -> 2[82000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO Channel 00 : 3[83000] -> 2[82000] via direct shared memory
gpu-dev006:25488:25654 [2] NCCL INFO Channel 00 : 2[82000] -> 1[3000] via direct shared memory
gpu-dev006:25487:25659 [1] NCCL INFO Channel 00 : 1[3000] -> 0[2000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO Channel 01 : 3[83000] -> 0[2000] via direct shared memory
gpu-dev006:25488:25654 [2] NCCL INFO Channel 01 : 2[82000] -> 3[83000] via direct shared memory
gpu-dev006:25487:25659 [1] NCCL INFO Channel 01 : 1[3000] -> 2[82000] via direct shared memory
gpu-dev006:25486:25652 [0] NCCL INFO Channel 01 : 0[2000] -> 1[3000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO Channel 01 : 3[83000] -> 2[82000] via direct shared memory
gpu-dev006:25488:25654 [2] NCCL INFO Channel 01 : 2[82000] -> 1[3000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-dev006:25489:25656 [3] NCCL INFO comm 0x7f93ac000d70 rank 3 nranks 4 cudaDev 3 busId 83000 - Init COMPLETE
gpu-dev006:25487:25659 [1] NCCL INFO Channel 01 : 1[3000] -> 0[2000] via direct shared memory
gpu-dev006:25486:25652 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-dev006:25486:25652 [0] NCCL INFO comm 0x7fc27c000d70 rank 0 nranks 4 cudaDev 0 busId 2000 - Init COMPLETE
gpu-dev006:25486:25486 [0] NCCL INFO Launch mode Parallel
gpu-dev006:25488:25654 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-dev006:25488:25654 [2] NCCL INFO comm 0x7ff108000d70 rank 2 nranks 4 cudaDev 2 busId 82000 - Init COMPLETE
gpu-dev006:25487:25659 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-dev006:25487:25659 [1] NCCL INFO comm 0x7f5a68000d70 rank 1 nranks 4 cudaDev 1 busId 3000 - Init COMPLETE
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=256, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/home/public_data/zhigang.yang/data/orig_data/imagenet', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_url='env://', distributed=True, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=False, gpu=0, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_tiny_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=4)
Creating model: deit_tiny_patch16_224
number of params: 5717416
Start training
/opt/conda/conda-bld/pytorch_1607370144807/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [0,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
[... the same ScatterGatherKernel.cu:312 "Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed" message repeats for many more thread indices across all four ranks ...]
Traceback (most recent call last):
  File "main.py", line 335, in <module>
    main(args)
  File "main.py", line 295, in main
    args.clip_grad, model_ema, mixup_fn
  File "/home/users/yuxin.fang/vt/deit/engine.py", line 42, in train_one_epoch
    outputs = model(samples)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 281, in forward
    x = self.forward_features(x)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 267, in forward_features
    x = self.patch_embed(x)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 165, in forward
    x = self.proj(x).flatten(2).transpose(1, 2)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Traceback (most recent call last):
  File "main.py", line 335, in <module>
    main(args)
  File "main.py", line 295, in main
    args.clip_grad, model_ema, mixup_fn
  File "/home/users/yuxin.fang/vt/deit/engine.py", line 39, in train_one_epoch
    samples, targets = mixup_fn(samples, targets)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/data/mixup.py", line 217, in __call__
    target = mixup_target(target, self.num_classes, lam, self.label_smoothing)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/data/mixup.py", line 27, in mixup_target
    return y1 * lam + y2 * (1. - lam)
RuntimeError: CUDA error: device-side assert triggered

gpu-dev006:25487:25487 [1] init.cc:924 NCCL WARN Cuda failure 'device-side assert triggered'
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370144807/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8

gpu-dev006:25489:25489 [3] init.cc:924 NCCL WARN Cuda failure 'device-side assert triggered'
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370144807/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
[The same cuDNN traceback is repeated verbatim by the remaining worker processes.]

gpu-dev006:25488:25488 [2] init.cc:924 NCCL WARN Cuda failure 'device-side assert triggered'
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370144807/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8

gpu-dev006:25486:25486 [0] init.cc:924 NCCL WARN Cuda failure 'device-side assert triggered'
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370144807/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
Traceback (most recent call last):
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/users/yuxin.fang/anaconda3/envs/deit92/bin/python', '-u', 'main.py', '--model', 'deit_tiny_patch16_224', '--batch-size', '256', '--data-path', '/home/public_data/zhigang.yang/data/orig_data/imagenet']' died with <Signals.SIGABRT: 6>.
(deit92) [yuxin.fang@gpu-dev006 deit]$

Since DeiT's implementation depends heavily on timm, I ran the EfficientNet-B0 training script with timm on the same machine and in the same environment, and it finished with 0 warnings and 0 errors.

Could you help me fix this? Thanks.
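
A note on reading this log: the scatter-kernel assertion idx_dim >= 0 && idx_dim < index_size fires inside mixup_target, which one-hot encodes the targets, so it usually means some label falls outside [0, nb_classes); the later cuDNN and NCCL errors are just the already-poisoned CUDA context. A minimal sanity check one could run first (the dataset_train name and the 1000-class default are illustrative, not taken from the report):

import torch

def check_label_range(dataset, num_classes=1000):
    # ImageFolder-style datasets keep (path, class_index) pairs in .samples;
    # adjust the extraction for other dataset types.
    targets = torch.as_tensor([label for _, label in dataset.samples])
    print('min label:', targets.min().item(), 'max label:', targets.max().item())
    assert 0 <= targets.min() and targets.max() < num_classes, \
        'labels outside [0, num_classes) will trip the one-hot scatter used by mixup'

# check_label_range(dataset_train, num_classes=1000)
# Re-running with CUDA_LAUNCH_BLOCKING=1 also makes the failing op report synchronously.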

Question about the convergence of the Deit-base model

Great work! and thanks for sharing the codes.

I am trying to re-train the DeiT base model, but I encountered some issues.
May I ask for your insights?

I can reproduce the reported 81.8% with all default settings; however, the performance degrades a lot if I change two seemingly minor hyperparameters:

  1. Change the batch size to 512 (default is 1024); the learning rate is then scaled automatically by your code (see the scaling sketch below).
  2. Keep the batch size at 1024 but increase the warmup epochs to 10 (default is 5).
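
For reference, the automatic scaling in main.py is linear in the total batch size. A rough sketch of the rule, assuming the repo's defaults of a 5e-4 base learning rate and a 512-sample reference batch (check main.py in your checkout for the exact line):

# Linear learning-rate scaling sketch; base_lr and the 512 reference are the
# repo defaults, everything else here is illustrative.
def scaled_lr(base_lr: float, batch_size_per_gpu: int, world_size: int, ref_batch: int = 512) -> float:
    total_batch = batch_size_per_gpu * world_size
    return base_lr * total_batch / ref_batch

print(scaled_lr(5e-4, 1024, 1))  # 1e-3 at the default total batch of 1024
print(scaled_lr(5e-4, 512, 1))   # 5e-4 at total batch 512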

Here is the test accuracy over epochs

The orange line is the default setting. (81.8%)
The blue line is batch size 512. (78.8%)
The green line is using 10 epochs for warmup. (79.2%)

Testing accuracy curve (deit-base)

Zoom in for the first 50 epochs

For the default setting, it looks as if the model is about to diverge around the 6th epoch, but it recovers later and eventually achieves a pretty good result (81.8%).
However, with the smaller batch size or the 5 additional warmup epochs, the final accuracy drops by roughly 3%.

Do you observe the same trend? Do you have any insight into why these two small changes have such a large effect?

My env:
pytorch 1.7, timm 0.3.2, torchvision 0.8

Thanks.

Image Regression

Hi,
Is there any way to solve an image regression problem with DeiT?
A problem like age prediction from an image, or similar.
Thanks.
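
One common approach (a hedged sketch, not an official recipe from this repo) is to load a pretrained DeiT backbone, replace its classification head with a single-output linear layer, and train with a regression loss. The sizes below are taken from the head itself, so they follow whatever model is loaded:

import torch
import torch.nn as nn

# Sketch: turn a pretrained DeiT classifier into a single-output regressor.
model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_224', pretrained=True)
model.head = nn.Linear(model.head.in_features, 1)  # one scalar output, e.g. predicted age

criterion = nn.MSELoss()  # or nn.L1Loss / nn.SmoothL1Loss depending on the task

images = torch.randn(4, 3, 224, 224)       # dummy batch
ages = torch.tensor([23., 41., 35., 67.])  # dummy regression targets
preds = model(images).squeeze(-1)
loss = criterion(preds, ages)
loss.backward()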

Removing Last FC layer & manual looping

I need to insert custom layers between the transformer modules and classify among K (!= 1000) classes.
To do that, I tried removing the last FC layer and looping over the other modules manually, but it yields an output of size (batch_size, 196, 768) instead of the expected (batch_size, 768):

Removing last layer:

self.model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_224', pretrained=True)
self.model = nn.Sequential(*list(self.model.children())[:-1])

Manual looping:
input: (batch_size, 3, 224, 224)

output = self.model[1](self.model[0](input))    # (patch_embed & pos_drop;   output_size: (batch_size, 196, 768))
for i in range(0, 12):  
    output = self.model[2][i](output)     # transformer blocks;   output_size: (batch_size, 196, 768)
output = self.model[3](output)    # LayerNorm;   output_size: (batch_size, 196, 768)

Calling self.model(input) directly (i.e. without manual looping) works as expected.
Am I doing something wrong? This approach usually works for other torchvision models.
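
What happens here is that timm's VisionTransformer prepends the class token, adds the position embedding, and selects x[:, 0] inside forward_features rather than in separate child modules, so an nn.Sequential built from children() skips those steps and you are left with the 196 patch tokens. A hedged sketch of replicating those steps by hand (attribute names follow timm 0.3.x and the non-distilled deit_base_patch16_224; the distilled variants also carry a dist_token that would need the same treatment), which also gives a natural place to interleave custom layers between the blocks:

import torch

def forward_vit_features(model, x):
    # Reproduce the steps that forward_features performs around the blocks.
    x = model.patch_embed(x)                                # (B, 196, 768)
    cls_token = model.cls_token.expand(x.shape[0], -1, -1)  # (B, 1, 768)
    x = torch.cat((cls_token, x), dim=1)                    # (B, 197, 768)
    x = model.pos_drop(x + model.pos_embed)
    for blk in model.blocks:
        x = blk(x)          # custom layers can be interleaved here
    x = model.norm(x)
    return x[:, 0]          # class-token feature, (B, 768)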

100 epoch or 300 epoch, which is correct?

Hi

Thank you for your great work.
I'm trying to reproduce the performance on ImageNet.
I have a question about the training setting.

In the paper, you mentioned that
Formally it means that we have 100 epochs, but each is 3x longer because of the repeated augmentations. We prefer to refer to this as 300 epochs.
In the code, repeated augmentation is enabled by default:

deit/main.py

Line 106 in 4e91d25

parser.set_defaults(repeated_aug=True)

But, training epochs is 300 epochs

deit/main.py

Line 34 in 4e91d25

parser.add_argument('--epochs', default=300, type=int)

As I understand it, the paper implies 100 epochs.
Is there any code that reduces the actual number of training epochs, or should I train for 300 epochs to reproduce the performance?
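
For context on what repeated augmentation does mechanically: each selected image index is emitted several times per epoch, so the loader produces multiple differently-augmented views of the same image within one pass, while the nominal epoch length is kept at roughly len(dataset) samples. A deliberately simplified, single-process sketch of that idea (not the repo's RASampler, which additionally handles distributed sharding):

import torch
from torch.utils.data import Sampler

class SimpleRepeatedAugSampler(Sampler):
    """Illustrative sketch: emit each sampled index num_repeats times per epoch
    while keeping the epoch length at len(dataset) samples, so fewer unique
    images are visited per nominal epoch."""
    def __init__(self, dataset, num_repeats=3):
        self.dataset = dataset
        self.num_repeats = num_repeats

    def __iter__(self):
        perm = torch.randperm(len(self.dataset)).tolist()
        repeated = [i for i in perm for _ in range(self.num_repeats)]
        return iter(repeated[:len(self.dataset)])  # keep the nominal epoch length

    def __len__(self):
        return len(self.dataset)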

no pre_logits

I found that the trained model does not transfer well; I suspect it might be because the model in this repo does not come with a pre_logits layer.

Image size 384

Hi, how can I use the DeiT base model for images of size 384? I ran the script with "deit_base_patch16_384", but it fails with: Cannot find callable deit_base_patch16_384 in hubconf.
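
Assuming the 384-resolution entry point is exposed by hubconf.py in the commit that torch.hub has cached (a stale hub cache is a common cause of "Cannot find callable"), loading it would look like the other hub calls in this thread; force_reload=True refreshes the cache:

import torch

# Hedged sketch: only valid if deit_base_patch16_384 is exported by hubconf.py
# in the fetched checkout.
model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_384',
                       pretrained=True, force_reload=True)
model.eval()

x = torch.randn(1, 3, 384, 384)   # note the 384x384 input resolution
with torch.no_grad():
    logits = model(x)
print(logits.shape)               # (1, 1000)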

Transfer Learning

Can you add a tutorial on how to do transfer learning? Thanks for your excellent work!
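
In the meantime, a minimal hedged sketch of the usual recipe: load a pretrained checkpoint, swap the classification head for one with the target number of classes, and fine-tune with a small learning rate. The class count and optimizer settings below are illustrative, not the repo's fine-tuning hyperparameters:

import torch
import torch.nn as nn

NUM_CLASSES = 37  # downstream dataset size; purely illustrative

model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_224', pretrained=True)
model.head = nn.Linear(model.head.in_features, NUM_CLASSES)  # new classifier head

# Illustrative optimizer: small LR for the pretrained backbone, larger LR for the new head.
optimizer = torch.optim.AdamW([
    {'params': [p for n, p in model.named_parameters() if not n.startswith('head')], 'lr': 1e-5},
    {'params': model.head.parameters(), 'lr': 1e-3},
], weight_decay=0.05)
criterion = nn.CrossEntropyLoss()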

NCCL error

Thanks for the code release!

I tried to launch a run on submitit + slurm with the default parameters (python run_with_submitit.py), and after a couple of epochs the job died with the following error in one of the 16 processes and wasn't automatically restarted:

Traceback (most recent call last):
  File "/private/home/norm/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/private/home/norm/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/submission.py", line 65, in submitit_main
    process_job(args.folder)
  File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/submission.py", line 58, in process_job
    raise error
  File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/submission.py", line 47, in process_job
    result = delayed.result()
  File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/utils.py", line 123, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "run_with_submitit.py", line 60, in __call__
    classification.main(self.args)
  File "/private/home/norm/code/deit/main.py", line 165, in main
    utils.init_distributed_mode(args)
  File "/private/home/norm/code/deit/utils.py", line 243, in init_distributed_mode
    torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
  File "/private/home/norm/miniconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/private/home/norm/miniconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370117127/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8

I re-ran the job manually and it seems to be doing fine now, but did you ever run into this error? How did you fix this?

batch size

Hi, if using multiple nodes to train the base model, should the total batch size be set to 1024?

multi node training

Hi, the code always hangs here when I use multiple nodes and reach the fifth epoch; GPU utilization suddenly drops to 0.

evaluation is not running in parallel on multiple gpus

I'm trying out the code with a dataset that has about 190,000 training images and about 81,000 validation images. With a batch size of 64 and 8 GPUs, the progress stats report
372 steps for a training epoch and
846 steps for an eval epoch.

372 * 8 * 64 = 190,464 (training uses all 8 GPUs)
846 * 64 * 1.5 = 81,216 (evaluation apparently runs on a single GPU, with the eval loader's 1.5x batch size)
Also, nvidia-smi reports that all except one GPU are idle during the eval step. As a quick fix I just evaluate every 10th epoch now, but it would be great if this could be parallelized.
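
One generic way to spread evaluation across ranks (a plain-PyTorch sketch, not a patch to this repo's main.py) is to give the validation loader a DistributedSampler and all-reduce the summed counts. Note that DistributedSampler may pad the last shard with duplicated samples, so the reported accuracy can differ marginally unless that is handled:

import torch
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def distributed_evaluate(model, dataset_val, batch_size, device):
    # Shard the validation set across ranks (shuffle=False keeps ordering deterministic).
    sampler = DistributedSampler(dataset_val, shuffle=False)
    loader = DataLoader(dataset_val, batch_size=batch_size, sampler=sampler,
                        num_workers=4, pin_memory=True)
    model.eval()
    correct = torch.zeros(1, device=device)
    total = torch.zeros(1, device=device)
    with torch.no_grad():
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == targets).sum()
            total += targets.numel()
    # Sum the per-rank counts so every process sees the global accuracy.
    dist.all_reduce(correct)
    dist.all_reduce(total)
    return (correct / total).item()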
