facebookresearch / deit
Official DeiT repository
License: Apache License 2.0
Hello, when simply evaluating on the ImageNet ILSVRC2012 test set with the model weights you provide, I run the command
CUDA_VISIBLE_DEVICES=4, python main.py --eval --resume https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --data-path ~/Dataset/ILSVRC2012/ --batch-size 256
and get the following output:
Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=256, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/home/PengZhiliang/Dataset/ILSVRC2012/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_url='env://', distributed=False, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=True, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_base_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', patience_epochs=10, pin_mem=True, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating model: deit_base_patch16_224
number of params: 86567656
Test: [ 0/131] eta: 0:55:44 loss: 0.5442 (0.5442) acc1: 89.0625 (89.0625) acc5: 97.9167 (97.9167) time: 25.5341 data: 5.6634 max mem: 3764
Test: [ 10/131] eta: 0:06:50 loss: 0.7311 (0.7458) acc1: 82.5521 (83.2623) acc5: 96.6146 (96.5672) time: 3.3911 data: 0.6908 max mem: 3765
Test: [ 20/131] eta: 0:04:16 loss: 0.6279 (0.6319) acc1: 86.9792 (86.8180) acc5: 97.1354 (97.0982) time: 1.1467 data: 0.0969 max mem: 3765
Test: [ 30/131] eta: 0:03:01 loss: 0.6306 (0.6680) acc1: 86.7188 (85.7275) acc5: 97.1354 (96.8834) time: 0.9225 data: 0.0003 max mem: 3765
Test: [ 40/131] eta: 0:02:19 loss: 0.7467 (0.6843) acc1: 82.8125 (85.1880) acc5: 96.8750 (96.9957) time: 0.7303 data: 0.0003 max mem: 3765
Test: [ 50/131] eta: 0:01:51 loss: 0.6383 (0.6821) acc1: 84.1146 (85.1563) acc5: 97.6562 (97.0537) time: 0.7351 data: 0.0003 max mem: 3765
Test: [ 60/131] eta: 0:01:30 loss: 0.8259 (0.7335) acc1: 80.7292 (83.9737) acc5: 95.0521 (96.4566) time: 0.7368 data: 0.0003 max mem: 3765
Test: [ 70/131] eta: 0:01:13 loss: 1.0689 (0.7899) acc1: 75.0000 (82.4604) acc5: 93.2292 (95.9067) time: 0.7361 data: 0.0003 max mem: 3765
Test: [ 80/131] eta: 0:00:58 loss: 1.0258 (0.8079) acc1: 77.0833 (82.2499) acc5: 92.9688 (95.6340) time: 0.7379 data: 0.0002 max mem: 3765
Test: [ 90/131] eta: 0:00:45 loss: 0.9900 (0.8380) acc1: 79.6875 (81.4618) acc5: 92.9688 (95.3383) time: 0.7396 data: 0.0002 max mem: 3765
Test: [100/131] eta: 0:00:32 loss: 1.0648 (0.8557) acc1: 75.2604 (81.1237) acc5: 92.4479 (95.1140) time: 0.7379 data: 0.0002 max mem: 3765
Test: [110/131] eta: 0:00:21 loss: 1.0434 (0.8747) acc1: 77.8646 (80.7057) acc5: 92.4479 (94.9324) time: 0.7389 data: 0.0002 max mem: 3765
Test: [120/131] eta: 0:00:11 loss: 0.9864 (0.8857) acc1: 78.1250 (80.3891) acc5: 92.9688 (94.8390) time: 0.7830 data: 0.0001 max mem: 3765
Test: [130/131] eta: 0:00:01 loss: 0.9252 (0.8872) acc1: 78.6458 (80.4440) acc5: 95.3125 (94.8820) time: 0.9149 data: 0.0001 max mem: 3765
Test: Total time: 0:02:13 (1.0171 s / it)
* Acc@1 80.444 Acc@5 94.882 loss 0.887
Accuracy of the network on the 50000 test images: 80.4%
The accuracy is only 80.4%, which is 1.4 points lower than the 81.8% you reported, and I can guarantee that the code has not been modified.
The conda environment is:
Package Version
----------------- -------------------
certifi 2020.12.5
mkl-fft 1.2.0
mkl-random 1.1.1
mkl-service 2.3.0
numpy 1.19.2
olefile 0.46
Pillow 8.0.1
pip 20.3.3
setuptools 51.0.0.post20201207
six 1.15.0
timm 0.3.2
torch 1.7.1
torch-summary 1.4.5
torchvision 0.8.2
typing-extensions 3.7.4.3
wheel 0.36.2
Whether on a TITAN RTX, 2080 Ti, or 3090, the accuracy is only 80.4.
Similarly, the accuracies of deit_small_patch16_224 and deit_tiny_patch16_224 are lower than reported.
Thanks for the code release!
I tried to launch a run on submitit + slurm with the default parameters (python run_with_submitit.py) and after a couple of epochs the job died with the following error in one of the 16 processes and wasn't automatically restarted:
Traceback (most recent call last):
File "/private/home/norm/miniconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/private/home/norm/miniconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
submitit_main()
File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/submission.py", line 65, in submitit_main
process_job(args.folder)
File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/submission.py", line 58, in process_job
raise error
File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/submission.py", line 47, in process_job
result = delayed.result()
File "/private/home/norm/miniconda3/lib/python3.8/site-packages/submitit/core/utils.py", line 123, in result
self._result = self.function(*self.args, **self.kwargs)
File "run_with_submitit.py", line 60, in __call__
classification.main(self.args)
File "/private/home/norm/code/deit/main.py", line 165, in main
utils.init_distributed_mode(args)
File "/private/home/norm/code/deit/utils.py", line 243, in init_distributed_mode
torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
File "/private/home/norm/miniconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
barrier()
File "/private/home/norm/miniconda3/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370117127/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8
I re-ran the job manually and it seems to be doing fine now, but did you ever run into this error? How did you fix this?
Very interesting paper.
Can I replicate CIFAR-10 or CIFAR-100 using this code base?
Best wishes
Hi, if using multiple nodes to train the base model, should the total batch size be set to 1024?
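For what it's worth, a quick sanity check of the arithmetic (the GPU and node counts below are hypothetical):

per_gpu_batch = 64
gpus_per_node = 8
nodes = 2
# The "total batch size" is the per-GPU batch times the number of processes:
total_batch = per_gpu_batch * gpus_per_node * nodes  # 64 * 8 * 2 = 1024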
More of a methodological question than a repo-related question.
Wouldn't it make more sense to use the teacher's softmax output to train the student, as opposed to the label-smoothed (hard) teacher output? Why did you choose the label smoothing step if you have access to the teacher model (i.e., its logits or its softmax output)?
Thanks, and great work!
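For reference, a minimal sketch of the two kinds of distillation targets under discussion; the function names are illustrative and not taken from the repo:

import torch.nn.functional as F

def soft_distillation(student_logits, teacher_logits, T=3.0):
    # KL divergence between temperature-softened distributions,
    # i.e. the teacher's full softmax output is the target.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction='batchmean',
    ) * (T * T)

def hard_distillation(student_logits, teacher_logits):
    # Cross-entropy against the teacher's argmax, a "hard" pseudo-label
    # (optionally label-smoothed, as the question points out).
    return F.cross_entropy(student_logits, teacher_logits.argmax(dim=1))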
What do the images/sec throughput numbers represent (training or inference, batch size, mixed precision or float32, etc.)? They are lower than any inference numbers I'm familiar with for any of the listed models. They also don't seem to match expected training throughputs, and they have an odd spread from the smallest to the largest models, being quite low for the smaller ones (CPU bound?).
I don't spend much time with V100s, but relative to the Titan RTX and RTX 3090 I have a fairly good idea of where the numbers should fall...
Thanks
Hi,
I am trying to replicate the results in the paper that were fine-tuned on datasets such as CIFAR-10 and Stanford Cars. Could you give details about the hyper-parameters used (batch size, learning rate, etc.)?
Thanks.
Great work!
Looking forward to the pre-trained model with distillation and the code for training with convnet teachers.
Hi,
I can't find any real use of model_ema in your code: you update it during training but only use it for logging.
So can I ask what it is for, since it doesn't affect the original model at all?
By the way, congratulations on the great work.
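For context, a minimal sketch of what an EMA shadow copy does (illustrative; the repo's model_ema comes from timm): it is updated from the trained weights every step and is typically only read out for logging or evaluation, so it never feeds back into the original model.

import copy
import torch

class SimpleEma:
    def __init__(self, model, decay=0.99996):
        # Shadow copy of the weights; never trained directly.
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # ema = decay * ema + (1 - decay) * current, applied every step.
        for e, m in zip(self.ema.state_dict().values(),
                        model.state_dict().values()):
            if e.dtype.is_floating_point:
                e.mul_(self.decay).add_(m, alpha=1.0 - self.decay)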
Hello, thanks for your wonderful work!
I also ran into an NCCL error on a single node with 4 GPUs.
I ran the following script, as suggested in issue #5:
NCCL_DEBUG=INFO python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model deit_tiny_patch16_224 --batch-size 256 --data-path /path/to/imagenet
The terminal complains:
(deit92) [yuxin.fang@gpu-dev006 deit]$ NCCL_DEBUG=INFO bash train.sh
training...
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 0): env://
| distributed init (rank 2): env://
gpu-dev006:25486:25486 [0] NCCL INFO Bootstrap : Using [0]enp7s0:10.10.112.56<0> [1]virbr0:192.168.122.1<0> [2]vethee19468:fe80::4463:98ff:fe1a:66c9%vethee19468<0> [3]veth717ea13:fe80::3c8e:dcff:fed2:2236%veth717ea13<0> [4]veth9e7cb5a:fe80::94c9:90ff:fe6f:7fcb%veth9e7cb5a<0> [5]veth74a5bff:fe80::d01d:81ff:fee9:4dfa%veth74a5bff<0> [6]veth8231c1a:fe80::9068:abff:fe35:e6ad%veth8231c1a<0> [7]veth57f4fc5:fe80::446e:a2ff:fe34:fd05%veth57f4fc5<0> [8]veth35d67ed:fe80::9037:67ff:feb8:17b6%veth35d67ed<0> [9]veth22216db:fe80::70b3:b9ff:feef:be53%veth22216db<0> [10]veth207d721:fe80::1837:b5ff:feb6:b5b0%veth207d721<0> [11]veth19a2645:fe80::e4b3:40ff:fe8e:9756%veth19a2645<0> [12]veth52b5332:fe80::8052:d6ff:fe39:7c28%veth52b5332<0> [13]vethef511ca:fe80::64d0:3aff:fe3b:61d7%vethef511ca<0> [14]veth93f8d8c:fe80::d870:9bff:fec8:6c6f%veth93f8d8c<0> [15]vethcbdf2e2:fe80::786d:4fff:fef5:6daf%vethcbdf2e2<0>
gpu-dev006:25486:25486 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
gpu-dev006:25486:25486 [0] NCCL INFO NET/IB : Using [0]mlx4_0:1/RoCE ; OOB enp7s0:10.10.112.56<0>
gpu-dev006:25486:25486 [0] NCCL INFO Using network IB
NCCL version 2.7.8+cuda9.2
gpu-dev006:25488:25488 [2] NCCL INFO Bootstrap : Using [0]enp7s0:10.10.112.56<0> [1]virbr0:192.168.122.1<0> [2]vethee19468:fe80::4463:98ff:fe1a:66c9%vethee19468<0> [3]veth717ea13:fe80::3c8e:dcff:fed2:2236%veth717ea13<0> [4]veth9e7cb5a:fe80::94c9:90ff:fe6f:7fcb%veth9e7cb5a<0> [5]veth74a5bff:fe80::d01d:81ff:fee9:4dfa%veth74a5bff<0> [6]veth8231c1a:fe80::9068:abff:fe35:e6ad%veth8231c1a<0> [7]veth57f4fc5:fe80::446e:a2ff:fe34:fd05%veth57f4fc5<0> [8]veth35d67ed:fe80::9037:67ff:feb8:17b6%veth35d67ed<0> [9]veth22216db:fe80::70b3:b9ff:feef:be53%veth22216db<0> [10]veth207d721:fe80::1837:b5ff:feb6:b5b0%veth207d721<0> [11]veth19a2645:fe80::e4b3:40ff:fe8e:9756%veth19a2645<0> [12]veth52b5332:fe80::8052:d6ff:fe39:7c28%veth52b5332<0> [13]vethef511ca:fe80::64d0:3aff:fe3b:61d7%vethef511ca<0> [14]veth93f8d8c:fe80::d870:9bff:fec8:6c6f%veth93f8d8c<0> [15]vethcbdf2e2:fe80::786d:4fff:fef5:6daf%vethcbdf2e2<0>
gpu-dev006:25488:25488 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
gpu-dev006:25488:25488 [2] NCCL INFO NET/IB : Using [0]mlx4_0:1/RoCE ; OOB enp7s0:10.10.112.56<0>
gpu-dev006:25488:25488 [2] NCCL INFO Using network IB
gpu-dev006:25489:25489 [3] NCCL INFO Bootstrap : Using [0]enp7s0:10.10.112.56<0> [1]virbr0:192.168.122.1<0> [2]vethee19468:fe80::4463:98ff:fe1a:66c9%vethee19468<0> [3]veth717ea13:fe80::3c8e:dcff:fed2:2236%veth717ea13<0> [4]veth9e7cb5a:fe80::94c9:90ff:fe6f:7fcb%veth9e7cb5a<0> [5]veth74a5bff:fe80::d01d:81ff:fee9:4dfa%veth74a5bff<0> [6]veth8231c1a:fe80::9068:abff:fe35:e6ad%veth8231c1a<0> [7]veth57f4fc5:fe80::446e:a2ff:fe34:fd05%veth57f4fc5<0> [8]veth35d67ed:fe80::9037:67ff:feb8:17b6%veth35d67ed<0> [9]veth22216db:fe80::70b3:b9ff:feef:be53%veth22216db<0> [10]veth207d721:fe80::1837:b5ff:feb6:b5b0%veth207d721<0> [11]veth19a2645:fe80::e4b3:40ff:fe8e:9756%veth19a2645<0> [12]veth52b5332:fe80::8052:d6ff:fe39:7c28%veth52b5332<0> [13]vethef511ca:fe80::64d0:3aff:fe3b:61d7%vethef511ca<0> [14]veth93f8d8c:fe80::d870:9bff:fec8:6c6f%veth93f8d8c<0> [15]vethcbdf2e2:fe80::786d:4fff:fef5:6daf%vethcbdf2e2<0>
gpu-dev006:25489:25489 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
gpu-dev006:25489:25489 [3] NCCL INFO NET/IB : Using [0]mlx4_0:1/RoCE ; OOB enp7s0:10.10.112.56<0>
gpu-dev006:25489:25489 [3] NCCL INFO Using network IB
gpu-dev006:25487:25487 [1] NCCL INFO Bootstrap : Using [0]enp7s0:10.10.112.56<0> [1]virbr0:192.168.122.1<0> [2]vethee19468:fe80::4463:98ff:fe1a:66c9%vethee19468<0> [3]veth717ea13:fe80::3c8e:dcff:fed2:2236%veth717ea13<0> [4]veth9e7cb5a:fe80::94c9:90ff:fe6f:7fcb%veth9e7cb5a<0> [5]veth74a5bff:fe80::d01d:81ff:fee9:4dfa%veth74a5bff<0> [6]veth8231c1a:fe80::9068:abff:fe35:e6ad%veth8231c1a<0> [7]veth57f4fc5:fe80::446e:a2ff:fe34:fd05%veth57f4fc5<0> [8]veth35d67ed:fe80::9037:67ff:feb8:17b6%veth35d67ed<0> [9]veth22216db:fe80::70b3:b9ff:feef:be53%veth22216db<0> [10]veth207d721:fe80::1837:b5ff:feb6:b5b0%veth207d721<0> [11]veth19a2645:fe80::e4b3:40ff:fe8e:9756%veth19a2645<0> [12]veth52b5332:fe80::8052:d6ff:fe39:7c28%veth52b5332<0> [13]vethef511ca:fe80::64d0:3aff:fe3b:61d7%vethef511ca<0> [14]veth93f8d8c:fe80::d870:9bff:fec8:6c6f%veth93f8d8c<0> [15]vethcbdf2e2:fe80::786d:4fff:fef5:6daf%vethcbdf2e2<0>
gpu-dev006:25487:25487 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
gpu-dev006:25487:25487 [1] NCCL INFO NET/IB : Using [0]mlx4_0:1/RoCE ; OOB enp7s0:10.10.112.56<0>
gpu-dev006:25487:25487 [1] NCCL INFO Using network IB
gpu-dev006:25486:25652 [0] NCCL INFO Channel 00/02 : 0 1 2 3
gpu-dev006:25489:25656 [3] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
gpu-dev006:25487:25659 [1] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
gpu-dev006:25488:25654 [2] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
gpu-dev006:25487:25659 [1] NCCL INFO Trees [0] 2/-1/-1->1->0|0->1->2/-1/-1 [1] 2/-1/-1->1->0|0->1->2/-1/-1
gpu-dev006:25486:25652 [0] NCCL INFO Channel 01/02 : 0 1 2 3
gpu-dev006:25489:25656 [3] NCCL INFO Trees [0] -1/-1/-1->3->2|2->3->-1/-1/-1 [1] -1/-1/-1->3->2|2->3->-1/-1/-1
gpu-dev006:25488:25654 [2] NCCL INFO Trees [0] 3/-1/-1->2->1|1->2->3/-1/-1 [1] 3/-1/-1->2->1|1->2->3/-1/-1
gpu-dev006:25487:25659 [1] NCCL INFO Setting affinity for GPU 1 to ff
gpu-dev006:25488:25654 [2] NCCL INFO Setting affinity for GPU 2 to ff00
gpu-dev006:25489:25656 [3] NCCL INFO Setting affinity for GPU 3 to ff00
gpu-dev006:25486:25652 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/64
gpu-dev006:25486:25652 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1|-1->0->1/-1/-1 [1] 1/-1/-1->0->-1|-1->0->1/-1/-1
gpu-dev006:25486:25652 [0] NCCL INFO Setting affinity for GPU 0 to ff
gpu-dev006:25488:25654 [2] NCCL INFO Channel 00 : 2[82000] -> 3[83000] via direct shared memory
gpu-dev006:25486:25652 [0] NCCL INFO Channel 00 : 0[2000] -> 1[3000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO Channel 00 : 3[83000] -> 0[2000] via direct shared memory
gpu-dev006:25487:25659 [1] NCCL INFO Channel 00 : 1[3000] -> 2[82000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO Channel 00 : 3[83000] -> 2[82000] via direct shared memory
gpu-dev006:25488:25654 [2] NCCL INFO Channel 00 : 2[82000] -> 1[3000] via direct shared memory
gpu-dev006:25487:25659 [1] NCCL INFO Channel 00 : 1[3000] -> 0[2000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO Channel 01 : 3[83000] -> 0[2000] via direct shared memory
gpu-dev006:25488:25654 [2] NCCL INFO Channel 01 : 2[82000] -> 3[83000] via direct shared memory
gpu-dev006:25487:25659 [1] NCCL INFO Channel 01 : 1[3000] -> 2[82000] via direct shared memory
gpu-dev006:25486:25652 [0] NCCL INFO Channel 01 : 0[2000] -> 1[3000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO Channel 01 : 3[83000] -> 2[82000] via direct shared memory
gpu-dev006:25488:25654 [2] NCCL INFO Channel 01 : 2[82000] -> 1[3000] via direct shared memory
gpu-dev006:25489:25656 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-dev006:25489:25656 [3] NCCL INFO comm 0x7f93ac000d70 rank 3 nranks 4 cudaDev 3 busId 83000 - Init COMPLETE
gpu-dev006:25487:25659 [1] NCCL INFO Channel 01 : 1[3000] -> 0[2000] via direct shared memory
gpu-dev006:25486:25652 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-dev006:25486:25652 [0] NCCL INFO comm 0x7fc27c000d70 rank 0 nranks 4 cudaDev 0 busId 2000 - Init COMPLETE
gpu-dev006:25486:25486 [0] NCCL INFO Launch mode Parallel
gpu-dev006:25488:25654 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-dev006:25488:25654 [2] NCCL INFO comm 0x7ff108000d70 rank 2 nranks 4 cudaDev 2 busId 82000 - Init COMPLETE
gpu-dev006:25487:25659 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
gpu-dev006:25487:25659 [1] NCCL INFO comm 0x7f5a68000d70 rank 1 nranks 4 cudaDev 1 busId 3000 - Init COMPLETE
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=256, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/home/public_data/zhigang.yang/data/orig_data/imagenet', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_url='env://', distributed=True, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=False, gpu=0, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_tiny_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=4)
Creating model: deit_tiny_patch16_224
number of params: 5717416
Start training
/opt/conda/conda-bld/pytorch_1607370144807/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1607370144807/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [1,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
[... the same "index out of bounds" assertion repeats for dozens more threads across all four ranks ...]
Traceback (most recent call last):
File "main.py", line 335, in
main(args)
File "main.py", line 295, in main
args.clip_grad, model_ema, mixup_fn
File "/home/users/yuxin.fang/vt/deit/engine.py", line 42, in train_one_epoch
outputs = model(samples)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 281, in forward
x = self.forward_features(x)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 267, in forward_features
x = self.patch_embed(x)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 165, in forward
x = self.proj(x).flatten(2).transpose(1, 2)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
return self._conv_forward(input, self.weight)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Traceback (most recent call last):
File "main.py", line 335, in
main(args)
File "main.py", line 295, in main
args.clip_grad, model_ema, mixup_fn
File "/home/users/yuxin.fang/vt/deit/engine.py", line 39, in train_one_epoch
samples, targets = mixup_fn(samples, targets)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/data/mixup.py", line 217, in call
target = mixup_target(target, self.num_classes, lam, self.label_smoothing)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/data/mixup.py", line 27, in mixup_target
return y1 * lam + y2 * (1. - lam)
RuntimeError: CUDA error: device-side assert triggered
gpu-dev006:25487:25487 [1] init.cc:924 NCCL WARN Cuda failure 'device-side assert triggered'
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370144807/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
gpu-dev006:25489:25489 [3] init.cc:924 NCCL WARN Cuda failure 'device-side assert triggered'
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370144807/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
Traceback (most recent call last):
File "main.py", line 335, in
main(args)
File "main.py", line 295, in main
args.clip_grad, model_ema, mixup_fn
File "/home/users/yuxin.fang/vt/deit/engine.py", line 42, in train_one_epoch
outputs = model(samples)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 281, in forward
x = self.forward_features(x)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 267, in forward_features
x = self.patch_embed(x)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 165, in forward
x = self.proj(x).flatten(2).transpose(1, 2)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
return self._conv_forward(input, self.weight)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Traceback (most recent call last):
File "main.py", line 335, in
main(args)
File "main.py", line 295, in main
args.clip_grad, model_ema, mixup_fn
File "/home/users/yuxin.fang/vt/deit/engine.py", line 42, in train_one_epoch
outputs = model(samples)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 281, in forward
x = self.forward_features(x)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 267, in forward_features
x = self.patch_embed(x)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 165, in forward
x = self.proj(x).flatten(2).transpose(1, 2)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
return self._conv_forward(input, self.weight)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
gpu-dev006:25488:25488 [2] init.cc:924 NCCL WARN Cuda failure 'device-side assert triggered'
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370144807/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
gpu-dev006:25486:25486 [0] init.cc:924 NCCL WARN Cuda failure 'device-side assert triggered'
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /opt/conda/conda-bld/pytorch_1607370144807/work/torch/lib/c10d/../c10d/NCCLUtils.hpp:136, unhandled cuda error, NCCL version 2.7.8
Traceback (most recent call last):
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in
main()
File "/home/users/yuxin.fang/anaconda3/envs/deit92/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/users/yuxin.fang/anaconda3/envs/deit92/bin/python', '-u', 'main.py', '--model', 'deit_tiny_patch16_224', '--batch-size', '256', '--data-path', '/home/public_data/zhigang.yang/data/orig_data/imagenet']' died with <Signals.SIGABRT: 6>.
(deit92) [yuxin.fang@gpu-dev006 deit]$
Since DeiT's implementation depends heavily on timm, I ran the training script for EfficientNet-B0 using timm on the same machine under the same environment, with zero warnings and zero errors.
Could you help me fix this? Thanks.
Hi
Thank you for your great work.
I'm trying to reproduce the performance on ImageNet, and I have a question about the training setting.
In the paper, you mention:
Formally it means that we have 100 epochs, but each is 3x longer because of the repeated augmentations. We prefer to refer to this as 300 epochs
In the code, repeated augmentation is used by default:
Line 106 in 4e91d25
Line 34 in 4e91d25
As I understand the paper, this should mean 100 epochs.
Is there any code that reduces the actual number of training epochs, or should I train for 300 epochs to reproduce the performance?
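A rough sketch of the arithmetic behind the quoted sentence, assuming the repeated-augmentation sampler repeats each selected image 3 times while keeping the epoch length fixed:

dataset_len = 1_281_167          # ImageNet-1k train set
repeats = 3
samples_per_epoch = dataset_len  # epoch length is unchanged
unique_per_epoch = dataset_len // repeats
epochs = 300
# Each image is visited about as often as in 100 plain epochs:
equivalent_plain_epochs = epochs * unique_per_epoch / dataset_len  # ~100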
Hi there,
I wonder how you decided on the denominator here. You have currently set it to 512.0:
linear_scaled_lr = args.lr * args.batch_size * utils.get_world_size() / 512.0
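For reference, a minimal sketch of the linear scaling rule that line implements, treating the base --lr (5e-4 by default) as the rate for an effective batch size of 512:

def linear_scaled_lr(base_lr, batch_size, world_size):
    # lr grows proportionally with the effective (global) batch size.
    return base_lr * batch_size * world_size / 512.0

print(linear_scaled_lr(5e-4, 64, 8))   # 1 node x 8 GPUs x 64  -> 5e-4
print(linear_scaled_lr(5e-4, 64, 16))  # 2 nodes, batch 1024   -> 1e-3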
As the title asks: do you add L2 weight decay to the distillation token?
The vision transformer excludes cls_token and pos_embed when computing weight decay:
https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/vision_transformer.py#L347
Thanks.
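For illustration, a sketch of how the exclusion set could be extended to cover the distillation token, following the pattern of the linked timm code (whether the repo actually does this is exactly the question):

import torch
from timm.models.vision_transformer import VisionTransformer

class DistilledVisionTransformerSketch(VisionTransformer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dist_token = torch.nn.Parameter(torch.zeros(1, 1, self.embed_dim))

    @torch.jit.ignore
    def no_weight_decay(self):
        # Same mechanism as timm, with the distillation token added.
        return {'pos_embed', 'cls_token', 'dist_token'}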
Hi, how can I use the DeiT base model for images of size 384? I typed the script with "deit_base_patch16_384" but it says: Cannot find callable deit_base_patch16_384 in hubconf.
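For reference, loading via torch.hub would look like the sketch below, assuming deit_base_patch16_384 is exported by the repo's hubconf.py; if your checkout predates that entry point, that would explain the error:

import torch

model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_384',
                       pretrained=True)
model.eval()  # expects 384x384 inputs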
Hi,
Thanks for releasing the code for this great work. I see in Table 6 of the paper that DeiT-B trained on 224x224 images achieved a top-1 acc of 81.8, while ViT-B/16 achieved a top-1 acc of 77.9 (trained on 384x384 images). Do DeiT-B and ViT-B/16 have the same model structure? If yes, why does ViT-B/16 achieve lower accuracy even when trained on larger images?
Hi, what is the best way to run single-image inference with a pre-trained model?
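A minimal single-image inference sketch (the file path is hypothetical; the preprocessing mirrors the standard ImageNet eval transform):

import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_224',
                       pretrained=True)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256, interpolation=3),  # 3 = PIL bicubic
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('cat.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    probs = model(img).softmax(dim=1)
print(probs.argmax(dim=1).item())  # predicted ImageNet class index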
Hi, I'm trying to validate DeiT on the ImageNet validation set and I can't get the same accuracy values as you reported. A launch of python main.py --eval --resume https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --data-path /path/to/imagenet
gives 80.985% top-1 accuracy, while it should be 81.846 according to the tutorial in the README. The timm version is 0.3.2, as it should be. If the code of DeiT is correct, then there's only one place left for mistakes: the ImageNet dataset. The class names in the validation folder look as follows: 000 001 ... 999, which means they are sorted in numerical order. Probably something is wrong with the names. Here's the val_log.txt file. Have you encountered a similar issue? Thanks
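One check worth doing, hedged: torchvision's ImageFolder assigns labels by sorting folder names, and the released checkpoints assume the order induced by the original WordNet ids (n01440764 -> 0, and so on); if the renamed 000...999 folders don't reproduce that order, every label is permuted. A hypothetical diagnostic:

from torchvision import datasets

ds = datasets.ImageFolder('/path/to/imagenet/val')
# labels come from the sorted folder names
print(list(ds.class_to_idx.items())[:5])
# compare against the wnid ordering the pretrained head expects,
# e.g. folder 000 must correspond to n01440764 for accuracies to match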
I need to insert custom layers between the transformer modules and classify among K (!= 1000) classes.
For that, if I try to remove the last FC layer and loop over the other modules manually, it yields an output of size (batch_size, 196, 768) instead of the expected (batch_size, 768):
Removing the last layer:
self.model = torch.hub.load('facebookresearch/deit:main', 'deit_base_patch16_224', pretrained=True)
self.model = nn.Sequential(*list(self.model.children())[:-1])
Manual looping:
input: (batch_size, 3, 224, 224)
output = self.model[1](self.model[0](input))  # patch_embed & pos_drop; output size: (batch_size, 196, 768)
for i in range(0, 12):
    output = self.model[2][i](output)  # transformer blocks; output size: (batch_size, 196, 768)
output = self.model[3](output)  # LayerNorm; output size: (batch_size, 196, 768)
Doing self.model(input) works as expected (i.e. no manual looping).
Am I doing something wrong? This usually works for other torchvision models.
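A likely explanation, hedged: nn.Sequential over children() keeps only the submodules and drops everything forward_features does in between, namely prepending the cls_token, adding pos_embed, and finally selecting the class token. A sketch of extracting pooled features instead (attribute names follow timm's VisionTransformer and may differ across versions):

import torch

model = torch.hub.load('facebookresearch/deit:main',
                       'deit_base_patch16_224', pretrained=True)

def extract_features(x):
    # mirror forward_features: patch embed, prepend cls token, add pos embed
    x = model.patch_embed(x)                          # (B, 196, 768)
    cls = model.cls_token.expand(x.shape[0], -1, -1)  # (B, 1, 768)
    x = torch.cat((cls, x), dim=1) + model.pos_embed  # (B, 197, 768)
    x = model.pos_drop(x)
    for blk in model.blocks:   # custom layers can be inserted here
        x = blk(x)
    x = model.norm(x)
    return x[:, 0]             # class token only -> (B, 768)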
I have reproduced the small and tiny models but ran into problems reproducing the base model at 224 and 384 image size. With high probability, the loss goes to NaN after training for a few epochs.
My setting is 16 GPUs with a batch size of 64 on each GPU, and I do not change any hyper-parameters in run_with_submitit.py. Do you have any idea how to solve this problem?
Thanks for your help.
Traceback (most recent call last):
File "run_with_submitit.py", line 131, in
main()
File "run_with_submitit.py", line 116, in main
**kwargs
File "/opt/tiger/conda/lib/python3.7/site-packages/submitit/core/core.py", line 638, in update_parameters
self._internal_update_parameters(**kwargs)
File "/opt/tiger/conda/lib/python3.7/site-packages/submitit/auto/auto.py", line 197, in _internal_update_parameters
self._executor._internal_update_parameters(**parameters)
File "/opt/tiger/conda/lib/python3.7/site-packages/submitit/local/local.py", line 158, in _internal_update_parameters
raise ValueError("LocalExecutor can use only one node. Use nodes=1")
ValueError: LocalExecutor can use only one node. Use nodes=1
Accuracy of the network on the 50000 test images: 71.9%
Max accuracy: 71.95%
Training time 1 day, 15:01:41
Hi, the accuracy of the tiny model I trained is 71.95%, which falls short of the reported 72.2%.
Traceback (most recent call last):
File "run_with_submitit.py", line 130, in
main()
File "run_with_submitit.py", line 89, in main
args.job_dir = get_shared_folder() / "%j"
File "run_with_submitit.py", line 40, in get_shared_folder
raise RuntimeError("No shared folder available")
RuntimeError: No shared folder available
Hi, are the results reported on the validation set? And are the results of the compared methods also validation results?
Hi,
Is there any way to solve an image regression problem with DeiT?
A problem like "age prediction from an image" or similar.
Thanks.
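One common approach, sketched under the assumption that replacing the classification head is acceptable: swap the final linear layer for a single-output head and train with a regression loss (everything below is illustrative, not part of this repo):

import torch
import torch.nn as nn

model = torch.hub.load('facebookresearch/deit:main',
                       'deit_tiny_patch16_224', pretrained=True)
model.head = nn.Linear(model.head.in_features, 1)  # one scalar, e.g. age

criterion = nn.MSELoss()  # or L1/Huber, depending on target noise
pred = model(torch.randn(4, 3, 224, 224)).squeeze(-1)
loss = criterion(pred, torch.tensor([23.0, 41.0, 35.0, 60.0]))
loss.backward()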
I noticed in your paper that you trained for 300 epochs. Is this the best accuracy, or would training for more epochs increase it?
Hi, I want to finetune the deit_base_patch16_384 model on ImageNet with batch size 64 or 128.
Basically, I want to follow
python run_with_submitit.py --model deit_base_patch16_384 --batch-size 32 --finetune https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth --input-size 384 --use_volta32 --nodes 2 --lr 5e-6 --weight-decay 1e-8 --epochs 30 --min-lr 5e-6
But I only have one GPU, on which only 64 or 128 can be set as the batch size. So I use
python main.py --model deit_base_patch16_224 --batch-size 64 --finetune deit_base_patch16_224-b5f2ef4d.pth --input-size 224 --lr 5e-8 --weight-decay 1e-8 --epochs 30 --min-lr 5e-8 --data-path data/imagenet/
with either
--batch-size 64 --min-lr 5e-8
or
--batch-size 128 --min-lr 5e-7
Am I right? How should I set the lr for each batch size?
Thanks.
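One hedged way to reason about it, assuming main.py applies the same linear scaling discussed above (args.lr * batch * world_size / 512): choose --lr so that the scaled LR matches the reference run. Illustrative arithmetic, assuming the 2-node command runs 8 GPUs per node (16 processes x batch 32 = effective 512, so its scaled LR equals its --lr of 5e-6):

def args_lr_for_target(target_scaled_lr, batch_size, world_size=1, ref=512.0):
    # invert linear_scaled_lr = args.lr * batch * world / 512
    return target_scaled_lr * ref / (batch_size * world_size)

print(args_lr_for_target(5e-6, 64))   # 4e-05 for one GPU, batch 64
print(args_lr_for_target(5e-6, 128))  # 2e-05 for one GPU, batch 128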
Thanks for the great work.
I tried to reproduce the training of the tiny, small and base models using your code. The only parameter I had to change was the batch size, due to resource constraints. I successfully reproduced the results for the small and tiny networks but see a big accuracy drop for base.
I got:
Tiny - Acc@1 72.172 Acc@5 91.188 loss 1.222 Max accuracy: 72.31% (batch size 256 per gpu, 8 gpus)
Small - Acc@1 79.786 Acc@5 95.008 loss 0.880 Max accuracy: 79.84% (batch size 144 per gpu, 8 gpus)
Base - Acc@1 78.568 Acc@5 93.966 loss 1.048 Max accuracy: 78.78% (batch size 60 per gpu, 8 gpus)
On the learning curve I see that the base model starts overfitting. The base model has higher test loss and lower train loss than the small model. Could the smaller batch size be the reason for such a big drop in accuracy (-3% Acc@1)? Did you reproduce the results of the base model with this code and these training parameters?
Hi, I follow the training command:
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model deit_small_patch16_224 --batch-size 256 --data-path /path/to/imagenet
and get the final results:
Acc@1 74.852 Acc@5 91.862 loss 1.143
Max accuracy: 75.19%
Training time: 2 days, 15:49:59
but fail to reproduce the 79.8% reported in the paper. Are there any further adjustments I need to make to reproduce the 79.8% result?
Thanks!
Can you add a tutorial about how to do transfer learning? Thanks for your excellent work!
Hi, thanks for your wonderful work!
I am curious about the accuracy-FLOPs trade-off of DeiT. Would you consider providing this data in the future?
Hi,
Could you add some instructions on how to fine-tune the pretrained model?
Thanks in advance
I'm trying out the code with a dataset that has about 190,000 training images and about 81,000 validation images. With a batch size of 64 and 8 GPUs, the progress stats report
372 steps for a training epoch and
846 steps for an eval epoch.
372*8*64=190464
846*64*1.5=81216
Also, nvidia-smi reports that all GPUs except one sit idle during the eval step. As a quick fix I just evaluate every 10th epoch now. But it would be great if this could be parallelized.
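A hedged sketch of one way to parallelize evaluation: shard the validation set with a DistributedSampler so each rank scores a slice (variable names below mirror main.py; the 1.5x eval batch matches what main.py already uses, which also explains the 1.5 factor in the step math above):

import torch

sampler_val = torch.utils.data.DistributedSampler(
    dataset_val,
    num_replicas=utils.get_world_size(),  # one shard per process
    rank=utils.get_rank(),
    shuffle=False)
data_loader_val = torch.utils.data.DataLoader(
    dataset_val, sampler=sampler_val,
    batch_size=int(1.5 * args.batch_size),
    num_workers=args.num_workers, pin_memory=args.pin_mem, drop_last=False)

One caveat: if the dataset size isn't divisible by the world size, the sampler pads with duplicated samples, which slightly perturbs the reported accuracy.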
As the title implies, it would be great if you could share the ImageNet training log.
I'm trying out the code with a custom dataset that has about 8k training images and 550 validation images. Is that enough data for this method?
Can you please add a Google Colab for inference? Thanks!
Hi,
I have read the distillation code and I think there is an error.
As far as I know, the no_weight_decay function of the timm vision transformer doesn't cover the distillation token.
So, the no_weight_decay function has to be overridden in DistilledVisionTransformer:
@torch.jit.ignore
def no_weight_decay(self):
    return {'pos_embed', 'cls_token', 'dist_token'}
Could you share the code for the distillation part? Your paper "Training data-efficient image transformers & distillation through attention" is great, and I found the "data-efficient" part in your code, but not the distillation part.
Hi, I try to train Tiny DeiT on ImageNet with
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model deit_tiny_patch16_224 --batch-size 256 --data-path /path/to/imagenet --output_dir /path/to/save
All hyperparameters are default, but I only get 59% top-1 accuracy. Am I missing something important? Could you help me?
I have uploaded the training log:
log.txt
Best wishes.
Great work, and thanks for sharing the code!
I am trying to re-train the DeiT base model but I encountered some issues.
May I ask for your insights?
I can reproduce the reported result of 81.8% with all default settings; however, the performance degrades a lot if I change two very minor hyperparameters.
Here is the test accuracy over epochs
The orange line is the default setting. (81.8%)
The blue line is batch size 512. (78.8%)
The green line is using 10 epochs for warmup. (79.2%)
Zoom in on the first 50 epochs:
For the default setting, it seems that the model is about to diverge around the 6th epoch, but it recovers later and eventually achieves a pretty good result (81.8%).
However, when using a smaller batch size or warming up for 5 additional epochs, the performance degrades by ~3%.
I wonder whether you observe the same trend, and do you have any insights into why the two small changes I made affect the results so much?
My env:
pytorch 1.7, timm 0.3.2, torchvision 0.8
Thanks.
Hi,
I think VisionTransformer.no_weight_decay() is not used as intended.
https://github.com/rwightman/pytorch-image-models/blob/f8463b8fa9c0490db093b36acfce71fa2363b8c3/timm/models/vision_transformer.py#L255
When using timm, the optimizer should be created before the model is wrapped by DDP, because model.no_weight_decay() is called when creating the optimizer, and DDP doesn't have the attribute no_weight_decay.
https://github.com/rwightman/pytorch-image-models/blob/f8463b8fa9c0490db093b36acfce71fa2363b8c3/timm/optim/optim_factory.py#L45
if hasattr(model, 'no_weight_decay'):
    skip = model.no_weight_decay()
Since DDP doesn't have the attribute no_weight_decay, model.no_weight_decay() will not be called in create_optimizer, and thus weight decay is applied to all the weights, including {'pos_embed', 'cls_token'}.
A quick fix could be changing
Line 257 in 30eb318
to
optimizer = create_optimizer(args, model_without_ddp).
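For illustration, the intended ordering looks roughly like this (a hedged sketch of the pattern in main.py, not a verbatim diff):

# keep an unwrapped handle before DDP so no_weight_decay() stays visible
model_without_ddp = model
if args.distributed:
    model = torch.nn.parallel.DistributedDataParallel(
        model, device_ids=[args.gpu])
    model_without_ddp = model.module

# create_optimizer checks hasattr(model, 'no_weight_decay'),
# so pass the unwrapped module rather than the DDP wrapper
optimizer = create_optimizer(args, model_without_ddp)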
I found that the trained model does not transfer well; I suspect it might be because the model in this repo does not come with pre_logits.
At line https://github.com/facebookresearch/deit/blob/main/losses.py#L54
The sum of the KL divergence is divided by outputs_kd.numel(); should it be divided by outputs_kd.size(0) instead? That is, averaged over the batch only.
The current code behaves like setting reduction='mean'.
In the PyTorch 1.7 documentation, https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html, the last note points out that batchmean matches the mathematical definition of the KL divergence.
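A small demonstration of the difference, under the assumption that the loss is computed with reduction='sum' as in losses.py (the tensor shapes are illustrative):

import torch
import torch.nn.functional as F

T = 3.0                                 # distillation temperature
outputs_kd = torch.randn(4, 1000)       # student logits (toy shapes)
teacher_outputs = torch.randn(4, 1000)  # teacher logits

kl_sum = F.kl_div(
    F.log_softmax(outputs_kd / T, dim=1),
    F.log_softmax(teacher_outputs / T, dim=1),
    reduction='sum', log_target=True)

print(kl_sum / outputs_kd.numel())   # current: /(batch * classes) == 'mean'
print(kl_sum / outputs_kd.size(0))   # proposed: /batch == 'batchmean'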
Hi, the code always hangs here when I use multiple nodes and train to the fifth epoch. The GPU utilization suddenly drops to 0.