Comments (4)
Now I can confirm that the problem was with my ImageNet data. Thank you very much for the suggestions!
from deit.
@pengzhiliang Could you elaborate a little more on your solution? Which version of the ImageNet data should we use?
Hi,
Thanks for trying out DeiT and opening this issue.
We have tried running the code in the same setup as yours (PyTorch 1.7.1, torchvision 0.8.2, a larger-than-default batch size), but we always obtain the same results, which match the reported ones.
Process group: 1 tasks, rank: 0
| distributed init (rank 0): file:///checkpoint/fmassa/experiments/aca5643552e54a709e4221ec0f1d7dc2_init
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=256, clip_grad=None, color_jitter=0.4, comment='', cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/datasets01_101/imagenet_full_size/061417/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_url='file:///checkpoint/fmassa/experiments/aca5643552e54a709e4221ec0f1d7dc2_init', distributed=True, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=True, gpu=0, inat_category='name', input_size=224, job_dir=PosixPath('/checkpoint/fmassa/experiments/%j'), lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_base_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, ngpus=1, nodes=1, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir=PosixPath('/checkpoint/fmassa/experiments/ 34199672'), partition='dev', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, timeout=60, train_interpolation='bicubic', use_volta32=False, warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating model: deit_base_patch16_224
number of params: 86567656
Test: [ 0/131] eta: 0:28:18 loss: 0.4708 (0.4708) acc1: 91.1458 (91.1458) acc5: 98.1771 (98.1771) time: 12.9632 data: 10.2194 max mem: 4095
Test: [ 10/131] eta: 0:06:23 loss: 0.6503 (0.6840) acc1: 85.4167 (84.6117) acc5: 97.1354 (97.0644) time: 3.1661 data: 0.9293 max mem: 4096
Test: [ 20/131] eta: 0:05:01 loss: 0.6172 (0.5853) acc1: 87.5000 (87.7852) acc5: 97.6562 (97.6439) time: 2.2052 data: 0.0002 max mem: 4096
Test: [ 30/131] eta: 0:04:18 loss: 0.5880 (0.6271) acc1: 86.9792 (86.6767) acc5: 97.6562 (97.3790) time: 2.2326 data: 0.0002 max mem: 4096
Test: [ 40/131] eta: 0:03:45 loss: 0.6971 (0.6410) acc1: 84.3750 (86.1979) acc5: 97.1354 (97.4403) time: 2.2295 data: 0.0002 max mem: 4096
Test: [ 50/131] eta: 0:03:16 loss: 0.6124 (0.6369) acc1: 84.8958 (86.2796) acc5: 97.3958 (97.4826) time: 2.2186 data: 0.0002 max mem: 4096
Test: [ 60/131] eta: 0:02:49 loss: 0.7854 (0.6815) acc1: 81.5104 (85.2459) acc5: 96.0938 (97.0116) time: 2.2190 data: 0.0002 max mem: 4096
Test: [ 70/131] eta: 0:02:24 loss: 0.9804 (0.7316) acc1: 77.6042 (83.8982) acc5: 94.0104 (96.5449) time: 2.2215 data: 0.0002 max mem: 4096
Test: [ 80/131] eta: 0:01:59 loss: 0.9096 (0.7451) acc1: 79.1667 (83.6902) acc5: 94.0104 (96.2867) time: 2.2262 data: 0.0002 max mem: 4096
Test: [ 90/131] eta: 0:01:35 loss: 0.9411 (0.7751) acc1: 80.4688 (82.8783) acc5: 93.7500 (96.0079) time: 2.2260 data: 0.0002 max mem: 4096
Test: [100/131] eta: 0:01:12 loss: 0.9518 (0.7919) acc1: 77.8646 (82.5572) acc5: 93.4896 (95.8024) time: 2.2245 data: 0.0002 max mem: 4096
Test: [110/131] eta: 0:00:48 loss: 0.9518 (0.8111) acc1: 78.9062 (82.0993) acc5: 93.4896 (95.6198) time: 2.2253 data: 0.0002 max mem: 4096
Test: [120/131] eta: 0:00:25 loss: 0.9147 (0.8229) acc1: 78.9062 (81.7837) acc5: 94.0104 (95.5428) time: 2.2208 data: 0.0002 max mem: 4096
Test: [130/131] eta: 0:00:02 loss: 0.8777 (0.8247) acc1: 79.6875 (81.8520) acc5: 95.3125 (95.5940) time: 2.1280 data: 0.0001 max mem: 4096
Test: Total time: 0:05:00 (2.2903 s / it)
* Acc@1 81.852 Acc@5 95.594 loss 0.825
Accuracy of the network on the 50000 test images: 81.9%
We've also run the code in different environments to rule out a difference in Python version, but every run gave the same results.
The only thing I can think of that might explain the difference you are seeing is that your copy of ImageNet might be different. Could you try evaluating one of the torchvision models using the code in https://github.com/pytorch/vision/tree/master/references/classification and check that it matches the reported accuracies in https://pytorch.org/docs/stable/torchvision/models.html#classification? For example, a resnet50 or resnet18.
That way we can rule out whether the problem is in your copy of ImageNet.
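If it helps, the sanity check can also be done with a minimal standalone eval loop rather than the full reference script. The sketch below is not the DeiT or torchvision evaluation code, just a hedged illustration: it assumes your validation set is in the standard `ImageFolder` layout (`val/<wnid>/*.JPEG`) and computes top-1 accuracy for any classifier.

```python
# Minimal sketch (assumption: standard ImageFolder-style val split) for
# checking a pretrained torchvision model's top-1 accuracy on your
# local ImageNet copy. A large gap vs. the published number points at
# the data, not the model code.
import torch
from torch.utils.data import DataLoader


def evaluate_top1(model, loader, device="cpu"):
    """Return top-1 accuracy (%) of `model` over `loader`."""
    model.eval().to(device)
    correct = total = 0
    with torch.no_grad():
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            preds = model(images).argmax(dim=1)  # predicted class per sample
            correct += (preds == targets).sum().item()
            total += targets.numel()
    return 100.0 * correct / total
```

To use it for the check, build `loader` from `torchvision.datasets.ImageFolder` over your val directory with the usual eval transforms (Resize 256, CenterCrop 224, normalize with the ImageNet mean/std) and pass `torchvision.models.resnet18(pretrained=True)`; if the result is far from the roughly 69.8% top-1 listed in the torchvision docs, the data copy is the likely culprit.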
OK, I'll check it and respond in a few hours.
Thank you very much!