Comments (4)

pengzhiliang commented on August 15, 2024

Now I can confirm that the problem was with my ImageNet data. Thank you very much for the suggestions!

from deit.

Wallace-222 commented on August 15, 2024

@pengzhiliang Could you elaborate a little more on your solution? Which version of the ImageNet data should we use?

fmassa commented on August 15, 2024

Hi,

Thanks for trying out DeiT and opening this issue.

We have tried to run the code in the same setup as you (PyTorch 1.7.1, torchvision 0.8.2, larger-than-default batch size), but we always obtain the same results, which match the reported ones.

Process group: 1 tasks, rank: 0
| distributed init (rank 0): file:///checkpoint/fmassa/experiments/aca5643552e54a709e4221ec0f1d7dc2_init
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=256, clip_grad=None, color_jitter=0.4, comment='', cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/datasets01_101/imagenet_full_size/061417/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_url='file:///checkpoint/fmassa/experiments/aca5643552e54a709e4221ec0f1d7dc2_init', distributed=True, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=True, gpu=0, inat_category='name', input_size=224, job_dir=PosixPath('/checkpoint/fmassa/experiments/%j'), lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_base_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, ngpus=1, nodes=1, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir=PosixPath('/checkpoint/fmassa/experiments/34199672'), partition='dev', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, timeout=60, train_interpolation='bicubic', use_volta32=False, warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=1)
Creating model: deit_base_patch16_224
number of params: 86567656
Test:  [  0/131]  eta: 0:28:18  loss: 0.4708 (0.4708)  acc1: 91.1458 (91.1458)  acc5: 98.1771 (98.1771)  time: 12.9632  data: 10.2194  max mem: 4095
Test:  [ 10/131]  eta: 0:06:23  loss: 0.6503 (0.6840)  acc1: 85.4167 (84.6117)  acc5: 97.1354 (97.0644)  time: 3.1661  data: 0.9293  max mem: 4096
Test:  [ 20/131]  eta: 0:05:01  loss: 0.6172 (0.5853)  acc1: 87.5000 (87.7852)  acc5: 97.6562 (97.6439)  time: 2.2052  data: 0.0002  max mem: 4096
Test:  [ 30/131]  eta: 0:04:18  loss: 0.5880 (0.6271)  acc1: 86.9792 (86.6767)  acc5: 97.6562 (97.3790)  time: 2.2326  data: 0.0002  max mem: 4096
Test:  [ 40/131]  eta: 0:03:45  loss: 0.6971 (0.6410)  acc1: 84.3750 (86.1979)  acc5: 97.1354 (97.4403)  time: 2.2295  data: 0.0002  max mem: 4096
Test:  [ 50/131]  eta: 0:03:16  loss: 0.6124 (0.6369)  acc1: 84.8958 (86.2796)  acc5: 97.3958 (97.4826)  time: 2.2186  data: 0.0002  max mem: 4096
Test:  [ 60/131]  eta: 0:02:49  loss: 0.7854 (0.6815)  acc1: 81.5104 (85.2459)  acc5: 96.0938 (97.0116)  time: 2.2190  data: 0.0002  max mem: 4096
Test:  [ 70/131]  eta: 0:02:24  loss: 0.9804 (0.7316)  acc1: 77.6042 (83.8982)  acc5: 94.0104 (96.5449)  time: 2.2215  data: 0.0002  max mem: 4096
Test:  [ 80/131]  eta: 0:01:59  loss: 0.9096 (0.7451)  acc1: 79.1667 (83.6902)  acc5: 94.0104 (96.2867)  time: 2.2262  data: 0.0002  max mem: 4096
Test:  [ 90/131]  eta: 0:01:35  loss: 0.9411 (0.7751)  acc1: 80.4688 (82.8783)  acc5: 93.7500 (96.0079)  time: 2.2260  data: 0.0002  max mem: 4096
Test:  [100/131]  eta: 0:01:12  loss: 0.9518 (0.7919)  acc1: 77.8646 (82.5572)  acc5: 93.4896 (95.8024)  time: 2.2245  data: 0.0002  max mem: 4096
Test:  [110/131]  eta: 0:00:48  loss: 0.9518 (0.8111)  acc1: 78.9062 (82.0993)  acc5: 93.4896 (95.6198)  time: 2.2253  data: 0.0002  max mem: 4096
Test:  [120/131]  eta: 0:00:25  loss: 0.9147 (0.8229)  acc1: 78.9062 (81.7837)  acc5: 94.0104 (95.5428)  time: 2.2208  data: 0.0002  max mem: 4096
Test:  [130/131]  eta: 0:00:02  loss: 0.8777 (0.8247)  acc1: 79.6875 (81.8520)  acc5: 95.3125 (95.5940)  time: 2.1280  data: 0.0001  max mem: 4096
Test: Total time: 0:05:00 (2.2903 s / it)
* Acc@1 81.852 Acc@5 95.594 loss 0.825
Accuracy of the network on the 50000 test images: 81.9%
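
For anyone comparing their own run against the log above, the `* Acc@1 ...` summary line is the figure to look at. Below is a minimal sketch for extracting it from a saved log and comparing two runs. The helper names `parse_summary` and `runs_match` and the 0.1-point tolerance are illustrative assumptions, not part of the DeiT code.

```python
import re

# Matches the final summary line, e.g. "* Acc@1 81.852 Acc@5 95.594 loss 0.825"
SUMMARY_RE = re.compile(
    r"\*\s*Acc@1\s+([\d.]+)\s+Acc@5\s+([\d.]+)\s+loss\s+([\d.]+)"
)


def parse_summary(log_text):
    """Return {'acc1', 'acc5', 'loss'} from the summary line, or None if absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    acc1, acc5, loss = (float(g) for g in match.groups())
    return {"acc1": acc1, "acc5": acc5, "loss": loss}


def runs_match(a, b, tol=0.1):
    """True if top-1 and top-5 accuracy agree within `tol` points (arbitrary choice)."""
    return abs(a["acc1"] - b["acc1"]) <= tol and abs(a["acc5"] - b["acc5"]) <= tol
```

Calling `parse_summary` on the text of each log and passing both results to `runs_match` gives a quick yes/no on whether a local evaluation reproduces the run shown here.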

We've tried running the code in different environments to see whether a difference in Python version was responsible, but they all gave the same results.

The only thing I can think of that might explain the difference you are seeing is that your copy of ImageNet might be different. Could you try evaluating one of the torchvision models (for example, a resnet50 or resnet18) using the code in https://github.com/pytorch/vision/tree/master/references/classification and see whether the results match the reported accuracies in https://pytorch.org/docs/stable/torchvision/models.html#classification ?
This way we can determine whether the problem is in your copy of ImageNet.
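
Before running a full reference evaluation, a quick structural check of the local copy can rule out the most obvious problems. The sketch below is not from this thread. It assumes the validation set is stored in the usual one-folder-per-class layout (`val/<wnid>/<image>`) and that it is the standard ILSVRC2012 validation split, which has 1000 classes with 50 images each (50,000 in total, matching the count in the log). The function name `check_val_layout` is hypothetical.

```python
from pathlib import Path

# Assumptions (not stated in the thread): ILSVRC2012 validation split,
# stored as <val_dir>/<wnid>/<image file>.
EXPECTED_CLASSES = 1000
EXPECTED_PER_CLASS = 50


def check_val_layout(val_dir, n_classes=EXPECTED_CLASSES, per_class=EXPECTED_PER_CLASS):
    """Return a list of problems found; an empty list means the layout looks sane."""
    val_dir = Path(val_dir)
    problems = []

    entries = list(val_dir.iterdir())
    class_dirs = sorted(p for p in entries if p.is_dir())
    loose_files = [p for p in entries if p.is_file()]

    # Images sitting directly in val/ usually mean the split was never
    # sorted into class folders.
    if loose_files:
        problems.append(f"{len(loose_files)} files are outside class folders")

    if len(class_dirs) != n_classes:
        problems.append(f"expected {n_classes} class folders, found {len(class_dirs)}")

    # Count the files in each class folder and flag any unexpected totals.
    for class_dir in class_dirs:
        count = sum(1 for f in class_dir.iterdir() if f.is_file())
        if count != per_class:
            problems.append(
                f"class {class_dir.name}: expected {per_class} images, found {count}"
            )
    return problems
```

Running `check_val_layout("/path/to/imagenet/val")` (path is a placeholder) and getting an empty list only shows that the file counts are right. It does not verify image contents or labels, so evaluating a torchvision model as suggested above is still the more conclusive test.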

pengzhiliang commented on August 15, 2024

OK, I'll check it and get back to you in a few hours.
Thank you very much!
