Comments (4)
Hi,
The only thing I can think of for now is that we used PyTorch 1.7.0 and torchvision 0.8.1, but this shouldn't explain the big drop in accuracy that you are seeing (I couldn't find anything in the PyTorch release notes indicating that a bug fixed in 1.7.0 could have affected the accuracies).
I'm going to run a few more trainings with the released code and I'll keep you updated on what I get.
from deit.
Hi,
Can you paste your environment here?
Here are the logs for DeiT small with the latest version of the code that I have (which should be fairly similar to the version that we released, maybe with a few arguments removed)
* Acc@1 79.828 Acc@5 95.076 loss 0.882
Max accuracy: 79.84%
The command-line arguments that we used are here:
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=64, clip_grad=None, color_jitter=0.4, comment='', cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='/datasets01_101/imagenet_full_size/061417/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_url='file:///checkpoint/fmassa/experiments/0f45078640694b86abbf9c85fef17611_init', distributed=True, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=False, gpu=0, inat_category='name', input_size=224, job_dir=PosixPath('/checkpoint/fmassa/experiments/%j'), lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_small_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, ngpus=8, nodes=2, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir=PosixPath('/checkpoint/fmassa/experiments/34020965'), partition='learnfair', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, timeout=2800, train_interpolation='bicubic', use_volta32=False, warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=16)
In this run we used 16 GPUs (and a 4x smaller per-GPU batch size). Can you double-check whether you see anything different? Otherwise I can try to run this again on 4 GPUs with the version of the code that we open-sourced to see what I get.
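As a quick sanity check, the effective (total) batch size is the same in both setups; a minimal sketch (the helper name is illustrative, not from the deit code):

```python
def effective_batch_size(per_gpu_batch: int, world_size: int) -> int:
    # Total number of samples consumed per optimizer step
    # across all distributed workers.
    return per_gpu_batch * world_size

# 16-GPU run above: batch_size=64, world_size=16
print(effective_batch_size(64, 16))   # 1024

# 4-GPU run: batch_size=256, world_size=4
print(effective_batch_size(256, 4))   # 1024
```

Since both come out to 1024, the per-step optimization dynamics should match between the two runs.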
My environment:
Python 3.7
torch 1.6
torchvision 0.7
timm 0.3.2
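To spot mismatches like the torch 1.6 vs 1.7.0 difference quickly, the two environments can be diffed programmatically; a stdlib-only sketch (the version dicts below just mirror the lists in this thread):

```python
def parse_version(v: str) -> tuple:
    # "1.7.0" -> (1, 7, 0), so versions compare in the natural order.
    return tuple(int(part) for part in v.split("."))

mine = {"torch": "1.6", "torchvision": "0.7", "timm": "0.3.2"}
theirs = {"torch": "1.7.0", "torchvision": "0.8.1", "timm": "0.3.2"}

for pkg in mine:
    if parse_version(mine[pkg]) != parse_version(theirs[pkg]):
        # Only mismatched packages are reported.
        print(f"{pkg}: {mine[pkg]} vs {theirs[pkg]}")
```

This flags torch and torchvision as the only differences between the two environments.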
Here are the arguments I used
Namespace(aa='rand-m9-mstd0.5-inc1', batch_size=256, clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=1.0, cutmix_minmax=None, data_path='../data/Imagenet/', data_set='IMNET', decay_epochs=30, decay_rate=0.1, device='cuda', dist_backend='nccl', dist_url='env://', distributed=True, drop=0.0, drop_block=None, drop_path=0.1, epochs=300, eval=False, gpu=0, inat_category='name', input_size=224, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, min_lr=1e-05, mixup=0.8, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='deit_small_patch16_224', model_ema=True, model_ema_decay=0.99996, model_ema_force_cpu=False, momentum=0.9, num_workers=10, opt='adamw', opt_betas=None, opt_eps=1e-08, output_dir='', patience_epochs=10, pin_mem=True, rank=0, recount=1, remode='pixel', repeated_aug=True, reprob=0.25, resplit=False, resume='', sched='cosine', seed=0, smoothing=0.1, start_epoch=0, train_interpolation='bicubic', warmup_epochs=5, warmup_lr=1e-06, weight_decay=0.05, world_size=4)
The main difference here is only the batch size: I used 256 per GPU with 4 GPUs. I guess I can try scaling the lr up 4x to match the batch size I used here and see whether the performance improves. Any ideas from your side?
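For reference, the usual linear scaling rule grows the learning rate proportionally with the total batch size; a sketch (assuming a reference batch size of 512, a common DeiT/timm convention that should be checked against the actual script):

```python
def scaled_lr(base_lr: float, total_batch: int, base_batch: int = 512) -> float:
    # Linear scaling rule: lr grows proportionally with the
    # effective batch size relative to a reference batch size.
    return base_lr * total_batch / base_batch

# Both runs discussed here use a total batch of 1024
# (64 x 16 GPUs, or 256 x 4 GPUs), so the scaled lr is identical
# and no extra 4x adjustment should be needed.
print(scaled_lr(5e-4, 1024))  # 0.001
```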
I've just got the results from training on 4 and 16 GPUs, with the default commands. They match the reported results.
For 4 GPUs
python run_with_submitit.py --model deit_small_patch16_224 --batch-size 256 --nodes 1 --ngpus 4 --use_volta32
gives
* Acc@1 79.860 Acc@5 94.950 loss 0.885
and for 16 GPUs
python run_with_submitit.py --model deit_small_patch16_224
I got
* Acc@1 79.790 Acc@5 94.880 loss 0.883
To facilitate comparison / reproducibility, here are the training logs for both runs: https://gist.github.com/fmassa/0dbd0184a0adb904ef42277b487d8b53
Also, here is the output of `conda list` from the environment I used:
alembic 1.4.3 <pip>
appdirs 1.4.4 <pip>
astor 0.8.1 <pip>
attrs 20.3.0 <pip>
black 20.8b1 <pip>
blas 1.0 mkl
ca-certificates 2020.12.8 h06a4308_0
certifi 2020.12.5 py37h06a4308_0
click 7.1.2 <pip>
cloudpickle 1.6.0 <pip>
contextlib2 0.6.0.post1 <pip>
cudatoolkit 10.1.243 h6bb024c_0
dataclasses 0.6 <pip>
dumbo 0.1.1 <pip>
flake8 3.8.4 <pip>
freetype 2.10.4 h5ab3b9f_0
future 0.18.2 <pip>
git-archive-all 1.22.0 <pip>
importlib-metadata 3.3.0 <pip>
intel-openmp 2020.2 254
jpeg 9b h024ee3a_2
lcms2 2.11 h396b838_0
ld_impl_linux-64 2.33.1 h53a641e_7
libedit 3.1.20191231 h14c3975_1
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_1
libuv 1.40.0 h7b6447c_0
lz4-c 1.9.2 heb0550a_3
Mako 1.1.3 <pip>
MarkupSafe 1.1.1 <pip>
mccabe 0.6.1 <pip>
mkl 2020.2 256
mkl-service 2.3.0 py37he8ac12f_0
mkl_fft 1.2.0 py37h23d657b_0
mkl_random 1.1.1 py37h0573a6f_0
mypy-extensions 0.4.3 <pip>
ncurses 6.2 he6710b0_1
ninja 1.10.2 py37hff7bd54_0
numpy 1.19.2 py37h54aff64_0
numpy-base 1.19.2 py37hfa32c7d_0
olefile 0.46 py_0
openssl 1.1.1i h27cfd23_0
pathspec 0.8.1 <pip>
pillow 8.0.1 py37he98fc37_0
pip 20.3.3 py37h06a4308_0
pycodestyle 2.6.0 <pip>
pyflakes 2.2.0 <pip>
python 3.7.9 h7579374_0
python-dateutil 2.8.1 <pip>
python-editor 1.0.4 <pip>
pytorch 1.7.0 py3.7_cuda10.1.243_cudnn7.6.3_0 pytorch
readline 8.0 h7b6447c_0
regex 2020.11.13 <pip>
setuptools 51.0.0 py37h06a4308_2
six 1.15.0 py37h06a4308_0
SQLAlchemy 1.3.21 <pip>
sqlite 3.33.0 h62c20be_0
submitit 1.1.5 <pip>
tabulate 0.8.7 <pip>
timm 0.3.2 <pip>
tk 8.6.10 hbc83047_0
toml 0.10.2 <pip>
torchvision 0.8.1 py37_cu101 pytorch
typed-ast 1.4.1 <pip>
typing_extensions 3.7.4.3 py_0
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
zipp 3.4.0 <pip>
zlib 1.2.11 h7b6447c_3
zstd 1.4.5 h9ceee32_0
Given that I was able to match the reported accuracies with the released codebase, I'm closing this issue, but let us know if you have any further questions.