Comments (4)
hi, @tzt101,
Could you paste the printed configuration for your job?
from cvt.
This is the configuration, I just keep the default settings.
AMP:
ENABLED: true
MEMORY_FORMAT: nchw
AUG:
COLOR_JITTER:
- 0.4
- 0.4
- 0.4
- 0.1
- 0.0
DROPBLOCK_BLOCK_SIZE: 7
DROPBLOCK_KEEP_PROB: 1.0
DROPBLOCK_LAYERS: - 3
- 4
GAUSSIAN_BLUR: 0.0
GRAY_SCALE: 0.0
INTERPOLATION: 2
MIXCUT: 1.0
MIXCUT_AND_MIXUP: false
MIXCUT_MINMAX: []
MIXUP: 0.8
MIXUP_MODE: batch
MIXUP_PROB: 1.0
MIXUP_SWITCH_PROB: 0.5
RATIO: - 0.75
- 1.3333333333333333
SCALE: - 0.08
- 1.0
TIMM_AUG:
AUTO_AUGMENT: rand-m9-mstd0.5-inc1
COLOR_JITTER: 0.4
HFLIP: 0.5
INTERPOLATION: bicubic
RE_COUNT: 1
RE_MODE: pixel
RE_PROB: 0.25
RE_SPLIT: false
USE_LOADER: true
USE_TRANSFORM: false
VFLIP: 0.0
BASE: - ''
CUDNN:
BENCHMARK: true
DETERMINISTIC: false
ENABLED: true
DATASET:
DATASET: imagenet
DATA_FORMAT: jpg
LABELMAP: ''
ROOT: /home/tzt/dataset/imagenet/
SAMPLER: default
TARGET_SIZE: -1
TEST_SET: val
TEST_TSV_LIST: []
TRAIN_SET: train
TRAIN_TSV_LIST: []
DATA_DIR: ''
DEBUG:
DEBUG: false
DIST_BACKEND: nccl
FINETUNE:
BASE_LR: 0.003
BATCH_SIZE: 512
EVAL_EVERY: 3000
FINETUNE: false
FROZEN_LAYERS: []
LR_SCHEDULER:
DECAY_TYPE: step
TRAIN_MODE: true
USE_TRAIN_AUG: false
GPUS: - 0
INPUT:
MEAN:- 0.485
- 0.456
- 0.406
STD: - 0.229
- 0.224
- 0.225
LOSS:
LABEL_SMOOTHING: 0.1
LOSS: softmax
MODEL:
INIT_WEIGHTS: true
NAME: cls_cvt
NUM_CLASSES: 1000
PRETRAINED: ''
PRETRAINED_LAYERS: - '*'
SPEC:
ATTN_DROP_RATE:- 0.0
- 0.0
- 0.0
CLS_TOKEN: - false
- false
- true
DEPTH: - 1
- 2
- 10
DIM_EMBED: - 64
- 192
- 384
DROP_PATH_RATE: - 0.0
- 0.0
- 0.1
DROP_RATE: - 0.0
- 0.0
- 0.0
INIT: trunc_norm
KERNEL_QKV: - 3
- 3
- 3
MLP_RATIO: - 4.0
- 4.0
- 4.0
NUM_HEADS: - 1
- 3
- 6
NUM_STAGES: 3
PADDING_KV: - 1
- 1
- 1
PADDING_Q: - 1
- 1
- 1
PATCH_PADDING: - 2
- 1
- 1
PATCH_SIZE: - 7
- 3
- 3
PATCH_STRIDE: - 4
- 2
- 2
POS_EMBED: - false
- false
- false
QKV_BIAS: - true
- true
- true
QKV_PROJ_METHOD: - dw_bn
- dw_bn
- dw_bn
STRIDE_KV: - 2
- 2
- 2
STRIDE_Q: - 1
- 1
- 1
MODEL_SUMMARY: false
MULTIPROCESSING_DISTRIBUTED: true
NAME: cvt-13-224x224
OUTPUT_DIR: OUTPUT/
PIN_MEMORY: true
PRINT_FREQ: 500
RANK: 0
TEST:
BATCH_SIZE_PER_GPU: 32
CENTER_CROP: true
IMAGE_SIZE:
- 224
- 224
INTERPOLATION: 3
MODEL_FILE: ''
REAL_LABELS: false
VALID_LABELS: ''
TRAIN:
AUTO_RESUME: true
BATCH_SIZE_PER_GPU: 128
BEGIN_EPOCH: 0
CHECKPOINT: ''
CLIP_GRAD_NORM: 0.0
DETECT_ANOMALY: false
END_EPOCH: 300
EVAL_BEGIN_EPOCH: 0
GAMMA1: 0.99
GAMMA2: 0.0
IMAGE_SIZE: - 224
- 224
LR: 0.16
LR_SCHEDULER:
ARGS:
cooldown_epochs: 10
decay_rate: 0.1
epochs: 300
min_lr: 1.0e-05
sched: cosine
warmup_epochs: 5
warmup_lr: 1.0e-06
METHOD: timm
MOMENTUM: 0.9
NESTEROV: true
OPTIMIZER: adamW
OPTIMIZER_ARGS: {}
SAVE_ALL_MODELS: false
SCALE_LR: true
SHUFFLE: true
WD: 0.05
WITHOUT_WD_LIST: - bn
- bias
- ln
VERBOSE: true
WORKERS: 6
from cvt.
it seems that you are using a larger LR.
If you specify BATCH_SIZE_PER_GPU to 128, you should specify LR to 0.000125.
The LR in our config is with respect to BATCH_SIZE_PER_GPU. You are using a much larger LR than our original config. I guess that's the reason you got NaN error.
from cvt.
Thank you very much! I will try to use small lr later.
from cvt.
Related Issues (20)
- About Cls_cvt.py HOT 2
- what should I change if I want to use a data set with images of 750* 184 HOT 1
- 22k model
- About the pretrained model HOT 4
- Question about the class token
- η²ΎεΊ¦
- As for Cifar10 or Cifar100
- How to calculate the flops of the model?
- recommended torch version may be wrong
- modelzoo not available!!!!! HOT 1
- Recommend change the code
- What's the accuracy of CvT-13 without pre-trained on CIFAR10
- model release
- How can I use CvT architecture to perform Semantic Segmentation?
- bugs when eval
- The config of W24 of finetune on 1k
- pretrained models are not available.
- the flops computed by this code don't match that in the paper
- code mismatch with the theory HOT 10
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
π Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. πππ
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google β€οΈ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cvt.