NaN loss (cvt issue, closed)

microsoft commented on August 28, 2024
NAN loss

from cvt.

Comments (4)

leoxiaobin commented on August 28, 2024

Hi @tzt101,

Could you paste the printed configuration for your job?

tzt101 commented on August 28, 2024

This is the configuration; I just kept the default settings.
AMP:
  ENABLED: true
  MEMORY_FORMAT: nchw
AUG:
  COLOR_JITTER:
  - 0.4
  - 0.4
  - 0.4
  - 0.1
  - 0.0
  DROPBLOCK_BLOCK_SIZE: 7
  DROPBLOCK_KEEP_PROB: 1.0
  DROPBLOCK_LAYERS:
  - 3
  - 4
  GAUSSIAN_BLUR: 0.0
  GRAY_SCALE: 0.0
  INTERPOLATION: 2
  MIXCUT: 1.0
  MIXCUT_AND_MIXUP: false
  MIXCUT_MINMAX: []
  MIXUP: 0.8
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RATIO:
  - 0.75
  - 1.3333333333333333
  SCALE:
  - 0.08
  - 1.0
  TIMM_AUG:
    AUTO_AUGMENT: rand-m9-mstd0.5-inc1
    COLOR_JITTER: 0.4
    HFLIP: 0.5
    INTERPOLATION: bicubic
    RE_COUNT: 1
    RE_MODE: pixel
    RE_PROB: 0.25
    RE_SPLIT: false
    USE_LOADER: true
    USE_TRANSFORM: false
    VFLIP: 0.0
BASE:
- ''
CUDNN:
  BENCHMARK: true
  DETERMINISTIC: false
  ENABLED: true
DATASET:
  DATASET: imagenet
  DATA_FORMAT: jpg
  LABELMAP: ''
  ROOT: /home/tzt/dataset/imagenet/
  SAMPLER: default
  TARGET_SIZE: -1
  TEST_SET: val
  TEST_TSV_LIST: []
  TRAIN_SET: train
  TRAIN_TSV_LIST: []
DATA_DIR: ''
DEBUG:
  DEBUG: false
DIST_BACKEND: nccl
FINETUNE:
  BASE_LR: 0.003
  BATCH_SIZE: 512
  EVAL_EVERY: 3000
  FINETUNE: false
  FROZEN_LAYERS: []
  LR_SCHEDULER:
    DECAY_TYPE: step
  TRAIN_MODE: true
  USE_TRAIN_AUG: false
GPUS:
- 0
INPUT:
  MEAN:
  - 0.485
  - 0.456
  - 0.406
  STD:
  - 0.229
  - 0.224
  - 0.225
LOSS:
  LABEL_SMOOTHING: 0.1
  LOSS: softmax
MODEL:
  INIT_WEIGHTS: true
  NAME: cls_cvt
  NUM_CLASSES: 1000
  PRETRAINED: ''
  PRETRAINED_LAYERS:
  - '*'
  SPEC:
    ATTN_DROP_RATE:
    - 0.0
    - 0.0
    - 0.0
    CLS_TOKEN:
    - false
    - false
    - true
    DEPTH:
    - 1
    - 2
    - 10
    DIM_EMBED:
    - 64
    - 192
    - 384
    DROP_PATH_RATE:
    - 0.0
    - 0.0
    - 0.1
    DROP_RATE:
    - 0.0
    - 0.0
    - 0.0
    INIT: trunc_norm
    KERNEL_QKV:
    - 3
    - 3
    - 3
    MLP_RATIO:
    - 4.0
    - 4.0
    - 4.0
    NUM_HEADS:
    - 1
    - 3
    - 6
    NUM_STAGES: 3
    PADDING_KV:
    - 1
    - 1
    - 1
    PADDING_Q:
    - 1
    - 1
    - 1
    PATCH_PADDING:
    - 2
    - 1
    - 1
    PATCH_SIZE:
    - 7
    - 3
    - 3
    PATCH_STRIDE:
    - 4
    - 2
    - 2
    POS_EMBED:
    - false
    - false
    - false
    QKV_BIAS:
    - true
    - true
    - true
    QKV_PROJ_METHOD:
    - dw_bn
    - dw_bn
    - dw_bn
    STRIDE_KV:
    - 2
    - 2
    - 2
    STRIDE_Q:
    - 1
    - 1
    - 1
MODEL_SUMMARY: false
MULTIPROCESSING_DISTRIBUTED: true
NAME: cvt-13-224x224
OUTPUT_DIR: OUTPUT/
PIN_MEMORY: true
PRINT_FREQ: 500
RANK: 0
TEST:
  BATCH_SIZE_PER_GPU: 32
  CENTER_CROP: true
  IMAGE_SIZE:
  - 224
  - 224
  INTERPOLATION: 3
  MODEL_FILE: ''
  REAL_LABELS: false
  VALID_LABELS: ''
TRAIN:
  AUTO_RESUME: true
  BATCH_SIZE_PER_GPU: 128
  BEGIN_EPOCH: 0
  CHECKPOINT: ''
  CLIP_GRAD_NORM: 0.0
  DETECT_ANOMALY: false
  END_EPOCH: 300
  EVAL_BEGIN_EPOCH: 0
  GAMMA1: 0.99
  GAMMA2: 0.0
  IMAGE_SIZE:
  - 224
  - 224
  LR: 0.16
  LR_SCHEDULER:
    ARGS:
      cooldown_epochs: 10
      decay_rate: 0.1
      epochs: 300
      min_lr: 1.0e-05
      sched: cosine
      warmup_epochs: 5
      warmup_lr: 1.0e-06
    METHOD: timm
  MOMENTUM: 0.9
  NESTEROV: true
  OPTIMIZER: adamW
  OPTIMIZER_ARGS: {}
  SAVE_ALL_MODELS: false
  SCALE_LR: true
  SHUFFLE: true
  WD: 0.05
  WITHOUT_WD_LIST:
  - bn
  - bias
  - ln
VERBOSE: true
WORKERS: 6

leoxiaobin commented on August 28, 2024

It seems that you are using a much larger LR than intended.
If you set BATCH_SIZE_PER_GPU to 128, you should set LR to 0.000125.
The LR in our config is specified with respect to BATCH_SIZE_PER_GPU, and the LR of 0.16 in your configuration is far larger than the one in our original config. I guess that's why you got the NaN loss.
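
The arithmetic behind this advice can be sketched as below. This is only an illustration, assuming the LR scales linearly with BATCH_SIZE_PER_GPU; the thread gives a single reference point (LR 0.000125 at a per-GPU batch of 128) and does not state the exact rule. The helper names `scale_lr` and `lr_excess` are hypothetical and are not part of the cvt code base.

```python
# Reference point quoted in the reply above: LR 0.000125 at 128 images per GPU.
REF_LR = 0.000125
REF_BATCH_PER_GPU = 128

# TRAIN.LR from the posted configuration.
POSTED_LR = 0.16


def scale_lr(ref_lr: float, ref_batch_per_gpu: int, batch_per_gpu: int) -> float:
    """Assumed linear scaling: LR grows in proportion to the per-GPU batch size."""
    if ref_batch_per_gpu <= 0 or batch_per_gpu <= 0:
        raise ValueError("batch sizes must be positive")
    return ref_lr * batch_per_gpu / ref_batch_per_gpu


def lr_excess(posted_lr: float, batch_per_gpu: int) -> float:
    """How many times larger the posted LR is than the scaled reference LR."""
    return posted_lr / scale_lr(REF_LR, REF_BATCH_PER_GPU, batch_per_gpu)


if __name__ == "__main__":
    suggested = scale_lr(REF_LR, REF_BATCH_PER_GPU, 128)
    print(f"suggested LR for 128 per GPU: {suggested:g}")
    print(f"posted LR is about {lr_excess(POSTED_LR, 128):.0f}x larger")
```

Under that assumption, the posted LR of 0.16 is roughly three orders of magnitude above the suggested value for the same per-GPU batch size, which is consistent with the loss diverging to NaN early in training.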

tzt101 commented on August 28, 2024

Thank you very much! I will try a smaller LR later.
