
Very noisy loss of DINO · detrex · 4 comments · CLOSED

hg6185 commented on June 3, 2024
Very noisy loss of DINO

from detrex.

Comments (4)

rentainhe commented on June 3, 2024

Hello,

I am fine-tuning DINO (Swin-B) on a small, imbalanced custom dataset. I initially used the 0.0002 learning rate together with the multiplier, but then reduced the learning rate to 0.0001, as in your paper. You also write that DINO is very sensitive to the learning-rate hyperparameter.

My loss is very noisy. On average it decreases quickly, but there are also high spikes. Did you observe something similar, or could this be related to the imbalance of the dataset?

Thank you so much in advance!
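
(For reference, the learning-rate settings discussed here map onto a few keys in detrex's LazyConfig system. A minimal sketch follows, with the reduced LR from the description above and the backbone factor from the config pasted further down in this thread; this is not a complete config.)

```python
# Minimal sketch of the LR overrides discussed above (not a full config).
from detrex.config import get_config

optimizer = get_config("common/optim.py").AdamW
lr_multiplier = get_config("common/coco_schedule.py").lr_multiplier_12ep

optimizer.lr = 1e-4            # reduced from the initial 2e-4
optimizer.weight_decay = 1e-4
# keep the backbone on a 10x smaller learning rate than the detection heads
optimizer.params.lr_factor_func = (
    lambda module_name: 0.1 if "backbone" in module_name else 1
)
```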

Would you like to share your training log with us? It may be very helpful!

from detrex.

hg6185 commented on June 3, 2024

Hey,
here it is, for a Swin-S. I switched because of the smaller window-attention size, since my APs (AP on small objects) was always very bad. In general, the model converges to something that works well for large objects but rather badly for small objects, and I assume the loss is so noisy because of that. Have you encountered this phenomenon before, and do you know whether a non-optimal learning rate and small batch sizes can trap the DINO model in a sub-optimal state? Are there any known tricks to improve convergence on small objects with DINO?
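
(One commonly tried tweak for small objects in this kind of setup is to keep larger image scales in the multi-scale resize augmentation. A hedged sketch is below; the exact mapper attribute names in detrex's common/data/coco_detr.py are assumptions and may differ.)

```python
# Hedged sketch: bias the multi-scale training resolutions toward larger
# sizes to help small-object recall. Assumes the train mapper exposes an
# `augmentation` list of detectron2 transforms, as in coco_detr.py.
import detectron2.data.transforms as T
from detrex.config import get_config

dataloader = get_config("common/data/coco_detr.py").dataloader
dataloader.train.mapper.augmentation = [
    T.RandomFlip(),
    T.ResizeShortestEdge(
        short_edge_length=(640, 672, 704, 736, 768, 800),
        max_size=1333,
        sample_style="choice",
    ),
]
```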

The two images show training loss and validation loss:

[figure: training loss curve]
[figure: validation loss curve]

So far, I have also set the image size to 640+.
Thank you in advance!

The maximum number of iterations was around 3000, with a training set of only roughly 1400 images. I stepped the learning rate down by factors of 0.5 and 0.1 after 1k and 2k iterations, respectively.
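
(In detrex's common/coco_schedule.py style, that schedule corresponds roughly to the sketch below. The milestones follow the numbers above; the warmup settings are assumptions, not taken from the actual run.)

```python
# Hedged sketch of the described schedule: full LR until 1k iterations,
# 0.5x until 2k, 0.1x until the ~3k maximum, plus a short assumed warmup.
from detectron2.config import LazyCall as L
from detectron2.solver import WarmupParamScheduler
from fvcore.common.param_scheduler import MultiStepParamScheduler

lr_multiplier = L(WarmupParamScheduler)(
    scheduler=L(MultiStepParamScheduler)(
        values=[1.0, 0.5, 0.1],
        milestones=[1000, 2000, 3000],
    ),
    warmup_length=100 / 3000,   # assumed: 100 warmup iterations
    warmup_method="linear",
    warmup_factor=0.001,
)
```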

sh: readlink: command not found
[INFO] Module CUDA/11.3.1 loaded.
sh: readlink: command not found
[INFO] Module GCC/9.4.0 loaded.
[INFO] Module numactl/2.0.14 loaded.

Inactive Modules:

  1. UCX/1.12.1

Due to MODULEPATH changes, the following have been reloaded:

  1. numactl/2.0.14

The following have been reloaded with a version change:

  1. GCCcore/.11.3.0 => GCCcore/.9.4.0
  2. binutils/2.38 => binutils/2.36.1
  3. zlib/1.2.12 => zlib/1.2.11

[09/05 00:05:09 detectron2]: Rank of current process: 0. World size: 2
[09/05 00:05:11 detectron2]: Environment info:


sys.platform linux
Python 3.8.17 (default, Jul 5 2023, 20:41:08) [GCC 11.2.0]
numpy 1.22.3
detectron2 0.6 @ detectron2/detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.3
detectron2 arch flags 7.0
DETECTRON2_ENV_MODULE
PyTorch 1.12.1+cu113 @/home/_/miniconda3/envs/fps/lib/python3.8/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0,1 Tesla V100-SXM2-16GB (arch=7.0)
Driver version
CUDA_HOME /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/CUDA/11.3.1
Pillow 9.4.0
torchvision 0.13.1+cu113 @/home/cg072483/miniconda3/envs/fps/lib/python3.8/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.8.0


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[09/05 00:05:11 detectron2]: Command line arguments: Namespace(confidence_threshold=0.5, config_file='./detrex/projects/dino/configs/dino-swin/dino_swin_small_224_4scale_12ep.py', img_format='RGB', input=None, max_size_test=1333, metadata_dataset='coco_2017_val', min_size_test=800, num_gpus=1, opts=['train.init_checkpoint=./models/dino_swin_small_224_4scale_12ep.pth'], output=None, resume=False, video_input=None, webcam=False)
[09/05 00:05:11 detectron2]: Contents of args.config_file=./detrex/projects/dino/configs/dino-swin/dino_swin_small_224_4scale_12ep.py:
from detrex.config import get_config
from ..models.dino_swin_small_224 import model

# get default config

dataloader = get_config("common/data/coco_detr.py").dataloader
optimizer = get_config("common/optim.py").AdamW
lr_multiplier = get_config("common/coco_schedule.py").lr_multiplier_12ep
train = get_config("common/train.py").train

# modify training config

train.init_checkpoint = "/path/to/swin_small_patch4_window7_224.pth"
train.output_dir = "./output/dino_swin_small_224_4scale_12ep"

# max training iterations

train.max_iter = 90000
train.eval_period = 5000
train.log_period = 20
train.checkpointer.period = 5000

# gradient clipping for training

train.clip_grad.enabled = True
train.clip_grad.params.max_norm = 0.1
train.clip_grad.params.norm_type = 2

# set training devices

train.device = "cuda"
model.device = train.device

# modify optimizer config

optimizer.lr = 1e-4
optimizer.betas = (0.9, 0.999)
optimizer.weight_decay = 1e-4
optimizer.params.lr_factor_func = lambda module_name: 0.1 if "backbone" in module_name else 1

# modify dataloader config

dataloader.train.num_workers = 16

# please notice that this is total batch size.
# suppose you're using 4 gpus for training and the batch size for
# each gpu is 16/4 = 4

dataloader.train.total_batch_size = 16

WARNING [09/05 00:05:11 d2.config.lazy]: The config contains objects that cannot serialize to a valid yaml. ./output/dino_swin_small_224_4scale_12ep/config.yaml is human-readable but cannot be loaded.
WARNING [09/05 00:05:11 d2.config.lazy]: Config is saved using cloudpickle at ./output/dino_swin_small_224_4scale_12ep/config.yaml.pkl.
[09/05 00:05:11 detectron2]: Full config saved to ./output/dino_swin_small_224_4scale_12ep/config.yaml
[09/05 00:05:11 d2.utils.env]: Using a generated random seed 11479259

Backbone was successfully frozen! Trainable params: 20518719
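
(The "Backbone was successfully frozen!" line is not printed by stock detrex; it comes from a custom modification. A freeze like that is typically done roughly as sketched below; the function name and exact hook point are assumptions.)

```python
# Hedged sketch of a backbone freeze that would produce the log line above.
def freeze_backbone(model):
    for p in model.backbone.parameters():
        p.requires_grad_(False)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Backbone was successfully frozen! Trainable params: {trainable}")
```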

DINO(
(backbone): SwinTransformer(
(patch_embed): PatchEmbed(
(proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
(norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
)
(pos_drop): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=96, out_features=288, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): Identity()
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=96, out_features=288, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.009)
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=384, out_features=192, bias=False)
(norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
)
)
(1): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=192, out_features=576, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.017)
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=192, out_features=576, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.026)
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=768, out_features=384, bias=False)
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(2): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.035)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.043)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(2): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.052)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(3): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.061)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(4): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.070)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(5): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.078)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(6): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.087)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(7): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.096)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(8): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.104)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(9): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.113)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(10): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.122)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(11): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.130)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(12): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.139)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(13): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.148)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(14): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.157)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(15): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.165)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(16): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.174)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(17): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.183)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=1536, out_features=768, bias=False)
(norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
)
)
(3): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=768, out_features=2304, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.191)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=768, out_features=2304, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.200)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
)
)
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(position_embedding): PositionEmbeddingSine()
(neck): ChannelMapper(
(convs): ModuleList(
(0): ConvNormAct(
(conv): Conv2d(192, 256, kernel_size=(1, 1), stride=(1, 1))
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(1): ConvNormAct(
(conv): Conv2d(384, 256, kernel_size=(1, 1), stride=(1, 1))
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(2): ConvNormAct(
(conv): Conv2d(768, 256, kernel_size=(1, 1), stride=(1, 1))
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
)
(extra_convs): ModuleList(
(0): ConvNormAct(
(conv): Conv2d(768, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
)
)
(transformer): DINOTransformer(
(encoder): DINOTransformerEncoder(
(layers): ModuleList(
(0): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(1): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(2): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(3): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(4): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(5): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
)
(decoder): DINOTransformerDecoder(
(layers): ModuleList(
(0): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiheadAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(proj_drop): Dropout(p=0.0, inplace=False)
)
(1): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(1): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiheadAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(proj_drop): Dropout(p=0.0, inplace=False)
)
(1): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(2): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiheadAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(proj_drop): Dropout(p=0.0, inplace=False)
)
(1): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(3): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiheadAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(proj_drop): Dropout(p=0.0, inplace=False)
)
(1): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(4): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiheadAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(proj_drop): Dropout(p=0.0, inplace=False)
)
(1): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(5): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiheadAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(proj_drop): Dropout(p=0.0, inplace=False)
)
(1): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
(ref_point_head): MLP(
(layers): ModuleList(
(0): Linear(in_features=512, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(class_embed): ModuleList(
(0): Linear(in_features=256, out_features=5, bias=True)
(1): Linear(in_features=256, out_features=5, bias=True)
(2): Linear(in_features=256, out_features=5, bias=True)
(3): Linear(in_features=256, out_features=5, bias=True)
(4): Linear(in_features=256, out_features=5, bias=True)
(5): Linear(in_features=256, out_features=5, bias=True)
(6): Linear(in_features=256, out_features=5, bias=True)
)
(bbox_embed): ModuleList(
(0): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(1): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(2): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(3): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(4): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(5): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(6): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
)
)
(tgt_embed): Embedding(900, 256)
(enc_output): Linear(in_features=256, out_features=256, bias=True)
(enc_output_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(class_embed): ModuleList(
(0): Linear(in_features=256, out_features=5, bias=True)
(1): Linear(in_features=256, out_features=5, bias=True)
(2): Linear(in_features=256, out_features=5, bias=True)
(3): Linear(in_features=256, out_features=5, bias=True)
(4): Linear(in_features=256, out_features=5, bias=True)
(5): Linear(in_features=256, out_features=5, bias=True)
(6): Linear(in_features=256, out_features=5, bias=True)
)
(bbox_embed): ModuleList(
(0): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(1): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(2): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(3): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(4): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(5): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(6): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
)
(criterion): Criterion DINOCriterion
matcher: Matcher HungarianMatcher
cost_class: 2.0
cost_bbox: 5.0
cost_giou: 2.0
cost_class_type: focal_loss_cost
focal cost alpha: 0.25
focal cost gamma: 2.0
losses: ['class', 'boxes']
loss_class_type: focal_loss
weight_dict: {'loss_class': 1, 'loss_bbox': 5.0, 'loss_giou': 2.0, 'loss_class_dn': 1, 'loss_bbox_dn': 5.0, 'loss_giou_dn': 2.0, 'loss_class_enc': 1, 'loss_bbox_enc': 5.0, 'loss_giou_enc': 2.0, 'loss_class_dn_enc': 1, 'loss_bbox_dn_enc': 5.0, 'loss_giou_dn_enc': 2.0, 'loss_class_0': 1, 'loss_bbox_0': 5.0, 'loss_giou_0': 2.0, 'loss_class_dn_0': 1, 'loss_bbox_dn_0': 5.0, 'loss_giou_dn_0': 2.0, 'loss_class_1': 1, 'loss_bbox_1': 5.0, 'loss_giou_1': 2.0, 'loss_class_dn_1': 1, 'loss_bbox_dn_1': 5.0, 'loss_giou_dn_1': 2.0, 'loss_class_2': 1, 'loss_bbox_2': 5.0, 'loss_giou_2': 2.0, 'loss_class_dn_2': 1, 'loss_bbox_dn_2': 5.0, 'loss_giou_dn_2': 2.0, 'loss_class_3': 1, 'loss_bbox_3': 5.0, 'loss_giou_3': 2.0, 'loss_class_dn_3': 1, 'loss_bbox_dn_3': 5.0, 'loss_giou_dn_3': 2.0, 'loss_class_4': 1, 'loss_bbox_4': 5.0, 'loss_giou_4': 2.0, 'loss_class_dn_4': 1, 'loss_bbox_dn_4': 5.0, 'loss_giou_dn_4': 2.0}
num_classes: 5
eos_coef: None
focal loss alpha: 0.25
focal loss gamma: 2.0
(label_enc): Embedding(5, 256)
)

from detrex.

rentainhe avatar rentainhe commented on June 3, 2024

Sorry for the late reply. I think this is a normal loss-fluctuation phenomenon; after the learning rate decreases, it will be alleviated and the model will gradually converge. @hg6185
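
(One practical way to judge whether the run is converging despite the spikes is to smooth the total_loss that detectron2 logs to metrics.json in train.output_dir. A hedged sketch follows; the path comes from the config above, and the window size is an assumption.)

```python
# Hedged sketch: moving-average smoothing of the logged training loss.
import json

def smoothed_loss(path="./output/dino_swin_small_224_4scale_12ep/metrics.json",
                  window=50):
    iters, losses = [], []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if "total_loss" in rec:
                iters.append(rec.get("iteration", len(iters)))
                losses.append(rec["total_loss"])
    smoothed = [
        sum(losses[max(0, i - window + 1): i + 1])
        / (i - max(0, i - window + 1) + 1)
        for i in range(len(losses))
    ]
    return iters, smoothed
```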

from detrex.

hg6185 avatar hg6185 commented on June 3, 2024

Sorry, I meant to thank you :)

from detrex.
