Hello,
I am fine-tuning DINO (Swin-B) on a small, imbalanced custom dataset. I initially used the 0.0002 learning rate with the multiplier, but then reduced it to 0.0001 as in your paper. You also write that DINO is very sensitive to the learning-rate hyperparameter.
My training loss is very noisy: on average it decreases quickly, but there are also high spikes. Did you observe something similar, or could this be related to the imbalance of the dataset?
Thank you so much in advance!
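To tell real divergence apart from per-batch noise, it can help to look at a moving average of the loss rather than the raw curve. A minimal sketch (plain NumPy; the function name is my own, not from detrex):

```python
import numpy as np

def smooth(losses, window=50):
    """Moving average of a loss curve, to separate the downward
    trend from per-batch spikes."""
    losses = np.asarray(losses, dtype=float)
    kernel = np.ones(window) / window
    # 'valid' drops the edges where the window is only partially filled
    return np.convolve(losses, kernel, mode="valid")
```

If the smoothed curve still decreases steadily, the spikes are probably just minibatch noise (which a small, imbalanced dataset makes worse); if the smoothed curve itself jumps, the learning rate is more likely the culprit.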
Would you like to share your training log with us? That may be very helpful!
from detrex.
Hey,
here is the log for a Swin-S. I switched because of the smaller window-attention size, since my APs (AP on small objects) was always very bad. In general, the model converges toward something that works well for large objects but rather poorly for small ones, and I assume the loss is so noisy because of that. Have you encountered this phenomenon before, and do you know whether a non-optimal learning rate and small batch sizes can trap the DINO model in a suboptimal state? Are there known tricks to improve convergence on smaller objects with DINO?
The two images show training loss and validation loss:
So far, I have also set the image size to 640+.
Thank you already in advance!
The maximum number of iterations was around 3000 with a training dataset of only roughly 1400 images; I stepped the learning rate after 1k and 2k iterations by factors of 0.5 and 0.1, respectively.
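For reference, the step schedule just described can be written as a small helper. This is a sketch of my interpretation (base LR 1e-4 from the config, multiplied by 0.5 after 1k and by 0.1 after 2k iterations; the function name and defaults are made up, not detrex's `lr_multiplier`):

```python
def lr_at(it, base_lr=1e-4, milestones=(1000, 2000), factors=(0.5, 0.1)):
    """Step schedule: base LR until the first milestone, then
    base_lr * factor for each milestone passed."""
    lr = base_lr
    for milestone, factor in zip(milestones, factors):
        if it >= milestone:
            lr = base_lr * factor
    return lr
```

So the effective LR is 1e-4 for iterations 0–999, 5e-5 for 1000–1999, and 1e-5 afterwards.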
sh: readlink: command not found
[INFO] Module CUDA/11.3.1 loaded.
sh: readlink: command not found
[INFO] Module GCC/9.4.0 loaded.
[INFO] Module numactl/2.0.14 loaded.
Inactive Modules:
- UCX/1.12.1
Due to MODULEPATH changes, the following have been reloaded:
- numactl/2.0.14
The following have been reloaded with a version change:
- GCCcore/.11.3.0 => GCCcore/.9.4.0
- zlib/1.2.12 => zlib/1.2.11
- binutils/2.38 => binutils/2.36.1
[09/05 00:05:09 detectron2]: Rank of current process: 0. World size: 2
[09/05 00:05:11 detectron2]: Environment info:
sys.platform linux
Python 3.8.17 (default, Jul 5 2023, 20:41:08) [GCC 11.2.0]
numpy 1.22.3
detectron2 0.6 @
detectron2/detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.3
detectron2 arch flags 7.0
DETECTRON2_ENV_MODULE
PyTorch 1.12.1+cu113 @/home/_/miniconda3/envs/fps/lib/python3.8/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0,1 Tesla V100-SXM2-16GB (arch=7.0)
Driver version
CUDA_HOME /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/CUDA/11.3.1
Pillow 9.4.0
torchvision 0.13.1+cu113 @/home/cg072483/miniconda3/envs/fps/lib/python3.8/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.8.0
PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
- CuDNN 8.3.2 (built against CUDA 11.5)
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
[09/05 00:05:11 detectron2]: Command line arguments: Namespace(confidence_threshold=0.5, config_file='./detrex/projects/dino/configs/dino-swin/dino_swin_small_224_4scale_12ep.py', img_format='RGB', input=None, max_size_test=1333, metadata_dataset='coco_2017_val', min_size_test=800, num_gpus=1, opts=['train.init_checkpoint=./models/dino_swin_small_224_4scale_12ep.pth'], output=None, resume=False, video_input=None, webcam=False)
[09/05 00:05:11 detectron2]: Contents of args.config_file=./detrex/projects/dino/configs/dino-swin/dino_swin_small_224_4scale_12ep.py:
from detrex.config import get_config
from ..models.dino_swin_small_224 import model

# get default config
dataloader = get_config("common/data/coco_detr.py").dataloader
optimizer = get_config("common/optim.py").AdamW
lr_multiplier = get_config("common/coco_schedule.py").lr_multiplier_12ep
train = get_config("common/train.py").train

# modify training config
train.init_checkpoint = "/path/to/swin_small_patch4_window7_224.pth"
train.output_dir = "./output/dino_swin_small_224_4scale_12ep"

# max training iterations
train.max_iter = 90000
train.eval_period = 5000
train.log_period = 20
train.checkpointer.period = 5000

# gradient clipping for training
train.clip_grad.enabled = True
train.clip_grad.params.max_norm = 0.1
train.clip_grad.params.norm_type = 2

# set training devices
train.device = "cuda"
model.device = train.device

# modify optimizer config
optimizer.lr = 1e-4
optimizer.betas = (0.9, 0.999)
optimizer.weight_decay = 1e-4
optimizer.params.lr_factor_func = lambda module_name: 0.1 if "backbone" in module_name else 1

# modify dataloader config
dataloader.train.num_workers = 16

# please notice that this is the total batch size:
# suppose you're using 4 GPUs for training, then the batch size for
# each GPU is 16/4 = 4
dataloader.train.total_batch_size = 16
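(Aside on the `lr_factor_func` line in the config above: the way such a per-module factor typically turns into optimizer parameter groups can be sketched as below. This is an illustration of the idea, not detrex's actual implementation; all names here are hypothetical.)

```python
def build_param_groups(named_params, base_lr=1e-4,
                       lr_factor_func=lambda name: 0.1 if "backbone" in name else 1.0):
    """One param group per parameter, with the LR scaled by the
    factor returned for that parameter's module name."""
    return [
        {"params": [param], "lr": base_lr * lr_factor_func(name)}
        for name, param in named_params
    ]
```

With the config's factor, backbone parameters train at 1e-5 while the transformer head trains at the full 1e-4.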
WARNING [09/05 00:05:11 d2.config.lazy]: The config contains objects that cannot serialize to a valid yaml. ./output/dino_swin_small_224_4scale_12ep/config.yaml is human-readable but cannot be loaded.
WARNING [09/05 00:05:11 d2.config.lazy]: Config is saved using cloudpickle at ./output/dino_swin_small_224_4scale_12ep/config.yaml.pkl.
[09/05 00:05:11 detectron2]: Full config saved to ./output/dino_swin_small_224_4scale_12ep/config.yaml
[09/05 00:05:11 d2.utils.env]: Using a generated random seed 11479259
Backbone was successfully frozen! Trainable params: 20518719
DINO(
(backbone): SwinTransformer(
(patch_embed): PatchEmbed(
(proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
(norm): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
)
(pos_drop): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=96, out_features=288, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): Identity()
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=96, out_features=288, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=96, out_features=96, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.009)
(norm2): LayerNorm((96,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=96, out_features=384, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=384, out_features=96, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=384, out_features=192, bias=False)
(norm): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
)
)
(1): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=192, out_features=576, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.017)
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=192, out_features=576, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=192, out_features=192, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.026)
(norm2): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=192, out_features=768, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=768, out_features=192, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=768, out_features=384, bias=False)
(norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(2): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.035)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.043)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(2): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.052)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(3): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.061)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(4): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.070)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(5): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.078)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(6): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.087)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(7): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.096)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(8): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.104)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(9): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.113)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(10): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.122)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(11): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.130)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(12): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.139)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(13): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.148)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(14): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.157)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(15): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.165)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(16): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.174)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(17): SwinTransformerBlock(
(norm1): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=384, out_features=1152, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=384, out_features=384, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.183)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=384, out_features=1536, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=1536, out_features=384, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
(reduction): Linear(in_features=1536, out_features=768, bias=False)
(norm): LayerNorm((1536,), eps=1e-05, elementwise_affine=True)
)
)
(3): BasicLayer(
(blocks): ModuleList(
(0): SwinTransformerBlock(
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=768, out_features=2304, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.191)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): SwinTransformerBlock(
(norm1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(attn): WindowAttention(
(qkv): Linear(in_features=768, out_features=2304, bias=True)
(attn_drop): Dropout(p=0.0, inplace=False)
(proj): Linear(in_features=768, out_features=768, bias=True)
(proj_drop): Dropout(p=0.0, inplace=False)
(softmax): Softmax(dim=-1)
)
(drop_path): DropPath(drop_prob=0.200)
(norm2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(mlp): Mlp(
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(act): GELU(approximate=none)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(drop): Dropout(p=0.0, inplace=False)
)
)
)
)
)
(norm1): LayerNorm((192,), eps=1e-05, elementwise_affine=True)
(norm2): LayerNorm((384,), eps=1e-05, elementwise_affine=True)
(norm3): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(position_embedding): PositionEmbeddingSine()
(neck): ChannelMapper(
(convs): ModuleList(
(0): ConvNormAct(
(conv): Conv2d(192, 256, kernel_size=(1, 1), stride=(1, 1))
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(1): ConvNormAct(
(conv): Conv2d(384, 256, kernel_size=(1, 1), stride=(1, 1))
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(2): ConvNormAct(
(conv): Conv2d(768, 256, kernel_size=(1, 1), stride=(1, 1))
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
)
(extra_convs): ModuleList(
(0): ConvNormAct(
(conv): Conv2d(768, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
)
)
(transformer): DINOTransformer(
(encoder): DINOTransformerEncoder(
(layers): ModuleList(
(0): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(1): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(2): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(3): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(4): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(5): BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
)
(decoder): DINOTransformerDecoder(
(layers): ModuleList(
(0-5): 6 x BaseTransformerLayer(
(attentions): ModuleList(
(0): MultiheadAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(proj_drop): Dropout(p=0.0, inplace=False)
)
(1): MultiScaleDeformableAttention(
(dropout): Dropout(p=0.0, inplace=False)
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
)
(ffns): ModuleList(
(0): FFN(
(activation): ReLU(inplace=True)
(layers): Sequential(
(0): Sequential(
(0): Linear(in_features=256, out_features=2048, bias=True)
(1): ReLU(inplace=True)
(2): Dropout(p=0.0, inplace=False)
)
(1): Linear(in_features=2048, out_features=256, bias=True)
(2): Dropout(p=0.0, inplace=False)
)
)
)
(norms): ModuleList(
(0-2): 3 x LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
(ref_point_head): MLP(
(layers): ModuleList(
(0): Linear(in_features=512, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(class_embed): ModuleList(
(0-6): 7 x Linear(in_features=256, out_features=5, bias=True)
)
(bbox_embed): ModuleList(
(0-6): 7 x MLP(
(layers): ModuleList(
(0-1): 2 x Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
)
)
(tgt_embed): Embedding(900, 256)
(enc_output): Linear(in_features=256, out_features=256, bias=True)
(enc_output_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(class_embed): ModuleList(
(0-6): 7 x Linear(in_features=256, out_features=5, bias=True)
)
(bbox_embed): ModuleList(
(0-6): 7 x MLP(
(layers): ModuleList(
(0-1): 2 x Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
)
(criterion): Criterion DINOCriterion
matcher: Matcher HungarianMatcher
cost_class: 2.0
cost_bbox: 5.0
cost_giou: 2.0
cost_class_type: focal_loss_cost
focal cost alpha: 0.25
focal cost gamma: 2.0
losses: ['class', 'boxes']
loss_class_type: focal_loss
weight_dict: {'loss_class': 1, 'loss_bbox': 5.0, 'loss_giou': 2.0, 'loss_class_dn': 1, 'loss_bbox_dn': 5.0, 'loss_giou_dn': 2.0, 'loss_class_enc': 1, 'loss_bbox_enc': 5.0, 'loss_giou_enc': 2.0, 'loss_class_dn_enc': 1, 'loss_bbox_dn_enc': 5.0, 'loss_giou_dn_enc': 2.0, 'loss_class_0': 1, 'loss_bbox_0': 5.0, 'loss_giou_0': 2.0, 'loss_class_dn_0': 1, 'loss_bbox_dn_0': 5.0, 'loss_giou_dn_0': 2.0, 'loss_class_1': 1, 'loss_bbox_1': 5.0, 'loss_giou_1': 2.0, 'loss_class_dn_1': 1, 'loss_bbox_dn_1': 5.0, 'loss_giou_dn_1': 2.0, 'loss_class_2': 1, 'loss_bbox_2': 5.0, 'loss_giou_2': 2.0, 'loss_class_dn_2': 1, 'loss_bbox_dn_2': 5.0, 'loss_giou_dn_2': 2.0, 'loss_class_3': 1, 'loss_bbox_3': 5.0, 'loss_giou_3': 2.0, 'loss_class_dn_3': 1, 'loss_bbox_dn_3': 5.0, 'loss_giou_dn_3': 2.0, 'loss_class_4': 1, 'loss_bbox_4': 5.0, 'loss_giou_4': 2.0, 'loss_class_dn_4': 1, 'loss_bbox_dn_4': 5.0, 'loss_giou_dn_4': 2.0}
num_classes: 5
eos_coef: None
focal loss alpha: 0.25
focal loss gamma: 2.0
(label_enc): Embedding(5, 256)
)
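The criterion printout above shows `focal_loss` with alpha 0.25 and gamma 2.0, which already absorbs some of the class imbalance: well-classified examples are down-weighted so the rare classes matter relatively more. A minimal plain-Python sketch of the per-example focal weighting (not the detrex implementation, just the formula behind those two hyperparameters):

```python
import math

# Per-example focal loss: -alpha * (1 - p)**gamma * log(p),
# where p is the predicted probability of the ground-truth class.
# alpha=0.25 and gamma=2.0 match the criterion config printed above.
def focal_term(p, alpha=0.25, gamma=2.0):
    return -alpha * (1.0 - p) ** gamma * math.log(p)

# An easy, confident prediction contributes far less than a hard one,
# so abundant easy examples do not drown out the rare-class errors.
easy = focal_term(0.95)
hard = focal_term(0.10)
```

With gamma = 0 this reduces to plain (alpha-scaled) cross-entropy; raising gamma sharpens the down-weighting of easy examples.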
Sorry for the late reply. I think this is a normal loss-fluctuation phenomenon; after the learning rate decays, the spikes should be alleviated and the model should gradually converge. @hg6185
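For reference, the step schedule described earlier in the thread (base LR 1e-4, halved after 1k iterations, dropped to 0.1x after 2k, over roughly 3k total) can be sketched as a plain function. The milestones and factors here are read from the comments above, not detrex defaults:

```python
# Hypothetical sketch of the step LR schedule described in this thread:
# base LR 1e-4, multiplied by 0.5 after 1000 iterations and by 0.1
# (relative to the base) after 2000. Returns the LR at a given iteration.
def lr_at(iteration, base_lr=1e-4):
    if iteration < 1000:
        return base_lr
    if iteration < 2000:
        return base_lr * 0.5
    return base_lr * 0.1
```

In detrex/detectron2 configs this kind of schedule is usually expressed through a multi-step scheduler with warmup rather than a hand-written function; the point of the sketch is only that the spiky phase corresponds to the full-LR interval before the first milestone.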
Sorry, I meant to thank you :)