hyz-xmaster / swa_object_detection

246.0 4.0 26.0 19 MB

SWA Object Detection

License: Apache License 2.0

Python 99.90% Shell 0.08% Dockerfile 0.03%

object-detection instance-segmentation mscoco mmdetection deep-neural-networks

swa_object_detection's Issues

Performance with other optimizers, such as Adam and AdamW

Hi, thanks for your nice work.

I am wondering about the performance of SWA with other optimizers, such as Adam and AdamW.
Can it achieve a consistent performance gain?

If the original network is trained with Adam or AdamW, can SWA (with SGD) improve its performance?

Thanks very much.

problem in get_swa_model.py

Hi, @hyz-xmaster . Thanks for releasing the code.

When using your utility get_swa_model.py to get the averaged model, I ran into a problem. For example, if I want to average the checkpoints of epochs 13-24, I would intuitively pass starting=13 and ending=24 as args. However, the code actually gave me the average of epochs 13-23, because of this line:

swa_object_detection/swa/get_swa_model.py (line 28 in 2feb867)

model_names = list(range(starting_id, ending_id))

I think it would be better to use ending_id + 1 instead of ending_id, so that ending_id is inclusive and the arguments are easier to understand.
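The off-by-one comes from Python's range(), which excludes its stop value. Below is a minimal sketch of the suggested inclusive behaviour. inclusive_epoch_ids is a hypothetical helper for illustration, not a function in the repo.

```python
def inclusive_epoch_ids(starting_id, ending_id):
    """Return epoch ids from starting_id to ending_id, both inclusive."""
    # range() stops *before* its second argument, so add 1 to keep ending_id.
    return list(range(starting_id, ending_id + 1))


if __name__ == "__main__":
    print(list(range(13, 24))[-1])      # exclusive stop: last id is 23
    print(inclusive_epoch_ids(13, 24))  # 13 .. 24, twelve ids
```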

Compare traditional training results with SWA results at the same epoch level

Nice work making SWA work in object detection!
I have one question about the comparison at the same epoch level.
The results seem to show that Faster R-CNN R50 1x plus 1x of extra SWA training gets the same result as Faster R-CNN R50 2x.
I think there may be a problem with this comparison.

Faster R-CNN R50 1x plus 1x of extra SWA training uses cyclic training, but the original Faster R-CNN R50 2x uses a step-down LR schedule.
This mismatch may lead to different convergence. I think the best way to get a fair comparison is to train models from scratch with cyclic training.
Also, SWA needs to update the batch norm statistics to match the averaged weights. Frozen BN may harm the final ensemble result.
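The sketch below makes the schedule mismatch concrete. It compares the learning rate of a step-down 2x schedule over epochs 12-23 with one cosine cycle per extra epoch. The numbers are illustrative assumptions and were not read from this repo's configs: a base LR of 0.02, decay by 0.1 at epochs 16 and 22, and a cyclic LR running from 0.01 down to 0.0001.

```python
import math


def step_lr(epoch, base_lr=0.02, milestones=(16, 22), gamma=0.1):
    """Step-down schedule: multiply by gamma once per milestone reached (0-indexed epochs)."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)


def cyclic_lr(progress, max_lr=0.01, min_lr=0.0001):
    """Cosine decay from max_lr (progress=0) to min_lr (progress=1) within one cycle."""
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in range(12, 24):
        # 2x step LR at this epoch vs. start/end LR of that epoch's cosine cycle.
        print(epoch, step_lr(epoch), cyclic_lr(0.0), cyclic_lr(1.0))
```

Over the extra 12 epochs, the step schedule decays only twice. The cyclic schedule jumps back to its peak at the start of every epoch. The two runs therefore follow quite different optimization trajectories, which is the concern the issue raises.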

where is the circycle loss?

I can't find the circycle loss

Can the VOC dataset be used for SWA training?

When training with the VOC dataset, an error is raised during the validation stage.

Applying the code to my own model

How do I apply this method to my own training? Do I just train for 12 more epochs?

Why set step_up_ratio=0.0?

I just wonder whether it would be better to have some iterations during which the learning rate goes up.
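The sketch below shows what the "up" fraction controls in a repeating cosine cycle. It is a simplified schedule written for illustration, not mmcv's or this repo's implementation. The names cyclic_lr, up_ratio, max_lr and min_lr are hypothetical, and the LR values are assumptions.

```python
import math


def cosine(start, end, t):
    """Cosine interpolation from start (t=0) to end (t=1)."""
    return end + 0.5 * (start - end) * (1 + math.cos(math.pi * t))


def cyclic_lr(it, cycle_len, max_lr=0.01, min_lr=0.0001, up_ratio=0.0):
    """LR at global iteration `it` of a schedule that repeats every `cycle_len` iterations.

    The first `up_ratio` fraction of each cycle rises min_lr -> max_lr; the rest
    falls max_lr -> min_lr. With up_ratio=0.0 each cycle restarts at max_lr.
    """
    if not 0.0 <= up_ratio < 1.0:
        raise ValueError("up_ratio must be in [0, 1)")
    pos = it % cycle_len
    up_len = round(up_ratio * cycle_len)
    if pos < up_len:
        # Rising phase (absent when up_ratio == 0).
        return cosine(min_lr, max_lr, pos / up_len)
    # Falling phase over the remainder of the cycle.
    return cosine(max_lr, min_lr, (pos - up_len) / (cycle_len - up_len))


if __name__ == "__main__":
    for up in (0.0, 0.4):
        print(up, [round(cyclic_lr(i, 100, up_ratio=up), 5) for i in range(0, 100, 20)])
```

With up_ratio=0.0 the LR jumps straight back to its peak at every cycle boundary. A nonzero value warms up gradually instead. Whether the warm-up helps SWA is an empirical question that the sketch does not answer.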

Are there any comparisons on other metrics, such as mAP@0.5, mAP@0.75, etc.?

Describe the feature
Such as mAP@0.5, mAP@0.75, etc.

When I run Two-phase mode I get an error

Traceback (most recent call last):
  File "tools/train.py", line 188, in <module>
    main()
  File "tools/train.py", line 184, in main
    meta=meta)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/apis/train.py", line 175, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/evaluation/eval_hooks.py", line 149, in after_train_epoch
    self.save_best_checkpoint(runner, key_score)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/evaluation/eval_hooks.py", line 166, in save_best_checkpoint
    last_ckpt = runner.meta['hook_msgs']['last_ckpt']
KeyError: 'last_ckpt'

Mask R-CNN: save the best segm model

Hi,
I'm really glad that save_the_best_model has been updated. But I would like to ask: how can I save best_segm_mAP.pth instead of best_bbox_mAP.pth?
Thanks :)

Can I use dist_train.sh to train the original model?

I think I should train the original model first, and then train the extra checkpoints.

But the scripts below from README.md don't include a script for training the original model.
I think the first script only trains the extra 12/24 epochs of checkpoints, and the second script averages these 12/24 checkpoints into the final detection model.
(1) ./tools/dist_train.sh configs/swa/swa_mask_rcnn_r101_fpn_2x_coco.py
(2) ./swa/get_swa_model.py work_dirs/swa_mask_rcnn_r101_fpn_2x_coco 1 12 --save_dir work_dirs/swa_mask_rcnn

Maybe I can use the following script to train the original model:
./tools/dist_train.sh configs/mask_rcnn/mask_rcnn_r101_fpn_2x_coco.py work_dir=mymodel
Then I can use the following script to train the extra checkpoints, where I must set work_dir to the path where the original model is saved:
./tools/dist_train.sh configs/swa/swa_mask_rcnn_r101_fpn_2x_coco.py work_dir=mymodel

Am I right?
Thanks very much!

ModuleNotFoundError: No module named 'mmdet'

I have downloaded the master branch of swa_object_detection, and I also have mmdetection on my PC.

I have tried to train Mask R-CNN R50-FPN with the normal mmdetection code, but this error always occurs. I would appreciate any help!
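One way to narrow this error down is to check whether the interpreter running tools/train.py can see an mmdet package at all, and where it would load it from. The helper below is a hypothetical diagnostic that uses only the standard library. It does not tell you how this repo expects mmdet to be installed.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("mmdet:", locate_module("mmdet"))
```

If the result is None, the package is not visible to that interpreter. If it points to a different mmdetection checkout than expected, the wrong copy is being imported.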

what's the difference between swa_epoch_xx.pth and swa_model_xx.pth?

Hello sir! I appreciate your wonderful work, which helps a lot. But there's a question I can't figure out.

When I run Two-phase mode, after getting 12 traditional checkpoints, I get 2 checkpoints after each epoch. What is the difference between them, and which one should I use?

epoch_1.pth to epoch_12.pth are the traditional checkpoints.

swa_epoch_xx.pth and swa_model_xx.pth are the checkpoints after SWA training.

Is there any difference between swa_model_12.pth and swa_epoch_12.pth?

Do I still need to use get_swa_model() to average swa_model_12.pth through swa_model_1.pth?
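This sketch assumes one reading of the file names, which the repo does not confirm: swa_epoch_k.pth holds the raw weights after cyclic epoch k, and swa_model_k.pth holds the running SWA average of epochs 1..k. Scalars stand in for weight tensors, and running_average is a hypothetical helper. The sketch shows two things: the incremental SWA update reproduces the plain mean, and averaging the running averages again gives a different result.

```python
def running_average(snapshots):
    """Yield the equal-weight SWA average after each snapshot.

    Uses the incremental update avg <- avg + (w - avg) / (n + 1), where n is the
    number of snapshots already averaged.
    """
    avg = None
    for n, w in enumerate(snapshots):
        avg = w if avg is None else avg + (w - avg) / (n + 1)
        yield avg


if __name__ == "__main__":
    swa_epoch = [1.0, 2.0, 3.0, 6.0]           # stand-ins for swa_epoch_1..4 weights
    swa_model = list(running_average(swa_epoch))
    print(swa_model)                            # [1.0, 1.5, 2.0, 3.0]
    print(sum(swa_epoch) / len(swa_epoch))      # 3.0: same as the last running average
    print(sum(swa_model) / len(swa_model))      # 1.875: re-averaging skews toward early epochs
```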

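A question about understanding the method

Hello, I looked at the code. I see that in the config files, for example the yolov3 one, total_epoch and cyclic_times are both 24. Does this mean that each epoch runs one cosine annealing cycle over its iterations, and then these 24 epochs are averaged?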
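Here is the arithmetic behind the question. Suppose the cyclic schedule splits all training iterations evenly into cyclic_times cycles. Then setting cyclic_times equal to the number of epochs gives exactly one cosine cycle per epoch, and each end-of-epoch checkpoint is taken at the bottom of a cycle. The helper below is a hypothetical illustration of that reading, not code from the repo. It assumes the iteration count divides evenly, and 500 iterations per epoch is an arbitrary example value.

```python
def cycle_position(global_iter, iters_per_epoch, total_epochs, cyclic_times):
    """Return (cycle index, fraction of that cycle completed) for a global iteration.

    Assumes the total iteration count divides evenly into `cyclic_times` cycles.
    """
    total_iters = iters_per_epoch * total_epochs
    cycle_len = total_iters // cyclic_times
    return global_iter // cycle_len, (global_iter % cycle_len) / cycle_len


if __name__ == "__main__":
    # 500 iterations per epoch, 24 epochs, 24 cycles -> one cycle per epoch.
    print(cycle_position(499, 500, 24, 24))  # (0, 0.998): last iteration of epoch 1
    print(cycle_position(500, 500, 24, 24))  # (1, 0.0): a new cycle starts with epoch 2
```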

Question about frozen BN

@hyz-xmaster Hi, thank you for your great work. I have read your paper, and it says that batch normalization layers in backbones are frozen. My question is: did you freeze only the BN layers in the backbone, or all layers in the backbone?
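For reference, standard mmdetection 2.x ResNet backbone configs handle this distinction with separate settings, shown below. It is an assumption that this repo's SWA configs keep these defaults unchanged. Unrelated keys are omitted.

```python
# Backbone settings as found in standard mmdetection 2.x configs
# (e.g. the Faster R-CNN R50-FPN base model); unrelated keys omitted.
model = dict(
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        # Stem and first residual stage: all their parameters are frozen.
        frozen_stages=1,
        # BN affine weights stay trainable in the unfrozen stages...
        norm_cfg=dict(type='BN', requires_grad=True),
        # ...but BN layers run in eval mode, so running mean/var are not updated.
        norm_eval=True,
        style='pytorch'))
```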

When I run only_swa_training I get an error

Environment

Python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
CUDA available: True
GPU 0,1: Tesla V100-SXM2-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.168
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.6.0+cu101
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
CuDNN 7.6.3
Magma 2.5.2
Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0+cu101
OpenCV: 4.5.3
MMCV: 1.3.8
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.14.0+7b5a58b
Error traceback
AttributeError: 'SWAHook' object has no attribute 'save_checkpoint'

How to use the SWA repo correctly

Hello, sir. This is nice work! I want to use this repo to improve my detection performance, but I have some questions about using it.
I trained my detection model in mmdetection and obtained epoch_12.pth normally, then followed the Only-SWA mode to write the config.

After training, there are 24 models in my work_dir.

I used swa/get_swa_model.py to average swa_epoch_1.pth through swa_epoch_12.pth into swa_1-12.pth. When I test this model, the final score decreases a little. Is there some problem with how I use SWA? I am very confused and would really appreciate your reply!
