
hyz-xmaster / swa_object_detection


SWA Object Detection

License: Apache License 2.0

Python 99.90% Shell 0.08% Dockerfile 0.03%
object-detection instance-segmentation mscoco mmdetection deep-neural-networks

swa_object_detection's People

Contributors

aemikachow, chrisfsj2051, daavoo, erotemic, gt9505, hellock, hhaandroid, hyz-xmaster, innerlee, johnson-wang, jshilong, korabelnikov, liaopeiyuan, lindahua, melikovk, mxbonn, myownskyw7, oceanpang, runningleon, ryanxli, shinya7y, thangvubk, tianyuandu, v-qjqs, wangruohui, wswday, xvjiarui, yhcao6, yuzhj, zwwwayne


swa_object_detection's Issues

Performance with other optimizers, such as Adam or AdamW

Hi, thanks for your nice work.

I am wondering about the performance of SWA with other optimizers, such as Adam or AdamW.
Can it achieve a consistent performance gain?

If the original network is trained with Adam or AdamW, can SWA (with SGD) improve its performance?

Thanks very much.

problem in get_swa_model.py

Hi, @hyz-xmaster . Thanks for releasing the code.

When using your get_swa_model.py utility to get the averaged model, I ran into a problem. For example, if I want to average the checkpoints of epochs 13-24, I would intuitively pass starting=13 and ending=24 as arguments. However, the code actually gives me the average of epochs 13-23, because of this line:

model_names = list(range(starting_id, ending_id))

I think it would be better to use ending_id + 1 instead of ending_id, so that the ending epoch is included as one would expect.
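The off-by-one described above can be shown in isolation. This is a minimal sketch, assuming the start and end values are the integer epoch numbers passed on the command line; the helper name `checkpoint_ids` and its `inclusive` flag are hypothetical and not part of get_swa_model.py.

```python
def checkpoint_ids(starting_id, ending_id, inclusive=True):
    """Return the epoch ids to average (hypothetical helper, not from the repo)."""
    # range() excludes its stop value, so add 1 to include ending_id.
    stop = ending_id + 1 if inclusive else ending_id
    return list(range(starting_id, stop))


# Behaviour reported in the issue: epoch 24 is silently dropped.
current = checkpoint_ids(13, 24, inclusive=False)
# Behaviour the issue proposes: both ends are included.
proposed = checkpoint_ids(13, 24)

print(current[-1], len(current))
print(proposed[-1], len(proposed))
```

With an inclusive end, the number of averaged checkpoints is ending - starting + 1, which matches how the epochs are usually counted.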

Compare traditional training result with SWA result at the same epoch level

Nice work making SWA work in object detection!
I have one question about the same-epoch-level comparison.
The results look like Faster R-CNN R50 1x + 1x extra SWA training gets the same result as Faster R-CNN R50 2x?
I think there may be some problems.

  1. Faster R-CNN R50 1x + 1x extra SWA training uses cyclic training, but the original Faster R-CNN R50 2x uses step-down LR training.
    This mismatch may lead to different convergence. I think the best way is to train the models from scratch with cyclic training to get a fair comparison.
  2. SWA needs to update the batch norm parameters to match the averaged weights. Frozen BN may harm the final ensemble result.
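To make the schedule mismatch in point 1 concrete, here is a rough sketch of the two learning-rate shapes. All numbers (base rate, milestone epochs, decay factor, minimum rate) are illustrative assumptions, not values read from this repository's configs, and both function names are hypothetical.

```python
import math


def step_lr(epoch, base_lr=0.02, steps=(16, 22), gamma=0.1):
    """Step-down schedule: multiply by gamma at each milestone epoch (assumed values)."""
    drops = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** drops


def cyclic_cosine_lr(progress, max_lr=0.02, min_lr=0.0002):
    """Cosine annealing within one cycle; progress runs from 0.0 to 1.0."""
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


# The step schedule stays flat between milestones, while the cyclic one
# restarts at max_lr at the beginning of every cycle.
for epoch in (0, 16, 22):
    print(epoch, step_lr(epoch))
for progress in (0.0, 0.5, 1.0):
    print(progress, cyclic_cosine_lr(progress))
```

Because the cyclic schedule keeps returning to a high learning rate, its checkpoints are sampled from a different optimisation trajectory than those of a step-down run, which is the concern raised above.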


When I run Two-phase mode I get an error

Traceback (most recent call last):
  File "tools/train.py", line 188, in <module>
    main()
  File "tools/train.py", line 184, in main
    meta=meta)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/apis/train.py", line 175, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/evaluation/eval_hooks.py", line 149, in after_train_epoch
    self.save_best_checkpoint(runner, key_score)
  File "/home/hxz/anaconda3/envs/mmdnew/lib/python3.7/site-packages/mmdet-2.12.0-py3.7.egg/mmdet/core/evaluation/eval_hooks.py", line 166, in save_best_checkpoint
    last_ckpt = runner.meta['hook_msgs']['last_ckpt']
KeyError: 'last_ckpt'
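
The last frame fails on a plain dictionary lookup. Below is a minimal sketch of that failure mode and of a defensive lookup, assuming `meta` has the same shape as the `runner.meta` dict in the traceback and that no checkpoint has been recorded yet. It only illustrates the error and is not the fix adopted in mmdet or in this repo.

```python
# Assumed state: 'hook_msgs' exists but nothing has stored 'last_ckpt' yet.
meta = {"hook_msgs": {}}

try:
    last_ckpt = meta["hook_msgs"]["last_ckpt"]
except KeyError as err:
    # The exception argument is the key that was missing.
    missing_key = err.args[0]
    print("missing key:", missing_key)

# dict.get returns None instead of raising when the key is absent,
# so the caller can decide what to do without a crash.
last_ckpt = meta["hook_msgs"].get("last_ckpt")
print(last_ckpt)
```

This suggests the evaluation hook ran before any hook had written a checkpoint path into the shared metadata, though the traceback alone does not confirm the cause.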

Mask R-CNN: save the best segm model

Hi,
I'm really glad that the save_the_best_model feature has been updated. But I would like to ask: how can I save best_segm_mAP.pth rather than best_bbox_mAP.pth?
Thanks :)

Can I use dist_train.sh to train the original model?

I think I should train the original model first, and then train the extra checkpoints.

But the scripts from README.md, shown below, don't include one for training the original model.
I think the first script only trains the extra 12/24 epochs of checkpoints, and the second script averages these 12/24 checkpoints into the final detection model.
(1)./tools/dist_train.sh configs/swa/swa_mask_rcnn_r101_fpn_2x_coco.py
(2)./swa/get_swa_model.py work_dirs/swa_mask_rcnn_r101_fpn_2x_coco 1 12 --save_dir work_dirs/swa_mask_rcnn

Maybe I can use the following script to train the original model.
./tools/dist_train.sh configs/mask_rcnn/mask_rcnn_r101_fpn_2x_coco.py work_dir=mymodel
Then I can use the following script to train the extra checkpoints. I must set work_dir to the path where the original model is saved.
./tools/dist_train.sh configs/swa/swa_mask_rcnn_r101_fpn_2x_coco.py work_dir=mymodel

Am I right?
Thanks very much!

ModuleNotFoundError: No module named 'mmdet'

I have downloaded the master branch of swa_object_detection. I also have mmdetection on my PC.

I tried to train Mask R-CNN R50-FPN with the normal mmdetection code, but this error always occurs. I'd appreciate any help!
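
A ModuleNotFoundError means the active Python interpreter cannot locate the package on its import path. One way to check this from inside the same environment is sketched below; the helper name `is_importable` is hypothetical, and the sketch only diagnoses the situation, it does not install anything.

```python
import importlib.util


def is_importable(module_name):
    """True if Python can locate the top-level module on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Run this with the same interpreter that is used for training.
for name in ("mmdet", "mmcv"):
    print(name, "found" if is_importable(name) else "NOT found")
```

If `mmdet` is reported as not found, the copy of mmdetection on the machine is probably installed in a different environment than the one running the training script.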

what's the difference between swa_epoch_xx.pth and swa_model_xx.pth?

Hello sir! I appreciate your wonderful work, which helps a lot. But there's one question I can't figure out.

When I run Two-phase mode, after I have the 12 traditional checkpoints, I get 2 checkpoints after each epoch. I wonder what the difference between them is, and which one I should use?

epoch_1.pth to epoch_12.pth are the traditional checkpoints.

swa_epoch_xx.pth and swa_model_xx.pth are the checkpoints after SWA training.

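Question about understanding the method

Hello, I looked at the code. In the config files, for example for YOLOv3, total_epoch and cyclic_times are both 24. Does this mean that each epoch performs one round of cosine annealing over its iterations, and then these 24 epochs are averaged?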
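
The averaging step asked about above can be sketched in plain Python. This assumes each checkpoint is a mapping from parameter name to a list of floats; real checkpoints hold tensors, and `average_checkpoints` is a hypothetical name rather than the function used in this repo.

```python
def average_checkpoints(state_dicts):
    """Equal-weight average of parameter values across checkpoints (sketch)."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter from every checkpoint.
        columns = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


# Two toy "checkpoints", standing in for the end of two training cycles.
ckpt_a = {"w": [1.0, 2.0], "b": [0.0]}
ckpt_b = {"w": [3.0, 4.0], "b": [1.0]}
print(average_checkpoints([ckpt_a, ckpt_b]))
```

Under this reading, one checkpoint is saved at the end of each cycle and all of them contribute equally to the final weights.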

Question about frozen BN

@hyz-xmaster Hi, thank you for your great work. I have read your paper, and it says that the batch normalization layers in the backbones are frozen. My question is: did you freeze only the BN layers in the backbone, or did you freeze all the layers in the backbone?

When I run only_swa_training I get an error

Environment

Python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) [GCC 9.3.0]
CUDA available: True
GPU 0,1: Tesla V100-SXM2-32GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.168
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.6.0+cu101
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0+cu101
OpenCV: 4.5.3
MMCV: 1.3.8
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.14.0+7b5a58b
Error traceback
'SWAHook' object has no attribute 'save_checkpoint'

How to use the SWA repo correctly

Hello, sir. This is nice work! I want to use this repo to improve my detection performance, but I have some questions about using it.
I trained my detection model in mmdetection and got epoch_12.pth as normal, then wrote the config following the Only-SWA mode.
After training, there are 24 models in my work_dir.

I ran swa/get_swa_model.py on swa_epoch_1.pth to swa_epoch_12.pth to produce swa_1-12.pth, and I tested with this model. The final score decreased a little. Is there a problem with how I use SWA? I'm very confused, and would very much appreciate your reply!
