zhangxiaosong18 / freeanchor

FreeAnchor: Learning to Match Anchors for Visual Object Detection (NeurIPS 2019)

Home Page: https://arxiv.org/abs/1909.02466

License: MIT License

Python 86.44% C++ 5.26% Cuda 8.30%

freeanchor object-detection one-stage pytorch computer-vision neurips-2019

freeanchor's Introduction

FreeAnchor

The code for "FreeAnchor: Learning to Match Anchors for Visual Object Detection".

This repository is based on maskrcnn-benchmark. FreeAnchor has also been implemented in mmdetection; thanks to @yhcao6 and @hellock.

New performance on COCO

We added multi-scale testing support and updated experiments. The previous version is in this branch.

Backbone	Iteration	Training scales	Multi-scale testing	AP (minival)	AP (test-dev)	Model
ResNet-50-FPN	90k	800	N	38.7	38.7	Link
ResNet-101-FPN	90k	800	N	40.5	40.9	Link
ResNet-101-FPN	180k	[640, 800]	N	42.7	43.1	Link
ResNet-101-FPN	180k	[480, 960]	N	43.2	43.9	Link
ResNet-101-FPN	180k	[480, 960]	Y	44.7	45.2	Link
ResNeXt-64x4d-101-FPN	180k	[640, 800]	N	44.5	44.9	Link
ResNeXt-64x4d-101-FPN	180k	[480, 960]	N	45.6	46.0	Link
ResNeXt-64x4d-101-FPN	180k	[480, 960]	Y	46.8	47.3	Link

Notes:

We use 8 GPUs with 2 images per GPU.
For multi-scale testing, we use image scales of {480, 640, 800, 960, 1120, 1280}, with max_size set to 1.666× each scale.
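The notes above describe the test sizes only briefly. The following is a minimal sketch that pairs each scale with its max_size and applies the usual maskrcnn-benchmark-style resize rule (shorter side resized to the scale, longer side capped at max_size). Rounding max_size up, and the helper names, are assumptions; the repository may round differently.

```python
import math

# Scales and the 1.666 ratio come from the notes above.
TEST_SCALES = (480, 640, 800, 960, 1120, 1280)
MAX_SIZE_RATIO = 1.666


def multi_scale_test_sizes(scales=TEST_SCALES, ratio=MAX_SIZE_RATIO):
    """Pair each test scale with its max_size (rounding up is an assumption)."""
    return [(scale, math.ceil(scale * ratio)) for scale in scales]


def resized_hw(width, height, min_size, max_size):
    """Return (height, width): shorter side -> min_size, longer side capped at max_size."""
    size = min_size
    short_side, long_side = min(width, height), max(width, height)
    if long_side / short_side * size > max_size:
        # Shrink the target so that the longer side does not exceed max_size.
        size = int(round(max_size * short_side / long_side))
    if width < height:
        return int(size * height / width), size
    return size, int(size * width / height)


if __name__ == "__main__":
    for scale, max_size in multi_scale_test_sizes():
        print(scale, max_size, resized_hw(640, 480, scale, max_size))
```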

Installation

Check INSTALL.md for installation instructions.

Usage

You will need to download the COCO dataset and configure the paths to your datasets.

To do so, modify maskrcnn_benchmark/config/paths_catalog.py so that it points to the location where your datasets are stored.
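To illustrate what such an entry looks like, here is a minimal sketch assuming a catalog shaped like maskrcnn-benchmark's DatasetCatalog: a DATA_DIR root plus a DATASETS dictionary that maps each dataset name to an image directory and an annotation file. The root and relative paths below are placeholders, not the repository's actual contents. Check them against your copy of paths_catalog.py.

```python
import os

# Placeholder root; point this at wherever you unpacked COCO.
DATA_DIR = "datasets"

# Illustrative entries following COCO 2017 naming; not copied from the repository.
DATASETS = {
    "coco_2017_train": {
        "img_dir": "coco/train2017",
        "ann_file": "coco/annotations/instances_train2017.json",
    },
    "coco_2017_val": {
        "img_dir": "coco/val2017",
        "ann_file": "coco/annotations/instances_val2017.json",
    },
}


def resolve_dataset(name, data_dir=DATA_DIR):
    """Join the data root with each relative path of one catalog entry."""
    entry = DATASETS[name]
    return {key: os.path.join(data_dir, rel) for key, rel in entry.items()}


def missing_paths(name, data_dir=DATA_DIR):
    """List the paths of a catalog entry that do not exist on disk."""
    return [p for p in resolve_dataset(name, data_dir).values() if not os.path.exists(p)]
```

Calling missing_paths("coco_2017_val") before a long run is a cheap way to catch a misconfigured root.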

Config Files

We provide six configuration files in the configs directory.

Config File	Backbone	Iteration	Training scales
configs/free_anchor_R-50-FPN_1x.yaml	ResNet-50-FPN	90k	800
configs/free_anchor_R-101-FPN_1x.yaml	ResNet-101-FPN	90k	800
configs/free_anchor_R-101-FPN_j2x.yaml	ResNet-101-FPN	180k	[640, 800]
configs/free_anchor_X-101-FPN_j2x.yaml	ResNeXt-64x4d-101-FPN	180k	[640, 800]
configs/free_anchor_R-101-FPN_e2x.yaml	ResNet-101-FPN	180k	[480, 960]
configs/free_anchor_X-101-FPN_e2x.yaml	ResNeXt-64x4d-101-FPN	180k	[480, 960]

Training with 8 GPUs

cd path_to_free_anchor
export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_net.py --config-file "path/to/config/file.yaml"
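The command above assumes 8 GPUs with 2 images each, that is, 16 images per batch. With fewer GPUs, one common heuristic (the linear scaling rule, not something this README prescribes) is to scale the learning rate with the batch size and stretch the schedule accordingly. The sketch below assumes a reference base learning rate of 0.01 (the lr shown in the training logs quoted in the issues) and the usual maskrcnn-benchmark 1x schedule of 90k iterations with steps at 60k and 80k. Verify both against the config file you actually use.

```python
REFERENCE_IMS_PER_BATCH = 16  # 8 GPUs x 2 images/GPU, per the notes above
REFERENCE_BASE_LR = 0.01      # assumption: taken from logged lr values


def scale_schedule(num_gpus, ims_per_gpu=2, max_iter=90000, steps=(60000, 80000),
                   base_lr=REFERENCE_BASE_LR):
    """Linear scaling rule: lr proportional to batch size, iterations inversely."""
    ims_per_batch = num_gpus * ims_per_gpu
    factor = ims_per_batch / REFERENCE_IMS_PER_BATCH
    return {
        "SOLVER.IMS_PER_BATCH": ims_per_batch,
        "SOLVER.BASE_LR": base_lr * factor,
        "SOLVER.MAX_ITER": round(max_iter / factor),
        "SOLVER.STEPS": tuple(round(s / factor) for s in steps),
    }


if __name__ == "__main__":
    for key, value in scale_schedule(num_gpus=2).items():
        print(key, value)
```

Such overrides can be appended to the command as KEY VALUE pairs, in the same way the test commands pass MODEL.WEIGHT; quote tuple values for the shell.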

Test on COCO test-dev

cd path_to_free_anchor
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_net.py --config-file "path/to/config/file.yaml" MODEL.WEIGHT "path/to/.pth file" DATASETS.TEST "('coco_test-dev',)"

Multi-scale testing

cd path_to_free_anchor
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/multi_scale_test.py --config-file "path/to/config/file.yaml" MODEL.WEIGHT "path/to/.pth file" DATASETS.TEST "('coco_test-dev',)"

Evaluate NMS Recall

cd path_to_free_anchor
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/eval_NR.py --config-file "path/to/config/file.yaml" MODEL.WEIGHT "path/to/.pth file"

Citations

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{zhang2019freeanchor,
  title   =  {{FreeAnchor}: Learning to Match Anchors for Visual Object Detection},
  author  =  {Zhang, Xiaosong and Wan, Fang and Liu, Chang and Ji, Rongrong and Ye, Qixiang},
  booktitle =  {Neural Information Processing Systems},
  year    =  {2019}
}

freeanchor's Issues

How do I test 5,000 images?

I want to test every image on one GPU, but testing ends at iteration 626. What do I have to do?

Use Group Normalization

I need to use Group Normalization in my task. Is it as simple as setting
MODEL: USE_GN: True

Positive bag loss decreases very quickly

Hi, I am trying to transfer the FreeAnchor loss to 3D object detection, but I found that the positive loss decreases very quickly in the first few iterations, from 2.0+ to 0.7+. Have you ever met this problem? Could you please give me some advice? Thanks in advance.

distributed.deprecated

Why do you use torch.distributed.deprecated? All functions have counterparts in torch.distributed.
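For reference, the non-deprecated torch.distributed API covers the usual helpers. Below is a minimal sketch, loosely modeled on the kind of helpers maskrcnn-benchmark keeps in its communication utilities (the names there may differ), that falls back to single-process values when no process group has been initialized.

```python
import torch.distributed as dist


def _dist_ready():
    """True only when torch.distributed is built in and a process group exists."""
    return dist.is_available() and dist.is_initialized()


def get_world_size():
    """Number of processes in the default group, or 1 when not distributed."""
    return dist.get_world_size() if _dist_ready() else 1


def get_rank():
    """Rank of this process, or 0 when not distributed."""
    return dist.get_rank() if _dist_ready() else 0


def synchronize():
    """Barrier across processes; a no-op when running a single process."""
    if get_world_size() > 1:
        dist.barrier()
```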

Question about test?

Hello, I have a question. Multi-scale testing is used in networks like RefineDet and CornerNet. Why don't FCOS and your FreeAnchor use multi-scale testing? Is there any special reason?

AttributeError: nms

from maskrcnn_benchmark import _C
This line gives "cannot find any reference _C". How can I deal with this problem? Waiting for your reply. Thanks.
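_C is the compiled C++/CUDA extension produced by python setup.py build develop, so an IDE warning like this may only mean that the static analyzer cannot see a compiled module. A real ImportError or AttributeError at runtime usually means the extension was not built, or was built against a different PyTorch. A minimal diagnostic sketch using only the standard library (the function name is hypothetical):

```python
import importlib


def check_compiled_extension(module_name="maskrcnn_benchmark._C", symbols=("nms",)):
    """Try to import a compiled extension and report missing attributes.

    Returns (ok, message). ImportError covers both 'module not found' and
    'undefined symbol' failures raised while loading the shared library.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"cannot import {module_name}: {exc}"
    missing = [name for name in symbols if not hasattr(module, name)]
    if missing:
        return False, f"{module_name} is missing: {', '.join(missing)}"
    return True, f"{module_name} loaded from {getattr(module, '__file__', '?')}"


if __name__ == "__main__":
    print(check_compiled_extension())
```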

Undefined name 'mask_utils' in segmentation_mask.py

https://github.com/zhangxiaosong18/FreeAnchor/search?q=mask_utils&unscoped_q=mask_utils

flake8 testing of https://github.com/zhangxiaosong18/FreeAnchor on Python 3.7.1

$ flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics

./maskrcnn_benchmark/structures/segmentation_mask.py:126:20: F821 undefined name 'mask_utils'
            rles = mask_utils.frPyObjects(
                   ^
./maskrcnn_benchmark/structures/segmentation_mask.py:129:19: F821 undefined name 'mask_utils'
            rle = mask_utils.merge(rles)
                  ^
./maskrcnn_benchmark/structures/segmentation_mask.py:130:20: F821 undefined name 'mask_utils'
            mask = mask_utils.decode(rle)
                   ^
3     F821 undefined name 'mask_utils'
3

E901, E999, F821, F822, and F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These 5 are different from most other flake8 issues, which are merely "style violations": useful for readability, but they do not affect runtime safety.

F821: undefined name name
F822: undefined name name in __all__
F823: local variable name referenced before assignment
E901: SyntaxError or IndentationError
E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree
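To see why F821 counts as a showstopper: Python resolves a global name only when the line runs, so a module with an undefined name imports cleanly and fails later, only when that branch executes. The sketch below reproduces this with a hypothetical stand-in function. The likely fix for this report is to add the missing import at the top of segmentation_mask.py; upstream maskrcnn-benchmark imports pycocotools.mask as mask_utils there, which is presumably what is missing.

```python
def decode_polygons(polygons, height, width):
    """Hypothetical stand-in for the polygon branch in segmentation_mask.py.

    `mask_utils` is deliberately never imported here, mirroring the F821
    report; the fix would be `import pycocotools.mask as mask_utils`.
    """
    rles = mask_utils.frPyObjects(polygons, height, width)  # noqa: F821
    return mask_utils.decode(mask_utils.merge(rles))  # noqa: F821


def name_error_message(func, *args):
    """Call func and return the NameError message, or None if none is raised."""
    try:
        func(*args)
    except NameError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Defining decode_polygons succeeded; only calling it fails.
    print(name_error_message(decode_polygons, [[0, 0, 4, 0, 4, 4]], 8, 8))
```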

CPU usage goes too high when running demo/webcam.py

It uses too many CPU resources when I run demo/webcam.py. It occupies almost all CPUs, which leaves no CPU resources for other programs. However, this doesn't happen during training or testing. Why does it use so much CPU, and how can I fix this problem?

I modified the code of webcam.py to take a single image as input.
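One common cause of this pattern (an assumption, not confirmed in this thread) is that PyTorch, OpenMP/MKL, and OpenCV each default to roughly one thread per core for CPU-side work such as preprocessing. A minimal sketch that caps these thread pools; call it at the very top of demo/webcam.py, before other heavy imports, so the environment variables take effect:

```python
import os


def limit_cpu_threads(num_threads=1):
    """Cap CPU thread pools; call before importing torch, numpy, or cv2."""
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(num_threads)
    try:
        import torch
        torch.set_num_threads(num_threads)  # intra-op parallelism in PyTorch
    except ImportError:
        pass
    try:
        import cv2
        cv2.setNumThreads(num_threads)  # OpenCV's own thread pool
    except ImportError:
        pass
```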

Can it train on a dataset that only has bounding boxes?

AP gap of ResNet-50 based FreeAnchor

The AP of FreeAnchor with a ResNet-50 backbone can reach 39.1, which is 0.4 higher than in this project. Why?

Performance gap!

Hi, thanks for your public code. When I train with this code, I only get 38.2 mAP on minival. Is this a normal fluctuation?

formulation error

All the sums in Eq. 2 should be products, and Cij is not the same as in Eq. 1.

Please correct me if there is something wrong. Thanks.

Problem about calculation of loss

Dear author, I am really puzzled by the loss calculation in the following code segment in free_anchor_loss.py. Could you please explain it to me?

Thanks in advance!

with torch.set_grad_enabled(False):
    box_localization = self.box_coder.decode(box_regression_, anchors_.bbox)
    object_box_iou = boxlist_iou(
        targets_,
        BoxList(box_localization, anchors_.size, mode='xyxy')
    )
    H = object_box_iou.max(dim=1, keepdim=True).values.clamp(
        min=self.bbox_threshold + 1e-12)
    object_box_prob = (
            (object_box_iou - self.bbox_threshold) / (H - self.bbox_threshold)
    ).clamp(min=0, max=1)

    indices = torch.stack(
        [torch.arange(len(labels_)).type_as(labels_), labels_], dim=0)

    """
    to implement image_box_iou = torch.sparse.max(
                      torch.sparse_coo_tensor(indices, object_box_iou), dim=0
                 )
    """
    # start
    indices = torch.nonzero(torch.sparse.sum(
        torch.sparse_coo_tensor(indices, object_box_prob), dim=0
    ).to_dense()).t_()

    if indices.numel() == 0:
        image_box_prob = torch.zeros(anchors_.bbox.size(0),
                                     self.num_classes).type_as(object_box_prob)
    else:
        nonzero_box_prob = torch.where(
            (labels_.unsqueeze(dim=-1) == indices[0]),
            object_box_prob[:, indices[1]],
            torch.tensor([0]).type_as(object_box_prob)
        ).max(dim=0).values

        image_box_prob = torch.sparse_coo_tensor(
            indices.flip([0]), nonzero_box_prob,
            size=(anchors_.bbox.size(0), self.num_classes)
        ).to_dense()
    # end
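As far as I can tell from the snippet, the first part rescales each object's IoUs so that its best-matching prediction gets 1 and anything at or below bbox_threshold gets 0. The sparse block then computes, for every anchor a and class c, the largest object_box_prob among objects labelled c (0 if no object has that class). The docstring suggests torch.sparse.max was not available, so the code first finds the nonzero (class, anchor) pairs with torch.sparse.sum and then takes a masked max over objects. The dense sketch below should give the same image_box_prob on small inputs, which makes the intent easier to check. It is an illustration with hypothetical function names, not the repository's code.

```python
import torch


def saturated_box_prob(object_box_iou, bbox_threshold):
    """Rescale IoU per object (row) so the best prediction maps to 1.

    Values at or below bbox_threshold map to 0; H is each object's best IoU,
    clamped just above the threshold to avoid dividing by zero.
    """
    H = object_box_iou.max(dim=1, keepdim=True)[0].clamp(min=bbox_threshold + 1e-12)
    return ((object_box_iou - bbox_threshold) / (H - bbox_threshold)).clamp(min=0, max=1)


def image_box_prob_dense(object_box_prob, labels, num_classes):
    """Dense counterpart of the sparse block.

    object_box_prob: [num_objects, num_anchors]; labels: [num_objects] class indices.
    Returns [num_anchors, num_classes] where entry (a, c) is the max of
    object_box_prob[:, a] over objects labelled c, or 0 if no object has class c.
    """
    num_anchors = object_box_prob.size(1)
    out = object_box_prob.new_zeros((num_anchors, num_classes))
    for c in labels.unique().tolist():
        out[:, c] = object_box_prob[labels == c].max(dim=0)[0]
    return out
```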

memory leak in boxlist_iou?

I found that a memory leak may happen in boxlist_iou (https://github.com/zhangxiaosong18/FreeAnchor/blob/master/maskrcnn_benchmark/modeling/rpn/free_anchor_loss.py#L108).

When it processes a large number of ground-truth boxes, GPU memory grows, as expected, but the memory it holds is never freed.

Switching to CPU mode is a workaround, but it is very slow.
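Two notes that may help here. First, PyTorch's CUDA caching allocator keeps freed blocks reserved for reuse, so nvidia-smi can keep reporting high usage even after tensors are released; torch.cuda.empty_cache() returns unused cached blocks to the driver. Second, the intermediates of a pairwise IoU grow with N × M. The sketch below (hypothetical helper names) computes IoU in row chunks so that only one chunk's intermediates are alive at a time; the [N, M] output itself is still allocated. It uses the x2 - x1 width convention, whereas I believe maskrcnn-benchmark's boxlist_iou adds 1 pixel to widths and heights.

```python
import torch


def box_iou(boxes1, boxes2):
    """Pairwise IoU between [N, 4] and [M, 4] boxes in (x1, y1, x2, y2) format."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])  # [N, M, 2]
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])  # [N, M, 2]
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)


def chunked_box_iou(boxes1, boxes2, chunk_size=256):
    """Same result as box_iou, computed chunk_size rows of boxes1 at a time."""
    if boxes1.size(0) == 0:
        return boxes1.new_zeros((0, boxes2.size(0)))
    chunks = [box_iou(boxes1[i:i + chunk_size], boxes2)
              for i in range(0, boxes1.size(0), chunk_size)]
    return torch.cat(chunks, dim=0)
```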

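Some questions about the training loss

Hello, and thank you very much for sharing. While reading the paper, I did not quite understand the training loss and would like to ask you:
Why can the loss be replaced by Mean-max(X) and FL_(p)? I don't quite understand this part. Could you explain the derivation? Thank you!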
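For reference, this is how the two functions are defined in the paper, as I read it: Mean-max(X) = sum over x in X of x/(1-x), divided by the sum over x in X of 1/(1-x), and FL_(p) = -p^γ log(1-p). Mean-max behaves like a plain mean when every x is small (all weights 1/(1-x) are close to 1) and approaches max(X) as any x approaches 1, so it acts as a smooth stand-in for picking the single best anchor in a bag. FL_ is the focal-loss form of -log(1-p) for a negative: it penalizes a foreground probability p on an anchor that should be background, down-weighted by p^γ when p is small. A minimal pure-Python sketch with hypothetical function names:

```python
import math


def mean_max(probs):
    """Mean-max(X) = sum(x / (1 - x)) / sum(1 / (1 - x)) over the bag X."""
    weights = [1.0 / max(1.0 - x, 1e-12) for x in probs]  # guard x == 1
    return sum(w * x for w, x in zip(weights, probs)) / sum(weights)


def negative_focal(p, gamma=2.0):
    """FL_(p) = -p**gamma * log(1 - p)."""
    return -(p ** gamma) * math.log(max(1.0 - p, 1e-12))


if __name__ == "__main__":
    # Mean of [0.1, 0.9] is 0.5 and max is 0.9; Mean-max lands in between.
    print(mean_max([0.1, 0.9]))
```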

import _C error

I followed the INSTALL.md instructions,
but I get the following error:

  File "tools/train_net.py", line 18, in <module>
    from maskrcnn_benchmark.engine.inference import inference
  File "/workspace/FreeAnchor/maskrcnn_benchmark/engine/inference.py", line 20, in <module>
    from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
  File "/workspace/FreeAnchor/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in <module>
    from maskrcnn_benchmark.layers import nms as _box_nms
  File "/workspace/FreeAnchor/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
    from .nms import nms
  File "/workspace/FreeAnchor/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /workspace/FreeAnchor/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at19UndefinedTensorImpl10_singletonE

Package Version Location

backcall 0.1.0
certifi 2019.11.28
cffi 1.13.2
cycler 0.10.0
Cython 0.29.14
decorator 4.4.1
ipython 7.12.0
ipython-genutils 0.2.0
jedi 0.16.0
kiwisolver 1.1.0
maskrcnn-benchmark 0.1 /workspace/FreeAnchor
matplotlib 3.1.3
mkl-fft 1.0.15
mkl-random 1.1.0
mkl-service 2.3.0
numpy 1.18.1
olefile 0.46
parso 0.6.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 4.1.1
pip 20.0.2
prompt-toolkit 3.0.3
ptyprocess 0.6.0
pycocotools 2.0.0
pycparser 2.19
Pygments 2.5.2
pyparsing 2.4.6
python-dateutil 2.8.1
PyYAML 5.3
setuptools 45.1.0.post20200127
six 1.14.0
torch 1.1.0
torchvision 0.2.1
tqdm 4.42.1
traitlets 4.3.3
wcwidth 0.1.8
wheel 0.34.2
yacs 0.1.6

How do I solve this problem?

Difference between code and paper

When

$L_{ij}^{cls}=BCE(a_{j}^{cls}, b_{i}^{cls})=-[b_{i}^{cls}\log{a_{j}^{cls}} + (1-b_{i}^{cls})\log{(1-a_{j}^{cls})}] \neq BCE(a_{j}^{cls}, \vec{1}),$

$p_{ij}^{cls}=e^{-L_{ij}^{cls}} \neq e^{\log{a_{j}^{cls}}},$

then

$p_{ij}^{cls}\neq a_{j}^{cls},$

but in your code, https://github.com/zhangxiaosong18/FreeAnchor/blob/master/maskrcnn_benchmark/modeling/rpn/free_anchor_loss.py#L161,
you just use $a_{j}^{cls}$ (matched_cls_prob in your code) as $p_{ij}^{cls}$.
That means you ignore the predicted probabilities of the other classes, which do not match the target class, and I think this differs from retinanet_cls_loss defined in https://github.com/zhangxiaosong18/FreeAnchor/blob/master/maskrcnn_benchmark/modeling/rpn/retinanet_loss.py#L142.

I tried to rewrite the code that calculates matched_cls_prob as below:

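import torch.nn.functional as F
from torch import nn

# One-hot targets on the same device/dtype as the predicted probabilities
labels_mul = F.one_hot(labels_, self.num_classes).type_as(cls_prob_)
# Repeat the targets for each of the pre_anchor_topk anchors in the bag
labels_mul = labels_mul.unsqueeze(1).repeat(1, self.pre_anchor_topk, 1)

# Per-class BCE summed over classes, then mapped back to a probability
loss_mul_class = nn.BCELoss(reduction="none")(cls_prob_[matched], labels_mul).sum(dim=-1)
matched_cls_prob = (-loss_mul_class).exp()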

Did I get it wrong? @zhangxiaosong18
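The gap described in this issue can be checked numerically. With a one-hot target, exp(-Σ_k BCE(a_k, y_k)) equals a_target × Π_{k≠target}(1 - a_k), so it matches a_target alone only when all other class probabilities are zero. A pure-Python sketch with hypothetical function names; it only shows the arithmetic and does not say which choice the authors intended:

```python
import math


def matched_prob_target_only(probs, target):
    """The target-class probability a_j, as the issue says the code uses."""
    return probs[target]


def matched_prob_all_classes(probs, target):
    """exp(-sum of per-class BCE against a one-hot target), as in the proposed rewrite."""
    bce = 0.0
    for k, p in enumerate(probs):
        bce -= math.log(p) if k == target else math.log(1.0 - p)
    return math.exp(-bce)


if __name__ == "__main__":
    probs = [0.8, 0.1, 0.0]
    # 0.8 versus 0.8 * (1 - 0.1) * (1 - 0.0) = 0.72
    print(matched_prob_target_only(probs, 0), matched_prob_all_classes(probs, 0))
```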

positive_loss is much larger than negative_loss

Train log:
2019-10-11 16:29:21,673 maskrcnn_benchmark.trainer INFO: eta: 1 day, 0:40:30 iter: 260 loss: 3.8217 (3.8392) negative_loss: 0.0326 (0.0354) positive_loss: 3.7731 (3.8038) time: 1.3759 (1.4870) data: 0.0050 (0.0068) lr: 0.000680 max mem: 7173
2019-10-11 16:29:48,522 maskrcnn_benchmark.trainer INFO: eta: 1 day, 0:29:44 iter: 280 loss: 3.7343 (3.8322) negative_loss: 0.0513 (0.0364) positive_loss: 3.6920 (3.7958) time: 1.2558 (1.4766) data: 0.0049 (0.0067) lr: 0.000707 max mem: 7173
2019-10-11 16:30:17,056 maskrcnn_benchmark.trainer INFO: eta: 1 day, 0:25:56 iter: 300 loss: 3.5909 (3.8169) negative_loss: 0.0517 (0.0395) positive_loss: 3.5172 (3.7775) time: 1.1965 (1.4733) data: 0.0047 (0.0066) lr: 0.000733 max mem: 7173
Is this normal?

RuntimeError when training

I only changed the config file (free_anchor_R-50-FPN_test.txt) and got the following error:

2020-01-16 15:49:05,738 maskrcnn_benchmark.trainer INFO: eta: 3:46:57 iter: 244400 loss: 1.7938 (1.8977) loss_retina_positive: 1.6451 (1.7404) loss_retina_negative: 0.1402 (0.1573) time: 0.1097 (0.1178) data: 0.0042 (0.0045) lr: 0.010000 max mem: 1404
2020-01-16 15:49:08,161 maskrcnn_benchmark.trainer INFO: eta: 3:46:55 iter: 244420 loss: 1.7646 (1.8977) loss_retina_positive: 1.6248 (1.7404) loss_retina_negative: 0.1239 (0.1573) time: 0.1109 (0.1178) data: 0.0041 (0.0045) lr: 0.010000 max mem: 1404
2020-01-16 15:49:10,560 maskrcnn_benchmark.trainer INFO: eta: 3:46:52 iter: 244440 loss: 1.8001 (1.8977) loss_retina_positive: 1.6412 (1.7404) loss_retina_negative: 0.1554 (0.1573) time: 0.1126 (0.1178) data: 0.0040 (0.0045) lr: 0.010000 max mem: 1404
2020-01-16 15:49:12,817 maskrcnn_benchmark.trainer INFO: eta: 3:46:50 iter: 244460 loss: 1.7907 (1.8977) loss_retina_positive: 1.6191 (1.7404) loss_retina_negative: 0.1470 (0.1573) time: 0.1076 (0.1178) data: 0.0037 (0.0045) lr: 0.010000 max mem: 1404
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=59 : device-side assert triggered
/opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [2887,0,0], thread: [16,0,0] Assertion *input >= 0. && *input <= 1. failed.
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/zz/work/FreeAnchor/maskrcnn_benchmark/engine/trainer.py", line 70, in do_train
    loss_dict_reduced = reduce_loss_dict(loss_dict)
  File "/home/zz/work/FreeAnchor/maskrcnn_benchmark/engine/trainer.py", line 28, in reduce_loss_dict
    all_losses = torch.stack(all_losses, dim=0)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/THCCachingHostAllocator.cpp:265
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered (insert_events at /opt/conda/conda-bld/pytorch_1556653215914/work/c10/cuda/CUDACachingAllocator.cpp:564)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcb2ed3fdc5 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x14792 (0x7fcb2bc1c792 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x50 (0x7fcb2ed2f640 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: + 0x3067fb (0x7fcb2c33c7fb in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: + 0x14019b (0x7fcb54b2019b in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x3bfc84 (0x7fcb54d9fc84 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x3bfcd1 (0x7fcb54d9fcd1 in /home/zz/anaconda3/envs/fa/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #7: + 0x19dfce (0x56446760afce in /home/zz/anaconda3/envs/fa/bin/python)
frame #8: + 0x113a6b (0x564467580a6b in /home/zz/anaconda3/envs/fa/bin/python)
frame #9: + 0x103948 (0x564467570948 in /home/zz/anaconda3/envs/fa/bin/python)
frame #10: + 0x114267 (0x564467581267 in /home/zz/anaconda3/envs/fa/bin/python)
frame #11: + 0x11427d (0x56446758127d in /home/zz/anaconda3/envs/fa/bin/python)
frame #12: + 0x11427d (0x56446758127d in /home/zz/anaconda3/envs/fa/bin/python)
frame #13: + 0x11427d (0x56446758127d in /home/zz/anaconda3/envs/fa/bin/python)
frame #14: PyDict_SetItem + 0x502 (0x5644675cc602 in /home/zz/anaconda3/envs/fa/bin/python)
frame #15: PyDict_SetItemString + 0x4f (0x5644675cd0cf in /home/zz/anaconda3/envs/fa/bin/python)
frame #16: PyImport_Cleanup + 0x9e (0x56446760c91e in /home/zz/anaconda3/envs/fa/bin/python)
frame #17: Py_FinalizeEx + 0x67 (0x564467682367 in /home/zz/anaconda3/envs/fa/bin/python)
frame #18: + 0x227d93 (0x564467694d93 in /home/zz/anaconda3/envs/fa/bin/python)
frame #19: _Py_UnixMain + 0x3c (0x5644676950bc in /home/zz/anaconda3/envs/fa/bin/python)
frame #20: __libc_start_main + 0xe7 (0x7fcb651ccb97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #21: + 0x1d0990 (0x56446763d990 in /home/zz/anaconda3/envs/fa/bin/python)

free_anchor_R-50-FPN_test.txt

【APs】All APs are -1.0 when I test pretrained_models on 'coco_test-dev'

When I test the pretrained models on 'coco_2017_val', I get the expected AP results.
But when I change DATASETS.TEST to 'coco_test-dev', all APs are -1.0.

How can I solve this problem? Any suggestions? Thanks!
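A likely explanation: COCO test-dev annotations are not publicly released, so a local evaluator has no ground truth and reports -1 for every metric. Test-dev results are scored by the official COCO evaluation server instead, which expects a JSON list of detections with image_id, category_id, bbox as [x, y, width, height], and score. A minimal sketch of that conversion with a hypothetical function name. Note that category_id must be the original COCO category id, not a contiguous training index, and that maskrcnn-benchmark's own xywh conversion may add 1 pixel to widths and heights.

```python
import json


def to_coco_results(image_id, boxes_xyxy, scores, category_ids):
    """Convert one image's detections to COCO result dicts (bbox as x, y, w, h)."""
    results = []
    for (x1, y1, x2, y2), score, category_id in zip(boxes_xyxy, scores, category_ids):
        results.append({
            "image_id": image_id,
            "category_id": category_id,
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": score,
        })
    return results


if __name__ == "__main__":
    detections = to_coco_results(42, [[10, 20, 30, 60]], [0.9], [1])
    print(json.dumps(detections))
```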

How long does it take to train on COCO?

Any step-by-step results available?

Thanks for sharing this great work :)

I'm just wondering whether you have any step-by-step experiments for building the final loss formulation.

I tried to decompose the loss to see the effectiveness of each term, but it turns out that the loss only works as a whole and, in my case, is sensitive to changes in some of its parts.

Really appreciate your time.

import _C error

Traceback (most recent call last):
  File "tools/multi_scale_test.py", line 7, in <module>
    from maskrcnn_benchmark.engine.inference import inference
  File "/home/hansol/PycharmProjects/FreeAnchor/maskrcnn_benchmark/engine/inference.py", line 20, in <module>
    from maskrcnn_benchmark.structures.boxlist_ops import boxlist_iou
  File "/home/hansol/PycharmProjects/FreeAnchor/maskrcnn_benchmark/structures/boxlist_ops.py", line 6, in <module>
    from maskrcnn_benchmark.layers import nms as _box_nms
  File "/home/hansol/PycharmProjects/FreeAnchor/maskrcnn_benchmark/layers/__init__.py", line 8, in <module>
    from .nms import nms
  File "/home/hansol/PycharmProjects/FreeAnchor/maskrcnn_benchmark/layers/nms.py", line 3, in <module>
    from maskrcnn_benchmark import _C
ImportError: /home/hansol/PycharmProjects/FreeAnchor/maskrcnn_benchmark/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c108BFloat16EEEPKNS_6detail12TypeMetaDataEv
Traceback (most recent call last):
  File "/home/hansol/anaconda3/envs/free/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/hansol/anaconda3/envs/free/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/hansol/anaconda3/envs/free/lib/python3.7/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/hansol/anaconda3/envs/free/lib/python3.7/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/hansol/anaconda3/envs/free/bin/python', '-u', 'tools/multi_scale_test.py', '--local_rank=0', '--config-file', 'config/free_anchor_X-101-FPN_e2x.yaml', 'MODEL.WEIGHT', 'free_anchor_X-64x4d-101-FPN_e2x.pth', 'DATASETS.TEST', 'coco_2017_val']' returned non-zero exit status 1.

I followed the INSTALL.md instructions, but I get the errors above.
How can I solve this?

About the test process

Hi, thank you for your excellent work. I have a question about testing: you only revised the loss used in training, so is nothing changed at test time compared with RetinaNet?

Loss increases when training

Hi, I am trying to transfer the FreeAnchor loss to 3D object detection, but during training the positive loss increases gradually while the negative loss decreases. Have you ever met this kind of problem?

【AP】got unexpected AP results

I downloaded some pretrained model .pth files using the links.
But when I test on COCO test-dev, all the AP results I get are -1.00.

command:
$ export NGPUS=4

$ python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_net.py --config-file configs/free_anchor_R-50-FPN_1x.yaml MODEL.WEIGHT Pretrained_Models/free_anchor_R-50-FPN_1x.pth DATASETS.TEST "('coco_test-dev',)"

Why? How to solve this problem? Thanks

AttributeError: 'Image' object has no attribute 'shape'

Dear author, the environment works when I test the demo, but I cannot train on my own data. The problem occurs when I run training. Can you give me some advice? Thanks
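This error usually means that code expecting a NumPy-style array received a PIL image. NumPy arrays expose .shape as (height, width[, channels]), whereas PIL images expose .size as (width, height). Where exactly this happens in your data pipeline is not stated here, so the following is only a minimal duck-typed sketch (hypothetical helper) for reading the dimensions of either kind of image:

```python
def image_hw(img):
    """Return (height, width) for a NumPy-style HWC array or a PIL-style image."""
    if hasattr(img, "shape"):
        # NumPy arrays: shape is (H, W) or (H, W, C)
        return tuple(img.shape[:2])
    # PIL images: size is (W, H)
    width, height = img.size
    return (height, width)
```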

Does FreeAnchor require hand-designed anchor boxes, such as anchor_ratios=[0.5, 1.0, 2.0]?
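As the paper's title suggests, FreeAnchor learns how to match objects to anchors rather than learning the anchor shapes, so it appears to still rely on predefined anchors from its RetinaNet-style head. For illustration, RetinaNet places 3 aspect ratios × 3 scales (2^0, 2^(1/3), 2^(2/3)) at each location. The sketch below enumerates such shapes, assuming that ratio means height / width and using a hypothetical base size of 32. Check the actual anchor settings in the config files.

```python
import math


def anchor_shapes(base_size=32, ratios=(0.5, 1.0, 2.0),
                  scales=(2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))):
    """Enumerate (width, height) for every scale/ratio pair at one location.

    Each anchor keeps the area (base_size * scale) ** 2 while its
    height / width equals the ratio.
    """
    shapes = []
    for scale in scales:
        size = base_size * scale
        for ratio in ratios:
            width = size / math.sqrt(ratio)
            height = size * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes
```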

Compile Failed on Windows 10

I tried to run this command: python setup.py build develop, but the following errors were raised:


D:/Artificial Intelligence/Object Detection/FreeAnchor-master/FreeAnchor-master/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu(275): error: no instance of function template "THCCeilDiv" matches the argument list
            argument types are: (long long, long)

D:/Artificial Intelligence/Object Detection/FreeAnchor-master/FreeAnchor-master/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu(275): error: no instance of overloaded function "std::min" matches the argument list
            argument types are: (<error-type>, long)

D:/Artificial Intelligence/Object Detection/FreeAnchor-master/FreeAnchor-master/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu(320): error: no instance of function template "THCCeilDiv" matches the argument list
            argument types are: (int64_t, long)

D:/Artificial Intelligence/Object Detection/FreeAnchor-master/FreeAnchor-master/maskrcnn_benchmark/csrc/cuda/ROIAlign_cuda.cu(320): error: no instance of overloaded function "std::min" matches the argument list
            argument types are: (<error-type>, long)

4 errors detected in the compilation of "C:/Users/127051/AppData/Local/Temp/tmpxft_000021a0_00000000-10_ROIAlign_cuda.cpp1.ii".
error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.1\\bin\\nvcc.exe' failed with exit status 1

RuntimeError: CUDA error: an illegal memory access was encountered

I met this problem on a 2080 Ti. The error occurs after several epochs. Setting images per GPU from 2 to 1 and reducing the number of ground-truth boxes per image didn't help.
Train log:

out of memory
invalid argument
an illegal memory access was encountered
an illegal memory access was encountered
Traceback (most recent call last):
...
overlaps_th = torch.tensor(overlaps).to(boxlist1.bbox.device) #[N, M]
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:569)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fb1e9515813 in /home/fw/Softwares/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)

This is my environment information:

OS: Ubuntu 16.04 LTS 64-bit
Command: conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
GPU: 2080ti
Driver Version: 418.67
Python Version: 3.7
cuda Version: 10.1
cudnn Version: 7
pytorch Version: torch-1.1.0, torchvision-0.2.0

I tried setting CUDA_LAUNCH_BLOCKING to 1 and hit the same problem. How can I solve it?

AttributeError: 'tuple' object has no attribute 'values'

When I use the mmdet framework, I encounter this problem:

dim=1, keepdim=True).values.clamp(min=t1 + 1e-12)
AttributeError: 'tuple' object has no attribute 'values'

Thanks.
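Recent PyTorch versions return a named tuple from max(dim=...), which is what provides .values; older releases return a plain tuple, which has no such attribute. Indexing with [0] works with both, as in this minimal sketch (hypothetical helper name):

```python
import torch


def max_values(tensor, dim):
    """Max along `dim` with keepdim; [0] works for both tuple and named-tuple returns."""
    return tensor.max(dim=dim, keepdim=True)[0]
```

In the failing line, that would mean replacing .values with [0] before .clamp(...).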

RuntimeError: CUDA error: device-side assert triggered

Hi, I'm trying to run CUDA_VISIBLE_DEVICES=2 python -m torch.distributed.launch --nproc_per_node=1 tools/train_net.py --config-file configs/free_anchor_R-50-FPN_1x.yaml on a single GeForce 1080 Ti GPU. It ran for 760 iterations, and then I got the following error:

...
2019-09-14 09:19:42,314 maskrcnn_benchmark.trainer INFO: eta: 8:44:11  iter: 740  loss: 3.6038 (3.8930)  loss_retina_positive: 3.4092 (3.6858)  loss_retina_negative: 0.1473 (0.2072)  time: 0.3526 (0.3524)  data: 0.0130 (0.0210)  lr: 0.010000  max mem: 4004
2019-09-14 09:19:49,227 maskrcnn_benchmark.trainer INFO: eta: 8:43:48  iter: 760  loss: 3.6373 (3.8917)  loss_retina_positive: 3.4743 (3.6837)  loss_retina_negative: 0.1995 (0.2080)  time: 0.3483 (0.3522)  data: 0.0153 (0.0209)  lr: 0.010000  max mem: 4004
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [4,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [5,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [6,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [8,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [12,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/engine/trainer.py", line 74, in do_train
    loss_dict = model(images, targets)
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/modeling/detector/retinanet.py", line 62, in forward
    (anchors, detections), detector_losses = self.rpn(images, rpn_features, targets)
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 152, in forward
    return self._forward_train(anchors, box_cls, box_regression, targets)
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 159, in _forward_train
    anchors, box_cls, box_regression, targets
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/modeling/rpn/free_anchor_loss.py", line 114, in __call__
    (object_box_iou - self.bbox_threshold) / (H - self.bbox_threshold)
RuntimeError: CUDA error: device-side assert triggered
Traceback (most recent call last):
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/zlq/anaconda3/envs/torch2/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--config-file', 'configs/free_anchor_R-50-FPN_1x.yaml']' returned non-zero exit status 1.

Has anyone run into this error? Any suggestions on solving it would be appreciated. Thanks a lot!
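A note that may help narrow this down. CUDA kernels run asynchronously, so a device-side assert is often reported at a later operation than the one that actually failed. Here the Python traceback points at line 114 of `free_anchor_loss.py`, but the failed assertion is raised inside the BCE kernel (`BCECriterion.cu`). Re-running with the environment variable `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so the traceback should land on the failing call. The assertion `*input >= 0. && *input <= 1.` also fails for NaN, because every comparison with NaN is false. A loss that has diverged to NaN therefore trips it just like an out-of-range probability. The sketch below is a hypothetical pure-Python helper, not code from this repository. It applies the same predicate to a list of values. On a real tensor, the equivalent check would combine `torch.isnan` with range comparisons just before the binary cross-entropy call.

```python
# Hypothetical diagnostic helper (not part of FreeAnchor): mirrors the CUDA
# assertion `*input >= 0. && *input <= 1.` on a plain list of floats.
def bce_input_violations(probs):
    """Return (index, value) pairs that would fail the BCE range assertion.

    NaN fails the chained comparison below (every comparison with NaN is
    False), so NaN entries are reported just like out-of-range values.
    """
    return [(i, p) for i, p in enumerate(probs) if not (0.0 <= p <= 1.0)]


violations = bce_input_violations([0.2, 1.0, float("nan"), -0.1, 1.5])
print([i for i, _ in violations])  # [2, 3, 4]
```

If the offending entries are NaN rather than slightly out of range, the problem is usually upstream of the loss (diverging training). Clamping the probabilities would not fix that.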

RuntimeError: The size of tensor a (67) must match the size of tensor b (68) at non-singleton dimension 3
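A mismatch of 67 against 68 in dimension 3 (the width of an NCHW feature map) is the usual signature of an FPN top-down merge on an input whose size is not a multiple of the largest stride. This sketch assumes each stride-2 stage maps a size n to ceil(n / 2), and that the top-down path upsamples by exactly 2, as the FPN in maskrcnn-benchmark does. Under those assumptions, a level of width 67 sits below a level of width ceil(67 / 2) = 34. That level upsamples to 68, and the lateral map of 67 cannot be added to it. The hypothetical helpers below reproduce the arithmetic for an illustrative width of 536. Padding to a multiple of 32 removes the mismatch. In maskrcnn-benchmark, that padding is controlled by the `DATALOADER.SIZE_DIVISIBILITY` config option, which the FPN configs set to 32. It is worth checking that value in the config you ran.

```python
import math


def stage_sizes(length, num_stages=5):
    """Sizes after each stride-2 stage (strides 2, 4, 8, 16, 32),
    assuming every stage maps n -> ceil(n / 2)."""
    sizes = []
    for _ in range(num_stages):
        length = math.ceil(length / 2)
        sizes.append(length)
    return sizes


def fpn_mismatches(length):
    """(lateral, upsampled) size pairs that cannot be added in the top-down
    path over the stride-8/16/32 levels, assuming exact 2x upsampling."""
    levels = stage_sizes(length)[2:]  # strides 8, 16, 32
    return [(fine, 2 * coarse)
            for fine, coarse in zip(levels, levels[1:])
            if 2 * coarse != fine]


def pad_to_multiple(length, divisor=32):
    """Smallest multiple of `divisor` that is >= `length`."""
    return math.ceil(length / divisor) * divisor


print(stage_sizes(536))                      # [268, 134, 67, 34, 17]
print(fpn_mismatches(536))                   # [(67, 68)]
print(fpn_mismatches(pad_to_multiple(536)))  # []
```

Any multiple of 32 halves exactly through all five stages, so every upsampled level lines up with its lateral.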

I'm a beginner: do I have to set up the maskrcnn-benchmark environment?

I'm a beginner. Do I have to set up the maskrcnn-benchmark environment to use FreeAnchor?

dimension specified as 0 but tensor has no dimensions

The issue is as follows:
Traceback (most recent call last):
  File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1758, in <module>
    main()
  File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1752, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/usr/share/pycharm/helpers/pydev/pydevd.py", line 1147, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/usr/share/pycharm/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/media/zy/Documents/FreeAnchor/tools/train_net.py", line 171, in <module>
    main()
  File "/media/zy/Documents/FreeAnchor/tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "/media/zy/Documents/FreeAnchor/tools/train_net.py", line 73, in train
    arguments,
  File "/media/zy/Documents/FreeAnchor/maskrcnn_benchmark/engine/trainer.py", line 56, in do_train
    for iteration, (images, targets, _) in enumerate(data_loader, start_iter):
  File "/media/zy/Software/miniconda/envs/free_anchor/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/media/zy/Software/miniconda/envs/free_anchor/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/media/zy/Software/miniconda/envs/free_anchor/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/media/zy/Software/miniconda/envs/free_anchor/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/media/zy/Documents/FreeAnchor/maskrcnn_benchmark/data/datasets/coco.py", line 58, in __getitem__
    img, target = self.transforms(img, target)
  File "/media/zy/Documents/FreeAnchor/maskrcnn_benchmark/data/transforms/transforms.py", line 15, in __call__
    image, target = t(image, target)
  File "/media/zy/Documents/FreeAnchor/maskrcnn_benchmark/data/transforms/transforms.py", line 70, in __call__
    image, target = resizer(image, target)
  File "/media/zy/Documents/FreeAnchor/maskrcnn_benchmark/data/transforms/transforms.py", line 58, in __call__
    target = target.resize(image.size)
  File "/media/zy/Documents/FreeAnchor/maskrcnn_benchmark/structures/bounding_box.py", line 124, in resize
    v = v.resize(size, *args, **kwargs)
  File "/media/zy/Documents/FreeAnchor/maskrcnn_benchmark/structures/segmentation_mask.py", line 184, in resize
    scaled.append(polygon.resize(size, *args, **kwargs))
  File "/media/zy/Documents/FreeAnchor/maskrcnn_benchmark/structures/segmentation_mask.py", line 117, in resize
    p[0::2] *= ratio_w
IndexError: dimension specified as 0 but tensor has no dimensions

The PyTorch version is 1.1.0.
How can I fix this problem?
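The last frames show `p[0::2] *= ratio_w` failing on a zero-dimensional tensor. In other words, the code received a single number where it expected a polygon (a flat list of x, y coordinates). One plausible cause, assuming COCO-style annotations (which the `coco.py` loader in the traceback reads), is a `segmentation` field stored as a flat list `[x1, y1, x2, y2, ...]` instead of a list of polygons `[[x1, y1, x2, y2, ...]]`. Iterating over the flat list yields bare numbers, and each one becomes a 0-dim tensor. The traceback also shows segmentation masks being resized while training this box-based detector, so malformed polygons can break detection-only datasets too. The helper `find_bad_polygons` below is a hypothetical diagnostic sketch, not part of this repository. It scans an annotation list for that pattern and for a few other malformed polygons.

```python
def find_bad_polygons(annotations):
    """Return ids of COCO-style annotations whose `segmentation` is not a
    list of polygons, each a flat [x1, y1, x2, y2, ...] list with an even
    number of coordinates (at least 3 points)."""
    bad_ids = []
    for ann in annotations:
        if ann.get("iscrowd", 0):
            # Crowd regions are usually stored as RLE dicts, not polygons; skipped here.
            continue
        seg = ann.get("segmentation")
        if not isinstance(seg, list):
            bad_ids.append(ann["id"])  # missing, or not in polygon format
            continue
        for poly in seg:
            # A bare number here would become a 0-dim tensor once converted.
            if not isinstance(poly, list) or len(poly) < 6 or len(poly) % 2 != 0:
                bad_ids.append(ann["id"])
                break
    return bad_ids


annotations = [
    {"id": 1, "segmentation": [[0, 0, 10, 0, 10, 10]]},  # well-formed
    {"id": 2, "segmentation": [0, 0, 10, 0, 10, 10]},    # flat list, not wrapped
    {"id": 3, "segmentation": [[0, 0, 10]]},              # odd / too few coordinates
    {"id": 4, "iscrowd": 1, "segmentation": {"counts": [], "size": [10, 10]}},
    {"id": 5},                                            # segmentation missing
]
print(find_bad_polygons(annotations))  # [2, 3, 5]
```

In practice, the list would come from `json.load(f)["annotations"]` on your own annotation file.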

Does it support training on my own dataset?

Preferably with multiple GPUs.
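The README's usage notes say datasets are registered by editing `maskrcnn_benchmark/config/paths_catalog.py`, and the tracebacks on this page load data through `maskrcnn_benchmark/data/datasets/coco.py`. One possible route, which is an assumption and not a workflow documented by this repository, is to convert your own annotations to COCO-format JSON and register that file the same way as the COCO splits. The function `to_coco` and its input format are hypothetical. Each box is also written as a four-corner polygon, because this loader may attach `segmentation` to the targets. The number of classes in your config would also need to match your dataset. For multiple GPUs, the launcher traceback earlier on this page shows training started through `torch.distributed.launch`. Its `--nproc_per_node` option sets the number of processes, one per GPU.

```python
import json


def to_coco(samples, class_names):
    """Build a COCO-style detection dict.

    `samples` uses a hypothetical input format: a list of dicts with keys
    "file_name", "width", "height" and "boxes", where each box is
    (x1, y1, x2, y2, class_name) in pixel corner coordinates.
    """
    categories = [{"id": i, "name": name} for i, name in enumerate(class_names, start=1)]
    name_to_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for image_id, sample in enumerate(samples, start=1):
        images.append({"id": image_id, "file_name": sample["file_name"],
                       "width": sample["width"], "height": sample["height"]})
        for x1, y1, x2, y2, name in sample["boxes"]:
            w, h = x2 - x1, y2 - y1
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[name],
                "bbox": [x1, y1, w, h],  # COCO stores [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
                # Box corners as one polygon, for loaders that also read masks.
                "segmentation": [[x1, y1, x2, y1, x2, y2, x1, y2]],
            })
    return {"images": images, "annotations": annotations, "categories": categories}


coco = to_coco(
    [{"file_name": "0001.jpg", "width": 640, "height": 480,
      "boxes": [(10, 20, 40, 60, "cat"), (100, 100, 200, 150, "dog")]}],
    class_names=["cat", "dog"],
)
print(json.dumps(coco["annotations"][0]["bbox"]))  # [10, 20, 30, 40]
```

Writing the result with `json.dump(coco, f)` produces a file in the layout the COCO loader expects.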