
Comments (14)

wondervictor commented on September 14, 2024

Besides, reporting the inference time and the corresponding accuracy with single-scale input, without cropping or flipping, would be a fairer comparison with other methods.


wondervictor commented on September 14, 2024

Just to be clear, SFNet adopts single-scale testing with a 1024x2048 input, and BiSeNet adopts a downsampled version of the 1024x2048 input. Neither of them adopts multi-scale testing, sliding-window evaluation, or flipping at test time. Notably, we fixed AttaNet's input size to 1024x2048 or 512x1024 but failed to reach the results reported in the paper (see the discussions above; I'm not alone). Moreover, measuring speed without torch.cuda.synchronize() is a serious bug and leads to wrong inference times (time w/ synchronize >> time w/o synchronize).
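
For reference, a minimal timing sketch with proper synchronization. This is not the repo's inference.py; the model and input shape are placeholders, and only the timing pattern matters:

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, input_shape=(1, 3, 1024, 2048), warmup=50, iters=500):
    # Dummy GPU input; replace with the real preprocessing pipeline.
    x = torch.randn(*input_shape, device="cuda")
    model.eval().cuda()

    # Warm up so cuDNN autotuning and lazy allocations don't skew the timing.
    for _ in range(warmup):
        model(x)

    # CUDA kernels launch asynchronously, so synchronize before reading the
    # clock and again before stopping it; otherwise the CPU timer stops
    # before the GPU has actually finished the work.
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    print(f"Inference time {1000 * elapsed / iters:.3f} ms, FPS {iters / elapsed:.2f}")
```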

The actual speed and accuracy of the proposed AttaNet deserve more attention. Providing correct evaluation scripts is urgent, since the repo has been open sourced for several months.
Thanks.


wondervictor commented on September 14, 2024

Further, I've downloaded the code & models and evaluated the speed and accuracy on my local machine.
Specs: GPU: NVIDIA Titan Xp, CPU: 2 Intel Xeon E5-2620 v3.

Model: AttaNet w/ ResNet-18

Speed: 1024x2048 input size

inference.py outputs:

load resnet
start warm up
warm up done
=======================================
FPS: 24.972443
Inference time 40.044140 ms

Accuracy: 1024x2048 input size, w/o crop and flip

evaluate.py outputs:

================================================================================
evaluating the model ...

setup and restore model
load resnet
compute the mIOU
100%|██████████| 500/500 [06:09<00:00, 1.35it/s]
[0.98095101 0.84992488 0.91837538 0.64723809 0.65401004 0.5827008
 0.61811279 0.75313067 0.91223381 0.68902523 0.92999311 0.77462518
 0.5425117  0.94219012 0.85731529 0.87147848 0.79053572 0.52289506
 0.739424  ]
0.7671932295569451
mIOU is: 0.767193


ydhongHIT commented on September 14, 2024

I find that the speed test code does not use torch.cuda.synchronize().


wondervictor commented on September 14, 2024

Hi @liuzhidemaomao, your results (76.7 mIoU and 55.2 FPS with a 1024x2048 input) are consistent with mine regardless of the GPU. (The results from Table 1 of the original paper are 78.5 mIoU and 130 FPS on a 1080Ti, which is much slower than a 2080Ti.)
In my opinion, reporting the speed under the same setting (single-scale or test-time augmentation) as the accuracy evaluation is more convincing and reasonable.
However, using test-time augmentation (crop and flip in evaluate.py) to reach higher accuracy while reporting the speed under a different setting (512x1024 input) is misleading for the community.
Moreover, the other methods cited in Table 1 and Figure 1 adopt the same setting for both accuracy and speed evaluation, as far as I know.


songqi-github commented on September 14, 2024
  • MS/Flip
    To be clear, SFNet uses MS/Flip when testing on ADE20K (see Table 5 in SFNet). In Table 5 of our paper, nearly all the compared methods use MS/Flip; to compare with those methods, we also use MS/Flip on ADE20K.

  • w/ synchronize
    In our code, running w/ or w/o synchronize doesn't change the measured inference speed.

  • Real-time evaluation
    We will upload the weights and the evaluation file for real-time testing soon; please wait for that.


songqi-github commented on September 14, 2024

Hi, thanks for your attention to our paper. There must be some problem with your FPS testing. We have tested several times on our GPU, and we can achieve at least 120 FPS even though our GPU is slower than other cards of the same type. Please check your code and environment. Regarding the accuracy testing, we mainly follow the evaluation protocol of BiSeNetV2 and SFNet to ensure fairness. We use this file for multi-scale testing of the ResNet-50/101 models, not for the real-time accuracy results in our paper. The inference time and the corresponding accuracy use the same testing settings in our paper. We will upload the script used for real-time accuracy testing as soon as possible.


wondervictor commented on September 14, 2024

Hi @songqi-github
Indeed, AttaNet is a great work with some efficient designs.
To compare with SFNet, FANet, etc., which adopt the 1024x2048 input, I modified the script inference.py by changing the input size to 1024x2048 and removing the downsample operation; the model achieved 25 FPS with 76.7 mIoU.
Besides, BiSeNetV2 adopts a 512x1024 input to evaluate both mIoU and inference speed, without evaluation tricks:

We do not adopt any evaluation tricks, e.g., sliding-window evaluation and multi-scale testing, which can improve accuracy but are time-consuming. With the input of 2048 × 1024 resolution, we first resize it to 1024 × 512 resolution to inference and then resize the prediction to the original size of the input. We measure the inference time with only one GPU card and repeat 5000 iterations to eliminate the error fluctuation. We note that the time of resizing is included in the inference time measurement. In other words, when measuring the inference time, the practical input size is 2048 × 1024.
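
For illustration, a minimal sketch of that measurement protocol (downscale, run the model, upscale the prediction, with both resizes inside the timed region). This is not the repo's script; it assumes the model returns a single logits tensor, and the sizes are placeholders:

```python
import time
import torch
import torch.nn.functional as F

@torch.no_grad()
def timed_downscaled_inference(model, image, infer_size=(512, 1024), iters=5000):
    # image: a (1, 3, 1024, 2048) tensor already on the GPU.
    model.eval().cuda()
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        # Resize down, infer, then resize the prediction back to full
        # resolution; both interpolations are counted in the measured time.
        small = F.interpolate(image, size=infer_size, mode="bilinear", align_corners=False)
        logits = model(small)  # assumed single-tensor output
        F.interpolate(logits, size=image.shape[-2:], mode="bilinear", align_corners=False)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return iters / elapsed  # FPS, with resizing included
```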

In my opinion, reporting the inference time and mIoU without test-time augmentations is more convincing. Otherwise, the time spent inferring the cropped or flipped copies of each input should be added to the reported inference time.
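
As a concrete illustration (a sketch, not the repo's evaluate.py), horizontal-flip TTA alone already doubles the number of forward passes per image, so its cost should show up in any reported inference time:

```python
import torch

@torch.no_grad()
def flip_tta_logits(model, image):
    # Average the logits over the original and the horizontally flipped input.
    # Two forward passes per image, which is why TTA accuracy should not be
    # paired with single-pass speed numbers.
    logits = model(image)
    flipped_logits = model(torch.flip(image, dims=[-1]))
    return (logits + torch.flip(flipped_logits, dims=[-1])) / 2
```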


liuzhidemaomao commented on September 14, 2024

Further, I've downloaded the code & models and evaluated the speed and accuracy on my local machine.
Specs: GPU: NVIDIA Titan Xp, CPU: 2 Intel Xeon E5-2620 v3.

Model: AttaNet w/ ResNet-18

Speed: 1024x2048 input size

inference.py outputs:

load resnet
start warm up
warm up done
=======================================
FPS: 24.972443
Inference time 40.044140 ms

Accuracy: 1024x2048 input size, w/o crop and flip

evaluate.py outputs:

================================================================================
evaluating the model ...

setup and restore model
load resnet
compute the mIOU
100%|██████████| 500/500 [06:09<00:00, 1.35it/s]
[0.98095101 0.84992488 0.91837538 0.64723809 0.65401004 0.5827008
 0.61811279 0.75313067 0.91223381 0.68902523 0.92999311 0.77462518
 0.5425117  0.94219012 0.85731529 0.87147848 0.79053572 0.52289506
 0.739424  ]
0.7671932295569451
mIOU is: 0.767193

I have re-trained and re-evaluated the code on my own machine without any changes.
My environment:
GPU: GeForce RTX 2080ti, CPU: Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz

inference.py output: (screenshot)
evaluate.py output: (screenshot)

After changing inference.py to use an input size of 1024x2048:
The result: (screenshot)


lxtGH commented on September 14, 2024

@wondervictor

Hi @liuzhidemaomao, your results (76.7 mIoU and 55.2 FPS with a 1024x2048 input) are consistent with mine regardless of the GPU. (The results from Table 1 of the original paper are 78.5 mIoU and 130 FPS on a 1080Ti, which is much slower than a 2080Ti.)
In my opinion, reporting the speed under the same setting (single-scale or test-time augmentation) as the accuracy evaluation is more convincing and reasonable.
However, using test-time augmentation (crop and flip in evaluate.py) to reach higher accuracy while reporting the speed under a different setting (512x1024 input) is misleading for the community.
Moreover, the other methods cited in Table 1 and Figure 1 adopt the same setting for both accuracy and speed evaluation, as far as I know.

I agree with you. I cannot reproduce this work using my own codebase. With a 1024x2048 input, I obtain 76.8 mIoU. With a 512x1024 input, the result is very bad.
What are your results using a 512x1024 input?


lxtGH commented on September 14, 2024

I find that the speed test code does not use torch.cuda.synchronize().

Hi @ydhongHIT, interesting. Did you test the speed using torch.cuda.synchronize()?
wutianyiRosun/CGNet#2


ydhongHIT commented on September 14, 2024

I find that the speed test code does not use torch.cuda.synchronize().

Hi @ydhongHIT, interesting. Did you test the speed using torch.cuda.synchronize()?
wutianyiRosun/CGNet#2

I didn't test the speed, but I think it may explain why your measured speed differs from the author's.


songqi-github commented on September 14, 2024

In the previous reply, we already said that we used the same settings for both speed testing and performance evaluation. The given evaluate.py is used for multi-scale testing of the heavy models. We are still working on this repo, and we'll try to release the full code soon. Please check how to implement SAM and AFM first.


BUAA-LKG commented on September 14, 2024

@wondervictor

Hi @liuzhidemaomao, your results (76.7 mIoU and 55.2 FPS with a 1024x2048 input) are consistent with mine regardless of the GPU. (The results from Table 1 of the original paper are 78.5 mIoU and 130 FPS on a 1080Ti, which is much slower than a 2080Ti.)
In my opinion, reporting the speed under the same setting (single-scale or test-time augmentation) as the accuracy evaluation is more convincing and reasonable.
However, using test-time augmentation (crop and flip in evaluate.py) to reach higher accuracy while reporting the speed under a different setting (512x1024 input) is misleading for the community.
Moreover, the other methods cited in Table 1 and Figure 1 adopt the same setting for both accuracy and speed evaluation, as far as I know.

I agree with you. I cannot reproduce this work using my own codebase. With a 1024x2048 input, I obtain 76.8 mIoU. With a 512x1024 input, the result is very bad. What are your results using a 512x1024 input?

Have you managed to reproduce this work?

