
paper related problem about yolof (CLOSED)

megvii-model commented on June 27, 2024
paper related problem

from yolof.

Comments (4)

chensnathan commented on June 27, 2024

Hi, thanks for your questions.

In MiMo encoders, there are five feature levels, P3 to P7. P6 and P7 have larger receptive fields than C5, so the receptive field problem is not severe there. SiSo encoders, however, use only a single feature level (C5 or DC5), so we have to enlarge the receptive field to compensate for the missing P6 and P7.
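
To illustrate how dilation enlarges the receptive field of a single-level feature, here is a minimal sketch. The `receptive_field` helper, the choice of four 3x3 stride-1 layers, and the dilation rates are illustrative assumptions, not the exact YOLOF configuration.

```python
def receptive_field(layers):
    """Receptive field, in cells of the input feature map, of a conv stack.

    Each layer is a (kernel_size, stride, dilation) tuple.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        # A dilated kernel spans (kernel - 1) * dilation + 1 cells of its input.
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical stacks of four 3x3, stride-1 convolutions applied to C5.
plain = [(3, 1, 1)] * 4
dilated = [(3, 1, d) for d in (2, 4, 6, 8)]  # illustrative dilation rates

print(receptive_field(plain))    # 9
print(receptive_field(dilated))  # 41
```

With the same number of layers and parameters, the dilated stack covers a much larger area of the C5 feature map, which is the effect described above.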

As for small objects, it is true that C5 performs worse on them (see Table 1: YOLOF at 19.1 mAP vs. RetinaNet+ at 22.2 mAP). Because it is hard to recover fine detail from a high-level feature, we try to preserve the detail in the C5 feature by adopting a shortcut.

Table 9 in the paper also shows a feasible way to improve detection of small objects: we replace the C5 feature with the dilated C5 (DC5) feature. Small-object performance improves from 19.1 mAP (YOLOF-C5-1x) to 22.3 mAP (YOLOF-DC5-1x).


chensnathan commented on June 27, 2024

Yes, the performance drop is mainly caused by small objects.

The results are listed below:

MiMo:
| AP   | AP50 | AP75 | APs  | APm  | APl  |
| 35.9 | 55.8 | 38.5 | 19.9 | 39.6 | 47.9 |

SiMo:
| AP   | AP50 | AP75 | APs  | APm  | APl  |
| 35.0 | 54.8 | 37.1 | 17.4 | 39.3 | 47.8 |
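
As a quick check that the drop is concentrated in small objects, the per-metric gaps can be computed from the numbers listed here. This is only a sketch; the variable names are made up for illustration.

```python
METRICS = ["AP", "AP50", "AP75", "APs", "APm", "APl"]

# Values copied from the MiMo and SiMo rows in this comment.
mimo = dict(zip(METRICS, [35.9, 55.8, 38.5, 19.9, 39.6, 47.9]))
simo = dict(zip(METRICS, [35.0, 54.8, 37.1, 17.4, 39.3, 47.8]))

# Gap = MiMo minus SiMo, rounded to one decimal to hide float noise.
gaps = {m: round(mimo[m] - simo[m], 1) for m in METRICS}
largest = max(gaps, key=gaps.get)

for metric in METRICS:
    print(f"{metric:>4}: {gaps[metric]:.1f}")
print("largest gap:", largest)
```

The gap is 2.5 on APs but only 0.3 on APm and 0.1 on APl, so medium and large objects are almost unaffected.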


yypurpose commented on June 27, 2024

Thanks for your timely and detailed answer!

So it's bridging the gap between SiSo and SiMo, not between SiSo and MiMo! My understanding was wrong. And the dilated convolutions are there to make up for the missing P6 & P7. I DO UNDERSTAND THIS TIME.

And as you said in the paper, SiMo is comparable with MiMo (35.0 mAP vs. 35.9 mAP). Is the drop mainly caused by small objects, or is it uniform? Could you provide the detailed results of the four models in Fig. 1, as I did not find them in the paper? I'm really curious about it.

THANKS A LOT!!


yypurpose commented on June 27, 2024

Thanks a lot!

