Giter Site home page Giter Site logo

Overzealous inline opitimizations? about ftl HOT 7 OPEN

oliv3r avatar oliv3r commented on June 30, 2024
Overzealous inline opitimizations?

from ftl.

Comments (7)

DL6ER avatar DL6ER commented on June 30, 2024

Generally the idea is 'do not inline, let the compiler figure it out, it knows best.

No, because "best" is not a well-defined term here. You see that -O3 and -Os define "best" in two very different ways. Pi-hole itself is optimized for speed because that's what matters with any real application these days (space is cheap as it has never been). The only reason where I see justification for -Os is the MCU world where you may really still be restricted to a few KB of program memory.

gravityDB_finalize_client_statements(*client) involves quite some code that'd be unnecessarily be duplicated all over the place without a definition in a common place. Forcing inlining here actually gives a small speed enhancement as this function is called rather often and the mere fact that we inline saves us the costs of one context switch. Even when this isn't much, it is still measurable at runtime. The compiler, however, doesn't know if this function is called very often (without a profiling) and typically decides against inlining a function that has more than like three lines of code (the exact number has changes a few times in the history of gcc) which is not what we want.

At this point, FTL isn't meant to be compiled with -Os and I'd only consider changing proven efficient code if you can describe a real scenario where buying a barely - if at all - measureable reduction in application size at the costs of a measurable slowdown is worthwhile. In the end we are talking about a reduction at most a few KB (in an application in the multi-MB range this is < 0.1%) whereas the expected slowdown due to removing many optimizations which are known to often increase instruction size* will very likely be in the measurable low integer percent range.

TL;DR: Compiling for size comes at a measurable speed penalty. So far, this isn't a really supported configuration and I would not speak about "overzealous" in this context just because we give the compiler hints it cannot know itself.


*) not only our manual inlining instructions are removed, but also function/jump/labels/loops alignment, prefetching of loop arrays, and reordering of block algorithms, which can greatly reduce branch mis-prediction in the exectuable

from ftl.

oliv3r avatar oliv3r commented on June 30, 2024

Generally the idea is 'do not inline, let the compiler figure it out, it knows best.

No, because "best" is not a well-defined term here. You see that -O3 and -Os define "best" in two very different ways. Pi-hole itself is optimized for speed because that's what matters with any real application these days (space is cheap as it has never been). The only reason where I see justification for -Os is the MCU world where you may really still be restricted to a few KB of program memory.

I beg to differ, but your point does stand. My accesspoint that has 8 mb of flash and 128 mb of ram running openwrt doesn't have 'space to burn' no matter how cheap it is. Also, -Os has an more important impact, which is caching. Caches is still limited in a lot of CPU's because space for cache is expensive. Also being more effective in cache usage does improve speed and performance. Benchmarks would need to prove this of course.

So there certainly is still usage for Os in certain scenario's. However as I said, your point does stand and I do agree with it for the most part. ;)

gravityDB_finalize_client_statements(*client) involves quite some code that'd be unnecessarily be duplicated all over the place without a definition in a common place. Forcing inlining here actually gives a small speed enhancement as this function is called rather often and the mere fact that we inline saves us the costs of one context switch. Even when this isn't much, it is still measurable at runtime. The compiler, however, doesn't know if this function is called very often (without a profiling) and typically decides against inlining a function that has more than like three lines of code (the exact number has changes a few times in the history of gcc) which is not what we want.

I'm more then happy to accept that you know what you are doing :) no need to prove anything :p

At this point, FTL isn't meant to be compiled with -Os and I'd only consider changing proven efficient code if you can describe a real scenario where buying a barely - if at all - measureable reduction in application size at the costs of a measurable slowdown is worthwhile. In the end we are talking about a reduction at most a few KB (in an application in the multi-MB range this is < 0.1%) whereas the expected slowdown due to removing many optimizations which are known to often increase instruction size* will very likely be in the measurable low integer percent range.

And that's perfectly fine! So the answer to my question is 'no, we know what we are doing'. And that's great :) I should have potentially given more context, and that is I was trying to use the CMake target MinSizeRel because for the some parts, (openwrt, alpine, etc) it's the commonly picked target. This lead me down a path of compilation errors, where compilation was no longer possible.

That means, that

set(CMAKE_C_FLAGS_MINSIZEREL "-Os -DNDEBUG")
needs to be either removed, as it's not even valid, or (a bit better imo) is to either disable -Winline for that target (could we put -Wnonline there? or increase the inline warning limit (to which I had no success with so far).

TL;DR: Compiling for size comes at a measurable speed penalty. So far, this isn't a really supported configuration and I would not speak about "overzealous" in this context just because we give the compiler hints it cannot know itself.

Well then you conclude say we should actually remove MinSizeRel!

*) not only our manual inlining instructions are removed, but also function/jump/labels/loops alignment, prefetching of loop arrays, and reordering of block algorithms, which can greatly reduce branch mis-prediction in the exectuable

👍

from ftl.

DL6ER avatar DL6ER commented on June 30, 2024

My accesspoint that has 8 mb of flash and 128 mb of ram running openwrt doesn't have 'space to burn' no matter how cheap it is.

Ah, yes. We do not "support" any embedded devices out of the box so that's why I haven't thought about them. Sure, there are so many different devices out there that not everyone will have a USB port or whatsover with which you could extend storage for the hundreds GBs with only couple of dollars. This context indeed sheds another light on your endeavor - one which I can easily follow.

Well then you conclude say we should actually remove MinSizeRel!

I have no objections. A quick look on git blame on this line reveals it was an external contribution and (assumption ahead!) probably originated from some default template just as the other two (DEBUG and RELEASE).

The Pi-hole team itself solely uses the RELWITHDEBINFO target both with our internal tooling as well as with the recommended compiling steps packed together into build.sh. I'd be ready to say that MinSizeRel was probably tried back when this was added but never since again.

from ftl.

kocburak avatar kocburak commented on June 30, 2024

Hi, (newbiew here)
Why aren't we using -Ofast instead of -O3 ?

from ftl.

oliv3r avatar oliv3r commented on June 30, 2024

@DL6ER so for me, I'm just building with Release and I'm done; but I suppose you guys have to make a choice here. Or at least have test targets that build them ;)

So I'll leave this issue open and you can close it with whatever solution you feel fits best.

from ftl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.