This CL adds a tweak to be used by large codebases to noinline certain
commonly-used functions. This can reduce binary size considerably for some
codebases. For example, Chromium on Linux was reduced by 4.51MiB or 3.31%.
Details
- Reviewers
thakis EricWF mclow.lists pcc
Diff Detail
Event Timeline
include/__config | ||
---|---|---|
782 | This should be documented in docs/DesignDocs/VisibilityMacros.rst instead of in the header. | |
812 | I don't have a strong objection to the intention of the patch, but I really dislike the name. | |
814 | I understand wanting to disable the forced inlining of functions, but I don't understand why you would want to explicitly disable inlining. |
include/string | ||
---|---|---|
802 | Most of these functions *can't* be inlined by the compiler unless the inline keyword is explicitly specified because these functions are externally instantiated. I suspect not only that these functions not need __attribute__((noinline)) but that would implicitly have that effect if __attribute__((always_inline)) was omitted. |
As previously mentioned, most of these changes deal with symbols which are externally instantiated. These symbols cannot be inlined unless the inline keyword is explicitly specified. The addition of __attribute__((noinline)) likely has no effect in a large number of cases.
include/__config | ||
---|---|---|
814 | I'm building Chromium on Linux with lld (without LTO) with -O2. With the attribute((noinline))'s from this patch, the binary size is 194M, 203296816B. Without the noinline's, it's 197M, 206227536B. If I use -Os instead, the measurements are 191M, 200064384B and 192M, 200578400B. Better, but still not perfect. +pcc do you know why lld is having trouble noinlining these functions? |
include/__config | ||
---|---|---|
814 | Since you have LTO disabled, the choice of linker shouldn't make a difference with regards to inlining here. (Even with LTO you shouldn't see a difference with Chromium because Chromium links with --lto-O0.) Note that we're not trying to avoid inlining for performance reasons but for code size reasons. Finding a good heuristic for the latter in the inliner is, unfortunately, an unsolved problem as of yet. (One of the reasons is that the heuristic would need be something of the form "if I inline this, will it unblock other size optimizations on the inlined code" and LLVM's optimizers just don't work that way, at least not yet.) | |
include/string | ||
802 | You might consider moving these attributes onto the definitions, which is where the inline keyword should appear (or not appear) anyway. |
This should be documented in docs/DesignDocs/VisibilityMacros.rst instead of in the header.