This patch introduces an inlining heuristic that is applied when the inline cost of a call is calculated at the '-Oz' optimization level.
When the inline cost is calculated for a call that uses the fast calling convention and whose callee is a frequently used function, the call penalty is not taken into account. By default, a function is considered frequently used if the number of its users is greater than 3. This value can be changed via the command-line option '-inline-freq-func-threshold'. The heuristic still allows inlining of very small functions, e.g. ones consisting of only a few instructions.
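The rule described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from the patch: the types `CallingConv` and `Callee`, the function names, and the numeric value of `kCallPenalty` are all hypothetical. The sketch also assumes the call penalty is modeled as a credit subtracted from the cost, since inlining removes the call. Only the default of "more than 3 users" comes from the description.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are NOT LLVM's real types.
enum class CallingConv { C, Fast };

struct Callee {
  CallingConv conv;   // calling convention used by the callee
  unsigned numUsers;  // number of users of the function
  int bodyCost;       // estimated size cost of the callee's body
};

// From the description: a function is "frequently used" when it has
// more than 3 users (the default of -inline-freq-func-threshold).
constexpr unsigned kFreqFuncThreshold = 3;

// Assumed, illustrative value for the saving credited for removing a call.
constexpr int kCallPenalty = 25;

// True when the heuristic applies: fast calling convention and more users
// than the threshold.
bool isFrequentlyUsedFastCall(const Callee &c,
                              unsigned freqThreshold = kFreqFuncThreshold) {
  return c.conv == CallingConv::Fast && c.numUsers > freqThreshold;
}

// Cost of inlining one call site. Normally the removed call is credited as
// a saving; for frequently used fast-call functions that credit is skipped.
int inlineCost(const Callee &c, unsigned freqThreshold = kFreqFuncThreshold) {
  assert(c.bodyCost >= 0 && "body cost must be non-negative");
  int cost = c.bodyCost;
  if (!isFrequentlyUsedFastCall(c, freqThreshold))
    cost -= kCallPenalty;
  return cost;
}

// Inline only when the cost stays below the given inline threshold.
bool shouldInline(const Callee &c, int inlineThreshold,
                  unsigned freqThreshold = kFreqFuncThreshold) {
  return inlineCost(c, freqThreshold) < inlineThreshold;
}
```

Under these assumed numbers, a fast-call function with 5 users and a body cost of 30 is no longer inlined at an inline threshold of 15, while the same function with only 2 users still is. A very small body (cost 10) is inlined either way, which matches the note that very small functions remain inlinable.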
The rationale for the heuristic:
- Fast calls are usually optimized to minimize the call penalty, e.g. parameters are passed through registers.
- Inlining of frequently used functions increases the code size.
- It is based on real examples of code for microcontrollers (v6-M), where we see a code size improvement of ~6.5%.
Benchmark results of the LNT testsuite:
The code size of only three benchmarks was affected by the heuristic.
x86 (i7-4770):
Benchmark | Code size improvement | Perf regression |
MultiSource/Applications/SPASS/SPASS | 6.67% | ~0% |
MultiSource/Benchmarks/FreeBench/distray/distray | 2.46% | ~0% |
MultiSource/Applications/sqlite3/sqlite3 | 1.34% | ~0% |
v7-A (Cortex-A53, Thumb):
Benchmark | Code size improvement | Perf regression |
MultiSource/Applications/SPASS/SPASS | 4.93% | ~3% |
MultiSource/Applications/sqlite3/sqlite3 | 1.12% | ~1.5% |
AArch64 (Cortex-A57):
Benchmark | Code size improvement | Perf regression |
MultiSource/Applications/SPASS/SPASS | 3.95% | ~3% |
MultiSource/Benchmarks/FreeBench/distray/distray | 1.17% | ~0% |
MultiSource/Applications/sqlite3/sqlite3 | 1.07% | ~0% |
This comparison seems backwards... generally, we assume non-local functions have many callers, and therefore treat them the same way as local functions with many callers.