Call instructions had a penalty of 25 applied during inline cost analysis. This seems nonsensical - calls surely don't cost any more than any other instruction when inlined. I presume that the logic was that a call dominates in terms of time so as to diminish the benefits of inlining other instructions. In reality this penalty was applied in some cases and not others, causing skewed behaviour and some pretty glaringly obvious errors. For example:
int g() { h(); } int f() { g(); }
With an inline threshold set to zero (which *should* mean no code bloat), we won't inline g into f. This is because there's a call penalty of 25 applied for the call to h(), but there *isn't an equivalent bonus applied for removing the call to g()*! This has caused the inlining threshold for minsize to be set to a default of... 25. Which means that for functions that *don't* have a call in them, we'll accept around 5 instructions of code bloat in minimum size mode!
This call penalty has logic behind it but it's skewing decisions and causing us to try and counterbalance it in strange places. Nuke it.
This improves code size by a percentage point or so over the test-suite in minsize mode, but most importantly removes some really quite glaring clangers of inlining decisions and is a more sane default. The effects on -Os and -O3 is very minimal, because their thresholds are 125 and 225 respectively which are really quite large. However with functions with many calls inside this may change the inlining decision.
Zero would be the obvious choice for minsize, however I've noticed code size regressions when picking 0 - most of these are fixed with a threshold of 5. The reason for this is the cost model isn't accurate - being a little more aggressive allows us to take some risks that sometimes (or often) pay off. For example:
int g(int a) { h(0); } int f() { g(); }
With a threshold of zero we will not inline g into f, because the call to h requires one more argument to set up than the call to g it replaces. In reality this extra parameter will cause negligible code size impact, and we have the possibility of removing g entirely if nothing else references it (happens often when using C++ containers).
I've tested this on the test-suite and it gives decent results. Excluding TSVC (see later):
242 total test executables 68 have a code size improvement from between 0.01% - 4.6% (most of these are tiny, in the 0.01 - 1.00% range) 34 have a code size regression from between 0.01% - 5.3% (most of these are tiny, in the 0.01 - 1.00% range) 0.07% geomean improvement in code size.
The TSVC benchmarks were excluded because they *improve* in codesize by 40% - their size almost halves. There are 36 of these, with code size improvements ranging from 32% to 45%.
7.55% geomean improvement in code size, with TSVC included in the sample.
These tests were done on Armv7, thumb mode, -Oz. I have confirmed that -O3 looks vaguely similar to how it did before (not surprising - the threshold there is 225 so it less likely to be affected by this change).
No, I don’t mean the parameters, the visitor account for one instruction per parameter just before this indeed.
However a call has other extra-cost on top of a regular instruction, even without any argument: it splits live-ranges and may cause spills around it to preserve registers that aren't callee-saved.
In this sense, it make sense that a call can "cost" more than a normal instruction.