This patch teaches shouldBeDeferred to take into account the total
cost of inlining.
Suppose we have a call hierarchy {A1,A2,A3,...}->B->C. (Each of A1,
A2, A3, ... calls B, which in turn calls C.)
Without this patch, shouldBeDeferred essentially returns true if
  TotalSecondaryCost < IC.getCost()
where TotalSecondaryCost is the total cost of inlining B into the As.
This means that if B is a small wrapper function, for example, it would
get inlined into all of the As. In turn, C gets inlined into all of the As.
In other words, shouldBeDeferred ignores the cost of inlining C into
each of the As.
This patch replaces the expression above with:
  TotalCost < Allowance
where
- TotalCost is TotalSecondaryCost + IC.getCost() * (number of As), and
- Allowance is IC.getCost() * Scale
For now, Scale defaults to 2. Since IC.getCost() * (number of As) alone
reaches the allowance once there are two As, this essentially limits the
number of As to 1 for shouldBeDeferred to return true.
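As a rough illustration of the two conditions, here is a minimal
standalone sketch. The struct and function names are hypothetical and
are not taken from the LLVM sources; only the arithmetic follows the
description above, and the real shouldBeDeferred performs additional
checks that are omitted here.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the quantities named in the description.
struct DeferralInputs {
  int64_t CalleeCost;         // plays the role of IC.getCost()
  int64_t TotalSecondaryCost; // total cost of inlining B into the As
  int64_t NumCallers;         // number of As that call B
};

// Simplified form of the condition before the patch.
bool oldShouldDefer(const DeferralInputs &In) {
  return In.TotalSecondaryCost < In.CalleeCost;
}

// Simplified form of the condition after the patch. Scale = 2 mirrors
// the default mentioned in the description.
bool newShouldDefer(const DeferralInputs &In, int64_t Scale = 2) {
  // Charge the cost of inlining C once per caller of B.
  int64_t TotalCost = In.TotalSecondaryCost + In.CalleeCost * In.NumCallers;
  int64_t Allowance = In.CalleeCost * Scale;
  return TotalCost < Allowance;
}
```

For example, with CalleeCost = 100, TotalSecondaryCost = 90 and three
callers, the old condition holds (90 < 100) while the new one does not
(390 < 200 is false).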
With this patch, a Clang PGO bootstrap results in 0.33% smaller .text*
sections. Compiling the 10 largest preprocessed files of Clang with
the PGO-bootstrapped clang takes:
- 69.677 seconds on average over five runs without the patch, and
- 68.939 seconds on average over five runs with the patch.
NumCallerUsers may be more explicit.