This adds a heuristic that estimates the speedup due to inlining a callsite and increases thresholds for callsites that have an estimated speedup above a configurable minimum. The speedup is defined as WeightedSavings / (WeightedCost + WeightedSavings). WeightedCost is obtained by weighting the cost of each basic block (already computed by the analysis) by the block's frequency. To compute WeightedSavings, we keep track of the savings due to inlining. There are 3 components to the savings: a. the cost savings due to elimination of call overhead and argument setup overhead, b. the cost savings due to SROA, and c. the cost savings due to elimination of instructions after inlining. Note that the savings do not include the cost of blocks that are unreachable in the call context since, irrespective of inlining, those blocks are not reachable in a dynamic sense. Like WeightedCost, WeightedSavings is calculated by weighting the savings of each block by its frequency.
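To make the formula concrete, here is a minimal standalone sketch of the computation described above. It is not the code in the patch: `BlockEstimate`, `estimateSpeedup`, and all field names are hypothetical. It also assumes that the call and argument setup savings are counted once at a relative frequency of 1.0 and that blocks unreachable in the call context contribute to neither sum.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-block summary; names are illustrative, not from the patch.
struct BlockEstimate {
  double Cost;      // cost the inline analysis assigns to this block
  double Savings;   // savings attributed to this block (SROA, eliminated instructions)
  double Frequency; // block frequency relative to the callee entry
  bool Reachable;   // false if the block is dead in this call context
};

// Speedup = WeightedSavings / (WeightedCost + WeightedSavings).
double estimateSpeedup(const std::vector<BlockEstimate> &Blocks,
                       double CallOverheadSavings) {
  double WeightedCost = 0.0;
  // Assumption: call and argument setup savings are weighted at frequency 1.0.
  double WeightedSavings = CallOverheadSavings;
  for (const BlockEstimate &B : Blocks) {
    assert(B.Frequency >= 0.0 && "block frequency must be non-negative");
    // Assumption: unreachable blocks are skipped for both cost and savings.
    if (!B.Reachable)
      continue;
    WeightedCost += B.Cost * B.Frequency;
    WeightedSavings += B.Savings * B.Frequency;
  }
  double Total = WeightedCost + WeightedSavings;
  // Report no speedup when there is nothing to weigh.
  return Total > 0.0 ? WeightedSavings / Total : 0.0;
}
```

Under this definition the result is a fraction in [0, 1]: a value of 0.5 means the weighted savings equal the weighted cost that remains after inlining.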
This analysis is triggered only when the callsite's frequency relative to the caller's entry exceeds a configurable parameter. In my experiments, I noticed that omitting this filter increases code size in cold regions.
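The gating described above could be sketched as follows. This is an illustration only: `SpeedupParams`, `selectThreshold`, and the parameter names are hypothetical stand-ins for the configurable options, and the estimated speedup is assumed to have been computed elsewhere.

```cpp
#include <cassert>

// Hypothetical tuning knobs; names do not correspond to the actual flags.
struct SpeedupParams {
  double MinRelativeCallsiteFreq; // callsite freq / caller entry freq must exceed this
  double MinSpeedup;              // estimated speedup must exceed this
  int BaseThreshold;              // threshold used when the heuristic does not apply
  int BonusThreshold;             // increased threshold for profitable callsites
};

int selectThreshold(const SpeedupParams &P, double CallsiteFreq,
                    double CallerEntryFreq, double EstimatedSpeedup) {
  assert(CallerEntryFreq > 0.0 && "caller entry frequency must be positive");
  double RelativeFreq = CallsiteFreq / CallerEntryFreq;
  // Filter: callsites that are not hot enough never get the increased threshold.
  if (RelativeFreq <= P.MinRelativeCallsiteFreq)
    return P.BaseThreshold;
  // Hot callsite: raise the threshold only if the estimated speedup is large enough.
  return EstimatedSpeedup > P.MinSpeedup ? P.BonusThreshold : P.BaseThreshold;
}
```

Checking the frequency first means the speedup estimate has no effect on callsites below the frequency cutoff, which is what keeps cold regions from growing.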
I have done some parameter tuning on a set of internal benchmarks, using a hacked-up version of this code that works with the old PM. I expect more tuning to be done on a broader set of benchmarks. Since this is active only with the new PM based pipeline, I haven't added a flag to disable it, but I'll add one if you prefer.
nit: Please use vertical whitespace before the comment for the second flag.
More substantive comment: what unit is this? The "desc" string only says "speedup", but it isn't clear whether you mean "10% faster after inlining". If so, that seems a surprisingly low threshold...