The per-callsite size threshold used today to drive preinline decision is based on hotness/coldness cutoff. The default setup is for callsites with a sample count above the hotness cutoff (99%), a 1500 size threshold is used. Any callsite below 99.99% coldness cutoff uses a zero threshold. This has a couple issues:
- While both cutoffs and size thoresholds are configurable, different applications may need different setups, making a universal setup impractical.
- The callsites between hotness cutoff and coldness cutoff are not considered as inline candidates, which could be a missing opportunity.
- Hot callsites always use the same threshold. In reality we may want a bigger threshold for hotter callsites.
In this change we are introducing a linear threshold regardless of hot/cold cutoffs. Given a sample space, a threshold is computed for a callsite based on the position of that callsite sample in the whole space. With that we no longer need to define what's hot or cold. Callsites with different hotness will get a different threshold. This should overcome the above three issues.
I have seen good results with a universal default setup for two of our internal services.
For one service, 0.2% to 0.5% perf improvement over a baseline with a previous default setup, on-par code size.
For the second service, 0.5% to 0.8% perf improvement over a baseline with a previous default setup, 0.2% code size increase; on-par performance and code size with a baseline that is with a carefully tuned cutoff to cover enough hot functions.
One concern about using getMaxCount -- that will have more instability and variation run to run. How about using HotCountThreshold (or N * HotCountThresold, or a set percentile) to stabilize?
The NormalizedHotness doesn't have to be between 0 and 1. Or if we want it to be between 0 and 1, we can also do min(NormalizedHotness, 1).