Hot callsites should have higher threshold than inline hints. This patch uses separate threshold parameter for hot callsites.
Details
Diff Detail
Event Timeline
it looks good for me, but I am not really competent to accept it yet, so I guess wait for the other reviewers.
should we have a fixed cutoff threshold, or make it adaptive to the hotness ? Or more generally, make it part of the global speedup analysis (the larger the global speedup, the larger the speedup) -- which we will soon have?
Another thing: PGO and autoFDO should reduce overall text size compared with O2 build, so we should also do the opposite -- if the caller has zero count (non samples with autoFDO), the threshold should be reduced.
I agree. But for now, the PSI interface only give use boolean value for hot/cold inquiry. How about we have this fixed cutoff in first, later when global speedup analysis is ready, we use the speedup as parameter to set the threshold?
Another thing: PGO and autoFDO should reduce overall text size compared with O2 build, so we should also do the opposite -- if the caller has zero count (non samples with autoFDO), the threshold should be reduced.
Added cold callsite heuristic.
lib/Analysis/InlineCost.cpp | ||
---|---|---|
69 | Adding a threshold different from inlinehint-threshold makes sense. But if you want to set the default this high, then it is important to have numbers to justify. Running spec with different thresholds and a comment here saying you chose this default because this gives the best performance/size tradeoff is important. |
LGTM.
[Copying Dehao's comments from the review thread for reference]
I experimented different thresholds from 325 (original) to 4000. The code size change is within 2% fall all speccpu2006 int benchmarks. In terms of performance. Up to 4% speedup is observed for perlbench when changing threshold from 325 to 3000. And above 3000, the performance curve remains flat. For all other speccpu2006 int benchmarks, the performance change does not escape noise range when changing threshold from 325 to 4000. Similar performance/size result is observed for internal benchmarks. So I think 3000 seems to be a sweet spot for the threshold.
Adding a threshold different from inlinehint-threshold makes sense. But if you want to set the default this high, then it is important to have numbers to justify. Running spec with different thresholds and a comment here saying you chose this default because this gives the best performance/size tradeoff is important.