This is an archive of the discontinued LLVM Phabricator instance.

Replace hot-callsite based heuristic to use its own threshold parameter instead of share inline-hint parameter
ClosedPublic

Authored by danielcdh on Jul 14 2016, 10:02 AM.

Details

Summary

Hot callsites should have higher threshold than inline hints. This patch uses separate threshold parameter for hot callsites.

Diff Detail

Event Timeline

danielcdh updated this revision to Diff 63998.Jul 14 2016, 10:02 AM
danielcdh retitled this revision from to Replace hot-callsite based heuristic to use its own threshold parameter instead of share inline-hint parameter.
danielcdh updated this object.
danielcdh added reviewers: eraman, davidxl.
danielcdh added a subscriber: llvm-commits.
Prazek added a subscriber: Prazek.Jul 16 2016, 3:42 PM

it looks good for me, but I am not really competent to accept it yet, so I guess wait for the other reviewers.

davidxl edited edge metadata.Jul 19 2016, 11:05 AM

should we have a fixed cutoff threshold, or make it adaptive to the hotness ? Or more generally, make it part of the global speedup analysis (the larger the global speedup, the larger the speedup) -- which we will soon have?

Another thing: PGO and autoFDO should reduce overall text size compared with O2 build, so we should also do the opposite -- if the caller has zero count (non samples with autoFDO), the threshold should be reduced.

danielcdh updated this revision to Diff 64552.Jul 19 2016, 1:25 PM
danielcdh edited edge metadata.

add cold callsite heuristic

should we have a fixed cutoff threshold, or make it adaptive to the hotness ? Or more generally, make it part of the global speedup analysis (the larger the global speedup, the larger the speedup) -- which we will soon have?

I agree. But for now, the PSI interface only give use boolean value for hot/cold inquiry. How about we have this fixed cutoff in first, later when global speedup analysis is ready, we use the speedup as parameter to set the threshold?

Another thing: PGO and autoFDO should reduce overall text size compared with O2 build, so we should also do the opposite -- if the caller has zero count (non samples with autoFDO), the threshold should be reduced.

Added cold callsite heuristic.

eraman added inline comments.Jul 19 2016, 5:26 PM
lib/Analysis/InlineCost.cpp
69

Adding a threshold different from inlinehint-threshold makes sense. But if you want to set the default this high, then it is important to have numbers to justify. Running spec with different thresholds and a comment here saying you chose this default because this gives the best performance/size tradeoff is important.

eraman edited edge metadata.Jul 25 2016, 1:11 PM

LGTM.

[Copying Dehao's comments from the review thread for reference]

I experimented different thresholds from 325 (original) to 4000. The code size change is within 2% fall all speccpu2006 int benchmarks. In terms of performance. Up to 4% speedup is observed for perlbench when changing threshold from 325 to 3000. And above 3000, the performance curve remains flat. For all other speccpu2006 int benchmarks, the performance change does not escape noise range when changing threshold from 325 to 4000. Similar performance/size result is observed for internal benchmarks. So I think 3000 seems to be a sweet spot for the threshold.

This patch still needs an approval?

eraman accepted this revision.Aug 5 2016, 10:24 AM
eraman edited edge metadata.

This patch still needs an approval

LGTM from me. Don't know if David has any other comments.

This revision is now accepted and ready to land.Aug 5 2016, 10:24 AM
danielcdh closed this revision.Aug 5 2016, 1:36 PM