If profile data is available, we can use it to avoid outlining hot blocks by passing the -machine-outliner-use-profile-data backend flag. If profile data is not available, we fall back to the target defaults.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can you add test cases for the following?
- hot block + optsize minsize
- cold block + optsize minsize
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
911 | I'm not sure I understand this correctly. When UseProfileData is true, do we ignore the other two conditions? |
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
911 | There are basically three reasons why the machine outliner would run.
On second thought I should also check F.hasProfileData() so that we don't conservatively assume unprofiled functions are always hot. |
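The gating discussed above can be sketched in isolation. This is a minimal model, not the code in MachineOutliner.cpp: the struct and function names are hypothetical stand-ins, and it assumes that profile data only vetoes hot blocks while every other case defers to whatever the existing conditions already decided.

```cpp
// Hypothetical stand-ins for the state the outliner would query.
// None of these are LLVM types.
struct FunctionState {
  bool HasProfileData; // models F.hasProfileData()
};

struct BlockState {
  bool IsHot; // models a ProfileSummaryInfo hotness query on the block
};

// Returns true if the block may be considered for outlining.
// UseProfileData models the -machine-outliner-use-profile-data flag.
// ExistingDecision models the result of the pre-existing conditions
// (target defaults, size attributes), which this sketch does not expand.
constexpr bool mayOutlineFrom(bool UseProfileData, FunctionState F,
                              BlockState B, bool ExistingDecision) {
  // Only trust hotness when the function actually has a profile, so that
  // unprofiled functions are not conservatively treated as always hot.
  if (UseProfileData && F.HasProfileData && B.IsHot)
    return false;
  // Assumption: cold blocks and unprofiled functions keep the old behavior.
  return ExistingDecision;
}
```

With this shape, a minsize function without a profile is still outlined whenever the existing conditions allow it, which is the case the new test is meant to cover.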
Check if profile data is available before trying to use it, and add a test to make sure we enable outlining for minsize functions without profiles.
Remove the dependency on isBlockRarelyExecuted() that I created in D124490 and replace it with the ProfileSummaryInfo API to determine if a block is hot.
I'm also wondering if I should use the isHotBlockNthPercentile() function and add an option to specify the percentile. Maybe something like -machine-outliner-cutoff-prof=99900.
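To make the percentile idea concrete, here is a simplified model of how a cutoff can be turned into a hot-count threshold. It is not the ProfileSummaryInfo implementation: the function names are hypothetical, and the parts-per-million scale is an assumption modeled on how -profile-summary-cutoff-hot values are usually written (under that assumption, 99% is 990000 and 99.9% would be 999000).

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Assumed cutoff scale: parts per million.
constexpr uint64_t kCutoffScale = 1000000;

// Returns the smallest count among the hottest blocks that together account
// for at least Cutoff / kCutoffScale of the total count, or 0 if there are
// no counts. Overflow is ignored; the sketch assumes small inputs.
uint64_t hotCountThreshold(std::vector<uint64_t> Counts, uint64_t Cutoff) {
  std::sort(Counts.begin(), Counts.end(), std::greater<uint64_t>());
  uint64_t Total = 0;
  for (uint64_t C : Counts)
    Total += C;
  uint64_t Running = 0;
  for (uint64_t C : Counts) {
    Running += C;
    // Integer form of Running / Total >= Cutoff / kCutoffScale.
    if (Running * kCutoffScale >= Total * Cutoff)
      return C;
  }
  return 0;
}

// A block is hot at the given cutoff if its count reaches the threshold.
bool isHotAtCutoff(uint64_t BlockCount, const std::vector<uint64_t> &Counts,
                   uint64_t Cutoff) {
  uint64_t Threshold = hotCountThreshold(Counts, Cutoff);
  return Threshold != 0 && BlockCount >= Threshold;
}
```

Raising the cutoff lowers the threshold, so more blocks count as hot and fewer are outlined. A per-pass percentile option would trade that off locally instead of through the global cutoff.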
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
121 | I don't think we do this for other passes. Once the plumbing for BFI/MBFI is done so that passes can leverage profile info, we just always use it. Using the profile should indeed always be better if the heuristic is reasonable. Why do you need a flag for this case? If you need some confidence in profile quality, you can check the profile kind (PSI->hasInstrumentationProfile()) etc. | |
380 | Why do we require BFI instead of MBFI, given that this is dealing with MIR? | |
940 | Usually we call a higher-level API like isHotBlock instead of isHotBlockNthPercentile. You can tweak the global percentile with -profile-summary-cutoff-hot. |
I'm planning changes to this for now; see the TODO comments.
llvm/lib/CodeGen/MachineOutliner.cpp | ||
---|---|---|
380 | I would have liked to use LazyMachineBlockFrequencyInfoPass instead, but it seems that does not work here because this is a ModulePass instead of a MachineFunctionPass. |