If the normal destination of the invoke or the parent block of the call site is unreachable-terminated, there is little point in inlining the call site unless there is literally zero cost. Unlike my previous change (D15289), this change specifically handle the call sites followed by unreachable in the same basic block for call or in the normal destination for the invoke. This change could be a reasonable first step to conservatively inline call sites leading to an unreachable-terminated block while BFI / BPI is not yet available in inliner.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
The patch is straightforward and looks reasonable to me.
lib/Analysis/InlineCost.cpp | ||
---|---|---|
1217 ↗ | (On Diff #46084) | If the normal destination of the invoke or the parent block of the call site is unreachable-terminated, ... |
The only concern I have is the suppression of inlining of hot calls in the following hypothetical example:
main() {
hot_call_1(); ... hot_call_N() exit(0);
}
Even though each of the hot_call_X calls are executed once, inlining may be beneficial (due to constprop for example). Short of checking for caller to be main (as you've done in your previous patch), is there a way to distinguish this case? I suppose this pattern is very rare in real code to worry about this.
Thanks Easwaran for the comment.
I can see that the case you mention could happen with this change. Although I tried to come up with a workaround without BPI, nothing seems to show desirable results in my test environment. To cover this case, I may want to wait until BPI is hooked in inliner and revisit this to elaborate it by checking the actual coldness of the block. So, in future, we may only suppress inlining a call site in a block that is leading to an unreachable and dominated by a block of which all edges from predecessors have zero branch probability.
I mentioned this in another reply. This scenario is rare so this patch
is still good in general. If we want to handle it in the future, this
heuristics needs to be moved up and can be followed by other threshold
adjusting heuristics -- but that should not be required for this
patch.
David
Sounds reasonable. I also don't expect this kind of code to appear in real life, but one can imagine microbenchmarks or toy programs to have this pattern.