This is an archive of the discontinued LLVM Phabricator instance.

[Inliner] Inline through indirect call sites having !callees metadata
AcceptedPublic

Authored by mssimpso on Nov 9 2017, 1:53 PM.

Details

Summary

This patch allows the inliner to inspect !callees metadata attached to an indirect call site, and then inline the subset of callees for which inlining is profitable. The inlined code is conditionally executed depending on the run-time value of the call site's function pointer. !callees metadata is attached to an indirect call by the called value propagation pass and indicates the set of functions the call site might call.

Functionally, the patch versions an indirect call site for each possible callee that can be inlined, and then promotes those call sites to direct calls. If inlining is unsuccessful for any reason, a promoted call site is demoted and made indirect once again. This ensures that no direct call sites remain after inlining that weren't already in the program. Although call site promotion based on !callees metadata could conceivably be performed as a stand-alone pass, performing it in the inliner avoids extra work, since inline cost is used as the determining measure. Moreover, knowing if a callee will be inlined is probably one of the most reasonable static estimates of whether a call site should be promoted.

A dependent revision (D39339) modifies the call graph analyses to consider !callees metadata. It ensures the inliner will visit all of the functions listed in a given !callees metadata node prior to the functions containing call sites annotated by that node. This means the inliner's normal bottom-up traversal of the call graph is valid when promoting and then inlining the subset of possible callees.

Event Timeline

mssimpso created this revision.Nov 9 2017, 1:53 PM
davidxl edited edge metadata.Nov 27 2017, 9:52 AM

A high level comment: I think the right direction is to share the implementation with the existing IndirectCallPromotion.cpp pass. The analyzer can synthesize the MD_prof profile data for the indirect targets, though the actual target count can be synthesized with some heuristics.

mssimpso added a comment.EditedNov 27 2017, 11:04 AM

A high level comment: I think the right direction is to share the implementation with the existing IndirectCallPromotion.cpp pass. The analyzer can synthesize the MD_prof profile data for the indirect targets, though the actual target count can be synthesized with some heuristics.

Thanks very much for the feedback, David. Just to clarify, you mean both the indirect call promotion pass and the inliner patch here should use a common interface (like llvm::promoteIndirectCall, et al.) to perform the actual transformation? This makes sense to me, since there's a lot of overlap. There are a few differences in what I've implemented, though. For example, in this patch, we can do the promotion for a single callee without creating an if-then-else structure. I think sharing the implementation might require refactoring the existing code in the indirect call promotion pass somewhat. Are you OK with me doing that as a first step?

Yes, I think it is good to do the necessary refactoring work first.

Yes, I think it is good to do the necessary refactoring work first.

OK, sounds good. I'll submit the refactoring patch for review, and then update this one afterwards. Thanks!

mssimpso updated this revision to Diff 124955.Nov 30 2017, 9:50 AM

Rebased on top of D40658. We now use a common indirect call promotion interface.

mssimpso updated this revision to Diff 126565.Dec 12 2017, 10:08 AM

Moved demoteCall, previously in D40751, to this patch.

I think 'tryToPromote' method should be moved to the indirect call promotion pass so that the callback to inline cost/benefit is exposed there (a more general cost/benefit model needs to be developed for indirect call promotion). Initially, I think we can limit the use of the getInlineCost callback to cases where profile data is not available, this will achieve what this patch is doing without affecting existing promotion heuristics.

I think 'tryToPromote' method should be moved to the indirect call promotion pass so that the callback to inline cost/benefit is exposed there (a more general cost/benefit model needs to be developed for indirect call promotion). Initially, I think we can limit the use of the getInlineCost callback to cases where profile data is not available, this will achieve what this patch is doing without affecting existing promotion heuristics.

Thanks for taking a look at this again, David. This makes sense to me -- I'll update the patch shortly.

Ping

Wrong Patch - please ignore! Sorry.

mssimpso updated this revision to Diff 129910.Jan 15 2018, 3:09 PM

Addressed David's comments.

  • Made tryToPromote visible to the indirect call promotion pass
  • Added a check in tryToPromote that gives up if a call site has profile metadata to avoid conflicting heuristics.

David,

I hope I've at least addressed the spirit of you comments. First, I've moved tryToPromote to Transforms/Utils/CallPromotionUtils.cpp. IndirectCallPromotion.cpp already includes the necessary header, so the function is now visible there.

Second, regarding the inline cost callback, I made the profitability measure completely generic. So the client needs to provide tryToPromote with a function to determine if promotion is profitable, given an indirect call site and possible direct callee. I think it makes sense to implement it in this generic way if we leave the function in CallPromotionUtils.cpp. When called from within the inliner, the profitability measure is a wrapper around shouldInline that takes care of caching the inline cost to avoid recomputing it, etc. If we want to reuse the code in the indirect call promotion pass, we would need to define provide a similar profitability function.

I was unsure if you intended that I completely scrap all of the inliner changes, and make tryToPromote only available to the indirect call promotion pass. I didn't take your comment to mean this, but I could be wrong. Just in case, here are a few words in defense of the current patch.

In the current approach, where promotion is directed by the inliner, we limit promotion to the callees we know will definitely be inlined. So promotion is really just an intermediate step. We know the direct callee will be eliminated by inlining, so we're actually just inlining code through the indirect call. We also know the profitability measure we're trying to satisfy, inline cost, is accurate (but note that we need D39339 to get this right). Although a useful indicator, I think if we use inline cost outside the inliner, or at least outside a CallGraphSCCPass, it may end up being wrong in some cases. Suppose, for example, we use inline cost in a module pass that promotes an indirect call site to directly call function F. Later, the inliner may decide to inline a lot of code into F, making F a bad candidate for inlining into its caller. So the benefit we thought we were getting by promotion (the code simplification, measured by inline cost, due to inlining F into its caller) turns out to be nothing. This kind of situation is avoided by letting the inliner direct the promotion.

Thanks again!

I actually mean the former.

Regarding your comment "Although a useful indicator, I think if we use inline cost outside the inliner, or at least outside a CallGraphSCCPass, it may end up being wrong in some cases. Suppose, for example, we use inline cost in a module pass that promotes an indirect call site to directly call function F. Later, the inliner may decide to inline a lot of code into F, making F a bad candidate for inlining into its caller. So the benefit we thought we were getting by promotion (the code simplification, measured by inline cost, due to inlining F into its caller) turns out to be nothing. This kind of situation is avoided by letting the inliner direct the promotion",

I understand what you are saying -- due to the bottom up order (limitation) of the current inliner, doing inlining cost analysis at indirect call promootion may not reflect what is actually happening -- the indirect callsite that deem to be beneficial to be inlined at indirect call promotion pass may not be inlinable anymore because it may get too large.

I need to think about this a little.

Hi David,

I'd like to get back to this patch soon. Have you given the previous comments anymore thought? We had started to discuss whether the transformation should be done inside the inliner or the indirect call promotion pass. An advantage of letting the inliner do the promotion is that the inline cost, which is used to determine if an indirect call should be promoted, is known. If promotion is done outside the inliner, the cost would only be an estimate. An advantage of moving the transformation to the indirect call promotion pass may be that it is a better conceptual fit within that pass. On the other hand, it is not a pgo-driven transformation. A third possibility would be a separate pass. Would do you think?

I added Easwaran to get a closer look at the change.

I added Easwaran to get a closer look at the change.

Thanks David.

eraman added inline comments.Mar 20 2018, 6:07 PM
lib/Transforms/IPO/Inliner.cpp
447

CallSite has a getCaller method.

1121

This same code repeats twice. This can be moved to a static function. Actually, most of the logic used is common to legacy and new PM (except for the callgraph updating bit) and hence could be consolidated into a separate function.

1142

There are two other checks (starting at line 1051) that could prevent a callsite from inlining before we call shouldInline. This means it is possible that the promoted call doesn't get inlined.

mssimpso updated this revision to Diff 141406.Apr 6 2018, 12:57 PM
mssimpso marked an inline comment as done.

Addressed Easwaran's comments.

  • Reused more code between the new and legacy pass managers.
  • Ensured that promotion only happens when inlining is successful.

Thanks for your comments, Easwaran. This a fairly large update, primarily meant to address your concern that a promoted call site doesn't end up getting inlined. Ensuring this situation doesn't happen is somewhat difficult because the inlining algorithm itself can abort, even after the inline cost has been computed and inlining found to be profitable. The solution I've implemented here is to perform the promotion before the inline cost is computed (as in the previous version of the patch), but if inlining fails for any reason, the call site is demoted and made indirect again. So there shouldn't be any new direct calls in the program that weren't there before.

mssimpso edited the summary of this revision. (Show Details)Apr 6 2018, 12:59 PM

This is looking good. Main comment I have is about the number of promotions+inlining that can take place is unbounded. One could argue that if there were many small direct calls, we would inline all of them, but still I feel it is better to limit the number of promotions.

lib/Transforms/IPO/Inliner.cpp
723

Is there any limit on the size of the callees metadata? Otherwise you might potentially end up promoting a large number of calls and inline them

842

Nit: The call is not necessarily inlined. More precisely, the call is deleted (either due to inlining or because it is dead). Perhaps just modify the comment above or rename the function.

lib/Transforms/Utils/CallPromotionUtils.cpp
447

You could set the CPI->Versioned = true here instead of passing an additional parameter to versionCallSite.

476

It looks like isLegalToPromote doesn't check if the callee's body is available. If F->isDeclaration() is true, is there any reason to promote? The inliner is going to not inline the callee and ends up unnecessarily promote and demote the callee. This is even more important in the versioned case since it invalidates the analysis.

test/Transforms/Inline/callees-metadata.ll
23

This is relying on how instructions get simplified after inlining. Why not check that there are not call instructions and followed by a ret?

mssimpso updated this revision to Diff 142819.Apr 17 2018, 1:01 PM
mssimpso marked 5 inline comments as done.

Addressed Easwaran's comments. Thanks again!

  • Added a command line option to limit the number of callees that are inlined for a given indirect call site.
  • Changed tests to not run simplification after inlining.
  • Renamed setInlined to setCallSiteRemoved.
  • Addressed the remaining comments for CallPromotionUtils.
lib/Transforms/IPO/Inliner.cpp
723

The CalledValuePropagation pass limits the size of the callees set to 4 functions. But it's probably a good idea to have a limit in the inliner as well. I'll add a command line option for limiting the number of callees we inline for a given call site.

lib/Transforms/Utils/CallPromotionUtils.cpp
476

Nice catch! I think this check was present in a previous version of this patch, but it must have gotten lost in an update. I'll add it back.

eraman accepted this revision.May 10 2018, 4:09 PM

LGTM

lib/Transforms/IPO/Inliner.cpp
764

Typo

This revision is now accepted and ready to land.May 10 2018, 4:09 PM

Thanks very much for the review Easwaran! As this patch still depends on D39339 to get the function iteration order right in the inliner, would you mind taking at look at that patch if you have a chance, or recommending someone for the review? Thanks again.