This is an archive of the discontinued LLVM Phabricator instance.

Fix a bug in the inliner that causes subsequent double inlining
ClosedPublic

Authored by hoyFB on Mar 16 2020, 12:39 PM.

Details

Summary

A recent change in the instruction simplifier enables a call to a function that just returns one of its parameter to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining.
To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges.

void top()
{
    int t = first();
    second(t);
}

void second(int t)
{
   t = third(t);
   fourth(t);
}

void third(int t)
{
   return t;
}

The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up. We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e, which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of ininlined callsites is a bit different, but in theory the issue could happen there too.

Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification.

Diff Detail

Event Timeline

hoyFB created this revision.Mar 16 2020, 12:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2020, 12:39 PM
hoyFB edited the summary of this revision. (Show Details)Mar 16 2020, 12:40 PM
hoyFB added a reviewer: wenlei.
hoyFB edited the summary of this revision. (Show Details)Mar 16 2020, 12:43 PM

Thanks for fixing this! I forgot to ask earlier, but this could happen with inlining under new PM as well, right? With new PM, we collect calls within SCC, then iterate over them for inlining, so for the same reason we could have dangling call in VMap from cloning?

hoyFB added a comment.Mar 19 2020, 9:30 AM

Thanks for fixing this! I forgot to ask earlier, but this could happen with inlining under new PM as well, right? With new PM, we collect calls within SCC, then iterate over them for inlining, so for the same reason we could have dangling call in VMap from cloning?

Yes, it could, though that didn't happen in our case. I think it's due to different orders calls are processed in the legacy and new inliner.

davidxl added inline comments.Mar 19 2020, 9:50 AM
llvm/lib/Transforms/Utils/CloneFunction.cpp
361 ↗(On Diff #250616)

Is the first check needed (!dyn_cast<CallBase>(NewInst)) ?

hoyFB marked an inline comment as done.Mar 19 2020, 10:24 AM
hoyFB added inline comments.
llvm/lib/Transforms/Utils/CloneFunction.cpp
361 ↗(On Diff #250616)

It's needed when the inlined instruction is not a call but promoted to a call during the simplification. For example, some code on arm32/arm64 target could be promoted to a call or an intrinsic call. In this case we'd like to keep this simplification since the it will not cause double inlining given that the original instruction is not a call.

davidxl added inline comments.Mar 19 2020, 11:41 AM
llvm/lib/Transforms/Utils/CloneFunction.cpp
361 ↗(On Diff #250616)

Ok. Perhaps add a comment in the code?

hoyFB marked 2 inline comments as done.Mar 19 2020, 12:22 PM
hoyFB added inline comments.
llvm/lib/Transforms/Utils/CloneFunction.cpp
361 ↗(On Diff #250616)

Sounds good.

hoyFB updated this revision to Diff 251446.Mar 19 2020, 12:32 PM
hoyFB marked an inline comment as done.

UUpdating D76248: Fix a bug in the legacy inliner that causes subseqeunt double inlining.

wenlei accepted this revision.Mar 19 2020, 12:35 PM

Thanks for fixing this! I forgot to ask earlier, but this could happen with inlining under new PM as well, right? With new PM, we collect calls within SCC, then iterate over them for inlining, so for the same reason we could have dangling call in VMap from cloning?

Yes, it could, though that didn't happen in our case. I think it's due to different orders calls are processed in the legacy and new inliner.

Ok, then change the title/description to reflect that (not legacy PM specific)? otherwise LGTM.

This revision is now accepted and ready to land.Mar 19 2020, 12:35 PM
hoyFB retitled this revision from Fix a bug in the legacy inliner that causes subseqeunt double inlining. to Fix a bug in the inliner that causes subsequent double inlining.Mar 19 2020, 12:37 PM
hoyFB edited the summary of this revision. (Show Details)EditedMar 19 2020, 12:45 PM

Thanks for fixing this! I forgot to ask earlier, but this could happen with inlining under new PM as well, right? With new PM, we collect calls within SCC, then iterate over them for inlining, so for the same reason we could have dangling call in VMap from cloning?

Yes, it could, though that didn't happen in our case. I think it's due to different orders calls are processed in the legacy and new inliner.

Ok, then change the title/description to reflect that (not legacy PM specific)? otherwise LGTM.

Sure, removed the word 'legacy' from the title and added a new comment to the description.

davidxl added inline comments.Mar 20 2020, 5:23 PM
llvm/lib/Transforms/Utils/CloneFunction.cpp
361 ↗(On Diff #250616)

It should be ok to do the simplification if V has no side effects .

hoyFB marked an inline comment as done.Mar 20 2020, 6:10 PM
hoyFB added inline comments.
llvm/lib/Transforms/Utils/CloneFunction.cpp
361 ↗(On Diff #250616)

A pure function call could have no side effect and may still cause double inlining. As long as V is a call, it could be shared by NewInst and another call site. Note that an irregular call graph processing order may not guarantee a pure function call like V be fully inlined before processing NewInst.

ok -- I misunderstood (it to be a runtime problem) -- it is actually a stale reference related issue. In the failing case, is the inlining order the following?

top->second
top->first (original site)
top->first (from the clone of 'second')?

Even without ICE, this is still wrong unless first has no side effect.

ok -- I misunderstood (it to be a runtime problem) -- it is actually a stale reference related issue. In the failing case, is the inlining order the following?

top->second
top->first (original site)
top->first (from the clone of 'second')?

Even without ICE, this is still wrong unless first has no side effect.

Sorry, the function names in my test case are confusing. I should have named them more clearly. The actual inline order, in short, is
top->second
top->third
top->run (from third)
top->gen (from third)
top->gen (exactly the same callsite above, from the simplification of wrapper that comes with run. This is double inlining! The fix is to keep the wrapper call instead of simplifying to gen).

The order seems fine to me, except that wrapper is not inlined into run before run is cloned into top. This causes the callsite of wrapper in top to be considered as a further inline candidate. Normally a bottom-up inlining order ensures run is fully processed before its caller processed. However, with indirect calls the call graph is not precisely built, thus the bottom-up order is not fully fulfilled. We carefully elaborated the test case to expose the issue.

Thanks for the explanation.

The inliner does bottom up inlining -- after inlining each site for a node, a worklist based TopDown Inlining is attempted (in breadth first order) -- i.e., the new callsites from the the cloned inline instance will be pushed to the queue. The candidate callsites must be originated from the callee, not a reference to a callsite in the caller due to simplification. I think the better fix is at the queue update :

if (!InlineInfo.InlinedCalls.empty()) {
   // Create a new inline history entry for this, so that we remember
   // that these new callsites came about due to inlining Callee.
   int NewHistoryID = InlineHistory.size();
   InlineHistory.push_back(std::make_pair(Callee, InlineHistoryID));

   for (Value *Ptr : InlineInfo.InlinedCalls)
     CallSites.push_back(std::make_pair(CallSite(Ptr), NewHistoryID));
 }

In otherwords, if the callsite is already in the queue, do not attempt to insert it into it.

hoyFB added a comment.Mar 21 2020, 2:23 PM

Thanks for the explanation.

The inliner does bottom up inlining -- after inlining each site for a node, a worklist based TopDown Inlining is attempted (in breadth first order) -- i.e., the new callsites from the the cloned inline instance will be pushed to the queue. The candidate callsites must be originated from the callee, not a reference to a callsite in the caller due to simplification. I think the better fix is at the queue update :

if (!InlineInfo.InlinedCalls.empty()) {
   // Create a new inline history entry for this, so that we remember
   // that these new callsites came about due to inlining Callee.

   int NewHistoryID = InlineHistory.size();
   InlineHistory.push_back(std::make_pair(Callee, InlineHistoryID));

   for (Value *Ptr : InlineInfo.InlinedCalls)
     CallSites.push_back(std::make_pair(CallSite(Ptr), NewHistoryID));
 }

In otherwords, if the callsite is already in the queue, do not attempt to insert it into it.

Thanks for the pointers! Yes, I did try your fix first before making the current fix. Not queueing the same callisite twice fixes the doubling inlining, but another issue that the callgraph edge corresponding to the inlined wrapper call in top still is based on the callsite top->gen. When later on top->gen is inlined and removed, the wrapper call edge would AV. I think the culprit is that the top->gen callsite is shared by two call edges top->gen and top->wrapper.

nikic added a subscriber: nikic.Mar 21 2020, 2:23 PM

FTR the simplification in question (D75815) has since been reverted and added to InstCombine instead. See also there for a smaller reproducer. Is this change still needed now?

hoyFB added a comment.Mar 21 2020, 2:52 PM

FTR the simplification in question (D75815) has since been reverted and added to InstCombine instead. See also there for a smaller reproducer. Is this change still needed now?

That's a good point. Thanks for the information. Since D75815 has been reverted, the problem in discussion here won't be reproducible. But it's still not guaranteed that a physical callsite can not be reused by two call edges in the future. Maybe turning this fix into an assert? Please let me know your thoughts. Thanks!

Can probably turn this into a debug assert -- i.e, make sure that a callsite is not pushed into the queue more than once.

hoyFB updated this revision to Diff 253176.Mar 27 2020, 11:21 AM
hoyFB edited the summary of this revision. (Show Details)

Updating D76248: Fix a bug in the inliner that causes subsequent double inlining

Can probably turn this into a debug assert -- i.e, make sure that a callsite is not pushed into the queue more than once.

Done, also removed the fix in function cloning.

davidxl added inline comments.Mar 27 2020, 12:39 PM
llvm/lib/Transforms/IPO/Inliner.cpp
721

This has quadratic behavior and can cause excessive compile time increase for some cases with assertion on. Use a DenseSet to avoid that.

hoyFB marked an inline comment as done.Mar 27 2020, 1:12 PM
hoyFB added inline comments.
llvm/lib/Transforms/IPO/Inliner.cpp
721

Good point! I was not aware of debug build throughput could be an issue.

hoyFB updated this revision to Diff 253208.Mar 27 2020, 1:34 PM

Updating D76248: Fix a bug in the inliner that causes subsequent double inlining

davidxl accepted this revision.Mar 27 2020, 1:58 PM

The decl of debugCallSites can be moved even further ahead with incremental update, but probably won't make a big difference in practice,

hoyFB added a comment.Mar 27 2020, 2:22 PM

The decl of debugCallSites can be moved even further ahead with incremental update, but probably won't make a big difference in practice,

Yeah, I was thinking about a container parallel with and synchronized to CallSites. But that would make debug code scattered and not easy to track.

This revision was automatically updated to reflect the committed changes.