This is an archive of the discontinued LLVM Phabricator instance.

[Compile Time] Make it possible to skip redundant debuginst removal
AbandonedPublic

Authored by ctetreau on Oct 20 2020, 12:55 PM.

Details

Summary

MergeBlockIntoPredecessor unconditionally calls RemoveRedundantDbgInstrs,
which iterates twice over the instructions of the basic block.
MergeBlockIntoPredecessor is itself called in a hot inner loop,
and as a result, RemoveRedundantDbgInstrs was taking 98% of
the runtime in some cases, and causing 10 minute compile times in some
downstream workloads.

Make the call to RemoveRedundantDbgInstrs optional in
MergeBlockIntoPredecessor, and call RemoveRedundantDbgInstrs after the
loop is unrolled in LoopUnrollPass. This reduces the runtime usage of
RemoveRedundantDbgInstrs to 3%. There are places it can be moved where
the usage will be lower, but moving it into a function called
"simplifyLoopAfterUnroll" seemed a good fit.

See: Bug 47746

Diff Detail

Event Timeline

ctetreau created this revision.Oct 20 2020, 12:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2020, 12:55 PM
ctetreau requested review of this revision.Oct 20 2020, 12:55 PM

This patch is basically the same thing we ended up doing in our downstream to resolve the issue. We're seeing it in LoopUnroll, so that's where we are skipping it.

If this problem crops up in other places, we could do something similar there. And if all the callsites have a mitigation, then we can safely remove RemoveRedundantDbgInstrs from MergeBlockIntoPredecessor

fhahn added a reviewer: vsk.Oct 20 2020, 1:01 PM

Is this an NFC patch? If not, can you please add a test case?

llvm/include/llvm/Transforms/Utils/BasicBlockUtils.h
93–97

please document under what circumstances one would pass true to the last two parameters

llvm/lib/Transforms/Utils/LoopUnroll.cpp
867

please add a comment that explains why passing true is correct/needed/right here.

Is this an NFC patch? If not, can you please add a test case?

This patch is NFC. Before this patch, for each merge in the loop, we remove redundant instructions. After this patch, for each merge in the loop, we defer, then for each basic block in the loop, we remove redundant instructions. Either way, all the redundant debug instructions are removed.

ctetreau marked 2 inline comments as done.Oct 20 2020, 2:04 PM
ctetreau updated this revision to Diff 299466.Oct 20 2020, 2:05 PM

address code review issues

vsk added a comment.Oct 20 2020, 2:45 PM

IIRC the original motivation for calling RemoveRedundantDbgInsts in MergeBlockIntoPredecessor was to improve compile time. It's been reported that the change inadvertently regressed compile time within SimplifyCFG as well as LoopUnroll. Imo, instead of making all the clients of MergeBlockIntoPredecessor opt-out of something that could be unexpectedly expensive, it would be better if they could opt-in instead (this is the approach taken in D88928). That still leaves open the possibility of making compile-time improvements in a targeted way, based on profiles.

I think your patch is reasonable. Mine is more conservative, yours more aggressive. Both solve the problem. If this patch is merged, yours should be abandoned.

I would like to see this issue addressed sooner than later though. This is causing actual problems in our downstream.

bjope added a comment.Oct 21 2020, 3:36 AM

As I see it this solution focuses on the problem seen in LoopUnroll. So I agree, it is conservative and simple so if it solves the PR I think it is just fine.

Comparing with the D88928 patch (in its current state), that patch removes the RemoveRedundantDbgInstrs call from MergeBlockIntoPredecessor, but only replaces that with extra calls to RemoveRedundantDbgInstrs in a few places where MergeBlockIntoPredecessor is used (so afaict that patch currently removes a lot of DbgInstr cleanups?). IMO this patch makes it more of an active choice to decide if the removal can be deferred or not. With D88928 it is easier to forget about such cleanups.

An alternative could be to mix the solutions a bit. We could have a version of MergeBlockIntoPredecessor that is doing the cleanup automatically, and one that skips the cleanup but instead returns the predecessor BB when a merge was made. The latter could be used in usecases such as the one in CodeGenPrepare, when we want to collect all the pred blocks in a set and defer calling of RemoveRedundantDbgInstrs for each of those blocks until after the inner loop.

vsk added a comment.EditedOct 21 2020, 12:51 PM

Comparing with the D88928 patch (in its current state), that patch removes the RemoveRedundantDbgInstrs call from MergeBlockIntoPredecessor, but only replaces that with extra calls to RemoveRedundantDbgInstrs in a few places where MergeBlockIntoPredecessor is used (so afaict that patch currently removes a lot of DbgInstr cleanups?). IMO this patch makes it more of an active choice to decide if the removal can be deferred or not. With D88928 it is easier to forget about such cleanups.

I did some digging (documented in D88928) about why we bother calling RemoveRedundantDbgInstrs in the first place, and couldn't find much besides a reference to llvm.org/PR35113. If it's a correctness concern, then we should make the dbg inst cleanup behavior opt-out (that's what this patch does). If it's a compile-time concern, we should make the cleanup behavior opt-in, because doing the cleanup increases the worst-case complexity of MergeBlockIntoPredecessor.

An alternative could be to mix the solutions a bit. We could have a version of MergeBlockIntoPredecessor that is doing the cleanup automatically, and one that skips the cleanup but instead returns the predecessor BB when a merge was made. The latter could be used in usecases such as the one in CodeGenPrepare, when we want to collect all the pred blocks in a set and defer calling of RemoveRedundantDbgInstrs for each of those blocks until after the inner loop.

Assuming that RemoveRedundantDbgInstrs is a compile-time fix: that sounds reasonable, but also more complicated. We might as well just have one version of the utility, and call RemoveRedundantDbgInstrs in cases where it demonstrably improves compile-time.

vsk added a comment.Oct 26 2020, 3:46 PM

Friendly ping. On the assumption that the debug inst cleanup in MergeBlockIntoPredecessor was intended to be a compile-time optimization, I propose making it opt-in as the performance impact can be unexpected. I.e., I suggest that we land D88928 soon-ish (tomorrow barring any concerns/objections).

ctetreau abandoned this revision.Oct 27 2020, 11:55 AM

The other patch was merged.