This is an archive of the discontinued LLVM Phabricator instance.

[NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop
ClosedPublic

Authored by dmakogon on Oct 12 2021, 12:06 AM.

Details

Summary

When peeling a loop, we assume that the latch has a br terminator and that all loop exits are either terminated with an unreachable or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of NCD(IDom(Exit), Latch). This works now, but if we add logic to support loops with exits that are followed by a block with an unreachable or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken.

For example, let Exit1 and Exit2 are loop exits, and each of them unconditionally branches to the same unreachable terminated block. So neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops.

Previously we stored exits' IDoms in a map before peeling a loop and then, after peeling off one iteration, we changed their IDoms.
Now we use the same logic not only for exits but for all non-loop blocks dominated by the loop.
So when we add logic to support peeling loops with exits which branch, for example, to an unreachable-terminated block, we would update the IDoms not only for exits, but for their successors.

Diff Detail

Event Timeline

dmakogon created this revision.Oct 12 2021, 12:06 AM
dmakogon requested review of this revision.Oct 12 2021, 12:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2021, 12:06 AM
dmakogon edited the summary of this revision. (Show Details)Oct 12 2021, 12:07 AM
dmakogon edited the summary of this revision. (Show Details)

Looks fine, but let's have more eyes on this.

fhahn added inline comments.Oct 12 2021, 1:01 AM
llvm/lib/Transforms/Utils/LoopPeel.cpp
836

Passing DT means that cloneLoopBlocks previously could rely on DT being up to date here. That's not true any more, as it is only updated later. Is that a problem? We also need to check all other uses of plain DT in until the updates are applied.

dmakogon planned changes to this revision.Oct 12 2021, 4:28 AM
mkazantsev added inline comments.Oct 12 2021, 4:34 AM
llvm/lib/Transforms/Utils/LoopPeel.cpp
836

We can completely remove DT from the signatures here and pass it updates for filling as well. Also may be faster.

dmakogon updated this revision to Diff 379082.Oct 12 2021, 9:29 AM

Now we pass a DomTreeUpdater to cloneLoopBlocks() for it to store updates there.

mkazantsev requested changes to this revision.Oct 12 2021, 9:38 PM
mkazantsev added inline comments.
llvm/lib/Transforms/Utils/LoopPeel.cpp
16

Pls remove this, DenseMap is already included on line 13.

563

Pls make it const

573

You already have DTU. You can just call insertEdge/deleteEdge on it without any checks. They will not be applied immediately because you are using lazy update strategy. So no need for DTUpdates at all.

835
for (auto *BB : L->blocks())
855

You can just call insertEdge here. DTU with null tree handles it correctly. The application will be lazy. No need to check hasDomTree.

896

Lazy DTU will only apply updates when it goes out of scope, or when it flushes. simplifyLoop below uses DT and needs it to be up to date. I'd suggest calling DTU.flush() here to apply all pending updates and leave assert as is.

This revision now requires changes to proceed.Oct 12 2021, 9:38 PM
mkazantsev added inline comments.Oct 12 2021, 9:45 PM
llvm/lib/Transforms/Utils/LoopPeel.cpp
573

**applyUpdates, since these two are deprecated. But a separate vector is not needed anyways, you have this lazy vector in DTU.

dmakogon added inline comments.Oct 12 2021, 11:21 PM
llvm/lib/Transforms/Utils/LoopPeel.cpp
896

Calling DTU.getDomTree flushes all updates to DT

dmakogon updated this revision to Diff 379289.Oct 12 2021, 11:29 PM

Removed DTU.hasDomTree() checks and DTUpdates in cloneLoopBlocks. All updates are applied on DTU

dmakogon marked 3 inline comments as done.Oct 12 2021, 11:35 PM
dmakogon added inline comments.
llvm/lib/Transforms/Utils/LoopPeel.cpp
563

How are we going to apply updates to it then? DTU::applyUpdates is a non-const function

836

Now we pass this function a DomTreeUpdater

dmakogon marked an inline comment as done.Oct 12 2021, 11:36 PM
dmakogon marked an inline comment as done.
mkazantsev added inline comments.Oct 13 2021, 3:42 AM
llvm/lib/Transforms/Utils/LoopPeel.cpp
563

LoopBlocksIDoms should be const, not DTU.

dmakogon updated this revision to Diff 379603.Oct 13 2021, 8:41 PM

Made LoopBlocksIDoms const

dmakogon marked 3 inline comments as done.Oct 13 2021, 8:41 PM
mkazantsev requested changes to this revision.Oct 14 2021, 12:40 AM
mkazantsev added inline comments.
llvm/lib/Transforms/Utils/LoopPeel.cpp
835

*BB

896

And you only call it in assert, i.e. in debug mode. It will simply not work in release. Please do it properly. I insist it to be flush + old assert.

This revision now requires changes to proceed.Oct 14 2021, 12:40 AM
dmakogon updated this revision to Diff 379643.Oct 14 2021, 2:54 AM

Moved DTU.getDomTree out of assert, now all further uses of DT (if it's present) will see the updated tree not only in debug mode

dmakogon marked 4 inline comments as done.Oct 14 2021, 2:55 AM
dmakogon updated this revision to Diff 379649.Oct 14 2021, 3:29 AM

fixed small mistake

mkazantsev accepted this revision.Oct 14 2021, 4:09 AM

Fine by me now, thanks! Let's see if someone else has concerns.

This revision is now accepted and ready to land.Oct 14 2021, 4:09 AM
nikic added inline comments.Oct 14 2021, 1:09 PM
llvm/lib/Transforms/Utils/LoopPeel.cpp
601

This code doesn't make sense to me. DT->addNewBlock() directly adds a new edge to the DT. DTU.applyUpdates() applies CFG updates to the DT. You seem to be passing a DT update to a CFG API here. Maybe that happens to work, but this is not how this is supposed to be used.

I'm not sure using DTU in this code is really useful.

mkazantsev added inline comments.Oct 14 2021, 10:24 PM
llvm/lib/Transforms/Utils/LoopPeel.cpp
601

Let me try to explain. We are going to expand the logic of exit detection on D110922. The current implementation relies on fact that we only need to update idoms of loop blocks and their immediate exits to get the right tree. However, this assumption will break once we handle exit blocks that may have successors, meaning idom for those successors may change. Imagine the case:

loop:
  br loop2, exit1

loop2:
  br loop3, exit 2

loop3:
  br loop, exit 3

exit1:
  br post_exit

exit2:
  br post_exit

exit3:
  br post_exit

After all it is done, idom of post_exit will change. Meaning that we should not just change idoms of exits, but also their successors. Now this situation is impossible, but it becomes possible as the scope of the optimization expands. We want to make current implementation independent of the scope (in fact, this patch is just a split-out piece of that one). That's why we need to use insert updates in the caller method.

Now, lazy strategy (at least in theory) should require less tree traversal than eager one. That's why we use lazy DTU in the caller.

Finally, we use DTU in the outer method and we have this cloneLoopBlocks here in that, I agree, we could leave it be with DT. However, the code written as is both reads and writes DT. What is the interaction between this and pending lazy updates, isn't very clear. I guess it just happens to be OK, but since we are to use DTU anyways, why not just use it everywhere for uniformity reasons? Just to avoid the situation that part of updates are applied, part is pending, and we also get idom from this half-updated tree. It should *happen* to work, but it doesn't really look nearly safe.

So the prime motivation for this was "ok, we are going to use lazy DTU anyways, let's be consistent and get all updates through it". It means that cloneLoopBlocks should be deprived of direct DT access, to avoid temptation to read something from it, knowing that not all updates are applied. This is why no DT is here, and there is DTU in its stead. And now, DTU simply doesn't have addNewBlock.

Does it all make more sense now?

nikic added inline comments.Oct 18 2021, 1:09 AM
llvm/lib/Transforms/Utils/LoopPeel.cpp
601

I understand the motivation for this change -- my problem is that the code misuses the DomTreeUpdater API. There is (generally) no CFG edge between VMap[IDom] and NewBB, but you are pretending there is. Maybe this happens to work because of implementation details, but it's not the correct way to use the API.

You'd have to actually insert all new control flow edges for the cloned blocks, not just the idom edges in the DT. It's one of the cases where directly updating the DT is simpler than going through DTU.

FYI the corresponding code for unrolling is https://github.com/llvm/llvm-project/blob/3f0b178de21ee82791a6ebe198314f14c0287a44/llvm/lib/Transforms/Utils/LoopUnroll.cpp#L643-L658 -- probably the same code would work for peeling as well.

mkazantsev added inline comments.Oct 18 2021, 3:11 AM
llvm/lib/Transforms/Utils/LoopPeel.cpp
601

I got your point now. Reverting the patch.

mkazantsev reopened this revision.Oct 18 2021, 4:14 AM
This revision is now accepted and ready to land.Oct 18 2021, 4:14 AM
mkazantsev requested changes to this revision.Oct 18 2021, 8:16 AM
mkazantsev added inline comments.
llvm/lib/Transforms/Utils/LoopPeel.cpp
601

I've re-read this code, and now I got why I had this confusion. The deprecated methods insertEdge and deleteEdge did check that the corresponding edge does [not] exist in CFG by the moment of this update. I was certain that lazy mass update also preserves this behavior, but it doesn't. This is highly confusing. Not sure how it happens to pass the tests, but it does.

Dima will come up with alternative approach.

This revision now requires changes to proceed.Oct 18 2021, 8:16 AM
dmakogon updated this revision to Diff 381218.Oct 21 2021, 5:13 AM
dmakogon retitled this revision from [NFC] [LoopPeel] Change the way DT is updated for loop exits to [NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop.
dmakogon edited the summary of this revision. (Show Details)

As pointed out by @nikic the proposed solution which used DomTreeUpdater was incorrect.
Now we remember IDoms of non-loop blocks which are dominated by some loop block.
Their IDom would be the nearest common dominator of their loop dominator and the latch.
The same logic was already used in peelLoop function, but only for the loop exits.
With this patch we also cover other blocks that are dominated by the loop.

The new logic is copied from the LoopUnroll pass. It's probably better to create an utility function for this, but need to check if it would work in the loop unroll code.

dmakogon marked 5 inline comments as done.Oct 21 2021, 5:13 AM

Looks fine for me, but let's give other reviewers time to raise their concerns. And please run some extensive fuzzer testing for it.

nikic accepted this revision.Oct 25 2021, 2:34 PM

LGTM

Looks fine for me, but let's give other reviewers time to raise their concerns. And please run some extensive fuzzer testing for it.

The testing finished successfully.

mkazantsev accepted this revision.Oct 25 2021, 11:05 PM
This revision is now accepted and ready to land.Oct 25 2021, 11:05 PM