This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Push freeze through recurrence phi
ClosedPublic

Authored by nikic on Jun 16 2022, 6:07 AM.

Details

Summary

We really want to push freezes through recurrence phis, so that we freeze only the start value, rather than the IV value on every iteration. foldOpIntoPhi() already handles this for the case where the transfer function doesn't produce poison, e.g. %iv.next = add %iv, 1. However, this does not work if nowrap flags are present, e.g. the very common %iv.next = add nuw %iv, 1 case.

This patch adds a fold that pushes freeze instructions to the start value by checking whether all backedge values will be non-poison after poison generating flags have been dropped. This allows pushing freezes out of loops in most cases. I suspect that this also obsoletes the CanonicalizeFreezeInLoops pass, and we can probably drop it.

Fixes https://github.com/llvm/llvm-project/issues/56048.

Diff Detail

Event Timeline

nikic created this revision.Jun 16 2022, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2022, 6:07 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
nikic requested review of this revision.Jun 16 2022, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2022, 6:07 AM
nlopes added inline comments.Jun 16 2022, 6:44 AM
llvm/test/Transforms/InstCombine/freeze.ll
746

This case is annoying because if undef didn't exist*, we could get rid of the freeze here altogether.
The reason is that if %init=poison, then %init.fr becomes any value we want.
Since the add below is nuw, we can pick a convenient value that makes the addition overflow and thus poison.
Thus, if %init is poison, %cond becomes poison, and the loop triggers UB.

  • The argument still applies to full undef values. But it does not for values that are partially undef (e.g., (undef & 1) | (x & ~1)). We've never enabled partial undefs in Alive2 because they are quite expensive and we only had some artificial examples on when they are needed. But this example shows it well.

We can decide to ignore partial undefs here, or at least leave a FIXME for that day when we remove undef from LLVM 🤣

nikic added inline comments.Jun 16 2022, 11:33 AM
llvm/test/Transforms/InstCombine/freeze.ll
746

I think there are two aspects here: The first is the dropped nowrap flags, and the other the freeze on the start value.

If we drop support for undef, then we'll automatically start preserving nowrap flags here (I checked that this happens if we replace isGuaranteedNotToBeUndefOrPoison with isGuaranteedNotToBePoison, which can exploit poison propagation + branch on poison UB).

The argument you make that would allow dropping the freeze entirely is a lot more subtle, and depends on the exact instructions being used. I think it would not work if the increment is a variable, for example (because it could be zero, in which case there might not be a choice of frozen value that results in a branch on poison). I tried to demonstrate this with https://alive2.llvm.org/ce/z/w9eiFn, but alive happily accepts this. I would have thought that init = poison, step = 0 is an obvious counter-example, where tgt causes branch on poison, while src is well-defined.

nlopes accepted this revision.Jun 16 2022, 11:45 AM

LGTM

llvm/test/Transforms/InstCombine/freeze.ll
746

That stretches a bit the capabilities of Alive2 regarding loops. FWIW, an increment of zero gives an infinite loop, so we may be fine.
Anyway, let's leave that discussion for when undef is gone :)

This revision is now accepted and ready to land.Jun 16 2022, 11:45 AM

I suspect that this also obsoletes the CanonicalizeFreezeInLoops pass, and we can probably drop it.

ATM CanonicalizeFreezeInLoops targets patterns that are weakly related but different from this patch:
(1) It deals with freeze(add nsw (phi, 1)). (foldFreezeIntoRecurrence is called only for freeze(phi) in this patch)
(2) Freeze instructions that are not finally fed into the branch condition are hoisted as well - this is to allow successful SCEV analysis for all variables inside a loop
So immediately dropping it might not be possible.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3803

Not sure whether this can happen; when does this condition meet?

nlopes added inline comments.Jun 17 2022, 1:03 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3803

I was also wondering about that, but I guess it may happen with malformed BBs (e.g., with just PHIs but no terminator). Unreachable BBs are allowed to be malformed in LLVM IR (which is a bit unfortunate).

aqjune added inline comments.Jun 17 2022, 1:08 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3803

Oh okay, it makes sense to me.

nikic updated this revision to Diff 437817.Jun 17 2022, 1:29 AM

Handle invoke terminator, remove unnecessary catchswitch handling.

nikic added a comment.Jun 17 2022, 1:35 AM

I suspect that this also obsoletes the CanonicalizeFreezeInLoops pass, and we can probably drop it.

ATM CanonicalizeFreezeInLoops targets patterns that are weakly related but different from this patch:
(1) It deals with freeze(add nsw (phi, 1)). (foldFreezeIntoRecurrence is called only for freeze(phi) in this patch)

InstCombine handles this case as well: pushFreezeToPreventPoisonFromPropagating() will convert freeze(add nsw (phi, 1)) into add(freeze(phi), 1), and from there we will push the freeze out of the loop.

(2) Freeze instructions that are not finally fed into the branch condition are hoisted as well - this is to allow successful SCEV analysis for all variables inside a loop

This code doesn't depend on the branch condition either, so I think it should be fine. I'll run the CanonicalizeFreezeInLoops tests through InstCombine to check whether we're missing any cases.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3803

Unreachable code still requires basic blocks to have terminators. What this was handling is a special case where no instructions can be inserted into a block for the case where an EH pad is also a terminator. This happens with catchswitch block pads, which can only contain phis and catchswitch. However, after trying to construct a test case for this, I don't think this is relevant for this code, because the successor of catchswitch can't be a loop (it would be another EH pad). So I've dropped this.

Instead I've added a check for invoke terminators, which we do need to handle (@fold_phi_invoke_start_value).

This revision was landed with ongoing or failed builds.Jun 17 2022, 6:03 AM
This revision was automatically updated to reflect the committed changes.
nikic added a comment.Jun 17 2022, 6:11 AM

Okay, I just checked the CanonicalizeFreezeInLoops tests, and there is one case that it handles but InstCombine doesn't: https://github.com/llvm/llvm-project/blob/c6b88cb9184fc6c17bb3d9c2c9568646dc46638d/llvm/test/Transforms/CanonicalizeFreezeInLoops/onephi.ll#L361 This is the case where the step value needs to be frozen.

Thank you for confirming it!