This is an archive of the discontinued LLVM Phabricator instance.

[instcombine] Propagate freeze back to def
AbandonedPublic

Authored by mtrofin on May 9 2022, 11:09 AM.

Details

Summary

We noticed a regression in
llvm-test-suite/MicroBenchmarks/ImageProcessing/Interpolation that,
through bisection, appeared to be caused by D124252. Upon investigation,
the root cause appears to be the way D105392 propagates upwards freezes.

D124252 would freeze the cmp result, and loop unswitching would
otherwise happen identically. We'd then do instcombine. At this point,
we'd be presented with a few redundant definitions - see the %idx1 and
%idx2 in the added test. They'd have uses in the then and else
sides of the unswitched loop that, before D124252, would be recognized
as equivalent and optimized out (the test glosses over that and just
uses the mock_use function to make sure the values have a use).

D124252's introduction of the freeze kicked in the behavior introduced
by D105392. The problem is that the freeze would be always propagated up
right before the current freeze - which in our case would leave one of
the equivalent idx before the freeze and one after. The two would be
now different, and we lose the optimization opportunity. Here's what the
IR looked like post-instcombine, for the newly added test:

define i1 @func(i32 %0, i32 noundef %1) {
  %dr = lshr i32 %0, 2
  %idx1 = zext i32 %dr to i64
  %dr.fr = freeze i32 %dr
  %add = add i32 %dr.fr, 1
  %cmp = icmp slt i32 %add, %1
  %idx2 = zext i32 %dr.fr to i64
  %v = call i1 @use(i64 %idx1, i64 %idx2)
  %ret = and i1 %v, %cmp
  ret i1 %ret
}

Note how %idx1 and %idx2 are using, one the pre-freeze def of %dr
and one the post-freeze (i.e. %dr.fr).

This patch moves the freeze right after definition.

Diff Detail

Event Timeline

mtrofin created this revision.May 9 2022, 11:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 11:09 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
mtrofin requested review of this revision.May 9 2022, 11:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 11:09 AM
mtrofin added inline comments.May 9 2022, 11:11 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3813

I *think* this is just the case when MaybePoisonOperand is a function parameter, so we should insert in the entry block - but want to first make sure the idea of the patch isn't missing something.

nikic requested changes to this revision.May 9 2022, 11:45 AM

I don't think this is the right place to do this. You should either:

  1. Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.
  1. Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I'm not sure which one of these is the right approach, and I didn't really understand what optimization is supposed to be achieved in your example -- you might want to construct it in a way that something actually folds away as a result.

This revision now requires changes to proceed.May 9 2022, 11:45 AM

I don't think this is the right place to do this. You should either:

  1. Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.
  1. Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I think in either case the freeze %s def needs to dominate its uses.

I'm not sure which one of these is the right approach, and I didn't really understand what optimization is supposed to be achieved in your example -- you might want to construct it in a way that something actually folds away as a result.

Sorry, the full picture is more involved, and I simplified it quite a bit, on the assumption that moving the freeze to the def is what we'd want.
In this example, we would want the uses of %idx1 and %idx2, which aren't shown here, be optimized by simplifycfg. I can rework the example, but wanted to first check if there are any issues with moving the freeze up to dominate all the uses of the value it freezes (basically I couldn't figure out why the original change chose to move the freeze one up rather than all the way to the original def. Maybe @hyeongyukim remembers?)

nikic added a comment.May 9 2022, 2:32 PM

I don't think this is the right place to do this. You should either:

  1. Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.
  1. Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I think in either case the freeze %s def needs to dominate its uses.

Sure -- to be clear, the suggestion for the second approach is to move the freeze up so that it does dominate all uses and allows all of them to be replaced. This is basically what you're doing here, with the crucial difference that it happens for all freeze instructions, not just the ones produced by this particular transform.

There is a tradeoff here between leaving some uses to not use freeze (and possibly allowing additional optimizations for them, as freeze is usually an optimization-blocker) and allowing folds based on value-identity (which benefit from all uses being freeze(%x), not %x).

With that in mind, I would probably lean toward dropping the one-use check on this fold as a starting point -- it's more conservative, but should still cover your motivating case.

I don't think this is the right place to do this. You should either:

  1. Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.
  1. Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I think in either case the freeze %s def needs to dominate its uses.

Sure -- to be clear, the suggestion for the second approach is to move the freeze up so that it does dominate all uses and allows all of them to be replaced. This is basically what you're doing here, with the crucial difference that it happens for all freeze instructions, not just the ones produced by this particular transform.

There is a tradeoff here between leaving some uses to not use freeze (and possibly allowing additional optimizations for them, as freeze is usually an optimization-blocker) and allowing folds based on value-identity (which benefit from all uses being freeze(%x), not %x).

With that in mind, I would probably lean toward dropping the one-use check on this fold as a starting point -- it's more conservative, but should still cover your motivating case.

Ah, yes, dropping the one-use check makes sense. Waiting to see if the others had any concerns with that, otherwise sgtm.

Here's also the shrunk down example showing the larger picture (it's as much shrinking I could do off binilearKernel in that benchmark):
https://gist.github.com/mtrofin/19ccc1c0fc2da8689422837ad442896c

(Just comment in/out the 2 options)

basically instcombine moves the freeze from cmp.fr up to div, but stops there. then gvn can't unify idx1 and 2 and then we end up with a select when we do simplifycfg. In the larger picture that's a bit more than just a select, the whole if.else block survives (with the subset of the instructions that couldn't be folded) and that redundant computation then causes the regression.

nikic added a comment.May 10 2022, 7:45 AM

I don't think this is the right place to do this. You should either:

  1. Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.
  1. Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I think in either case the freeze %s def needs to dominate its uses.

Sure -- to be clear, the suggestion for the second approach is to move the freeze up so that it does dominate all uses and allows all of them to be replaced. This is basically what you're doing here, with the crucial difference that it happens for all freeze instructions, not just the ones produced by this particular transform.

There is a tradeoff here between leaving some uses to not use freeze (and possibly allowing additional optimizations for them, as freeze is usually an optimization-blocker) and allowing folds based on value-identity (which benefit from all uses being freeze(%x), not %x).

With that in mind, I would probably lean toward dropping the one-use check on this fold as a starting point -- it's more conservative, but should still cover your motivating case.

Ah, yes, dropping the one-use check makes sense. Waiting to see if the others had any concerns with that, otherwise sgtm.

Here's also the shrunk down example showing the larger picture (it's as much shrinking I could do off binilearKernel in that benchmark):
https://gist.github.com/mtrofin/19ccc1c0fc2da8689422837ad442896c

(Just comment in/out the 2 options)

basically instcombine moves the freeze from cmp.fr up to div, but stops there. then gvn can't unify idx1 and 2 and then we end up with a select when we do simplifycfg. In the larger picture that's a bit more than just a select, the whole if.else block survives (with the subset of the instructions that couldn't be folded) and that redundant computation then causes the regression.

Thanks, I understand your case now. I also ran into another related one while working on CHR:

define i1 @test(i32 %x) {
  %and1 = and i32 %x, 4
  %cmp1 = icmp ne i32 %and1, 0
  %x.fr = freeze i32 %x
  %and2 = and i32 %x.fr, 11
  %cmp2 = icmp eq i32 %and2, 11
  %and = and i1 %cmp1, %cmp2
  ret i1 %and
}

Dropping the one-use limitation from the "push freeze upwards" fold would not help this case, as the freeze cannot be pushed any further. To cover this case, we do need to extend the freezeDominatedUses() transform.

I've put up https://reviews.llvm.org/D125321, which extends freezeDominatedUses() to move the freeze such that it dominates all uses (if possible).

mtrofin abandoned this revision.May 10 2022, 10:33 AM

I've put up https://reviews.llvm.org/D125321, which extends freezeDominatedUses() to move the freeze such that it dominates all uses (if possible).

Yup, thanks!. I'm not super sure why we'd want to insert the new freeze first (potentially) in the middle of the original's uses, then hoist it to dominate all uses (rather than place it "at the right place" upfront), but likely I'm missing some design nuance of this pass. I'm not hung up on this - asking to learn.

I've put up https://reviews.llvm.org/D125321, which extends freezeDominatedUses() to move the freeze such that it dominates all uses (if possible).

Yup, thanks!. I'm not super sure why we'd want to insert the new freeze first (potentially) in the middle of the original's uses, then hoist it to dominate all uses (rather than place it "at the right place" upfront), but likely I'm missing some design nuance of this pass. I'm not hung up on this - asking to learn.

Hoisting the freeze insertion point in pushFreezeToPreventPoisonFromPropagating() fixes the problem for freeze instructions introduced by that transform -- but it does not fix the problem for freeze instructions introduced by other (likely non-InstCombine) transforms. For that reason this should be handled by two separate folds that can apply independently of each other (though they would usually work in tandem).

I've put up https://reviews.llvm.org/D125321, which extends freezeDominatedUses() to move the freeze such that it dominates all uses (if possible).

Yup, thanks!. I'm not super sure why we'd want to insert the new freeze first (potentially) in the middle of the original's uses, then hoist it to dominate all uses (rather than place it "at the right place" upfront), but likely I'm missing some design nuance of this pass. I'm not hung up on this - asking to learn.

Hoisting the freeze insertion point in pushFreezeToPreventPoisonFromPropagating() fixes the problem for freeze instructions introduced by that transform -- but it does not fix the problem for freeze instructions introduced by other (likely non-InstCombine) transforms. For that reason this should be handled by two separate folds that can apply independently of each other (though they would usually work in tandem).

Thanks!