This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
1/1
InstructionCombining.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
freeze.ll

Differential D125248

[instcombine] Propagate freeze back to def
AbandonedPublic

Authored by mtrofin on May 9 2022, 11:09 AM.

Download Raw Diff

Details

Reviewers

hyeongyukim
nikic
lebedev.ri

Summary

We noticed a regression in
llvm-test-suite/MicroBenchmarks/ImageProcessing/Interpolation that,
through bisection, appeared to be caused by D124252. Upon investigation,
the root cause appears to be the way D105392 propagates upwards freezes.

D124252 would freeze the cmp result, and loop unswitching would
otherwise happen identically. We'd then do instcombine. At this point,
we'd be presented with a few redundant definitions - see the %idx1 and
%idx2 in the added test. They'd have uses in the then and else
sides of the unswitched loop that, before D124252, would be recognized
as equivalent and optimized out (the test glosses over that and just
uses the mock_use function to make sure the values have a use).

D124252's introduction of the freeze kicked in the behavior introduced
by D105392. The problem is that the freeze would be always propagated up
right before the current freeze - which in our case would leave one of
the equivalent idx before the freeze and one after. The two would be
now different, and we lose the optimization opportunity. Here's what the
IR looked like post-instcombine, for the newly added test:

define i1 @func(i32 %0, i32 noundef %1) {
  %dr = lshr i32 %0, 2
  %idx1 = zext i32 %dr to i64
  %dr.fr = freeze i32 %dr
  %add = add i32 %dr.fr, 1
  %cmp = icmp slt i32 %add, %1
  %idx2 = zext i32 %dr.fr to i64
  %v = call i1 @use(i64 %idx1, i64 %idx2)
  %ret = and i1 %v, %cmp
  ret i1 %ret
}

Note how %idx1 and %idx2 are using, one the pre-freeze def of %dr
and one the post-freeze (i.e. %dr.fr).

This patch moves the freeze right after definition.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	60,100 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,060 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	60,030 ms	x64 debian > MLIR.Examples/standalone::test.toy
	60,040 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,090 ms	x64 debian > libFuzzer.libFuzzer::large.test
		View Full Test Results (8 Failed)

Event Timeline

mtrofin created this revision.May 9 2022, 11:09 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 11:09 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

mtrofin requested review of this revision.May 9 2022, 11:09 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 11:09 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

mtrofin added inline comments.May 9 2022, 11:11 AM

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3813	I think this is just the case when `MaybePoisonOperand` is a function parameter, so we should insert in the entry block - but want to first make sure the idea of the patch isn't missing something.

I don't think this is the right place to do this. You should either:

Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.

Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I'm not sure which one of these is the right approach, and I didn't really understand what optimization is supposed to be achieved in your example -- you might want to construct it in a way that something actually folds away as a result.

This revision now requires changes to proceed.May 9 2022, 11:45 AM

Harbormaster completed remote builds in B163523: Diff 428134.May 9 2022, 1:14 PM

In D125248#3501446, @nikic wrote:

I don't think this is the right place to do this. You should either:

Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.

Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I think in either case the freeze %s def needs to dominate its uses.

I'm not sure which one of these is the right approach, and I didn't really understand what optimization is supposed to be achieved in your example -- you might want to construct it in a way that something actually folds away as a result.

Sorry, the full picture is more involved, and I simplified it quite a bit, on the assumption that moving the freeze to the def is what we'd want.
In this example, we would want the uses of %idx1 and %idx2, which aren't shown here, be optimized by simplifycfg. I can rework the example, but wanted to first check if there are any issues with moving the freeze up to dominate all the uses of the value it freezes (basically I couldn't figure out why the original change chose to move the freeze one up rather than all the way to the original def. Maybe @hyeongyukim remembers?)

In D125248#3501917, @mtrofin wrote:

In D125248#3501446, @nikic wrote:

I don't think this is the right place to do this. You should either:

Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.

Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I think in either case the freeze %s def needs to dominate its uses.

Sure -- to be clear, the suggestion for the second approach is to move the freeze up so that it does dominate all uses and allows all of them to be replaced. This is basically what you're doing here, with the crucial difference that it happens for all freeze instructions, not just the ones produced by this particular transform.

There is a tradeoff here between leaving some uses to not use freeze (and possibly allowing additional optimizations for them, as freeze is usually an optimization-blocker) and allowing folds based on value-identity (which benefit from all uses being freeze(%x), not %x).

With that in mind, I would probably lean toward dropping the one-use check on this fold as a starting point -- it's more conservative, but should still cover your motivating case.

In D125248#3501979, @nikic wrote:

In D125248#3501917, @mtrofin wrote:

In D125248#3501446, @nikic wrote:

I don't think this is the right place to do this. You should either:

Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.

Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I think in either case the freeze %s def needs to dominate its uses.

Sure -- to be clear, the suggestion for the second approach is to move the freeze up so that it does dominate all uses and allows all of them to be replaced. This is basically what you're doing here, with the crucial difference that it happens for all freeze instructions, not just the ones produced by this particular transform.

There is a tradeoff here between leaving some uses to not use freeze (and possibly allowing additional optimizations for them, as freeze is usually an optimization-blocker) and allowing folds based on value-identity (which benefit from all uses being freeze(%x), not %x).

With that in mind, I would probably lean toward dropping the one-use check on this fold as a starting point -- it's more conservative, but should still cover your motivating case.

Ah, yes, dropping the one-use check makes sense. Waiting to see if the others had any concerns with that, otherwise sgtm.

Here's also the shrunk down example showing the larger picture (it's as much shrinking I could do off binilearKernel in that benchmark):
https://gist.github.com/mtrofin/19ccc1c0fc2da8689422837ad442896c

(Just comment in/out the 2 options)

basically instcombine moves the freeze from cmp.fr up to div, but stops there. then gvn can't unify idx1 and 2 and then we end up with a select when we do simplifycfg. In the larger picture that's a bit more than just a select, the whole if.else block survives (with the subset of the instructions that couldn't be folded) and that redundant computation then causes the regression.

In D125248#3502172, @mtrofin wrote:

In D125248#3501979, @nikic wrote:

In D125248#3501917, @mtrofin wrote:

In D125248#3501446, @nikic wrote:

I don't think this is the right place to do this. You should either:

Remove the one-use restriction on this fold, and replace other uses with the frozen variant as well. This is on the premise that pushing freeze higher is more worthwhile than having some un-frozen uses.

Adjust the freezeDominatedUses() fold to freeze not only dominated uses. Basically canonicalize %x to freeze %x. This is on the premise that allowing more exploitation of value identity is more worthwhile than having un-frozen uses.

I think in either case the freeze %s def needs to dominate its uses.

Sure -- to be clear, the suggestion for the second approach is to move the freeze up so that it does dominate all uses and allows all of them to be replaced. This is basically what you're doing here, with the crucial difference that it happens for all freeze instructions, not just the ones produced by this particular transform.

There is a tradeoff here between leaving some uses to not use freeze (and possibly allowing additional optimizations for them, as freeze is usually an optimization-blocker) and allowing folds based on value-identity (which benefit from all uses being freeze(%x), not %x).

With that in mind, I would probably lean toward dropping the one-use check on this fold as a starting point -- it's more conservative, but should still cover your motivating case.

Ah, yes, dropping the one-use check makes sense. Waiting to see if the others had any concerns with that, otherwise sgtm.

Here's also the shrunk down example showing the larger picture (it's as much shrinking I could do off binilearKernel in that benchmark):
https://gist.github.com/mtrofin/19ccc1c0fc2da8689422837ad442896c

(Just comment in/out the 2 options)

basically instcombine moves the freeze from cmp.fr up to div, but stops there. then gvn can't unify idx1 and 2 and then we end up with a select when we do simplifycfg. In the larger picture that's a bit more than just a select, the whole if.else block survives (with the subset of the instructions that couldn't be folded) and that redundant computation then causes the regression.

Thanks, I understand your case now. I also ran into another related one while working on CHR:

define i1 @test(i32 %x) {
  %and1 = and i32 %x, 4
  %cmp1 = icmp ne i32 %and1, 0
  %x.fr = freeze i32 %x
  %and2 = and i32 %x.fr, 11
  %cmp2 = icmp eq i32 %and2, 11
  %and = and i1 %cmp1, %cmp2
  ret i1 %and
}

Dropping the one-use limitation from the "push freeze upwards" fold would not help this case, as the freeze cannot be pushed any further. To cover this case, we do need to extend the freezeDominatedUses() transform.

I've put up https://reviews.llvm.org/D125321, which extends freezeDominatedUses() to move the freeze such that it dominates all uses (if possible).

In D125248#3503741, @nikic wrote:

I've put up https://reviews.llvm.org/D125321, which extends freezeDominatedUses() to move the freeze such that it dominates all uses (if possible).

Yup, thanks!. I'm not super sure why we'd want to insert the new freeze first (potentially) in the middle of the original's uses, then hoist it to dominate all uses (rather than place it "at the right place" upfront), but likely I'm missing some design nuance of this pass. I'm not hung up on this - asking to learn.

In D125248#3504103, @mtrofin wrote:

In D125248#3503741, @nikic wrote:

I've put up https://reviews.llvm.org/D125321, which extends freezeDominatedUses() to move the freeze such that it dominates all uses (if possible).

Yup, thanks!. I'm not super sure why we'd want to insert the new freeze first (potentially) in the middle of the original's uses, then hoist it to dominate all uses (rather than place it "at the right place" upfront), but likely I'm missing some design nuance of this pass. I'm not hung up on this - asking to learn.

Hoisting the freeze insertion point in pushFreezeToPreventPoisonFromPropagating() fixes the problem for freeze instructions introduced by that transform -- but it does not fix the problem for freeze instructions introduced by other (likely non-InstCombine) transforms. For that reason this should be handled by two separate folds that can apply independently of each other (though they would usually work in tandem).

In D125248#3504196, @nikic wrote:

In D125248#3504103, @mtrofin wrote:

In D125248#3503741, @nikic wrote:

I've put up https://reviews.llvm.org/D125321, which extends freezeDominatedUses() to move the freeze such that it dominates all uses (if possible).

Yup, thanks!. I'm not super sure why we'd want to insert the new freeze first (potentially) in the middle of the original's uses, then hoist it to dominate all uses (rather than place it "at the right place" upfront), but likely I'm missing some design nuance of this pass. I'm not hung up on this - asking to learn.

Hoisting the freeze insertion point in pushFreezeToPreventPoisonFromPropagating() fixes the problem for freeze instructions introduced by that transform -- but it does not fix the problem for freeze instructions introduced by other (likely non-InstCombine) transforms. For that reason this should be handled by two separate folds that can apply independently of each other (though they would usually work in tandem).

Thanks!

nikic mentioned this in rG6001bfcedc31: [InstCombine] Freeze other uses of frozen value.May 11 2022, 7:47 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

InstCombine/

InstructionCombining.cpp

12 lines

test/

Transforms/

InstCombine/

freeze.ll

25 lines

Diff 428134

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Show First 20 Lines • Show All 3,795 Lines • ▼ Show 20 Lines	InstCombinerImpl::pushFreezeToPreventPoisonFromPropagating(FreezeInst &OrigFI) {
OrigOpInst->dropPoisonGeneratingFlags();		OrigOpInst->dropPoisonGeneratingFlags();

// If all operands are guaranteed to be non-poison, we can drop freeze.		// If all operands are guaranteed to be non-poison, we can drop freeze.
if (!MaybePoisonOperand)		if (!MaybePoisonOperand)
return OrigOp;		return OrigOp;

auto *FrozenMaybePoisonOperand = new FreezeInst(		auto *FrozenMaybePoisonOperand = new FreezeInst(
MaybePoisonOperand->get(), MaybePoisonOperand->get()->getName() + ".fr");		MaybePoisonOperand->get(), MaybePoisonOperand->get()->getName() + ".fr");
		auto *IP = dyn_cast<Instruction>(MaybePoisonOperand->get());
replaceUse(*MaybePoisonOperand, FrozenMaybePoisonOperand);		replaceUse(*MaybePoisonOperand, FrozenMaybePoisonOperand);
		if (IP) {
		if (isa<PHINode>(IP))
		IP->getParent()->getInstList().insert(
		IP->getParent()->getFirstInsertionPt(), FrozenMaybePoisonOperand);
		else
		FrozenMaybePoisonOperand->insertAfter(IP);
		} else {
FrozenMaybePoisonOperand->insertBefore(OrigOpInst);		FrozenMaybePoisonOperand->insertBefore(OrigOpInst);
		mtrofinAuthorUnsubmitted Done Reply Inline Actions I think this is just the case when `MaybePoisonOperand` is a function parameter, so we should insert in the entry block - but want to first make sure the idea of the patch isn't missing something. mtrofin: I think this is just the case when `MaybePoisonOperand` is a function parameter, so we should…
		}
return OrigOp;		return OrigOp;
}		}

bool InstCombinerImpl::freezeDominatedUses(FreezeInst &FI) {		bool InstCombinerImpl::freezeDominatedUses(FreezeInst &FI) {
Value *Op = FI.getOperand(0);		Value *Op = FI.getOperand(0);

if (isa<Constant>(Op))		if (isa<Constant>(Op))
return false;		return false;
▲ Show 20 Lines • Show All 799 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/freeze.ll

Show First 20 Lines • Show All 588 Lines • ▼ Show 20 Lines	loop: ; preds = %loop, %entry
%i.next = getelementptr i8, i8* %i.fr, i64 1		%i.next = getelementptr i8, i8* %i.fr, i64 1
%cond = icmp eq i8* %i.next, %end		%cond = icmp eq i8* %i.next, %end
br i1 %cond, label %loop, label %exit		br i1 %cond, label %loop, label %exit

exit: ; preds = %loop		exit: ; preds = %loop
ret void		ret void
}		}

		declare i1 @mock_use(i64, i64)
		define i1 @propagate_freeze(i32 %0, i32 noundef %1) {
		; CHECK-LABEL: @propagate_freeze(
		; CHECK-NEXT: [[DOTFR:%.]] = freeze i32 [[TMP0:%.]]
		; CHECK-NEXT: [[DR:%.*]] = lshr i32 [[DOTFR]], 2
		; CHECK-NEXT: [[IDX1:%.*]] = zext i32 [[DR]] to i64
		; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[DR]], 1
		; CHECK-NEXT: [[CMP:%.]] = icmp slt i32 [[ADD]], [[TMP1:%.]]
		; CHECK-NEXT: [[IDX2:%.*]] = zext i32 [[DR]] to i64
		; CHECK-NEXT: [[V:%.*]] = call i1 @mock_use(i64 [[IDX1]], i64 [[IDX2]])
		; CHECK-NEXT: [[RET:%.*]] = and i1 [[V]], [[CMP]]
		; CHECK-NEXT: ret i1 [[RET]]
		;
		%dr = lshr i32 %0, 2
		%idx1 = zext i32 %dr to i64
		%add = add i32 %dr, 1
		%cmp = icmp slt i32 %add, %1
		%cmp.fr = freeze i1 %cmp
		%idx2 = zext i32 %dr to i64
		%v = call i1 @mock_use(i64 %idx1, i64 %idx2)
		%ret = and i1 %v, %cmp.fr
		ret i1 %ret
		}