This is an archive of the discontinued LLVM Phabricator instance.

Differential D72944

[InstCombine] Fix worklist management when simplifying demanded bits (PR44541)
Closed, Public

Authored by nikic on Jan 17 2020, 12:06 PM.

Details

Reviewers

spatel
lebedev.ri
majnemer

Commits

rG1ab37fad61ab: [InstCombine] Fix worklist management when simplifying demanded bits

Summary

When simplifying demanded bits, we currently only report the instruction on which SimplifyDemandedBits was called as changed. However, this is a recursive call, and the actually modified instruction will usually be further up the chain. Additionally, all the intermediate instructions should also be revisited, as additional combines may be possible after the demanded bits simplification. We fix this by explicitly adding them to the worklist.
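
To illustrate why re-queueing the actually modified instruction matters, here is a minimal toy model in standard C++. It is only a sketch: `Node`, `ToyWorklist` and `runToyCombine` are hypothetical names, not LLVM classes, and the "recursive simplification" is reduced to a root instruction rewriting an operand of its own operand. The function counts how many full passes are needed until nothing changes.

```cpp
#include <cassert>
#include <deque>
#include <unordered_set>
#include <vector>

// Toy IR: just enough structure to show the effect (not LLVM's classes).
enum class Kind { Arg, OpaqueZero, ConstZero, Sub, Root };

struct Node {
  Kind K;
  int Lhs;     // index of the first operand, or -1
  int Rhs;     // index of the second operand, or -1
  bool Folded; // set once "sub X, 0" has been simplified to X
};

// FIFO worklist that ignores values that are already queued.
class ToyWorklist {
  std::deque<int> Queue;
  std::unordered_set<int> Queued;

public:
  void add(int Id) {
    if (Queued.insert(Id).second)
      Queue.push_back(Id);
  }
  bool empty() const { return Queue.empty(); }
  int pop() {
    int Id = Queue.front();
    Queue.pop_front();
    Queued.erase(Id);
    return Id;
  }
};

// Returns the number of full passes needed to reach a fixpoint.
int runToyCombine(bool RequeueModified) {
  // 0: %arg, 1: opaque value known to be zero, 2: sub %arg, %1,
  // 3: root user of the sub, 4: the constant 0.
  std::vector<Node> F = {{Kind::Arg, -1, -1, false},
                         {Kind::OpaqueZero, -1, -1, false},
                         {Kind::Sub, 0, 1, false},
                         {Kind::Root, 2, -1, false},
                         {Kind::ConstZero, -1, -1, false}};
  const int ConstZeroId = 4;
  int Passes = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++Passes;
    ToyWorklist WL;
    for (int Id = 0; Id < 4; ++Id) // definitions are visited before users
      WL.add(Id);
    while (!WL.empty()) {
      int Id = WL.pop();
      Node &N = F[Id];
      if (N.K == Kind::Sub && !N.Folded && F[N.Rhs].K == Kind::ConstZero) {
        N.Folded = true; // "sub X, 0" simplifies to X
        Changed = true;
      } else if (N.K == Kind::Root) {
        // The simplification started at the root rewrites an operand of
        // the root's operand, i.e. an instruction further up the chain.
        Node &Op = F[N.Lhs];
        if (Op.K == Kind::Sub && F[Op.Rhs].K == Kind::OpaqueZero) {
          Op.Rhs = ConstZeroId;
          Changed = true;
          if (RequeueModified)
            WL.add(N.Lhs); // revisit the instruction that actually changed
        }
      }
    }
  }
  return Passes;
}
```

In this toy setup, `runToyCombine(false)` needs 3 passes, while `runToyCombine(true)` needs 2, because the rewritten sub is folded in the same pass in which it was modified. The numbers are properties of the toy model only and are not meant to reproduce the iteration counts of any real test.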

I originally wrote this patch to address the excessive number of instcombine iterations in an existing test (drops from 5 to 3, which is still not optimal), but found that this also addresses https://bugs.llvm.org/show_bug.cgi?id=44541, though I'm not sure whether this is really a "fix". What happens there is that demanded bits simplifies

%zero = call i16 @passthru(i16 0)
%sub = sub nuw nsw i16 %arg, %zero
%cmp = icmp slt i16 %sub, 0
%ret = select i1 %cmp, i16 0, i16 %sub

to

%zero = call i16 @passthru(i16 0)
%sub = sub nuw nsw i16 %arg, 0
%cmp = icmp slt i16 %sub, 0
%ret = select i1 %cmp, i16 0, i16 %sub

without adding the sub to the worklist (which this patch fixes). Then %cmp = icmp slt i16 %sub, 0 sensibly becomes %cmp = icmp slt i16 %arg, 0. The select is then recognized as a minmax SPF and canonicalized to

%zero = call i16 @passthru(i16 0)
%sub = sub nuw nsw i16 %arg, 0
%cmp = icmp sgt i16 0, %sub
%ret = select i1 %cmp, i16 0, i16 %sub

At which point we enter an infinite combine loop. Adding the %sub to the worklist is sufficient to get it simplified and avoid the issue. I don't know if that's considered enough, or whether there is also a bug in SPF matching or canonicalization here, in the sense that it needs to special-case this type of pattern, on the off-chance that it could appear through another pathway.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision. Jan 17 2020, 12:06 PM

Herald added a project: Restricted Project. Jan 17 2020, 12:06 PM

Herald added subscribers: llvm-commits, hiraditya.

nikic marked an inline comment as done. Jan 17 2020, 12:08 PM

nikic added inline comments.

llvm/test/Transforms/InstCombine/logical-select.ll
Line 553 (on Diff #238842): These changes are caused by https://bugs.llvm.org/show_bug.cgi?id=44521. It's one of two common spurious differences that arise when worklist order changes are made.

RKSimon added a reviewer: majnemer. Jan 17 2020, 2:25 PM

This change LGTM.

To fix the infinite loop bug without this change, we could add a clause in canonicalizeMinMaxWithConstant() to avoid creating this:
%cmp = icmp sgt i16 0, %sub

I.e., if the LHS returned by matchSelectPattern() is a constant, swap LHS and RHS, because we know we can't create a canonical compare that way. An alternative would be to do that swap within matchSelectPattern() itself. We already have a different LHS/RHS constraint for ABS/NABS in that analysis.

I'm not sure how we can expose that bug with this fix in place though, so might want to make that fix first to be safer?
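
The operand swap suggested above can be sketched with a small toy model. This is not the code of canonicalizeMinMaxWithConstant() or matchSelectPattern(); `Pred`, `ToyCmp` and `moveConstantToRhs` are hypothetical names used only to show that moving a constant to the RHS requires swapping the predicate as well.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Signed compare predicates, modeled loosely on icmp.
enum class Pred { SLT, SGT, SLE, SGE };

// Predicate that yields the same result once the operands are exchanged.
Pred swappedPredicate(Pred P) {
  switch (P) {
  case Pred::SLT: return Pred::SGT;
  case Pred::SGT: return Pred::SLT;
  case Pred::SLE: return Pred::SGE;
  case Pred::SGE: return Pred::SLE;
  }
  return P;
}

struct ToyOperand {
  bool IsConstant;
  std::string Name;
};

struct ToyCmp {
  Pred P;
  ToyOperand Lhs, Rhs;
};

// If only the left operand is a constant, move it to the right-hand side
// and adjust the predicate so the comparison keeps its meaning.
ToyCmp moveConstantToRhs(ToyCmp C) {
  if (C.Lhs.IsConstant && !C.Rhs.IsConstant) {
    std::swap(C.Lhs, C.Rhs);
    C.P = swappedPredicate(C.P);
  }
  return C;
}
```

Applied to the compare from this review, `icmp sgt i16 0, %sub` would become `icmp slt i16 %sub, 0` in this model.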

This revision is now accepted and ready to land. Jan 20 2020, 5:15 AM

In D72944#1829322, @spatel wrote:

To fix the infinite loop bug without this change, we could add a clause in canonicalizeMinMaxWithConstant() to avoid creating this:
%cmp = icmp sgt i16 0, %sub

I.e., if the LHS returned by matchSelectPattern() is a constant, swap LHS and RHS, because we know we can't create a canonical compare that way. An alternative would be to do that swap within matchSelectPattern() itself. We already have a different LHS/RHS constraint for ABS/NABS in that analysis.

I'm not sure how we can expose that bug with this fix in place though, so might want to make that fix first to be safer?

I don't think this would help. The problem here is not that the icmp operand order is non-canonical (though it would of course be better if we directly created it in canonical order), but that the (SPF) canonical form is based on %sub rather than %arg. This is because SPF has special support for sub-based min/max in https://github.com/llvm-mirror/llvm/blob/2c4ca6832fa6b306ee6a7010bfb80a3f2596f824/lib/Analysis/ValueTracking.cpp#L4687-L4699. I think to avoid this issue independently of this patch, we'd have to special-case that code to not match the degenerate case where the sub has a zero on the RHS.
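
As a rough sketch of what such a special case could look like, consider the toy matcher below. It does not reproduce the linked ValueTracking code; the pattern shape (a select on `X <s Y` between 0 and `X - Y`) and the names `ToyValue`, `ToySelect` and `matchesSubBasedMax` are assumptions made for illustration. The only point is the extra guard that rejects a sub whose RHS is the constant zero.

```cpp
#include <cassert>
#include <string>

// A value is either a constant or a named SSA value (toy model).
struct ToyValue {
  bool IsConstant;
  long Constant;
  std::string Name;
};

// Models: select (CmpLhs <s CmpRhs), 0, (SubLhs - SubRhs)
struct ToySelect {
  ToyValue CmpLhs, CmpRhs, SubLhs, SubRhs;
};

bool sameValue(const ToyValue &A, const ToyValue &B) {
  if (A.IsConstant != B.IsConstant)
    return false;
  return A.IsConstant ? A.Constant == B.Constant : A.Name == B.Name;
}

// Returns true if the select is treated as a sub-based max in this model.
bool matchesSubBasedMax(const ToySelect &S, bool RejectZeroRhs) {
  // The compare and the sub must use the same pair of operands.
  if (!sameValue(S.CmpLhs, S.SubLhs) || !sameValue(S.CmpRhs, S.SubRhs))
    return false;
  // Degenerate case: "sub X, 0" is just X, so basing the canonical form
  // on the sub rather than on X is what the guard is meant to avoid.
  if (RejectZeroRhs && S.SubRhs.IsConstant && S.SubRhs.Constant == 0)
    return false;
  return true;
}
```

With the guard enabled, the degenerate `sub %arg, 0` case is no longer matched, while a sub with a non-zero or non-constant RHS still is.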

This should be using AddDeferred now. But if we do that, we run into https://bugs.llvm.org/show_bug.cgi?id=44754. Additionally, we no longer get the desired number of iterations until D73803 and follow-up work shake out (a dangling "not" is not getting DCEd and blocks transforms)...

nikic mentioned this in D73849: [ValueTracking][InstCombine] Fix infinite min/max canonicalization loop (PR44541). Feb 2 2020, 8:45 AM

nikic mentioned this in rGa148b9e9909d: [InstCombine] Fix infinite min/max canonicalization loop (PR44541). Feb 8 2020, 11:49 AM

hans mentioned this in rGfc12083cbc5c: [InstCombine] Fix infinite min/max canonicalization loop (PR44541). Feb 10 2020, 2:38 AM

Rebase over D74294.

This revision is now accepted and ready to land. Feb 14 2020, 1:43 PM

nikic added a parent revision: D74294: [InstCombine] Relax preconditions for ashr+and+icmp fold (PR44754). Feb 14 2020, 1:44 PM

Closed by commit rG1ab37fad61ab: [InstCombine] Fix worklist management when simplifying demanded bits (authored by nikic). Feb 18 2020, 9:03 AM

This revision was automatically updated to reflect the committed changes.

nikic mentioned this in rGb178555318cd: [InstCombine] Improve simplify demanded bits worklist management. Feb 21 2020, 9:56 AM

Revision Contents

Path                                                              Size
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp   2 lines
llvm/test/Transforms/InstCombine/pr44541.ll                       2 lines
llvm/test/Transforms/InstCombine/select-imm-canon.ll              2 lines

Diff 245182

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp

 bool InstCombiner::SimplifyDemandedBits(Instruction *I, unsigned OpNo,
                                         const APInt &DemandedMask,
                                         KnownBits &Known,
                                         unsigned Depth) {
   Use &U = I->getOperandUse(OpNo);
   Value *NewVal = SimplifyDemandedUseBits(U.get(), DemandedMask, Known,
                                           Depth, I);
   if (!NewVal) return false;
   U = NewVal;
+  // Add the simplified instruction back to the worklist.
+  Worklist.addValue(U.get());
   return true;
 }


 /// This function attempts to replace V with a simpler value based on the
 /// demanded bits. When this function is called, it is known that only the bits
 /// set in DemandedMask of the result of V are ever used downstream.
 /// Consequently, depending on the mask and V, it may be possible to replace V

llvm/test/Transforms/InstCombine/pr44541.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -S -instcombine -expensive-combines=0 -instcombine-infinite-loop-threshold=3 < %s | FileCheck %s
+; RUN: opt -S -instcombine -expensive-combines=0 -instcombine-infinite-loop-threshold=2 < %s | FileCheck %s

 ; This test used to cause an infinite combine loop.

 define i16 @passthru(i16 returned %x) {
 ; CHECK-LABEL: @passthru(
 ; CHECK-NEXT: ret i16 [[X:%.*]]
 ;
   ret i16 %x

llvm/test/Transforms/InstCombine/select-imm-canon.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -instcombine -S | FileCheck %s
+; RUN: opt < %s -instcombine -instcombine-infinite-loop-threshold=3 -S | FileCheck %s

 define i8 @single(i32 %A) {
 ; CHECK-LABEL: @single(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[A:%.*]], -128
 ; CHECK-NEXT: [[L2:%.*]] = select i1 [[TMP0]], i32 [[A]], i32 -128
 ; CHECK-NEXT: [[CONV7:%.*]] = trunc i32 [[L2]] to i8
 ; CHECK-NEXT: ret i8 [[CONV7]]
