This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Canonicalize select immediates
Closed · Public

Authored by dmgreen on Dec 14 2019, 12:31 PM.

Details

Reviewers

spatel
lebedev.ri

Commits

rGa59cc5e128f0: [InstCombine] Canonicalize select immediates

Summary

I found a case where after a specific awkward set of inlining and combining, we would end up with code that looked like:

%1 = icmp slt i32 %shr, -128
%2 = select i1 %1, i32 128, i32 %shr
%.inv = icmp sgt i32 %shr, 127
%spec.select.i = select i1 %.inv, i32 127, i32 %2
%conv7 = trunc i32 %spec.select.i to i8

This should be turned into a min/max pattern, but somewhere along the line the "-128" in the first select was instead transformed into "128", as only the bottom byte was ever demanded.

To fix this, I've put in further canonicalisation for the immediates of select, preferring to use the same value as the icmp if available.
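
As a rough illustration of why the two immediates became interchangeable, here is a small standalone C++ sketch. It is not LLVM code: `agreeUnderMask` and `checkSummaryImmediates` are hypothetical names, and plain 32-bit integers stand in for APInt. It only checks that -128 and 128 agree in the bottom byte, which is all the trunc to i8 demands.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): true when a and b agree on every
// bit selected by mask.
bool agreeUnderMask(std::int32_t a, std::int32_t b, std::uint32_t mask) {
  return (static_cast<std::uint32_t>(a) & mask) ==
         (static_cast<std::uint32_t>(b) & mask);
}

// Checks the two immediates from the summary.
void checkSummaryImmediates() {
  // The trunc to i8 demands only the bottom byte. -128 is 0xFFFFFF80 and
  // 128 is 0x00000080, so the two values agree there...
  assert(agreeUnderMask(-128, 128, 0xFFu));
  // ...but they differ once every bit is demanded.
  assert(!agreeUnderMask(-128, 128, 0xFFFFFFFFu));
}
```

Because the values agree under the demanded mask, either immediate is valid in the select; the patch prefers the one that matches the icmp.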

Event Timeline

dmgreen created this revision. (Dec 14 2019, 12:31 PM)

Herald added a project: Restricted Project. (Dec 14 2019, 12:31 PM)

Herald added a subscriber: hiraditya.

nikic added a subscriber: nikic. (Dec 14 2019, 1:38 PM)

LGTM

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp (inline comments by spatel)

Line 352: nit: try and keep -> try to keep

Line 361: nit: add period at end of sentence.

Lines 371-376: This comment/code didn't read clearly to me the first time through. It might just be me, but I'd rather see the positive comparison, followed by the default, so invert the order:

if ((*CmpC & DemandedMask) == (*SelC & DemandedMask)) {
  I->setOperand(OpNo, ConstantInt::get(I->getType(), *CmpC));
  return true;
}
return ShrinkDemandedConstant(I, OpNo, DemandedMask);
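
To make the suggested ordering concrete, here is a minimal standalone model of the decision, assuming plain 32-bit integers in place of APInt. The `Action` enum and `chooseAction` are hypothetical names for illustration, not part of the patch. The model also leaves out the earlier bail-outs (no constant in the select operand, no icmp constant, or mismatched bit widths).

```cpp
#include <cassert>
#include <cstdint>

// Outcome of the decision for one select operand (illustrative only).
enum class Action { LeaveAsIs, UseCmpConstant, FallBackToShrink };

// Hypothetical model of the lambda under review. selC is the select
// immediate, cmpC the icmp immediate, demandedMask the demanded bits.
Action chooseAction(std::uint32_t selC, std::uint32_t cmpC,
                    std::uint32_t demandedMask) {
  // Already the same constant: nothing to do.
  if (cmpC == selC)
    return Action::LeaveAsIs;
  // Positive comparison first: the constants differ only in bits that are
  // not demanded, so adopt the icmp constant.
  if ((cmpC & demandedMask) == (selC & demandedMask))
    return Action::UseCmpConstant;
  // Default: leave the operand to the generic constant shrinking.
  return Action::FallBackToShrink;
}
```

With the values from the summary, `chooseAction(128, static_cast<std::uint32_t>(-128), 0xFF)` takes the middle branch, which corresponds to rewriting the select immediate to -128.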

This revision is now accepted and ready to land. (Dec 17 2019, 9:07 AM)

Can you elaborate more on why this canonicalization should be performed? (show godbolt output with/without this)
Is it an instruction selection issue?

Code that triggers this is actually used a lot in one library I was looking at. It's quite a big gain for both scalar and vector code to make sure this is matched correctly. (That's CMSISDSP, the same one that I was quoting differences from in the "bump select threshold to 3" patch. Interestingly it would only manifest some of the time, if inlining would happen before instead of after some simplification I think).

Does the "original" testcase show the issue in enough detail? That's roughly what is happening. Or the example in the summary? They are not min/max patterns like they should be, and nothing will recognize them as such. Including ISel but also any of the code we have to attempt not to split apart min/max patterns.

There's almost certainly a dozen different ways to fix this, but I think this one makes sense; canonicalising the select immediates towards the icmp values.
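
For readers unfamiliar with the pattern in question, the following standalone C++ sketch compares the intended min/max clamp with the select chain from the summary. All function names here are made up for illustration, and the check only covers a sample range of inputs. It suggests that the two forms agree once truncated to a byte, even though the select chain no longer has the min/max shape.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// The intended min/max pattern: clamp to the signed 8-bit range.
std::int32_t clampMinMax(std::int32_t x) {
  return std::min(std::max(x, std::int32_t{-128}), std::int32_t{127});
}

// The select chain from the summary, including the "128" immediate.
std::int32_t selectChain(std::int32_t shr) {
  std::int32_t lo = (shr < -128) ? 128 : shr;  // %2
  return (shr > 127) ? 127 : lo;               // %spec.select.i
}

// Models `trunc i32 ... to i8` by keeping only the bottom byte.
std::uint8_t truncToByte(std::int32_t v) {
  return static_cast<std::uint8_t>(static_cast<std::uint32_t>(v));
}

// True if both forms agree after truncation for every x in [-1000, 1000].
bool formsAgreeAfterTrunc() {
  for (std::int32_t x = -1000; x <= 1000; ++x)
    if (truncToByte(clampMinMax(x)) != truncToByte(selectChain(x)))
      return false;
  return true;
}
```

Before the truncation the two differ for inputs below -128 (128 versus -128), which is why the select chain is not a recognizable min/max until the immediate is restored.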

Address comments. Thanks for taking a look!

In D71516#1788713, @dmgreen wrote:

Code that triggers this is actually used a lot in one library I was looking at.
It's quite a big gain for both scalar and vector code to make sure this is matched correctly.
(That's CMSISDSP, the same one that I was quoting differences from in the
"bump select threshold to 3" patch. Interestingly it would only manifest some of the time,
if inlining would happen before instead of after some simplification I think).

Does the "original" testcase show the issue in enough detail?
That's roughly what is happening. Or the example in the summary?

In D71516#1788713, @dmgreen wrote:

They are not min/max patterns like they should be, and nothing will recognize them as such.
Including ISel but also any of the code we have to attempt not to split apart min/max patterns.

There's almost certainly a dozen different ways to fix this,
but I think this one makes sense; canonicalising the select immediates towards the icmp values.

Okay, that answered the question.
Sounds good, thank you.

Closed by commit rGa59cc5e128f0: [InstCombine] Canonicalize select immediates (authored by dmgreen). (Dec 19 2019, 4:40 AM)

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                               Size
llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp    31 lines
llvm/test/Transforms/InstCombine/select-imm-canon.ll               22 lines

Diff 233942

llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp

   case Instruction::Select: {
     if (SimplifyDemandedBits(I, 2, DemandedMask, RHSKnown, Depth + 1) ||
         SimplifyDemandedBits(I, 1, DemandedMask, LHSKnown, Depth + 1))
       return I;
     assert(!RHSKnown.hasConflict() && "Bits known to be one AND zero?");
     assert(!LHSKnown.hasConflict() && "Bits known to be one AND zero?");

     // If the operands are constants, see if we can simplify them.
-    if (ShrinkDemandedConstant(I, 1, DemandedMask) ||
-        ShrinkDemandedConstant(I, 2, DemandedMask))
+    // This is similar to ShrinkDemandedConstant, but for a select we want to
+    // try and keep the selected constants the same as icmp value constants, if
+    // we can. This helps not break apart (or helps put back together)
+    // canonical patterns like min and max.
+    auto CanonicalizeSelectConstant = [](Instruction *I, unsigned OpNo,
+                                         APInt DemandedMask) {
+      const APInt *SelC;
+      if (!match(I->getOperand(OpNo), m_APInt(SelC)))
+        return false;
+
+      // Get the constant out of the ICmp, if there is one
+      const APInt *CmpC;
+      ICmpInst::Predicate Pred;
+      if (!match(I->getOperand(0), m_c_ICmp(Pred, m_APInt(CmpC), m_Value())) ||
+          CmpC->getBitWidth() != SelC->getBitWidth())
+        return ShrinkDemandedConstant(I, OpNo, DemandedMask);
+
+      // If the constant is already the same as the ICmp, leave it as-is.
+      if (*CmpC == *SelC)
+        return false;
+      // If the constants are not already the same, but can be with the demand
+      // mask, use the constant value from the ICmp.
+      if ((*CmpC & DemandedMask) != (*SelC & DemandedMask))
+        return ShrinkDemandedConstant(I, OpNo, DemandedMask);
+      I->setOperand(OpNo, ConstantInt::get(I->getType(), *CmpC));
+      return true;
+    };
+    if (CanonicalizeSelectConstant(I, 1, DemandedMask) ||
+        CanonicalizeSelectConstant(I, 2, DemandedMask))
       return I;

     // Only known if known in both the LHS and RHS.
     Known.One = RHSKnown.One & LHSKnown.One;
     Known.Zero = RHSKnown.Zero & LHSKnown.Zero;
     break;
   }
   case Instruction::ZExt:

llvm/test/Transforms/InstCombine/select-imm-canon.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -instcombine -S | FileCheck %s

 define i8 @single(i32 %A) {
 ; CHECK-LABEL: @single(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[L1:%.*]] = icmp slt i32 [[A:%.*]], -128
-; CHECK-NEXT: [[L2:%.*]] = select i1 [[L1]], i32 128, i32 [[A]]
+; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[A:%.*]], -128
+; CHECK-NEXT: [[L2:%.*]] = select i1 [[TMP0]], i32 [[A]], i32 -128
 ; CHECK-NEXT: [[CONV7:%.*]] = trunc i32 [[L2]] to i8
 ; CHECK-NEXT: ret i8 [[CONV7]]
 ;
 entry:
   %l1 = icmp slt i32 %A, -128
   %l2 = select i1 %l1, i32 128, i32 %A
   %conv7 = trunc i32 %l2 to i8
   ret i8 %conv7
 }

 define i8 @double(i32 %A) {
 ; CHECK-LABEL: @double(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[L1:%.*]] = icmp slt i32 [[A:%.*]], -128
-; CHECK-NEXT: [[L2:%.*]] = select i1 [[L1]], i32 128, i32 [[A]]
-; CHECK-NEXT: [[DOTINV:%.*]] = icmp sgt i32 [[A]], 127
-; CHECK-NEXT: [[SPEC_SELECT_I:%.*]] = select i1 [[DOTINV]], i32 127, i32 [[L2]]
+; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i32 [[A:%.*]], -128
+; CHECK-NEXT: [[L2:%.*]] = select i1 [[TMP0]], i32 [[A]], i32 -128
+; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[L2]], 127
+; CHECK-NEXT: [[SPEC_SELECT_I:%.*]] = select i1 [[TMP1]], i32 [[L2]], i32 127
 ; CHECK-NEXT: [[CONV7:%.*]] = trunc i32 [[SPEC_SELECT_I]] to i8
 ; CHECK-NEXT: ret i8 [[CONV7]]
 ;
 entry:
   %l1 = icmp slt i32 %A, -128
   %l2 = select i1 %l1, i32 128, i32 %A
   %.inv = icmp sgt i32 %A, 127
   %spec.select.i = select i1 %.inv, i32 127, i32 %l2
   %conv7 = trunc i32 %spec.select.i to i8
   ret i8 %conv7
 }

 define i8 @thisdoesnotloop(i32 %A, i32 %B) {
 ; CHECK-LABEL: @thisdoesnotloop(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[L1:%.*]] = icmp slt i32 [[A:%.*]], -128
-; CHECK-NEXT: [[L2:%.*]] = select i1 [[L1]], i32 128, i32 [[B:%.*]]
+; CHECK-NEXT: [[L2:%.*]] = select i1 [[L1]], i32 -128, i32 [[B:%.*]]
 ; CHECK-NEXT: [[CONV7:%.*]] = trunc i32 [[L2]] to i8
 ; CHECK-NEXT: ret i8 [[CONV7]]
 ;
 entry:
   %l1 = icmp slt i32 %A, -128
   %l2 = select i1 %l1, i32 128, i32 %B
   %conv7 = trunc i32 %l2 to i8
   ret i8 %conv7
 }

 define i8 @original(i32 %A, i32 %B) {
 ; CHECK-LABEL: @original(
-; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i32 [[A:%.*]], -128
-; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 128, i32 [[A]]
-; CHECK-NEXT: [[DOTINV:%.*]] = icmp sgt i32 [[A]], 127
-; CHECK-NEXT: [[SPEC_SELECT_I:%.*]] = select i1 [[DOTINV]], i32 127, i32 [[TMP2]]
+; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[A:%.*]], -128
+; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 [[A]], i32 -128
+; CHECK-NEXT: [[TMP3:%.*]] = icmp slt i32 [[TMP2]], 127
+; CHECK-NEXT: [[SPEC_SELECT_I:%.*]] = select i1 [[TMP3]], i32 [[TMP2]], i32 127
 ; CHECK-NEXT: [[CONV7:%.*]] = trunc i32 [[SPEC_SELECT_I]] to i8
 ; CHECK-NEXT: ret i8 [[CONV7]]
 ;
   %cmp4.i = icmp slt i32 127, %A
   %cmp6.i = icmp sle i32 -128, %A
   %retval.0.i = select i1 %cmp4.i, i32 127, i32 -128
   %not.cmp4.i = xor i1 %cmp4.i, true
   %cleanup.dest.slot.0.i = and i1 %cmp6.i, %not.cmp4.i
   %spec.select.i = select i1 %cleanup.dest.slot.0.i, i32 %A, i32 %retval.0.i
   %conv7 = trunc i32 %spec.select.i to i8
   ret i8 %conv7
 }
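
As a sanity check on the @original test, here is a hedged standalone C++ transcription of its input IR next to the shape of the new CHECK lines. The function names are hypothetical, plain 32-bit integers stand in for the IR values, and the comparison only covers a sample range rather than proving equivalence.

```cpp
#include <cassert>
#include <cstdint>

// Transcription of the @original input IR (the unused %B is omitted).
std::int32_t originalInput(std::int32_t a) {
  bool cmp4 = 127 < a;                      // %cmp4.i
  bool cmp6 = -128 <= a;                    // %cmp6.i
  std::int32_t retval = cmp4 ? 127 : -128;  // %retval.0.i
  bool inRange = cmp6 && !cmp4;             // %cleanup.dest.slot.0.i
  return inRange ? a : retval;              // %spec.select.i
}

// Shape of the new CHECK lines: a max with -128, then a min with 127.
std::int32_t expectedOutput(std::int32_t a) {
  std::int32_t lo = (a > -128) ? a : -128;  // [[TMP2]]
  return (lo < 127) ? lo : 127;             // [[SPEC_SELECT_I]]
}

// True if both forms agree for every a in [-1000, 1000].
bool sameOnSampleRange() {
  for (std::int32_t a = -1000; a <= 1000; ++a)
    if (originalInput(a) != expectedOutput(a))
      return false;
  return true;
}
```

Here the two forms match on the full 32-bit value, not just the bottom byte, since the new output uses -128 rather than 128.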