This is an archive of the discontinued LLVM Phabricator instance.

Differential D53236

[SelectionDAG] swap select_cc operands to enable folding
ClosedPublic

Authored by labrinea on Oct 12 2018, 6:39 PM.

Details

Reviewers

llvm-commits
bogner
javed.absar
spatel

Commits

rGe15c982f6d6a: [SelectionDAG] swap select_cc operands to enable folding
rL346484: [SelectionDAG] swap select_cc operands to enable folding

Summary

The DAGCombiner tries to SimplifySelectCC as follows:
select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4)
It can't cope with the situation of reordered operands:
select_cc(x, y, 0, 16, cc)
In that case we just need to swap the operands and invert the Condition Code:
select_cc(x, y, 16, 0, ~cc)
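
The equivalence in the summary can be checked with a small standalone model. This is only a sketch: CondCode, inverse, setcc, selectCC and foldedShl are hypothetical stand-ins written for this illustration, not LLVM's ISD::CondCode or any DAGCombiner API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a small subset of integer condition codes.
enum class CondCode { EQ, NE, LT, GE };

// Logical inverse of a condition code (the "~cc" of the summary).
CondCode inverse(CondCode cc) {
  switch (cc) {
  case CondCode::EQ: return CondCode::NE;
  case CondCode::NE: return CondCode::EQ;
  case CondCode::LT: return CondCode::GE;
  case CondCode::GE: return CondCode::LT;
  }
  return cc;
}

// Model of set_cc(x, y, cc): yields a 0/1 boolean.
bool setcc(int32_t x, int32_t y, CondCode cc) {
  switch (cc) {
  case CondCode::EQ: return x == y;
  case CondCode::NE: return x != y;
  case CondCode::LT: return x < y;
  case CondCode::GE: return x >= y;
  }
  return false;
}

// Model of select_cc(x, y, t, f, cc).
uint32_t selectCC(int32_t x, int32_t y, uint32_t t, uint32_t f, CondCode cc) {
  return setcc(x, y, cc) ? t : f;
}

// Model of shl(zext(set_cc(x, y, cc)), shift).
uint32_t foldedShl(int32_t x, int32_t y, CondCode cc, unsigned shift) {
  return static_cast<uint32_t>(setcc(x, y, cc)) << shift;
}

// True when select_cc(x, y, 0, 16, cc), select_cc(x, y, 16, 0, ~cc) and
// shl(zext(set_cc(x, y, ~cc)), 4) all agree for the given inputs.
bool swapFoldHolds(int32_t x, int32_t y, CondCode cc) {
  uint32_t original = selectCC(x, y, 0, 16, cc);
  uint32_t swapped = selectCC(x, y, 16, 0, inverse(cc));
  uint32_t folded = foldedShl(x, y, inverse(cc), 4);
  return original == swapped && swapped == folded;
}
```

Swapping the two constants without inverting cc (or the reverse) would break the first equality, which is why the two changes have to happen together.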

Diff Detail

Repository: rL LLVM

Event Timeline

labrinea created this revision. Oct 12 2018, 6:39 PM

Herald added a reviewer: javed.absar. Oct 12 2018, 6:39 PM

dmgreen added a subscriber: dmgreen. Oct 14 2018, 1:47 AM

labrinea added a reviewer: spatel. Oct 15 2018, 2:49 AM

Is the motivating case integer or FP?
I'm asking because we have a canonicalization for integer cmp+sel for the IR in these tests, but we're missing the corresponding FP transform.
If we add the FP canonicalization in IR, would there still be a need for this backend patch? I.e., is something generating this select code in the DAG itself?

In D53236#1265594, @spatel wrote:

Is the motivating case integer or FP?
I'm asking because we have a canonicalization for integer cmp+sel for the IR in these tests, but we're missing the corresponding FP transform.
If we add the FP canonicalization in IR, would there still be a need for this backend patch? I.e., is something generating this select code in the DAG itself?

Yes, the motivating example was floating point comparison. Indeed, I added the unordered predicates and modified InstCombine to handle fpcmp. It then swapped the operands and inverted the predicates for me. But what does "Canonical" actually mean? The backend still won't be able to do the transformation of the description for select_cc(x, y, 0, 16, cc) if cc is already canonical, assuming this is a reachable state in DAG.
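
Why inverting a floating-point predicate brings in the unordered forms can be seen with a NaN operand. The sketch below models three fcmp predicates with plain C++ comparisons; the helper names fcmpOGT, fcmpULE and fcmpOLE are made up for this illustration and are not LLVM functions.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// fcmp ogt: both operands are ordered (not NaN) and a > b.
bool fcmpOGT(float a, float b) { return a > b; }

// fcmp ole: both operands are ordered (not NaN) and a <= b.
bool fcmpOLE(float a, float b) { return a <= b; }

// fcmp ule: either operand is NaN, or a <= b.
bool fcmpULE(float a, float b) {
  return std::isnan(a) || std::isnan(b) || a <= b;
}

// True when ule behaves as the exact logical inverse of ogt for (a, b).
bool uleIsInverseOfOgt(float a, float b) {
  return fcmpULE(a, b) == !fcmpOGT(a, b);
}

// True when ole behaves as the exact logical inverse of ogt for (a, b).
bool oleIsInverseOfOgt(float a, float b) {
  return fcmpOLE(a, b) == !fcmpOGT(a, b);
}
```

With a NaN operand, ogt and ole are both false, so ole is not the inverse of ogt while ule is. This matches the select_ule_float_inverse test in this revision, where `fcmp ule` with swapped select constants folds to the same `cset w8, gt` as `fcmp ogt`.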

In D53236#1275285, @labrinea wrote:

Yes, the motivating example was floating point comparison. Indeed, I added the unordered predicates and modified InstCombine to handle fpcmp. It then swapped the operands and inverted the predicates for me. But what does "Canonical" actually mean? The backend still won't be able to do the transformation of the description for select_cc(x, y, 0, 16, cc) if cc is already canonical, assuming this is a reachable state in DAG.

Yes, we can't guarantee that this pattern won't appear in the DAG if the cc was already the canonical predicate. We could try harder in instcombine to make the larger constant appear as the 'true value' for the select, but we can't guarantee it (if there are extra uses of the compare, we won't do that transform in IR, because it would increase the instruction count).

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 18287 (On Diff #169531): I don't know how to expose this as a bug, but this scares me. We are changing the value of CC without swapping the N2/N3 values, and it's possible to fall-through to the code underneath this block. At that point, we have the wrong CC value. Please move this block of code into a helper function as a preliminary step, so we don't have this risk.
test/CodeGen/AArch64/select_cc.ll
Lines 1–4 (On Diff #169531): Please add this test file to trunk as a preliminary step and use utils/update_llc_test_checks.py to auto-generate the check lines.

I've autogenerated the filecheck lines to show the diff compared to the trunk codegen. For making sure we never fall-through to the next block, having changed the CC but not swapped (N2, N3), I've moved all the preconditions to the beginning of the block (instead of moving the block into a helper function).
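
The "check every precondition before mutating anything" shape described here can be sketched in isolation. The SelectCC struct, isPowerOf2 helper and canonicalize function below are hypothetical and exist only for this illustration; the real change operates on SDValue operands inside DAGCombiner::SimplifySelectCC.

```cpp
#include <cassert>
#include <utility>

// Hypothetical toy model of a select_cc node with constant arms.
struct SelectCC {
  bool invertedCC;   // whether the condition code has been inverted
  unsigned trueVal;  // value chosen when the condition holds
  unsigned falseVal; // value chosen otherwise
};

bool isPowerOf2(unsigned v) { return v != 0 && (v & (v - 1)) == 0; }

// Rewrites s into the (power-of-two, 0) form when every precondition holds.
// Returns false and leaves s untouched otherwise, so a caller that falls
// through to later folds never sees an inverted condition paired with
// operands that were not swapped.
bool canonicalize(SelectCC &s, bool targetAllowsFold) {
  bool fold = isPowerOf2(s.trueVal) && s.falseVal == 0;
  bool swap = isPowerOf2(s.falseVal) && s.trueVal == 0;

  // All checks come first; nothing has been modified at this point.
  if (!(fold || swap) || !targetAllowsFold)
    return false;

  // Past this point the fold is committed, so mutating state is safe.
  if (swap) {
    s.invertedCC = !s.invertedCC;
    std::swap(s.trueVal, s.falseVal);
  }
  return true;
}
```

The alternative the reviewer suggested, a helper function that takes CC by value, gives the same guarantee by construction.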

labrinea added inline comments. Oct 29 2018, 5:35 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 18326 (On Diff #171488): Shouldn't `TLI.getBooleanContents(N0.getValueType())` be `TLI.getBooleanContents(getSetCCResultType(N0.getValueType()))` instead, or does it not matter?

spatel added inline comments. Oct 29 2018, 2:03 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Line 18326 (On Diff #171488): Given that we're using getSetCCResultType(N0.getValueType()) when we create the SCC value below, that seems right. But that should be a separate clean-up step either before or after this patch. I'm not sure if there's a way to expose that bug in a test though.
Lines 18326–18328 (On Diff #171488): clang-format?
Lines 18360–18362 (On Diff #171488): clang-format?
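
For background on why the fold is guarded by ZeroOrOneBooleanContent at all, the sketch below models what the shift produces under two representations of "true". foldWithBoolean and its trueRepr parameter are hypothetical names for this illustration; they are not part of TargetLowering.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of "shl(setcc, shift)" where trueRepr is the bit
// pattern the target's setcc produces for a true comparison.
uint32_t foldWithBoolean(uint32_t trueRepr, bool cond, unsigned shift) {
  uint32_t scc = cond ? trueRepr : 0u;
  return scc << shift;
}
```

With a 0/1 boolean, a true condition shifted by 4 gives 16, as the fold intends. With an all-ones boolean (the ZeroOrNegativeOneBooleanContent style) the same shift gives 0xFFFFFFF0, so the transform would be wrong. Querying the boolean contents for the right type therefore matters in principle, even if it is hard to expose in a test.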

Rebased and clang-formatted.

LGTM

(For reference: I was wondering why x86 doesn't show any diffs for this change; it looks like there's custom code in X86ISelLowering that already does the same thing.)

This revision is now accepted and ready to land. Nov 6 2018, 12:13 PM

In D53236#1289222, @spatel wrote:

LGTM

(For reference: I was wondering why x86 doesn't show any diffs for this change; it looks like there's custom code in X86ISelLowering that already does the same thing.)

I see. Thanks for the review!

Closed by commit rL346484: [SelectionDAG] swap select_cc operands to enable folding (authored by alelab01). Nov 9 2018, 3:12 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    66 lines
llvm/trunk/test/CodeGen/AArch64/select_cc.ll           54 lines
llvm/trunk/test/CodeGen/Thumb/branchless-cmp.ll        16 lines

Diff 173294

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

(18,167 lines not shown)
 SDValue DAGCombiner::SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
                                       SDValue N2, SDValue N3, ISD::CondCode CC,
                                       bool NotExtCompare) {
   // (x ? y : y) -> y.
   if (N2 == N3) return N2;

   EVT VT = N2.getValueType();
   ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N1.getNode());
   ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(N2.getNode());
+  ConstantSDNode *N3C = dyn_cast<ConstantSDNode>(N3.getNode());

   // Determine if the condition we're dealing with is constant
   SDValue SCC = SimplifySetCC(getSetCCResultType(N0.getValueType()),
                               N0, N1, CC, DL, false);
   if (SCC.getNode()) AddToWorklist(SCC.getNode());

   if (ConstantSDNode *SCCC = dyn_cast_or_null<ConstantSDNode>(SCC.getNode())) {
     // fold select_cc true, x, y -> x
(84 lines not shown; the next lines are inside: if (ConstAndRHS && ConstAndRHS->getAPIntValue().countPopulation() == 1) {)
                                        getShiftAmountTy(Shl.getValueType()));
       SDValue Shr = DAG.getNode(ISD::SRA, SDLoc(N0), VT, Shl, ShrAmt);

       return DAG.getNode(ISD::AND, DL, VT, Shr, N3);
     }
   }

   // fold select C, 16, 0 -> shl C, 4
-  if (N2C && isNullConstant(N3) && N2C->getAPIntValue().isPowerOf2() &&
+  bool Fold = N2C && isNullConstant(N3) && N2C->getAPIntValue().isPowerOf2();
+  bool Swap = N3C && isNullConstant(N2) && N3C->getAPIntValue().isPowerOf2();
+
+  if ((Fold || Swap) &&
       TLI.getBooleanContents(N0.getValueType()) ==
-          TargetLowering::ZeroOrOneBooleanContent) {
+          TargetLowering::ZeroOrOneBooleanContent &&
+      (!LegalOperations ||
+       TLI.isOperationLegal(ISD::SETCC, N0.getValueType()))) {
+
+    if (Swap) {
+      CC = ISD::getSetCCInverse(CC, N0.getValueType().isInteger());
+      std::swap(N2C, N3C);
+    }
+
     // If the caller doesn't want us to simplify this into a zext of a compare,
     // don't do it.
     if (NotExtCompare && N2C->isOne())
       return SDValue();

-    // Get a SetCC of the condition
-    // NOTE: Don't create a SETCC if it's not legal on this target.
-    if (!LegalOperations ||
-        TLI.isOperationLegal(ISD::SETCC, N0.getValueType())) {
     SDValue Temp, SCC;
-    // cast from setcc result type to select result type
+    // zext (setcc n0, n1)
     if (LegalTypes) {
-      SCC = DAG.getSetCC(DL, getSetCCResultType(N0.getValueType()),
-                         N0, N1, CC);
+      SCC = DAG.getSetCC(DL, getSetCCResultType(N0.getValueType()), N0, N1, CC);
       if (N2.getValueType().bitsLT(SCC.getValueType()))
-        Temp = DAG.getZeroExtendInReg(SCC, SDLoc(N2),
-                                      N2.getValueType());
+        Temp = DAG.getZeroExtendInReg(SCC, SDLoc(N2), N2.getValueType());
       else
-        Temp = DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N2),
-                           N2.getValueType(), SCC);
+        Temp = DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N2), N2.getValueType(), SCC);
     } else {
       SCC = DAG.getSetCC(SDLoc(N0), MVT::i1, N0, N1, CC);
-      Temp = DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N2),
-                         N2.getValueType(), SCC);
+      Temp = DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N2), N2.getValueType(), SCC);
     }

     AddToWorklist(SCC.getNode());
     AddToWorklist(Temp.getNode());

     if (N2C->isOne())
       return Temp;

     // shl setcc result by log2 n2c
-    return DAG.getNode(
-        ISD::SHL, DL, N2.getValueType(), Temp,
-        DAG.getConstant(N2C->getAPIntValue().logBase2(), SDLoc(Temp),
-                        getShiftAmountTy(Temp.getValueType())));
-    }
+    return DAG.getNode(ISD::SHL, DL, N2.getValueType(), Temp,
+                       DAG.getConstant(N2C->getAPIntValue().logBase2(),
+                                       SDLoc(Temp),
+                                       getShiftAmountTy(Temp.getValueType())));
   }

   // Check to see if this is an integer abs.
   // select_cc setg[te] X, 0, X, -X ->
   // select_cc setgt X, -1, X, -X ->
   // select_cc setl[te] X, 0, -X, X ->
   // select_cc setlt X, 1, -X, X ->
   // Y = sra (X, size(X)-1); xor (add (X, Y), Y)
   if (N1C) {
(744 lines not shown)

llvm/trunk/test/CodeGen/AArch64/select_cc.ll

+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=aarch64 | FileCheck %s
+
+define i64 @select_ogt_float(float %a, float %b) {
+; CHECK-LABEL: select_ogt_float:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: fcmp s0, s1
+; CHECK-NEXT: cset w8, gt
+; CHECK-NEXT: lsl x0, x8, #2
+; CHECK-NEXT: ret
+entry:
+  %cc = fcmp ogt float %a, %b
+  %sel = select i1 %cc, i64 4, i64 0
+  ret i64 %sel
+}
+
+define i64 @select_ule_float_inverse(float %a, float %b) {
+; CHECK-LABEL: select_ule_float_inverse:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: fcmp s0, s1
+; CHECK-NEXT: cset w8, gt
+; CHECK-NEXT: lsl x0, x8, #2
+; CHECK-NEXT: ret
+entry:
+  %cc = fcmp ule float %a, %b
+  %sel = select i1 %cc, i64 0, i64 4
+  ret i64 %sel
+}
+
+define i64 @select_eq_i32(i32 %a, i32 %b) {
+; CHECK-LABEL: select_eq_i32:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: cmp w0, w1
+; CHECK-NEXT: cset w8, eq
+; CHECK-NEXT: lsl x0, x8, #2
+; CHECK-NEXT: ret
+entry:
+  %cc = icmp eq i32 %a, %b
+  %sel = select i1 %cc, i64 4, i64 0
+  ret i64 %sel
+}
+
+define i64 @select_ne_i32_inverse(i32 %a, i32 %b) {
+; CHECK-LABEL: select_ne_i32_inverse:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: cmp w0, w1
+; CHECK-NEXT: cset w8, eq
+; CHECK-NEXT: lsl x0, x8, #2
+; CHECK-NEXT: ret
+entry:
+  %cc = icmp ne i32 %a, %b
+  %sel = select i1 %cc, i64 0, i64 4
+  ret i64 %sel
+}

llvm/trunk/test/CodeGen/Thumb/branchless-cmp.ll

(68 lines not shown)
 ; CHECK-LABEL: test3b:
 ; CHECK-NOT: b{{(ne)|(eq)}}
 ; CHECK: subs r0, r0, r1
 ; CHECK-NEXT: rsbs r1, r0, #0
 ; CHECK-NEXT: adcs r1, r0
 ; CHECK-NEXT: lsls r0, r1, #2
 }

-; FIXME: This one hasn't changed actually
-; but could look like test3b
 define i32 @test4a(i32 %a, i32 %b) {
 entry:
   %cmp = icmp ne i32 %a, %b
   %cond = select i1 %cmp, i32 0, i32 4
   ret i32 %cond
 ; CHECK-LABEL: test4a:
-; CHECK: bb.0:
-; CHECK-NEXT: cmp r0, r1
-; CHECK-NEXT: bne .LBB6_2
-; CHECK-NEXT: bb.1:
-; CHECK-NEXT: movs r0, #4
-; CHECK-NEXT: bx lr
-; CHECK-NEXT: .LBB6_2:
-; CHECK-NEXT: movs r0, #0
-; CHECK-NEXT: bx lr
+; CHECK-NOT: b{{(ne)|(eq)}}
+; CHECK: subs r0, r0, r1
+; CHECK-NEXT: rsbs r1, r0, #0
+; CHECK-NEXT: adcs r1, r0
+; CHECK-NEXT: lsls r0, r1, #2
 }

 define i32 @test4b(i32 %a, i32 %b) {
 entry:
   %cmp = icmp ne i32 %a, %b
   %cond = select i1 %cmp, i32 4, i32 0
   ret i32 %cond
 ; CHECK-LABEL: test4b:
 ; CHECK-NOT: b{{(ne)|(eq)}}
 ; CHECK: subs r0, r0, r1
 ; CHECK-NEXT: subs r1, r0, #1
 ; CHECK-NEXT: sbcs r0, r1
 ; CHECK-NEXT: lsls r0, r0, #2
 }
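
The branchless Thumb sequences checked above can be traced with a small model of the carry flag. This is a sketch under the usual ARM convention that a subtraction sets carry when no borrow occurs. The helpers subs, adcs and sbcs and the functions thumbTest4a and thumbTest4b are hypothetical names written for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// a - b; carry is set when no borrow occurs (a >= b unsigned).
uint32_t subs(uint32_t a, uint32_t b, bool &carry) {
  carry = a >= b;
  return a - b;
}

// a + b + carry; carry is updated with the unsigned overflow of the sum.
uint32_t adcs(uint32_t a, uint32_t b, bool &carry) {
  uint64_t wide = static_cast<uint64_t>(a) + b + (carry ? 1u : 0u);
  carry = (wide >> 32) != 0;
  return static_cast<uint32_t>(wide);
}

// a - b - !carry, computed as a + ~b + carry.
uint32_t sbcs(uint32_t a, uint32_t b, bool &carry) {
  uint64_t wide =
      static_cast<uint64_t>(a) + static_cast<uint32_t>(~b) + (carry ? 1u : 0u);
  carry = (wide >> 32) != 0;
  return static_cast<uint32_t>(wide);
}

// New test4a sequence: select (a != b), 0, 4.
uint32_t thumbTest4a(uint32_t a, uint32_t b) {
  bool c = false;
  uint32_t r0 = subs(a, b, c);  // subs r0, r0, r1
  uint32_t r1 = subs(0, r0, c); // rsbs r1, r0, #0  (carry set iff r0 == 0)
  r1 = adcs(r1, r0, c);         // adcs r1, r0      (-r0 + r0 + carry)
  return r1 << 2;               // lsls r0, r1, #2
}

// test4b sequence: select (a != b), 4, 0.
uint32_t thumbTest4b(uint32_t a, uint32_t b) {
  bool c = false;
  uint32_t r0 = subs(a, b, c);  // subs r0, r0, r1
  uint32_t r1 = subs(r0, 1, c); // subs r1, r0, #1  (carry set iff r0 != 0)
  r0 = sbcs(r0, r1, c);         // sbcs r0, r1      (r0 - (r0 - 1) - !carry)
  return r0 << 2;               // lsls r0, r0, #2
}
```

In both sequences the add or subtract with carry leaves exactly the carry bit in the register, which is the 0/1 compare result that the final shift turns into 0 or 4.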